Formation of groups of identical objects

Abstract

An approach to improving structural consistency is considered. The purpose of the study is to select a method for combining identical objects into groups, since identical objects are the ones that can effectively exchange information and use what they receive from that exchange. To this end, several methods were compared experimentally, and the best one was selected by the target quality measure and latency. The proposed approach takes into account various characteristics of objects and the relationships between them, which allows identical objects to be determined accurately. It also has an efficient implementation for distributed computing systems, which keeps it fast on large amounts of data. The approaches are compared on the problem of finding identical products for assortment and supply management.
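The abstract does not say which grouping method was selected. As a purely illustrative baseline, the sketch below groups objects by transitive closure over pairwise "identical" decisions, using a union-find structure. The function name, the product identifiers, and the matched pairs are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def group_by_transitive_closure(items, matched_pairs):
    """Group items so that any two items linked by a chain of matches share a group."""
    # Every item starts as the root of its own single-element group.
    parent = {item: item for item in items}

    def find(x):
        # Walk up to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the groups of every pair judged identical.
    for a, b in matched_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(list)
    for item in items:
        groups[find(item)].append(item)
    # Sort members and groups so the result is deterministic.
    return sorted(sorted(group) for group in groups.values())


# Hypothetical product identifiers and pairwise match decisions.
products = ["p1", "p2", "p3", "p4", "p5"]
pairs = [("p1", "p2"), ("p2", "p3"), ("p4", "p5")]
groups = group_by_transitive_closure(products, pairs)
print(groups)  # [['p1', 'p2', 'p3'], ['p4', 'p5']]
```

With this baseline, a single wrong match is enough to merge two otherwise separate groups: adding the pair ("p3", "p4") to the example collapses all five products into one group.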

About the authors

I. Antipov

Volgograd State University

Corresponding author
Email: antipov.ivan.f@gmail.com
Russia, Volgograd

S. Dulin

Federal Research Center “Computer Science and Control” of the RAS

Email: skdulin@mail.ru
Russia, Moscow

A. Ryabtsev

Research and Design Institute of Informatization, Automation and Communications in Railway Transport (JSC NIIAS); Moscow Institute of Physics and Technology

Email: ryabtsev.ab@phystech.edu
Russia, Moscow; Moscow

Supplementary files

Fig. 1. Transitive closure approach.
Fig. 2. Three groups of identical products, mistakenly connected by a small number of edges.
Fig. 3. Distribution of group sizes.

Copyright © Russian Academy of Sciences, 2025