Embedding-based data matching for disparate data sources - Systèmes d’Informations Généralisées
Conference paper, Year: 2024

Embedding-based data matching for disparate data sources

Abstract

Dealing with heterogeneous sources is an important challenge in the field of knowledge discovery and management. Schema matching methods are employed to solve this problem using three approaches: schema-based, instance-based, or a combination of the two. This paper focuses on mapping between a data source for which only the schema is available (schema only) and a data source containing both schema and instances (both). Given the lack of suitable methods for aligning these two types of sources, we propose an approach using embedding models to provide vector modelling of sources and calculate similarities between data. Our solution consists in combining domain-specific embedding models and cross-domain embedding models to make data matching possible and efficient between the above-mentioned data sources. We have conducted several experiments using the Valentine datasets to evaluate our data matching method on several disparate tabular datasets. The results indicate effectiveness in terms of stability and ablation handling.
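The general idea of matching a schema-only source against a source with schema and instances through embedding similarity can be illustrated with a minimal sketch. The abstract does not name the embedding models or say how the domain-specific and cross-domain scores are combined, so everything here is an assumption: `domain_embed` and `cross_embed` are toy hashed-trigram stand-ins for real embedding models, the weighted average controlled by `alpha` is one possible combination rule, and all function and parameter names are hypothetical.

```python
import math
import zlib


def trigram_embed(text, dim=64, salt=""):
    """Toy stand-in for an embedding model: hashed character trigrams."""
    vec = [0.0] * dim
    padded = f"  {text.lower()} "
    for i in range(len(padded) - 2):
        gram = salt + padded[i:i + 3]
        # crc32 is deterministic across runs, unlike the built-in hash().
        vec[zlib.crc32(gram.encode("utf-8")) % dim] += 1.0
    return vec


def domain_embed(text):
    # Hypothetical placeholder for a domain-specific embedding model.
    return trigram_embed(text, salt="domain:")


def cross_embed(text):
    # Hypothetical placeholder for a cross-domain embedding model.
    return trigram_embed(text, salt="cross:")


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def combined_similarity(left, right, alpha=0.5):
    """Weighted average of the two similarities (an assumed combination rule)."""
    s_domain = cosine(domain_embed(left), domain_embed(right))
    s_cross = cosine(cross_embed(left), cross_embed(right))
    return alpha * s_domain + (1 - alpha) * s_cross


def match(source_attrs, target_columns, alpha=0.5):
    """Map each schema-only attribute to its most similar target column.

    source_attrs: attribute names from the schema-only source.
    target_columns: dict of column name -> list of sample instance values.
    """
    matches = {}
    for attr in source_attrs:
        scores = {}
        for name, values in target_columns.items():
            # Describe the target column by its name plus sample values.
            text = " ".join([name, *map(str, values)])
            scores[name] = combined_similarity(attr, text, alpha)
        matches[attr] = max(scores, key=scores.get)
    return matches


if __name__ == "__main__":
    schema_only = ["customer name", "city"]
    schema_and_instances = {
        "client_name": ["Alice Martin", "Bob Durand"],
        "town": ["Toulouse", "Naples"],
        "city_name": ["Toulouse", "Naples"],
    }
    print(match(schema_only, schema_and_instances))
```

In practice the two placeholder functions would be replaced by actual embedding models, and a similarity threshold could be added so that attributes without a sufficiently close column stay unmatched.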
Main file
Embedding_based_data_matching_for_disparate_data_sources.pdf (457.4 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-04612345, version 1 (14-06-2024)

Cite

Nour Elhouda Kired, Franck Ravat, Jiefu Song, Olivier Teste. Embedding-based data matching for disparate data sources. The 26th International Conference on Big Data Analytics and Knowledge Discovery (DAWAK 2024), Aug 2024, Naples, Italy. pp.66-71, ⟨10.1007/978-3-031-68323-7_5⟩. ⟨hal-04612345⟩