Deduplication algorithms and models for efficient data storage
Abstract
This paper studies data deduplication algorithms and models that reduce the amount of data both transmitted over the network and stored in data systems. Specifically, we consider the case where replicas of an original file are generated by edit errors, and we adopt a theoretical approach to analyzing data files. Our study applies to primary, backup, and archival storage. We introduce a new variable-length block-level deduplication algorithm that outperforms prior work and reduces computational complexity by focusing on pivots. We provide a theoretical comparative analysis of the algorithm's computational cost, together with experimental results evaluating its performance. The proposed solution improves on prior approaches in cost while achieving the same deduplication rates as brute-force or naive methods.
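To fix intuition for variable-length block-level deduplication, the sketch below is a minimal, hypothetical illustration (not the paper's algorithm): chunk boundaries are placed at content-defined pivot bytes, so an edit error perturbs only nearby chunks, and unchanged chunks elsewhere still match by fingerprint. The pivot byte, minimum chunk length, and helper names are illustrative assumptions.

```python
import hashlib

def chunk_by_pivots(data: bytes, pivot: int = 0x0A, min_len: int = 4) -> list:
    """Split data into variable-length chunks, cutting just after each
    pivot byte (here: newline), subject to a minimum chunk length.
    Boundaries depend on content, not on absolute offsets, so a local
    edit does not shift every subsequent chunk."""
    chunks, start = [], 0
    for i, b in enumerate(data):
        if b == pivot and i + 1 - start >= min_len:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing chunk without a pivot
    return chunks

def dedup_store(chunks) -> dict:
    """Map fingerprint -> chunk; identical chunks are stored only once."""
    store = {}
    for c in chunks:
        store.setdefault(hashlib.sha256(c).hexdigest(), c)
    return store

# A replica produced by a single edit error inside one chunk:
original = b"alpha\nbravo\ncharlie\ndelta\n"
edited   = b"alpha\nbravo!\ncharlie\ndelta\n"
store = dedup_store(chunk_by_pivots(original))
new_chunks = chunk_by_pivots(edited)
duplicates = sum(hashlib.sha256(c).hexdigest() in store for c in new_chunks)
# Only the edited chunk is new; the other three deduplicate.
```

Under this toy scheme, `duplicates` is 3 of the 4 chunks of the edited replica, which is the behavior variable-length chunking is designed to preserve under edit errors.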
Origin: Files produced by the author(s)