Deduplication algorithms and models for efficient data storage
Abstract
This paper studies data deduplication algorithms and models that reduce the amount of data both transmitted over the network and stored in data systems. Specifically, we consider the case where replicas of an original file are generated by edit errors, and we adopt a theoretical approach to analyzing data files. Our study applies to primary, backup, and archival storage. We introduce a new variable-length block-level deduplication algorithm that outperforms prior work and reduces computational complexity by focusing on pivots. We provide a theoretical comparative analysis of the algorithm's computational cost, together with experimental results evaluating its performance. The proposed solution improves on prior approaches in cost while achieving the same deduplication rates as brute-force or naive methods.
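To fix intuition for variable-length block-level deduplication, the sketch below is a minimal, hypothetical illustration (not the paper's algorithm): chunk boundaries are placed at content-defined pivot bytes, so an edit error perturbs only nearby chunks, and unchanged chunks elsewhere still match by fingerprint. The pivot byte, minimum chunk length, and helper names are illustrative assumptions.

```python
import hashlib

def chunk_by_pivots(data: bytes, pivot: int = 0x0A, min_len: int = 4) -> list:
    """Split data into variable-length chunks, cutting just after each
    pivot byte (here: newline), subject to a minimum chunk length.
    Boundaries depend on content, not on absolute offsets, so a local
    edit does not shift every subsequent chunk."""
    chunks, start = [], 0
    for i, b in enumerate(data):
        if b == pivot and i + 1 - start >= min_len:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing chunk without a pivot
    return chunks

def dedup_store(chunks) -> dict:
    """Map fingerprint -> chunk; identical chunks are stored only once."""
    store = {}
    for c in chunks:
        store.setdefault(hashlib.sha256(c).hexdigest(), c)
    return store

# A replica produced by a single edit error inside one chunk:
original = b"alpha\nbravo\ncharlie\ndelta\n"
edited   = b"alpha\nbravo!\ncharlie\ndelta\n"
store = dedup_store(chunk_by_pivots(original))
new_chunks = chunk_by_pivots(edited)
duplicates = sum(hashlib.sha256(c).hexdigest() in store for c in new_chunks)
# Only the edited chunk is new; the other three deduplicate.
```

Under this toy scheme, `duplicates` is 3 of the 4 chunks of the edited replica, which is the behavior variable-length chunking is designed to preserve under edit errors.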
Origin: Files produced by the author(s)