Belfort Birth Records Transcription: Preprocessing, and Structured Data Generation - Université de Technologie de Belfort-Montbeliard
Conference paper, Year: 2024

Abstract

Historical documents are invaluable windows into the past. They play a critical role in shaping our perception of the world and its rich tapestry of stories. This paper presents techniques to facilitate the transcription of the French Belfort Civil Registers of Births, valuable historical resources spanning 1807 to 1919. The methodology focuses on preprocessing steps (binarization, skew correction, and text line segmentation) tailored to the challenges these documents pose, including varied text styles, marginal annotations, and a mix of printed and handwritten text. The paper also introduces this archive as a new database, defining a structured representation of the document components with XML tags that keeps transcriptions correctly formatted and aligned with image components at both the paragraph and text line levels, in support of further improvements to handwritten text recognition models. The preprocessing phase achieves an accuracy rate of 96%, facilitating the preservation and study of this rich cultural heritage.
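
The abstract names the pipeline stages but not the algorithms behind them, so the following is only a minimal sketch of how such a pipeline might look, not the authors' implementation. It assumes Otsu thresholding for binarization and a horizontal projection profile for text line segmentation (two common choices swapped in for illustration), omits skew correction, and uses hypothetical XML tag and attribute names (`Paragraph`, `TextLine`, `x`, `y`, `width`, `height`) together with placeholder transcriptions.

```python
import numpy as np
import xml.etree.ElementTree as ET


def otsu_threshold(gray: np.ndarray) -> int:
    """Return the Otsu threshold of an 8-bit grayscale image (assumed binarizer)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                 # probability of the dark class
    mu = np.cumsum(prob * np.arange(256))   # cumulative mean of the dark class
    denom = omega * (1.0 - omega)
    # Between-class variance; undefined where one class is empty.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu[-1] * omega - mu) ** 2 / denom
    sigma_b[denom == 0] = 0.0
    return int(np.argmax(sigma_b))


def segment_lines(ink: np.ndarray, min_ink: int = 1) -> list[tuple[int, int]]:
    """Return (top, bottom) row ranges, bottom exclusive, of runs of inked rows."""
    has_ink = ink.sum(axis=1) >= min_ink    # horizontal projection profile
    lines, start = [], None
    for y, flag in enumerate(has_ink):
        if flag and start is None:
            start = y
        elif not flag and start is not None:
            lines.append((start, y))
            start = None
    if start is not None:
        lines.append((start, len(has_ink)))
    return lines


def to_xml(lines, width, texts) -> str:
    """Pair each line box with its transcription (hypothetical tag names)."""
    para = ET.Element("Paragraph")
    for i, ((top, bottom), text) in enumerate(zip(lines, texts)):
        line = ET.SubElement(
            para, "TextLine", id=f"l{i}",
            x="0", y=str(top), width=str(width), height=str(bottom - top),
        )
        line.text = text
    return ET.tostring(para, encoding="unicode")


# Synthetic page: light background (230) with two dark bands (20) as "text lines".
gray = np.full((40, 100), 230, dtype=np.uint8)
gray[5:12, 10:90] = 20
gray[20:28, 10:90] = 20

threshold = otsu_threshold(gray)
ink = gray <= threshold                     # True where the pixel is ink
lines = segment_lines(ink)
xml_doc = to_xml(lines, gray.shape[1], ["placeholder line 1", "placeholder line 2"])
print(xml_doc)
```

Storing each line's bounding box next to its text is what would let a recognition model be trained on image and transcription pairs at the line level; real register pages would additionally need skew correction and handling of marginal annotations before a projection profile becomes reliable.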

Main file
Belfort_Birth_Records_Transcription__Preprocessing__and_Structured_Data_Generation.pdf (16.41 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-04846034, version 1 (18-12-2024)

Cite

Wissam Alkendi, Franck Gechter, Laurent Heyberger, Christophe Guyeux. Belfort Birth Records Transcription: Preprocessing, and Structured Data Generation. 4th International Conference on Image Processing and Vision Engineering, May 2024, Angers, France. pp.32-43, ⟨10.5220/0012715600003720⟩. ⟨hal-04846034⟩