Corpus generation for voice command in smart home and the effect of speech synthesis on End-to-End SLU

Massive amounts of annotated data have greatly contributed to advances in machine learning. However, such large data sets are often unavailable for novel tasks performed in realistic environments such as smart homes. In this domain, large semantically annotated voice command corpora for Spoken Language Understanding (SLU) are scarce, especially for languages other than English. We present the automatic generation of a synthetic, semantically annotated corpus of French smart-home voice commands for training pipeline and End-to-End (E2E) SLU models. SLU is typically performed as a pipeline of Automatic Speech Recognition (ASR) followed by Natural Language Understanding (NLU). Since errors at the ASR stage reduce NLU performance, an alternative is E2E SLU, which performs ASR and NLU jointly. To that end, the artificial corpus was fed to a text-to-speech (TTS) system to generate synthetic speech data. All models were evaluated on voice commands acquired in a real smart home. We show that artificial data can be combined with real data within the same training set or used as a stand-alone training corpus. The quality of the synthetic speech was assessed by comparing it to real data using dynamic time warping (DTW).
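As an illustration of the DTW comparison mentioned at the end of the abstract, the following is a minimal sketch of the textbook DTW distance between two sequences of feature vectors. It is not the authors' implementation: the abstract does not say which acoustic features, local distance, or path constraints were used, so the Euclidean local cost, the unconstrained warping path, and the function name `dtw_distance` are all assumptions made for this example.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW cost between two sequences of equal-dimension vectors.

    Assumptions (not taken from the paper): Euclidean local distance,
    no window constraint, steps limited to match / insertion / deletion.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b frame is repeated
                cost[i][j - 1],      # b advances, a frame is repeated
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical toy "feature" sequences standing in for real and synthetic speech.
real = [(0.0,), (1.0,), (2.0,)]
synthetic = [(0.0,), (1.0,), (1.0,), (2.0,)]
print(dtw_distance(real, synthetic))
```

In practice the inputs would be frame-level acoustic features extracted from a real utterance and its synthetic counterpart; a lower accumulated cost suggests the two signals are closer after time alignment.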

Keywords

Spoken language understanding; automatic speech recognition; natural language understanding; corpora and language resources; ambient intelligence; voice-user interface; text-to-speech; dynamic time warping

Domains

Artificial Intelligence [cs.AI]; Human-Computer Interaction [cs.HC]

Main file

2020_LREC_Desot_4_proceedings.pdf (487.89 KB)

Origin: Files produced by the author(s)

Contributor: Michel Vacher

https://hal.science/hal-02861770

Submitted on: Tuesday, 9 June 2020, 10:59:04

Last modified on: Monday, 9 December 2024, 03:32:34

Dates and versions

hal-02861770, version 1 (9 June 2020)

Identifiers

HAL Id: hal-02861770, version 1

Cite

Thierry Desot, François Portet, Michel Vacher. Corpus generation for voice command in smart home and the effect of speech synthesis on End-to-End SLU. 12th Conference on Language Resources and Evaluation (LREC 2020), ELRA, May 2020, Marseille, France. pp.6395-6404. ⟨hal-02861770⟩

Collections

UGA CNRS LIG LIG_TDCGE_GETALP ANR LIG_SIDCH LIVINGLAB_DOMUS
