SLU FOR VOICE COMMAND IN SMART HOME: COMPARISON OF PIPELINE AND END-TO-END APPROACHES

Thierry Desot; François Portet; Michel Vacher

Conference paper, Year: 2019

Thierry Desot

Role: Author
PersonId: 182146
IdHAL: thierry-desot
ORCID: 0000-0003-3568-2374
IdRef: 254633722

Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole

François Portet

Role: Author
PersonId: 1069
IdHAL: francois-portet
ORCID: 0000-0003-2542-0661
IdRef: 098179160

Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole

Michel Vacher

Role: Author
PersonId: 709
IdHAL: michel-vacher
ORCID: 0000-0001-7770-9171
IdRef: 181831430

Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole

Abstract

Spoken Language Understanding (SLU) is typically performed by chaining automatic speech recognition (ASR) and natural language understanding (NLU) in a pipeline. However, errors at the ASR stage degrade NLU performance. Hence, there is rising interest in End-to-End (E2E) SLU, which performs ASR and NLU jointly. Although E2E models have outperformed modular approaches in many NLP tasks, current E2E SLU models have not yet definitively superseded pipeline approaches. In this paper, we compare the pipeline and E2E approaches for the task of voice command in smart homes. E2E models need large domain-specific training sets, yet no such data sets are available in languages other than English; we tackle this lack of data by combining Natural Language Generation (NLG) and text-to-speech (TTS) to generate French training data. The trained models were evaluated on voice commands acquired in a real smart home with several speakers. Results show that the E2E approach can reach performance similar to a state-of-the-art pipeline SLU despite a higher WER than the pipeline approach. Furthermore, the E2E model can benefit from artificially generated data to achieve lower Concept Error Rates than the pipeline baseline for slot recognition.
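The abstract compares systems by Word Error Rate (WER) and Concept Error Rate. As a hedged illustration only (this is not the authors' evaluation code), both metrics are commonly computed as the Levenshtein edit distance between a reference and a hypothesis token sequence, divided by the reference length: (S + D + I) / N, with S substitutions, D deletions, I insertions and N reference tokens. Applied to words this gives WER; applied to sequences of concept (slot) labels it gives a concept-level error rate. The function name `error_rate`, whitespace tokenisation and the French example command are assumptions made for this sketch.

```python
def error_rate(reference, hypothesis):
    """Return (S + D + I) / N for two token sequences (hypothetical helper).

    On word lists this is WER; on lists of concept labels it is a
    concept error rate. Tokens are compared with plain equality.
    """
    ref, hyp = list(reference), list(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one token")
    # prev[j] holds the minimum number of edits turning ref[:i-1] into hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical French voice command with one word dropped by the recogniser.
    ref_words = "allume la lumière du salon".split()
    hyp_words = "allume la lumière salon".split()
    print(error_rate(ref_words, hyp_words))  # 0.2

    # Hypothetical concept labels: the room slot was missed.
    ref_concepts = ["intent=set_device", "device=light", "room=living_room"]
    hyp_concepts = ["intent=set_device", "device=light"]
    print(error_rate(ref_concepts, hyp_concepts))
```

Because insertions are counted, the rate can exceed 1.0 when the hypothesis is much longer than the reference. A real evaluation would also fix text normalisation (case, punctuation, numbers) before scoring, which this sketch leaves out.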

Keywords

Spoken language understanding; automatic speech recognition; natural language understanding; ambient intelligence; voice-user interface

Domains

Other [cs.OH]; Artificial Intelligence [cs.AI]

Main file

2019_ASRU_Desot_final.pdf (172.4 KB)

Origin: Files produced by the author(s)

https://hal.science/hal-02464393

Submitted on: Friday, 13 March 2020, 16:06:50

Last modified on: Monday, 9 December 2024, 03:23:37

Long-term archiving on: Sunday, 14 June 2020, 14:26:35

Dates and versions

hal-02464393, version 1 (13-03-2020)

Identifiers

HAL Id: hal-02464393, version 1

Cite

Thierry Desot, François Portet, Michel Vacher. SLU FOR VOICE COMMAND IN SMART HOME: COMPARISON OF PIPELINE AND END-TO-END APPROACHES. IEEE Automatic Speech Recognition and Understanding Workshop, Dec 2019, Sentosa, Singapore, Singapore. ⟨hal-02464393⟩

Collections

UGA, CNRS, LIG, LIG_TDCGE_GETALP, ANR, LIG_SIDCH, LIVINGLAB_DOMUS