SLU FOR VOICE COMMAND IN SMART HOME: COMPARISON OF PIPELINE AND END-TO-END APPROACHES
Abstract
Spoken Language Understanding (SLU) is typically performed
through automatic speech recognition (ASR) and
natural language understanding (NLU) in a pipeline. However,
errors at the ASR stage have a negative impact on the
NLU performance. Hence, there is a rising interest in End-to-
End (E2E) SLU to jointly perform ASR and NLU. Although
E2E models have outperformed modular
approaches in many NLP tasks, current E2E SLU models
have not yet definitively superseded pipeline approaches.
In this paper, we present a comparison of the pipeline
and E2E approaches for the task of voice command in smart
homes. E2E models need large domain-specific data sets, but none
are available for languages other than English, so we
address this shortage by combining natural language
generation (NLG) and text-to-speech (TTS) to generate
French training data. The trained models were evaluated
on voice commands acquired in a real smart home with several
speakers. Results show that the E2E approach can reach
performance similar to that of a state-of-the-art pipeline SLU system despite
a higher word error rate (WER) than the pipeline approach. Furthermore,
the E2E model can benefit from artificially generated data to
exhibit lower Concept Error Rates than the pipeline baseline
for slot recognition.
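
The two metrics named above, WER and Concept Error Rate, are both commonly computed as an edit distance between a reference sequence and a hypothesis sequence, divided by the reference length. The sketch below illustrates that general idea only. It assumes Concept Error Rate is scored like WER but over concept labels, which this abstract does not confirm, and the French command and the label names (`action=turn_on` and so on) are hypothetical examples rather than items from the paper's data.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences
    (substitutions, deletions and insertions each cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Edit distance normalised by the reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical voice command and ASR output (illustration only).
ref_words = "allume la lumière de la cuisine".split()
hyp_words = "allume la lumière de cuisine".split()
wer = error_rate(ref_words, hyp_words)

# Hypothetical concept labels; assumes concepts are scored like words.
ref_concepts = ["action=turn_on", "device=light", "location=kitchen"]
hyp_concepts = ["action=turn_on", "device=light", "location=bedroom"]
cer = error_rate(ref_concepts, hyp_concepts)

print(f"WER = {wer:.3f}, Concept Error Rate = {cer:.3f}")
```

Because the two rates are computed over different sequences, a model can have a higher WER and still a lower Concept Error Rate, which is the pattern the results above describe.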