Toward the Integration of Natural Language Processing and Automatic Speech Recognition: Using Morpho-Syntax and Pragmatics for Transcription - INRIA - Institut National de Recherche en Informatique et en Automatique
Book chapter, Year: 2008

Stéphane Huet
Gwénolé Lecorvé
Guillaume Gravier
Pascale Sébillot

Abstract

In the framework of multimedia analysis and interaction, speech and language processing plays a major role. Many multimedia documents contain speech from which high-level semantic information can be extracted, as in broadcast news or sports videos, with typical applications such as spoken document indexing, topic tracking, and summarization. Hence, many multimedia document analysis applications require collaboration between speech recognition and natural language processing (NLP) techniques. As NLP techniques are traditionally designed for text analysis, this combination can be seen as a multimodal fusion problem in which the two modalities are audio and text. Most of the time, however, the two modalities are considered sequentially: a typical approach consists in automatically transcribing the audio track and then analyzing the output (here treated as regular text) with NLP methods. Processing the two modalities independently clearly seems suboptimal. This chapter focuses on recent research toward a better integration of automatic speech recognition (ASR) and NLP for the analysis of spoken multimedia documents, with the goal of achieving a better transcription of multimedia streams.
Main file
chapter-Multimodal-Processing-and-Interaction-Audio-Video-Text-08.pdf (241.62 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-02021921, version 1 (16-02-2019)

Cite

Stéphane Huet, Gwénolé Lecorvé, Guillaume Gravier, Pascale Sébillot. Toward the Integration of Natural Language Processing and Automatic Speech Recognition: Using Morpho-Syntax and Pragmatics for Transcription. Petros Maragos, Alexandros Potamianos, Patrick Gros. Multimodal Processing and Interaction: Audio, Video, Text, Springer US, pp.201-218, 2008, 978-0-387-76316-3. ⟨10.1007/978-0-387-76316-3_9⟩. ⟨hal-02021921⟩