Conference paper, Year: 2024

Unraveling Spontaneous Speech Dimensions for Cross-Corpus ASR System Evaluation for French

Abstract

Many papers on speech processing use 'spontaneous speech' as a catch-all term for situations such as speaking with a friend, being interviewed on radio/TV, or giving a lecture. However, the performance of Automatic Speech Recognition (ASR) systems seems to vary across this type of speech: the more spontaneous the speech, the higher the Word Error Rate (WER). Our study aims to better understand the elements that influence the level of spontaneity, in order to evaluate the relation between categories of spontaneity and ASR system performance and to improve recognition on those categories. We first analyzed the literature, listed and unraveled those elements, and identified four axes: the situation of communication, the level of intimacy between speakers, the channel, and the type of communication. We then trained ASR systems and measured how instances of face-to-face interaction, labeled along these dimensions (different levels of spontaneity), affect WER. We varied two of the axes and found that both have an impact on WER. The situation of communication seems to have the biggest impact on spontaneity: ASR systems give better results for situations like an interview than for friends having a conversation at home.
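
For readers unfamiliar with the metric, WER is conventionally defined as the number of word substitutions, deletions and insertions needed to turn the system output into the reference transcript, divided by the number of reference words. The sketch below illustrates that standard definition with a word-level edit distance. It is not the authors' evaluation code: the function name `word_error_rate`, the whitespace tokenization and the example sentences are assumptions made for illustration, and the abstract does not say which scoring tool or text normalization was used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative only: tokenization is a plain whitespace split, with no
    case folding or punctuation handling (an assumption, not the paper's setup).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one of five reference words is missing from the output.
print(word_error_rate("bonjour je voudrais un café", "bonjour je voudrais café"))
```

Because insertions count as errors, WER computed this way can exceed 1.0 when the hypothesis is much longer than the reference.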
Main file
2024_Evain_UnravelingSpontSpeech.pdf (455.23 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-04533965, version 1 (06-04-2024)

Identifiers

  • HAL Id: hal-04533965, version 1

Cite

Solène Evain, Solange Rossato, François Portet. Unraveling Spontaneous Speech Dimensions for Cross-Corpus ASR System Evaluation for French. The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), May 2024, Torino, Italy. ⟨hal-04533965⟩