Conference paper, 2025

Beyond Static Emotions: Leveraging Multitask Learning to Model Dynamics of Dimensional Affect in Speech

Abstract

Dimensional affect prediction from speech has traditionally relied on acoustic features to estimate continuous affect representations (e.g., arousal, valence) at each time step. However, affect evolves dynamically over time, and incorporating temporal information may improve prediction accuracy. This study investigates emotional dynamics in speech emotion recognition using multitask learning, where a model jointly predicts both the affect state and its temporal derivative. Experiments on the RECOLA and SEWA datasets show that incorporating dynamic information improves affect state prediction, particularly for valence, which is known to be challenging to model from audio alone. While CCC scores for affect dynamics remain lower than those for affect states, the results indicate that learning dynamics as an auxiliary task enhances affect state estimation over time. These findings underscore the importance of modelling emotional dynamics to capture the temporal evolution of affect.
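The abstract mentions two technical ingredients: the Concordance Correlation Coefficient (CCC), the standard evaluation metric for dimensional affect, and a temporal-derivative auxiliary target for the multitask setup. The sketch below shows a minimal NumPy implementation of both; the `dynamics_target` construction (a first-order difference of the gold trace) is an assumption for illustration, and the paper's exact formulation may differ.

```python
import numpy as np

def ccc(pred, gold):
    """Concordance Correlation Coefficient between two 1-D traces:
    2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)."""
    pred = np.asarray(pred, dtype=float)
    gold = np.asarray(gold, dtype=float)
    cov = np.mean((pred - pred.mean()) * (gold - gold.mean()))
    return 2.0 * cov / (pred.var() + gold.var()
                        + (pred.mean() - gold.mean()) ** 2)

def dynamics_target(trace):
    """Hypothetical auxiliary target: first-order temporal derivative
    of a gold affect trace, padded so it matches the trace length."""
    trace = np.asarray(trace, dtype=float)
    return np.diff(trace, prepend=trace[0])
```

In a multitask loss, one would typically combine `1 - ccc(pred_state, gold_state)` with `1 - ccc(pred_dyn, dynamics_target(gold_state))`, weighted by a hyperparameter.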

Main file: Beyond_Static_Emotions__Leveraging_Multitask_Learning_to_Model_Dynamics_of_Dimensional_Affect_in_Speech_camera_ready.pdf (515.64 KB). Origin: files produced by the author(s).
Dates and versions

hal-05375921, version 1 (21-11-2025)

Cite

Yuxuan Zhang, Hippolyte Fournier, Ruslan Kalitvianski, Marco Dinarelli, Fabien Ringeval. Beyond Static Emotions: Leveraging Multitask Learning to Model Dynamics of Dimensional Affect in Speech. 28th International Conference on Text, Speech and Dialogue, Aug 2025, Erlangen-Nürnberg, Germany. pp.109-120, ⟨10.1007/978-3-032-02548-7_10⟩. ⟨hal-05375921⟩