Investigating the Impact of Gender Representation in ASR Training Data: a Case Study on Librispeech

In this paper, we examine the impact of gender representation in training data on the performance of an end-to-end ASR system. We design an experiment based on the Librispeech corpus and build three training corpora that vary only in the proportion of data produced by each gender category. We observe that although our system is overall robust to gender balance or imbalance in the training data, its performance nonetheless depends on how well the individuals present in the training set match those in the test set.

Main file

garnerin-etal-camera-ready.pdf (247 KB)

Origin	Files produced by the author(s)
License	HAL authorization

https://hal.univ-grenoble-alpes.fr/hal-03472117

Submitted on: Thursday, December 9, 2021, 10:42:22

Last modified on: Saturday, September 27, 2025, 19:59:47

Dates and versions

hal-03472117, version 1 (09-12-2021)

Identifiers

HAL Id: hal-03472117, version 1
DOI: 10.18653/v1/2021.gebnlp-1.10

Cite

Mahault Garnerin, Solange Rossato, Laurent Besacier. Investigating the Impact of Gender Representation in ASR Training Data: a Case Study on Librispeech. 3rd Workshop on Gender Bias in Natural Language Processing, Aug 2021, Online, France. pp.86-92, ⟨10.18653/v1/2021.gebnlp-1.10⟩. ⟨hal-03472117⟩