Convolutional Time Delay Neural Network for Khmer Automatic Speech Recognition
Abstract
Convolutional Neural Networks have been shown to capture the spatial structure of the speech signal and to reduce spectral variation across speakers in Automatic Speech Recognition. In this study, we investigate combining a Convolutional Neural Network with a Time Delay Neural Network as the acoustic model for large-vocabulary continuous speech recognition of Khmer. The idea is to use the convolutional layers to extract local features of the speech signal, while the time-delay layers capture long-range temporal correlations between acoustic events. Experimental results show that the proposed network outperforms a Time Delay Neural Network baseline, achieving an average relative improvement of 14% across test sets.
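To make the temporal side of this idea concrete, the following minimal sketch shows how stacked time-delay layers widen the span of frames that a single output frame depends on. It is an illustration only, not the network evaluated in this study: the layer offsets, the scalar features, and the names `receptive_field`, `tdnn_layer`, and `HYPOTHETICAL_LAYERS` are assumptions introduced here, and the convolutional front end is omitted.

```python
from typing import List, Sequence, Tuple


def receptive_field(layer_offsets: Sequence[Sequence[int]]) -> Tuple[int, int]:
    """Return the (left, right) frame context seen by one output frame.

    Each layer splices its input at the given time offsets, so the
    extreme offsets of successive layers add up.
    """
    left = sum(min(offsets) for offsets in layer_offsets)
    right = sum(max(offsets) for offsets in layer_offsets)
    return left, right


def tdnn_layer(frames: List[float],
               offsets: Sequence[int],
               weights: Sequence[float]) -> List[float]:
    """Apply one time-delay layer to a sequence of scalar features.

    Output frame t is the weighted sum of frames[t + o] for each offset o.
    Only positions where every offset stays inside the sequence are kept.
    """
    start = max(0, -min(offsets))
    stop = len(frames) - max(0, max(offsets))
    return [
        sum(w * frames[t + o] for o, w in zip(offsets, weights))
        for t in range(start, stop)
    ]


# Hypothetical offsets, chosen only for illustration (not from the study).
HYPOTHETICAL_LAYERS = [(-1, 0, 1), (-2, 0, 2), (-3, 0, 3)]

if __name__ == "__main__":
    left, right = receptive_field(HYPOTHETICAL_LAYERS)
    print(f"context: {left}..{right} ({right - left + 1} frames)")
    print(tdnn_layer([1.0, 2.0, 3.0, 4.0, 5.0], (-1, 0, 1), (1.0, 1.0, 1.0)))
```

Under these assumed offsets, three layers already cover 13 frames of context while each layer reads only three inputs. This is the sense in which time-delay layers can model long temporal correlations at modest cost.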
Origin: Files produced by the author(s)