P. Boersma and D. Weenink, PRAAT, a system for doing phonetics by computer, Glot International, vol.5, pp.341-345, 2001.

A. Contesse and A. Pinchaud, Vocal Grammatics, web page, www.vocalgrammatics.fr, 2019.

A. Graves, A.-R. Mohamed, and G. Hinton, Speech recognition with deep recurrent neural networks, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.6645-6649, 2013.

A. Hazan, Towards automatic transcription of expressive oral percussive performances, Proceedings of the 10th International Conference on Intelligent User Interfaces, pp.296-298, 2005.

K. Hipke, M. Toomim, R. Fiebrink, and J. Fogarty, Beat-Box: End-user Interactive Definition and Training of Recognizers for Percussive Vocalizations, pp.121-124, 2014.

A. Kapur, G. Tzanetakis, and M. Benning, Query-by-Beat-Boxing: Music Retrieval For The DJ, 2004.

T. Ko, V. Peddinti, D. Povey, and S. Khudanpur, Audio augmentation for speech recognition, 2015.

B. Picart, S. Brognaux, and S. Dupont, Analysis and automatic recognition of Human BeatBox sounds: A comparative study, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.4255-4259, 2015.

D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek et al., The Kaldi speech recognition toolkit, IEEE 2011 Workshop on Automatic Speech Recognition and Understanding, p.11, 2011.

D. Povey, V. Peddinti, D. Galvez, P. Ghahremani, V. Manohar et al., Purely sequence-trained neural networks for ASR, 2016.

M. Proctor, E. Bresch, D. Byrd, K. Nayak, and S. Narayanan, Paralinguistic mechanisms of production in human "beatboxing": A real-time magnetic resonance imaging study, The Journal of the Acoustical Society of America, vol.133, pp.1043-1054, 2013.

E. Sinyor, C. McKay, R. Fiebrink, D. McEnnis, and I. Fujinaga, Beatbox classification using ACE, p.4, 2005.

D. Stowell and M. D. Plumbley, Characteristics of the beatboxing vocal style, 2008.

V. Tiwari, MFCC and its applications in speaker recognition, International Journal on Emerging Technologies, pp.19-22, 2010.

S. Watanabe, T. Hori, S. Karita, T. Hayashi, J. Nishitoba et al., ESPnet: End-to-end speech processing toolkit, 2018.