An Aligned French-Chinese corpus of 10K segments from university educational material

Abstract: This paper describes a corpus of nearly 10K French-Chinese aligned segments, produced by post-editing machine-translated computer science courseware. The corpus was built from 2013 to 2016 within the MACAU project by native Chinese students. The quality, as judged by native speakers, is adequate for understanding (far better than reading only the original French) and for getting better marks. Each segment is annotated with a self-assessed quality score. The corpus has been used directly as supplemental training data to build a statistical machine translation system dedicated to that sublanguage, and it can be used to extract the specific bilingual terminology. To our knowledge, it is the first corpus of this kind to be released.
Document type: Conference paper
The 3rd Workshop on Natural Language Processing Techniques for Educational Applications, Dec 2016, Osaka, Japan. 2016, Proceedings Of The 3rd Workshop on Natural Language Processing Techniques for Educational Applications

http://hal.univ-grenoble-alpes.fr/hal-01430828
Contributor: Ruslan Kalitvianski
Submitted on: Tuesday, 10 January 2017 - 12:10:33
Last modified on: Thursday, 11 January 2018 - 06:22:06

Identifiers

  • HAL Id: hal-01430828, version 1

Citation

Ruslan Kalitvianski, Lingxiao Wang, Valérie Bellynck, Christian Boitet. An Aligned French-Chinese corpus of 10K segments from university educational material. The 3rd Workshop on Natural Language Processing Techniques for Educational Applications, Dec 2016, Osaka, Japan. 2016, Proceedings Of The 3rd Workshop on Natural Language Processing Techniques for Educational Applications. 〈hal-01430828〉
