An Aligned French-Chinese corpus of 10K segments from university educational material

Ruslan Kalitvianski; Lingxiao Wang; Valérie Bellynck; Christian Boitet

Conference paper. Year: 2016

Ruslan Kalitvianski

Role: Author
PersonId: 997757

Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole

Lingxiao Wang

Role: Author
PersonId: 10084
IdHAL: lingxiao-wang
IdRef: 193281767

Laboratoire d'Informatique de Grenoble

Valérie Bellynck

Role: Author
PersonId: 10334
IdHAL: valerie-bellynck
IdRef: 164853545

Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole

Christian Boitet

Role: Author
PersonId: 957195

Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole

Abstract

This paper describes a corpus of nearly 10K aligned French-Chinese segments, produced by post-editing machine-translated computer science courseware. The corpus was built from 2013 to 2016 within the MACAU project by native Chinese students. Its quality, as judged by native speakers, is adequate for understanding the material (far better than reading only the original French) and for obtaining better marks. Each segment is annotated with a self-assessed quality score. The corpus has been used directly as supplemental training data to build a statistical machine translation system dedicated to this sublanguage, and it can be used to extract the domain-specific bilingual terminology. To our knowledge, it is the first corpus of this kind to be released.

Domains

Computation and Language [cs.CL]

Contributor: Ruslan Kalitvianski

https://hal.univ-grenoble-alpes.fr/hal-01430828

Submitted on: Tuesday 10 January 2017, 12:10:33

Last modified on: Thursday 4 April 2024, 20:56:19

Dates and versions

hal-01430828, version 1 (10-01-2017)

Identifiers

HAL Id: hal-01430828, version 1

Cite

Ruslan Kalitvianski, Lingxiao Wang, Valérie Bellynck, Christian Boitet. An Aligned French-Chinese corpus of 10K segments from university educational material. The 3rd Workshop on Natural Language Processing Techniques for Educational Applications, Dec 2016, Osaka, Japan. ⟨hal-01430828⟩

Collections

UGA CNRS LIG LIG_TDCGE_GETALP LIG_SIDCH

