Alignment Of Bilingual Named Entities In French-Arabic Parallel Corpora

Abstract: Research on Named Entity recognition and alignment is of strong interest for various natural language processing applications, such as cross-lingual information retrieval, document management, question-answering systems and data mining. For Arabic, however, the task is particularly difficult, and few resources are available to cope with these difficulties. In this paper, we present a simple method of character transcoding, a kind of transliteration that we call character reduction, which could improve an alignment system for Named Entities such as anthroponyms and toponyms. The system has been applied to and evaluated on a French-Arabic parallel corpus that was used during the Arcade 2 evaluation campaign. The purpose of the method is to bring the graphic forms of the two languages as close together as possible in order to increase alignment precision. One outcome of such alignment is the ability to project onto the target language (Arabic) annotations made on the source language, for which more tools and resources are available (French, English, etc.).
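As a rough illustration of the general idea of bringing graphic forms closer together, the sketch below reduces a French name and an Arabic name to a shared consonant skeleton and compares the results. It is not the authors' transcoding scheme: the mapping tables `ARABIC_TO_LATIN` and `LATIN_FOLD`, the choice to drop vowels, and the function names are all assumptions made for this example, and the similarity score simply uses Python's standard `difflib`.

```python
import unicodedata
from difflib import SequenceMatcher

# Hypothetical mapping from a few Arabic consonants to Latin letters.
# This tiny table is an assumption for illustration only.
ARABIC_TO_LATIN = {
    "\u0628": "b",  # ba
    "\u062A": "t",  # ta
    "\u062F": "d",  # dal
    "\u0631": "r",  # ra
    "\u0633": "s",  # sin
    "\u063A": "g",  # ghayn
    "\u0644": "l",  # lam
    "\u0645": "m",  # mim
    "\u0646": "n",  # nun
}

# Hypothetical folding of Latin letters with no direct Arabic counterpart.
LATIN_FOLD = {"p": "b", "v": "f"}
LATIN_VOWELS = set("aeiouy")


def reduce_french(word: str) -> str:
    """Lowercase, strip accents, drop vowels, fold letters via LATIN_FOLD."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    out = []
    for ch in decomposed:
        # Skip accents (combining marks), non-letters and vowels.
        if unicodedata.combining(ch) or not ch.isalpha() or ch in LATIN_VOWELS:
            continue
        out.append(LATIN_FOLD.get(ch, ch))
    return "".join(out)


def reduce_arabic(word: str) -> str:
    """Keep only the consonants listed in ARABIC_TO_LATIN, transcoded.

    Long vowels, diacritics and any unmapped letter are silently dropped,
    which is lossy and only acceptable in a toy example.
    """
    return "".join(ARABIC_TO_LATIN[ch] for ch in word if ch in ARABIC_TO_LATIN)


def similarity(french: str, arabic: str) -> float:
    """Similarity ratio in [0, 1] between the two reduced forms."""
    return SequenceMatcher(None, reduce_french(french), reduce_arabic(arabic)).ratio()


paris_ar = "\u0628\u0627\u0631\u064A\u0633"  # Arabic spelling of "Paris"
print(reduce_french("Paris"), reduce_arabic(paris_ar), similarity("Paris", paris_ar))
```

Reducing both sides to a small shared alphabet is one plausible way to make a string comparison meaningful across scripts. A real system would need a complete mapping and a tuned similarity threshold, neither of which is given in this record.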
Document type:
Conference paper
ACIT 2008, 2008, Hammamet, Tunisia. ACIT 2008, pp.1-8, 2008

https://hal.archives-ouvertes.fr/hal-01073705
Contributor: Olivier Kraif
Submitted on: Thursday, 14 March 2019 - 20:35:58
Last modified on: Friday, 15 March 2019 - 10:21:00

File

ACIT_2008.Abdoulhay.Kraif.fina...
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01073705, version 1

Citation

Authoul Abdulhay, Olivier Kraif. Alignment Of Bilingual Named Entities In French-Arabic Parallel Corpora. ACIT 2008, 2008, Hammamet, Tunisia. ACIT 2008, pp.1-8, 2008. 〈hal-01073705〉
