Alignment Of Bilingual Named Entities In French-Arabic Parallel Corpora

Abstract: Research on named entity recognition and alignment is of strong interest for many natural language processing applications, such as cross-lingual information retrieval, document management, question-answering systems, and data mining. For Arabic, however, the task is particularly difficult, and few resources are available to cope with these difficulties. In this paper, we present a simple method of character transcoding, a kind of transliteration that we call character reduction, which could improve a system for aligning named entities such as anthroponyms and toponyms. This system has been applied to and evaluated on a French-Arabic parallel corpus used during the Arcade 2 evaluation campaign. The purpose of the method is to bring the graphic forms of the two languages as close together as possible in order to increase alignment precision. One outcome of such alignment is the ability to project onto the target language (Arabic) annotations made on the source language, for which more tools and resources are available (French, English, etc.).
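The abstract does not give the actual reduction table, so the following is only a minimal sketch of the general idea, built on several assumptions. Both scripts are mapped onto a small set of shared consonant classes, vowels and diacritics are dropped (Arabic usually leaves short vowels unwritten), repeated classes are collapsed, and candidate pairs are scored by string similarity. The tables `FR_TABLE` and `AR_TABLE`, the helper names `reduce_fr`, `reduce_ar` and `align`, the stripping of the article "ال", the use of `difflib.SequenceMatcher` as the similarity measure, and the 0.6 threshold are all hypothetical choices for illustration, not the authors' method.

```python
import difflib
import unicodedata

# Hypothetical reduction tables: each letter maps to a coarse consonant class.
# Letters absent from a table (vowels, glides, hamza, ayn, diacritics...) are dropped.
FR_TABLE = {
    "b": "b", "p": "b", "c": "k", "k": "k", "q": "k", "d": "d",
    "f": "f", "v": "f", "g": "g", "h": "h", "j": "j", "l": "l",
    "m": "m", "n": "n", "r": "r", "s": "s", "z": "z", "t": "t", "x": "ks",
}
AR_TABLE = {
    "ب": "b", "ت": "t", "ث": "t", "ج": "j", "ح": "h", "خ": "k", "د": "d",
    "ذ": "d", "ر": "r", "ز": "z", "س": "s", "ش": "s", "ص": "s", "ض": "d",
    "ط": "t", "ظ": "z", "غ": "g", "ف": "f", "ق": "k", "ك": "k", "ل": "l",
    "م": "m", "ن": "n", "ه": "h",
}


def _collapse(s):
    """Merge runs of identical classes, e.g. 'hmmt' -> 'hmt'."""
    out = []
    for ch in s:
        if not out or out[-1] != ch:
            out.append(ch)
    return "".join(out)


def reduce_fr(name):
    """Reduce a French name: lowercase, strip accents, map consonants."""
    decomposed = unicodedata.normalize("NFD", name.lower())  # é -> e + accent
    letters = (c for c in decomposed if not unicodedata.combining(c))
    return _collapse("".join(FR_TABLE.get(c, "") for c in letters))


def reduce_ar(name):
    """Reduce an Arabic name: drop a leading article per token, map consonants."""
    tokens = []
    for tok in name.split():
        # Crude, hypothetical heuristic: remove the definite article "ال".
        if tok.startswith("ال") and len(tok) > 2:
            tok = tok[2:]
        tokens.append(tok)
    return _collapse("".join(AR_TABLE.get(c, "") for c in "".join(tokens)))


def align(fr_names, ar_names, threshold=0.6):
    """Greedily pair each French name with its most similar Arabic candidate."""
    ar_reduced = [(ar, reduce_ar(ar)) for ar in ar_names]
    pairs = {}
    for fr in fr_names:
        key = reduce_fr(fr)
        best, score = None, 0.0
        for ar, red in ar_reduced:
            s = difflib.SequenceMatcher(None, key, red).ratio()
            if s > score:
                best, score = ar, s
        if best is not None and score >= threshold:
            pairs[fr] = best
    return pairs


if __name__ == "__main__":
    print(align(["Paris", "Tunis", "Hammamet", "Lyon"],
                ["تونس", "الحمامات", "باريس"]))
    # → {'Paris': 'باريس', 'Tunis': 'تونس', 'Hammamet': 'الحمامات'}
```

Here "Paris" and "باريس" both reduce to `brs`, and "Hammamet" and "الحمامات" both reduce to `hmt`, so identical graphic forms make the pairing trivial; "Lyon" (`ln`) finds no candidate above the threshold and stays unaligned. A real system would need a richer table (for instance, context-dependent handling of French "c" and "ch") and would typically restrict candidates to aligned sentence pairs.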
Document type: Conference paper

https://hal.archives-ouvertes.fr/hal-01073705
Contributor: Olivier Kraif
Submitted on: Thursday, March 14, 2019 - 8:35:58 PM
Last modification on: Friday, March 15, 2019 - 10:21:00 AM


Identifiers

  • HAL Id: hal-01073705, version 1


Citation

Authoul Abdulhay, Olivier Kraif. Alignment Of Bilingual Named Entities In French-Arabic Parallel Corpora. ACIT 2008, 2008, Hammamet, Tunisia. pp.1-8. ⟨hal-01073705⟩
