Alignment Of Bilingual Named Entities In French-Arabic Parallel Corpora

Authoul Abdulhay; Olivier Kraif

Conference paper. Year: 2008

Authoul Abdulhay

Role: Author

LInguistique et DIdactique des Langues Étrangères et Maternelles

Olivier Kraif

Role: Author
PersonId : 20769
IdHAL : olivier-kraif
IdRef : 067256759

LInguistique et DIdactique des Langues Étrangères et Maternelles

Abstract

Research on Named Entity recognition and alignment is of strong interest for various natural language processing applications, such as cross-lingual information retrieval, document management, question-answering systems, and data mining. For Arabic, however, the task is particularly difficult, and few resources are available to cope with these difficulties. In this paper, we present a simple character transcoding method, a kind of transliteration that we call character reduction, which could improve an alignment system for Named Entities such as anthroponyms and toponyms. The system has been applied to and evaluated on a French-Arabic parallel corpus that was used during the Arcade 2 evaluation campaign. The purpose of the method is to bring the graphic forms of the two languages as close together as possible, in order to increase alignment precision. One outcome of such alignment is the ability to project onto the target language (Arabic) annotations that have been made on a source language for which more tools and resources are available (French, English, etc.).
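The abstract does not give the transcoding tables, so the following Python sketch only illustrates the general idea of reducing both scripts to a shared, coarse alphabet before comparing candidate names. Everything in it is an assumption made for illustration and not the authors' actual method: the mapping tables `ARABIC_TO_CLASS` and `LATIN_TO_CLASS`, the decision to drop vowels and long-vowel letters, the helper names `reduce_form`, `similarity` and `best_match`, and the 0.7 threshold.

```python
import unicodedata
from difflib import SequenceMatcher

# Hypothetical reduction table: Arabic letters -> coarse Latin consonant classes.
# Long vowels, hamza, ayn and ta marbuta are mapped to "" (dropped).
ARABIC_TO_CLASS = {
    "ب": "b", "ت": "t", "ث": "t", "ج": "j", "ح": "h", "خ": "k",
    "د": "d", "ذ": "d", "ر": "r", "ز": "z", "س": "s", "ش": "s",
    "ص": "s", "ض": "d", "ط": "t", "ظ": "z", "غ": "g", "ف": "f",
    "ق": "k", "ك": "k", "ل": "l", "م": "m", "ن": "n", "ه": "h",
    "ا": "", "و": "", "ي": "", "ى": "", "ة": "", "ء": "", "ع": "",
}

# Hypothetical reduction table for Latin letters that have no direct
# counterpart in the classes above.
LATIN_TO_CLASS = {"p": "b", "v": "f", "c": "k", "q": "k", "x": "k"}
LATIN_DROPPED = set("aeiouyw")


def reduce_form(text: str) -> str:
    """Map a French or Arabic word to a shared reduced consonant skeleton."""
    # NFD splits accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFD", text.lower())
    out = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue  # Latin accents, Arabic short-vowel signs, hamza marks
        if ch in ARABIC_TO_CLASS:
            code = ARABIC_TO_CLASS[ch]
        elif ch in LATIN_DROPPED or not ch.isalpha():
            code = ""
        else:
            code = LATIN_TO_CLASS.get(ch, ch)
        # Skip a code identical to the previous kept one (this also merges
        # repeats that become adjacent once vowels are dropped).
        if code and (not out or out[-1] != code):
            out.append(code)
    return "".join(out)


def similarity(french: str, arabic: str) -> float:
    """Similarity in [0, 1] between the reduced forms of two words."""
    a, b = reduce_form(french), reduce_form(arabic)
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a, b).ratio()


def best_match(french_entity, arabic_tokens, threshold=0.7):
    """Return the Arabic token closest to the French entity, or None."""
    scored = [(similarity(french_entity, tok), tok) for tok in arabic_tokens]
    score, token = max(scored, default=(0.0, None))
    return token if score >= threshold else None


if __name__ == "__main__":
    print(reduce_form("Paris"), reduce_form("باريس"))
    print(best_match("Damas", ["في", "دمشق", "اليوم"]))
```

Within an already aligned sentence pair, `best_match` could be called once per French anthroponym or toponym to propose its Arabic counterpart. A real system would need carefully designed tables and handling of multi-word entities.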

Keywords

Bilingual aligning; transliteration; anthroponyms; toponyms; Named Entities

Domains

Computation and Language [cs.CL]

Main file

ACIT_2008.Abdoulhay.Kraif.final.pdf (98.74 KB)

Origin: Files produced by the author(s)

https://hal.science/hal-01073705

Submitted on: Thursday, 14 March 2019, 20:35:58

Last modified on: Thursday, 4 April 2024, 21:18:00

Long-term archiving on: Saturday, 15 June 2019, 12:25:03

Dates and versions

hal-01073705, version 1 (14-03-2019)

Identifiers

HAL Id: hal-01073705, version 1

Cite

Authoul Abdulhay, Olivier Kraif. Alignment Of Bilingual Named Entities In French-Arabic Parallel Corpora. ACIT 2008, 2008, Hammamet, Tunisia. pp.1-8. ⟨hal-01073705⟩

Collections

UGA LIDILEM

56 views

36 downloads
