FRACAS: a FRench Annotated Corpus of Attribution relations in newS

Quotation extraction is a widely useful task from both a sociological and a Natural Language Processing perspective. However, very little data is available to study this task in languages other than English. In this paper, we present FRACAS, a manually annotated corpus of 1,676 French newswire texts for quotation extraction and source attribution. We first describe the composition of our corpus and the choices made in selecting the data. We then detail the annotation guidelines and the annotation process, and give relevant statistics about our corpus. We report inter-annotator agreement, which is high for such a difficult linguistic phenomenon. We use this new resource to test the ability of a state-of-the-art neural relation extraction system to extract quotes and their sources, and we compare this model to the latest available quotation extraction system for French, which is rule-based. Experiments with the state-of-the-art system on our dataset show very promising results considering the difficulty of the task.

Keywords

attribution relation extraction corpus

Domains

Computation and Language [cs.CL]

Main file

2024.lrec-main.654.pdf (756.11 KB)

Origin: Publisher files allowed on an open archive

https://hal.science/hal-04534046

Submitted on: Thursday, May 30, 2024, 12:49:10

Last modified on: Wednesday, December 18, 2024, 09:25:11

Dates and versions

hal-04534046 , version 1 (14-05-2024)

hal-04534046 , version 2 (30-05-2024)

Identifiers

HAL Id: hal-04534046, version 2

Cite

Ange Richard, Laura Alonzo Canul, François Portet. FRACAS: a FRench Annotated Corpus of Attribution relations in newS. 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), May 2024, Turin, Italy. pp.7417-7428. ⟨hal-04534046v2⟩
