MultiLexBATS: Multilingual Dataset of Lexical Semantic Relations
Conference paper. Year: 2024

1 Universität Wien = University of Vienna
2 Department of Informatics Engineering - DEI, University of Coimbra
3 UC - Universidade de Coimbra = University of Coimbra [Portugal]
4 Universidad de Zaragoza = University of Zaragoza [Saragossa University] = Université de Saragosse
5 UPB - University Politehnica of Bucharest [Romania]
6 DIT-UPPSALA - Department of Information Technology
7 University of Prishtina
8 UNISI - Università degli Studi di Siena = University of Siena
9 CLUNL - Centro de Linguística da Universidade Nova de Lisboa
10 CLLC - Centro de Línguas, Literaturas e Culturas
11 CNR-ILC - Istituto di Linguistica Computazionale "Antonio Zampolli"
12 SAS - Slovak Academy of Sciences
13 I3A - Aragón Institute of Engineering Research [Zaragoza]
14 UniOr - Università di Napoli L'Orientale = University of Naples
15 University of Ljubljana
16 ILSP - Institute for Language and Speech Processing
17 JCT - Jerusalem College of Technology
18 Institute for the Croatian Language, Croatia
19 Mykolas Romeris University
20 CISUC - Centre for Informatics and Systems
21 GETALP - Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole
22 Universidade do Porto = University of Porto
23 UNIMIB - Università degli Studi di Milano-Bicocca = University of Milano-Bicocca
24 University College Beder
25 University of Belgrade, Faculty of Mining and Geology
26 University of Belgrade [Belgrade]
27 Mykolas Romeris University
28 UKIM - Ss. Cyril and Methodius University in Skopje
Timotej Knez
  • Role: Author
  • PersonId: 1373349
Sigita Rackevičienė
  • Role: Author
  • PersonId: 1373353
Ricardo Rodrigues
  • Role: Author
  • PersonId: 1373354
Linas Selmistraitis
  • Role: Author
  • PersonId: 1373355
Enriketa Sogutlu
  • Role: Author
  • PersonId: 1373358
Slavko Zitnik
  • Role: Author
  • PersonId: 1005361

Abstract

Understanding the relation between the meanings of words is an important part of comprehending natural language. Prior work has either focused on analysing lexical semantic relations in word embeddings or probing pretrained language models (PLMs), with some exceptions. Given the rarity of highly multilingual benchmarks, it is unclear to what extent PLMs capture relational knowledge and are able to transfer it across languages. To start addressing this question, we propose MultiLexBATS, a multilingual parallel dataset of lexical semantic relations adapted from BATS in 15 languages including low-resource languages, such as Bambara, Lithuanian, and Albanian. As an experiment on cross-lingual transfer of relational knowledge, we test the PLMs’ ability to (1) capture analogies across languages, and (2) predict translation targets. We find considerable differences across relation types and languages, with a clear preference for hypernymy and antonymy as well as Romance languages.
Main file
2475_Paper.pdf (228.37 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-04539892, version 1 (09-04-2024)

Identifiers

  • HAL Id: hal-04539892, version 1

Cite

Dagmar Gromann, Hugo Gonçalo Oliveira, Lucia Pitarch, Elena-Simona Apostol, Jordi Bernad, et al. MultiLexBATS: Multilingual Dataset of Lexical Semantic Relations. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), ELDA; ICCL, May 2024, Torino, Italy. pp. 11783-11793. ⟨hal-04539892⟩
71 views
251 downloads