Université Grenoble Alpes (HAL)
Conference paper, 2023

Simple, Simpler and Beyond: A Fine-Tuning BERT-Based Approach to Enhance Sentence Complexity Assessment for Text Simplification

Abstract

Automatic text simplification models face the challenge of generating outputs that, while indeed simpler, still retain some complexity. This stems from the inherently relative nature of simplification: a given text is transformed into a relatively simpler version, which does not necessarily equate to a simple one. We therefore propose a finer-grained method to assess sentence complexity in French. Our solution comprises three models: two address absolute and relative sentence complexity assessment, while the third measures simplicity gain. With this triad of models, we aim to offer a comprehensive approach to qualify and quantify sentence simplicity. Our approach uses FlauBERT, fine-tuned for classification and regression tasks. Based on our three-dimensional complexity analysis, we provide the WIVICO dataset, comprising 46,525 aligned complex-simpler pairs, which can be further leveraged to fine-tune large language models to automatically generate simplified texts, or to assess text complexity with greater granularity.
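The setup described above (FlauBERT with task heads for classification and regression) can be sketched with the HuggingFace `transformers` library. The snippet below is a minimal illustration, not the authors' code: it builds a tiny randomly initialised FlauBERT with a single-output head, as one would for regressing a complexity score; the paper instead fine-tunes a pretrained checkpoint (e.g. `flaubert/flaubert_base_cased`) on its own data, and all hyperparameters here are placeholder values.

```python
import torch
from transformers import FlaubertConfig, FlaubertForSequenceClassification

# Tiny randomly initialised FlauBERT, for illustration only.
# (The paper fine-tunes a pretrained FlauBERT checkpoint instead.)
config = FlaubertConfig(
    vocab_size=1000,  # placeholder vocabulary size
    emb_dim=64,       # placeholder embedding size
    n_layers=2,
    n_heads=2,
    num_labels=1,     # single output -> regression head (e.g. a complexity score)
)
model = FlaubertForSequenceClassification(config)

# Dummy batch of 2 "sentences" of 16 token ids each.
input_ids = torch.randint(0, 1000, (2, 16))
out = model(input_ids=input_ids)
print(out.logits.shape)  # one scalar score per sentence
```

Switching `num_labels` to the number of complexity classes would turn the same head into a classifier, which matches the abstract's description of using both classification and regression variants.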
Main file: 2023_ORMAECHEA_ICNLSP.pdf (651.08 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-04359942, version 1 (21-12-2023)

Identifiers

  • HAL Id: hal-04359942, version 1

Cite

Lucía Ormaechea, Nikos Tsourakis, Didier Schwab, Pierrette Bouillon, Benjamin Lecouteux. Simple, Simpler and Beyond: A Fine-Tuning BERT-Based Approach to Enhance Sentence Complexity Assessment for Text Simplification. ICNLSP (International Conference on Natural Language and Speech Processing), University of Trento, Dec 2023, Trento, Italy. ⟨hal-04359942⟩
