DIASER: A Unifying View On Task-oriented Dialogue Annotation - Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur
Conference paper. Year: 2022

DIASER: A Unifying View On Task-oriented Dialogue Annotation

Alternative title (French, translated): DIASER: a unified annotation for task-oriented dialogues

Vojtěch Hudeček
  • Role: Author
  • PersonId: 1102652
Daniel Štancl
  • Role: Author
  • PersonId: 1102653
Ondřej Dušek
  • Role: Author
  • PersonId: 1102654

Abstract

Every model is only as strong as the data it is trained on. In this paper, we present a new dataset obtained by merging four publicly available annotated corpora of task-oriented dialogues spanning several domains (MultiWOZ 2.2, CamRest676, DSTC2 and the Schema-Guided Dialogue Dataset). In doing so, we assess the feasibility of providing a unified ontology and annotation schema covering several domains with relatively limited effort. We analyze the characteristics of the resulting dataset along three main dimensions: language, information content and performance. We focus on aspects likely to be pertinent for improving dialogue success, e.g. dialogue consistency. Furthermore, to assess the usability of this new corpus, we thoroughly evaluate dialogue generation performance under various conditions using two prominent recent end-to-end dialogue models: MarCo and GPT-2. These models were selected as popular open implementations representative of the two main dimensions of dialogue modelling. While we did not observe a significant gain in dialogue state tracking performance, we show that using more training data from different sources can improve language modelling capabilities and positively impact dialogue flow (consistency). In addition, we provide the community with one of the largest open datasets for machine learning experiments.
Main file
2022.lrec-1.137.pdf (624.42 KB)
Origin: Publisher files authorized on an open archive

Dates and versions

hal-03713523, version 1 (04-07-2022)

Identifiers

  • HAL Id: hal-03713523, version 1

Cite

Vojtěch Hudeček, Léon-Paul Schaub, Daniel Štancl, Patrick Paroubek, Ondřej Dušek. DIASER: A Unifying View On Task-oriented Dialogue Annotation. Language Resources and Evaluation Conference (LREC2022), Jun 2022, Marseille, France. ⟨hal-03713523⟩