Multi3Generation

WG 3 – Dialogue, interaction and conversational language generation applications

WG3 focuses on Human-Computer Interaction (HCI) tasks in multilingual and multimodal scenarios, applying language generation (LG) models to distinct use cases, such as conversational agents, with three main directions:

  • New answer generation techniques where the (human) agent will receive suggestions
  • New techniques for conversational quality estimation and sentiment analysis
  • Creation of multilingual datasets for low-resource languages

Main challenges for WG3:

  • Scarcity of multimodal and multilingual datasets for chat in general
  • Scarcity of multilingual datasets for low-resource languages
  • Benchmarking the models applied
  • New metrics for multilingual conversational dialogues

This working group will be working on:

  • a survey on affective agents and answer generation adaptations to chat data
  • a survey on metrics for dialogue systems
  • a report on available multilingual datasets for conversational data
  • creation of multilingual datasets for low-resource languages
  • contribution to the cross-task roadmap of the project
  • contribution to responsible AI initiatives, since we are working with modalities that raise serious ethical concerns

If you would like to join this working group, please contact the WG leader and co-leader: Helena Moniz (helena.moniz@campus.ul.pt) and Inguna Skadina (inguna.skadina@lu.lv).

Datasets for low-resource languages

Available datasets:

  1. Latvian corpus: http://hdl.handle.net/20.500.12574/47

This multi-targeted dataset comprises several datasets for training goal-oriented dialogue systems for the student service domain in Latvian. It contains a manually annotated dataset of domain-specific dialogue intents, a manually created and annotated dataset of generalised and formalised dialogue scenarios based on corpus evidence, and a dataset for training an FAQ module.

Publications and preprints
