WG 1 – Grounded multi-modal reasoning and generation

Linguistic expressions are called grounded when they are linked to non-linguistic, especially perceptual, data, such as information from modalities like vision and audition; grounding is, in essence, a key aspect of acquiring meaning. This is a long-standing challenge for Artificial Intelligence.

WG1 focuses on grounded representations for AI systems that, among other things, use multimodal information to reason, learn, and generate natural language. The central themes of WG1 are the following:

  • Explainability and transparency in multimodal models;
  • Complementarity and redundancy among data sources or modalities;
  • Interaction between symbolic and sub-symbolic (e.g. neural) representations in models;
  • The role of commonsense and other knowledge;
  • Situated reasoning and language generation.

WG1 will be working towards:

  1. Drawing up standards for multimodal data sources;
  2. Defining a research roadmap, through an appraisal of existing work and identification of gaps to be addressed in future work.

Individuals interested in joining WG1 should contact the chair (albert.gatt {at}