WG 2 – Efficient Machine Learning algorithms, methods, and applications to language generation

WG2 focuses on the efficient ML machinery behind state-of-the-art Language Generation (LG) models, since many of the improvements now observed across LG tasks stem from applying a variety of deep neural network architectures. This involves, among others:

  • multi-task learning,
  • transfer learning,
  • representation learning,
  • structured prediction,
  • generative models.

Moreover, WG2 investigates integration strategies for multi-modal data, which is of critical importance for Multi3Generation.

This working group will work on:

  • a repository of open-source software for processing language and visual content, including a directory of sources of materials or components
  • a survey on efficient use of ML for LG
  • the organization of training schools

If you would like to join this working group, please contact the WG leader and co-leader: Aykut Erdem and Elena Lloret.

Publications and preprints

  • M. Sercan Amac, Semih Yagcioglu, Aykut Erdem, and Erkut Erdem. “Procedural Reasoning Networks for Understanding Multimodal Procedures”. In 23rd Conference on Computational Natural Language Learning (CoNLL).
  • J. Phang, I. Calixto, PM. Htut, Y. Pruksachatkun, H. Liu, C. Vania, K. Kann, SR. Bowman (2020). English intermediate-task training improves zero-shot cross-lingual transfer too. In AACL 2020.
  • V. Milewski, MF. Moens and I. Calixto (2020). Are scene graphs good enough to improve image captioning? In AACL 2020.
  • Ales Zamuda and Elena Lloret. Optimizing Data Driven Models for Summarization as Parallel Tasks, Journal of Computational Science, Vol. 42, April 2020.
  • Ozan Caglayan, Menekse Kuyu, Mustafa Sercan Amac, Pranava Madhyastha, Erkut Erdem, Aykut Erdem and Lucia Specia. Cross-lingual Visual Pre-training for Multimodal Machine Translation, Conference of the European Chapter of the Association for Computational Linguistics (EACL 2021), Short Papers, 2021.
  • Begum Citamak, Ozan Caglayan, Menekse Kuyu, Erkut Erdem, Aykut Erdem, Pranava Madhyastha, Lucia Specia. MSVD-Turkish: A Comprehensive Multimodal Video Dataset for Integrated Vision and Language Research in Turkish, Machine Translation, Vol. 35, 265-288.
  • Emre Boran, Aykut Erdem, Nazli Ikizler-Cinbis, Erkut Erdem, Pranava Madhyastha, Lucia Specia. Leveraging auxiliary image descriptions for dense video captioning, Pattern Recognition Letters, Vol. 146, June 2021.
  • Rob van der Goot, Ahmet Üstün, Alan Ramponi, Ibrahim Sharaf and Barbara Plank. Massive Choice, Ample Tasks (MaChAmp): A Toolkit for Multi-task Learning in NLP. In EACL 2021. Received the EACL 2021 outstanding paper award (demo track).


Dataset name and brief description, including purpose | Authors/creators | Link
MSVD-Turkish: The first large-scale video captioning dataset for the Turkish language, obtained by carefully translating the English descriptions of the videos in the MSVD (Microsoft Research Video Description Corpus) dataset into Turkish. | Begum Citamak, Ozan Caglayan, Menekse Kuyu, Erkut Erdem, Aykut Erdem, Pranava Madhyastha, and Lucia Specia | MSVD-Turkish


Name and/or brief description | Authors/creators | Link
Procedural Reasoning Networks | M. Sercan Amac, Semih Yagcioglu, Aykut Erdem, and Erkut Erdem |

Open source repository

Neural Natural Language Generation (GitHub)

Other outputs

