M3G @ EAMT 2023 - Multi3Generation

Workshop on Multilingual, Multimodal and Multitask Language Generation

CA18231 - Multi3Generation

5 May 2023: Submission deadline

12 May 2023: Acceptance notification

31 May 2023: Camera-ready copies

15 June 2023: Workshop

Aim and scope

As artificial intelligence (AI) advances, there has been public debate about whether machine learning (ML) and recent neural network (NN) techniques can transform current computational systems into technology in which communication between humans and machines is seamlessly intelligible. While technology enthusiasts have seized this opportunity, reasonable skeptics do not greet the rise of AI with the same enthusiasm and ask difficult questions: (i) Has AI brought any major scientific breakthroughs? (ii) Are people (AI users and technologists) learning anything from AI about how the human brain identifies, analyzes and generates natural language? Although reactions from different research actors and technological fields vary and are often controversial, there is no doubt that AI has contributed to the community and proved useful in tasks such as machine translation and summarization. Rather than debating the contribution of AI to science and its utility to society, this workshop, initiated in the framework of the Multi3Generation COST Action (CA18231), the European network for Multilingual, Multimodal and Multitask Language Generation, brings together researchers and companies interested in language generation from multiple perspectives, as language generation is currently one of the cornerstones of natural language processing and AI. Since the workshop is co-located with EAMT, we prioritize work on the generation of multilingual texts and (machine) translation, such as the development of high-quality linguistic resources, multimodal novelties, different tasks in multiple languages, and proposals for improving the overall field of translation.

Topics

In the 1st edition of the Multi3Generation workshop, we aim to bring to the forefront the challenges involved in language generation from a multidimensional perspective. We hope that the workshop will provide a common forum to consolidate multidisciplinary efforts and foster discussions that identify the wide-ranging issues related to the language generation task and its derived applications. In this regard, we encourage the submission of high-quality, original work covering the following topics:

Multimodal language generation
Multilingual language generation
Multitask language generation
Creative language generation
Paraphrase generation
Translation of expressions, multiwords and phrasal units
Linguistic resources such as datasets of paraphrases
Association of expressions with identical or similar meaning
Effective ways to add value to the MT technology
Applications of language generation: machine translation, paraphrasing, text rewriting, cross-analysis of language varieties, summarisation, automatic feedback generation, text simplification, language teaching, etc.
Grounded multimodal reasoning and generation
Efficient machine learning algorithms, methods, and applications to language generation
Dialogue, interaction and conversational language generation applications
Large knowledge bases and graphs that can be used for language generation
Commonsense reasoning in language generation
Applications of language generation in industry and society
ChatGPT: opportunities, challenges, and threats
Large language models and their generative power

Registration

The Multi3Generation workshop will be free of charge for Multi3Generation COST Action members. Participants from Inclusiveness Target Countries (ITC) can apply for a grant to support travel and accommodation.

INCLUSIVENESS TARGET COUNTRIES (ITC) CONFERENCE GRANTS – User guide (.pdf)

Additionally, Multi3Generation will provide a number of grants to partially or fully cover the travel and accommodation expenses of the participants with accepted papers.

Submissions

We invite two kinds of submissions:

Full papers
(up to 7 pages + references): Original, high-quality unpublished contributions on the theory and practical aspects of the narrative generation task.
Full papers should introduce existing approaches and describe the methodology and the experiments conducted in detail.
Work-in-progress, project, demo and dissemination papers
(up to 4 pages + references): Unpublished short papers describing work in progress; project, demo and resource papers presenting research/industrial prototypes, datasets or software packages; position papers introducing a new point of view, a research vision or a reasoned opinion on the workshop topics; and dissemination papers describing project ideas, ongoing research lines, case studies or summarized versions of papers previously published in high-quality conferences/journals that are worth sharing with the Multi3Generation community, but where novelty is not a fundamental issue.

Submissions should follow the EAMT 2023 guidelines and style templates (PDF, LaTeX, Word). Each submission will be peer-reviewed by at least two members of the programme committee. Accepted papers will be published online as proceedings included in the ACL Anthology and will be presented at the workshop orally or as a poster. The templates are available at the following link, under the section “Templates for Papers”: https://events.tuni.fi/eamt23/second-call-for-papers/

In addition, the authors of a selection of accepted papers will be invited to extend them for publication in ORE (indexed in Scopus and DBLP), as long as this does not conflict with previous publication rights.

Papers should be submitted via the EasyChair platform.

Keynote talk

We will have one keynote related to the topics of the workshop.

Invited Speaker

Speaker Bio

André Martins (PhD 2012, Carnegie Mellon University and University of Lisbon) is an Associate Professor at Instituto Superior Técnico, University of Lisbon, researcher at Instituto de Telecomunicações, and the VP of AI Research at Unbabel.

His research, funded by an ERC Starting Grant (DeepSPIN) and Consolidator Grant (DECOLLAGE), among other grants, includes machine translation, quality estimation, and structure and interpretability in deep learning systems for NLP. His work has received best paper awards at ACL 2009 (long paper) and ACL 2019 (system demonstration paper). He co-founded and co-organizes the Lisbon Machine Learning School (LxMLS), and he is a Fellow of the ELLIS society.

Keynote

Towards Explainable and Reliable Multilingual NLP

Abstract

Natural language processing systems are becoming increasingly accurate and powerful. However, in order to take full advantage of these advances, new capabilities are necessary for humans to understand model predictions and when to question or to bypass them. In this talk, I will present recent work from our group in two directions. In the first part, I will describe a new approach for selective rationalization based on sparse and structured transformations (sparsemax, alpha-entmax, and LP-SparseMAP), all drop-in replacements for softmax that permit handling constraints through differentiable layers. This leads to SPECTRA, a deterministic and structured rationalizer with favorable properties in terms of predictive power, quality of the explanations, and model variability. Then, I will present CREST (ContRastive Edits with Sparse raTionalization), which combines the above idea with a counterfactual text generator, leading to improvements in counterfactual quality, model robustness, and interpretability. We introduce a new loss function that leverages CREST counterfactuals to regularize selective rationales using SPECTRA and show that this regularization improves both model robustness and rationale quality, compared to methods that do not leverage CREST counterfactuals. In the second part, I will present several methods for detecting and correcting hallucinations in neural machine translation (NMT). We annotate a dataset of over 3.4k sentences indicating different kinds of critical errors and hallucinations. We compare several detection methods, both glass-box uncertainty-based detectors and model-based detectors. As hallucinations are detached from the source content, they exhibit encoder-decoder attention patterns that are statistically different from those of good quality translations. We frame this problem with an optimal transport formulation and propose a fully unsupervised, plug-in detector that can be used with any attention-based NMT model. 
Finally, we study hallucinations in massively multilingual models by conducting a comprehensive analysis on both the M2M family of conventional neural machine translation models and ChatGPT / GPT-4. Our investigation covers a broad spectrum of conditions, spanning over 100 translation directions across various resource levels and going beyond English-centric language pairs. We provide key insights regarding the prevalence, properties, and mitigation of hallucinations, paving the way towards more responsible and reliable machine translation systems.

This is joint work with Marcos Treviso, Nuno Guerreiro, Duarte Alves, Vlad Niculae, Ben Peters, Pierre Colombo, Alexis Ross, Elena Voita, Jonas Waldendorf, Barry Haddow, Alexandra Birch, and Pablo Piantanida, in the scope of the DeepSPIN, MAIA, and UTTER projects.

Programme

9:00 Welcome

9:30-10:30 Plenary talk

André Martins – Instituto Superior Técnico, Instituto de Telecomunicações, Unbabel, Lisbon, Portugal.
Towards Explainable and Reliable Multilingual NLP

10:30-11:00 coffee break

11:00-12:30 oral presentations (3 papers)

Anabela Barreiro (INESC-ID Lisboa, Portugal)
A Multilingual Paraphrasary of Multiwords
Zhuoyuan Mao, Haiyue Song, Raj Dabre, Chenhui Chu and Sadao Kurohashi (Kyoto University, Japan)
Variable-length Neural Interlingua Representations for Zero-shot Neural Machine Translation
Iván Martínez-Murillo, Paloma Moreda and Elena Lloret (University of Alicante, Spain)
Towards an Efficient Approach for Controllable Text Generation

12:30-14:00 lunch

14:00-15:00 oral presentations (2 papers)

Juuso Eronen, Michal Ptaszynski, Karol Nowakowski, Zheng Lin Chia and Fumito Masui (Prefectural University of Kumamoto, Japan)
Improving Polish to English Neural Machine Translation with Transfer Learning: Effects of Data Volume and Language Similarity
Oleksii Turuta, Daniil Maksymenko, Nataliia Saichyshyna, Olena Turuta and Maksym Yerokhin (Kharkiv National University of Radio Electronics, Ukraine)
Controllability for English-Ukrainian Machine Translation Based on Specialized Corpora

15:00-15:30 coffee break

15:30-16:30 oral presentations (2 papers)

Sara Amato and Kutz Arrieta (USA)
Natural Language Generation in the Logos Model
Mika Hämäläinen and Khalid Alnajjar (Rootroo Ltd, Finland)
RooAd: A Computationally Creative Online Advertisement Generator

16:30-17:00 closing remarks and end of the workshop

Workshop organizers

Anabela Barreiro (INESC-ID Lisboa, Portugal)
Max Silberztein (University of Franche-Comté, France)
Elena Lloret (University of Alicante, Spain)

Proceedings chair

Max Silberztein (University of Franche-Comté, France)

Web and dissemination chair

Marcin Paprzycki (Systems Research Institute Polish Academy of Sciences, Poland)

Programme Committee

Mirela Alhasani (EPOKA University, Albania)
Isabelle Augenstein (University of Copenhagen, Denmark)
Mehul Bhatt (Örebro University, Sweden)
Anabela Barreiro (INESC-ID Lisboa, Portugal)
Iacer Calixto (Universiteit van Amsterdam, Netherlands)
José Camargo (Unbabel, Portugal)
Liviu Dinu (University of Bucharest, Romania)
Aykut Erdem (Koç University, Turkey)
Maria Ganzha (Warsaw University of Technology, Poland)
Albert Gatt (Universiteit Utrecht, Netherlands)
Fabio Kepler (Unbabel, Portugal)
Elena Lloret (University of Alicante, Spain)
Helena Moniz (University of Lisboa and INESC-ID Lisboa, Portugal)
Marcin Paprzycki (Systems Research Institute Polish Academy of Sciences, Poland)
Max Silberztein (University of Franche-Comté, France)
Inguna Skadina (University of Latvia, Latvia)
Irene Russo (ILC CNR, Italy)
Oleksii Turuta (Kharkiv National University of Radio Electronics – NURE, Ukraine)

Proceedings

Proceedings of the 1st International Workshop on Multilingual, Multimodal and Multitask Language Generation (Multi3Generation)

Presentations of the papers

A Multilingual Paraphrasary of Multiwords
Anabela Barreiro, Cristina Mota

Controllability for English-Ukrainian Machine Translation Based on Specialized Corpora
Daniil Maksymenko

Improving Polish to English Neural Machine Translation with Transfer Learning: Effects of Data Volume and Language Similarity
Juuso Eronen, Michal Ptaszynski, Karol Nowakowski, Zheng Lin Chia, Fumito Masui

Natural Language Generation in the Logos Model
Sara Amato, Kutz Arrieta

Towards an Efficient Approach for Controllable Text Generation
Iván Martínez-Murillo

Variable-length Neural Interlingua Representations for Zero-shot Neural Machine Translation
Zhuoyuan Mao, Haiyue Song, Raj Dabre, Chenhui Chu, Sadao Kurohashi

Contact

For general inquiries regarding the workshop, reach the organizers at: multi3generation@gmail.com