Multi3Generation

WG 4 – Exploiting large knowledge bases and graphs

WG4 focuses on the use of knowledge bases (KBs) and knowledge graphs in natural language generation, especially for the integration of commonsense and world knowledge.
An expected outcome of WG4 is to broaden the variety of knowledge and language resources used in NLG.
WG4 will analyze how to efficiently integrate multimodal KBs, drawing on theoretical models of semantics and semantic processing that can accommodate both linguistic and perceptual information.

WG4 members will work on:

  • extending existing data-to-text NLG training sets with multilingual and multimodal content
  • testing the performance of neural NLG models on psycholinguistic datasets

If you’d like to join this working group, please get in touch with the WG leader and co-leader, Irene Russo (irene.russo(at)ilc.cnr.it) and Liviu P. Dinu (ldinu(at)fmi.unibuc.ro).

Open source repository

Data-to-text NLG training datasets

Data-to-text NLG systems require training data. Here we provide a list of freely available datasets that have been created with different methodologies (automatically, via crowdsourcing, etc.) and for different NLG sub-tasks.
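
As an illustration of what such training data looks like, a WebNLG-style instance pairs a set of subject–predicate–object triples with one or more human-written reference texts, and a common preprocessing step linearizes the triples into a token sequence for a seq2seq model. The sketch below is purely illustrative: the field names, special tokens, and example content are hypothetical, not the official schema of any dataset listed here.

```python
# Illustrative sketch of a WebNLG-style data-to-text instance.
# Field names and special tokens (<S>, <P>, <O>) are hypothetical,
# not the official schema of any dataset in the list.
instance = {
    "triples": [
        ("Alan_Turing", "birthPlace", "Maida_Vale"),
        ("Alan_Turing", "field", "Computer_Science"),
    ],
    "references": [
        "Alan Turing, who worked in computer science, was born in Maida Vale.",
    ],
}

def linearize(triples):
    """Flatten triples into the input sequence a seq2seq model might consume."""
    parts = []
    for subj, pred, obj in triples:
        parts.append(f"<S> {subj} <P> {pred} <O> {obj}")
    return " ".join(parts)

print(linearize(instance["triples"]))
```

A model trained on such pairs learns to map the linearized triples to the reference text; the choice of delimiter tokens and triple ordering is a design decision that varies across systems.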

  • WebNLG 2017 (2017). Gardent, C., Shimorina, A., Narayan, S., & Perez-Beltrachini, L. (2017). Creating Training Corpora for NLG Micro-Planners. ACL. https://webnlg-challenge.loria.fr/challenge_2017/
  • WebNLG 2020 (2020). Gardent, C., Shimorina, A., Narayan, S., & Perez-Beltrachini, L. (2017). Creating Training Corpora for NLG Micro-Planners. ACL. https://webnlg-challenge.loria.fr/challenge_2020/
  • KBGen (2013). Banik, E., Gardent, C., & Kow, E. (2013). The KBGen Challenge. ENLG. http://www.kbgen.org
  • E2E NLG Challenge (2017). Dusek, O., Novikova, J., & Rieser, V. (2020). Evaluating the State-of-the-Art of End-to-End Natural Language Generation: The E2E NLG Challenge. Comput. Speech Lang., 59, 123-156. http://www.macs.hw.ac.uk/InteractionLab/E2E/
  • MultiWOZ 2.2 (2020). Zang, X., Rastogi, A., Zhang, J., & Chen, J. (2020). MultiWOZ 2.2: A Dialogue Dataset with Additional Annotation Corrections and State Tracking Baselines. ArXiv, abs/2007.12720. https://github.com/budzianowski/multiwoz
  • ToTTo (2020). Parikh, Ankur P., et al. “ToTTo: A Controlled Table-to-Text Generation Dataset.” arXiv preprint arXiv:2004.14373 (2020). https://paperswithcode.com/dataset/totto
  • RotoWire (2017). Wiseman, Sam, Stuart M. Shieber, and Alexander M. Rush. “Challenges in Data-to-Document Generation.” Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. 2017. https://github.com/harvardnlp/boxscore-data/blob/master/rotowire.tar.bz2
  • WikiBio (2016). Lebret, Rémi, David Grangier, and Michael Auli. “Neural Text Generation from Structured Data with Application to the Biography Domain.” arXiv preprint arXiv:1603.07771 (2016). https://paperswithcode.com/dataset/wikibio
  • WEATHER GOV
  • ROBOCUP
  • Logic2Text (2020). Chen, Zhiyu, et al. “Logic2Text: High-Fidelity Natural Language Generation from Logical Forms.” arXiv preprint arXiv:2004.14579 (2020). https://paperswithcode.com/dataset/logic2text
  • DART (2020). Nan, Linyong, et al. “DART: Open-Domain Structured Data Record to Text Generation.” arXiv preprint arXiv:2007.02871 (2020). https://paperswithcode.com/dataset/dart
  • ENT-DESC (2020). Cheng, Liying, et al. “ENT-DESC: Entity Description Generation by Exploring Knowledge Graph.” arXiv preprint arXiv:2004.14813 (2020). https://paperswithcode.com/dataset/ent-desc
  • GEM (Generation, Evaluation, and Metrics) (2021). Gehrmann, Sebastian, et al. “The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics.” arXiv preprint arXiv:2102.01672 (2021). https://paperswithcode.com/dataset/gem
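
Most of the challenges above score system outputs against human references with n-gram overlap metrics such as BLEU. As a rough, self-contained sketch of one ingredient of BLEU, the function below computes single-reference clipped n-gram precision in pure Python; it omits the brevity penalty, multiple references, and the geometric mean over n-gram orders, all of which real evaluations (e.g. the official challenge scoring scripts) include.

```python
from collections import Counter

def ngram_precision(hypothesis, reference, n=2):
    """Clipped n-gram precision of a hypothesis against one reference.

    A simplified BLEU component for illustration only: no brevity
    penalty, single reference, single n-gram order, whitespace tokens.
    """
    hyp, ref = hypothesis.split(), reference.split()
    hyp_ngrams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    # Counter intersection takes the minimum count per n-gram, i.e. clipping.
    overlap = sum((hyp_ngrams & ref_ngrams).values())
    total = sum(hyp_ngrams.values())
    return overlap / total if total else 0.0

print(ngram_precision("the cat sat on the mat", "the cat is on the mat", n=2))
```

N-gram metrics are known to correlate imperfectly with human judgments of fluency and adequacy, which is one motivation for benchmarks such as GEM that study evaluation itself.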

