Applicant Details
Applicant | Ms Shailza Jolly |
Home institution | Technical University of Kaiserslautern (Kaiserslautern, Germany) |
Home institution address | Trippstadter Strasse 122, 67663 Kaiserslautern, Germany |
STSM Details
Action | CA18231 – Multi3Generation: Multi-task, Multilingual, Multi-modal Language Generation |
STSM title | Generating fact-checking explanations in low-resource settings |
Period | AGA-CA18231-2: 2020-05-01 – 2021-10-31 |
Start date | 2021-01-01 |
End date | 2021-03-31 |
Motivation and Workplan summary | Aim and Motivation: A recent study on generating explanations for fact-checking systems [1] found that veracity prediction models perform better when they use human-like justifications instead of raw ruling comments (RCs). However, relying on human-written justifications defeats the overall purpose of building automatic fact-checking systems. During my research stay, I plan to investigate methods capable of synthesizing narrative and informative human-like justifications from RCs. Most state-of-the-art language generation methods employ supervised neural models, which require extensive amounts of training data. Since obtaining human-generated data is expensive and time-consuming, the focus will be on developing a system that uses minimal amounts of training data.
I plan to develop a justification generation system consisting of two phases. In the first phase, the goal is to extract the RCs relevant for generating justifications. This can be modeled as an extractive summarization task, similar to [1], which can be trained with relatively few training samples. In the second phase, I will use an iterative, score-based NLG algorithm, which has been successfully applied to NLP tasks such as paraphrasing [3] and text simplification [4], to synthesize narrative justifications from the disconnected RCs selected in the first phase. I will design a new scoring function comprising multiple components: a fluency module based on pre-trained language models such as GPT-2, a length penalty that encourages shorter justifications, evaluation metrics such as BLEU to prevent information loss, and entity-based scores to preserve meaning (see the sketch following this summary). The scoring function will be plugged into an iterative algorithm that, in each iteration, applies one of several edit actions to improve the justification/summary quality. I will use the LIAR-PLUS dataset [2] in my experiments, since it includes the human-generated justifications required for the first phase.
The research visit to Prof. Augenstein's lab will allow me to collaborate with researchers who have in-depth experience with automated fact-checking and language generation systems, in order to produce high-quality research artifact(s). I plan to publish the work done during the research stay at EMNLP 2021. The project fits within the Multi3Generation COST Action, as we will generate fact-checking explanations in low-resource settings.
Contribution to the Action's scientific objectives:
WG1 (Grounded multi-modal reasoning and generation): During the proposed STSM, I plan to research methods for explaining machine learning models, which is in line with WG1's aims.
WG2 (Efficient Machine Learning algorithms, methods, and applications to language generation): I further plan to research novel methods for low-resource and transfer learning for generative models, which is in line with WG2's objectives.
Planning:
Start of research stay: 01 January 2021
By 31 January 2021: Complete a literature review on natural language generation in low-resource settings. Examine and implement the first baseline for the fact-checking system.
By 28 February 2021: Implement the scoring function. Integrate the generated justifications into the model framework used by [1].
By 31 March 2021: Continue experiments and paper writing.
End of research stay: 31 March 2021
Target conference: EMNLP 2021
References:
[1] Atanasova, P., Simonsen, J.G., Lioma, C. and Augenstein, I., 2020. Generating Fact Checking Explanations. arXiv preprint arXiv:2004.05773.
[2] Alhindi, T., Petridis, S. and Muresan, S., 2018. Where is your Evidence: Improving Fact-checking by Justification Modeling. In Proceedings of the First Workshop on Fact Extraction and VERification (FEVER), pp. 85-90.
[3] Liu, X., Mou, L., Meng, F., Zhou, H., Zhou, J. and Song, S., 2019. Unsupervised Paraphrasing by Simulated Annealing. arXiv preprint arXiv:1909.03588.
[4] Kumar, D., Mou, L., Golab, L. and Vechtomova, O., 2020. Iterative Edit-Based Unsupervised Sentence Simplification. ACL. |
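Below is a minimal, hedged sketch of the composite scoring function and acceptance rule outlined in the workplan, assuming a GPT-2 inverse-perplexity fluency term, a length penalty, a smoothed sentence-level BLEU content term, and a simple entity-overlap term. All weights, target lengths, and helper names (fluency_score, length_penalty, content_score, entity_overlap, justification_score, accept) are illustrative assumptions for this application, not the exact method of [1], [3], or [4].

```python
# Hypothetical sketch only: components, weights and helper names are
# illustrative assumptions, not the method of the cited papers.
import math
import random

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

_tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
_lm = GPT2LMHeadModel.from_pretrained("gpt2")
_lm.eval()


def fluency_score(text: str) -> float:
    """Inverse perplexity of the text under GPT-2, in (0, 1]; higher = more fluent."""
    enc = _tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = _lm(**enc, labels=enc["input_ids"])
    return float(torch.exp(-out.loss))


def length_penalty(text: str, target_len: int = 40) -> float:
    """Penalise candidates longer than an assumed target length (whitespace tokens)."""
    n = len(text.split())
    return 1.0 if n <= target_len else math.exp(-(n - target_len) / target_len)


def content_score(candidate: str, source_rcs: str) -> float:
    """Smoothed sentence-level BLEU of the candidate against the selected RCs,
    used as a rough guard against information loss."""
    smooth = SmoothingFunction().method1
    return sentence_bleu([source_rcs.split()], candidate.split(),
                         smoothing_function=smooth)


def entity_overlap(candidate: str, source_rcs: str) -> float:
    """Stand-in for an NER-based meaning-preservation score: fraction of
    capitalised source tokens that survive in the candidate."""
    def ents(text: str) -> set:
        return {w.strip(".,") for w in text.split() if w[:1].isupper()}
    cand, src = ents(candidate), ents(source_rcs)
    return len(cand & src) / len(src) if src else 1.0


def justification_score(candidate: str, source_rcs: str,
                        weights=(1.0, 1.0, 1.0, 1.0)) -> float:
    """Weighted product of the four components, in the spirit of the
    multiplicative objectives used by edit-based methods such as [3, 4]."""
    parts = (fluency_score(candidate),
             length_penalty(candidate),
             content_score(candidate, source_rcs),
             entity_overlap(candidate, source_rcs))
    return math.prod(p ** w for p, w in zip(parts, weights))


def accept(new_score: float, old_score: float, temperature: float) -> bool:
    """Simulated-annealing acceptance rule (cf. [3]): always keep improvements,
    keep worse candidates with a temperature-dependent probability."""
    if new_score >= old_score:
        return True
    return random.random() < math.exp((new_score - old_score) / max(temperature, 1e-6))
```

In the iterative phase, each candidate edit to the draft justification (e.g. deleting, reordering, or lightly rewriting a sentence drawn from the selected RCs) would be scored with justification_score and kept whenever accept returns True, with the temperature gradually lowered as in [3]; the relative weights of the components would be tuned on LIAR-PLUS development data.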
Host Details
Name | Prof Isabelle Augenstein |
Institution | University of Copenhagen |
Institution address | Universitetsparken 5, 2100, Copenhagen, Denmark |
Financial Support
Amount for Travel in EUR | 300 |
Amount for Subsistence in EUR | 3200 |
Total Amount in EUR | 3500 |