Large pre-trained language models (e.g. BERT, XLNet, GPT-3) have transformed the field of natural language processing in recent years, and fine-tuning such models for a given task has improved state-of-the-art performance on many benchmark datasets. BERT is among the most widely used of these models, and several extensions have been developed for downstream tasks in specialised domains, e.g. Clinical BERT, SciBERT and BioBERT. These models are fine-tuned or trained from scratch on large domain-specific corpora, and show improvements on several domain-specific downstream tasks. In this project we would use texts written by learners of English as training or fine-tuning data for transformer models, experimenting with a number of training conditions and evaluating on downstream tasks such as essay scoring and native language identification.
We are interested in the edits that learners of English make to their essays on an online writing practice platform which provides automatic grades and grammatical error feedback. The first task is pre-processing: aligning different versions of an essay, which may involve major revisions at the level of sentence structure. The second is the research question of automatically deciding whether each sentence-level revision is an improvement or not.
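As a rough illustration of the alignment step, the sketch below greedily pairs each sentence in a revised draft with its most similar sentence in the earlier draft. It is only a baseline under stated assumptions, not the project's method: the function names `similarity` and `align_sentences`, the word-overlap measure from Python's `difflib`, and the threshold of 0.5 are all illustrative choices, and the essays are assumed to be already split into sentences.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Word-level similarity in [0, 1] between two sentences (illustrative measure)."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split(), autojunk=False).ratio()


def align_sentences(old, new, threshold=0.5):
    """Greedily align sentences of a revised draft to an earlier draft.

    Returns a list of (old_index, new_index, score) tuples. A new sentence
    with no sufficiently similar counterpart is reported as (None, j, 0.0);
    an old sentence left unmatched is reported as (i, None, 0.0).
    The threshold is a hypothetical default, not a tuned value.
    """
    used = set()
    alignment = []
    for j, new_sent in enumerate(new):
        best_i, best_score = None, 0.0
        for i, old_sent in enumerate(old):
            if i in used:
                continue
            score = similarity(old_sent, new_sent)
            if score >= threshold and score > best_score:
                best_i, best_score = i, score
        if best_i is not None:
            used.add(best_i)
        alignment.append((best_i, j, best_score))
    # Old sentences that were never matched are treated as deletions.
    for i in range(len(old)):
        if i not in used:
            alignment.append((i, None, 0.0))
    return alignment


if __name__ == "__main__":
    draft_1 = ["I like apple.", "It is sunny today."]
    draft_2 = ["I like apples.", "We went to the park.", "It is sunny today."]
    for old_idx, new_idx, score in align_sentences(draft_1, draft_2):
        print(old_idx, new_idx, round(score, 2))
```

A greedy matcher like this cannot represent sentence splits or merges, which are exactly the major structural revisions mentioned above, so it is best read as a starting point against which better alignment methods could be compared.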
This is a project about generalisability of text classifiers for underground hacking forums: we have trained various models on HackForums texts but have not yet applied them to other forum data contained in the CrimeBB collection of underground hacking forums. This project would involve doing so, including annotation of new test sets for the new forums in order to evaluate performance of the old models on new data. Potential directions for this project include improved training of more generalised models, domain adaptation to each forum, and application of English-focused methods to the forum data in other languages (so far: German, Spanish, Russian).
Supervisors: Andrew Caines, Ahmed Zaidi, Zheng Yuan, Helen Yannakoudakis, Marek Rei, Øistein Andersen and Paula Buttery
Essay scoring systems (e.g. Taghipour & Ng, 2016) automatically assign a mark to an essay. However, it would be desirable to also produce an explanation for the assigned score, both (i) as feedback to the learner and (ii) to give users the “right to an explanation” required by the GDPR. There has been a lot of work on explainability and interpretability, one strand of which is rationale generation (Yu et al., 2019). This project aims to apply recent ideas in rationale generation to obtain explanations for neural essay marking systems. Potential extensions include exploring other explainability methods (Ribeiro et al., 2016; Shrikumar et al., 2017) and presenting model confidence and/or uncertainty alongside marking decisions.
Kaveh Taghipour and Hwee Tou Ng. 2016. A Neural Approach to Automated Essay Scoring. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing.
Mo Yu, Shiyu Chang, Yang Zhang, and Tommi Jaakkola. 2019. Rethinking Cooperative Rationalization: Introspective Extraction and Complement Control. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing.
Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Avanti Shrikumar, Peyton Greenside, and Anshul Kundaje. 2017. Learning Important Features Through Propagating Activation Differences. In Proceedings of the 34th International Conference on Machine Learning.
Eric Wallace, Shi Feng, and Jordan Boyd-Graber. 2018. Interpreting Neural Networks with Nearest Neighbors. BlackboxNLP.
Aili Shen, Daniel Beck, Bahar Salehi, Jianzhong Qi, and Timothy Baldwin. 2019. Modelling Uncertainty in Collaborative Document Quality Assessment. W-NUT.
This project involves translation and paraphrasing of multilingual data with a focus on language education. This was a shared task at WNGT 2020, organised by Duolingo, which has now concluded. We would seek to learn from the best entries to the competition and attempt to combine the strengths of several models. In addition, data science explorations of the data would be welcome.