Research
- Language learning: My current projects relate to second language learning and form part of the ALTA Institute research programme. We are working to better understand learner proficiency levels in spoken English and provide individualised, automated teaching feedback. The Institute, launched in October 2013 and funded by Cambridge Assessment, is concerned with corpus linguistics, computational linguistics, speech processing, machine learning and computer systems and platforms, relevant to the teaching and assessment of English.
- CALL systems: I also work on CALL (computer-assisted language learning) systems, with personalised automated feedback. I’m currently co-organising the 3rd Spoken CALL shared task. Please see the website and consider participating!
- Security NLP: I have worked on NLP for the analysis of online hacking forums. This involves domain adaptation, text classification and transfer to languages other than English. I was involved in a 6-month project funded by the Alan Turing Institute’s Defence & Security Programme.
- Low-resource NLP: I am interested in the processing of non-standard and low-resource natural languages, where ‘non-standard’ includes speech and online discourse, and ‘low-resource’ refers to any text type which is not well represented by current NLP models and resources. Specifically, I’ve been involved in projects to normalise transcriptions of speech, web vocabulary, and online forum posts, as well as the development of educational technology for the Runyakitara languages of western Uganda.
- Innovation in spoken English: This was the topic of my PhD, which focused on the ‘zero auxiliary’ (omission of the tensed verb in questions such as ‘where you been’, ‘how you doing’, ‘we going to town’) in British English. I investigated zero auxiliary frequencies in the spoken section of the British National Corpus and found evidence for social, discourse and grammatical factors underlying its use. I subsequently used these findings to inform a repair algorithm for spoken language processing, described in a 2010 ACL workshop paper with my supervisor Paula Buttery.
Publications
For my publication list, please see my Google Scholar page. There are links to some pre-prints below. Most of my other publications are open access, but please get in touch if you’re having trouble with paywalls etc., as I can probably share a pre-print with you.
Pre-prints for:
- A survey on recent approaches to question difficulty estimation from text, 2022, ACM Computing Surveys, with Luca Benedetto, Paolo Cremonesi, Paula Buttery, Andrea Cappelli, Andrea Giussani & Roberto Turrin.
- Building natural language processing tools for Runyakitara, 2020, Applied Linguistics Review, with Fridah Katushemererwe & Paula Buttery.
- Adaptive forgetting curves for spaced repetition language learning, 2020, AIED Conference paper with Ahmed Zaidi, Russell Moore, Paula Buttery & Andrew Rice.
- Behavioural cloning of teachers for automatic homework selection, 2019, AIED Conference paper with Russell Moore, Paula Buttery & Andrew Rice.
- Skills embeddings: a neural approach to multicomponent representations of students and tasks, 2019, EDM Conference paper with Russell Moore, Mark Elliott, Ahmed Zaidi, Paula Buttery & Andrew Rice.
- Accurate modelling of language learning tasks and students using representations of grammatical proficiency, 2019, EDM Conference paper with Ahmed Zaidi, Chris Davis, Russell Moore, Paula Buttery & Andrew Rice.
- ‘You Still Talking to Me?’: The Zero Auxiliary Progressive in Spoken British English Twenty Years On, 2018, with Mike McCarthy & Paula Buttery.
- The effect of task and topic on opportunity of use in learner corpora, 2017, with Paula Buttery.
- Incremental dependency parsing and disfluency detection in spoken learner English, 2015, with Russell Moore, Calbert Graham & Paula Buttery.
UROPs
After a break in 2020 because of Covid-19, we are running UROP projects again in the 2021 summer vacation. UROP is a national programme designed to give undergraduates a taste of research work and the chance to put what they’ve learned into practice. Each project is funded by a sponsor and therefore has a specific focus. There’s more information for this year’s UROP students on this page (Raven log-in required). See also our Computer Lab news feature from April 2021.
Contact me: firstname.lastname @ cl.cam.ac.uk