Project Suggestions 2019

Grammatical Error Detection/Correction - Targeting Specific Error Types

Description

Grammatical Error Correction is the task of automatically detecting and correcting grammatical errors in text. Although recent work has focused on correcting all error types simultaneously, systems that target specific subsets of error types may be more effective. A key aim of this project is therefore to compare specialised systems against general systems in terms of detection and/or correction performance.

There are many different subsets of error types that can be explored, but you should choose from the types in the ERRANT framework (Bryant et al. 2017: Table 2 or Appendix A). A specialised system might, for example, target only a single error type or a small group of related types.

Depending on the subset chosen, a system could be built using rules, language models, classifiers, and/or neural sequence labelling. Since there are many possibilities for different combinations of error type detection/correction systems, several students could undertake different versions of this project.
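To make the rule-based option concrete, the following is a minimal sketch of a specialised corrector for one very narrow error type: "a" vs. "an" determiner agreement. The function name and the vowel-letter heuristic are illustrative assumptions, not part of the project specification; a real system would need to handle vowel sounds rather than letters (e.g. "an hour", "a university").

```python
# Illustrative rule-based corrector for a single narrow error type
# (a/an determiner agreement). The heuristic checks the first LETTER
# of the following word, which is a deliberate simplification.

def correct_a_an(tokens):
    """Return a copy of the token list with simple a/an errors corrected."""
    vowels = set("aeiou")
    out = list(tokens)
    for i in range(len(out) - 1):
        word, nxt = out[i].lower(), out[i + 1].lower()
        if word == "a" and nxt[0] in vowels:
            out[i] = "an" if out[i].islower() else "An"
        elif word == "an" and nxt[0] not in vowels:
            out[i] = "a" if out[i].islower() else "A"
    return out

print(correct_a_an(["She", "ate", "a", "apple"]))
# -> ['She', 'ate', 'an', 'apple']
```

A language-model or classifier-based system for the same error type could instead score both candidate determiners in context and pick the more probable one, which avoids hand-writing the heuristic.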

References

The BEA-2019 shared task on grammatical error correction.
Christopher Bryant, Mariano Felice, Øistein E. Andersen and Ted Briscoe. 2019.

Automatic annotation and evaluation of error types for grammatical error correction.
Christopher Bryant, Mariano Felice, and Ted Briscoe. 2017.

Neural grammatical error correction systems with unsupervised pre-training on synthetic data.
Roman Grundkiewicz, Marcin Junczys-Dowmunt, and Kenneth Heafield. 2019.

The CoNLL-2014 shared task on grammatical error correction.
Hwee Tou Ng, Siew Mei Wu, Ted Briscoe, Christian Hadiwinoto, Raymond Hendy Susanto and Christopher Bryant. 2014.

Unsupervised Error Detection

Description

Automated systems for detecting errors in learner writing are valuable tools for second language learning and assessment. Previous work has mostly treated error detection as a supervised sequence labeling task, requiring manually annotated training corpora (Rei & Yannakoudakis 2016, Rei 2017, Rei et al. 2017, Kasewa et al. 2018, Bell et al. 2019). Some recent work has also explored error correction and detection without training data, but relying on hand-curated lexicons of all possible word forms (Bryant & Briscoe 2018, Stahlberg et al. 2019). In this project, we will explore fully unsupervised error detection, using only unannotated corpora and methods that can also be applied to other languages where no error detection corpora are available.

One possible strategy is to construct a neural error detection model, provide it with various information learned from plain text corpora, and train it to be a discriminative error detector using synthetic data. Several components and extensions of this approach can be investigated.
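The synthetic-data step above can be sketched as follows: corrupt clean sentences with simple word-level operations and emit a token-level label for each output token (0 = correct, 1 = erroneous). The corruption rules here (random deletion and duplication) are placeholder assumptions; published work instead uses confusion sets, learned error patterns, or machine-translation-style error generation (Rei et al. 2017, Kasewa et al. 2018).

```python
import random

# Illustrative synthetic training-data generator for error detection:
# corrupt a clean token sequence and label each surviving token as
# correct (0) or erroneous (1). Deleted tokens leave no label, so the
# resulting error is implicit in the surrounding context.

def corrupt(tokens, p=0.15, rng=random):
    """Randomly delete or duplicate tokens; return (tokens, labels)."""
    out, labels = [], []
    for tok in tokens:
        r = rng.random()
        if r < p / 2:
            continue              # deletion: token dropped
        if r < p:
            out += [tok, tok]     # duplication: both copies flagged
            labels += [1, 1]
            continue
        out.append(tok)
        labels.append(0)
    return out, labels

rng = random.Random(0)
sent = "the cat sat on the mat".split()
print(corrupt(sent, p=0.3, rng=rng))
```

A sequence labelling model (e.g. a bidirectional LSTM or a pretrained contextual encoder) can then be trained on these (tokens, labels) pairs without any manually annotated corpus.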

References

Compositional sequence labeling models for error detection in learner writing.
Marek Rei and Helen Yannakoudakis. 2016.

Semi-supervised multitask learning for sequence labeling.
Marek Rei. 2017.

Artificial error generation with machine translation and syntactic patterns.
Marek Rei, Mariano Felice, Zheng Yuan and Ted Briscoe. 2017.

Wronging a right: Generating better errors to improve grammatical error detection.
Sudhanshu Kasewa, Pontus Stenetorp and Sebastian Riedel. 2018.

Context is key: Grammatical error detection with contextual word representations.
Samuel Bell, Helen Yannakoudakis and Marek Rei. 2019.

Language model based grammatical error correction without annotated training data.
Christopher Bryant and Ted Briscoe. 2018.

Neural grammatical error correction with finite state transducers.
Felix Stahlberg, Christopher Bryant and Bill Byrne. 2019.