Machine Translation (MT) approaches to Grammatical Error Correction (GEC) have attracted considerable attention in recent years as they have been shown to achieve state-of-the-art results (Felice et al., 2014; Junczys-Dowmunt and Grundkiewicz, 2016; Yannakoudakis et al., 2017). Given an ungrammatical input sentence, the task is formulated as "translating" it into its grammatical counterpart.
The aim of this project is to develop an adversarial neural MT training architecture for GEC in which the MT model is assisted by an adversary: as the adversary tries to differentiate between system and gold corrections, the neural MT system tries to produce corrections that "fool" the adversary. In contrast to previous approaches, which maximise the likelihood of the gold correction, training is now formulated as minimising the distinguishability of system-generated corrections from gold ones.
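The adversarial set-up described above can be sketched schematically as follows. This is a minimal toy illustration, not the proposed system: the generator and discriminator here are stand-ins (an identity "generator" and a two-feature logistic discriminator), and all names and features are hypothetical. A real architecture would use a neural MT model and a neural discriminator, with the generator updated from the discriminator's reward (e.g. via policy gradient, since the outputs are discrete).

```python
import math
import random

def features(source, correction):
    """Toy features: bag-of-words overlap with the source, and output length."""
    s, c = set(source.split()), set(correction.split())
    return [len(s & c) / max(len(s | c), 1), float(len(c))]

class Discriminator:
    """Logistic model scoring how 'gold-like' a correction looks."""
    def __init__(self):
        self.w = [0.0, 0.0]
        self.b = 0.0

    def score(self, source, correction):
        z = sum(w * f for w, f in zip(self.w, features(source, correction))) + self.b
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, source, correction, label, lr=0.1):
        # One SGD step on log-loss; label is 1 for gold, 0 for system output.
        g = self.score(source, correction) - label
        f = features(source, correction)
        self.w = [w - lr * g * fi for w, fi in zip(self.w, f)]
        self.b -= lr * g

def train_adversarial(pairs, generator, steps=300):
    """Alternate discriminator updates on gold vs. system corrections.
    The generator update is only indicated in the comment below."""
    d = Discriminator()
    for _ in range(steps):
        source, gold = random.choice(pairs)
        system = generator(source)
        d.update(source, gold, label=1)    # gold corrections -> 1
        d.update(source, system, label=0)  # system corrections -> 0
        # Generator step (sketched): reward = d.score(source, system);
        # a real MT model would maximise this reward via policy gradient.
    return d
```

With an identity generator that simply copies the input, the discriminator quickly learns to score the gold correction above the uncorrected copy, which is the signal the MT model would then be trained to defeat.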
Wu et al., 2017. Adversarial Neural Machine Translation.
Yuan and Briscoe, 2016. Grammatical error correction using neural machine translation.
Felice et al., 2014. Grammatical error correction using hybrid systems and type filtering.
The goal of this project is to develop a system that can predict whether or not a given sentence needs language editing. The most pertinent work in this area comes from the AESW shared task (Daudaravicius et al., 2016), in which participants were asked to predict either binary labels or probabilistic confidence scores for sentences in scientific writing. The best system in this task achieved a precision of ~55% and a recall of ~75% (Schmaltz et al., 2016).
This project differs from the shared task in two ways: 1) we want to build a model on non-native learner data rather than scientific texts, and 2) we want to prioritise precision over recall. In particular, if we have high confidence that a sentence does NOT need editing, this saves us from having to send it further downstream to token-based detection or correction systems (Yannakoudakis et al., 2017).
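The precision-over-recall requirement can be operationalised as a thresholding step on the model's confidence scores. The sketch below is a hypothetical illustration of that idea (the function name and the development-set interface are assumptions, not part of any proposed system): given held-out scores and gold labels, it finds the most permissive threshold under which sentences flagged as "no edit needed" still meet a precision target.

```python
def choose_threshold(scores, labels, min_precision=0.99):
    """Find the largest score threshold t such that labelling every sentence
    with score <= t as 'no edit needed' keeps precision >= min_precision.

    scores: model confidence that a sentence needs editing (0..1)
    labels: gold labels, 1 = needs editing, 0 = clean
    Returns None if no threshold meets the precision target."""
    best = None
    for t in sorted(set(scores)):
        kept = [y for s, y in zip(scores, labels) if s <= t]
        precision = sum(1 for y in kept if y == 0) / len(kept)
        if precision >= min_precision:
            best = t  # keep the largest t that still meets the target
    return best
```

Sentences scoring at or below the chosen threshold would be accepted as clean with high confidence; everything else would still be passed downstream, trading recall for the precision the pipeline needs.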
Aims of the project:
Daudaravicius et al., 2016. A Report on the Automatic Evaluation of Scientific Writing Shared Task.
Yannakoudakis et al., 2017. Neural Sequence-Labelling Models for Grammatical Error Correction.
Andersen et al., 2013. Developing and testing a self-assessment and tutoring system.
The vast majority of compositional distributional models build a single representation for all senses of a word, collapsing distinct senses together. Several researchers argue that terms with ambiguous senses can be handled by such models without any additional disambiguation step, as long as contextual information is available. For instance, Baroni et al. (2014) suggest that their models largely avoid problems with polysemous adjectives because the adjective matrices implicitly incorporate contextual information. However, they do draw a distinction between two ways in which the meaning of a term can vary. Continuous polysemy, the subtle and continuous variation in meaning arising from the different contexts in which a word appears (e.g. run to the store vs. run a marathon), is relatively tractable in their view. This contrasts with discrete homonymy, the association of a single term with completely independent meanings (e.g. river bank vs. investment bank), and regular polysemy, systematic shifts in meaning due to metaphorical or metonymic use (e.g. bright light vs. bright student). The latter two are more challenging for compositional distributional models. The recent approaches of Kartsaklis and Sadrzadeh (2013) and Gutierrez et al. (2016) have shown that training sense-disambiguated functions for homonymy and regular polysemy improves performance over single-sense models. However, none of these approaches studied polysemy systematically and at a large scale, nor contrasted discrete and continuous polysemy within a single model.
This project will test to what extent the widely used single-sense compositional distributional models can handle a variety of senses, using WordNet as a gold standard for sense distinctions. WordNet is the largest collection of word senses available to date, capturing both the fine-grained distinctions of continuous polysemy and the discrete distinctions of homonymy and regular polysemy.
After testing the single-sense models, the project will move on to building a novel compositional distributional model incorporating sense distinctions learned from WordNet and corpus data.
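The two modelling styles contrasted above can be sketched in a few lines. In the adjectives-as-matrices approach of Baroni and Zamparelli (2010), composition is matrix-vector multiplication; a sense-aware variant in the spirit of Kartsaklis and Sadrzadeh's disambiguate-then-compose strategy keeps one matrix per sense and selects the one whose output best matches the observed context. The vectors, matrices, and selection rule below are toy illustrations under those assumptions, not the model this project will build.

```python
def compose(adj_matrix, noun_vec):
    """Adjective-noun composition as matrix-vector multiplication: the
    adjective is a linear map from noun vectors to phrase vectors."""
    return [sum(a * n for a, n in zip(row, noun_vec)) for row in adj_matrix]

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def compose_with_senses(sense_matrices, noun_vec, context_vec):
    """Disambiguate-then-compose: apply each sense-specific adjective
    matrix and keep the output closest (by cosine) to the context vector."""
    outputs = [compose(m, noun_vec) for m in sense_matrices]
    return max(outputs, key=lambda o: cosine(o, context_vec))
```

A single-sense model corresponds to calling `compose` with one matrix regardless of context; the project's evaluation asks where that collapse hurts, with WordNet supplying the sense inventory that `sense_matrices` would be indexed by.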
Marco Baroni, Raffaella Bernardi and Roberto Zamparelli. 2014. Frege in space: A program for compositional distributional semantics. In Linguistic Issues in Language Technology, special issue on Perspectives on Semantic Representations for Textual Inference. Volume 9, pp 241–346.
Marco Baroni and Roberto Zamparelli. 2010. Nouns are vectors, adjectives are matrices: Representing adjective-noun constructions in semantic space. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 1183–1193. Association for Computational Linguistics.
Dimitri Kartsaklis, Mehrnoosh Sadrzadeh, and Stephen Pulman. 2013. Separating disambiguation from composition in distributional semantics. In Proceedings of the 2013 Conference on Computational Natural Language Learning, pages 114–123.
Stephen Clark, Laura Rimell, Tamara Polajnar and Jean Maillard. The Categorial Framework for Compositional Distributional Semantics. Technical Report, University of Cambridge Computer Laboratory.
Dario Gutierrez, Ekaterina Shutova, Tyler Marghetis and Benjamin Bergen. 2016. Literal and Metaphorical Senses in Compositional Distributional Semantic Models. In Proceedings of ACL 2016, Berlin, Germany.
Bahdanau et al., 2015. Neural Machine Translation by Jointly Learning to Align and Translate.