Computer Laboratory

Technical reports

Error detection in content word combinations

Ekaterina Kochmar

May 2016, 170 pages

This technical report is based on a dissertation submitted December 2014 by the author for the degree of Doctor of Philosophy to the University of Cambridge, St. John’s College.

Abstract

This thesis addresses the task of error detection in the choice of content words, focusing on adjective–noun and verb–object combinations. We show that error detection in content words is an under-explored area in research on learner language since (i) most previous approaches to error detection and correction have focused on other error types, and (ii) the approaches that have previously addressed errors in content words have not performed error detection proper. We show why this task is challenging for existing algorithms and propose a novel approach to error detection in content words.

We note that since content words express meaning, an error detection algorithm should take the semantic properties of the words into account. We use a compositional distributional semantic framework in which we represent content words using their distributions in native English, while the meaning of the combinations is represented using models of compositional semantics. We present a number of measures that describe different properties of the modelled representations and can reliably distinguish between the representations of the correct and incorrect content word combinations. Finally, we cast the task of error detection as a binary classification problem and implement a machine learning classifier that uses the output of the semantic measures as features.
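The pipeline described above (word vectors, composition, semantic measures, binary classifier) can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration and is not taken from the report: the toy vectors, the additive and multiplicative composition functions, the four measures, the nearest-centroid classifier, the training labels, and all function names are hypothetical stand-ins for whatever models and learner the thesis actually uses.

```python
import math

# Toy distributional vectors: invented counts over four context dimensions.
# Real vectors would be estimated from a large corpus of native English.
VECTORS = {
    "strong":   [4.0, 1.0, 0.0, 3.0],
    "powerful": [3.0, 0.0, 1.0, 4.0],
    "tea":      [5.0, 2.0, 0.0, 0.0],
    "computer": [0.0, 0.0, 4.0, 5.0],
}

def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))

def cosine(u, v):
    """Cosine similarity; 0.0 if either vector has zero length."""
    denom = norm(u) * norm(v)
    return sum(a * b for a, b in zip(u, v)) / denom if denom else 0.0

def compose_additive(u, v):
    """Assumed composition model: component-wise sum."""
    return [a + b for a, b in zip(u, v)]

def compose_multiplicative(u, v):
    """Assumed composition model: component-wise product."""
    return [a * b for a, b in zip(u, v)]

def semantic_features(modifier, head):
    """Hypothetical measures over the word and combination vectors."""
    m, h = VECTORS[modifier], VECTORS[head]
    add = compose_additive(m, h)
    mult = compose_multiplicative(m, h)
    return [cosine(m, h), cosine(add, h), cosine(mult, h), norm(mult)]

def centroid(rows):
    """Column-wise mean of a non-empty list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def train(examples):
    """Nearest-centroid model: one mean feature vector per label."""
    by_label = {0: [], 1: []}
    for modifier, head, label in examples:
        by_label[label].append(semantic_features(modifier, head))
    return {label: centroid(rows) for label, rows in by_label.items()}

def predict(model, modifier, head):
    """Return the label whose centroid is closest to the features."""
    feats = semantic_features(modifier, head)
    return min(model, key=lambda label: math.dist(feats, model[label]))

# Invented labels for illustration: 1 = erroneous combination, 0 = correct.
TRAIN = [
    ("strong", "tea", 0),
    ("powerful", "computer", 0),
    ("powerful", "tea", 1),
    ("strong", "computer", 1),
]
model = train(TRAIN)
label = predict(model, "powerful", "tea")
```

With four words and four training pairs the prediction itself carries no weight. The sketch only shows the data flow: the classifier sees the outputs of the semantic measures and never the words themselves.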

The results of our experiments confirm that an error detection algorithm that uses semantically motivated features achieves good accuracy and precision and outperforms the state-of-the-art approaches. We conclude that the features derived from the semantic representations encode important properties of the combinations that help distinguish the correct combinations from the incorrect ones.

The approach presented in this work can naturally be extended to other types of content word combinations. Future research should also investigate how the error correction component for content word combinations could be implemented.

Full text

PDF (3.8 MB)

BibTeX record

@TechReport{UCAM-CL-TR-886,
  author =      {Kochmar, Ekaterina},
  title =       {{Error detection in content word combinations}},
  year =        2016,
  month =       may,
  url =         {http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-886.pdf},
  institution = {University of Cambridge, Computer Laboratory},
  number =      {UCAM-CL-TR-886}
}