Department of Computer Science and Technology

Technical reports

Minimally supervised dependency-based methods for natural language processing

Marek Rei

September 2013, 169 pages

This technical report is based on a dissertation submitted December 2012 by the author for the degree of Doctor of Philosophy to the University of Cambridge, Churchill College.

DOI: 10.48456/tr-840

Abstract

This work investigates minimally supervised methods for solving NLP tasks, without requiring explicit annotation or training data. Our motivation is to create systems that require substantially reduced effort from domain and/or NLP experts, compared to annotating a corresponding dataset, and also offer easier domain adaptation and better generalisation properties.

We apply these principles to four separate language processing tasks and analyse their performance compared to supervised alternatives. First, we investigate the task of detecting the scope of speculative language, and develop a system that applies manually defined rules over dependency graphs. Next, we experiment with distributional similarity measures for detecting and generating hyponyms, and describe a new measure that achieves the highest performance on hyponym generation. We also extend the distributional hypothesis to larger structures and propose the task of detecting entailment relations between dependency graph fragments of various types and sizes. Our system achieves relatively high accuracy by combining distributional and lexical similarity scores. Finally, we describe a self-learning framework for improving the accuracy of an unlexicalised parser, by calculating relation probabilities using its own dependency output. The method requires only a large in-domain text corpus and can therefore be easily applied to different domains and genres.

While fully supervised approaches generally achieve the highest performance, our experiments found minimally supervised methods to be remarkably competitive. By moving away from explicit supervision, we aim to better understand the underlying patterns in the data, and to create systems that are not tied to any specific domains, tasks or resources.

Full text

PDF (1.1 MB)

BibTeX record

@TechReport{UCAM-CL-TR-840,
  author      = {Rei, Marek},
  title       = {{Minimally supervised dependency-based methods for natural language processing}},
  year        = 2013,
  month       = sep,
  url         = {https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-840.pdf},
  institution = {University of Cambridge, Computer Laboratory},
  doi         = {10.48456/tr-840},
  number      = {UCAM-CL-TR-840}
}