Computer Laboratory

Technical reports

Optimising the speed and accuracy of a Statistical GLR Parser

Rebecca F. Watson

March 2009, 145 pages

This technical report is based on a dissertation submitted September 2007 by the author for the degree of Doctor of Philosophy to the University of Cambridge, Darwin College.

Abstract

The focus of this thesis is to develop techniques that optimise both the speed and accuracy of a unification-based statistical GLR parser. These methods can, however, be applied within a broad range of parsing frameworks. We first aim to optimise the level of tag ambiguity resolved during parsing, given that we employ a front-end PoS tagger. This work provides the first broad comparison of tag models, as we consider both tagging and parsing performance. A dynamic model achieves the best accuracy and provides a means to overcome the trade-off between tag error rates in single-tag-per-word input and the increase in parse ambiguity over multiple-tag-per-word input. The second line of research describes a novel modification to the inside-outside algorithm, whereby multiple inside and outside probabilities are assigned for elements within the packed parse forest data structure. This algorithm enables us to compute a set of ‘weighted GRs’ directly from this structure. Our experiments demonstrate substantial increases in parser accuracy and throughput for weighted GR output.

Finally, we describe a novel confidence-based training framework that can, in principle, be applied to any statistical parser whose output is defined in terms of its consistency with a given level and type of annotation. We demonstrate that a semi-supervised variant of this framework outperforms both Expectation-Maximisation (when both are constrained by unlabelled partial-bracketing) and the extant (fully supervised) method. These novel training methods utilise data automatically extracted from existing corpora. Consequently, they require no manual effort on the part of the grammar writer, facilitating grammar development.


BibTeX record

@TechReport{UCAM-CL-TR-743,
  author =	 {Watson, Rebecca F.},
  title = 	 {{Optimising the speed and accuracy of a Statistical GLR
         	   Parser}},
  year = 	 2009,
  month = 	 mar,
  url = 	 {http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-743.pdf},
  institution =  {University of Cambridge, Computer Laboratory},
  number = 	 {UCAM-CL-TR-743}
}