Department of Computer Science and Technology

Technical reports

Underspecified quantification

Aurelie Herbelot

February 2011, 163 pages

This technical report is based on a dissertation submitted in 2010 by the author to the University of Cambridge, Trinity Hall, for the degree of Doctor of Philosophy.

DOI: 10.48456/tr-795

Abstract

Many noun phrases in text are ambiguously quantified: syntax does not explicitly tell us whether they refer to a single entity or to several, nor, in main clauses, what portion of the set denoted by the subject Nbar actually takes part in the event expressed by the verb. For instance, when we utter the sentence ‘Cats are mammals’, it is only world knowledge that allows our hearer to infer that we mean ‘All cats are mammals’, and not ‘Some cats are mammals’. This ambiguity effect is interesting at several levels. Theoretically, it raises cognitive and linguistic questions. To what extent does syntax help humans resolve the ambiguity? What problem-solving skills come into play when syntax is insufficient for full resolution? How does ambiguous quantification relate to the phenomenon of genericity, as described by the linguistic literature? From an engineering point of view, the resolution of quantificational ambiguity is essential to the accuracy of some Natural Language Processing tasks.

We argue that the quantification ambiguity phenomenon can be described in terms of underspecification and propose a formalisation for what we call ‘underquantified’ subject noun phrases. Our formalisation is motivated by inference requirements and covers all cases of genericity.
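To give a flavour of the idea (the notation below is an illustrative assumption, not the report's own formalism), an underquantified subject noun phrase can be written in a generalised-quantifier style in which the quantifier itself is left unresolved:

% Illustrative sketch only; this notation is assumed rather than taken from the report.
% 'Cats are mammals', with an underspecified quantifier Q over the subject Nbar:
\[
  Q\,x\,[\mathit{cat}(x)]\,[\mathit{mammal}(x)], \qquad Q \in \{\textsc{some}, \textsc{most}, \textsc{all}\}
\]
% Quantification resolution then amounts to choosing a value for Q: world knowledge
% favours Q = \textsc{all} here, whereas a sentence like 'Cats were sleeping by the
% fire' would favour an existential reading.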

Our approach is then empirically validated by human annotation experiments. We propose an annotation scheme that follows our theoretical claims with regard to underquantification. Our annotation results strengthen our claim that all noun phrases can be analysed in terms of quantification. The resulting corpus allows us to derive a gold standard for quantification resolution experiments and constitutes, as far as we are aware, the first attempt to analyse the distribution of null quantifiers in English.

We then create a baseline system for automatic quantification resolution, using syntax to provide discriminating features for our classification. We show that results are rather poor for certain classes and argue that some level of pragmatics is needed, in combination with syntax, to perform accurate resolution. We explore the use of memory-based learning as a way to approximate the problem-solving skills available to humans at the level of pragmatic understanding.
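As a rough sketch of how such a classifier might be put together (the features, label set and toy data below are assumptions for illustration, not the report's actual feature set), memory-based learning can be approximated by k-nearest-neighbour classification over stored training instances, as in tools such as TiMBL:

# Illustrative sketch only: feature names, label set and data are assumed,
# not taken from the report.
from sklearn.feature_extraction import DictVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Each subject NP is described by a few syntactic features of the kind a
# parser could supply; the label is the resolved quantifier.
train_nps = [
    {"determiner": "none", "number": "plural",   "tense": "present"},  # 'Cats are mammals'
    {"determiner": "none", "number": "plural",   "tense": "past"},     # 'Cats slept on the sofa'
    {"determiner": "the",  "number": "singular", "tense": "past"},     # 'The cat slept'
]
train_labels = ["all", "some", "one"]  # assumed label set

# A memory-based learner stores all training instances and classifies a new
# NP by the labels of its nearest neighbours; no abstraction at training time.
model = make_pipeline(DictVectorizer(), KNeighborsClassifier(n_neighbors=1))
model.fit(train_nps, train_labels)

new_np = {"determiner": "none", "number": "plural", "tense": "present"}  # 'Dogs are loyal'
print(model.predict([new_np]))  # -> ['all'] on this toy data

One attraction of a memory-based learner in this setting is that it keeps every training instance, so exceptional, pragmatically driven cases are not smoothed away by an abstracted model.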

Full text

PDF (1.1 MB)

BibTeX record

@TechReport{UCAM-CL-TR-795,
  author      = {Herbelot, Aurelie},
  title       = {{Underspecified quantification}},
  year        = 2011,
  month       = feb,
  url         = {https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-795.pdf},
  institution = {University of Cambridge, Computer Laboratory},
  doi         = {10.48456/tr-795},
  number      = {UCAM-CL-TR-795}
}