Computer Laboratory

Technical reports

Automatic summarising and the CLASP system

Richard Tucker

January 2000, 190 pages

This technical report is based on a dissertation submitted in 1999 by the author for the degree of Doctor of Philosophy to the University of Cambridge.

Abstract

This dissertation discusses summarisers and summarising in general, and presents CLASP, a new summarising system that uses a shallow semantic representation of the source text called a “predication cohesion graph”.

Nodes in the graph are “simple predications” corresponding to events, states and entities mentioned in the text; edges indicate related or similar nodes. Summary content is chosen by selecting some of these predications according to criteria of “importance”, “representativeness” and “cohesiveness”. These criteria are expressed as functions on the nodes of a weighted graph. Summary text is produced either by extracting whole sentences from the source text, or by generating short, indicative “summary phrases” from the selected predications.
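As a rough illustration of selecting predications by criteria expressed as functions on the nodes of a weighted graph, the sketch below greedily picks nodes by a combined score. It is not CLASP's actual method: the abstract does not define the scoring functions, so the definitions of `importance`, `representativeness` and `cohesiveness` used here, their unweighted sum, the greedy strategy, and all node names and weights are assumptions made purely for illustration.

```python
from typing import Dict, List

# Symmetric adjacency map: node -> neighbour -> edge weight (hypothetical layout).
Graph = Dict[str, Dict[str, float]]


def add_edge(graph: Graph, a: str, b: str, weight: float) -> None:
    """Add an undirected weighted edge between two predication nodes."""
    graph.setdefault(a, {})[b] = weight
    graph.setdefault(b, {})[a] = weight


def importance(node_weight: Dict[str, float], node: str) -> float:
    """Assumed: importance is a weight attached directly to the node."""
    return node_weight.get(node, 0.0)


def representativeness(graph: Graph, node: str) -> float:
    """Assumed: a node represents the text well if it is strongly linked overall."""
    return sum(graph.get(node, {}).values())


def cohesiveness(graph: Graph, node: str, selected: List[str]) -> float:
    """Assumed: a node is cohesive if it links to nodes already selected."""
    chosen = set(selected)
    return sum(w for nb, w in graph.get(node, {}).items() if nb in chosen)


def select_predications(graph: Graph, node_weight: Dict[str, float], k: int) -> List[str]:
    """Greedily select up to k nodes, maximising the sum of the three scores."""
    selected: List[str] = []
    candidates = set(graph) | set(node_weight)
    while candidates and len(selected) < k:
        # Sorting makes tie-breaking deterministic (alphabetical order).
        best = max(
            sorted(candidates),
            key=lambda n: importance(node_weight, n)
            + representativeness(graph, n)
            + cohesiveness(graph, n, selected),
        )
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy graph with invented predications and weights.
example_graph: Graph = {}
add_edge(example_graph, "buy", "acme", 1.0)
add_edge(example_graph, "buy", "widgetco", 1.0)
add_edge(example_graph, "buy", "announce", 0.5)
example_weights = {"buy": 1.0, "acme": 0.5, "widgetco": 0.5, "announce": 0.2, "rain": 0.9}

print(select_predications(example_graph, example_weights, 3))
```

In this toy example the isolated node `rain` has a high individual weight but is never preferred over the well-connected nodes, which is the kind of trade-off the three criteria are meant to capture. A real system would need to weight and define the criteria far more carefully.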

CLASP uses linguistic processing but no domain knowledge, and therefore does not restrict the subject matter of the source text. It is intended to deal robustly with complex texts that it cannot analyse in full or with complete accuracy. Experiments in summarising stories from the Wall Street Journal suggest there may be a benefit in identifying important material in a semantic representation rather than a surface one, but that, despite the robustness of the source representation, inaccuracies in CLASP’s linguistic analysis can dramatically affect the readability of its summaries. I discuss ways in which this and other problems might be overcome.

Full text

PDF (0.9 MB)

BibTeX record

@TechReport{UCAM-CL-TR-484,
  author =      {Tucker, Richard},
  title =       {{Automatic summarising and the CLASP system}},
  year =        2000,
  month =       jan,
  url =         {http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-484.pdf},
  institution = {University of Cambridge, Computer Laboratory},
  number =      {UCAM-CL-TR-484}
}