Computer Laboratory

Technical reports

Automatic summarising of English texts

John Irving Tait

137 pages

This technical report is based on a dissertation submitted December 1982 by the author for the degree of Doctor of Philosophy to the University of Cambridge, Wolfson College.

Abstract

This thesis describes a computer program called Scrabble which can summarise short English texts. It uses large bodies of predictions about the likely contents of texts about particular topics to identify the commonplace material in an input text. Pre-specified summary templates, each associated with a different topic, are used to condense the commonplace material in the input. Filled-in summary templates are then used to form a framework into which unexpected material in the input may be fitted, allowing unexpected material to appear in output summary texts in an essentially unreduced form. The system’s summaries are in English.

The program is based on technology not dissimilar to a script applier. However, Scrabble represents a significant advance over previous script-based summarising systems. It is much less likely to produce misleading summaries of an input text than some previous systems and can operate with less information about the subject domain of the input than others.

These improvements are achieved by the use of three main novel ideas. First, the system incorporates a new method for identifying the topic or topics of an input text. Second, it allows a section of text to have more than one topic at a time, or at least a composite topic which may be dealt with by the computer program simultaneously applying the text predictions associated with more than one simple topic. Third, Scrabble includes new mechanisms for fitting unexpected material from the input into its output summary texts. The inclusion of such material in the output summary is motivated by the view that it is precisely unexpected material which is likely to form the most salient matter in the input text.

The performance of the system is illustrated by means of a number of example input texts and their Scrabble summaries.
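
The processing stages named in the abstract (topic identification, simultaneous application of predictions from more than one topic, template-based condensation of commonplace material, and pass-through of unexpected material) can be illustrated with a toy sketch. It is not Scrabble's implementation: the thesis describes a script-applier-like system, whereas this sketch substitutes simple word matching for it. The names `Topic`, `cues`, `expected`, `words` and `summarise`, and all of the example data, are hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Topic:
    """A hypothetical topic record (assumed structure, not from the thesis)."""
    name: str
    cues: frozenset      # words whose presence suggests the topic is active
    expected: frozenset  # words predicted to occur in texts on this topic
    template: str        # pre-specified summary template for the topic


def words(sentence):
    """Lower-case word set of a sentence; a crude stand-in for real analysis."""
    return set(re.findall(r"[a-z']+", sentence.lower()))


def summarise(sentences, topics):
    """Condense predicted material into templates; keep the rest unreduced."""
    seen = set().union(*(words(s) for s in sentences))
    # Several topics may be active at once for the same stretch of text.
    active = [t for t in topics if t.cues & seen]
    # Pool the predictions of every active topic.
    predicted = set().union(*(t.cues | t.expected for t in active))
    # A sentence matching no prediction counts as unexpected material.
    unexpected = [s for s in sentences if not (words(s) & predicted)]
    # The templates form the framework; unexpected sentences follow verbatim.
    return [t.template for t in active] + unexpected


TOPICS = [
    Topic("restaurant", frozenset({"restaurant", "waiter"}),
          frozenset({"menu", "ordered", "meal", "paid", "bill"}),
          "Someone ate at a restaurant."),
    Topic("travel", frozenset({"train", "station"}),
          frozenset({"ticket", "platform", "journey"}),
          "Someone travelled by train."),
]

TEXT = [
    "John took the train to town.",
    "He went to a restaurant and ordered a meal.",
    "A stranger stole his jacket.",
    "He paid the bill.",
]

SUMMARY = summarise(TEXT, TOPICS)
```

With this example data, both topics are active, the three predictable sentences collapse into two template sentences, and the sentence about the stolen jacket survives unchanged because no active topic predicts it.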

Full text

PDF (8.0 MB)

BibTeX record

@TechReport{UCAM-CL-TR-47,
  author =      {Tait, John Irving},
  title =       {{Automatic summarising of English texts}},
  url =         {http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-47.pdf},
  institution = {University of Cambridge, Computer Laboratory},
  number =      {UCAM-CL-TR-47}
}