Automatic summarising: a review and discussion of the state of the art

Karen Spärck Jones

January 2007, 67 pages

Abstract

This paper reviews research on automatic summarising over the last decade. This period has seen rapid growth in the area, stimulated by advances in technology and by several system evaluation programmes. The review uses several frameworks to organise its discussion: for summarising itself, for systems, for the task factors affecting summarising, and for evaluation design and practice.

The review considers the evaluation strategies that have been applied to summarising, the issues they raise, and the major summary evaluation programmes. It examines the input, purpose and output factors that have been investigated in summarising research over the last decade, and discusses the classes of strategy, both extractive and non-extractive, that have been explored, illustrating the range of systems that have been built. This analysis of strategies is amplified by accounts of specific exemplar systems.

The review concludes that automatic summarisation research has made valuable progress in the last decade, with some practically useful approaches, better evaluation, and more understanding of the task. However, as the review also makes clear, summarising systems are often poorly motivated in relation to the factors affecting summaries, and evaluation needs to be taken significantly further so that it engages with the purposes for which summaries are intended and the contexts in which they are used.

A reduced version of this report, entitled ‘Automatic summarising: the state of the art’, will appear in Information Processing and Management, 2007.

BibTeX record

@TechReport{UCAM-CL-TR-679,
  author      = {Sp{\"a}rck Jones, Karen},
  title       = {{Automatic summarising: a review and discussion of the
                 state of the art}},
  year        = 2007,
  month       = jan,
  url         = {http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-679.pdf},
  institution = {University of Cambridge, Computer Laboratory},
  number      = {UCAM-CL-TR-679}
}