Computer Laboratory

Technical reports

Generic summaries for indexing in information retrieval – Detailed test results

Tetsuya Sakai, Karen Spärck Jones

May 2001, 29 pages

Abstract

This paper examines the use of generic summaries for indexing in information retrieval. Our main observations are that:

– With or without pseudo-relevance feedback, a summary index may be as effective as the corresponding fulltext index for precision-oriented search of highly relevant documents. However, a reasonably sophisticated summarizer, using a compression ratio of 10–30%, is desirable for this purpose.

– In pseudo-relevance feedback, using a summary index at initial search and a fulltext index at final search is possibly effective for precision-oriented search, regardless of relevance levels. This strategy is significantly more effective than using the summary index alone, and probably more effective than using summaries as mere term selection filters. For this strategy, summary quality is probably not a critical factor, and a compression ratio of 5–10% appears best.
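
The second observation describes a two-stage strategy, which the following minimal Python sketch illustrates under loud assumptions; it is not the authors' system. The "summarizer" here just keeps the leading fraction of tokens given by the compression ratio, scoring is a plain sum of term frequencies, and expansion terms are taken from the summaries of the top-ranked documents. All function and parameter names (`lead_summary`, `prf_summary_then_fulltext`, `top_docs`, `extra_terms`) are hypothetical.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (toy assumption)."""
    return text.lower().split()


def build_index(docs):
    """Map each document id to a Counter of its term frequencies."""
    return {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}


def lead_summary(text, ratio):
    """Toy stand-in for a summarizer: keep the leading `ratio` of tokens."""
    tokens = text.split()
    keep = max(1, round(len(tokens) * ratio))
    return " ".join(tokens[:keep])


def search(index, query_terms):
    """Rank documents by summed term frequency; drop zero-score documents."""
    scores = {d: sum(tf[t] for t in query_terms) for d, tf in index.items()}
    hits = [d for d in scores if scores[d] > 0]
    return sorted(hits, key=lambda d: (-scores[d], d))


def prf_summary_then_fulltext(query, summary_index, fulltext_index,
                              top_docs=2, extra_terms=3):
    """Initial search on the summary index, final search on the fulltext index."""
    query_terms = tokenize(query)

    # Stage 1: initial search against the summary index.
    initial = search(summary_index, query_terms)[:top_docs]

    # Pool terms from the summaries of the top-ranked documents
    # (assumption: the source does not specify the term selection rule).
    pooled = Counter()
    for doc_id in initial:
        pooled.update(summary_index[doc_id])
    expansion = [t for t, _ in pooled.most_common()
                 if t not in query_terms][:extra_terms]

    # Stage 2: final search with the expanded query against the fulltext index.
    return search(fulltext_index, query_terms + expansion)
```

To try it, build the summary index with `build_index({d: lead_summary(t, 0.1) for d, t in docs.items()})`, build the fulltext index with `build_index(docs)`, and pass both to `prf_summary_then_fulltext`. The ratio 0.1 is only an example value within the 5–10% range mentioned above.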

Full text

PS (0.1 MB)

BibTeX record

@TechReport{UCAM-CL-TR-513,
  author =      {Sakai, Tetsuya and Sp{\"a}rck Jones, Karen},
  title =       {{Generic summaries for indexing in information retrieval --
                  Detailed test results}},
  year =        2001,
  month =       may,
  url =         {http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-513.ps.gz},
  institution = {University of Cambridge, Computer Laboratory},
  number =      {UCAM-CL-TR-513}
}