Computer Laboratory

Course pages 2014–15

Information Retrieval

Principal lecturer: Dr Simone Teufel
Taken by: Part II

No. of lectures: 8
Suggested hours of supervisions: 2
Prerequisite courses: Mathematical Methods for CS (Part IB)

Aims

The course aims to characterise information retrieval in terms of the data, problems and concepts involved. It follows the textbook “Introduction to Information Retrieval” (see Recommended reading below). The main formal retrieval models and evaluation methods are described, with an emphasis on indexing. Web search is also covered, and clustering is considered as an application of IR.

Lectures

  • Introduction. (Chapters 1; 2.3) Key problems and concepts. Information need. Boolean Operators.

  • Boolean Retrieval and Indexing. (Chapters 2.2; 2.4). Implementation of Boolean operators. Term manipulations: equivalence classes, stemming.

  • Spelling Correction and Tolerant Retrieval. (Chapter 3). Wildcards. Spelling Correction.

  • Index Construction and Compression. (Chapters 4.2-4.4; 5). BSBI, SPIMI, distributed indexing. Dictionary compression. Byte- and bit-level codes.

  • The Vector Space Model. (Chapter 6). VSM and Term weighting.

  • Evaluation. (Chapter 8, pp. 139-148). Test collections. Relevance. Precision, recall, MAP, 11-point interpolated average precision.

  • Clustering. (Chapters 16.1-16.4; 17.1-17.2). Proximity metrics, hierarchical vs. partitional clustering. Clustering algorithms. Evaluation metrics.

  • Link Analysis. (Chapter 21, excluding 21.2.3). PageRank; Hubs and Authorities.

Objectives

At the end of this course, students should be able to

  • define the tasks of information retrieval, web search and clustering, and the differences between them;

  • understand the main concepts, challenges and strategies used in IR, in particular the retrieval models currently used;

  • develop strategies suited for specific retrieval, clustering and classification situations, and recognise the limits of these strategies;

  • understand (the reasons for) the evaluation strategies developed for the tasks covered.

Recommended reading

* Manning, C.D., Raghavan, P. & Schütze, H. (2008). Introduction to information retrieval. Cambridge University Press. Available at http://nlp.stanford.edu/IR-book/.