Computer Laboratory

Course pages 2011–12

Information Retrieval

Principal lecturer: Dr Simone Teufel
Taken by: Part II
Past exam questions
Lecture Notes (Part 1 and Part 2)
Information for supervisors (contact lecturer for access permission)

No. of lectures: 8
Prerequisite courses: basic familiarity with probability is assumed


The course aims to characterise information retrieval in terms of the data, problems and concepts involved. The main formal retrieval models and evaluation methods are described. Web search is also covered. The course then turns to problems and standard solutions in two related areas, clustering and text classification.


  • Introduction. Key problems and concepts. Information need. Indexing model. Examples.

  • Retrieval models I. Boolean model. Stemming and other Term Manipulations.

  • Retrieval models II. Vector Space Model and Term Weighting.

  • Clustering. Proximity metrics, hierarchical vs. partitional clustering. Clustering algorithms. Evaluation metrics.

  • Retrieval models III. Advanced Models: Dimensionality Reduction. Language Models. Relevance Feedback. Query Expansion.

  • Search engines and linkage algorithms. PageRank; Kleinberg’s Hubs and Authorities.

  • Evaluation Strategies. Test Collections. Precision, Recall, and more complex evaluation metrics.

  • Question Answering. Task Definition and Evaluation. Three Algorithms for Question Answering.


At the end of this course, students should be able to

  • define the tasks of information retrieval, web search, clustering and text classification, and the differences between them;

  • understand the main concepts, challenges and strategies used in IR, in particular the retrieval models currently used;

  • develop strategies suited for specific retrieval, clustering and classification situations, and recognise the limits of these strategies;

  • understand (the reasons for) the evaluation strategies developed for these three areas.

Recommended reading

* Manning, C.D., Raghavan, P. & Schütze, H. (2008). Introduction to information retrieval. Cambridge University Press. Available at