Course pages 2012–13

Information Retrieval

Principal lecturer: Dr Simone Teufel
Taken by: Part II

No. of lectures: 8
Suggested hours of supervisions: 2
Prerequisite courses: basic familiarity with probability is assumed

Aims

The course aims to characterise information retrieval in terms of the data, problems and concepts involved. It describes the main formal retrieval models and evaluation methods, and also covers web search. The course then turns to problems and standard solutions in two related areas, clustering and text classification.

Lectures

Introduction. Key problems and concepts. Information need. Indexing model. Examples.
Retrieval models I. Boolean model. Stemming and other Term Manipulations.
Retrieval models II. Vector Space Model and Term Weighting.
Clustering. Proximity metrics, hierarchical vs. partitional clustering. Clustering algorithms. Evaluation metrics.
Retrieval models III. Advanced Models: Dimensionality Reduction. Language Models. Relevance Feedback. Query Expansion.
Search engines and linkage algorithms. PageRank; Kleinberg’s Hubs and Authorities.
Evaluation Strategies. Test Collections. Precision, Recall, and more complex evaluation metrics.
Question Answering. Task Definition and Evaluation. Three Algorithms for Question Answering.

Objectives

At the end of this course, students should be able to

define the tasks of information retrieval, web search, clustering and text classification, and the differences between them;
understand the main concepts, challenges and strategies used in IR, in particular the retrieval models currently used;
develop strategies suited for specific retrieval, clustering and classification situations, and recognise the limits of these strategies;
understand the evaluation strategies developed for these three areas, and the reasons for them.

Computer Laboratory
