# Computer Laboratory

Course pages 2015–16

# Information Retrieval

Principal lecturer: Dr Ronan Cummins
Taken by: Part II

No. of lectures: 8
Suggested hours of supervisions: 2
Prerequisite courses: Mathematical Methods for CS (Part IB)

## Aims

The course aims to characterise information retrieval in terms of the data, problems and concepts involved. It follows the textbook “Introduction to Information Retrieval” (see below). The main formal retrieval models and evaluation methods are described, with an emphasis on indexing. Web search is also covered, and clustering is considered as an application of IR.

## Lectures

• Introduction. (Chapters 1; 2.3) Key problems and concepts. Information need. Boolean Operators.

• Boolean Retrieval and Indexing. (Chapters 2.2; 2.4). Implementation of Boolean Operators. Term manipulations: equivalence classes, stemming.

• Index representation and Tolerant Retrieval. (Chapters 3; 4.2-4.4). Index construction. Wildcards. Spelling Correction.

• The Vector Space Model. (Chapter 6). VSM and Term weighting.

• Language Models for Information Retrieval and Classification. (Chapters 12; 13). Query-likelihood, Smoothing. Naive Bayes Classification.

• Evaluation. (Chapter 8, pp. 139-148). Test Collections. Relevance. Precision, Recall, MAP, 11-point interpolated average precision.

• Clustering. (Chapters 16.1-16.4; 17.1-17.2). Proximity metrics, hierarchical vs. partitional clustering. Clustering algorithms.

• Link Analysis. (Chapters 21.1; 21.2). PageRank.

## Objectives

At the end of this course, students should be able to

• define the tasks of information retrieval, web search and clustering, and the differences between them;

• understand the main concepts, challenges and strategies used in IR, in particular the retrieval models currently used;

• develop strategies suited for specific retrieval, clustering and classification situations, and recognise the limits of these strategies;

• understand (the reasons for) the evaluation strategies developed for the tasks covered.