Course pages 2016–17 (still under preparation!)
No. of lectures: 8
Suggested hours of supervisions: 2
Prerequisite courses: Mathematical Methods for CS (Part IB)
The course aims to characterise information retrieval in terms of the data, problems and concepts involved. It follows the textbook “Introduction to Information Retrieval” (see the reference at the end of this page). The main formal retrieval models and evaluation methods are described, with an emphasis on indexing. Web search is also covered, and several query operations are outlined.
- Introduction. (Chapters 1; 2.3). Key problems and concepts. Information need. Boolean operators.
- Boolean Retrieval and Indexing. (Chapters 2.2; 2.4). Implementation of Boolean operators. Term manipulations: equivalence classes, stemming.
- Index Representation and Tolerant Retrieval. (Chapters 3; 4.2–4.4). Index construction. Wildcards. Spelling correction.
- The Vector Space Model. (Chapter 6). VSM and Term weighting.
- Language Models for Information Retrieval and Classification. (Chapters 12; 13). Query-likelihood, Smoothing. Naive Bayes Classification.
- Evaluation. (Chapter 8, pp. 139–148). Test collections. Relevance. Precision, recall, MAP, 11-point interpolated average precision.
- Relevance Feedback and Query Expansion. (Chapters 9; 11.3.4). Rocchio algorithm, relevance models, expansion techniques.
- Link Analysis. (Chapters 21.1; 21.2). PageRank.
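As an informal illustration of the indexing and Boolean retrieval topics listed above, the following is a minimal sketch in Python, not taken from the course materials. It builds a small inverted index and answers a two-term AND query by merging sorted postings lists. The function names (`build_index`, `intersect`, `boolean_and`), the whitespace tokenisation and the three toy documents are all assumptions made for the example.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to a sorted list of IDs of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: lowercase whitespace tokenisation, no stemming.
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def intersect(p1, p2):
    """Merge two sorted postings lists, keeping IDs present in both."""
    result = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result


def boolean_and(index, term1, term2):
    """Answer the query `term1 AND term2`; unknown terms match nothing."""
    return intersect(index.get(term1.lower(), []), index.get(term2.lower(), []))


# Hypothetical toy collection, for illustration only.
docs = {
    1: "information retrieval and web search",
    2: "boolean retrieval with an inverted index",
    3: "web search uses link analysis",
}
index = build_index(docs)
print(boolean_and(index, "web", "search"))       # [1, 3]
print(boolean_and(index, "retrieval", "index"))  # [2]
```

Keeping the postings lists sorted lets the merge run in time linear in their combined length, which is the usual motivation for this representation.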
At the end of this course, students should be able to
- define the tasks of information retrieval, web search and classification, and the differences between them;
- understand the main concepts, challenges and strategies used in IR, in particular the retrieval models currently used;
- develop strategies suited for specific retrieval and classification situations, and recognise the limits of these strategies;
- understand (the reasons for) the evaluation strategies developed for the tasks covered.
* Manning, C.D., Raghavan, P. & Schütze, H. (2008). Introduction to information retrieval. Cambridge University Press. Available at http://nlp.stanford.edu/IR-book/.