# Machine Learning and Algorithms for Data Mining

- **Principal lecturers:** Dr Mateja Jamnik, Dr Pietro Lio, Dr Thomas Sauerwald
- **Taken by:** MPhil ACS, Part III
- **Code:** L42
- **Hours:** 16
- **Prerequisites:** None, but courses on the principles of data science are recommended, and familiarity with basic mathematics, artificial intelligence, algorithms and statistics is beneficial.

## Aims

This module introduces students to the basic principles of, and some advanced methods in, machine learning algorithms typically used for mining large data sets. In particular, we will look at algorithms used for analysing networks, the fundamental principles of techniques such as decision trees and support vector machines, and finally neural network architectures. Students will gain practical understanding through two practical lab sessions and two coding and analysis exercises, in which they will implement machine learning algorithms and apply them to particular large data sets.

## Syllabus

- Introduction (1 lecture)
- Support Vector Machines (3 lectures)
- Reasoning and Machine Learning (1 lecture)
- SVM Practical Lab Session (1-2 hours)
- Spectral Graph Theory and Spectral Clustering (2 lectures)
- Randomised Algorithms and Random Walks (2 lectures)
- Decision Trees and Decision Support Systems (2 lectures)
- Neural Networks (2 lectures)
- Neural Nets Practical Lab Session (1-2 hours)

Note that some content may vary, and the number of lectures per topic is provisional.

## Objectives

On completion of this module, students should:

- understand the issues involved in dealing with large amounts of data
- understand the principles of a number of machine learning algorithms
- be able to implement different machine learning algorithms and apply them to large data sets
- know how to analyse large data sets
- be familiar with potential applications of different algorithms
- be able to critically analyse and evaluate a research area

## Coursework

Coursework will consist of two 1-2 hour practical lab sessions, plus two practical exercises.

First, students will study a recent research paper that focuses on one of the topics of the course, redo its analysis (possibly with their own modifications or additional tests), and comment on the results. The report should describe the methodology, analysis and results of the student's work, and explain any deviations from the original analysis in the paper. The report should be at most 2500 words.

Second, students will carry out a project in which they are given a large data set (which may be one of several different types of data) and asked to implement a particular machine learning algorithm covered in the course, and then to analyse the provided data set using their implementation. Students will then write a project report of at most 2500 words on the analysis obtained by applying their own implementation of the algorithm to the data set.

Further details regarding coursework and assessment can be found on the **Moodle** page *(only available to Cambridge University staff and students)*.

## Assessment

- 10% - Two ticked practical exercises (5% each).
- 45% - Project 1 - Reconstruction of research paper results and a written report on the analysis of at most 2500 words.
- 45% - Project 2 - Coding practical and a written report on the practical of at most 2500 words.

## Recommended reading

Leskovec, J., Rajaraman, A. & Ullman, J. (2014). *Mining of Massive Datasets*. The book is available online.

Bishop, C. (2007). *Pattern Recognition and Machine Learning*. More information supporting the book is available online.

James, G., Witten, D., Hastie, T. & Tibshirani, R. (2014). *An Introduction to Statistical Learning: with Applications in R*. The book is available online.

Murphy, K.P. (2012). *Machine Learning: A Probabilistic Perspective*. MIT Press. More information supporting the book is available online.

Mitzenmacher, M. & Upfal, E. (2005). *Probability and Computing*. Cambridge University Press. A PDF version of the book is available online.