Course pages 2016–17

Machine Learning and Algorithms for Data Mining

Principal lecturers: Dr Mateja Jamnik, Dr Pietro Lio', Dr Thomas Sauerwald
Taken by: MPhil ACS, Part III
Code: L42
Hours: 16
Prerequisites: None, but L120 Principles of Data Science (Michaelmas) is recommended, and familiarity with basic mathematics, artificial intelligence, algorithms and statistics is beneficial.

Aims

This module aims to introduce students to the basic principles and methods of machine learning algorithms that are typically used for mining large data sets. In particular, we will look into algorithms typically used for analysing networks, the fundamental principles of techniques such as decision trees and support vector machines, and finally, neural network architectures. Students will gain practical understanding through two practicals and a coding exercise in which they will implement one machine learning algorithm and apply it to a particular large data set.

Syllabus

Machine learning challenges in data mining (1-2 lectures)
Support Vector Machines (4 lectures)

regression
maximising the margin
common kernel functions
implementation of kernels
non-parametric SVM-based clustering
multiclass SVM

Algorithms on Networks (4 lectures)

spectral graph theory and clustering
randomised algorithms and random walks

Practical (1-2 hours)
Decision Trees and Random Forests (2 lectures)

classification tree algorithms (e.g., survival trees, clustering trees, linear splits, class prior, binary splits)
data integration and calibration (e.g., rank quality of data, how it is used, check consistency)
decision support systems
multivariate parameter evidence synthesis
recommender systems

Neural Networks and Deep Learning (2 lectures)

basic principles of self-organisation and supervised learning
representation aspects of neural networks, neural circuits, neurons
learning and neural coding
convolutional networks
technical aspects and implementation in deep learning

Practical (1-2 hours)

Note that some content may vary, and the number of lectures per topic is provisional; the final plan will depend on the students' background and the number of students taking the course.

Objectives

On completion of this module, students should:

understand the issues involved in dealing with large amounts of data
understand the principles of a number of machine learning algorithms
be able to implement different machine learning algorithms and apply them to large data sets
know how to analyse large data sets
be familiar with potential applications of different algorithms
be able to critically analyse and evaluate a research area

Coursework

Coursework will consist of two 1-2 hour practical lab sessions, plus two practical exercises.

First, students will study a recent research paper that focuses on one of the topics of the course, redo the analysis with potential personal modifications or additional tests, and comment on these. The report should describe the methodology, analysis and results of the student's own work, and explain any deviations from the original analysis in the paper. The report should be at most 2500 words.

Second, students will carry out a project in which they will be given a large data set (which may come from a range of different types of data sets), asked to implement a particular machine learning algorithm covered in the course, and then run an analysis on the provided data set using their implementation. The students will then write a 2500-word project report on the analysis of the data set obtained by applying their own implementation of the algorithm.

Assessment

Comment on and redo the analysis of a recent research paper that focuses on one of the topics of the course: the report should be at most 2500 words (50% of the final mark);
Coding exercise on a data set and a written report on the practical of at most 2500 words (50% of the final mark).

Computer Laboratory