Computer Laboratory

Course pages 2016–17

Machine Learning and Algorithms for Data Mining

Principal lecturers: Dr Mateja Jamnik, Dr Pietro Lio', Dr Thomas Sauerwald
Taken by: MPhil ACS, Part III
Code: L42
Hours: 16
Prerequisites: None, but L120 Principles of Data Science in Michaelmas is recommended, and familiarity with basic mathematics, artificial intelligence, algorithms and statistics is beneficial.

Aims

This module aims to introduce students to the basic principles and methods of machine learning algorithms that are typically used for mining large data sets. In particular, we will look at algorithms for analysing networks, the fundamental principles of techniques such as decision trees and support vector machines, and finally neural network architectures. Students will gain practical understanding through two practicals and a coding exercise in which they will implement one machine learning algorithm and apply it to a particular large data set.

Syllabus

  • Machine learning challenges in data mining (1-2 lectures)
  • Support Vector Machines (4 lectures)
    • regression
    • maximising the margin
    • common kernel functions
    • implementation of kernels
    • non-parametric SVM-based clustering
    • multiclass SVM
  • Algorithms on Networks (4 lectures)
    • spectral graph theory and clustering
    • randomised algorithms and random walks
  • Practical (1-2 hours)
  • Decision Trees and Random Forests (2 lectures)
    • classification tree algorithms (e.g., survival trees, clustering trees, linear splits, class prior, binary splits)
    • data integration and calibration (e.g., rank quality of data, how it is used, check consistency)
    • decision support systems
    • multivariate parameter evidence synthesis
    • recommender systems
  • Neural Networks and Deep Learning (2 lectures)
    • basic principles of self-organisation and supervised learning
    • representation aspects of neural networks, neural circuits, neurons
    • learning and neural coding
    • convolutional networks
    • technical aspects and implementation in deep learning
  • Practical (1-2 hours)

Note that some content may vary, and the number of lectures per topic is provisional; the final plan will depend on the students' background and the number of students taking the course.

Objectives

On completion of this module, students should:

  • understand the issues involved in dealing with large amounts of data
  • understand the principles of a number of machine learning algorithms
  • be able to implement different machine learning algorithms and apply them to large data sets
  • know how to analyse large data sets
  • be familiar with potential applications of different algorithms
  • be able to critically analyse and evaluate a research area

Coursework

Coursework will consist of two 1-2 hour practical lab sessions, plus two practical exercises.

First, students will study a recent research paper that focuses on one of the topics of the course, redo the analysis with potential personal modifications or additional tests, and comment on these. The report should describe the methodology, analysis and results of the student's own work, and explain any deviations from the original analysis in the paper. The report should be at most 2500 words.

Second, students will carry out a project in which they are given a large data set (which may be drawn from a range of different types of data), implement a particular machine learning algorithm covered in the course, and run an analysis on the provided data set using their implementation. Students will then write a 2500-word project report on the analysis of the data set obtained with their own implementation of the algorithm.

Assessment

  • Comment on and redo the analysis of a recent research paper that focuses on one of the topics of the course: the report should consist of at most 2500 words (50% of the final mark);
  • Coding exercise on a data set and a written report on the practical of at most 2500 words (50% of the final mark).

Recommended reading

Leskovec, J., Rajaraman, A. & Ullman, J. (2014). Mining of Massive Datasets. The book is available online.
Bishop, C. (2007). Pattern Recognition and Machine Learning. Supporting material for the book is available online.

Additional relevant material and research papers will be suggested during lectures.