Department of Computer Science and Technology

Course pages 2018–19

Data Science: principles and practice

Principal lecturers: Dr Marek Rei, Dr Ekaterina Kochmar, Dr Damon Wischik, Prof Ted Briscoe
Taken by: Part II CST 75%

No. of lectures and practical classes: 12
Prerequisite courses: NST Mathematics, Machine Learning and Real-World Data, and Foundations of Data Science.
Capacity: 40-50

Aims

The course will develop core areas of Data Science (e.g. models for regression and classification) from several perspectives: conceptual formulation and properties, solution algorithms and their implementation, and data visualization for exploratory data analysis and the effective presentation of modelling outputs. The lectures will be complemented by practical classes using Python, scikit-learn and TensorFlow.

Lectures

  • Introduction. Motivation, applications, examples, loading common data formats, calculating statistics over a dataset, logistics and overview of the course.

  • Linear Regression. Defining a model, fitting a model, least squares regression, linear regression, gradient descent, scikit-learn.

  • Practical: Linear Regression

  • Classification. Classification, perceptron, logistic regression, multi-class classification, regularisation, kernels, exploratory data visualisation.

  • Practical: Classification

  • Deep Learning, part I. Training neural networks, applications, multilayer perceptrons, stochastic gradient descent, backpropagation.

  • Deep Learning, part II. Advanced architectures, convnets, RNNs, introduction to TensorFlow.

  • Practical: Deep Learning

  • Visualization, part I. Scales and coordinates, depicting comparisons.

  • Visualization, part II. Common plotting patterns, including dimension reduction.

  • Practical: Visualization

  • Challenges in Data Science. Summary of the course, overview of other relevant techniques, ethics and privacy issues, bias in the training data, information about the hand-out test.

Objectives

By the end of the course, students should be able to:

  • demonstrate understanding of, and practical skills in, Data Science;

  • specify and work with an analytical model;

  • implement Data Science algorithms effectively;

  • understand how data visualization supports both the exploration of datasets and the communication of findings from data science models.

Recommended reading

Bishop, C.M. (2006). Pattern Recognition and Machine Learning. Springer.
MacKay, D.J.C. (2003). Information Theory, Inference and Learning Algorithms. Cambridge University Press.
Python Basic Tutorial. Available online: https://www.tutorialspoint.com/python/index.htm
Numpy: Quickstart Tutorial. Available online: https://docs.scipy.org/doc/numpy/user/quickstart.html
Get Started with TensorFlow. Available online: https://www.tensorflow.org/tutorials/