Course pages 2019–20

# Data Science: principles and practice

**Principal lecturers:** Prof Mateja Jamnik, Dr Ekaterina Kochmar, Dr Guy Emerson

**Taken by:** MPhil ACS

**Code:** M20

**Hours:** 12 (plus 4 hours self-study)

No. of lectures and practical classes: 12

Prerequisite courses: NST Mathematics; Machine Learning and Real-World Data; Foundations of Data Science.

Capacity: 40-50

## Aims

The course will develop core areas of Data Science (e.g. models for regression and classification) from several perspectives: conceptual formulation and properties, solution algorithms and their implementation, and data visualization for exploratory data analysis and the effective presentation of modelling outputs. The lectures will be complemented by practical classes using Python, scikit-learn and TensorFlow.

## Lectures

- **Introduction.** Motivation, applications, examples, loading common data formats, calculating statistics over a dataset, logistics and overview of the course.
- **Linear Regression.** Defining a model, fitting a model, least squares regression, linear regression, gradient descent, scikit-learn.
- **Practical: Linear Regression**
- **Classification.** Classification, perceptron, logistic regression, multi-class classification, regularisation, kernels, exploratory data visualisation.
- **Practical: Classification**
- **Deep Learning, part I.** Training neural networks, applications, multilayer perceptrons, stochastic gradient descent, backpropagation.
- **Deep Learning, part II.** Advanced architectures, convnets, RNNs, introduction to TensorFlow.
- **Practical: Deep Learning**
- **Visualization, part I.** Scales and coordinates, depicting comparisons.
- **Visualization, part II.** Common plotting patterns, including dimension reduction.
- **Practical: Visualization**
- **Challenges in Data Science.** Summary of the course, overview of other relevant techniques, ethics and privacy issues, bias in the training data, information about the hand out test.
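As an informal illustration of the Linear Regression topics listed above (least squares regression and gradient descent), the following is a minimal sketch in plain Python. It is not taken from the course materials: the function names `fit_line_gd` and `fit_line_closed_form`, the learning rate, the step count and the toy data are all assumptions chosen for the example.

```python
def fit_line_gd(xs, ys, lr=0.05, steps=2000):
    """Fit y ~ w*x + b by gradient descent on the mean squared error.

    lr and steps are illustrative defaults, not course-recommended values.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals of the current line on every data point.
        errs = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of (1/n) * sum(err^2) with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errs, xs))
        grad_b = 2.0 / n * sum(errs)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def fit_line_closed_form(xs, ys):
    """Fit the same line using the closed-form least squares solution."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = sxy / sxx
    b = mean_y - w * mean_x
    return w, b


# Hypothetical toy data, roughly following y = 2x + 1 with small noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.1, 4.9, 7.2, 8.8]

w_gd, b_gd = fit_line_gd(xs, ys)
w_cf, b_cf = fit_line_closed_form(xs, ys)
print(f"gradient descent: w={w_gd:.4f}, b={b_gd:.4f}")
print(f"closed form:      w={w_cf:.4f}, b={b_cf:.4f}")
```

On this small, well-conditioned dataset the two approaches should agree closely; gradient descent becomes the more practical choice when a closed-form solution is unavailable or too expensive to compute.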

## Objectives

By the end of the course students should be able to:

- demonstrate understanding and practical skills in Data Science;
- specify and work with an analytical model;
- implement Data Science algorithms effectively;
- understand how data visualization underpins exploring datasets as well as communicating the findings of data science models.

## Recommended reading

Bishop, C.M. (2008). *Pattern Recognition and Machine Learning*. Springer.

MacKay, D.J. (2003). *Information Theory, Inference and Learning Algorithms*. Cambridge University Press.

Python Basic Tutorial. Available online:
`https://www.tutorialspoint.com/python/index.htm`

Numpy: Quickstart Tutorial. Available online:
`https://docs.scipy.org/doc/numpy/user/quickstart.html`

Get Started with TensorFlow. Available online:
`https://www.tensorflow.org/tutorials/`

This course is borrowed from Part II of the Computer Science Tripos. This module is offered as background for some Lent Term ACS modules but cannot be taken for credit.