Course pages 2020–21 (these pages are still being updated)

# Data Science: principles and practice

**Principal lecturers:** Dr Ekaterina Kochmar, Prof Ted Briscoe

**Taken by:** MPhil ACS

**Code:** M20

**Hours:** 12 (plus 4 hours self-study)

No. of lectures and practical classes: 16

Prerequisite courses: NST Mathematics, Machine Learning and Real-World Data, and Foundations of Data Science.

Capacity: 40-50

## Aims

The course will develop core areas of Data Science (e.g., models for regression and classification) from several perspectives: conceptual formulation and properties, solution algorithms and their implementation, and data visualization for exploratory data analysis and for the effective presentation of modelling outputs. The lectures will be complemented by practical classes using Python, scikit-learn and TensorFlow.

## Lectures

- **Introduction.** Motivation, applications, examples, common data formats (csv, json), loading data with Python, calculating statistics over a dataset with numpy, logistics and overview of the course.
- **Linear Regression.** Defining a model, fitting a model, least squares regression, linear regression, gradient descent, scikit-learn.
- **Practical: Linear Regression**
- **Classification, part I.** Classification, logistic regression, perceptron, multi-class classification, classification performance measures.
- **Practical: Classification I**
- **Classification, part II.** An overview of other classification techniques (e.g., decision trees, SVMs) and more advanced techniques including ensemble-based models (boosting, bagging, exemplified with AdaBoost and Random Forests).
- **Practical: Classification II**
- **Deep learning basics.** Neural networks, applications in the world, optimization, stochastic gradient descent, backpropagation, learning rates.
- **Deep learning with TensorFlow.** Introduction to TensorFlow, minimal TensorFlow example, symbolic graphs, training a network, practical tips for deep learning.
- **Practical: Deep learning with TensorFlow**
- **Deep learning architectures.** Convolutional networks, RNNs, LSTMs, autoencoders, regularization.
- **Practical: Deep learning architectures**
- **Visualization, part I.** Scales and coordinates, depicting comparisons.
- **Visualization, part II.** Common plotting patterns, including dimension reduction.
- **Practical: Visualization**
- **Challenges in Data Science.** Summary of the course, ethics and privacy in data science, P-hacking, look-everywhere effect, bias in the training data, interpretability, information about the hand out test.
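As a flavour of the early lecture topics (loading CSV data with Python, computing statistics with numpy, least squares regression and gradient descent), here is a minimal sketch. It is not taken from the course materials: the CSV contents, the function names (`load_xy`, `fit_least_squares`, `fit_gradient_descent`) and the learning rate and step count are all assumptions chosen for illustration.

```python
import csv
import io

import numpy as np

# Hypothetical CSV data invented for this sketch: y = 2x + 1 exactly.
CSV_TEXT = "x,y\n0,1\n1,3\n2,5\n3,7\n4,9\n"


def load_xy(text):
    """Parse CSV text with 'x' and 'y' columns into two numpy arrays."""
    rows = list(csv.DictReader(io.StringIO(text)))
    x = np.array([float(r["x"]) for r in rows])
    y = np.array([float(r["y"]) for r in rows])
    return x, y


def fit_least_squares(x, y):
    """Closed-form least squares fit; returns [slope, intercept]."""
    # Design matrix: one column for x, one column of ones for the intercept.
    A = np.column_stack([x, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def fit_gradient_descent(x, y, lr=0.05, steps=5000):
    """Minimise mean squared error by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        err = w * x + b - y                    # residuals of the current model
        w -= lr * (2.0 / n) * np.dot(err, x)   # d(MSE)/dw
        b -= lr * (2.0 / n) * err.sum()        # d(MSE)/db
    return np.array([w, b])


x, y = load_xy(CSV_TEXT)
print("mean of y:", y.mean(), "std of y:", y.std())
print("least squares:", fit_least_squares(x, y))
print("gradient descent:", fit_gradient_descent(x, y))
```

On this noise-free toy data both fits should recover approximately the same slope and intercept. With real data, the learning rate would need tuning, and scikit-learn's `LinearRegression` offers a ready-made alternative to the closed-form fit.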

## Objectives

By the end of the course students should be able to:

- demonstrate understanding and practical skills in Data Science;
- specify and work with an analytical model;
- implement Data Science algorithms effectively;
- understand how data visualization underpins exploring datasets as well as communicating the findings of data science models.

## Recommended reading

Bishop, C.M. (2006). *Pattern Recognition and Machine Learning*. Springer.

MacKay, D.J. (2003). *Information Theory, Inference and Learning Algorithms*. Cambridge University Press.

Python Basic Tutorial. Available online:
`https://www.tutorialspoint.com/python/index.htm`

Numpy: Quickstart Tutorial. Available online:
`https://docs.scipy.org/doc/numpy/user/quickstart.html`

Get Started with TensorFlow. Available online:
`https://www.tensorflow.org/tutorials/`

This course is borrowed from Part II of the Computer Science Tripos. This module is offered as background for some Lent Term ACS modules but cannot be taken for credit.