Data Science: principles and practice
Principal lecturers: Dr Ekaterina Kochmar, Prof Ted Briscoe
Taken by: Part II CST 75%
Hours: 16
Class limit: max. 50 students
Prerequisites: Data Science; Machine Learning and Real-world Data; NST Mathematics
Aims
The course will develop core areas of Data Science (e.g., models for regression and classification) from several perspectives: conceptual formulation and properties, solution algorithms and their implementation, data visualization for exploratory data analysis, and the effective presentation of modelling outputs. The lectures will be complemented by practical classes using Python, scikit-learn and TensorFlow.
Lectures
- Introduction. Motivation, applications, examples, common data formats (CSV, JSON), loading data with Python, calculating statistics over a dataset with NumPy, course logistics and overview.
- Linear Regression. Defining a model, fitting a model, least squares regression, linear regression, gradient descent, scikit-learn.
- Practical: Linear Regression
- Classification, part I. Classification, logistic regression, perceptron, multi-class classification, classification performance measures.
- Practical: Classification I
- Classification, part II. An overview of other classification techniques (e.g., decision trees, SVMs) and more advanced techniques including ensemble-based models (boosting, bagging, exemplified with AdaBoost and Random Forests).
- Practical: Classification II
- Deep learning basics. Neural networks, real-world applications, optimization, stochastic gradient descent, backpropagation, learning rates.
- Deep learning with TensorFlow. Introduction to TensorFlow, minimal TensorFlow example, symbolic graphs, training a network, practical tips for deep learning.
- Practical: Deep learning with TensorFlow
- Deep learning architectures. Convolutional networks, RNNs, LSTMs, autoencoders, regularization.
- Practical: Deep learning architectures
- Visualization, part I. Scales and coordinates, depicting comparisons.
- Visualization, part II. Common plotting patterns, including dimension reduction.
- Practical: Visualization
- Challenges in Data Science. Summary of the course, ethics and privacy in data science, p-hacking, the look-elsewhere effect, bias in the training data, interpretability, information about the hand-out test.
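As an illustration of the Linear Regression topics listed above (least squares regression, gradient descent), the following is a minimal sketch in plain Python. The toy data, the function names, the learning rate and the step count are all assumptions chosen for illustration and are not taken from the course material. The practical classes use scikit-learn; it is deliberately avoided here so that the sketch has no dependencies.

```python
# Illustrative sketch only: toy data and all names below are hypothetical.

def fit_closed_form(xs, ys):
    """Least-squares slope and intercept for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


def fit_gradient_descent(xs, ys, learning_rate=0.05, steps=5000):
    """Minimise the mean squared error by batch gradient descent."""
    n = len(xs)
    slope, intercept = 0.0, 0.0
    for _ in range(steps):
        # Residuals of the current model y = slope * x + intercept.
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error.
        grad_slope = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = 2.0 / n * sum(errors)
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


# Hypothetical toy dataset, roughly following y = 2x + 1 with noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]

if __name__ == "__main__":
    print("closed form:     ", fit_closed_form(xs, ys))
    print("gradient descent:", fit_gradient_descent(xs, ys))
```

On this small, well-conditioned dataset both approaches should agree closely. Gradient descent is sensitive to the learning rate, so a different dataset may need a different value.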
Objectives
By the end of the course students should be able to:
- demonstrate understanding of, and practical skills in, Data Science;
- specify and work with an analytical model;
- implement Data Science algorithms effectively;
- understand how data visualization underpins both the exploration of datasets and the communication of findings from data science models.
Recommended reading
Bishop, C.M. (2008). Pattern Recognition and Machine Learning. Springer.
MacKay, D.J. (2003). Information Theory, Inference and Learning Algorithms. Cambridge University Press.
Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning. MIT Press.
Python Basic Tutorial. Available online: https://www.tutorialspoint.com/python/index.htm
Numpy: Quickstart Tutorial. Available online: https://docs.scipy.org/doc/numpy/user/quickstart.html
Get Started with TensorFlow. Available online: https://www.tensorflow.org/tutorials/