Machine Learning for Language Processing

Principal lecturer: Dr Andreas Vlachos
Taken by: MPhil ACS, Part III
Code: L101
Term: Michaelmas
Hours: 16 (8 lectures + 8 seminar sessions)
Class limit: max. 16 students
Prerequisites: L90 Overview of Natural Language Processing (or similar) AND L95 Introduction to Natural Language Syntax and Parsing. These two modules may be taken concurrently with this module to meet the prerequisites.
Moodle, timetable

Aims

This module aims to provide an introduction to machine learning (ML) with specific application to tasks in natural language processing (NLP). We will cover the concepts of classification, structured prediction and language modelling, with applications to sentiment analysis, named entity recognition, machine translation and information extraction. Methods will include both linear and non-linear (deep learning) models, such as (multilayer) perceptrons, logistic regression, recurrent neural networks, sequence-to-sequence models and transformers.

Syllabus

Classification by machine learning: classification, types of classifier, generative vs. discriminative models, (un-/semi-)supervised training.

Structured prediction: sequence tagging, incremental language generation with recurrent neural networks.

Language modelling: self-supervised learning, conditional language modelling, embeddings.

Objectives

On completion of this module, students should:

understand the issues involved in applying machine learning approaches to a range of language processing applications;
understand the optimization underlying a number of machine learning approaches that have been applied to language processing, including the perceptron, logistic regression, multilayer perceptrons, recurrent neural networks and transformers;
understand some applications and specific tasks, including document classification, named entity recognition, machine translation and natural language generation.

Coursework

Students will be expected to undertake reading for assigned lectures and seminars. Each student will give a 20-minute presentation of one paper.

Assessment

Students will receive one tick worth 5% for attendance at seminar sessions, reading of assigned material, and satisfactory contribution during seminars.
Students will receive a second tick worth 5% for a satisfactory presentation of an assigned paper.
Students will undertake a small project to be agreed with the lecturers and write a project report of not more than 5000 words. The report will be due around the beginning of the Lent Term (see the academic calendar for the precise date), will be assessed by the lecturers, and will account for 90% of the module marks.
