Course pages 2015–16
Advanced Topics in Natural Language Processing
Organisation and Instructions
We will run the N most popular topics (with a minimum of M students) and ask all students taking the module to rank all topics in order of preference. Please send your rankings to Ted Briscoe by noon on Friday 8th January 2016.
Each student will attend 4 topics, and each topic will consist of 4 sessions: typically one preliminary lecture followed by 3 reading and discussion sessions, so that a typical topic can accommodate 6 students presenting a paper each, with at least 10 minutes of general discussion per session. Each student will be required to write an essay, or to undertake a short project and write a project report, on ONE of their chosen topics. The topic organiser will help you formulate the project or essay and will first-mark the assessed work; the module organisers will second-mark it. The assessed work will consist of a maximum of 5000 words.
- Learning to Rank
- Integrating Distributional and Compositional Semantics
- Computational Creativity
- Kernels and Kernel Methods
- Constructing and evaluating word embeddings
- Applications of Neural Networks
- Active Learning
Topic List
Learning to Rank
Description
Ranking items is an important aspect of many natural language processing and information retrieval tasks. Learning to rank is a relatively new area of machine learning with broad applicability. Tasks for which supervised learning-to-rank methods have improved the state of the art (over regression or multiclass classification methods) include document retrieval, statistical machine translation, automated essay grading, and collaborative filtering.
We will present a number of different ways of formulating learning to rank, including pointwise, pairwise, and listwise approaches, and how they differ from unsupervised methods. Students will learn the fundamentals of learning to rank and be able to identify problems where it can be applied. The first session will be a lecture describing the different approaches to ranking. The next three sessions will consist of presentations of the readings by students and discussion.
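To make the pairwise formulation concrete, the sketch below trains a linear scoring function with a perceptron-style update on violated preference pairs. The feature vectors and graded relevance labels are synthetic, and the margin, learning rate, and number of epochs are arbitrary choices for illustration; real learning-to-rank systems use more carefully designed losses and optimisers.

```python
import numpy as np

# Minimal sketch of the pairwise approach to learning to rank: a linear
# scoring function w.x is trained so that, for each (more relevant, less
# relevant) pair, the more relevant item scores higher by a margin.
# All data here is synthetic, purely for illustration.

rng = np.random.RandomState(0)
n_docs, n_features = 100, 10
X = rng.randn(n_docs, n_features)           # document feature vectors
relevance = rng.randint(0, 3, size=n_docs)  # graded relevance labels (0-2)

w = np.zeros(n_features)
lr = 0.1

for epoch in range(20):
    for i in range(n_docs):
        for j in range(n_docs):
            if relevance[i] > relevance[j]:
                margin = w @ X[i] - w @ X[j]
                if margin < 1.0:                 # pair is violated
                    w += lr * (X[i] - X[j])      # push the pair apart

ranking = np.argsort(-(X @ w))  # documents ordered by learned score
```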
Resources & Datasets
Movies, books, food, Scholarly paper recommendation
Background Reading:
Tie-Yan Liu, Learning to Rank for Information Retrieval, Springer, 2011
Yannakoudakis et al., A New Dataset and Method for Automatically Grading ESOL Texts, ACL 2011
SOLAR: Scalable Online Learning Algorithms for Ranking
Readings:
Mark Hopkins and Jonathan May, Tuning as Ranking (SMT) (Slides)
Thorsten Joachims, Optimizing Search Engines Using Clickthrough Data (Slides)
György Szarvas et al., Learning to Rank Lexical Substitutions (Slides)
Xia et al., Listwise Approach to Learning to Rank - Theory and Algorithm (Slides)
Riedel et al., Constraint-Driven Rank-Based Learning for Information Extraction (Slides)
Integrating Distributional and Compositional Semantics
Description
A combination of compositional and distributional representations has many potential advantages for computational semantics. From the distributional side: robustness, learnability from data, ease of handling ambiguity, and the ability to represent gradations of meaning. From the compositional side: the ability to handle the unbounded nature of natural language, and the existence of established accounts of semantic phenomena such as logical words, quantification and inference. Developing such a combination, however, presents many challenges.
There are essentially three approaches in the current literature to combining distributional word representations into distributional representations for phrases and sentences. The first, simple approach is to combine word vectors using a pointwise operator such as addition or pointwise multiplication. This has the immediate disadvantage of being insensitive to word order, since both operators are commutative; nevertheless, these operators provide competitive baselines on a number of standard similarity tasks. The second approach is to use a recursive neural network (RNN), which combines input vectors using a weight matrix followed by a non-linearity. The third approach is to treat the distributional meanings of some words as (multilinear) functions, i.e. tensors, and to combine them using tensor contraction.
Here we will focus on the first and third approaches, with some representative papers listed below.
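As a toy illustration of the first and third approaches, the sketch below composes an adjective-noun phrase first by pointwise operations on two word vectors, and then by treating the adjective as a matrix (an order-2 tensor) applied to the noun vector via tensor contraction. The vectors and the adjective matrix are random placeholders; in a real system they would be learned from corpus data.

```python
import numpy as np

# Toy composition of the phrase "red car" under two of the approaches above.
rng = np.random.RandomState(1)
dim = 50
red = rng.randn(dim)   # word vector for "red" (placeholder)
car = rng.randn(dim)   # word vector for "car" (placeholder)

# Approach 1: pointwise composition (commutative, so word order is lost)
phrase_add  = red + car
phrase_mult = red * car

# Approach 3: treat the adjective as a linear map (a dim x dim matrix)
# and compose by tensor contraction, here a matrix-vector product
red_matrix = rng.randn(dim, dim)
phrase_tensor = red_matrix @ car
```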
Resources
Background Reading:
Readings:
Computational Creativity
Description
Computational Creativity (CC) is a young subfield of AI that investigates the use of computational models of human creative processes both as concrete cognitive models of human creativity and as practical tools. In this respect, it has much in common with Computational Linguistics, and NLP models and systems have a crucial role to play in building creative systems. Thinking of creativity in the context of computational systems raises a lot of philosophical questions -- for example, what does "creativity" mean for an autonomous system? However, putting these questions aside, we can address a wide variety of interesting theoretical and practical questions by using and building on existing technologies (in NLP and other AI fields) to build systems to tackle creative tasks, or to play some role in creative processes.
There is a particularly close connection to the field of computational semantics. Recent advances in distributional semantics, for instance, provide powerful techniques to represent and manipulate concepts in potentially creative ways. It is an open question to what extent the same types of semantic representations that have proved useful in, for example, language modelling and question answering can be used to perform the reasoning required to produce meaningful and valuable creative ideas.
This topic will cover a general introduction to CC and focus specifically on areas of research that are related to NLP. Subjects will include metaphor analysis, idea generation, narrative generation and creative natural language generation.
Resources
Background Reading:
Readings:
Tony Veale (2014), A Service-Oriented Architecture for Metaphor Processing. ACL
Kernels and Kernel Methods
Description
Kernels are an integral component of several machine learning approaches, including Support Vector Machines and Gaussian Processes. A kernel is a similarity function whose Gram matrix over any set of inputs is positive semi-definite; such functions can be designed and derived for different applications, and simple rules allow valid kernels to be combined into new ones. As such, kernels offer a flexible way of integrating data of various types into a classification or regression algorithm. This topic will provide an introduction to kernels and the mathematical rules for kernel construction, as well as an overview of some of the most popular kernel-based machine learning methods.
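As a small illustration of kernel construction, the sketch below builds linear and RBF Gram matrices on random data and combines them using two standard closure rules (the sum and the pointwise product of valid kernels are again valid kernels); the data and the gamma parameter are arbitrary choices for illustration.

```python
import numpy as np

def linear_kernel(X, Y):
    # k(x, y) = x . y
    return X @ Y.T

def rbf_kernel(X, Y, gamma=0.5):
    # k(x, y) = exp(-gamma * ||x - y||^2)
    sq_dists = (np.sum(X**2, axis=1)[:, None]
                + np.sum(Y**2, axis=1)[None, :]
                - 2 * X @ Y.T)
    return np.exp(-gamma * sq_dists)

rng = np.random.RandomState(2)
X = rng.randn(20, 5)                      # placeholder feature vectors

K_lin = linear_kernel(X, X)
K_rbf = rbf_kernel(X, X)
K_sum     = K_lin + K_rbf                 # sum of kernels is a kernel
K_product = K_lin * K_rbf                 # pointwise (Schur) product is too

# A valid kernel's Gram matrix is (numerically) positive semi-definite
assert np.linalg.eigvalsh(K_sum).min() > -1e-8
```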
Resources
Background Reading:
Readings:
Constructing and evaluating word embeddings
Description
Representing words as low-dimensional vectors allows systems to take advantage of semantic similarities, generalise to unseen examples and improve pattern detection accuracy on nearly all NLP tasks. Advances in neural networks and representation learning have opened new and exciting ways of learning word embeddings with unique properties.
In this topic we will provide an introduction to classical vector space models and cover the most influential research on neural embeddings from the past couple of years, including word similarity and semantic analogy tasks, word2vec models, and task-specific representation learning. We will also discuss the most recent advances in the field, including multilingual embeddings and multimodal vectors that incorporate information from images.
By the end of this topic you will have learned how to construct word representations using both traditional vector space models and various neural network models. You will learn about the different properties of these models and how to choose an approach for a specific task. You will also get an overview of the most recent and notable advances in the field.
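To give a feel for the word-similarity and analogy evaluations mentioned above, the sketch below ranks words by cosine similarity and answers an analogy query by the standard vector-offset method. The embeddings here are random placeholders standing in for vectors loaded from the pretrained word2vec or GloVe files linked below, so the analogy will only resolve to "queen" once real vectors are substituted.

```python
import numpy as np

# Placeholder embeddings; in practice these would be loaded from a
# pretrained word2vec or GloVe file.
words = ["king", "queen", "man", "woman", "apple"]
embeddings = {w: np.random.RandomState(i).randn(100) for i, w in enumerate(words)}

def cosine(u, v):
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Word similarity: rank candidates by cosine similarity to a query word
query = "king"
neighbours = sorted((w for w in words if w != query),
                    key=lambda w: -cosine(embeddings[query], embeddings[w]))

# Analogy by vector offset: king - man + woman should be closest to queen
target = embeddings["king"] - embeddings["man"] + embeddings["woman"]
answer = max((w for w in words if w not in {"king", "man", "woman"}),
             key=lambda w: cosine(target, embeddings[w]))
```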
Resources & Datasets
Word similarity evaluation tool and datasets
Word vectors pretrained on 100B words. More information on the word2vec homepage.
Vectors trained using 3 different methods (counting, word2vec and dependency relations) on the BNC
GloVe model and pre-trained vectors
Retrofitting word vectors to semantic lexicons
Tool for converting word2vec vectors between binary and plain-text formats.
t-SNE, a tool for visualising word embeddings in 2D.
Background Reading
Mikolov et al. (2013). Efficient Estimation of Word Representations in Vector Space
Mikolov et al. (2013). Linguistic Regularities in Continuous Space Word Representations
Levy et al. (2015). Improving Distributional Similarity with Lessons Learned from Word Embeddings
Readings
Socher et al. (2012). Semantic Compositionality through Recursive Matrix-Vector Spaces (Slides)
Levy & Goldberg (2014, CoNLL best paper). Linguistic Regularities in Sparse and Explicit Word Representations (Slides)
Hermann and Blunsom (2014, ACL). Multilingual Models for Compositional Distributed Semantics (Slides)
Faruqui et al. (2015, NAACL best paper). Retrofitting Word Vectors to Semantic Lexicons
Norouzi et al. (2014, ICLR). Zero-Shot Learning by Convex Combination of Semantic Embeddings (Slides)
Applications of Neural Networks
Description
In recent years, deep learning approaches, or neural networks, have proven very effective on a variety of Natural Language Processing tasks. Neural networks are powerful models that require little feature engineering. This topic will investigate applications of neural networks, potentially including parsing, supertagging, machine translation, sentiment analysis, and a glimpse at computer vision. There will be a special focus on Recursive Neural Networks (RNNs), which are appropriate for many of the tasks listed here, but other neural network architectures will be touched on as well, including simple feed-forward and recurrent networks, and encoder-decoder networks. By the end of this topic students will have an understanding of these neural network architectures. Students will learn how to train them using back-propagation (through time) and how to avoid overfitting using dropout. In the end, students should be able to implement their own version of an RNN for sequence labelling or other tasks.
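As a rough illustration, the sketch below runs the forward pass of a simple recurrent network for sequence labelling, with (inverted) dropout applied to the hidden state at each step. The dimensions, weights, and inputs are invented, and training by back-propagation through time is omitted; in practice one would use a framework with automatic differentiation.

```python
import numpy as np

# Forward pass of an Elman-style recurrent network for sequence labelling:
# the hidden state is updated from the current input and the previous state,
# and a label distribution is predicted at each time step.

rng = np.random.RandomState(3)
emb_dim, hid_dim, n_labels, seq_len = 50, 32, 5, 8

W_xh = rng.randn(hid_dim, emb_dim) * 0.1   # input-to-hidden weights
W_hh = rng.randn(hid_dim, hid_dim) * 0.1   # hidden-to-hidden (recurrent) weights
W_hy = rng.randn(n_labels, hid_dim) * 0.1  # hidden-to-output weights

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

inputs = rng.randn(seq_len, emb_dim)       # stand-in word embeddings
h = np.zeros(hid_dim)
predictions = []

for x in inputs:
    h = np.tanh(W_xh @ x + W_hh @ h)               # recurrent state update
    mask = (rng.rand(hid_dim) < 0.5) / 0.5         # dropout mask (keep prob 0.5)
    predictions.append(softmax(W_hy @ (h * mask)))

predicted_labels = [int(np.argmax(p)) for p in predictions]
```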
Resources
Background Reading:
Yoav Goldberg. 2015. A Primer on Neural Network Models for Natural Language Processing.
Mike Lewis and Mark Steedman. 2014. Improved CCG Parsing with Semi-supervised Supertagging. TACL.
Readings:
Active Learning
Description
Active Learning is a subfield of machine learning in which the system interacts with the user or a database to actively query for annotations of the instances it deems most informative to learn from. An algorithm that is able to select the most informative training examples should reach higher accuracy faster and require less manually annotated training data. Thus, active learning can help speed up the learning process and reduce the cost of obtaining human input by keeping the annotation effort to a minimum. Active learning is typically contrasted with passive learning, where the learner chooses training instances at random.
During these sessions, we will cover different strategies for identifying the most informative training instances, including Uncertainty Sampling, Query-By-Committee, Expected Model Change, and Expected Error Reduction. We will also discuss stopping criteria, i.e. how to decide when to terminate the learning process. Finally, we will review a number of different applications of active learning to NLP.
This topic assumes the audience has a working knowledge of supervised learning and statistical methods.
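To make the idea concrete, the sketch below implements pool-based active learning with uncertainty sampling on synthetic data, using a scikit-learn logistic regression classifier as the learner; the data, seed-set size, and number of query rounds are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Pool-based uncertainty sampling: each round, retrain on the labelled set
# and query the unlabelled instance whose prediction is least confident.
rng = np.random.RandomState(4)
X_pool = rng.randn(500, 10)
y_pool = (X_pool[:, 0] + X_pool[:, 1] > 0).astype(int)  # synthetic gold labels

# Small seed set containing both classes
labelled = ([int(i) for i in np.where(y_pool == 0)[0][:5]]
            + [int(i) for i in np.where(y_pool == 1)[0][:5]])
unlabelled = [i for i in range(len(X_pool)) if i not in labelled]

for _ in range(20):
    clf = LogisticRegression()
    clf.fit(X_pool[labelled], y_pool[labelled])

    # Uncertainty sampling: lowest maximum class probability
    probs = clf.predict_proba(X_pool[unlabelled])
    most_uncertain = unlabelled[int(np.argmin(probs.max(axis=1)))]

    labelled.append(most_uncertain)   # "query the annotator" for its label
    unlabelled.remove(most_uncertain)
```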