Computer Laboratory

Course pages 2016–17

Advanced Topics in Natural Language Processing

Organisation and Instructions

We will run all 4 topics and ask all students taking the module to rank all topics in order of preference. Please send your rankings to Ted Briscoe by noon on Friday 6th January 2017.

Each student will attend 4 topics, and each topic will consist of 4 sessions: typically one preliminary lecture followed by 3 reading and discussion sessions. A typical topic can therefore accommodate up to 6 students presenting a paper each, while still allowing at least 10 minutes of general discussion per session. Each student will be required to write an essay, or to undertake a short project and write a project report, on ONE of their chosen topics. The topic organiser will help you formulate the project or essay and will first-mark the assessed work; the module organiser will second-mark it. The assessed work will be a maximum of 5000 words.

Learning to Rank

  • Proposers: Ted Briscoe, Ronan Cummins

    Description

    Ranking items is an important aspect of many natural language processing and information retrieval tasks. Learning to rank is a relatively new area of machine learning with broad applicability. Tasks for which supervised learning to rank methods have improved the state of the art (over regression or multiclass classification methods) include document retrieval, statistical machine translation, automated essay grading, and collaborative filtering.

    We will present a number of different ways of formulating learning to rank, including pointwise, pairwise, and listwise approaches, and how they differ from unsupervised methods. Students will learn the fundamentals of learning to rank and be able to identify problems where it can be applied. The first session will be a lecture describing the different approaches to ranking. The next three sessions will consist of presentations of the readings by students and discussion.
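
    To make the pairwise formulation concrete, the sketch below reduces ranking to binary classification over item pairs: for two candidates from the same query with different relevance labels, a linear scorer is trained so that the preferred candidate receives the higher score. This is a minimal NumPy sketch on synthetic data rather than any particular published system; the feature dimensions, relevance grades, and learning rate are arbitrary placeholders.

        # Minimal pairwise learning-to-rank sketch (NumPy only, synthetic data).
        # For two items i, j from the same query with y_i > y_j, we train a linear
        # scorer w so that w.(x_i - x_j) > 0, i.e. the preferred item scores higher.
        import numpy as np

        rng = np.random.default_rng(0)

        # Toy data: 3 queries, 4 candidate items each, 5 features, graded relevance 0-2.
        X = rng.normal(size=(3, 4, 5))
        y = rng.integers(0, 3, size=(3, 4))

        # Turn each query's items into difference vectors over pairs with unequal labels.
        pairs = []
        for q in range(X.shape[0]):
            for i in range(X.shape[1]):
                for j in range(X.shape[1]):
                    if y[q, i] > y[q, j]:
                        pairs.append(X[q, i] - X[q, j])
        P = np.array(pairs)

        # Logistic (RankNet-style) loss on the pairs, minimised by plain gradient descent.
        w = np.zeros(X.shape[2])
        for _ in range(200):
            margins = P @ w
            grad = -(P * (1.0 / (1.0 + np.exp(margins)))[:, None]).mean(axis=0)
            w -= 0.5 * grad

        # At test time, rank a query's candidates by their scores w.x.
        scores = X[0] @ w
        print("predicted order for query 0:", np.argsort(-scores))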

    Resources & Datasets

    Introductory Slides

    Movies, books, food, Scholarly paper recommendation

    Information retrieval

    Information retrieval

    Essay scoring

    Short answer scoring

    MT quality estimation

    Background Reading:

    Liu, Tie-Yan, Learning to Rank for Information Retrieval

    SOLAR: Scalable Online Learning Algorithms for Ranking

    Readings:

    Mark Hopkins and Jonathan May, Tuning as Ranking (SMT) (Slides)

    Thorsten Joachims, Optimizing Search Engines using Clickthrough Data (Slides)

    György Szarvas et al., Learning to Rank Lexical Substitutions (Slides)

    Barlacchi et al., Learning to Rank Answer Candidates for Automatic Resolution of Crossword Puzzles (Slides)

    Yannakoudakis et al., A New Dataset and Method for Automatically Grading ESOL Texts (Slides)

    Riedel et al., Constraint-Driven Rank-Based Learning for Information Extraction (Slides)


Constructing and evaluating word embeddings

  • Proposers: Marek Rei and Ekaterina Kochmar

    Description

    Representing words as low-dimensional vectors allows systems to take advantage of semantic similarities, generalise to unseen examples and improve pattern detection accuracy on nearly all NLP tasks. Advances in neural networks and representation learning have opened new and exciting ways of learning word embeddings with unique properties.

    In this topic we will provide an introduction to a range of vector space models and cover the most influential research in neural embeddings from the past couple of years, including word similarity and semantic analogy tasks, word2vec models and task-specific representation learning. We will also discuss the most recent advances in the field including multilingual embeddings, multimodal vectors using image detection, and building character-based representations.

    By the end of the course you will have learned to construct word representations using both traditional and various neural network models. You will learn about different properties of these models and how to choose an approach for a specific task. You will also get an overview of the most recent and notable advances in the field.
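
    To make the word similarity and analogy tasks concrete, the sketch below answers nearest-neighbour and analogy queries by cosine similarity over pre-trained vectors. It assumes the vectors are stored in the plain-text "word v1 v2 ... vn" format (as produced by GloVe or by word2vec's text output); the file name and the query words are placeholders, and any of the pre-trained vectors listed under Resources could be substituted once converted to that format.

        # Minimal sketch of nearest-neighbour and analogy queries over pre-trained
        # word vectors stored one word per line as "word v1 v2 ... vn" (the plain-text
        # format used by GloVe and by word2vec's text output). "vectors.txt" and the
        # query words are placeholders.
        import numpy as np

        def load_vectors(path):
            words, vecs = [], []
            with open(path, encoding="utf-8") as f:
                for line in f:
                    parts = line.rstrip().split(" ")
                    words.append(parts[0])
                    vecs.append(np.array(parts[1:], dtype=float))
            M = np.vstack(vecs)
            M /= np.linalg.norm(M, axis=1, keepdims=True)  # unit length, so dot product = cosine
            return words, M

        def nearest(words, M, query_vec, k=5):
            sims = M @ (query_vec / np.linalg.norm(query_vec))
            return [words[i] for i in np.argsort(-sims)[:k]]

        words, M = load_vectors("vectors.txt")
        idx = {w: i for i, w in enumerate(words)}

        # Nearest neighbours by cosine similarity.
        print(nearest(words, M, M[idx["king"]]))

        # Analogy in the style of Mikolov et al.: king - man + woman should be close to queen.
        print(nearest(words, M, M[idx["king"]] - M[idx["man"]] + M[idx["woman"]]))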

    Resources & Datasets

    Introductory slides

    Word2vec

    Word similarity evaluation tool and datasets

    Word vectors pretrained on 100B words. More information on the word2vec homepage.

    Vectors trained using 3 different methods (counting, word2vec and dependency-relations) on the BNC

    GloVe model and pre-trained vectors

    Global context vectors

    Multilingual vectors

    Retrofitting word vectors to semantic lexicons

    Tool for converting word2vec vectors between binary and plain-text formats.

    t-SNE, a tool for visualising word embeddings in 2D.

    Background Reading:

    Baroni et al. (2014). Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors

    Mikolov et al. (2013). Efficient Estimation of Word Representations in Vector Space

    Mikolov et al. (2013). Linguistic Regularities in Continuous Space Word Representations

    Levy et al. (2015). Improving Distributional Similarity with Lessons Learned from Word Embeddings

    Socher et al. (2012). Semantic Compositionality through Recursive Matrix-Vector Spaces

    Readings:

    Levy & Goldberg (2014, CoNLL best paper) Linguistic Regularities in Sparse and Explicit Word Representations (Slides)

    Faruqui et al. (2015, best paper at NAACL). Retrofitting Word Vectors to Semantic Lexicons (Slides)

    Moritz Hermann and Blunsom (2014, ACL). Multilingual Models for Compositional Distributed Semantics (Slides)

    Jozefowicz et al. (2016, arXiv preprint) Exploring the Limits of Language Modeling (Slides)

    Norouzi et al. (2014, ICLR) Zero-Shot Learning by Convex Combination of Semantic Embeddings (Slides)

    Kiela and Bottou (2014, EMNLP) Learning Image Embeddings using Convolutional Neural Networks for Improved Multi-Modal Semantics (Slides)

Applications of Neural Networks

  • Proposers: Laura Rimell and Tamara Polajnar

    Description

    In recent years, deep learning approaches, or neural networks, have proven very effective on a variety of Natural Language Processing tasks. Neural networks are powerful models and require little feature engineering. This module will investigate applications of neural networks. Depending on the interests of the students, applications investigated may include parsing, machine translation, sentiment analysis, summarization, multimodal approaches to language, and/or other linguistic tasks. Emphasis will be placed on the Recurrent Neural Network (RNN), but other architectures may be touched on, including feed-forward, encoder-decoder, and/or convolutional networks.

    At the end of this module students will have an understanding of neural network architectures, how they can be applied in NLP, and how neural networks are trained. Students choosing this topic for their project will implement a simple neural network using a state-of-the-art deep learning toolkit.
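
    As background for the readings, the sketch below spells out the forward pass of a vanilla (Elman) RNN in NumPy, the core recurrence underlying the RNN models discussed in this topic. The weight names, dimensions, and random inputs are illustrative assumptions only; a real system would be trained with backpropagation through time using one of the deep learning toolkits mentioned above.

        # Minimal forward pass of a vanilla (Elman) RNN in NumPy:
        #   h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h),  y_t = softmax(W_hy h_t + b_y)
        # Weights are random and dimensions are toy values; a real system would be
        # trained with backpropagation through time in a deep learning toolkit.
        import numpy as np

        rng = np.random.default_rng(1)
        input_dim, hidden_dim, vocab_size, seq_len = 8, 16, 10, 5

        W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
        W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
        W_hy = rng.normal(scale=0.1, size=(vocab_size, hidden_dim))
        b_h = np.zeros(hidden_dim)
        b_y = np.zeros(vocab_size)

        xs = rng.normal(size=(seq_len, input_dim))  # e.g. a sequence of word embeddings
        h = np.zeros(hidden_dim)
        for x in xs:
            h = np.tanh(W_xh @ x + W_hh @ h + b_h)  # recurrent state update
            logits = W_hy @ h + b_y
            probs = np.exp(logits - logits.max())
            probs /= probs.sum()                    # softmax over the output vocabulary

        print("output distribution after the final step:", probs.round(3))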

    Resources

    Introductory Slides

    Background Reading:

    Yoav Goldberg. 2015. A Primer on Neural Network Models for Natural Language Processing.

    Readings:

    Dzmitry Bahdanau, Kyunghyun Cho and Yoshua Bengio. 2015. Neural Machine Translation by Jointly Learning to Align and Translate. Proceedings of ICLR. (Slides)

    Lisa Anne Hendricks, Subhashini Venugopalan, Marcus Rohrbach, Raymond Mooney, Kate Saenko, Trevor Darrell. 2016. Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data. Proceedings of CVPR. (Slides)

    Joel Legrand and Ronan Collobert. 2015. Joint RNN-Based Greedy Parsing and Word Composition. Proceedings of ICLR. (Slides)

    Ramesh Nallapati, Bowen Zhou, Cicero Nogueira dos Santos, Caglar Gulcehre, Bing Xiang. 2016. Abstractive Text Summarization Using Sequence-to-Sequence RNNs and Beyond. Proceedings of CoNLL. (Slides)

    Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Chris Manning, Andrew Ng and Chris Potts. 2013. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. Proceedings of EMNLP. (Slides)

    Wenduan Xu. 2016. LSTM Shift-Reduce CCG Parsing. Proceedings of EMNLP. (Slides)


Active Learning

  • Proposer: Helen Yannakoudakis

    Description

    Active learning is a subfield of machine learning in which the system interacts with the user or a database to actively query for annotations of the instances it deems most informative to learn from. An algorithm that is able to select the most informative training examples should reach higher accuracy faster and require less manually annotated training data. Active learning can therefore speed up the learning process and reduce the cost of obtaining human input by keeping the annotation effort to a minimum. It is typically contrasted with passive learning, in which the learner selects training instances at random.

    During these sessions, we will cover different strategies that can be used to identify the most informative training instances, including Uncertainty Sampling, Query-By-Committee, Expected Model Change, and Expected Error Reduction. We will also discuss stopping criteria, i.e. how to decide when to terminate the learning process. Finally, we will review a number of applications of active learning to NLP.

    This topic assumes the audience has a working knowledge of supervised learning and statistical methods.
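
    As a concrete example of the simplest of these strategies, the sketch below runs pool-based active learning with least-confidence uncertainty sampling, using a scikit-learn logistic regression classifier on synthetic data: at each round the model queries the unlabelled instance whose most probable class has the lowest probability. The dataset, seed set size, and number of rounds are placeholders; in a real annotation setting the queried label would come from a human annotator.

        # Minimal pool-based active learning loop with least-confidence uncertainty
        # sampling, using scikit-learn on synthetic data. In a real annotation setting
        # the queried label would come from a human oracle rather than from y.
        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression

        X, y = make_classification(n_samples=500, n_features=20, random_state=0)

        # Seed set: a few labelled examples from each class; the rest form the pool.
        labelled = [int(i) for c in np.unique(y) for i in np.where(y == c)[0][:5]]
        pool = [i for i in range(len(y)) if i not in labelled]

        for _ in range(20):
            clf = LogisticRegression(max_iter=1000).fit(X[labelled], y[labelled])
            probs = clf.predict_proba(X[pool])
            # Query the instance whose most probable class has the lowest probability,
            # i.e. the one the current model is least certain about.
            query = pool[int(np.argmin(probs.max(axis=1)))]
            labelled.append(query)
            pool.remove(query)

        print("labelled set size after 20 queries:", len(labelled))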

    Resources

    Introductory Slides

    Background Reading:

    Tong, Simon, & Koller, Daphne. (2002). Support vector machine active learning with applications to text classification. The Journal of Machine Learning Research, 2, 45--66.

    Settles, Burr. (2010). Active learning literature survey. Computer Sciences Technical Report 1648, University of Wisconsin, Madison.

    Readings:

    Vlachos, Andreas. (2008). A stopping criterion for active learning. Computer Speech & Language, 22(3), 295--312. (Slides)

    Settles, B., & Craven, M. (2008). An analysis of active learning strategies for sequence labeling tasks. In Proceedings of the conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 1070--1079. (Slides)

    Horbach & Palmer (2016), Investigating Active Learning for Short-Answer Scoring (Slides)

    Qian, Buyue, Li, Hongfei, Wang, Jun, Wang, Xiang, & Davidson, Ian. (2013). Active learning to rank using pairwise supervision. In Proceedings of SDM. (Slides)

    Niraula, Nobal B., & Rus, Vasile. (2015). Judging the Quality of Automatically Generated Gap-fill Question using Active Learning. In Proceedings of the 10th Workshop on Innovative Use of NLP for Building Educational Applications, Association for Computational Linguistics, 196--206. (Slides)

    Vlachos, Ghahramani, Briscoe (2010) Active learning for constrained Dirichlet process mixture models (Slides)
