Computer Laboratory

Course pages 2016–17

Advanced Topics in Natural Language Processing

Organisation and Instructions

We will run all 4 topics and ask all students taking the module to rank all topics in order of preference. Please send your rankings to Ted Briscoe by noon on Friday 6th January 2017.

Each student will attend 4 topics, and each topic will consist of 4 sessions: typically one preliminary lecture followed by 3 reading and discussion sessions. A typical topic can therefore accommodate up to 6 students presenting a paper each, leaving at least 10 minutes of general discussion per session. Each student will be required to write an essay, or to undertake a short project and write a project report, on ONE of their chosen topics. The topic organiser will help you formulate the project or essay and will first-mark it. The module organiser will second-mark the assessed work, which must not exceed 5000 words.

Learning to Rank

  • Proposers: Ted Briscoe, Ronan Cummins


    Ranking items is an important aspect of many natural language and information retrieval tasks. Learning to rank is a relatively new field within machine learning with broad applicability. Tasks for which supervised learning to rank methods have improved the state of the art (over regression or multiclass classification methods) include document retrieval, statistical machine translation, automated essay grading, and collaborative filtering.

    We will present a number of different ways of formulating learning to rank, including pointwise, pairwise, and listwise approaches, and how they differ from unsupervised methods. Students will learn the fundamentals of learning to rank and be able to identify problems where it can be applied. The first session will be a lecture describing the different approaches to ranking. The next three sessions will consist of presentations of the readings by students and discussion.
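
    As an illustration of the pairwise formulation mentioned above, the following is a minimal sketch, not taken from the course materials. It assumes a linear scoring function trained with perceptron updates on feature-difference vectors; the documents, features, and relevance grades are invented for the example.

```python
# Toy pairwise learning to rank: learn a linear scoring function from
# preference pairs using perceptron updates. All data here is invented.

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def make_pairs(items):
    """Turn (features, relevance) items into (better, worse) feature pairs."""
    pairs = []
    for xa, ya in items:
        for xb, yb in items:
            if ya > yb:
                pairs.append((xa, xb))
    return pairs

def train_pairwise(pairs, n_features, epochs=20):
    """Perceptron on difference vectors: we want score(better) > score(worse)."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [b - c for b, c in zip(better, worse)]
            if dot(w, diff) <= 0:  # pair is mis-ordered (or tied)
                w = [wi + di for wi, di in zip(w, diff)]
    return w

def rank(w, items):
    """Sort feature vectors by descending model score."""
    return sorted(items, key=lambda x: dot(w, x), reverse=True)

# Hypothetical documents for one query: ([feature_1, feature_2], relevance)
docs = [([1.0, 0.2], 2), ([0.6, 0.9], 1), ([0.1, 0.4], 0)]
pairs = make_pairs(docs)
w = train_pairwise(pairs, n_features=2)
ranking = rank(w, [x for x, _ in docs])
```

    A pointwise approach would instead fit each relevance grade independently (for example by regression), while a listwise approach would define its loss over the whole ranked list for a query.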

    Resources & Datasets

    Introductory Slides

    Movies, books, food, scholarly paper recommendation

    Information retrieval

    Information retrieval

    Essay scoring

    Short answer scoring

    MT quality estimation

    Background Reading:

    Liu, Tie-Yan, Learning to rank for information retrieval

    SOLAR: Scalable Online Learning Algorithms for Ranking


    Mark Hopkins and Jonathan May, Tuning as Ranking (SMT) (Slides)

    Thorsten Joachims, Optimizing search engines using clickthrough data (Slides)

    György Szarvas et al, Learning to Rank Lexical Substitutions (Slides)

    Barlacchi et al, Learning to Rank Answer Candidates for Automatic Resolution of Crossword Puzzles (Slides)

    Yannakoudakis et al, A New Dataset and Method for Automatically Grading ESOL Texts (Slides)

    Riedel et al, Constraint-Driven Rank-Based Learning for Information Extraction (Slides)

    Constructing and evaluating word embeddings

  • Proposers: Marek Rei and Ekaterina Kochmar


    Representing words as low-dimensional vectors allows systems to take advantage of semantic similarities, generalise to unseen examples and improve pattern detection accuracy on nearly all NLP tasks. Advances in neural networks and representation learning have opened new and exciting ways of learning word embeddings with unique properties.

    In this topic we will provide an introduction to a range of vector space models and cover the most influential research in neural embeddings from the past couple of years, including word similarity and semantic analogy tasks, word2vec models and task-specific representation learning. We will also discuss the most recent advances in the field including multilingual embeddings, multimodal vectors using image detection, and building character-based representations.

    By the end of the course you will have learned to construct word representations using both traditional and various neural network models. You will learn about different properties of these models and how to choose an approach for a specific task. You will also get an overview of the most recent and notable advances in the field.
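
    The word similarity and analogy tasks mentioned above can be sketched in a few lines. This is an illustrative example only: the 3-dimensional vectors are invented (real embeddings typically have hundreds of dimensions), and the analogy is solved with the simple vector-offset method, which is one of several possible choices.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def analogy(vectors, a, b, c):
    """Answer 'a is to b as c is to ?' by maximising cos(d, b - a + c)."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

# Invented toy vectors, chosen by hand for the illustration
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.2, 0.9],
    "apple": [0.5, 0.5, 0.5],
}

similarity = cosine(vectors["king"], vectors["queen"])
answer = analogy(vectors, "man", "woman", "king")
```

    With pretrained vectors, the same two functions can be run over a word similarity dataset or an analogy test set to compare embedding models.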

    Resources & Datasets

    Introductory slides


    Word similarity evaluation tool and datasets

    Word vectors pretrained on 100B words. More information on the word2vec homepage.

    Vectors trained using 3 different methods (counting, word2vec and dependency relations) on the BNC

    GloVe model and pre-trained vectors

    Global context vectors

    Multilingual vectors

    Retrofitting word vectors to semantic lexicons

    Tool for converting word2vec vectors between binary and plain-text formats.

    t-SNE, a tool for visualising word embeddings in 2D.

    Background Reading

    Baroni et al. (2014). Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors

    Mikolov et al. (2013). Efficient Estimation of Word Representations in Vector Space

    Mikolov et al. (2013). Linguistic Regularities in Continuous Space Word Representations

    Levy et al. (2015). Improving Distributional Similarity with Lessons Learned from Word Embeddings

    Socher et al. (2012). Semantic Compositionality through Recursive Matrix-Vector Spaces


    Levy & Goldberg (2014, CoNLL best paper). Linguistic Regularities in Sparse and Explicit Word Representations (Slides)

    Faruqui et al. (2015, best paper at NAACL). Retrofitting Word Vectors to Semantic Lexicons (Slides)

    Moritz Hermann and Blunsom (2014, ACL). Multilingual Models for Compositional Distributed Semantics (Slides)

    Jozefowicz et al. (2016, arXiv preprint). Exploring the Limits of Language Modeling (Slides)

    Norouzi et al. (2014, ICLR). Zero-Shot Learning by Convex Combination of Semantic Embeddings (Slides)

    Kiela and Bottou (2014, EMNLP). Learning Image Embeddings using Convolutional Neural Networks for Improved Multi-Modal Semantics (Slides)

    Applications of Neural Networks

  • Proposers: Laura Rimell and Tamara Polajnar


    In recent years, deep learning approaches, or neural networks, have proven very effective on a variety of Natural Language Processing tasks. Neural networks are powerful models and require little feature engineering. This module will investigate applications of neural networks. Depending on the interests of the students, applications investigated may include parsing, machine translation, sentiment analysis, summarization, multimodal approaches to language, and/or other linguistic tasks. Emphasis will be placed on the Recurrent Neural Network (RNN), but other architectures may be touched on, including feed-forward, encoder-decoder, and/or convolutional networks.

    At the end of this module students will have an understanding of neural network architectures, how they can be applied in NLP, and how neural networks are trained. Students choosing this topic for their project will implement a simple neural network using a state-of-the-art deep learning toolkit.
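
    To make the recurrence at the heart of a simple RNN concrete, here is a minimal sketch of the forward pass only, computing h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h). It is written in plain Python rather than a deep learning toolkit, the weights are hand-picked hypothetical values, and training (backpropagation through time) is deliberately omitted.

```python
import math

def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def rnn_forward(inputs, w_xh, w_hh, b_h):
    """Run a simple (Elman) RNN over a sequence; return every hidden state."""
    h = [0.0] * len(b_h)  # initial hidden state
    states = []
    for x in inputs:
        pre = [a + r + b
               for a, r, b in zip(matvec(w_xh, x), matvec(w_hh, h), b_h)]
        h = [math.tanh(p) for p in pre]
        states.append(h)
    return states

# Hypothetical weights: 2-dimensional inputs, 2-dimensional hidden state
w_xh = [[0.5, -0.3], [0.1, 0.8]]
w_hh = [[0.2, 0.0], [0.0, 0.2]]
b_h = [0.0, 0.0]

# A toy input sequence of three vectors (e.g. three word embeddings)
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
states = rnn_forward(sequence, w_xh, w_hh, b_h)
```

    Because each hidden state depends on the previous one, feeding the same vectors in a different order generally yields a different final state, which is what lets the model capture word order.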


    Introductory Slides

    Background Reading:

    Yoav Goldberg. 2015. A Primer on Neural Network Models for Natural Language Processing.


    Dzmitry Bahdanau, Kyunghyun Cho and Yoshua Bengio. 2015. Neural Machine Translation by Jointly Learning to Align and Translate. Proceedings of ICLR. (Slides)

    Lisa Anne Hendricks, Subhashini Venugopalan, Marcus Rohrbach, Raymond Mooney, Kate Saenko, Trevor Darrell. 2016. Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data. Proceedings of CVPR. (Slides)

    Joel Legrand and Ronan Collobert. 2015. Joint RNN-Based Greedy Parsing and Word Composition. Proceedings of ICLR. (Slides)

    Ramesh Nallapati, Bowen Zhou, Cicero Nogueira dos Santos, Caglar Gulcehre, Bing Xiang. 2016. Abstractive Text Summarization Using Sequence-to-Sequence RNNs and Beyond. Proceedings of CoNLL. (Slides)

    Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Chris Manning, Andrew Ng and Chris Potts. 2013. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. Proceedings of EMNLP. (Slides)

    Wenduan Xu. 2016. LSTM Shift-Reduce CCG Parsing. Proceedings of EMNLP. (Slides)

    Active Learning

  • Proposer: Helen Yannakoudakis


    Active learning is a subfield of machine learning in which the system interacts with a user or database to actively query for annotations of the instances it deems most informative to learn from. An algorithm that is able to select the most informative training examples should reach higher accuracy faster and require less manually annotated training data. Thus, active learning can help speed up the learning process and reduce the cost of obtaining human input by keeping the annotation effort to a minimum. Active learning is typically compared to passive learning, where the learner chooses instances randomly.

    During these lectures, we will cover different strategies that can be used to identify the most informative training instances, including Uncertainty Sampling, Query-By-Committee, Expected Model Change, and Expected Error Reduction. We will also discuss stopping criteria for active learning and look into when to terminate the learning process. Finally, we will review a number of different applications of active learning to NLP.
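
    As a small illustration of the first of these strategies, the sketch below implements Uncertainty Sampling over a pool of unlabelled instances. It is a simplified example under stated assumptions: the instance names and class posteriors are invented, and `select_queries` and its parameters are hypothetical names rather than part of any library. In practice the posteriors would come from the current model, which is retrained after each batch of annotations.

```python
import math

def entropy(probs):
    """Shannon entropy of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def least_confident(probs):
    """One minus the probability of the most likely class."""
    return 1.0 - max(probs)

def select_queries(pool, predict_proba, k=1, score=entropy):
    """Return the k unlabelled instances the model is most uncertain about."""
    ranked = sorted(pool, key=lambda x: score(predict_proba(x)), reverse=True)
    return ranked[:k]

# Invented posteriors standing in for the current model's predictions
toy_posteriors = {
    "doc_a": [0.95, 0.05],
    "doc_b": [0.55, 0.45],
    "doc_c": [0.70, 0.30],
}

# Ask the annotator to label the two most uncertain documents
queries = select_queries(list(toy_posteriors), toy_posteriors.get, k=2)
```

    A passive learner would replace the sort with a random choice from the pool; comparing the two learning curves is the usual way to evaluate a query strategy.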

    This topic assumes the audience has a working knowledge of supervised learning and statistical methods.


    Introductory Slides

    Background Reading:

    Tong, Simon, & Koller, Daphne. (2002). Support vector machine active learning with applications to text classification. The Journal of Machine Learning Research, 2, 45–66.

    Settles, Burr. (2010). Active learning literature survey. Computer Sciences Technical Report 1648, University of Wisconsin, Madison.


    Vlachos, Andreas. (2008). A stopping criterion for active learning. Computer Speech & Language, 22(3), 295–312. (Slides)

    Settles, B., & Craven, M. (2008). An analysis of active learning strategies for sequence labeling tasks. In Proceedings of the conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 1070–1079. (Slides)

    Horbach & Palmer (2016), Investigating Active Learning for Short-Answer Scoring (Slides)

    Qian, Buyue, Li, Hongfei, Wang, Jun, Wang, Xiang, & Davidson, Ian. (2013). Active learning to rank using pairwise supervision. In Proceedings of SDM. (Slides)

    Niraula, Nobal B., & Rus, Vasile. (2015). Judging the Quality of Automatically Generated Gap-fill Question using Active Learning. In Proceedings of the 10th Workshop on Innovative Use of NLP for Building Educational Applications, Association for Computational Linguistics, 196–206. (Slides)

    Vlachos, Ghahramani, Briscoe (2010) Active learning for constrained Dirichlet process mixture models (Slides)
