Computer Laboratory

Course pages 2015–16

Advanced Topics in Natural Language Processing

Organisation and Instructions

We will run the N most popular topics (with a minimum of M students) and ask all students taking the module to rank all topics in order of preference. Please send your rankings to Ted Briscoe by noon on Friday 8th January 2016.

Each student will attend 4 topics, and each topic will consist of 4 sessions. A typical topic consists of one preliminary lecture followed by 3 reading and discussion sessions, so that it can accommodate 6 students each presenting one paper while allowing at least 10 minutes of general discussion per session. Each student will be required to write an essay, or undertake a short project and write a project report, on ONE of their chosen topics. The topic organiser will help you formulate the project or essay and will first-mark it. The module organisers will second-mark the assessed work, which must not exceed 5000 words.

Learning to Rank

  • Proposers: Ted Briscoe, Ronan Cummins


    Ranking items is an important aspect of many natural language processing and information retrieval tasks. Learning to rank is a relatively new field within machine learning with broad applicability. Tasks for which supervised learning-to-rank methods have improved the state of the art (over regression or multiclass classification methods) include document retrieval, statistical machine translation, automated essay grading, and collaborative filtering.

    We will present several ways of formulating learning to rank, including pointwise, pairwise, and listwise approaches, and explain how they differ from unsupervised methods. Students will learn the fundamentals of learning to rank and be able to identify problems where it can be applied. The first session will be a lecture describing the different approaches to ranking. The next three sessions will consist of presentations of the readings by students, followed by discussion.
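    To make the pairwise formulation concrete, the sketch below reduces ranking to binary classification over pairs of documents for the same query, in the spirit of Joachims' clickthrough paper in the reading list (which uses an SVM rather than the perceptron assumed here). Pointwise methods would instead regress each document's grade directly, and listwise methods define a loss over whole ranked lists. The feature vectors and relevance grades are made up for illustration.

```python
# Pairwise learning to rank: a minimal sketch with a linear perceptron.
# Assumptions: hand-made 2-feature documents and graded relevance labels;
# the perceptron stands in for RankSVM-style pairwise learners.
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Pair = Tuple[List[float], int]


def dot(w: Vector, x: Vector) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def pairwise_examples(queries: Dict[str, List[Tuple[Vector, int]]]) -> List[Pair]:
    """Turn each query's graded list into (x_i - x_j, +1/-1) preference pairs.

    Pairs are formed only within a query; equally relevant documents are skipped.
    """
    pairs: List[Pair] = []
    for docs in queries.values():
        for (xi, yi), (xj, yj) in combinations(docs, 2):
            if yi == yj:
                continue
            diff = [a - b for a, b in zip(xi, xj)]
            pairs.append((diff, 1 if yi > yj else -1))
    return pairs


def train_pairwise_perceptron(pairs: List[Pair], dim: int, epochs: int = 20) -> List[float]:
    """Learn w such that sign(w . (x_i - x_j)) agrees with each preference."""
    w = [0.0] * dim
    for _ in range(epochs):
        for diff, label in pairs:
            if label * dot(w, diff) <= 0:  # pair misordered or tied: update
                w = [wk + label * dk for wk, dk in zip(w, diff)]
    return w


def rank(w: Vector, docs: List[Vector]) -> List[int]:
    """Indices of docs sorted by descending score w . x."""
    return sorted(range(len(docs)), key=lambda i: dot(w, docs[i]), reverse=True)


# Toy training data: (feature vector, relevance grade) per document, grouped by query.
queries = {
    "q1": [([0.9, 0.1], 2), ([0.5, 0.4], 1), ([0.1, 0.9], 0)],
    "q2": [([0.8, 0.3], 2), ([0.2, 0.2], 0), ([0.6, 0.7], 1)],
}
pairs = pairwise_examples(queries)
w = train_pairwise_perceptron(pairs, dim=2)
print(rank(w, [doc for doc, _ in queries["q2"]]))  # → [0, 2, 1]
```

    Note that the learned weights are used pointwise at test time: each document gets a single score, and only training looks at pairs.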

    Resources & Datasets

    Introductory Slides

    Movies, books, food, scholarly paper recommendation

    Information retrieval

    Information retrieval

    Essay scoring

    Short answer scoring

    MT quality estimation

    Background Reading:

    Liu, Tie-Yan, Learning to rank for information retrieval, Springer, 2011

    Yannakoudakis et al., A New Dataset and Method for Automatically Grading ESOL Texts, ACL 2011

    SOLAR: Scalable Online Learning Algorithms for Ranking


    Mark Hopkins and Jonathan May, Tuning as Ranking (SMT) (Slides)

    Thorsten Joachims, Optimizing search engines using clickthrough data (Slides)

    György Szarvas et al., Learning to Rank Lexical Substitutions (Slides)

    Xia et al., Listwise Approach to Learning to Rank: Theory and Algorithm (Slides)

    Zhai, Chengxiang, and John Lafferty, A study of smoothing methods for language models applied to information retrieval, TOIS

    Riedel et al., Constraint-Driven Rank-Based Learning for Information Extraction (Slides)


    Integrating Distributional and Compositional Semantics

  • Proposer: Stephen Clark


    A combination of compositional and distributional representations has many potential advantages for computational semantics. From the distributional side: robustness, learnability from data, ease of handling ambiguity, and the ability to represent gradations of meaning. From the compositional side: the ability to handle the unbounded nature of natural language, and established accounts of semantic phenomena such as logical words, quantification and inference. Developing such a combination poses many challenges.

    There are essentially three approaches in the current literature to combining distributional word representations into distributional representations of phrases and sentences. The first, and simplest, is to combine vectors using a pointwise operator such as addition or pointwise multiplication. This has the immediate disadvantage of being insensitive to word order, since both operators are commutative; nevertheless, these operators provide competitive baselines on a number of standard similarity tasks. The second approach is to use a recursive neural network (RNN), which combines input vectors using a matrix and a non-linearity. The third approach is to treat the distributional meanings of some words as (multilinear) functions, i.e. tensors, and to combine them using tensor contraction.

    Here we will focus on the first and third approaches, with some representative papers listed below.
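    The contrast between the first and third approaches can be seen in a toy example: pointwise addition gives "dog bites man" and "man bites dog" the same vector, whereas treating the verb as an order-3 tensor contracted with subject and object vectors does not. The two-dimensional vectors and all tensor entries below are made-up assumptions; in the readings (e.g. Baroni and Zamparelli's adjective matrices) such tensors are learned from data.

```python
# Toy illustration of pointwise vs tensor-based composition.
# All vectors and the verb tensor are hand-made assumptions (2-dimensional
# so the arithmetic is easy to follow); real ones are learned from corpora.
from typing import Callable, Dict, List

Vector = List[float]
Matrix = List[List[float]]
Tensor3 = List[Matrix]


def add(u: Vector, v: Vector) -> Vector:
    """Pointwise addition: commutative, so word order is lost."""
    return [a + b for a, b in zip(u, v)]


def mult(u: Vector, v: Vector) -> Vector:
    """Pointwise multiplication: also commutative."""
    return [a * b for a, b in zip(u, v)]


def compose(words: List[str], lexicon: Dict[str, Vector],
            op: Callable[[Vector, Vector], Vector]) -> Vector:
    """Fold a pointwise operator over the words' vectors, left to right."""
    result = lexicon[words[0]]
    for word in words[1:]:
        result = op(result, lexicon[word])
    return result


def apply_matrix(m: Matrix, v: Vector) -> Vector:
    """Adjective as matrix: (Mv)_i = sum_j M_ij v_j."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def apply_verb(t: Tensor3, subj: Vector, obj: Vector) -> Vector:
    """Transitive verb as order-3 tensor: s_i = sum_j sum_k T_ijk subj_j obj_k."""
    return [
        sum(t_i[j][k] * subj[j] * obj[k]
            for j in range(len(subj)) for k in range(len(obj)))
        for t_i in t
    ]


lexicon = {"dog": [1.0, 0.0], "man": [0.0, 1.0], "bites": [0.5, 0.5]}
s1 = compose("dog bites man".split(), lexicon, add)
s2 = compose("man bites dog".split(), lexicon, add)
print(s1 == s2)  # → True: addition cannot tell the two sentences apart

# Hypothetical tensor for "bites" (shape 2 x 2 x 2)
bites = [[[0.0, 1.0], [0.0, 0.0]],
         [[0.0, 0.0], [1.0, 0.0]]]
print(apply_verb(bites, lexicon["dog"], lexicon["man"]))  # → [1.0, 0.0]
print(apply_verb(bites, lexicon["man"], lexicon["dog"]))  # → [0.0, 1.0]
```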


    Introductory Slides

    Background Reading:

    Turney, P.D., and Pantel, P. (2010), From frequency to meaning: Vector space models of semantics, Journal of Artificial Intelligence Research (JAIR), 37, 141-188

    Vector Space Models of Lexical Meaning. Stephen Clark. Handbook of Contemporary Semantics, second edition, edited by Shalom Lappin and Chris Fox. Chapter 16. Wiley-Blackwell, 2015

    Combining Symbolic and Distributional Models of Meaning. Stephen Clark and Stephen Pulman. Proceedings of the AAAI Spring Symposium on Quantum Interaction, pp.52-55, Stanford, CA, 2007


    Jeff Mitchell and Mirella Lapata. 2008. Vector-based Models of Semantic Composition. In Proceedings of ACL-08: HLT, 236--244. Columbus, Ohio

    Low-Rank Tensors for Verbs in Compositional Distributional Semantics. Daniel Fried, Tamara Polajnar and Stephen Clark. Proceedings of the Short Papers of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL 2015), Beijing, China, 2015

    Prior Disambiguation of Word Tensors for Constructing Sentence Vectors. Dimitri Kartsaklis and Mehrnoosh Sadrzadeh. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP). Seattle, USA. October, 2013

    M. Baroni and R. Zamparelli. 2010. Nouns are vectors, adjectives are matrices: Representing adjective-noun constructions in semantic space. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2010), 1183-1193

    Mathematical Foundations for a Compositional Distributional Model of Meaning. Bob Coecke, Mehrnoosh Sadrzadeh, and Stephen Clark. Linguistic Analysis, 36(1-4): A Festschrift for Joachim Lambek, pp. 345-384, van Benthem and Moortgat (eds), 2011 [primarily for those familiar with category theory]


    Computational Creativity

  • Proposers: Stephen Clark, Mark Granroth-Wilding


    Computational Creativity (CC) is a young subfield of AI that investigates computational models of human creative processes, both as concrete cognitive models of human creativity and as practical tools. In this respect, it has much in common with Computational Linguistics, and NLP models and systems have a crucial role to play in building creative systems. Thinking of creativity in the context of computational systems raises many philosophical questions; for example, what does "creativity" mean for an autonomous system? Putting these questions aside, however, we can address a wide variety of interesting theoretical and practical questions by building on existing technologies (in NLP and other AI fields) to create systems that tackle creative tasks or play some role in creative processes.

    There is a particularly close connection to the field of computational semantics. Recent advances in distributional semantics, for instance, provide powerful techniques to represent and manipulate concepts in potentially creative ways. It is an open question to what extent the same types of semantic representations that have proved useful in, for example, language modelling and question answering can be used to perform the reasoning required to produce meaningful and valuable creative ideas.

    This topic will cover a general introduction to CC and focus specifically on areas of research that are related to NLP. Subjects will include metaphor analysis, idea generation, narrative generation and creative natural language generation.


    Background Reading:

    Ekaterina Shutova, Simone Teufel and Anna Korhonen (2012), Statistical Metaphor Processing. Computational Linguistics, 39(2)

    Tony Veale, Yanfen Hao (2013), Talking Points in Linguistic Creativity. Creativity and the Agile Mind: A Multi-Disciplinary Study of a Multi-Faceted Phenomenon. Walter de Gruyter


    Tony Veale (2014), A Service-Oriented Architecture for Metaphor Processing. ACL

    S. Colton and G. A. Wiggins (2012), Computational Creativity: The Final Frontier. In proc. 20th European Conference on Artificial Intelligence

    Carlos León, Pablo Gervás (2014), Creativity in Story Generation From the Ground Up: Non-deterministic Simulation driven by Narrative. In proc. 5th International Conference on Computational Creativity, ICCC 2014

    Carlos León, Pablo Gervás (2014), Reading and Writing as a Creative Cycle: The Need for a Computational Model. In proc. 5th International Conference on Computational Creativity, ICCC 2014

    M. T. Llano, R. Hepworth, S. Colton, J. Gow, J. Charnley, N. Lavrac, M. Znidarsic, M. Perovsek, M. Granroth-Wilding and S. Clark (2014), Baseline Methods for Automated Fictional Ideation. In proc. Fifth International Conference on Computational Creativity


    Kernels and Kernel Methods

  • Proposer: Tamara Polajnar


    Kernels are an integral component of several machine learning approaches, including Support Vector Machines and Gaussian Processes. A kernel is a similarity function that corresponds to an inner product in some (possibly implicit) feature space; evaluated on a dataset, it yields a symmetric, positive semi-definite kernel (Gram) matrix. Kernels can be designed and derived for different applications, so they offer a flexible way of integrating data of various types into a classification or regression algorithm. This module will provide an introduction to kernels and the mathematical rules for kernel construction, as well as an overview of some of the most popular kernel-based machine learning methods.
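    The construction rules can be checked numerically. The sketch below builds a composite kernel using standard closure rules (sums, products and non-negative scalings of valid kernels are valid kernels), computes its Gram matrix on four toy points, and tests symmetry and positive semi-definiteness. The data, the gamma value and the Cholesky-with-jitter PSD test are illustrative assumptions; a numerical check on one dataset is evidence, not a proof, that a function is a valid kernel.

```python
# Kernel construction rules, checked numerically on a toy dataset.
# Assumptions: pure-Python lists instead of a linear-algebra library, and a
# Cholesky factorisation with a small jitter as a numerical PSD test.
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def linear(x: Vector, y: Vector) -> float:
    """k(x, y) = <x, y>."""
    return sum(a * b for a, b in zip(x, y))


def rbf(gamma: float) -> Kernel:
    """Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    def k(x: Vector, y: Vector) -> float:
        return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
    return k


# Closure rules: sums, products and non-negative scalings of kernels are kernels.
def k_sum(k1: Kernel, k2: Kernel) -> Kernel:
    return lambda x, y: k1(x, y) + k2(x, y)


def k_product(k1: Kernel, k2: Kernel) -> Kernel:
    return lambda x, y: k1(x, y) * k2(x, y)


def k_scale(c: float, k: Kernel) -> Kernel:
    if c < 0:
        raise ValueError("scaling factor must be non-negative")
    return lambda x, y: c * k(x, y)


def gram(k: Kernel, xs: List[Vector]) -> List[List[float]]:
    """Kernel (Gram) matrix K_ij = k(x_i, x_j)."""
    return [[k(x, y) for y in xs] for x in xs]


def is_symmetric(m: List[List[float]], tol: float = 1e-12) -> bool:
    n = len(m)
    return all(abs(m[i][j] - m[j][i]) <= tol for i in range(n) for j in range(n))


def is_psd(m: List[List[float]], jitter: float = 1e-9) -> bool:
    """Attempt a Cholesky factorisation of M + jitter * I; failure means M is not PSD."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][p] * low[j][p] for p in range(j))
            if i == j:
                s += jitter
                if s <= 0.0:
                    return False
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return True


xs = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# (x . y)^2 + 0.5 * RBF, built only from the rules above
combined = k_sum(k_product(linear, linear), k_scale(0.5, rbf(gamma=1.0)))
K = gram(combined, xs)
print(is_symmetric(K), is_psd(K))  # → True True
print(is_psd([[0.0, 1.0], [1.0, 0.0]]))  # → False (eigenvalues +1 and -1)
```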


    Background Reading:

    Carl Edward Rasmussen and Christopher K. I. Williams. 2006. Gaussian Processes for Machine Learning. MIT Press.

    Mehmet Gönen and Ethem Alpaydın. 2011. Multiple Kernel Learning Algorithms. J. Mach. Learn. Res. 12 (July 2011), 2211-2268

    T. Joachims. Making Large-Scale SVM Learning Practical. In: Advances in Kernel Methods - Support Vector Learning, MIT Press, 1999.


    Heiko Hoffmann. 2007. Kernel PCA for novelty detection, Pattern Recognition, Volume 40, Issue 3, March 2007, Pages 863-874

    Hancheol Park, Gahgene Gweon, Ho-Jin Choi, Jeong Heo and Pum-Mo Ryu. 2014. Sentential Paraphrase Generation for Agglutinative Languages Using SVM with a String Kernel. The 28th Pacific Asia Conference on Language, Information and Computing

    Zanzotto, F. M. & Dell'Arciprete, L. 2012. Distributed Tree Kernels, Proceedings of the 29th International Conference on Machine Learning (ICML-12)

    Daniel Preoţiuc-Pietro and Trevor Cohn. 2013. A temporal model of text periodicities using Gaussian Processes. EMNLP 2013

    Diarmuid Ó Séaghdha and Ann Copestake. 2013. Interpreting compound nouns with kernel methods. Natural Language Engineering 19 (3): 331--356

    Daniel Beck, Trevor Cohn, Christian Hardmeier, Lucia Specia. 2015. Learning Structural Kernels for Natural Language Processing. EMNLP 2015.


    Constructing and evaluating word embeddings

  • Proposers: Marek Rei and Ekaterina Kochmar


    Representing words as low-dimensional vectors allows systems to take advantage of semantic similarities, generalise to unseen examples and improve accuracy on a wide range of NLP tasks. Advances in neural networks and representation learning have opened up new ways of learning word embeddings with distinctive properties.

    In this topic we will provide an introduction to classical vector space models and cover the most influential research in neural embeddings from the past few years, including word similarity and semantic analogy tasks, word2vec models and task-specific representation learning. We will also discuss the most recent advances in the field, including multilingual embeddings and multimodal vectors using image detection.
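    Two of the evaluation tasks mentioned above can be sketched in a few lines. Word similarity is usually scored by the Spearman correlation between model cosine similarities and human ratings, and analogies such as "man is to king as woman is to ?" are commonly answered with the vector-offset method of Mikolov et al. (2013), often called 3CosAdd (Levy & Goldberg (2014) discuss alternatives). The 4-dimensional vectors and the human ratings below are invented for illustration.

```python
# Word similarity and analogy with toy embeddings.
# Assumptions: the 4-d vectors (dimensions loosely "royalty, male, female,
# fruit") and the human similarity ratings are invented for illustration.
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def analogy(a: str, b: str, c: str, vectors: Dict[str, Vector]) -> str:
    """'a is to b as c is to ?': argmax_d cos(d, b - a + c), excluding a, b, c."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for p in range(i, j + 1):
            ranks[order[p]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(sxx * syy)


vectors = {
    "king":  [0.9, 0.8, 0.1, 0.0],
    "queen": [0.9, 0.1, 0.8, 0.0],
    "man":   [0.1, 0.8, 0.1, 0.0],
    "woman": [0.1, 0.1, 0.8, 0.0],
    "apple": [0.0, 0.0, 0.0, 1.0],
}
print(analogy("man", "king", "woman", vectors))  # → queen

# Intrinsic evaluation: correlate model cosines with (made-up) human ratings.
pairs = [("king", "queen", 8.5), ("man", "woman", 8.0), ("king", "apple", 1.0)]
model = [cosine(vectors[w1], vectors[w2]) for w1, w2, _ in pairs]
human = [score for _, _, score in pairs]
print(spearman(model, human))  # → 1.0
```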

    By the end of the course you will have learned to construct word representations using both traditional and various neural network models. You will learn about different properties of these models and how to choose an approach for a specific task. You will also get an overview of the most recent and notable advances in the field.
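    As a baseline for the neural models, a traditional count-based representation can be built from word-context co-occurrence counts reweighted with positive pointwise mutual information (PPMI), one of the count-based methods compared in Levy et al. (2015). The four-sentence corpus and the window size are assumptions chosen so the result can be checked by hand; in practice the counts come from a large corpus such as the BNC and are often reduced in dimensionality.

```python
# Count-based ("traditional") word vectors: window co-occurrence counts
# reweighted with positive pointwise mutual information (PPMI).
# Assumptions: a four-sentence toy corpus and a symmetric window of 2.
import math
from collections import Counter
from typing import Dict, List

SparseVector = Dict[str, float]


def cooccurrence_counts(sentences: List[List[str]], window: int = 2) -> Dict[str, Counter]:
    """counts[w][c] = number of times c occurs within `window` tokens of w."""
    counts: Dict[str, Counter] = {}
    for sent in sentences:
        for i, w in enumerate(sent):
            ctx = counts.setdefault(w, Counter())
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    ctx[sent[j]] += 1
    return counts


def ppmi(counts: Dict[str, Counter]) -> Dict[str, SparseVector]:
    """PPMI(w, c) = max(0, log(P(w, c) / (P(w) P(c)))); only positive entries are kept."""
    total = sum(sum(c.values()) for c in counts.values())
    word_totals = {w: sum(c.values()) for w, c in counts.items()}
    ctx_totals: Counter = Counter()
    for c in counts.values():
        ctx_totals.update(c)
    vectors: Dict[str, SparseVector] = {}
    for w, c in counts.items():
        vectors[w] = {}
        for ctx, n in c.items():
            pmi = math.log(n * total / (word_totals[w] * ctx_totals[ctx]))
            if pmi > 0:
                vectors[w][ctx] = pmi
    return vectors


def sparse_cosine(u: SparseVector, v: SparseVector) -> float:
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


corpus = [s.split() for s in [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the cat",
]]
vectors = ppmi(cooccurrence_counts(corpus))
print(sparse_cosine(vectors["cat"], vectors["dog"]))   # positive: shared context "the"
print(sparse_cosine(vectors["cat"], vectors["milk"]))  # → 0.0: no shared positive contexts
```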

    Resources & Datasets

    Introductory slides

    Last lecture slides


    Word similarity evaluation tool and datasets

    Word vectors pretrained on 100B words. More information on the word2vec homepage.

    Vectors trained using 3 different methods (counting, word2vec and dependency relations) on the BNC

    GloVe model and pre-trained vectors

    Global context vectors

    Multilingual vectors

    Retrofitting word vectors to semantic lexicons

    Tool for converting word2vec vectors between binary and plain-text formats.

    t-SNE, a tool for visualising word embeddings in 2D.

    Background Reading

    Baroni et al. (2014). Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors

    Mikolov et al. (2013). Efficient Estimation of Word Representations in Vector Space

    Mikolov et al. (2013). Linguistic Regularities in Continuous Space Word Representations

    Levy et al. (2015). Improving Distributional Similarity with Lessons Learned from Word Embeddings


    Socher et al. (2012). Semantic Compositionality through Recursive Matrix-Vector Spaces (Slides)

    Levy & Goldberg (2014, CoNLL best paper). Linguistic Regularities in Sparse and Explicit Word Representations (Slides)

    Hermann and Blunsom (2014, ACL). Multilingual Models for Compositional Distributed Semantics (Slides)

    Faruqui et al. (2015, best paper at NAACL). Retrofitting Word Vectors to Semantic Lexicons

    Norouzi et al. (2014, ICLR). Zero-Shot Learning by Convex Combination of Semantic Embeddings (Slides)

    Applications of Neural Networks

  • Proposer: Laura Rimell


    In recent years, deep learning approaches (neural networks) have proven very effective on a variety of natural language processing tasks. Neural networks are powerful models that require relatively little manual feature engineering. This module will investigate applications of neural networks, potentially including parsing, supertagging, machine translation, sentiment analysis, and a glimpse at computer vision. There will be a special focus on Recursive Neural Networks (RNN), which are appropriate for many of the tasks listed here, but other neural network architectures will be touched on as well, including simple feed-forward/recurrent networks and encoder-decoder networks. By the end of this module, students will understand these architectures, know how to train them using back-propagation (through time), and know how to reduce overfitting using dropout. They should also be able to implement their own version of an RNN for sequence labelling or other tasks.
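    The recurrent architecture and dropout regularisation mentioned above can be sketched as a forward pass for sequence labelling: at each position the hidden state combines the current input with the previous hidden state, and a softmax layer outputs a distribution over tags. This is a recurrent (sequential) network; a recursive network instead applies a shared composition function over a tree. The layer sizes, the random initialisation and the choice to apply dropout only to the hidden-to-output connection are assumptions of this sketch, and training with back-propagation through time is omitted.

```python
# Forward pass of a simple (Elman-style) recurrent network for sequence
# labelling, with inverted dropout on the hidden-to-output connection.
# Assumptions: random untrained weights, one-hot inputs, no training loop
# (training would compute gradients with back-propagation through time).
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(z: Vector) -> Vector:
    top = max(z)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in z]
    total = sum(exps)
    return [e / total for e in exps]


def dropout(h: Vector, p: float, rng: random.Random, training: bool) -> Vector:
    """Inverted dropout: zero each unit with probability p, scale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return h
    return [0.0 if rng.random() < p else x / (1.0 - p) for x in h]


class ElmanTagger:
    def __init__(self, n_in: int, n_hidden: int, n_tags: int, seed: int = 0):
        self.rng = random.Random(seed)

        def init(rows: int, cols: int) -> Matrix:
            return [[self.rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.W = init(n_hidden, n_in)      # input -> hidden
        self.U = init(n_hidden, n_hidden)  # hidden(t-1) -> hidden(t)
        self.V = init(n_tags, n_hidden)    # hidden -> tag scores
        self.b = [0.0] * n_hidden

    def forward(self, xs: List[Vector], p_drop: float = 0.5,
                training: bool = False) -> List[Vector]:
        """Return one tag distribution per input position."""
        h = [0.0] * len(self.b)
        outputs = []
        for x in xs:
            # h_t = tanh(W x_t + U h_{t-1} + b)
            pre = [a + c + d for a, c, d in zip(matvec(self.W, x), matvec(self.U, h), self.b)]
            h = [math.tanh(v) for v in pre]
            # y_t = softmax(V dropout(h_t)); the recurrent path itself is not dropped
            outputs.append(softmax(matvec(self.V, dropout(h, p_drop, self.rng, training))))
        return outputs


# A 4-token sentence over a 5-word vocabulary (one-hot), tagged with 3 tags.
sentence = [[1.0 if i == j else 0.0 for i in range(5)] for j in [0, 3, 1, 4]]
tagger = ElmanTagger(n_in=5, n_hidden=8, n_tags=3)
for dist in tagger.forward(sentence):
    print([round(p, 3) for p in dist], "-> tag", dist.index(max(dist)))
```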


    Introductory Slides

    Background Reading:

    Yoav Goldberg. 2015. A Primer on Neural Network Models for Natural Language Processing.

    Tomas Mikolov and Geoffrey Zweig. 2012. Context dependent recurrent neural network language model. Proceedings of IEEE Spoken Language Technology Workshop.

    Mike Lewis and Mark Steedman. 2014. Improved CCG Parsing with Semi-supervised Supertagging. TACL.

    Karl Moritz Hermann and Phil Blunsom. 2014. Multilingual Distributed Representations without Word Alignment. Proceedings of ICLR.


    Dzmitry Bahdanau, Kyunghyun Cho and Yoshua Bengio. 2015. Neural Machine Translation by Jointly Learning to Align and Translate. Proceedings of ICLR.

    Joel Legrand and Ronan Collobert. 2015. Joint RNN-Based Greedy Parsing and Word Composition. Proceedings of ICLR.

    Richard Socher, Cliff Lin, Andrew Y. Ng, and Christopher D. Manning. 2011. Parsing Natural Scenes and Natural Language with Recursive Neural Networks. Proceedings of ICML.

    Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Chris Manning, Andrew Ng and Chris Potts. 2013. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. Proceedings of EMNLP.

    Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel and Yoshua Bengio. 2015. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. Proceedings of ICML.

    Wenduan Xu, Michael Auli and Stephen Clark. 2015. CCG Supertagging with a Recurrent Neural Network. Proceedings of ACL Short.


    Active Learning

  • Proposer: Helen Yannakoudakis


    Active learning is a subfield of machine learning in which the system interacts with a user or a database, actively querying for annotations of the instances it deems most informative to learn from. An algorithm that can select the most informative training examples should reach a given accuracy faster, with less manually annotated training data. Active learning can thus speed up the learning process and reduce the cost of obtaining human input by keeping the annotation effort to a minimum. It is typically compared with passive learning, in which the instances to be annotated are chosen at random.
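    The saving over passive learning can be illustrated with a standard example from the active learning theory literature: a one-dimensional threshold classifier. Querying the middle of the region where consistent thresholds still disagree is a binary search that needs roughly log2(N) labels, whereas labelling randomly chosen points typically needs a large fraction of the pool before the boundary is pinned down. The pool, the oracle and the true threshold are hypothetical, and the active rule here is a version-space halving strategy rather than a general-purpose query method.

```python
# Pool-based active vs passive learning for a 1-D threshold classifier
# ("label 1 iff x >= t"). Assumptions: a noise-free hypothetical oracle,
# an evenly spaced pool of 200 points and an unknown true threshold of 0.37.
import random
from typing import Callable, List, Tuple

Oracle = Callable[[float], int]


def active_threshold(pool: List[float], oracle: Oracle) -> Tuple[float, int]:
    """Query the middle of the region where consistent thresholds still disagree.

    lo is the largest point labelled 0 and hi the smallest point labelled 1;
    only pool points strictly between them are uncertain. Returns (hi, #labels).
    """
    xs = sorted(pool)
    lo, hi = float("-inf"), float("inf")
    queries = 0
    while True:
        uncertain = [x for x in xs if lo < x < hi]
        if not uncertain:
            return hi, queries
        x = uncertain[len(uncertain) // 2]  # halves the uncertain region
        queries += 1
        if oracle(x) == 1:
            hi = x
        else:
            lo = x


def passive_threshold(pool: List[float], oracle: Oracle,
                      rng: random.Random) -> Tuple[float, int]:
    """Label points in random order until no pool point is uncertain."""
    order = list(pool)
    rng.shuffle(order)
    lo, hi = float("-inf"), float("inf")
    queries = 0
    for x in order:
        if not any(lo < p < hi for p in pool):
            break
        queries += 1
        if oracle(x) == 1:
            hi = min(hi, x)
        else:
            lo = max(lo, x)
    return hi, queries


def oracle(x: float) -> int:
    """Hypothetical annotator: the true threshold is 0.37."""
    return int(x >= 0.37)


pool = [i / 200 for i in range(200)]
t_active, n_active = active_threshold(pool, oracle)
t_passive, n_passive = passive_threshold(pool, oracle, random.Random(0))
print(f"active: {n_active} labels, passive: {n_passive} labels")
```

    Both learners end with a threshold that classifies every pool point correctly; they differ only in how many labels they request.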

    During these lectures, we will cover different strategies that can be used to identify the most informative training instances, including Uncertainty Sampling, Query-By-Committee, Expected Model Change, and Expected Error Reduction. We will also discuss stopping criteria for active learning and look into when to terminate the learning process. Finally, we will review a number of different applications of active learning to NLP.
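    Uncertainty sampling and query-by-committee both reduce to scoring each unlabelled instance and querying the highest-scoring ones. The sketch below shows three common uncertainty measures (least confidence, margin and entropy, as categorised in Settles' survey) and vote entropy for query-by-committee. Expected Model Change and Expected Error Reduction need a retrainable model and are not shown. The predicted probabilities and committee votes are made up.

```python
# Informativeness scores used by common query strategies (higher = query first).
# Assumptions: a classifier that outputs class probabilities (uncertainty
# sampling) and a committee of classifiers that each output a label (QBC).
import math
from collections import Counter
from typing import List, Sequence


def least_confidence(probs: Sequence[float]) -> float:
    """Uncertainty sampling: 1 - P(most likely label)."""
    return 1.0 - max(probs)


def margin_score(probs: Sequence[float]) -> float:
    """Uncertainty sampling: negated gap between the two most likely labels (needs >= 2 labels)."""
    first, second = sorted(probs, reverse=True)[:2]
    return -(first - second)


def entropy_score(probs: Sequence[float]) -> float:
    """Uncertainty sampling: entropy of the predicted label distribution."""
    return sum(p * math.log(1.0 / p) for p in probs if p > 0)


def vote_entropy(votes: Sequence[str]) -> float:
    """Query-by-committee: entropy of the committee's label votes (0 = full agreement)."""
    n = len(votes)
    return sum((c / n) * math.log(n / c) for c in Counter(votes).values())


def select(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring pool instances."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Predicted P(label) for three unlabelled instances (binary task).
pool_probs = [[0.9, 0.1], [0.55, 0.45], [0.7, 0.3]]
for score in (least_confidence, margin_score, entropy_score):
    print(score.__name__, select([score(p) for p in pool_probs], k=1))  # each → [1]

# Votes from a three-member committee on two instances.
print(vote_entropy(["POS", "POS", "POS"]), vote_entropy(["POS", "NEG", "POS"]))
```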

    This topic assumes the audience has a working knowledge of supervised learning and statistical methods.


    Background Reading:

    Tong, Simon, & Koller, Daphne. (2002). Support vector machine active learning with applications to text classification. The Journal of Machine Learning Research, 2, 45--66.

    Settles, Burr. (2010). Active learning literature survey. Computer Sciences Technical Report 1648, University of Wisconsin, Madison.


    Vlachos, Andreas. (2008). A stopping criterion for active learning. Computer Speech & Language, 22(3), 295--312.

    Settles, B., & Craven, M. (2008). An analysis of active learning strategies for sequence labeling tasks. In Proceedings of the conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 1070--1079.

    Olsson, Fredrik. (2009). A literature survey of active machine learning in the context of natural language processing. SICS Technical Report T2009:06.

    Qian, Buyue, Li, Hongfei, Wang, Jun, Wang, Xiang, & Davidson, Ian. (2013). Active learning to rank using pairwise supervision. In Proceedings of SDM.

    Niraula, Nobal B., & Rus, Vasile. (2015). Judging the Quality of Automatically Generated Gap-fill Question using Active Learning. In Proceedings of the 10th Workshop on Innovative Use of NLP for Building Educational Applications, Association for Computational Linguistics, 196--206.
