
Course pages 2020–21

Machine Learning for Language Processing

Andreas Vlachos's lectures

The notes and slides will be added incrementally.


Ted Briscoe's reading seminars


Please select 3 papers that you would like to present in order of preference by noon on Wednesday 14th October and email your selections to [Javascript required]. I will assign papers by 5pm that day. Do not do this if you are only planning to audit the course. Instead email me and let me know.

There will be 2 presentations per 50-minute session. Each presentation should be about 15 minutes, allowing a further 5 minutes for questions, with 10 minutes at the end of each session for general discussion. You should summarise the paper briefly (remember that everyone will have read it), explicate any parts you found difficult or innovative, and critically evaluate the work described. For your evaluation you should consider questions such as: To what extent have the stated aims of the research been achieved? To what extent is the work replicable given the information provided? In what way does the work advance the state of the art? You may prepare slides and share your screen via the data projector and Zoom. You should liaise with your co-presenter to decide the order in which to make your presentations. All slides for the session should be loaded onto a single laptop, set up with Zoom and the data projector, by the beginning of each session.

All students should read all the papers and attend all sessions prepared to discuss each paper after the presentations.


You may undertake a small project using existing datasets and machine learning software, and then submit a project report. Your project should define an experiment (probably comparative) from which you are able to draw precise and definite conclusions. Do not be too ambitious and undertake an experiment so computationally intensive that you are unable to obtain results on the hardware available to you. Alternatively, you may write an essay on any aspect of the course content. Your essay topic should involve an in-depth critical evaluation of a specific machine learning technique and its application to language processing, or of a specific language processing task and the machine learning techniques that have been applied to that task. Little credit will be given for summaries of papers. In both cases, your essay or report should not exceed 5000 words and will be due around the end of the first week of Lent Term.

You should discuss and agree your essay topic or project with [Javascript required] by email after the division of the Michaelmas Term (during week 4). Write a proposal of up to 500 words outlining the topic or project, giving a preliminary reading list and indicating what resources (datasets, hardware, and toolkits/packages) you plan to use, if relevant.

Suitable small projects will need to make use of existing labelled datasets and existing machine learning tools that are distributed and documented, so that they can be completed in reasonable time. Some examples of text classification tasks and datasets are: spam filtering (lingspam, genspam), sentiment of movie reviews (sentiment polarity datasets, Pang, but not if you are taking L90), named entity recognition (conll shared task ner), hedge (scope) detection (conll shared task hedge scope), language identification (altw 2010 langid dataset), document topic classification (Reuters-21578), genre classification (genre collection repository), and many, many more. The best machine learning toolkit that won't require huge computing resources for your project is Scikit-Learn, but there are others such as SVMlight. A project might replicate a published experiment but try different feature types or a different model architecture, and describe the experiment and report results in a manner comparable to the relevant (short) paper.
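To give a sense of scale, a project-style baseline in Scikit-Learn can be only a few lines: a bag-of-words vectoriser feeding a naive Bayes classifier. The sketch below uses a tiny made-up spam/ham corpus purely for illustration (it is not one of the datasets listed above); a real project would substitute a documented dataset and report proper held-out results.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny illustrative corpus (hypothetical examples, not a published dataset).
train_texts = [
    "free money win prize now",
    "meeting agenda project notes",
    "win cash free offer",
    "project deadline meeting schedule",
]
train_labels = ["spam", "ham", "spam", "ham"]

# Bag-of-words features + multinomial naive Bayes, chained in one pipeline.
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(train_texts, train_labels)

pred = clf.predict(["free prize offer"])
print(pred[0])
```

Swapping `CountVectorizer` for `TfidfVectorizer`, or `MultinomialNB` for a linear SVM, is exactly the kind of controlled feature/model comparison a short replication project might report.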

An example of a possible title/topic for an essay on named entity recognition might be 'To what extent do we need sequential models to achieve accurate NER?' This essay might critically examine the claim made by e.g. Ratinov and Roth that NE recognition and classification can be done accurately by conditioning only on the class label assigned to the previous word(s) (as well as other invariant observed features of the context), without (Viterbi) decoding to find the most likely path of label assignments. In doing this, it might review the NER task definition and consider how dealing adequately with conjoined or otherwise complex NEs (see Mazur and Dale, Handling Conjunctions in Named Entities) might affect their claims. It might also propose an experiment that would resolve the issue empirically and/or identify one that has been published that sheds some light on it.

To find out what datasets and computational resources are available locally, see NLIP group resources.

Schedule and Reading List

  • Seminars will be mostly on Tuesdays starting 20th October

    Text Classification


  • 1) McCallum & Nigam, A comparison of event models for naive Bayes text classification, 1998
  • 2) Wang & Manning, Baselines and Bigrams: Simple, Good Sentiment and Topic Classification, ACL, 2012
  • Text Classification (cont.)


  • 3) Medlock, An adaptive, semi-structured language model approach to spam filtering on a new corpus, CEAS, 2006
  • 4) Joulin et al., Bag of Tricks for Efficient Text Classification, EACL, 2017
  • NER / NLI


  • 5) Ratinov & Roth, Design Challenges and Misconceptions in NER, CoNLL 2009
  • 6) Bowman et al., A large annotated corpus for learning natural language inference, EMNLP, 2015
  • Structured Prediction / NER

  • 7) Klein et al., Named Entity Recognition with Character-Level Models, CoNLL, 2003
  • 8) Ranzato et al., Sequence Level Training With Recurrent Neural Networks, ICLR, 2016
  • Word Representations / NER


  • 9) Levy, Goldberg & Dagan, Improving Distributional Similarity with Lessons Learned from Word Embeddings, TACL, 2015
  • 10) Reimers & Gurevych, Reporting Score Distributions Makes a Difference: Performance Study of LSTM-networks for Sequence Tagging, EMNLP, 2017
  • NMT -- GEC and Multimodal


  • 11) Yuan & Briscoe, Grammatical error correction using neural machine translation, NAACL, 2016
  • 12) Caglayan et al., Probing the Need for Visual Context in Multimodal Machine Translation, NAACL, 2019
  • NMT / Semantic Parsing


  • 13) Stahlberg & Byrne, On NMT Search Errors and Model Errors: Cat Got Your Tongue?, EMNLP, 2019
  • 14) Goodman et al., Noise reduction and targeted exploration in imitation learning for Abstract Meaning Representation parsing, ACL, 2016
  • Unsupervised Part-of-Speech Tagging / Language Models


  • 15) Plank et al., Multilingual Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Models and Auxiliary Loss, ACL, 2016
  • 16) Petroni et al., Language Models as Knowledge Bases?, EMNLP, 2019