Department of Computer Science and Technology

Course pages 2019–20

Overview of Natural Language Processing

Assessment is by coursework as follows:

In the practical, students are given a corpus of movie reviews and write code that classifies the sentiment of each text as positive or negative. In the first task, students build a commonly used baseline approach using Naive Bayes. In the second task, students improve over the baseline using Support Vector Machines and document embeddings, and perform an error analysis of the strengths and weaknesses of the approach.

Practical sessions: 23 October, 30 October and 13 November, 3-5pm, SW02.

Assessment is by two reports on the practical (submitted on paper to Student Admin):

  • First task report (20%, ticked, up to 1,000 words, excluding references) due on Friday 22 November 2019 at 12:00 noon.
  • Second task report (80%, 4,000 words, excluding references) due on Tuesday 14 January 2020 at 12:00 noon.

Your reports should include a word count and a pointer to your working code on the MPhil machines (your account).

Part 1 (First Practical Session)

Build a Naive Bayes Sentiment Classification System. Instructions are here.
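As a rough illustration of the kind of baseline this task asks for, here is a minimal sketch of a multinomial Naive Bayes classifier in plain Python. It is not the official instructions: the function names (`train_naive_bayes`, `classify`), the add-one smoothing, and the choice to ignore words unseen in training are all assumptions made for this example.

```python
import math
from collections import Counter


def train_naive_bayes(docs, labels):
    """Train on tokenised documents (lists of words) with one label each.

    Hypothetical helper: uses add-one (Laplace) smoothing, which is an
    assumption of this sketch, not a requirement stated on this page.
    """
    class_counts = Counter(labels)
    word_counts = {c: Counter() for c in class_counts}
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)

    vocab = {w for counts in word_counts.values() for w in counts}
    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}

    log_likelihood = {}
    for c, counts in word_counts.items():
        # Add one to every count so no word has zero probability.
        total = sum(counts.values()) + len(vocab)
        log_likelihood[c] = {w: math.log((counts[w] + 1) / total) for w in vocab}
    return log_prior, log_likelihood, vocab


def classify(tokens, model):
    """Return the class with the highest log posterior score."""
    log_prior, log_likelihood, vocab = model
    scores = {}
    for c in log_prior:
        # Words never seen in training are skipped (an assumption).
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][w] for w in tokens if w in vocab
        )
    return max(scores, key=scores.get)
```

With a toy training set such as `[["great", "fun"], ["awful", "boring"]]` labelled `["POS", "NEG"]`, calling `classify(["great"], model)` scores each class and returns the better one; working in log space avoids numerical underflow on long reviews.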

Instruction Slides and "How to write a report" slides (same as last year).

  • Here are some slides from the 2018-19 course "MLRD" that explain overtraining and cross-validation.
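For readers unfamiliar with cross-validation, the following sketch shows one way to split a data set into folds and evaluate on each held-out fold in turn. The round-robin assignment of items to folds and the names `round_robin_folds`, `cross_validate` and `train_and_score` are assumptions for illustration, not something prescribed by the slides.

```python
def round_robin_folds(n_items, n_folds=10):
    """Assign item index i to fold i mod n_folds (an assumed splitting scheme)."""
    return [list(range(k, n_items, n_folds)) for k in range(n_folds)]


def cross_validate(items, n_folds, train_and_score):
    """Hold out each fold once; train on the rest and score on the held-out fold.

    train_and_score is a hypothetical callback taking (train_items, test_items)
    and returning a number such as accuracy.
    """
    scores = []
    for held_out in round_robin_folds(len(items), n_folds):
        held = set(held_out)
        train = [items[i] for i in range(len(items)) if i not in held]
        test = [items[i] for i in held_out]
        scores.append(train_and_score(train, test))
    return scores
```

Because every item is used for testing exactly once and never appears in its own training set, the averaged score is less prone to reflecting overtraining on a single fixed test split.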

Part 2 (Second Practical Session)

Upgrade to SVM classification and use doc2vec representations instead of bag of words. Instructions are here, and the slides are here.

Part 2 (Third Practical Session)

A better significance test (the permutation test) and some tips on how to perform an analysis of the embedding space. Slides are here.
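To make the idea of a permutation test concrete, here is a minimal sketch of a paired, two-sided Monte Carlo permutation test comparing two systems on the same test items. It may differ from the version presented in the slides: the function name `permutation_test`, the default of 5,000 resamples, the fixed seed, and the use of the mean score difference as the test statistic are all assumptions of this example.

```python
import random


def permutation_test(scores_a, scores_b, n_resamples=5000, seed=0):
    """Estimate a two-sided p-value for the difference between two systems.

    scores_a[i] and scores_b[i] are the two systems' scores on test item i
    (for example 1 for a correct prediction and 0 for an incorrect one).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must be scored on the same items")
    rng = random.Random(seed)
    n = len(scores_a)
    observed = abs(sum(scores_a) - sum(scores_b)) / n

    at_least_as_extreme = 0
    for _ in range(n_resamples):
        diff = 0.0
        for a, b in zip(scores_a, scores_b):
            # Under the null hypothesis the system labels are exchangeable,
            # so swap each pair of outcomes with probability 0.5.
            if rng.random() < 0.5:
                a, b = b, a
            diff += a - b
        if abs(diff) / n >= observed:
            at_least_as_extreme += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    return (at_least_as_extreme + 1) / (n_resamples + 1)
```

A small p-value suggests that the observed difference would rarely arise if the two systems were interchangeable; identical score lists give a p-value of 1.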