Department of Computer Science and Technology

Course pages 2018–19

Natural Language Processing

Assessment is by coursework as follows:

  • Assignment 1, for 20% of the overall grade. Write a 500-word report on your SVM-based sentiment classification experiment. Ticked, i.e., Pass/Fail.
  • Assignment 2, for 40% of the overall grade. Write a 1000-word report on your Doc2Vec-based sentiment classification experiment.
  • Assignment 3, for 40% of the overall grade. Write a 1000-word report on your design for a text understanding question answering system.

Deadlines:

  • Assignment 1: 14 November, 4pm
  • Assignment 2: 30 November, 4pm
  • Assignment 3: 30 November, 4pm

Instructions for Practical

  • Assignment 1: Instructions Part 1

    NOTE: The assignment states "replicate Pang et al. (2002) as closely as possible". What was meant was: do so only for those interventions explicitly stated in the instructions. Pang et al. do many things you are not expected to do, for instance MaxEnt, negation treatment, and experiments that exclude certain POS. Pang et al. also don't do some things that I would like you to do, namely stemming and trying a feature cutoff. Sorry if that wasn't clear. I have also added this warning to the instructions.

    Strictly speaking, experimenting with feature cutoffs (i.e., doing a systematic search) is methodologically questionable, as you don't have a separate validation corpus. There is a danger of overtraining: a cutoff tuned on the data you evaluate on is fitted to that data. The only thing you are (kind of) allowed to do is to choose a feature cutoff once (e.g. 2 or 3 or 4) before the experiment and then run the experiment only once with this feature cutoff. If you compare that to the full feature set (no cutoff), most people would probably judge that to be still OK. But we are moving into a grey area.

  • Assignment 2: Instructions Part 2

  • Assignment 3: Instructions Part 3
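
For the feature cutoff mentioned in the note on Assignment 1, the following is a minimal sketch of the idea, not the method prescribed by the instructions. It assumes that a "cutoff" of k means keeping only features that occur at least k times in the training documents, and that features are binary presence features; the function names and the toy documents are hypothetical. The point it illustrates is that the cutoff is fixed once, before any experiment is run.

```python
from collections import Counter

# Assumption: the cutoff is chosen once, up front, and never tuned on results.
CUTOFF = 2


def select_vocabulary(train_docs, cutoff):
    """Return the set of tokens occurring at least `cutoff` times in training data."""
    counts = Counter(token for doc in train_docs for token in doc)
    return {token for token, n in counts.items() if n >= cutoff}


def to_presence_features(doc, vocabulary):
    """Map a tokenised document to binary presence features over the vocabulary."""
    return {token: 1 for token in set(doc) if token in vocabulary}


# Toy, made-up training documents (already tokenised).
train_docs = [
    ["a", "good", "good", "film"],
    ["a", "bad", "film"],
    ["good", "good", "acting"],
]

# Counts are taken from the training documents only, never from test data.
vocabulary = select_vocabulary(train_docs, CUTOFF)

# A held-out document is mapped using the vocabulary fixed above.
features = to_presence_features(["good", "bad", "good"], vocabulary)
```

Here "bad" and "acting" each occur only once in the toy training data, so they fall below the cutoff and are dropped. Passing a cutoff of 1 to `select_vocabulary` gives the full feature set for the single comparison described in the note.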

Data etc. for Practical

Errata (NLP Practical)

  • Clarification added to instructions for Part 1 about "replication of Pang et al." meaning "replication of aspects that are mentioned in the instructions, not the entire paper".
  • Added as footnote to instructions for Part 1: is it methodologically OK to experiment with different feature frequency cutoffs?
  • Word limit for assignments changed to 500 words (assignment 1) and 1000 words (assignments 2 and 3). Because this was announced late in writing, students will not be penalised for submitting reports of the old length of 300 words for assignment 1.