Course pages 2015–16
Overview of Natural Language Processing
Assessment is by coursework as follows:
In the practical, students are given a corpus of texts and write code that classifies the sentiment of each text as positive or negative. Various natural language processing tools will then be tested to see how much they improve performance on the task.
To prepare yourselves for the task, please do the following two things before the first demonstrated session:
- Please read the following paper: Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan, Thumbs up? Sentiment Classification using Machine Learning Techniques, Proceedings of EMNLP 2002. Pang et al. were the "inventors" of the movie review sentiment classification task, and the above paper was one of the first papers on the topic. The first version of your sentiment classifier will do something similar to Pang et al.'s system, so please read the paper and ask me questions about it in our first demonstrated practical.
- Familiarize yourself with the data in /usr/groups/mphil/L90/data. There are 2000 movie reviews, split into two directories, NEG and POS. These, unsurprisingly, are negative and positive reviews; the data is balanced, so there are 1000 of each.
Please read *some* of the texts (of your choice) to understand the difficulties of the task. How might one go about classifying the texts?
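As one possible starting point for thinking about that question, here is a minimal sketch of a Naive Bayes classifier over unigram presence features, one of the configurations Pang et al. compare. It is not the prescribed solution for the practical. The function names (load_reviews, train_naive_bayes, classify) are hypothetical, and the sketch assumes that NEG and POS directly contain one plain-text file per review, that the files decode as UTF-8, and that lowercasing plus whitespace splitting is an acceptable tokenisation.

```python
import math
from collections import Counter
from pathlib import Path


def load_reviews(data_dir):
    """Read every file under data_dir/NEG and data_dir/POS as (tokens, label) pairs."""
    reviews = []
    for label in ("NEG", "POS"):
        for path in sorted(Path(data_dir, label).iterdir()):
            if path.is_file():
                text = path.read_text(encoding="utf-8", errors="replace")
                # Assumption: lowercasing and whitespace splitting is good enough here.
                reviews.append((text.lower().split(), label))
    return reviews


def train_naive_bayes(reviews, smoothing=1.0):
    """Estimate log priors and smoothed log likelihoods from unigram presence."""
    doc_counts = Counter(label for _, label in reviews)
    feature_counts = {label: Counter() for label in doc_counts}
    for tokens, label in reviews:
        # Count each word at most once per review (presence, not frequency).
        feature_counts[label].update(set(tokens))
    vocab = set().union(*feature_counts.values())
    model = {"vocab": vocab, "prior": {}, "loglik": {}}
    for label, n_docs in doc_counts.items():
        model["prior"][label] = math.log(n_docs / len(reviews))
        total = sum(feature_counts[label].values()) + smoothing * len(vocab)
        model["loglik"][label] = {
            w: math.log((feature_counts[label][w] + smoothing) / total)
            for w in vocab
        }
    return model


def classify(model, tokens):
    """Return the label with the highest log score; unseen words are ignored."""
    scores = {}
    for label, prior in model["prior"].items():
        loglik = model["loglik"][label]
        scores[label] = prior + sum(
            loglik[w] for w in set(tokens) if w in model["vocab"]
        )
    return max(scores, key=scores.get)
```

With the course data this could be called as load_reviews("/usr/groups/mphil/L90/data"), holding out part of the reviews for testing rather than training and evaluating on the same texts.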
NEW: As preparation, please read the following instructions. The first part of the practical ("A Baseline System") will be demonstrated on Wednesday 18/11; the second part ("Extension system") will be demonstrated on Wednesday 2/12.
Simone Teufel, Nov. 2015