Course pages 2019–20

Machine Learning for Language Processing

Assessment is primarily by project (95%); the remaining 5% is for attendance at lectures, reading of the assigned material, and satisfactory contribution during lectures.

The project consists of picking a task/dataset (suggestions below), implementing an approach, comparing it against the literature, and writing a report. The project is assessed via the report, which should be at most 5000 words long. The code, while not evaluated per se, should also be made available (e.g. on GitHub), together with instructions on how to reproduce the results given in the report. We encourage you to follow the format of a recent CL conference, e.g. ACL 2020; such conferences sometimes offer their templates ready for editing online.

Your report should address the following questions:

Introduction: What is the task and why is it important?
Literature review (minimum 3 papers)
- Identify weaknesses / room for improvement
- Motivate your approach
Detail your proposed approach
Experiments:
- Proper experimental design
- Train/dev/test
- What is the error metric for your task?
- How did you choose your hyperparameters?
- Does your idea work as expected?
- Error analysis / plots of learning curves
Conclusions: What have we learnt from your experiments that could inform future work?
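
As an illustration of the train/dev/test and hyperparameter questions above, here is a minimal Python sketch. It is not a required design: the function names, the 80/10/10 split, and the fixed seed are assumptions made for the example, and train_fn and score_fn are hypothetical placeholders for your own training and evaluation code. Many of the suggested datasets ship with official splits, which should be used instead when they exist.

```python
import random


def split_dataset(examples, dev_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle once with a fixed seed, then cut into train/dev/test."""
    items = list(examples)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split reproducible
    n_test = int(len(items) * test_frac)
    n_dev = int(len(items) * dev_frac)
    test = items[:n_test]
    dev = items[n_test:n_test + n_dev]
    train = items[n_test + n_dev:]
    return train, dev, test


def select_hyperparameter(candidates, train, dev, train_fn, score_fn):
    """Return the candidate with the highest dev score (higher is better).

    The test set is deliberately not an argument: it should be used once,
    after the hyperparameters have been fixed.
    """
    best_value, best_score = None, float("-inf")
    for value in candidates:
        model = train_fn(train, value)   # hypothetical: your training routine
        score = score_fn(model, dev)     # hypothetical: your metric on dev
        if score > best_score:
            best_value, best_score = value, score
    return best_value, best_score
```

If your metric is an error rate (lower is better), negate it in score_fn or flip the comparison. Reporting the dev score of each candidate in the report is one way to answer "How did you choose your hyperparameters?".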

Datasets/tasks:

Dependency Parsing or Morphological Tagging (http://universaldependencies.org/)
Morphological Inflection Generation (http://sigmorphon.github.io/sharedtasks/2019/task2/)
Fact Checking against Wikipedia (http://fever.ai/)
Natural Language Generation (http://www.macs.hw.ac.uk/InteractionLab/E2E/)
Your choice! Please clear it with us first: we need to ensure it is interesting and feasible within the time and resource constraints.

Submission deadline: 14/1/2020, 4pm (via Moodle)

Department of Computer Science and Technology
