Course pages 2019–20
Machine Learning for Language Processing
The assessment is primarily via the project (95%), with the remaining 5% for attendance at lecture sessions, reading of assigned material, and satisfactory contribution during lectures.
The project consists of picking a task/dataset (suggestions below), implementing an approach, comparing against the literature, and writing a report. The assessment will be via the report, which should be at most 5000 words long. The code, while not evaluated per se, should also be made available (e.g. on GitHub), together with instructions on how to reproduce the results reported in the report. We encourage you to follow the format of a recent CL conference, e.g. ACL 2020; such conferences sometimes offer their templates ready for editing online.
Your report should address the following questions:
- Introduction: What is the task and why is it important?
- Literature review (minimum 3 papers)
- Identify weaknesses / room for improvement
- Motivate your approach
- Detail your proposed approach
- Experiments:
  - Proper experimental design
  - Train/dev/test
  - What is the error metric for your task?
  - How did you choose your hyperparameters?
  - Does your idea work as expected?
  - Error analysis/Plot learning curves
- Conclusions: What have we learnt from your experiments that could inform future work?
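As an illustration of the train/dev/test and hyperparameter questions in the list above, here is a minimal sketch in Python. It is not a required setup: the toy data, the threshold "model", the split fractions and the function names (`split_train_dev_test`, `accuracy`, `select_on_dev`) are all assumptions made up for this example. The point it shows is that hyperparameters are chosen on the dev set and the test set is scored once, at the end.

```python
import random


def split_train_dev_test(examples, dev_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle once with a fixed seed, then cut into three disjoint parts."""
    items = list(examples)
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_frac)
    n_dev = int(len(items) * dev_frac)
    test = items[:n_test]
    dev = items[n_test:n_test + n_dev]
    train = items[n_test + n_dev:]
    return train, dev, test


def accuracy(threshold, data):
    """Accuracy of a toy classifier that predicts 1 when x >= threshold."""
    correct = sum((x >= threshold) == bool(y) for x, y in data)
    return correct / len(data)


def select_on_dev(candidates, dev):
    """Pick the candidate hyperparameter with the best dev-set accuracy."""
    return max(candidates, key=lambda t: accuracy(t, dev))


# Toy data (hypothetical): the label is 1 exactly when x >= 0.5.
rng = random.Random(1)
data = [(x, int(x >= 0.5)) for x in (rng.random() for _ in range(200))]

train, dev, test = split_train_dev_test(data)

# A real project would fit a model on `train` for each candidate setting;
# the toy classifier here has nothing to fit, so that step is omitted.
candidates = [0.1, 0.3, 0.5, 0.7, 0.9]
best = select_on_dev(candidates, dev)

# The test set is touched only once, after the hyperparameter is fixed.
test_acc = accuracy(best, test)
print(f"chosen threshold: {best}, test accuracy: {test_acc:.2f}")
```

Fixing the shuffle seed keeps the split reproducible, which also helps with the requirement that others can reproduce the reported results. Many of the suggested datasets ship with their own official splits; where they do, those should be used instead of a custom split like this one.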
Datasets/tasks:
- Dependency Parsing or Morphological Tagging (http://universaldependencies.org/)
- Morphological Inflection Generation (http://sigmorphon.github.io/sharedtasks/2019/task2/)
- Fact Checking against Wikipedia (http://fever.ai/)
- Natural Language Generation (http://www.macs.hw.ac.uk/InteractionLab/E2E/)
- Your choice! Please clear it with us first: we need to ensure that it is interesting and feasible within the time and resource constraints.
Submission deadline: 14/1/2020, 4 PM (via Moodle)