Dr Andrew Caines

General advice for Part II projects

Much of the following applies to ACS projects too…

Objective / evidence-based

The focus of these projects is on software engineering, project management and report writing. It doesn’t need to be original research; in fact it’s better if you’re replicating some existing published system, because we expect it to work, and you shouldn’t run into the problem of being partway through the year with a non-functioning system.
Make sure you’re familiar with the department pages on Part II projects, noting in particular the important dates and distribution of marks for ‘professional practice and presentation’, ‘introduction and preparation’, ‘implementation’, ‘evaluation and conclusions’.
When creating your timeline, make sure you leave some buffer periods at the end of each term, so that you can absorb unexpected delays with the project and/or heavy workloads in your other modules. Aim to have your core goals implemented by the time of the progress reports & presentations midway through Lent term, then all evaluation and any extensions done by the end of Lent term. Writing should then get underway in the Easter break, aiming for a full draft at least 2 weeks before the deadline.
You need to write about your software engineering practices, so make sure you’re doing sensible things that you can document and refer to relating to agile programming, version control, repository structure, code comments, etc.
In addition, it’s important to do unit testing in ways that help you and that you can show in your code repository (remember that you need to submit your code with your dissertation).
You might look into standards for structuring repositories, such as Cookiecutter.
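To illustrate the kind of unit testing mentioned above, here is a minimal sketch. The function tokenise and its behaviour are hypothetical, invented purely for the example; the point is that each test states one expected behaviour and can sit in your repository alongside the code it checks. The test functions use plain assert statements, so a test runner such as pytest can discover and run them.

```python
import re


def tokenise(text):
    """Hypothetical tokeniser: lowercase the text, keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def test_tokenise_lowercases_and_strips_punctuation():
    # Punctuation is dropped, case is normalised, apostrophes stay inside words.
    assert tokenise("Don't panic!") == ["don't", "panic"]


def test_tokenise_empty_string():
    # Edge case: empty input should give an empty list, not an error.
    assert tokenise("") == []
```

Small tests like these are quick to write as you go, and they give you concrete material to point to when you describe your testing approach in the dissertation.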
Keep a log / extensive notes of what you do as you go along, because when you write up months later, memories as to why you took the decisions you took, or how things worked exactly, might have faded.
This also relates to any challenges you face, any lessons you learn about managing a large project, and future work you think should be done on the system you’ve developed (even if you won’t necessarily ever touch it again) – it’ll be good if you can write about all of these in your dissertation.
Measure timings for any data processing, whether it’s part-of-speech tagging, classifier training, inference or whatever – because you may need to look at efficiency issues at some point, and you don’t want to have to run everything again in order to get timing info.
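One way to record timings as you go is sketched below, using only the Python standard library. The helper name timed, the file name timings.jsonl and the record fields are all assumptions made for this example rather than a required format; adapt them to your own project.

```python
import json
import time
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def timed(stage, log_path="timings.jsonl"):
    """Time the enclosed block and append one JSON record per run to log_path."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        record = {
            "stage": stage,  # e.g. "pos_tagging", "training", "inference"
            "seconds": round(elapsed, 4),
            "finished_at": time.strftime("%Y-%m-%dT%H:%M:%S"),
        }
        # Append, so that earlier measurements are never overwritten.
        with Path(log_path).open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
```

You would then wrap each processing step, for example "with timed('tokenisation'): tokens = text.split()", and the timing is logged whether or not you think you will need it later.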
Similarly, store the outputs of your models in systematic ways, so that you can refer to them later and know what they relate to (e.g. which version of your model, with what features, and any other important parameters).
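As a rough sketch of what storing outputs systematically could look like, the function below writes each run's outputs next to the configuration that produced them. The name save_run, the directory layout and the configuration keys are illustrative assumptions, not a prescribed scheme.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def save_run(outputs, config, root="runs"):
    """Save outputs and config together in a directory named by time and config hash."""
    # Sorting keys makes the hash depend on the settings, not on dictionary order.
    config_json = json.dumps(config, sort_keys=True)
    config_id = hashlib.sha1(config_json.encode("utf-8")).hexdigest()[:8]
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    run_dir = Path(root) / f"{stamp}_{config_id}"
    # Note: two runs with the same config in the same second share a directory.
    run_dir.mkdir(parents=True, exist_ok=True)
    (run_dir / "config.json").write_text(config_json, encoding="utf-8")
    (run_dir / "outputs.json").write_text(json.dumps(outputs), encoding="utf-8")
    return run_dir
```

For example, save_run(predictions, {"model_version": "v3", "features": ["pos", "lemma"], "learning_rate": 0.001}) leaves a record of exactly which model version, features and parameters the predictions relate to.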
I’ll need to verify that I’ve seen a working demo of your system, so please make sure this is possible towards the end of your project.
Start from the existing Overleaf template for Computer Science dissertations; I’d prefer not to require that you sign up for things, but registration for Overleaf is free, you get extra features by signing up with your Cam email address, and at the moment there’s nothing better in terms of LaTeX authoring and review (link share with editing rights, so that your supervisors can add comments; don’t worry, we won’t directly edit the text).
Make sure certain things are done properly and consistently: namely citations where needed (including for use of software libraries, if requested – usually stated in their documentation), defining key concepts on first use (from relatively basic NLP tasks such as tokenisation, to what a library is for, to what ‘attention’ is in a machine learning context, to give examples), spelling out acronyms on first use (e.g. natural language processing (NLP), machine translation (MT), etc.).
Note that your dissertation may be put through Turnitin after submission, so be sure to write in your own words and not plagiarise (obviously). In addition, don’t use a language model such as ChatGPT to generate text for your dissertation (but at the same time, I’d be interested in the legitimate ways you might be using it: e.g. synonym checking, system naming, code completion, etc.).

Opinionated / personal preferences

Please share a full draft of your write-up with me at least two weeks before the deadline, so that I have time to read it (I’ll have other work and deadlines of my own) and send you feedback, and so that you have time to make changes. Ideally you’d send me your drafted chapters as you go, so that it can be a gradual, iterative feedback process. If you give me less than two weeks, the feedback I can give you will necessarily be more limited than it could otherwise be.
Put your college crest on the cover page: they are colourful and varied (so it makes the cover page smart and interesting).
Write in the first person singular (‘I’, ‘my’) rather than plural (‘we’, ‘our’) to show ownership of your own work (even though the norm in the academic literature is to write in the first person plural).
Avoid the word ‘interesting’ (e.g. “X is an interesting finding”): it’s either bland and redundant (what you’re working on is hopefully inherently interesting and that’s one reason why you’re working on it…), or it’s not redundant but it’s highly subjective (not everyone finds the same things interesting) – but in any case, avoid it!
In the Introduction, there’s no need to preview the content of the dissertation (“In Chapter 2 there is the Background. In Chapter 3…”) because all dissertations have the same structure. Use the space for something else such as a summary of what you do and how it went.
Don’t leave your evaluation metric as an after-thought only mentioned in the Evaluation chapter: make sure you present your chosen metric(s) in Preparation, define them and justify the choices (compare/contrast with alternatives); then explain how you calculate the metrics in Implementation, and present results in Evaluation.
When creating plots, as a minimum make sure they are colour-blind friendly and that any text is legible. For more of my opinions about plotting, please see my Research Skills presentation on Moodle (slides available from here).
Make things really easy for the reader: explain concepts and terminology without assuming your reader is an NLP or ML person, do state the obvious so that your reader isn’t left to infer too much, return repeatedly to the big picture and overall aims of your project, and do plenty of signposting and summarisation to remind your reader what’s going on and why.
See also this paper about scientific writing in general: ‘Ten simple rules for structuring papers’.

Also: see Marek Rei’s project advice for his students at Imperial College London (the majority of points are not Imperial-specific) – there are a load of great tips regarding machine learning and NLP experiments in particular. See also Lucy Foulkes’ writing advice, which is intended mainly for research papers but still contains lots of useful tips.

Contact me: firstname.lastname @ cl.cam.ac.uk