Computer Laboratory

Code and Paper Guidelines

There is an increasing desire within our community to make experimental data available once a publication has been accepted, so as to provide greater transparency and allow others to more easily repeat and build upon our work. In fact, some funders, such as EPSRC, require it for all work that they fund and have strict guidelines on how it should be made accessible. For research that is not subject to any particular conditions, it is still good practice to make data publicly available where possible, and I maintain a separate page for our group containing things we've released. In the context of our research, data mostly means source code, be it a compiler pass, simulator module or standalone application.

This page is a basic attempt to set some norms within our group that should help all of us to share data and then package it up for public release. Because so much data sharing is tied into the publication cycle, and many of the concepts are shared, I've also included my guidelines for paper writing, which should help when creating articles collaboratively, especially in those last-minute rushes to finish right before the deadline! None of these are hard-and-fast rules, but sticking close to them should make life a little easier for us all.

Code

The key idea is to make it easy to know which code was used for a paper submission, and then to package it up and ship it out once the paper is published or we want to make a release. In addition, we want to enable as much sharing as possible between group members. Finally, and don't underestimate this, we want to avoid losing code. The following guidelines should help us all with this.

  1. Use a revision control system.

    I don't have a strong opinion on what that should be, but generally prefer git. However, use whatever you like. In particular, if you're writing patches for an existing project, it makes sense to use that project's system.

  2. If building on another project, make a note of the version you're using.

    This will save lots of time later when you want to tell others what to apply the patches on top of. Remember to update your note should you rebase your patches.

  3. Create a remote repository that you can push to for backups.

    This could be on a Lab or group machine (e.g. in local scratch space, bigdisc or the filer; see the system admin pages on disk space), or something external. If going outside the university, I'd prefer Bitbucket or GitHub, since both offer free private repositories for academic use. Before using an external service, also bear in mind what code you're writing and who's funding you, so you don't inadvertently violate the terms of any agreements we have.

  4. Give me access.

    This is quite important since it allows me to pull in and run your code easily, which is useful if you want to show me something, or I want to use it for anything (e.g. generating results for a talk). If you put it on a Lab machine, tell me which one and the directory it's in. I am timothymjones on GitHub and tmjones on Bitbucket.

  5. Write clear commit messages.

    Although this should be self-explanatory and obvious, it's still worth saying here. Please write enough information in each message so that it's clear what the commit does. It doesn't have to be long, but taking the time to make it clear will make everyone's life easier later down the line when you want to find out when a feature was introduced, when a bug was fixed or why a particular component was added.

  6. Use tags for papers, or create separate branches.

    As soon as a publication has been written, tag the versions of the code used to perform experiments within each relevant repository. This lets us identify the correct code for a paper once it is published, so that we can release that code alongside it. It also allows quick rollback to the version used in the paper if we need to add more information during the rebuttal phase or before the camera-ready version is completed. Otherwise it's too easy to add new features during the time it takes for a paper to be reviewed, then have to trawl back through the repository's history at a later date, trying to work out which commit was the one used for that particular paper.
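
As a rough illustration of guidelines 2 and 6, the sketch below builds the git commands that tag the code used for a paper and records the upstream version in the tag message. It assumes git is the revision control system in use. The function names, the "paper-stage" tag naming scheme and the remote name "origin" are all assumptions made for this example, not a convention the group has fixed.

```python
import subprocess


def paper_tag_commands(paper, stage, commit="HEAD", upstream_note="", remote="origin"):
    """Return the git commands that tag `commit` as the code used for a paper.

    All names are illustrative: `paper` and `stage` form a tag such as
    "mypaper-submission", and `upstream_note` records the upstream version
    that the patches apply on top of (guideline 2).
    """
    tag = f"{paper}-{stage}"
    message = f"Code used for the {stage} version of {paper}"
    if upstream_note:
        message += f"\n\nBuilt on: {upstream_note}"
    return [
        # Annotated tag, so the message and date are stored with the tag.
        ["git", "tag", "-a", tag, "-m", message, commit],
        # A plain push does not send tags, so push this one explicitly.
        ["git", "push", remote, tag],
    ]


def tag_paper(repo_dir, paper, stage, **kwargs):
    """Run the commands above inside the repository at `repo_dir`."""
    for command in paper_tag_commands(paper, stage, **kwargs):
        subprocess.run(command, cwd=repo_dir, check=True)
```

With a tag like this in place, returning to the submitted version during the rebuttal phase is a matter of checking out the tag (for example, git checkout mypaper-submission, where the tag name is again hypothetical). Run the same step in each repository that contributed to the experiments.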

Papers

Writing papers collaboratively is very similar to writing code collaboratively, and so a similar methodology should be employed. However, there are some specifics to make life easier for everyone involved, which I highlight here. You could also follow these when writing your PhD dissertation.

  1. Use LaTeX.

    Honestly, don't write papers in anything else.

  2. Use a revision control system, create a backup, give me access, write clear commit messages.

    See above for all of these.

  3. Put figure and graph sources into the repository.

    This makes it easy for anybody to tweak them. Everyone involved in the paper should be able to regenerate any of its content, so keep the graph source files, the data and scripts needed to produce them, and the figure sources in the repository.

  4. Write one sentence per line.

    It might look messy at first, but it really helps to see what has changed between different revisions. With a whole paragraph on a single line, it's quite difficult to pick out the changes, especially if they don't change the number of characters. With less than a sentence per line, it's difficult to get context. Having exactly one whole sentence on each line solves both these issues and makes it easier to resolve conflicts, should they occur.

  5. Read Peter Robinson's excellent notes on writing.

    There's a lot of good information there that I won't repeat. The notes also start with some reference texts to go to for more information. Take particular note of the bullet points on tense, voice, references and simple language.
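
For guideline 4, existing paragraphs can be reflowed automatically. The following is a minimal sketch, assuming a sentence ends with ".", "!" or "?" followed by whitespace and then an uppercase letter or a LaTeX command. The function name is made up for this example, and the rule is only a heuristic: an abbreviation followed by a capitalised word (such as "et al. We") will be split incorrectly, so check the result by eye.

```python
import re

# A sentence end: ., ! or ? followed by whitespace and then an uppercase
# letter or a backslash (the start of a LaTeX command). This is a heuristic,
# not a real sentence parser.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+(?=[A-Z\\])")


def one_sentence_per_line(paragraph):
    """Reflow a paragraph so that each sentence sits on its own line."""
    # Collapse existing line breaks and runs of spaces first.
    text = " ".join(paragraph.split())
    return "\n".join(_SENTENCE_END.split(text))


if __name__ == "__main__":
    example = (
        "We propose a new pass. It runs after\n"
        "inlining, e.g. in the back end. Does it help?"
    )
    print(one_sentence_per_line(example))
```

Because "e.g." is followed by a lowercase word in the example, it is not treated as a sentence end, while the two real sentence boundaries are.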