Rambles around computer science

Diverting trains of thought, wasting precious time

Mon, 14 Mar 2011

Writing a Part II dissertation

Not that long after I did Part II of the Tripos in Cambridge, I wrote a piece about how to do a successful Part II project, based on bitter experience of having done a less-than-completely successful one. Now it's a few years later, and I've supervised four Part II dissertations and proof-read a couple more (and now left Cambridge!). I thought I'd write down some hints about what makes a good dissertation. As before, the pink book is well-meaning but misses a lot of this.

I should also add that this advice is not prescriptive---it's a mixture of my personal preferences as a reader of dissertations, and a conservative underapproximation of “known ways to write a good dissertation” according to my experience. Not being part of the examination process myself, I can't say it's the only way, nor that it's sufficient, nor even that it's always correct---only that these recommendations seem to have some correlation with good marks.

Introduction

This chapter is for you to identify the problem space of the work, and what specifically your work seeks to achieve in that space. You should outline the technical background of the work here, but only at a high level---save details for next section. The emphasis should be on context: motivating use-cases, and covering a selection of differing alternative approaches to the problem. Once you've covered that, you can give a specific taster of the one you adopted---probably in the form of a simplified summary, perhaps a preview-style example or two (of the work you did and its supported use-cases) and an outline of the rest of the dissertation.

Preparation

This chapter should consist mostly of technical background, at a level deeper than the introduction. Think of it as mirroring the preparatory reading (and perhaps experimentation) you had to do before beginning the project work proper. You should describe the specification of the system you built, outline its overall design, and mention any changes to proposal. It's particularly worth emphasising anything hard that you had to learn in order to undertake the project---such as unfamiliar tools, languages, libraries, programming styles, algorithms, bits of mathematics, and so on.

For pitching your work, my rule of thumb is to assume your reader knows basic computer science (say, an outline understanding of the first two years' Tripos content) but is a bit slow to catch on. In other words, try to make the reasoning in your write-up extremely clear and easy to follow. This is not because the examiners are slow to catch on. Rather, it's because they are probably in a hurry, and in order to counterbalance the fact that you're almost certain to underestimate the difficulty of understanding your own write-up.

Avoid assuming that the reader knows any particular language, tool or detailed technique that you use. For example, if your work is based on implementing some particular programming language, you can assume the reader has basic degree-level knowledge about programming languages in general, but you should use this chapter to give an outline introduction to this specific language and its peculiarities, with references to additional material.

This chapter is somewhat infamous for requiring that you write some “boring” stuff about development practices: what tools you used, your testing strategy, how you took backups, what development practices you followed, and so on. I say “infamous” because we researchers, when we are proof-reading dissertations, find this all a bit incongruous---most of a dissertation reads a bit like a research write-up, whereas this stuff doesn't. Since we're not accustomed to seeing this kind of thing written down, it thuds a bit. However, it's the researcher's perspective that's at fault here. Most dissertation authors are about to go off into industry, and from the perspective of professional software development, including this small amount of practice- and process-oriented content is a very reasonable requirement.

Implementation

To a first approximation, this chapter should be describing the clever data structures and algorithms that appeared in your project. Think “meaty”---a really good Implementation chapter will come across as chock-full of interesting, nontrivial technical material, clearly described and comprehensible. It should also have reasoned discussion of how your techniques work and why they are well-suited to the problem.

When explaining how your system works, there is a question of what level of explanation is appropriate, and how much detail you should go into. I'd advise that you completely avoid descending to syntactic code-level details (e.g. names of particular classes and methods). There could be an exception if some class or method was so conceptually significant and pervasive that mentioning it by name is the clearest presentation, but this is relatively rare. Think of this chapter as providing descriptions sufficient for a reader to go away and reimplement your system algorithm-for-algorithm, but not identifier-for-identifier. Including a UML class diagram, or similar, is okay---particularly when it's showing something interesting!---but I'd advise against going any deeper than that.

Instead of code-level detail, explain your implementation using pseudocode, data-structure diagrams, flow-charts, time-sequence diagrams, and the like. A good project will have meat here that is outside the syllabus, or at least drawn from the more specialised end of Part II courses, so you should make these especially clear and assume your reader is unfamiliar.

You may well have used some third-party software as part of your system. You should explain how it fits in. Moreover, it's okay (and generally good) to include a summary of how it works, even though you didn't write it. This is to show that you could have implemented it yourself if you'd had to, but that there was no good reason to!

I would personally avoid talking about evaluation harnesses, methods or support coding here. That is best saved for the Evaluation chapter.

Look at existing technical writing, perhaps research papers in reading lists of your lecture courses, for more ideas about how to present technical content clearly and at the right level.

Evaluation

Couch your evaluation in terms of claims that your work supports, or goals that you claim your work has fulfilled. The chapter then becomes an exercise in substantiating these claims.

Explain your method of evaluation. This requires answering at least three questions. What data are you collecting? How will you be interpreting it? Finally, how are your claims decided by this interpretation of your data? That final question is the trickiest to get right---usually it involves an argument that the interpretation of the data soundly represents the abstract truth of the claim in some way. (However, you don't need to belabour the point on that last issue---as long as your approach is not completely off-the-wall, it's generally enough to show that show that you've thought about the validity of your evaluation method and that it's not obviously fatally flawed.)

There should be results, or data. Since these will hopefully be quantitative, they are usually presented as tables and/or graphs. Pay attention to precise presentation of results: units, appropriate precision, error bars, discrete vs continuous graph conventions, etc.. For each piece of evidence you present, say why you're interested in it (first!) and (then) what it shows and your confidence in that.

Present results so that they can be reproduced. So if you give program execution times, say what the machine you were running on was. If you can abstract this to something of more robust meaning than numbers relevant only to your particular experimental rig, then do so, although unless the mapping is absolutely trivial, include the raw data too. One example: if you measure execution times of some low-level code in CPU microseconds, say, then it might be worth abstracting that to CPU cycles if you have the relevant data---but keep the original measurement in there too. More importantly, explain what you are doing! One of the easiest ways to write a bad Evaluation chapter about good work is not to explain the origin or significance of the numbers or graphs you're presenting.

Have something to compare your system against. If this really doesn't seem feasible, you can sometimes get around this by defining your own goals up-front (e.g. “at least 70% precision...”--- including some argued justification for the number!), then showing that you satisfy them. In this case, the goal should be mentioned in the Introduction or Preparation sections. It is nevertheless nicer to show that your system does better than some other system or approach, even if it's a somewhat naive or trivial one. For example, comparing a query-routing algorithm against a random algorithm is better than no comparison at all.

Conclusions

Say what you did, in summary. It's okay to repeat yourself from earlier sections. Also, take the opportunity to talk about future work: what you learned, what could be done better, how your system could be extended to do new and wonderful things, or what better ideas you've had since about how to tackle your chosen problem and its close relatives.

Bibliography

The bibliography should consist of references to archival material. There are some things you'll want to “reference” that are not archival material. For example, you will want to reference software packages you've used. I personally avoid trying to “cite” these in the bibliography, for example, although some authors do this. So that point, and much of the following, is mostly personal advice.

Since URLs are not (under current web technology) stable enough for archival use, avoid URLs except as a last resort, when they're the only way of identifying a published document. Unlike paper publications, web pages are living documents, so if you use a URL, include the date on which you looked at the content (e.g. “retrieved on 2010-07-30”) so that a reader can (hopefully) use an archiving search engine to see the precise page that you saw.

I prefer to use footnotes for referencing software by URL. (But generally use footnotes sparingly---only for places where they'd really interrupt the main flow of text, but are nevertheless relevant and interesting.)

This is very personal advice, but I'd say: in the main text, avoid using citation markers as nouns. A lot of writers do this, but I find it jarringly bad style. Instead, use the author's name. For example, don't write “as shown by [42]”, but instead write “as shown by Smith et al [42]”. If you're using author--year citation markers (which I generally prefer to plain numbers), there are usually different citation styles you can use to avoid repeating the author's name, for example “as shown by Smith et al [1990]” is preferable to “as shown by Smith et al [Smith 1990]”.

Style

Following are a few other pieces of advice on style that I've had cause to dispense on the past.

Don't feel the need to tell a complete, linear story. It's very tempting to write what you did as a narrative. However, this usually has two problems. Firstly, by emphasising the what, it's easy to understate the why, which can make the reader baffled by a succession of sentences saying “I did this, then I did this, then I did this...” but without much insight about the overall goal or the motivation behind each step. Secondly, some of the twists and turns that you took are probably not interesting at all---no matter how much of a struggle they were for you at the time---so can be left out. So, instead of a narrative, try to write instead in what I call a “progressive encoding” or “iterative deepening” style. Think about the most significant high-level points you want to make. As a rule of thumb, each of these should be stated concisely at the beginning of its own paragraph. The rest of the paragraph can fill in additional detail and back-story in successive sentences. This results in a structure where the fact and the explanation occur close together. Curiously, they usually turn up in reverse order (fact first, then explanation) and this is still very readable. It's a very common style in good scientific writing. It has the rhetorical benefit that you keep the reader interested. Making a contentious or surprising statement, then delving progressively deeper into a justification of it, is a trick that can keep the reader engaged when used judiciously.

As an example, instead of saying “I tried running the JVM with the default options, but this suffered from out-of-memory errors, so ran with an increased heap size of 384MB“, say “the JVM was configured with the larger heap size of 384MB. The smaller default heap size was discovered to be insufficient when constructing the intermediate representation for larger input files.” Again, you can hoist more significant facts sooner, and leave more peripheral matters until later in the paragraph. Note that there will still be a temporal progression to some extent, particularly between paragraphs. Note also that by this approach you will naturally end up leaving out the redundant twists and turns that are not interesting. If you think something interesting has been left out unfairly by this structuring, you can always put it in a paragraph towards the end of a section or chapter, clearly signposted as a curiosity rather than part of the main flow. (Sometimes, a paragraph will end with a bracketed note like this one, to further outline the distinction between more- and less-important content.)

Avoid editorialising, by which I mean expressing opinions (e.g. about particular designs, technologies, approaches, companies, products etc.) that you don't intend to scientifically validate.

Underneath any heading in the dissertation, it's good style to include at least a summary sentence before descending down a level of heading structure (e.g. between heading and subheading).

Finally, keep your writing grammatical. Every sentence should contain a verb!

That's all I have to offer---please do let me know any suggestions, queries, corrections, counterexamples or contrary experience you may have about anything in this post.

[Update, 2011-10-13: Andrew Moore supplies this very helpful set of notes from his duties as examiner in 2011!]

[/teaching] permanent link


Powered by blosxom

validate this page