Rambles around computer science

Diverting trains of thought, wasting precious time

Wed, 13 Apr 2011

PhD examination

So I passed my PhD viva a couple of weeks ago. I do, however, have a lot of corrections to do. In fact I have about the most corrections I could have, in hours of work terms, without having to resubmit my thesis. Thank God I don't have to do that. As it happens, the actual corrections to my thesis are not very many. I have to add the odd paragraph here and there, and collect a small amount of extra data. The killer is the non-thesis bit. I'll talk about that in a moment.

There's a lot I could say to summarise my feelings about the viva. Here are the two words I've been using most when people have asked me how it went: “reasonable” and “annoying”.

For the “reasonable” part, I have to thank my examiners, Andy Rice and Alex Wolf, who deserve credit for the depth at which they got to grips with my thesis. I was quite impressed with their attention to detail. Although I can (and will, shortly) disagree with their take on what is necessary or sufficient to substantiate my thesis, I also appreciate how my doing so is very much challenging a norm... and the examination process isn't the right place to do this. Examination is a pragmatic business, and when considered less on intellectual high ground and more in terms of personal risk and reputation, I could not reasonably have expected (at least not with high probability) their taking a different position.

For the “annoying” part, in short, I was far too idealistic in my conception of the PhD examination process. Of course it has some room for intellectual rigour; but virtually no research in any practical field has such unthreatened validity that examination doesn't fall back on “due diligence” to some extent. Another word for “due diligence” is “hoop-jumping”, and that really sums up why I think my thesis attracted the complaints that it did: it didn't jump enough established hoops to make the examiners feel comfortable rubber-stamping it. I'm not saying my thesis is great; it's fairly weak really---but it's no weaker than a lot of other theses which seem to pass without problem. I suppose the examiners did rubber-stamp it in the end, given that I passed---but subject to corrections which, unsurprisingly, make it jump an additional hoop. I don't feel that jumping this hoop substantiates the thesis any more strongly, and this is the centre of my annoyance.

A new rant about an old problem

My problem is not a new one. Introducing a new language is a relatively common thing for a CS researcher to do. Assuming the claimed benefit of the language is a practical one, rather than a theoretical one, then evaluating the language is a huge problem. PhD students don't have the time or the budget to carry out large field studies. Anyway, instead of this, the usual approaches are to prove something about the language, to show that it has reasonable performance, and/or to apply it to case studies. I'm going to be bold and claim that the first two are hoop-jumping in most cases. It's a rare case indeed where a language's goal is actually to realise the theoretical property in question or to “do X really fast”. (Of course, other kinds of work, in theory and systems respectively, do have these as express goals, but I'm talking about languages here, where “language” is distinct from “calculus”.)

It's reasonable to set for your language a performance or theoretical goals in addition to your main goal, as this can be a source of interesting problems and brings the work closer to applicability in practice or interest in theory. However, it really annoys me when people confuse these goals. I hate seeing papers which introduce some new language feature that is claimed to help programmers---the usual end goal of any language---and then evaluate it either by an irrelevant proof or irrelevant performance measurement. This has the effect of encouraging both a confusion between the main goal of a language and these side-goals, and moreover, encouraging a culture where evaluating the main goal is neglected in favour of the side-goals, or where the side goals are seen to imply the main goals.

Trouble with case studies

Case study evaluation is unsurprisingly the approach I chose. This might have passed muster, except that the other hoop I didn't jump through was producing a complete working implementation. This doesn't mean I didn't implement anything: I did a lot of implementation work during my PhD. But for various reasons, my reach had exceeded my grasp. I had plenty of working examples of the techniques I wrote about, but the code generation side of my compiler had got hairy enough that I decided that it should suffice to show implementability rather than implementation. I think I did this, and I don't think my examiners doubted it either, although they did mince some words on the subject. In the end, they were reluctant to accept this implementability evidence as sufficient defence of the thesis. I couldn't put my finger on why, and I wouldn't say they could, either. Instead, I only got some quite vague questions, in essentially four forms.

The first was: “How do you know your language features are sufficient?” Of course, I don't. Since I spent a whole chapter talking about cases that aren't handled, clearly I make no such claim (although I do identify what needs fixing and how this doesn't break the key abstractions of the language). I do claim that they're sufficient for the case studies, and that since these are representative of other code, that they will be sufficient for a wider range of code. This is demonstrated by argument and careful analysis of code rather than saying “we ran it”. But saying “we ran it” is still subject to error---since realistically, how much testing did you do, and how can you be sure it was enough? The case the examiners seemed to worry most about was the one where, by failing to account for some unspecified detail, some new language feature or altered semantics would be necessary just to handle the case studies themselves, never mind other examples to which I claimed it generalised. I think I provided quite a weight of evidence that this wasn't going to happen. However, even if it did, it would still a matter of error bars, not validity.

The second was: “How do you know you haven't forgotten something in your implementation?” Again, I don't, but I have implemented enough that the implementability isn't in doubt. Even if a fully working version would turn up one or two extra details that need addressing, this wouldn't undermine the thesis.

A final question: “How do you know your language features are necessary?” I still find this question bizarre. The language features exist to handle common cases in a way that saves programmer effort. Every feature is illustrated with a plausibly common example, and any experienced programmer would recognise its usefulness. This doesn't mean they couldn't be eliminated, but doing so would bring a strict degradation in what the language offers the programmer.

What didn't help was that the examiners didn't ask me these questions one at a time, but rather rotated among them with dizzying speed. It was though they themselves hadn't yet separated them in their own heads. Without this, I might have been able to fend them off better, along the above lines. As it was, I can't help feel I did well not to get too put out by it all. I nearly did lose my cool at one point where one examiner suddenly claimed that I needed to do a performance evaluation. I had very explicitly and very clearly excluded performance from any but informal consideration very early in the dissertation, precisely in order to prevent my task from blowing up even further than it already had. Fortunately I managed to argue this one down, although annoyingly, I still have to gather some (meaningless, but fairly trivial to collect) performance data for my corrections.

The “solution”

So, how did the examiners propose that I answer their objections? In the time-honoured hoop-jumping way: to finish the implementation, of course, so that I can say “I ran it”! Actually I only have to get it up to a certain level, rather than finishing everything, which I suppose is something to be glad about. But I had failed to complete my implementation for very good reasons: it's a ton of work, and it was already past the point where its feasibility was established. In hindsight I could have written up this fact better. But I think it was still clear that what remains is a matter of development---which I wasn't prepared to spend any more of my own money to fund, given that I'd already spent 6 months living off savings and consultancy work. Fortunately, circumstances now mean that I have a job which pays enough that by going part-time I can get it done while remaining solvent. (It also had to happen this way round, since if I hadn't been able to submit my thesis without a full implementation, I wouldn't have been able to get the job that is now, indirectly, paying for the implementation's completion.) Of course, my financial situation is an irrelevance as far examination goes, and it has to be that way. The moral is that there is no safety net, and nobody who is truly responsible for your thesis than yourself. The system is accountable to nobody, and it has no incentive for self-improvement... except maybe to the extent that (and over the timescales by which) PhD examinees who suffer negative experiences become examiners who can still remember them. “It's not fair!” as Jennifer Connolly once declaimed, “... but that's the way it is”.

The role of empirical rigour

At the moment, and probably since time immemorial, there is a cohort of CS researchers in the fields of programming languages and software engineering who are vociferously advocating greater empirical rigour in research. Early on in my PhD, I thought that this movement could only be bad news for little old me. I barely had the resources to produce an implementation within one PhD, never mind do an empirically rigorous user study. However, now I think that this movement is actually on my side (as well as the side of “truth” and good science, which I didn't doubt). The hoop-jumping that would have satisfied my examiners---producing a working implementation and running it---doesn't actually strengthen my thesis, and in an empirically rigorous discipline, this would be clear. In turn, it would probably be a more “done thing” to submit theses that don't tell a complete story---because telling a complete story on something complex as complex as a practical programming language, and doing so with empirical rigour, is too much work for one PhD. Perhaps it would be more acceptable to package research “towards” a goal, evidence but not yet conclusive evidence, with its outstanding threats to validity clearly explained, yet unresolved. Instead, in our empirically immature discipline, we try to brush these unresolved threats aside by arbitrary hoop-jumping.

The downside of a more empirically rigorous discipline would of course be that each researcher can't race ahead quite so fast. Within the scope of one PhD, there is far less prospect of telling a neat, complete story. In my case, this would have been both good and bad. For empirical rigour's sake, I would have to have spent much longer on case study, including (probably) starting my thesis with an empirical study. Perhaps all implementation would have to be left for the future, and my thesis's contribution would mostly be on understanding the problem empirically, with a paper sketch of the solution validated by detailed analysis of examples. Of course, this paper sketch would have a weight of evidence behind it. The downside is that I actually like the idea of implementing stuff, and even though I haven't (yet) finished the job (and I am now working on it, again), I would have found it frustrating to embark on a PhD with no intention of completing an implementation.

Conclusion

This post probably sounds like a lot of sour grapes, although I hope it doesn't. It's actually a positive thing for me that circumstances have conspired to give me a chance to finish the Cake implementation, since it will be a useful springboard for future work and perhaps even (gasp) impact. Previously, when I was resigned to not finishing it, it was looking like this would become an albatross. More generally though, I can't pretend not to be a little bit sour about the course my PhD took. Despite making what were defensible and reasonable moves at each stage, the whole thing turned into a bit of a mess and has caused me a lot of pain. However, the mess of the work (which could have been better, but I think was comfortably “good enough”) is a different mess from that of the examination. I am now very strongly convinced that there really is a problem with the attitudes to evidence, rigour and the mythical “completeness” in computer science. If I last long enough in this line of work, perhaps I can help do something about it.

[/research] permanent link


Powered by blosxom

validate this page