Rambles around computer science

Diverting trains of thought, wasting precious time

Tue, 30 Jun 2015

A position on licensing research software

I have a habit of putting code on my github page without any statement of license. Sometimes this is oversight or laziness, but sometimes it's entirely deliberate. It's not because I don't want to open-source or otherwise license my code; I do. But, as this post covers in some depth, the Devil is in the details.

In particular, I believe that software whose purpose is scientific or researchy—to exist, to prove a concept, to be extended by other researchers perhaps, but not to acquire users—can be better off without an explicit license, at least while it remains relatively immature and has only a small number of contributors. In short, this is because there's little practical loss from not licensing at this stage, whereas by forcing a choice of license, it's possible to choose wrongly.

For some reason, this belief, at least when compressed into 140-character snippets, attracted quite a lot of opprobrium. So allow me to explain my justification and respond to the criticisms. Before I go further, I should add that I'm definitely not pretending to be an expert on licenses, law, tech transfer, aggregate properties of big companies' IP policies, or anything else I'm writing about. Everything you're about to read is informed by limited personal experience and anecdote. In fact that's how I got here: I'm very aware that licensing is something I don't fully understand. For me, that's a strong motivation for avoiding making premature commitments that I might later regret. You have the opportunity of enlightening me, and might even change my position—but first please understand what my position is and why I hold it.

So, here goes: I claim that early-stage scientific proof-of-concept software is often better off without an explicit license. Naysayers (some hypothetical, but mostly real) claim the following.

“But then people can't legally use your software?” They can, but they should ask first.

“But people rarely bother to ask.” That's fine; this software is not aiming at getting lots of users.

“But you might miss out on users.” I don't want users.

“But you might miss out on collaborators.” I doubt it. Collaborators have to communicate, not just consume. They get in touch. Or, more likely, I meet them through other channels than their browsing my repositories online.

“But people aren't allowed to look at it without a license.” No, that's not how copyright works—quite the opposite. The code is published, so you're allowed to look.

“But company policies mean employees won't be allowed look at your code, even though it's allowed by law.” I've worked at two large IP-paranoid companies, and in both cases employees would happily exercise discretion about whether to look at code lacking a permissive license. (This was true even if official guidance left no room for their discretion.) Dealing with third-party code was also a well-established part of corporate culture. Among other things, this makes these developers much more likely to ask for permission than others. During my three months at one such company, in a very junior position, I was encouraged to make enquiries not only about licensing third-party software that appeared useful, but also about perhaps contracting the developer to support our effort. This surprised me at the time. But being a corporate entity dealing in non-permissively-licensed IP makes you more amenable to dealing with other people's IP in whatever way is appropriate. As I'll cover shortly, the well-known copyright paranoia (including what I'll call “GPL terror”) arises because of cases where the license on offer doesn't suit the prospecting company, and where there's good cause to fear litigation.

“But if they look at your code, they risk creating a derived work, so could get sued by your employer!” Companies know the pragmatics of IP better than this. If I were talking about Big Company Labs's github page, I'd agree that explicit licensing would be highly desirable. Big companies tend to be litigious. In all cases, the question is: who is liable to get sued by whom? But we're not talking about that. The risk to a company from simply looking at my code (emphasis important) is miniscule. The risk to me and to the public good, from premature commitment to a given license, is non-trivial, because it risks closing off opportunities later.

“Why not GPL your code?” I'm not against doing so in principle. In fact, even though I'm told it's falling out of popular favour, the GPL is one of my favourite licenses. But if I GPL'd my code, many more companies' employees would be scared off. Many big companies' legal departments have spawned corporate policies which foster a terror of GPL'd code. Partly this terror is justified: much GPL'd code is copyrighted by large competing companies, who don't want to relicense it and do want to sue infringers. However, the terror is not justified in all cases. I'm not a competing company, I am amenable to relicensing, and I'm not likely to sue infrigers anyway. So companies have no reason to be terrified of looking at my code. Unfortunately, many company procedures are written in a blanket style, with rules like “don't look at GPL'd code!”. that are not sensitive to these nuances.

“How is not putting any license possibly better? It confers strictly fewer rights!” I know it's perverse. But since these “don't look at GPL” policies exist, the absence of any GPL notice would actually make it more defensible, according to these policies, for employees to look at my code in any depth. The real problem here is that, as far as my anecdotes can see, many corporate legal people don't even understand the basic properties of the domain. Software doesn't “have” a license; it is licensed and can be licensed differently to different people. I sometimes say that licensing is “bilateral”, in an attempt to convey this multi-party “edge, not vertex” nature. Being GPL'd doesn't exclude other licensing possibilities. Sadly, corporate legal types tend to act as though it does. They're not the only ones. Some of my peers appear keen to (as I'll call it) “licensify” all scientific software. They seem to be acting on similar misunderstandings. I've even seen it written that one should not use the CRAPL because it allegedly makes it “impossible” to re-use the software. If you actually read and understand the CRAPL, it does no such thing (notwithstanding that as a license it is basically pointless). Even if the CRAPL were restrictive, the kind of software it's intended for—scientific software, as in my case—would almost always be available for relicensing anyway.

“Why not say “GPL, or other licenses by negotiation”?” It's not a terrible idea, but again, lack of nuance among corporate legal departments makes me believe that this will scare people off unnecessarily. I could be wrong about that.

“Why not MIT- or BSD-license your code?” I'm not against doing so in principle. But as a blanket policy, I worry that it'd weaken my ability to successfully do tech transfer later. In particular, the option of dual-licensing code as useful if you have already gone with a permissive license. This might weaken the position of any company set up to do the tech transfer. As I see it, this is the key difference between BSD and GPL from a researcher's point of view. In particular, note that pursuing tech transfer can be in the public interest (and bigger competitors are likely to be less public-spirited than a small start-up founded by scientists). I've nothing against permissive licenses in general; they just feel like too great a commitment to make in the early stages of a project. I happily release non-research software under these licenses (or even public domain) from time to time, as you can see. And I'll happily contribute under these licenses to other projects that have an established tech transfer model. The key points are that my early-stage projects don't yet have this, that there's more than one way to do it, and that licenses make a difference to what options are available.

“So you might later close off your research outputs for your own commercial gain?” Not at all. I'm dedicated to using my research outputs to create the greatest possible public social good. The question is how to do that. Whether BSD-style (permissive) or GPL-style (copyleft) licensing creates the most social good is still a matter of debate. It almost certainly varies case by case. Meanwhile, two other social goods need to be considered: being able to do more research, and actually getting research results transferred into practice. The commercial model can be useful in both of those regards, even though, ideologically speaking, it's not the model I prefer. It would be foolish to jeopardise these opportunities at the moment I publish my code, especially since nowadays the norm is to publish code early (no later than publishing the paper). Very often, at that stage it won't yet be clear how best to harness the code for social good.

“Aren't you inviting people to take the unreasonable risk of using your code without a license, which is breaking the law?” No. Anyone concerned with staying within the law is welcome to contact me about licensing terms. Anyone worried that I might sue them is welcome to do the same. This is no hardship: anyone interested in running or building on my experimental team-of-one research code almost certainly needs to contact me anyway—if not to build it, then to do anything useful with it. There's also an important separate point, which I'll return to shortly: scientific endeavour is built on personal contact and trust relationships, far more than either commercial or open-source software are built on those things. Even in the case of people who don't bother contacting me, the “reasonableness” of the alleged risk needs to be assessed in the context of the scientific culture.

“But in practice, if everyone followed your approach, won't people end up breaking the law more often?” Perhaps so. Most people break the law just going about their everyday business. Few of my countryfolk over the age of 30 have not transgressed our laws in countless ways: copying a tape, keeping a recording of a TV programme for longer than 28 days, wheeling a bike along a public footpath, borrowing their neighbour's ladder without asking, backing up a shop-bought DVD to their hard disk, and so on. Fortunately, the process of law enforcement involves discretion at the point of prosecution. This discretion is kept in check by social dynamics which, while mostly outside legal system, are very real. If my neighbour prosecuted me for borrowing his ladder, it'd render future reciprocal altruism considerably less likely. Suing good causes creates bad press, regardless of whether it can legally succeed. The law is overapproximate out of necessity, so that the genuinely problematic infringers can be pursued, not so that absolutely all contravening activity can be shut down. It presents the option to prosecute, not the imperative to do so. That's not an argument for egregious overapproximation in our laws. Obviously, legal overapproximateness can be vulnerable to abuse, and should be minimised by a process of continuous refinement. But aspiring towards eventually reaching a perfectly precise law is a hopeless ideal. The law is a practical tool, not a formal system—it is both ambiguous and inconsistent. The letter of the law needs to be viewed in the context of the legal process—one carried out with the frailty and imprecision that come from human weakness, and the malleability that comes from human discretion. In short, we are doomed to living with a background level of lawbreaking, and it's in our interest to live this way. With licensing, the right question to ask is: how can our use of licenses affect scientific research, and how should it? This means thinking about process.

“But don't big companies have good reason to be conservative? Won't this harm your relationship with them?” Yes, they do, but no, I don't believe so. Conservative policies are justified in a corporate context. For big companies in competitive markets, the stakes are high, and the capital cost of pursuing legal cases is affordable. If an equally large competitor had such a case, it could turn into a very costly court battle (regardless of who wins). That's why big companies are paranoid about IP: they're afraid of competitors. But they're not afraid of small guys, like researchers or universities. If I had a credible case that a big company were infringing my copyright, the company would very likely settle with me out of court. In any case, as I covered earlier, if they want to use my software, they will want to negotiate terms with me anyway.

“Isn't your attitude socially irresponsible?” I believe I have a stronger social conscience than most people I meet. Everything I've just said is a position I hold in the interest of maximising public good, whether or not my beliefs are correct or effective. So no, I don't believe I'm socially irresponsible at all. Here are some things that I believe are socially irresponsible: to allow corporate capture of publicly funded research outputs; to make licensing commitments of publicly funded work without very careful consideration; to needlessly jeopardise the future potential to exploit publicly-funded work to public-good ends; to allow the practices of big-company legal departments to get in the way of scientific endeavour.

“Licences are good, m'kay?” This seems to be some people's position, but it's not an argument.

“All sensible software projects have licenses!” Software that seeks users is one thing. I've already covered why I'm talking about something else.

“Why are you resisting doing the normal, “responsible” thing?” My situation isn't normal. In this abnormal situation, the normal thing is not necessarily the most responsible thing.

“GPL is not free, troll troll...” I'm trying not to get into that. But I do suspect that this whole debate is really GPL versus BSD in disguise. If you don't believe in copyleft, then the only open-source licenses you're likely to use are BSD or MIT, and there's not much useful flexibility to be gained by deferring the decision. So you can just license all your code straight away. BSD in effect chooses a fixed attitude to commercialisation (“anything is fine; what goes around comes around”) whereas GPL tends to create a fork in the road. I want to hang on to my option on that fork, until I know that doing away with it is the right thing to do. The fact that the permissive-versus-copyleft debate never dies is a powerful argument against coming down firmly on either side. It's better to take decisions in context, case by case. That's exactly the position I'm defending. Note that Bruce Montague's article advocating the BSD license model for tech transfer is right if you're talking about big projects—it's harder to commercialise GPL'd code than BSD'd code, and deliberately so—but completely wrong for most small research projects written by a small group of collaborators. In those cases, it's much easier to relicense the lot, in which case your options for transfer include a dual-license arrangement in the manner of Qt or MySQL. This still permits GPLing of the ongoing commercial development (or not), while retaining the option to license it for commercial development. It says a lot about how subtle the issues are that a simple tweak of context can more-or-less invert the up- and down-sides of the two licenses. (I actually find Montague's article quite fuzzy, even FUDdy, about the details—it only really says that BSD is “simpler”, which any old Einstein should be quick to challenge.)

“I once had licensing trouble, bad enough to require lawyers. Therefore, clearly explicit licensing is essential!” This is a complete non-sequitur. I don't doubt how problematic these things can get. But it doesn't mean that any blanket policy will improve things. If IP lawyers have come into a scene where science is supposed to be happening, it means something has already gone wrong. The culture clash is between science—where trust, good faith and cooperation are routinely assumed—and corporate licensing, where you can't trust anyone beyond what their contract or license binds them to. In short, this happens when big companies get involved and impose inappropriate licensing terms. These companies are where you should be directing your opprobrium—not at me.

To properly explain the latter point, I need to ditch the question-and-answer format (which was already becoming tedious, I'm sure). The argument came up in reference to the Richards benchmark. Let me summarise my understanding of it (corrections welcome). Martin Richards wrote and distributed some benchmarking code. It was distributed as a tarball, in the scientific spirit, with no license stated. It turned out (later) that he was happy to release it into the public domain. Trust, goodwill, reputation, responsibility, ethicality are currency among scientists. Martin's release occurred within that culture. My github repositories do likewise.

Some time later, Mario Wolczko produced a Java version of the Richards benchmark, while working at Sun Labs. Working for a big company makes a person more IP-fastidious than average. Mario very conscientiously sought permission from Martin (perhaps unnecessarily—I'm not convinced he was actually making a derived work, but perhaps he was; no matter). He was of course granted Martin's blessing. He circulated his new code, including to Martin. Martin even began distributing it in his tarball.

The catch was that Mario's employer was a large company, and would only allow the code to be released under some restrictive licensing terms. You can still read them. They come in two parts; the first page is statutory US export regulations while the second page contains Sun's own licensing terms. These include the following restriction.

“Licensee may not rent, lease, loan, sell, or distribute the Software, or any part of the Software.”

This is a severe restriction. It's especially inappropriate for a work of this kind—a benchmark. Benchmarking is a concern that cuts across research areas: it's not just people working on a particular technique or problem that want benchmarks. Anyone working on any aspect of a language implementation might want to apply some “standard” benchmarks. A benchmark is more useful the more things it has been applied to. That's why we often like to group them into suites and circulate them widely. This license is effectively incompatible with that use.

I'm sure Mario tried his best against the might of Sun legal. It's telling that his current web page advises the “caveat emptor” approach: make your own mind up about how likely you are to be sued, and act accordingly. I'd agree it's not good at all that we are being forced to take this approach around a big, litigious company (as Sun's new owner Oracle certainly is). But it seems bizarre to blame this on Martin's original sans-license release. The blame is entirely on Sun. The problem is not a lack of license statement from the little guy; it's an inappropriate license imposed by the big guy. The right response is to decline to use Sun's code. (That said, it's worth noting that Martin's redistribution of the Sun code, although technically a breach of the license terms, has not resulted in Sun or Oracle suing him. So perhaps they, too, understand the value of scientific culture and have no interest in disrupting it.)

To put all this more succinctly, the onus to license doesn't weigh equally on all parties. For artifacts coming out of big players, there's good reason to demand licenses—not just any license, but fit-for-purpose licenses—precisely because big players can afford to sue. By contrast, even the University of Cambridge is a comparatively little player. It is vanishingly unlikely to sue anyone pursuing public-good scientific or research agendas, not only because it's expensive to do so, but because the reputational damage would be huge. And even if you disagree that this particular university isn't a big player, that's immaterial because it doesn't own my research work's copyright anyway; I do.

I, like Martin, work within the scientific culture as a non-litigious “little guy”. It should go without saying that redistribution of my software towards scientific ends will always attract my blessing, not lawsuits—even if people don't ask first (mildly rude as that may be). Meanwhile, for commercial use, I'd like to defer fixing the exact terms until there's serious interest—so that I have some idea of how it's going to be used, and can choose terms appropriately. The pragmatics of “no license specified” and “restrictive license specified” are wildly different, even though a hard-headed legal interpretation makes the former a degenerate case of the latter.

The push towards “licensification” of scientific software actually works against this culture, since it erodes the norm that says that permission will be granted for doing reasonable things for scientific purposes. That's yet another reason why I resist this way of working. Licensification is also viral. If you believe strongly that all released artifacts should “have” “a license”, then you have to follow this to its transitive closure: to release anything that builds on anything, you must obtain a precise statement of license from all derived-from works. This is justified for commercially valuable software, but I don't believe it's the way science needs to work. And if you share my belief that premature commitment can do more harm than good, then we agree that science ought not to work that way.

You might complain that this “scientific culture” I've been talking about is nebulous. I'd agree. But capacity and inclination to sue are reasonable proxies. Call me naive, but if you really worry about the legality of doing science with artifacts released publicly by a reputable scientist working for a university, either you worry too much or the world has gone very wrong. On the other hand, it is reasonable to worry about artifacts released by big companies, because the culture they operate in is very different. That's why we need to pressure them into offering reasonable terms for the things they release, and decline to use their artifacts if they don't. That's what conspicuously failed to happen in the Sun/Richards case.

As more and more scientific research involves code, licensing of code among scientists is going to become much more of an issue. Misunderstandings are already rife: a quick bit of web searching will easily turn up things like this and this. We need to have a frank discussion about what the real issues are, and an open acknowledgement of their subtlety. I believe that blanket licensification is not the answer. It forces a decision to be taken too early, when too little information and expertise are available. I also fear that it will encourage a culture of litigiousness and bureaucracy that is neither desirable nor affordable within science.

As one tentative suggestion for how to make the law work better for us, I note that US copyright law's “fair use” allowance already covers educational purposes. Ideally it should also cover public-good non-commercial scientific purposes. In fact, this already is a factor in assessing fair use. But it's ill-defined, and I'm not aware of any useful case law on that front (are you?). If we want something to work towards, this kind of fair use law is what it should be—not the up-front licensification of every published scientific artifact.

Just as a coda: the above is what I currently believe, based on limited study and my very limited awareness of evidence, case law and the like. Although that belief has withstood a certain exposure to others telling me how wrong I am, it's not a conviction I'm strongly attached to. I went to the effort of writing the above not because I'm convinced I'm right, but because I don't like being branded “socially irresponsible”. Challenge my argument if you like, but don't doubt that I treat this issue with due care and attention. I'm still waiting for the counter-argument that will make me change my position. You might well have it. If you do, please let me know.

[/research] permanent link contact

Wed, 17 Jun 2015

“Type safety” and observability: an irony

As I've written before, people mean a lot of different things when they say “type-safety”.

The meaning I like best is that of Ungar et al: “the behavior of any program, correct or not, can be easily understood in terms of the source-level language semantics”. This is a dynamic property. It means we can explain failures in terms of the source language; they can't make the program do strange or arbitrary things. This is usually what people mean when they talk informally about “type-safety” in relatively lower-level languages like C, C++, or even Java, in dynamic languages (the context Ungar and friends were coming from), and also by Krishnamurthi and Felleisen. Language safety is an absence of corrupting failures; only understandable, explainable failures are allowed. (I find it irksome that “type-safety” has come to denote this general safety property rather than anything directly to do with type errors. Oh well....)

If type safety is something to do with explaining, what is an “explanation”? The most common kind of explanation is a stack trace. Stack traces are interesting because they combine a certain amount of live execution state and a certain amount of execution history. They tell us not only how the program was trying to proceed (its continuation), but also how it got where it is (its history). An explanatory stack trace will be heavy on the historical information, even if that information is not needed for continuation, because “how it got there” generally offers a clearer explanation than “what it would have done next”. For example, an ideal stack trace would include not only a list of active function calls on the stack, but also the values of their parameters—ideally the parameters' values on entry. Many of these values will be “dead” when the trace is generated: execution has moved on since the entry to the call, so those values are history (literally). This history is useless to the program in continuing its execution, but is invaluable to the programmer reading a stack trace, because they serve as explanation. This phenomenon—that “dead” state can nevertheless be useful as explanation—is one we'll return to.

If one kind of “type safety” is defined in terms of explanation of failure states, what's another sense of type safety? It's that “well-typed programs don't go wrong”. This is a completely different property, and I prefer to call it type-soundness. (Also note that it's using a different meaning of “type”.) In a type-sound language, by the time we run a program, we've proved that certain errors won't happen. The system for doing that proof is called the “type system”.

Now for the irony. Type-soundness is used to justify elimination of explanation in ML implementations. ML-family languages' type-soundness property means that the implementation can do away with run-time metadata (tags). This has the side-effect of removing the explanatory value that these tags would otherwise have. According to our initial definition, “type safety” was all about explanation.

But if we've proven type errors won't happen, there'll be no need to explain them, right? This us true, but the problem is that type errors are not the only errors that type tags can help explain. Even though no type errors will occur, other run-time failures will happen, and they are in need of explanation. Type tags are an obviously useful artifact for generating such explanations at run time. If we want to generate a stack trace, type tags make it easy to implement a more useful explanation—including the valuations of local variables, say. Without them, this becomes very hard.

Are other run-time errors really inevitable? The short answer is yes. Any general-purpose programming language, no matter how “safe”, has to be accept the fact that some run-time errors, for example unhandled exceptions or assertion failures, will be feasible. This is simply because computation is necessarily input-dependent. Even if you prove that your program never fails when run on in-spec input, out-of-spec input has to be detected and handled somewhere. No useful program runs in a completely closed system. (If you can safely assume you don't get any out-of-spec input, it's because it's been filtered out somewhere else in the wider system.)

Are type tags really necessary to present useful stack traces? No, not strictly. We can think of various schemes for restoring this ability, and my previous post goes into this in some depth. But these are much harder than just using tags, and it shows: implementers don't bother with them. On most implementations of ML-family languages, if a fatal exception occurs, the explanation offered will be, at best, a list of source-code locations (function names, possibly with file and line numbers) on the stack. It won't include valuations of locals, or anything about the concrete types of values. That's because the run-time tags which would allow printing out these things have been removed. In short, run-time failures are actively made less explainable by the erasure of type tags. (I'm told that AliceML is better than other MLs in this respect, though I haven't yet tried it myself.)

We should not be impressed by the “look, no tags!” property of ML unless some equivalent mechanism is in place to retain the degree of explainability which those tags afforded.

In theoretically-inclined PL literature, hard-to-formalise concepts such as debuggability or explainability—we could say “usability” generally—don't factor into how a design is evaluated. This is fine; it's simply Dijkstra's “separation of concerns” in action. Formalisation necessarily discards all details that are inessential to some specific (formalised) problem. A corollary is that problems that can't be formally stated are never addressed by theoretical research. Again, that's fine too. But producing a practical artifact calls for some kind of “systems thinking”: taking a much broader view than that of the theoretical results which may have inspired it. Any practical scenario is the superimposition of a collection of problems. Even for those that are amenable to formalisation, theoretically pleasing approaches to each one in isolation won't necessarily compose into a useful whole. The lesson of ML is that we have to be incredibly circumspect about transferring theoretical results, like erasability of tags, into programming practice. There must be a serious attempt to elicit and accommodate unformalised requirements. This needn't be deep or difficult: it's not rocket science to anticipate that we might want to observe the run-time state of an ML program, such as when explaining a run-time error. I find it sad that a succession of real ML implementations, all built by incredibly smart people, have adopted an unobservable, unexplainable approach for no good reason. Mistakes like this are holding us back.

[/research] permanent link contact

Mon, 24 Nov 2014

I hate systems research... sort of

[I wrote this article on 6th August 2008, when I was a second-year PhD student, but then didn't have the guts to post it. Looking back, it's not so controversial, although it was obviously shaped by my (relatively narrow) experiences as a PhD student in a particular corner of the Networks and Operating Systems group here in Cambridge (a group I am very happy to be associated with, I should add).]

David Byrne, a well-known proponent of musics from around the world, once wrote an article entitled “I hate world music”. He hadn't experienced a sudden change in his taste, but was referring to the label of “world music”. By labelling all music that doesn't conform to a certain narrow set of Western-heritage styles as “world music”, it can be conveniently marginalised and dismissed as a unit.

So too with computer science research, labels are used to divide and marginalise. This time, it's the inside marginalising the outside. A recent EuroSys call for papers invited papers on “systems aspects of...” a long list of topics. Despite having been working in a systems research group for nearly three years, nobody has ever told me what “systems aspects” are. Yet most people who'd describe themselves as “systems researchers” seem to wear that badge with some degree of pride. Until someone can tell me what it means and why it's such a good thing, I won't be doing the same.

What defines systems researchers? Is it that they “build stuff”? This is perhaps necessary but hardly sufficient. Is it that they meddle with operating systems, networking stacks or other infrastructure software? Perhaps, but it seems to cover a lot more ground than that to which the label is commonly applied. My preferred take is that “systems” is simply a label, of historical origins, which now identifies a research community—a community which, like all social networks, is defined only by the links among its members and the continuity of that relation's evolution over time. One thing that has continued to surprise and frustrate me about the world of research is that despite huge overlaps of interest, it is divided much more than it is united: divided into largely separate, non-cooperating communities. Many systems researchers would be horrified (thanks to a snobbery I will come back to) if you claimed that their work overlapped with that of the software engineering community. Yet having attended in the last year both EuroSys and ESEC/FSE, I estimate that at least 40% of the papers I saw at each of those conferences could easily have been presented at the other without any sense of incongruity whatsoever.

Last week at the Semantics Lunch, Mike Hicks gave an excellent talk about CMod, the module system for C developed at the University of Maryland. Unlike other Semantics Lunches, the advertisement was cc'd to my research group's list. I get the e-mails anyway because I'm subscribed to other lists, but the obvious inference is the impression that systems researchers will be interested in any languages research that concerns C, but not in any other. This impression is certainly not fair, but the exaggerated impression does indeed stem from a grain of truth.

One way of getting your tools- or languages-oriented paper published in the systems community is to show how it applies to “systems code”. This almost always means one of three things: performance-critical code, resource-constrained code, or code written in C. These are all pretty general requirements, and fully relevant to the widest imaginable “software engineering” community. Regarding the last of the three requirements, of course there are many great reasons for creating tools that work with C, given the astronomically huge volume of existing software that is written in it. Working with C is certainly a priority for my own work. However, everyone knows that C is a language with many gigantic problems, and ones that have been known for decades. Cynically, I would claim that part of the reason that C-language developments, both code and tools, continue to flourish within the “systems community” is an endemic disdain for programming language advances, or in general of any research community that isn't “systems”.

This separatism comes in more than one form. There is a certain air of what I call “macho bullshit”, where, among other things, there is pride is associated with writing unnecessarily nasty code in unnecessarily low-level languages. Now, I have seen some exceptionally clean and well-engineered C programs, particularly ones coming out of my own research group, so this isn't any sort of blanket statement. I've also seen, again locally, many systems researchers who are heavily into language advances. But the other side of the coin is there too: the disdain and the separatism are definitely things I've observed around here, and I don't like them. They're joined by an attitude to criticism: there appears to be an unwelcome degree of sniping and dismissiveness, found in reviews and programme committees, together with a bizarre culture of unnecessary controversy, as typically found in workshops with “Hot” in their title. It seems there are too many systems researchers prepared to forward any criticism of others' work, however invalid, and pronounce any self-aggrandising position, however obtuse and misleadingly argued. This is, in a word, unprofessional, and it undermines the very goals of research. I don't have enough experience to know either how widespread these attitudes are, or how much more systems research suffers from them relative to other fields.

[/research] permanent link contact

Fri, 13 Jun 2014

Linking and loading: what's incidental?

Not long ago I gave a couple of talks about linking, loading and debugging here in Cambridge. It's a notoriously grungy area of our infrastructure, and a recurring question people asked me after the talks was as follows: what's the incidental complexity and what's the intrinsic complexity? Put differently: how might we redesign the whole lot nowadays, to get a less complex end result?

It's hard to answer, and partly it depends on what you value. On the surface, it seems like linking should be conceptually simple. Digging deeper though, it becomes apparent that there are a lot of concerns involved, and they are often in tension, making it very tricky for any simple solution to satisfy all requirements. That doesn't mean that simpler solutions are not possible, but we should not underestimate the intrinic complexity either.

What intrinsic complexity am I talking about? I can see four concerns central to the design of ELF dynamic linking. One is link speed: linking should be as fast as possible. Another is memory sharing: shared libraries were introduced (in part) to save memory. Another is flexibility, achieved through interposability: it should be possible to customise the link to add or replace features without modifying binaries. The last is federation: ELF is shared by many vendors' toolchains and many OSes, so the ELF runtime necessarily has a descriptive role, which is the basis for cross-toolchain runtime protocols for stack walking, exception handling and the like.

Part of the problem is that many people don't value all four of these concerns simultaneously. Link speed is valued by software distributors (who want to cut down startup times) and people who use embedded platforms (who can't afford overheads), but not workaday developers or, directly, users. Position-independence is valued by people who run the kind of systems where it saves memory, like desktop systems, and not by those who don't, like most servers. (Plan 9's decision not to bother with shared libraries is both reasonable, if you run mostly servers, and unreasonable, if you run mostly large desktop applications.) Interposability is valued by people who write tools and those with unusual requirements—which often entail reimplementing selected parts of core libraries (think malloc())—but not by those who don't. Federation is valued by those who value “openness”—competition among vendors, interoperation of tools, multi-language programming, and so on—but not by those who are comfortable with closed platforms, VM-style designs and/or “one true language”.

Few people care about all things simultaneously, so linking seems overcomplicated to everyone. But all these concerns add to complexity individually, and the tension between them also increases complexity. Link speed concerns create incentives for doing more work in the compiler or compile-time linker. This creates “premature optimisation” phenomena which complicate linker implementations (by adding to the variety of relocations) and also complicate the user-facing parts of the linker (such as getting the link options right). Sharing memory entails position independence; this creates complexity for the compiler author, who must support multiple code models, and the user who must grok the related compiler options. Interposition creates complexity for the linker author and user, and costs in performance since it prevents some binding happening earlier than it otherwise could. Federation adds complexity for the compiler author, who must take care to correctly follow conventions and emit metadata such as unwind information and debugging information. It also brings some cost at run time, such as how a cross-language cross-vendor exception protocol is slower than a custom one.

What design decisions can we rethink to get a system that satisfies all these requirements but with less incidental complexity? I believe that a smarter linker could help. The Unix-like linker design, as evolved to elaborate formats such as ELF, can be described as “smart format, dumb linker”. A key “dumbness” of the linker is that it does not know about instruction streams. Rather, the linking format describes all the ways that relocations can be applied as simple arithmetic on byte sequences. The linker's output is a mostly-deterministic function of its inputs (including linker scripts).

If we had a smart linker, we could have slightly dumber compilers—a win for simplicity since there are many more compiler implementations than linker implementations. We could also have less user-facing complexity: the compiler could just output code according to the slowest but most flexible code model, and the linker could take care of optimisations at the instruction level, including position independence. The key difference between such a linker and the current one is that a linker that can do these things must be able to parse and rewrite instruction streams. Current designs have been crafted to avoid this, but I believe this is a good fit for linking and not the burden that it first appears to be.

A smarter linker needn't yield slower programs than a dumb one. In fact it could speed them up. Link-time optimisation is rarely used at present, and not at all across dynamic linking. Meanwhile, interposition has nontrivial run-time costs in some cases, since symbol bindings must be computed at load time and run time. A smarter linker could use its awareness of instruction streams to deploy a variety of strategies to speculatively bind, cache and unbind references, both in disk images (which can be thought of as a less ad-hoc prelink) and in memory (e.g. inlining calls that currently go through the PLT). The former option can be thought of as link-time deoptimisation: we store on disk a binary that has been adaptively optimised for the common case on the particular system, and then if we find that an unusual link requirement comes along, the linker can produce a more flexible deoptimised version. The idea of “linkers as servers” has been explored before (e.g. in the OMOS linker, from Utah) but not in this particular way.

Compatibility concerns have also greatly complicated linking. Before shared libraries, things were fairly simple. To link against the C library, you link with -lc. This is just a bunch of object code in an archive. Contrast that with now. Shared libraries have been added in a “compatible” way that sits alongside, but doesn't replace, archives. As a result, we use both, and get double the complexity. For example, on GNU platforms we mostly use shared libraries, but libc_nonshared and -lgcc et al complicate all that. Since “compatible” shared libraries could not quite offer a complete solution for replacing the old-style linking, we have bifurcated the previous clean static linking model into a mostly-shared-but-partly-static model. A smarter linker could internalise the goal of code sharing. By rewriting instruction streams at load time, any compiled code is potentially shareable, and it's the linker's job to extract an adequate degree of sharing. There's no need for the user-facing complexity of distinguishing shared from non-shared linking.

Shared libraries themselves must bring some added complexity, since being (logically) shareable is a stronger requirement on a chunk of code than having it work only in a single context. In particular, robust shared libraries require symbol versioning or some other notion of “semantics”, to account for the inevitable evolution of software. Current designs for symbol versioning are themselves a great example of unnecessary complexity. Again, they were designed to be a transparent extension to what already existed—versionless shared libraries. This has yielded an overly circuitous solution. By pushing symbol versions into the symbol name, a lot of complexity is pushed to other contexts. Most irritatingly, cients of dlsym() are essentially broken by it, because they must deal with symbol names munged to include versions (or must switch to using dlvsym()). The “exact match” semantics offered by dlsym() no longer offer the right abstraction when looking up symbols provided by versioned libraries. But this solution was accepted because it didn't syntactically break anything (and semantics are the user's problem, right?). The ideal solution would look up symbols not (just) by name but also (partly) by their semantics.

Interposition is also arguably not a fully realised idea. One problem is that not every symbol is interposable, because of the static/dynamic split I mentioned earlier: some programs, and some parts of most programs, are still linked statically. Moreover, “interposition misses” easily occur where the interposer doesn't interpose on all the def–use edges it's supposed to, For example, in glibc, many library-internal calls to low-level functions like mmap() don't go via the publicly-exposed symbol. Calls that appear to be to open() might actually be linked to a mysterious alias of that symbol, such as __open_2(). Although interposition is still useful, the possibility of “missed” calls and the need to grok undocumented binary interfaces like __open_2() make interposition-based software fragile and expensive to maintain. This could be fixed by a more uniform dynamic linking model and a more semantically-founded notion of interposition. One that might work would be interposition based on code paths, not entry points. Again, implementing such a notion inside a linker would require that linker to understand instruction streams and the control flows they allow. When we ask to interpose on a (function) symbol, what we're really asking is a bit more like to insert a new call-graph node that dominates all those in the interposed-on call graph. (It's a bit more complicated than that, since this alone doesn't solve the glibc problem I mentioned, but this basic idea can likely be extended to yield the interposition semantics we really want.)

Hardware complexity also leaks into linkers. The root cause of complexity in relocation and code models is the diversity of addressing modes offered by hardware. Linker authors and others working at the toolchain level tend to take on trust the good sense of the hardware designers' decisions. Whether this complexity is intrinsic or incidental depends on whether we allow ourselves to rethink the hardware too. For the moment I'm going to take this diversity as a hard constraint, but I'd welcome views from more hardware-minded people.

All this is not to say that old-fashioned complexity creep isn't behind its share of problems. Debugging information is a great example of “limb growth” and is crying out for a cleaner design, ideally one that helps reduce the burden on compiler authors and/or improve the accuracy of the debugging information they generate. For linking proper, though, the old-fashioned Unix model of “dumb linking” has stayed relatively clean. It's been keeping that model, in the face of new requirements such as shared libraries, versioning and so on, that has complicated things. I believe it's no longer the right model. Luckily, we can produce a smarter linker that is nevertheless a drop-in replacement for the standard one, so we have an evolutionary path towards something better.

[/research] permanent link contact

Wed, 14 May 2014

Instrumenting casts in C++

A reviewer of my libcrunch paper wanted to see me back up my claim that its architecture is “multi-language” by demonstrating support for another language. In particular, for some strange reason, he wanted C++ support.

I had shyed away from C++ support because C++ front-ends are infamously complex. I don't know of a particularly friendly system for instrumenting C++ code. But on reflection, perhaps this isn't necessary. One of the joys of C++ (yes, you heard me) is that it's designed so that user-defined constructs are more-or-less on par with built-in constructs. My instrumentation needs to instrument pointer casts which would otherwise be unchecked. If you follow good C++ style, these are done using static_cast and reinterpret_cast. (Note that dynamic_cast is already checked by the runtime.)

Can we define our own checked_cast template that is a drop-in replacement for these two built-in operators? Then all we need is to #define the latter to the former and we've done the instrumentation.

I've managed to get something that appears to work well enough, but this is easier said than done. We might be tempted to write a function template like the following.

template <class From, class To>
To checked_cast(const From& f)
    // ... do our check, then...
    return static_cast<To>(f);  // or reinterpret

But that's no good, because we need to distinguish the case where To is a pointer from where it's not. We can't partially specialise function templates, so we can't do this. It's also no good because some instances of static_cast need to be constexpr. Indeed, static_cast to and from integers and chars is used extensively in header files, in constexpr contexts such as expressions which size arrays, or the bodies of inline library functions that are themselves constexpr. We need a more fine-grained approach.

Let's try a class template. This is awkward because casts have to look like function invocations. More specifically, we're instrumenting expressions of the form cast_operator<T>(expr) and we need something that drops in where the cast_operator bit goes—we can't modify the T or expr parts, or add anything afterwards. What seems to work is defining a class template such that the cast expression becomes a constructor invocation of a class we provide, say checked_cast_t<T>. We then define a type conversion operator to turn it back into something of the result type. The compiler inserts the conversion implicitly, so we have a drop-in replacement.

template <typename To>
struct checked_static_cast_t
    To temp; 
    operator To() const { return std::move(temp); }
    // construct a temporary, which decays 
    template <class From>
    checked_static_cast_t(const From& arg) : temp(static_cast<To>(arg)) {}
    // overloaded constructors go here

// ... specialisations go here

#define static_cast checked_static_cast_t

We need to specialise the above to make various cases constexpr. In particular, we want all casts to integral types to be constexpr, and all casts from integral types to be constexpr if they're not going to a pointer type. We handle the former by specialising the whole template, instantiating To for the various built-in integral types. We handle the latter by overloading constructors to be constexpr (which, perhaps surprisingly, does work). We also need constructors that take rvalue references. All that expands out to a lot of tedious code, but the result happily compiles various standard library headers that make use of static_cast and reinterpret_cast. We can then supply the header to our compiler using a command-line -include option, and hey presto: we've instrumented casts in C++ using just the C++ language itself.

It's not perfect, though. We can't instrument C-style casts. It's also certainly possible to construct code that compiles without the instrumentation but not with it, by requiring pointer casts whose results are constexpr (and which, by definition, we refuse to provide). In particular, it'd be nice if constexpr were a modifier that can be a basis for overloading—then we could ensure that a cast for constexpr input always yield a constexpr results, by overloading this case explicitly. As it is, I'm hoping it's good enough; I'll post back later when I have more experience of using it.

[/research] permanent link contact

Tue, 08 Apr 2014

Dynamic linking and security

The thoughts in this post were provoked after reading Tim Brown's very interesting Breaking the Links article.

Dynamic linkers are notable for privilege escalation bugs. The reason is their interaction with the setuid mechanism, and indeed any mechanism that associates privileges with an executable. Unix's original model where executables are trusted in their entirety is fundamentally flawed on modern platforms that have shared libraries, where executables usually link in other code, some of which can be supplied by the user. Rather than getting rid of the now-flawed setuid mechanism, currently dynamic linkers instead impose a raft of ad-hoc restrictions, lashed together in the hope of closing off any paths by which user-supplied code can get into a setuid process. They must also balance this against another goal: to avoid the unwanted side-effect of ruling out some perfectly trustworthy compositions. Unfortunately, these ad-hoc measures invariably fail on both counts.

What does setuid mean? It means that the invoking user has access to any behaviour allowed by the setuid program, as executing with the program owner's effective uid. Attackers escalate their privileges by introducing unanticipated code which widens that set of behaviours. Can we take a better approach? One naive idea would be to construct the process as normal, and then check that it includes only trusted code; at that point, we decide whether it runs with elevated privileges or not. (A wart is that we also have to account for code loading after the start of execution; let's ignore that for now.)

Does this break anything? Certainly it will do. I might run a job that spawns a process tree in which some subprocess is setuid. Normally I can run the tree with some other library LD_PRELOADed, expecting that although my library won't get preloaded into the setuid process, that process will still run with elevated privileges. Under our proposed new model, if we do the preloading then discover that the preloaded library is not trustworthy, we will run it with lower privileges, and likely break the process tree (assuming the process really needed to be setuid).


This is a feature interaction, and what we need is a policy for resolving the interaction. Current Unices have the policy that “setuid clobbers LD_PRELOAD”. The alternative we just considered is that “LD_PRELOAD clobbers setuid”. Neither of these seems adequate. Perhaps instead we can evolve things towards a more subtle mechanism that can avoid the interaction in the first place, perhaps by selecting among untrusted and trusted libraries. For example, if there are multiple available versions of some library, we might use the more trustworthy one instead of the one that a fixed set of name lookup rules guides us towards. In general, we can see this as resolving ambiguity among a set of depended-on library specifications in a way that maximises value (both trust and functionality).

Doing so requires a way to designate what code is trusted, not just what executables are trusted. We also need a sense of what alternative ways there are of satisfying “the same” link requirement. I have been using the example of LD_PRELOAD so far, but on ELF platforms, link requirements (beyond the executable) are specified as either a PRELOAD or (more often) a NEEDED, a.k.a the DT_NEEDED header of ELF's .dynamic section.

To find alternative implementations of “the same” requirement, we can mine the redundancy inherent in RUNPATH, LD_LIBRARY_PATH and perhaps the multiple notions of ORIGIN that can created by hard-linking. Each of these might provide multiple candidate libraries. Setting up a fake ORIGIN is a trick familiar to crackers, but we can turn it around by enumerating all possible ORIGINs of a given shared object and considering all the libraries we find there. (Sadly this requires a scan over all directories in the filesystem, and in the common case will yield only one library. But this approach will defeat link-based attacks, since even after hard-linking, we will still find the original library, and any sensible trust heuristic will select it in preference.) The ABI tag matching (modified by LD_ASSUME_KERNEL) is another instance of how the linker will look for libraries in particular places satisfying particular properties, in a way that is currently very rigid but could be generalised into a search/optimisation problem where paths supplied by developers, administrators and users are used as hints and bootstrapping input, rather than definitive instructions.

This approach brings two further problems. Firstly, what's to prevent us from choosing a probable-looking binary that is semantically broken (with respect to our use of it)? We can argue that all binaries with the same soname should be interchangeable, but in practice there will be difficulties. And matching by soname might be too restrictive anyway. Secondly, like any search- or rule-based system, our approach has a “delocalising” effect, lessening the administrator's explicit control and making the linker's behaviour more complex to configure and debug.

Another subtlety is that trust in the part is not the same as trust in the whole. Even if we refine Unix's notion of trustedness down to libraries rather than just executables, some exploits can work by combining trusted code in untrusted ways. The case of another linker exploit, CVE-2010-3856, is one instance of this: the library is sane enough that it could easily be deemed trusted, but we can construct a very specific context in which it is anything but. (This context is: use it as a linker-auditing library to a setuid binary, causing its constructor to be run with elevated EUID, hence allowing a temporary file exploit that would not emerge in “normal” contexts where the constructor did not have elevated privileges.) This is a classic “confused deputy” situation.

Confused deputies are always a good argument for yet finer-grained models of privilege, such as capabilities. So it's not clear whether we would get much security value from search-based link-time composition, relative to plumbing a better model more deeply into our operating system.

[/research] permanent link contact

Mon, 13 Jan 2014

C libraries and linking

At my talk today, Simon PJ asked an interesting question which I managed to give a slightly wrong answer to. I had observed that asking my C compiler to link an object file invoked the linker with a lot of extra input files, many of which are specific to the particular C library implementation being linked to. Note the various crt*.o files in the following link command concocted by gcc. These files come from the GNU C library.

$ gcc -### -o hello hello.o 
/usr/local/libexec/gcc/x86_64-unknown-linux-gnu/4.8.0/collect2 \
  --eh-frame-hdr \
  -m elf_x86_64 \
  -dynamic-linker /lib64/ld-linux-x86-64.so.2 \
  -o hello \
  /usr/lib/x86_64-linux-gnu/crt1.o /usr/lib/x86_64-linux-gnu/crti.o \
  /usr/local/lib/gcc/x86_64-unknown-linux-gnu/4.8.0/crtbegin.o \
  -L/usr/local/lib/gcc/x86_64-unknown-linux-gnu/4.8.0 \
  -L/usr/local/lib/gcc/x86_64-unknown-linux-gnu/4.8.0/../../../x86_64-linux-gnu \
  -L/usr/local/lib/gcc/x86_64-unknown-linux-gnu/4.8.0/../../../../lib64 -L/lib/x86_64-linux-gnu \
  -L/lib/../lib64 -L/usr/lib/x86_64-linux-gnu \
  -L/usr/local/lib/gcc/x86_64-unknown-linux-gnu/4.8.0/../../.. \
  hello.o \
  -lgcc \
  --as-needed -lgcc_s --no-as-needed \
  -lc \
  -lgcc \
  --as-needed -lgcc_s --no-as-needed \
  /usr/local/lib/gcc/x86_64-unknown-linux-gnu/4.8.0/crtend.o \

What does this mean if I've compiled some of my program with compiler A (from some vendor whose C library is in /usr/A/libc.a, say) and some with compiler B (from another vendor whose C library is in /usr/B/libc.a)?

It's tempting to say that C compilers are strongly coupled to their library, so we must link via some unique C compiler and use only its library. Does this preclude using another C compiler for some of our program? I answered more-or-less in the affirmative... but it's not true! There are two clear (in hindsight) bits of evidence to the contrary.

The first is that empirically, it's easy to see the same C library being used by multiple compilers. The important thing is that there's only one set of library headers. When I install clang on my Linux box, it happily uses the incumbent glibc library headers when compiling. When linking, it happily issues the right linker command to link with the glibc binaries. Indeed, it issues a very similar linker command to the one we saw earlier. We can again see the glibc-provided crt*.o objects being linked in.

$ clang -### -o hello hello.o
Ubuntu clang version 3.2-1~exp9ubuntu1 (tags/RELEASE_32/final) (based on LLVM 3.2)
Target: x86_64-pc-linux-gnu
Thread model: posix
 "/usr/bin/ld" "-z" "relro" "--hash-style=gnu" "--build-id" "--eh-frame-hdr" \
 "-m" "elf_x86_64" "-dynamic-linker" "/lib64/ld-linux-x86-64.so.2" \
 "-o" "hello" \
 "/usr/bin/../lib/gcc/x86_64-linux-gnu/4.7/../../../x86_64-linux-gnu/crt1.o" \
 "/usr/bin/../lib/gcc/x86_64-linux-gnu/4.7/../../../x86_64-linux-gnu/crti.o" \
 "/usr/bin/../lib/gcc/x86_64-linux-gnu/4.7/crtbegin.o" \
 "-L/usr/bin/../lib/gcc/x86_64-linux-gnu/4.7" \
 "-L/usr/bin/../lib/gcc/x86_64-linux-gnu/4.7/../../../x86_64-linux-gnu" \
 "-L/lib/x86_64-linux-gnu" \
 "-L/lib/../lib64" "-L/usr/lib/x86_64-linux-gnu" \
 "-L/usr/bin/../lib/gcc/x86_64-linux-gnu/4.7/../../.." "-L/lib" "-L/usr/lib" \
 "hello.o" "-lgcc" "--as-needed" "-lgcc_s" "--no-as-needed" "-lc" "-lgcc" \
 "--as-needed" "-lgcc_s" "--no-as-needed" \
 "/usr/bin/../lib/gcc/x86_64-linux-gnu/4.7/crtend.o" \

But how does it know about these files? The answer is worse than I had imagined. The file lib/Driver/ToolChains.cpp in clang's codebase embodies a ton of knowledge about linking on different platforms—even down to individual GNU/Linux distributions and versions thereof. Unsurprisingly, this is fragile and has been known to spawn bugs, like this one.

The second bit of evidence is in how most C compilers let you tell them to use a foreign set of headers, which could be from any C library implementation we like. To avoid the “standard” headers in /usr/include you need to use an option like -nostdinc, and then use -I to point it at the right headers. Assuming the headers don't use any non-standard features, there's no reason why any compiler couldn't generate code using any other vendor's library headers.

Of course, “there's no reason why not” often translates to “it is unfortunately impossible”. A data point in this case is provided by uClibc, a small replacement C library, whose FAQ notes that “it is possible in some limited cases to re-use an existing glibc toolchain and subvert it into building uClibc binaries by using gcc commands such as -nostdlib and -nostdinc... [but] it proved impossible to completely subvert an existing toolchain in many cases.” I'd love to dig into the details of what didn't work, but that will have to wait for another day. (It might just amount to the same issue we've noted, i.e., making the C compiler generate the right link commands... but it wouldn't surprise me if there's more to it.)

Another case I haven't handled: what about multiple C libraries (or parts thereof) in the same process? An obvious problem is conflicts in the symbol namespace—since the ABI likely requires a unique definition of some symbols, forcing a choice between the two libraries. (Chapter 6 of the System V AMD64 psABI is in a vague state, but appears to confirm the existence of this kind of constraint.) However, with enough hammering on symbol renaming and scope-localisation, there's no reason why certain portions of two different C libraries couldn't coexist. But it seems unlikely that two implementations of any low-level parts (such as startup and shutdown, threading and signal handling) could be combined in a working fashion without being explicitly designed to do so.

In summary: any compiler “should” in principle be able to generate code targeting any C library on a given platform, but there are inevitably some warts that inhibit this mix-and-match in certain cases. Moreover, knowing how to link to a given C library requires knowledge of nasty implementation details of that C library. These details could perhaps be “promoted” to a vaguely standardised ABI detail of (in our case) the combined GNU/Linux/glibc platform, but this hasn't been done so far.

Another question that Simon asked was whether we could apply some hindsight to come up with a simpler model which avoids the hair-raising complexities that I'd talked about. One suggestion I had was a single virtual address space, which would eliminate the need for position-independent code in shared libraries (since a library's load address could be assigned at deployment time). Later, Raphaël also reminded me that Plan 9 does away with shared libraries altogether, apparently without it costing too much in memory. I'm sceptical that this wouldn't be costly on a desktop system though (think how many copies of the KDE or GNOME libraries you'd end up mapping, for example). I'm also a big fan of Multics-style orthogonal persistence, which ties in quite nicely with the SVAS approach, but is a far-reaching change. Meanwhile, I think the trade-offs surrounding large-versus-small processes and the complexity of near-versus-far addressing modes are quite difficult to avoid (without an obvious performance hit), since they come to us from the hardware. Perhaps we could use reconfigurable hardware somehow to push all that complexity out of the view of compiler writers... but I doubt any hardware designers consider this a problem worth tackling.

I'm giving a “Part 2” follow-up talk on (most likely) Monday 3rd February.

[/research] permanent link contact

Tue, 26 Nov 2013

(Tell me why) I don't like Java

I can just about cast my mind back to when I learnt Java. It was 2001, when I was a 17-year-old C++ programmer. (Actually the particular version of Java I first learnt was called C#, but let's not complicate things.) I remember being fairly impressed. It struck me as a C-family language that had kept enough of C++'s features to do abstraction neatly enough, modulo the occasional grumble, while making some impressive complexity savings by carefully-chosen omissions. (I knew nothing about Smalltalk at the time.)

At the time, many of these omissions were bold, and the way they worked together was, to my 17-year-old self, ingenious. Lack of subobject structure keeps garbage collection simple. In turn, garbage collection (with compaction) allows faster heap allocation, clawing back some of the cost of all these extra objects. Even doing away with templates, by relying on checked downcasts and a common supertype, had its elegance, despite now being supplanted by generics. Grumble-wise, I won't say I wasn't alarmed by the lack of multiple inheritance and operator overloading. But overall I wasn't unsympathetic to what Java was doing.

Nowadays, the mention of Java makes me groan (in my mind). It seems to be a combination of the intrinsic and extrinsic properties that makes it so. Thinking extrinsically, Java has a huge, huge mindshare. Like market share, mindshare is dangerous once one player gets hold of too much of it. Many intrinsic details of Java would be unremarkable warts in another context, but have a giant significance, because they're what a large proportion of programmers think are “the way programming is”. In reality, Java contains a selection of somewhat-arbitrary design choices made by clever but flawed human beings some time within the last couple of decades. By letting them become ingrained, we are closing people's minds and inhibiting adoption of better alternatives. What follows is a quick run-down of how I see the problems.

As a language, it's boring. This is actually one of the least bothersome things about Java, but it needs mentioning here. As a researcher, doing any work that is specific to the Java language feels like work not done properly, because it avoids a lot of potentially more interesting cases. (I'm being a little facetious here, but I hope the underlying point is visible.) Against today's competition, Java seems remarkably inexpressive as a language. C++ is cleaner than ever, Scala is achieving a C++-like complexity curve with nicer syntax and marginally less unclear semantics, and dynamic languages are faster than ever. What keeps Java going is its self-perpetuating ubiquity. It also helps that Java is in a sweet spot regarding IDE assistance: Java code has enough statically-derivable properties to do automations like refactoring and autocompletion reasonably precisely, while being simple enough—unlike C++ and, I venture, Scala—to implement them reasonably correctly without giant effort. Is this a good thing or not? It has no doubt delivered certain innovations to practitioners faster than would otherwise be possible. But right now, to me, its net effect seems to be to stifle innovation in better languages.

As a learning device, it's a dog's breakfast. This is probably my number-one beef. Java is too complex to cleanly expose concepts like procedures, structured data, or pure object-oriented programming—never mind functional styles of programming. Yet it is too semantically simplified (constrained) to reveal essential contrasts and orthogonalities, such as between by-value versus by-reference, heap versus stack, dynamic versus static type checking, and inheritance versus subtyping. Various conflations of these are the deliberate features of Java's design that make it simple (relative to C++, say). But now that Java is ubiquitous, they start to be harmful too. I concede that just because a language isn't the ideal vehicle for exploring conceptual distinctions doesn't make it worthless as a programming tool—far from it. But the effect of mindshare is huge. The concepts of programming are harder to teach in a world where we must teach Java, because it is not a vehicle for cleanly conveying any of these concepts. My impression is that today's programmers learn fewer languages than ever, so it is harder to establish contrasts. Even diverse languages are being shoehorned onto the JVM, further enshrining Java's [bytecode's] limited vocabulary as a set of fundamentals. It's like some perverse “worse is better” situation, except that the usual point in favour of the “worse” solution, namely simplicity, it not in evidence much. As I'll rant in a moment, Java is a very complex beast below the language level.

Portability is a myth. Java's portability was its key selling point, but it's not clear that it has succeeded. Although Java makes it hard to write code with nonportable semantics, that is only a small part of the portability puzzle. Libraries are a big issue. JVMs are complex enough to suffer compatibility problems among each other, too. So now we just have portabilty problems one level up. I've been sceptical about the drive for portability because in general, no two portability requirements are quite the same. Having “the same” behaviour among every program built using technology X”, which is my paraphrase of Java's portability sell, is primarily a benefit to the authors of technology X, not to the software's users, nor even to the application developers. For example, as Ian Lance Taylor pithily blogged (talking about Tk), saying that applications look the same on any platform means they look odd on every platform. Attempting to insulate the JVM and its libraries from all details of the host system has become a dogma pursued far beyond its usefulness. This theme continues....

As a deployment infrastructure, it's a mess. Again in the name of “portability”, the Java platform tries to hide all trace of the underlying system, defining its own conventions for configuration (classpath, system properties), resource limits (Java heap size), its own archive formats, its own security model, and so on. The problem is that the underlying system always has some way of doing these, and by duplicating this functionality, the end result is excess complexity and hard-to-understand interactions. Even despite massive uptake, the JVM isn't the only runtime. The approach of eliminating complexity by defining “one platform to rule them all” is a hopeless modernist ideal. In practice, it just adds one more platform to the mix, causing a strict increase in complexity. This is the key paradox of attempting to achieving simplicity through portability. It can only succeed if the portable thing completely hides the underlying complexity. This is less and less likely the higher up the stack you try it. Instead we should aim to build systems out of simple (but not necessarily portability-providing) pieces, then achieve systemic assurances by reasoning about whole compositions. Typically, the kind of “whole composition” I'm talking about would be a runtime plus an operating system plus an instruction set architecture. (Interestingly, I contend that stopping at the ISA is sensible; we don't have to go all the way down to microarchitecture. Unlike JVMs, ISAs are low-down enough that they can hide the underlying complexity fairly well.)

As a development infrastructure, it is diabolical. Debugging and dynamic analysis on the Java platform are hugely, horribly flawed. I wrote a paper about one aspect of this a while back. Primary evidence is how the leading implementation (namely Hotspot) implements various profiling and debugging tools (like hprof) using private interfaces (namely the “serviceability agent”), because the pseudo-standard interfaces aren't good enough. And that's in spite of their unfathomable complexity. These interfaces—JVMTI, JDI and friends—are not officially part of the JVM specification, and are pseudo-standards in that no two implementations are quite alike. They also offer inherently poor coverage, because two large chunks of the process's code—namely natives and the VM itself—are respectively not covered, or covered in limited fashion using a disjoint mechanism (which may or may not be implemented). As a result, if you want to do something as simple as observing all allocations of Java objects in your program, you have to do three different (and fiddly) things: bytecode instrumentation for a selection of bytecodes (creating objects and creating arrays), handle JVMTI's VMObjectAlloc callback, write JNI function inteceptors to catch creation from native code. Even then, you're a long way from understanding the memory behaviour of your program, since, surprise surprise, native allocations—including (on most VMs) those made by the VM itself—are actually very significant. There was even a paper at OOPSLA 2010 about this. There are analogous problems in other kinds of dynamic analysis in Java. Try implementing an information flow analysis at the JVM level, and you will be stymied, because information is continually flowing through native code and through the VM itself, and these are wide, complex, undocumented interfaces. (By contrast, doing this at the whole-process level requires only modelling the system call interface, which is manageably-sized and stable.) Java-specific interfaces simply cannot cover, by definition, the whole behaviour of your program.

Debugging technology has gone backwards This is a bit of a specialist rant, so forgive me. It builds on what I just wrote about dynamic analysis. The conventional approach to debugging is takes a simple and high-coverage abstraction, namely the memory image of the debugged process, as the baseline. On top of this, we selectively describe how source-level language features are realised. We does this by documenting the compiler's (and runtime's) implementation decisions, using standard (albeit quirky and complex) formats like DWARF. Apart from GNU Java, no Java implementation I know does this. Instead, they rely on Java-specific interfaces. In so doing, they take a big technological step backwards, for no better reason than expedience. I can excuse VM prototype developers for taking the short cuts of knocking up an in-process debug server with a fixed wire protocol, and building shim “tool interfaces” as thin abstractions of their internal APIs. But in a platform that is the outcome of billions-of-dollars product development, there is no such excuse. As with portability, Java adopts an approach which can only work if it owns the entire system. Since it does not, it fails to eliminate any complexity, and instead just adds more.

As a culture, it's conspicuously herd-like. I suppose this is a universal property of communities. Languages as practical tools tend to take on the properties of the individuals using them. That's why Haskell is over-clever, Python is hack-filled, C is rarely well commented and Java is bureaucratic and verbose. In each case you could say the same for these languages' advocates. My distate for Java owes partly to the fact that it is favoured by not-so-good programmers and middle managers. The effect permeates the community. Java people love reinventing things which existed before—only this time, it'd “pure Java!”. They also seems to love XML, a technology I despise. They outdid themselves by combining these two properties to spectacularly ill effect, as known to anyone who's ever edited a file called build.xml. Reading about Eclipse plugins makes me jaded. I don't know whether it's the chicken (programmers) or egg (Java language) that's to blame for all these problems. Either way, practice has forgotten far too many principles: of keeping it simple, of not duplicating mechanisms, of using information hiding to hide what is change-prone rather than just to make your code more complicated. The principles of object-orientation itself are long forgotten. To illustrate this latter point, I defer to the very nice (developer-focused) talk by Kevlin Henney titled after William Cook's careful statement that “it is possible to do object-oriented programming in Java”.

Here's an anecdote to finish. I just read a research paper containing the following two sentences.

The first kind of behaviors is interface behaviors. (Please note that the name has nothing to do with Java Interfaces.)

Sadly, I can understand why the qualification is necessary. But it really shouldn't be. If we're doing research, what we're doing should transcend the immediate technology, and we should be accustomed to a presentation which reflects that. But even in research communities, too many people equate doing X “for software” with doing X “for Java”, and this is a sad state of affairs.

[/research] permanent link contact

Thu, 10 Jan 2013

Systems versus languages

Somewhere buried within one recent magnum opus in these pages I highlighted a contrast between systems and languages. I also noted that OOPSLA conspicuously contained the word “systems” in its title. This juxtaposition probably seems incongruous to some, but it is one close to my heart. I see myself as a researcher tackling problems usually labelled as “programming languages”, but with a “systems” mindset.

Richard Gabriel's very interesting piece about incommensurability in scientific (and engineering) research, using mixins as an example, makes some remarkably similar observations. (There's also a video of a presentation from last year's ClojureWest, but I haven't yet watched it.) He distinguishes the “engineering” papers that first discussed mixins from the “scientific” papers that subsequently studied and formalised them. And, without disparaging the latter, he very much emphasises the underrated value of the former. (He also makes some very incisive observations about apparent misunderstandings and omissions in the influential Bracha & Cook paper. A major point, of course, is that they need not be misunderstandings per se—incommensurability is the posited explanation.)

It's nice to get a historical perspective on these matters, from someone like Richard Gabriel who has seen a lot of ideas come and go. In part, his appraisal of the changing role of engineering papers offers me some reassurance that I might not be crazy. In conversation with other researchers, it can be frustrating that declaring my interest in programming languages is taken so often to imply that I do theoretical work. But I am more interested in languages as systems—my interest has to do with their “useful-for” properties, not “abstractly-is” properties. (Of course, I'm glad people are working on the latter, and I try to relate their findings to what I do... while accepting that it's not what I do.) Another interesting historical tidbit is that Gabriel estimates that my kind of work—engineering approaches to programming problems, you could say—was “outlawed” from (yikes!) roughly 1990 to 2005. I suppose it's handy that I started my PhD in 2006. The feeling of “separate camps” is still very much there, of course.

[/research] permanent link contact

Thu, 06 Dec 2012

Bridge that gap

I don't do IDEs.

(This post isn't really about IDEs, but humour me.)

I used to do IDEs, back in my Turbo Pascal days. And I want to now: I value the things they promise. But the reality, for me, always seems infuriating and limited. They get in my way more than they help me.

One example: Eclipse. I used Eclipse for Java programming back when I was an undergraduate, and mostly bent it to my will. Since then I've done fairly little Java programming, and a combination of mind-rot (mine) and evolution (Eclipse's) has left me sufficiently unfamiliar with recent versions that I am unable to use them effectively.

I just tried again. I wanted to open a third-party Java codebase's Eclipse project that I had downloaded from that codebase's web page. I am in the Java perspective, and I click the “Project” menu, hoping it will let me “Open project”. But no! “Open project” is greyed out. Woe is me.

Greying out UI elements is a terrible, terrible practice that should never have been invented, because it doesn't tell you why something was greyed out, so leaves you clueless about how to ungrey it. But no matter. Being a researcher, two ideas occur. Both are ways in which we could, for “little cost”, add back the ability for a program to explain why a widget is greyed.

Idea one: we should really be using some modern programming technology under which the “greyedness” state is described declaratively. In that case, the program's code will contain a concise description of exactly what causes the menu item to be greyed. I could check out the code to understand why it was greyed. Or perhaps this condition could be reflectively exported to the user. Either way, if only we had written our UI in some declarative style! Then our problem would be easily solved. But, alas, we haven't written it like that. Instead, greyedness is the emergence of some maze of twisty little imperative procedures.

Idea two, luckily, is more immediately applicable. Let's “shadow” each UI element with some extra information to do with its greyedness or otherwise. When we set its state to greyed, we snapshot some property of the program state, like the address of the caller who is turning on greying, or the whole callchain, or whatever. Then I can somehow query this shadow—maybe by attaching a debugger to Eclipse, or whatever—and have it tell me why it was greyed.

A thought occurs. Is this a general-purpose approach to “programs that explain themselves”? (Hat-tip: “Functional programs that explain their work”, by Perera, Acar, Cheney and Levy, seeded that phrase in my mind, although the general idea has been with me for much longer.) Interactively querying for explanations of program state or program output seems like a general, powerful tool, both for programmers programming and users UI-using.

Aha! you might say: this could work for a Dr Expert Developer , but there's a problem for poor Joe User. The “explanations” will be in terms of program objects, not “something the user understands”. Therefore, you would argue, my approach is not very useful except to niche-case users like me. But to this, my rejoinder is: why shouldn't they understand? If an object-oriented program is properly abstracted, it will have a “surface” level of objects which model the domain as the user sees it. Dialogs and buttons and widgets and text entry areas are all just objects, and users understand these more-or-less fine (accepting caveats from the HCI audience).

It seems to me that paving this continuum, between user-facing and program-level abstractions, is one of the great promises of object-oriented programming. I wouldn't say it's fulfilled, but then, I wouldn't say we program in a terribly object-oriented way most of the time. When I was quite a bit younger, I found it odd that Alan Kay would simultaneously have been working on user interfaces and on object-oriented programming. More recently I think I have begun to grok this connection. My latest Eclipse problem is yet more evidence of why it's a useful one.

This connection is also one which functional programmers of the Lisp-y, Scheme-y schools understand. The abstractive power of these languages is supposed to be used—I've heard it said, at least—to construct domain-specific abstractions which naturally model the elements of the program's work (its “objects”, you could say, except that here we won't). In this way, the program itself is lifted up to a level of abstraction which the user, by definition, would understand. (Emacs users might be able to tell me how well this can work out... anyone? Oh, all right.) I lean more towards the object abstraction than the lambda, but perhaps it's six versus half-a-dozen.

Perhaps disappointingly, that's mostly the end of this post. But I can't resist appending a rant about my own work. What I'm convinced of is that the object abstraction is everywhere... it's just latent a lot of the time. It's in C programs, functional programs, windowing systems, filesystems, OS kernels, spreadsheets, web browsers, IDEs. It's latent in any chunk of state at all. Yet the promise of seamlessly bridging that gap—of constructing a continuum between the programmatic and the user-facing—is not yet with us. That's because there are hundreds of different ways in which this state has been constructed and encoded, and no infrastructure unifies them. Classically, we can only shrug and say that people weren't enlightened enough to write their code the “right way”, using the right languages, targeting the right runtime infrastructure, and so on. But more likely, those “right” things were never fully realised, and never will be. Either way, what we need is a postmodern object-oriented runtime: one that can find the object abstraction, and present it to us where it exists, even if that presentation is something of an illusion—an adaptation, you could say—of reality. (This concept is also similar to views in a relational database.)

What would such a runtime look like? What's a precise definition of the problem it's solving, even? Both of these are hard questions to which I have no complete answer. But it's no coincidence my PhD was about adaptation (although I'm not saying you should read it). And my more recent side project on DwarfPython (that I would love to pursue, if only I could find a way of getting paid for doing so) is also tackling a subset of the same problem space. Instead of a language implementation owning its object representation, can we build one which gives up that ownership, but rather takes on the job of “seeing” arbitrary chunks of state as objects it can manipulate? The idea of DwarfPython is to use object producers' debugging information to do just that. More generally, in-memory state is not the only kind of object; I also have an interest in unifying files with objects. Again, rather than the classical approach of persistent programming languages, we can take a more postmodern approach in which each file has the freedom to represent its contents, or “object state”, in a different way, or a variety of ways, subject to appropriate interpretation. This is thematically similar to the work of my Bachelor's dissertation; although that pursued a far too classical approach, it was still trying to unify files with objects. So, finding the object abstraction in unusual places seems to have been a theme of my work from day −1, even if I didn't realise it at the time....

[/research] permanent link contact

Mon, 03 Dec 2012

Tools or not tools

Jim Coplien's keynote at SPLASH this year was a peculiar one. It featured two particularly provocative sentiments: firstly that too much abstraction is a bad thing, and secondly that building tools is not what we want to be doing. (The latter was actually due to Dave Ungar, during questions, but met with vociferous agreement from the speaker.)

These remarks met with some puzzlement from much of the audience, judging by a series of apparently disparaging remarks during subsequent conference talks and question sessions. Who could disapprove of abstraction or tools? I think there are some reasonable answers to that question; what follows is my attempt. (I have no idea whether it coincides with Coplien's or Ungar's.)

Abstraction all the way up

The abstraction issue is perhaps not terribly controversial. Most programmers are aware that abstractions present a trade-off. The temptation to abstract endlessly can be a rat-hole that distracts from actual progress on the task at hand. Ian Lance Taylor once blogged a fairly similar opinion. If you abstract too far, you abstract away essential features of the domain, rendering it unrecognisable. This yields “abstractions” that are actually complex, not simple, to use. Abstracting over multiple use cases, i.e. generality, is a common offender here. For example, rolling your own implementation of a graph algorithm can be easier than figuring out how to tame the monstrous generality of something like the Boost graph library. (Pardon my C++; no doubt you can think of your own examples.)

Sometimes, abstractions exploit specificity, by packaging up common case usage patterns. This can be very useful. In fact, in interesting counterpoint to the Taylor piece above was Rustan Leino's note about loop structures in a subsequent SPLASH keynote: inferring loop invariants is one of the hard problems faced by any verifier. By constraining the form of a loop, it becomes easier to find its invariant. Abstract loops are an extreme case of this, since the loop itself is in library code and not in user code, so the invariant need be found only once. But of course, just as Taylor hinted at, any user forcing themselves only to use such loops will end up spending rather a lot of time structuring their code to accommodate this constraint. (In this way, it shares a lot with other syntax-directed reasoning tools, including type checkers. These tools are superficially easy to market—hey, look, it shows you bugs in your code. But there is a hidden cost to using them, deriving from implicit constraints on how you can structure your code such that it interacts nicely with the tool. If you don't stick to these, your tool fails in some way, like false-positive type errors or solver timeouts.)

To end my rants about abstraction on a complaint, I could also roll out one of my previously-blogged complaints about common styles of functional programming—with liberal use of polymorphism, or unification of higher-order with “ordinary” operations (think currying, or juxtaposition-is-application), code can become needlessly hard to read. Lazy languages add the unification of storage with computation, which I concede is sometimes an incredibly useful abstraction, but easily makes the memory behaviour of your program incredibly difficult to understand.

What about tools?

For me, the most interesting issue concerns tools. Dave Ungar phrased it something like as follows: “if every bolt under my car had a little handle on it, I wouldn't need to get out to go and get a wrench”. So, let me frame the contrast I believe he was making as one of tools versus run-time systems. Dynamic O-O environments are very much systems, geared around the ability to push new capabilities down into the system's fabric, rather than having them sit on top. This “fabric” is what emerges from combining the messaging metaphor (messages are fundamentally proxyable) with dynamic extensibility (adding new messaging behaviour is a local change during runtime, not a far-reaching change at compile time). As I have rambled about previously, the lower some functionality is integrated into a system, the more pervasively available it is, so the more power and leverage it confers. Smalltalkers and other dynamic language advocates know this. It's a very tricky thing to convey to the unfamiliar. It's even harder to measure. Most of us don't use runtimes that have this amount of dynamism and immediacy, although Javascript may yet change that. Operating systems, not least Unix, are also dynamic runtimes in this way, although their inability to see inside application means (unfortunately, and avoidably) that a large amount of useful code and data (hence potential “extension”) is opaque to them.

Tools are fragmentary; runtimes are integrating One reason people develop tools and not runtime extensions is that integration is hard. If you write a command-line tool, you get to define its input domain, output domain and behaviour from a clean slate, according to your convenience. This is often (though not always) easier than plumbing something into a runtime, which is a bunch of code somebody else wrote. But let's imagine making the leap. To get slightly more concrete, suppose the “tool” we're interested in is a dynamic analysis tool—pick your favourite bug-finding, race detection, memory leak detection or other tool that fits the general mould. What's better about having it as a “runtime” rather than just a “tool”? Well, its functionality would be embedded right there in your running program. As a consequence, it supports exploratory, interactive, programmatic use. If you dropped to a REPL in your program, the innards of the tool would be laid out across your program state, pushed into fields on program objects. If your tool is a race detector using a lock-set algorithm, for example, then each object's lock-set would be accessible as a field on that object. If you're using timestamps or vector clocks, they would be there too. You're also not stuck with a fixed amount of insight the tool's authors saw fit to provide (e.g. when trying to track down the source of a data race); the tool's code is a service you're free to extend. Getting interactive, exploratory, programmatic usage seems like a useful payoff for the effort of integrating your tool into a runtime. Arguably, then, the challenge is building runtime infrastructures that are not unduly difficult to extend like this.

Progress? Is the “tools not runtimes” tendency getting stronger? “Systems, languages, applications” is the conference title's invariant. “Tools” is nowhere to be found. My vague impression is that today's programming research is more tool-focused, and less system-focused, than 20--30 years ago. (A near-dual property is also true: self-proclaimed “systems” research has less programming focus than it used to. I used to bemoan this frequently while doing my PhD in the systems research group in Cambridge.) But why? Simplistically, we might just say that integration is hard. I think there is something more subtle at work. Integration of research techniques into runtimes arguably scales poorly—since we all have to integrate into the same runtime, we have to achieve consensus on that runtime's interfaces. Tools, being freestanding and piecemeal, arguably scale better. You could say that lots of small, freestanding tools are the postmodern way, whereas “one true runtime system” is a classical ideal. (It's fitting that Noble and Biddle's “Notes on Postmodern Programming” was recognised at SPLASH this year for its influence, in the Onward! strand.)

Avoiding classical fallacy In the battle of the classical versus the postmodern, normally I side with the postmodern. How can we square this with the desire for the benefits of the runtime approach as I described it? I should probably save my thoughts for another post. But two ideas come to mind. The first is one I've already mentioned: design a runtime infrastructure that is specifically easy to extend. But that seems to be begging the question: if we knew how to build this magical runtime, and what kinds of extension it would need to support, we'd already have done it and solved the problem ages ago. For this reason, we also need the second idea: we need to get into the mindset of tolerating a lot more heterogeneity. Very briefly, it means pushing radically downwards our notion of “runtime” so that most of the typical implementation decisions of an object-oriented runtime, such as dispatch mechanisms, introspection mechanisms and object layout, are actually user-level decisions in such a system, but still recognisable as the abstractions they represent. In other words, our system can descriptively extract extract latent object abstractions from the contexts in which they emerge in existing systems, given descriptions of these latent abstractions. This contrasts with traditional runtimes, in which the object abstraction constructed by the runtime implementor in a way that is prescriptive. And hey presto, we are back to my VMIL 2011 workshop paper: we already have a very powerful descriptive technology, in the form of debugging infrastructure for native code; our task is to bend it to this new purpose. So, end of rant for today.

[/research] permanent link contact

Fri, 01 Jun 2012

Metacircularity (or: “what's Java got to do with it?”)

Before I begin, here's an executive summary. Metacircularity is not about self-interpretation at all. Rather, it's an engineering approach to re-using as much code as possible between different parts of a toolchain (including compiler, runtime and debugger). This is noble, but limiting ourselves to working in a single language is needlessly restrictive. If we get over our presumptions about“language barriers” (cf. Oracle's disappointing attempt at explaining metacircularity), we can apply the same re-use philosophy to supporting a diversity of languages, not just one.

I've recently found myself wanting to understand the concept and purpose of metacircularity in language implementations. This is because I've become interested in understanding the Maxine JVM, which is often described as having a metacircular design.

All this meant to me at the time was that it's written in Java---making it a self-interpreter, certainly. But is it meta-circular? What does that mean? Why might it be a good thing? As I will show, if we're even just a tiny bit pedantic (which I hope we are), then metacircularity is not really a special case of self-interpretation, but a completely separate concept.

I've found two very helpful papers describing metacircularity. The first is that of Chiba, Kiczales and Lamping talking about the “meta-helix”. The other is Ungar, Spitz and Ausch describing the Klein VM, a metacircular implementation of Self. The first paper really emphasises the idea of implementation by extension which is at the heart of metacircularity. They note that use of metacircularity [is] “to allow extensions to be implemented in terms of the original non-extended functionality”. As the paper goes on to discuss, there is a tricky bootstrapping problem inherent in this. If we don't keep careful track of the dependencies between all these extensions, subtle and not-so-subtle bugs, most obviously infinite recursions, can arise. The paper is about avoiding confusion of metalevels, and as they propose, the shape of a helix, not a circle, makes much more sense in describing what supposedly meta-circular systems are actually doing.

The second paper, by Ungar et al, is more of a practitioners' view: it shows what VM builders consider to be a metacircular design, and what they hope to achieve by it. After reading these two papers, reading some other things, and scratching my head a lot, it became apparent the primary goal of metacircularity is to solve two very practical engineering problems concerning re-use: re-use of code and re-use of debugging tools. They mention the code re-use issue directly, by saying that in traditional designs “an operation such as array access must be implemented once for the runtime, once for the simple compiler, and once for the optimizing compiler”. The question of tool support is also an explicit motivation in the same work: they lament that in the non-metacircular Self VM, “to inspect a Self object... [in] an application that has crashed the VM, one must invoke a print routine in the VM being debugged, [an approach of] dubious integrity”, that the VM “must be able to parse both Self and C++ stack frames”.

So, perhaps prematurely, I'd like to propose a new characterisation of metacircular VMs that I think captures their true nature. Metacircular VMs are interpreters (in the general sense) that are carefully constructed to have a dense re-use structure. They do this by expressing as much as possible---front-end, compiler, runtime---as extensions over a small common core. This core has some interesting properties: it is designed explicitly for extension, and includes the necessary foundations of debugger support. It is the former which allows code re-use, and the latter which enables the same tools to see all the way up the stack, from various levels of VM code up to application-level code.

Note that this is fundamentally different from a self-hosted compiler. Under self-hosting, the compiling compiler is not re-used at all in the compiled compiler. It is just used to implement a translation function, from end to end; how it does it is completely opaque. By contrast, in a metacircular VM, you can invoke your hosting runtime to perform part of your work---asking it do “do what you would do in case X” by calling the corresponding function (or sending the corresponding message, for Smalltalkers). The trick is to ensure that these helper requests are correct and well-defined, meaning they do not cause infinite regress (the obvious bug) and do not confuse meta-levels (the more subtle bugs mentioned by Chiba et al).

As a consequence of this fundamental difference, the key challenge of metacircularity is not just that of “implementing language X in language X” it's dealing with the bootstrapping problem. What is a suitable common core? How can we make it small? What extensions must it permit, or may it permit? How can we structure the extensions on top of one another, so that they can express what we want, re-using what we want to re-use, and efficiently?

So, we've established that “self-interpretation” is irrelevance. But it seems that most metacircular designs are, in fact, self interpreters, right? I actually consider this to be false. Even when staying within “one language”, the fundamentals of the bootstrapping process means that at a given level in the interpreter, certain language features may only be used in restricted ways. Sometimes these restricted subsets are given a name, like “RPython” in the PyPy project. In other cases, they are not named. But in all cases, there are restrictions on what functionality at some level in the system it is is safe and meaningful to invoke at the metalevel. Indeed, this is exactly the “helix” shape that Chiba et al were describing. In other words, different parts of the interpreter are written in different sub-languages, precisely in order to avoid infinite regress. Just because there is continuity between the core language and the eventual top-level language doesn't make them “the same”, and for this reason, metacircular VM designs are not self-interpreters.

If I were to write a ranty summary of the above paragraphs, it would be that the apparently “beautiful”, head-twisting, recursive, quasi-mathematical aspects of the metacircular design---the things which language nerds get excited about---are both irrelevant and illusory. Metacircularity is motivated by engineering pragmatics, not “deep” linguistical or mathematical concepts. (Homoiconicity, itself a concept of overrated interest, is an orthogonal concept to metacircularity, despite what at least one blogger has written.) I believe this fixation with superficial observations about language stems from the documented inability of many programmers to divorce concepts from language. (For “documented”, I can say at least that Lamport has commented on the problem, and in this case I agree with him. I have big disagreements with other parts of the same article though. I will post about those in the near future.)

So, having stated that re-use is the good thing about metacircularity, why is re-use so much easier in a metacircular design? The reason is that we have a common core providing a coherent and adequate set of services---the services embodied in the “bootstrap image”. And I say “services” and not “language” for a reason. The core really is a set of runtime services. As I have explained, is only a distant relation of whatever high-level language the author is intending to realise. In our current technology, re-using code and re-using tools across languages is hard, and so “build everything in the same language!” seems like a useful answer to a VM author's problems of API-level interoperation and of tool support. Metacircular designs are the result (because it's the closest you can get to doing everything in one language). But as I've just described, the “same language” property is an illusion, and there are inevitably many languages involved. It just happens that in current projects, those languages are designed to be as similar as possible to one another---featurewise increments, in effect. But instead of this unimaginative perspective, anyone building a metacircular VM should ask themselves: how can I design my core services---the core of the VM---to support as many different languages as possible?

This will sound familiar to anyone (so, hmm, maybe ten people on the planet) who has read my “Virtual Machines Should Be Invisible” paper. Although it doesn't approach the problem from a metacircularity perspective, this paper is all about building an infrastructure that can support a diverse variety of languages, sharing code and tools between all of them.

Currently, our shared base infrastructure is a POSIX-like operating system. Every VM author (even those interested in Windows, which I'm cool with) implicitly targets this infrastructure. Unfortunately, these systems don't provide enough abstractions. As such, different language implementors build their own infrastructure which reinvents similar abstractions in incompatible ways---including functions, objects, garbage collected storage, run-time self description, exceptions, closures, continuations, and so on. We can clearly avoid this pointless diversity without sacrificing innovation. Just as with operating system interfaces, there is never complete quiescence or consensus, but we still manage to share a lot more software between OSes than we did in the pre-Unix or pre-POSIX days.

One of the mitagating techniques which my VMIL paper describes but which metacircular designs don't use is: describe your implementation decisions. Don't encapsulate them! If you implement a certain language feature a certain way, describe it. There is nothing fragile about this, because your descriptions will be written in a standard way and consumed by an automated interpreter---called a debugger. This is what native debugging infrastructure does. VM-hosted debuggers, of the Java or Smalltalk flavours, don't do this. To make the value of this approach clear, let me finish with another example from the Ungar paper, where they proudly state that Klein VMs can be debugged remotely, and in a post-mortem fashion, using another Klein or Self VM. “A separate, possibly remote, Self VM hosts an environment that manifests the innards of the Klein VM at the source level. Thanks to Klein's metacircularity and Self's mirror-based reflection model, Klein can reuse a vast amount of already-written Self programming environment code.”

What the authors are forgetting here is that this is not a new facility. Native debuggers have long had the capacity to inspect remote processes. Smalltalk-, Self-, and Java-like designs took a retrograde step by forcing debugging to exploit the help of a server within the VM. Although this has the benefit of allowing the debugger implementation to share the introspection services already present inside the VM, it requires a core of the VM to remain working correctly, even after a failure, which precludes many cases of post-mortem debugging. By contrast, trusty (or crusty? your choice) old native debugging is necessarily designed for this as a common use-case.

The approach I advance instead, as described in the VMIL paper, is to implement introspection on the same infrastructure that supports remote inspection---which happens to be the DWARF infrastructure, in the case of DwarfPython and modern Unix-based compiler toolchains. This is very similar to the Klein approach, in which mirror objects may reflect both local and remote state. But it completely avoids the myth that we should implement everything in a single language. Indeed, debugging information formats like DWARF are actively concerned with supporting a wide variety of languages. One Klein process can inspect another because they share a set of implementation decisions. By contrast, a native debugger need share nothing with its debuggee, because native debugging infrastructure includes a facility which is fundamentally omitted from VM-hosted debuggers: the language implementation explicitly describes its own implementation decisions. It does this down to the machine level, and moreover, up from the machine level.

The result is that given a memory image of a crashed program, we can recover a source-level view of its state at the time of a crash. VM-hosted debuggers are fine for user code because encapsulation and memory-safety protect enough of the VM implementation that the debug server can still work. (Notice I don't say “type-safety”! Type-safety is just an enforcement mechanism for encapsulation, not the key property that ensures encapsulated state is not corrupted.) These VM-level guarantees do not have such a nice property if the failure was due to a bug in the VM itself. This is because the invariants of the VM's own data structures are by definition broken in this case. Some might argue that this is a minority use case, so VM-hosted debugging is fine for general use. Personally I don't mind, as long as I have a debugger that can see all the way down. Currently this doesn't include any VM-hosted debugger, but perhaps it could do. (One of my perhaps-future small projects is to create an implementation of JDWP that knows how to answer queries about native code.)

In summary, I think of the solution to re-use proposed by metacircular designs as a degenerate case of the approach I am pursuing. It sounds strange to most people, but it is not too much to ask for a language-agnostic runtime infrastructure that supports a plurality of language implementations (going right down to native code), direct sharing of code and data, and orthogonality of tool support from language. As I ranted about in the VMIL paper, this infrastructure is a modest and very feasible generalisation of what already exists, with basically only performance questions outstanding. (I'm working on it.) Given this infrastructure, the same careful bootstrapping approach can be used to share code and retain tool support throughout higher-level language implementations. But we can do this without the requirement that everything be in a single language, which doesn't make sense anyway.

[/research] permanent link contact

Tue, 20 Dec 2011

Cathedrals, bazaars and research groups

[Post-hoc clarification: at the time I wrote this rather grumbly post, I was working in the Department of Computer Science at the University of Oxford. It doesn't necessarily reflect any on other institution whose domain you might currently be seeing in your address bar!]

A fe months ago I finally got around to watching the video of Guy Steele's “Growing a Language” talk from OOPSLA '98. It's a spectacularly entertaining and insightful talk.

(It's also a nice demo of how a good keynote doesn't have to be Earth-shattering, as long as it's compelling in concept and delivery. Really, the meat of the talk is quite specific: it's about how language evolution should be managed, with particular reference to the then-ongoing attempts to add two features to Java: generic data types, which we all know and love, and operator overloading, which still hasn't made it.)

It was a nice reminder of the two “organisational modes” of collaborative effort that Eric Raymond called The Cathedral and the Bazaar. Building software is one activity where these metaphors apply. Designing languages is another. Research groups are a third.

Like language design and the construction of any large software project (think Linux), research groups aren't a “fully collaborative” activity. Rather, they are “partially collaborative”---it's not that everyone is working with everyone else, but rather, different participants are interested in different pieces of the overall puzzle. There will always be multiple frontiers of progress open concurrently---but all building on a shared central core.

When I was in Cambridge, the group I was in was very much a bazaar in style. There was no unique leader (but rather a gaggle of four or five faculty). Group communications revolve around a mailing list and weekly meetings where discussion was open, informal talks were and anyone would be free to raise questions big and small.

It wasn't a problem-free group, either in general or for me personally. For my first year in the group, the bazaar was dead. That was a tough time---mainly because communication structures reverted to small cathedrals (and I wasn't really a part of any of them). Even later on, I must admit I didn't always feel completely at home. I was a programmer-oriented researcher in a performance- and applications-oriented group. But in hindsight I appreciate that the group's bazaar-like communication structure and ethos were a very good fit for me, even if the topic selection wasn't great. By the end of my PhD, I found I was getting some reward from my participation in the group. in two ways. For one, my work had gained some degree of recognition in the wider group---I felt I had, in my own small way, “shaped the agenda” at least in a tiny corner. (Sadly this was not enough to get others on board with similar work, but also not miles away from that either.) For another, I had participated in the more topic-independent community aspects of a research group---organising the talks for a while, participating in discussions at talks and on the mailing list, being around, organising events, and so on.

I was recently lamenting to myself---a favourite pastime of mine---how right now, my work isn't a “part of” anything. Nobody cares about what I'm doing, or so it seems, and conversely, I find it hard to get enthused about what those around me seem to be doing. But then again, I have very little idea of what their work is, nor they of mine. There is a lack of transparency and a consequent lack of spontaneity. Without cross-linking communication structures, there just aren't the opportunities to spot synergies and find common interests. I have found this a bewilderingly opaque and unsatisfying environment almost since I arrived, but I only recently realised the reason: that it is a severely cathedral-organised group. There is no institutionalised process for cross-talk (such as frequent group meetings or mailing list), and while there are multiple frontiers open, each is coordinated from the top. This clearly works for a lot of people, but not for me. Does that say anything about the kind of researcher I am, or others are?

As an addendum: it's worth briefly mentioning the “agile research groups” idea, one example of which is Scram of Mike Hicks and Jeff Foster. Eric Eide also mentioned to me he uses some of these ideas, to varying degrees of success, in the Flux group at Utah. Coincidentally, I recently dropped in on both these groups! I think these techniques are mostly orthogonal to the cathedral-versus-bazaar issue: they concern the manner (frequency, duration) of communications, not the topology. I expect Scram works best when participants have a common goal, i.e. there may also be tighter topic-coherence requirements on its suitability. These may perhaps even be more likely to hold in a cathedral-style group, although there is certainly no hard-and-fast causal relationship there.

[/research] permanent link contact

Wed, 14 Dec 2011

Heterogeneity or homogeneity: what's the problem?

My attention was recently drawn to a problem that some web developers call the “language heterogeneity problem”. I'm not sure where the term comes from; in fact it is not as widely used as I was led to believe. But still, most people who have done web programming know that there are a lot of languages that people use for the web, not usually out of choice per se, and that this is somehow a problem.

The phrase “language heterogeneity problem” immediately jarred with me, since some of my work has been looking at heterogeneity of language as a goal, not a problem. Surely, we want to choose the best language for each part of our program, and not pay any unnecessary cost when doing so? Of course, the problem is about choice versus imposition. It's not that the ability to use multiple languages is bad. It's that in any given context, we don't have that ability! Consequently, you're forced to use particular languages for a given piece of code. This is the true “heterogeneity problem”. I'd couch the problem as lots of small homogeneity problems, not one big heterogeneity problem.

One of my recent student project ideas, sadly not yet attempted (or indeed advertised), is to develop a compiler back-end and run-time library that would let us compile vanilla C programs into web applications. So, for example, if I do printf() it will write some text to the page, and if I do fgets(..., stdin), it will generate a form field whose submission action is to activate the continuation of the program. There are some interesting extensions to this project. How do we partition a program into its client- and server-side halves, or its client-, server- and database-side thirds? Can we tune the partitioning given a set of requirements for security, interaction latency, client code size, and so on?

(There is also an interesting converse to this problem. Most programming languages' standard libraries are designed around a Unix-like model of I/O. And the first programs we teach---like the Hello World program---use this facility explicitly, by printing to streams and reading from them. But we now live in a world where most familiar forms of computing don't have an obvious terminal- or stream-style of I/O evident in their interface. So perhaps clinging to these examples is creating a barrier in front of potential students---who won't relate to the idea of a program doing I/O through a terminal?)

At SPLASH, I discovered that one chunk of my proposed student project effort has been scooped by Emscripten, an LLVM-to-Javascript compiler. However, since writing such a compiler would be too much work for a single project anyway, this might actually be helpful in enabling a single student project to achieve more working stuff. In other words, they could focus on matters other than the compiler, or on doing interesting domain-specific analyses on user code. Alternatively, perhaps a keen student could try to make their own compiler that does a better job than Emscripten, in some way. Hopefully I will managed to advertise the project in time for the next academic year.

[/research] permanent link contact

Mon, 05 Dec 2011

Refactoring refactoring

A little while ago I embarrassed myself in conversation by blurting out a sceptical opinion of refactoring. In this post I'll explain some opinions on refactoring and related research, hopefully in a more considered and coherent manner that I managed on that occasion.

I admit that my inclination to be a bit negative comes from prejudice, with a couple of origins. One is that a while ago, I had to fend off (rhetorical) claims that refactoring solved what my PhD was doing. It clearly didn't, but then again, it was well worth writing an explanation of why not. (These claims were from my supervisor, not my examiners, thankfully, and I think were being advanced rhetorically.) In that context, interface evolution scenarios were the issue. I still contend that refactoring is not the solution to interface evolution. (Endlessly editing code to stay in sync with “one true” “current” version of an interface, whether with or without the help of refactoring tools, is an unnecessarily burdensome approach; an automated assist for editing code doesn't make that editing work- or risk-free.)

More refactoring

Happily, most uses of refactoring are quite different from interface evolution: they're about the internal structure of code, not “edge” interface details. I'm a big fan of refactoring in these cases. As a practitioner I'd love to have more and better refactoring. In that space, one clear improvement would be refactoring tools for more languages. This doesn't mean starting again; most languages are more alike than they are different. At the moment, the popularity of refactoring serves to cement the Java hegemony. This is my other unreasonable prejudice: I dislike this hegemony, and so refactoring culture is tained by association (unfairly) in my mind. It'd be really nice to have some decent refactorings available for C++, but I'm not holding my breath. That said, I might not know about them if they do exist.

(Aside: the real problem with C++ is not pointer arithmetic or lack of garbage collection or lack of type safety or anything else that people usually trot out; it's complexity. I'll rant about that in a future post. By contrast, Java does well because it's a simple language. Actually it does unfairly well because researchers abuse its syntax tree as an intermediate representation for program analyses. I might rant about that in yet another post.)

That's my practitioner's view over with. As a researcher, I have one qualm remaining about refactoring. Rather than doing research on making refactoring work in more and more scenarios, I want to see some exploration of a few bigger ideas in the same general space. There are ideas that are more powerful, more “revolutionary” than refactoring but have (so far) less currency.

Language-independent refactoring

Language-independent refactoring is an obvious goal. The means by which to achieve it is less obvious. A shared metamodel seems sensible. The need for a shared metamodel is arguably a limitation. But I don't buy that argument! My reasoning is based on the observation that most languages have large sets of features that are cognate. By this I mean they are not just theoretically equivalent in what they can express (or perhaps not at all equivalent in that way), but rather, a human user understands them in the same way. (I wish we had the empirical foundations to substantiate that, but that's another matter.) So if we can capture these, we can probably come up with a metamodel that works well in practice, even if it fails for adversarially-constructed cases.

Conversely, just to throw another controversial claim into the mix: languages that have such a pared-down set of primitives that they don't offer a cognate of some recurring feature---like purely functional languages without mutable objects, or C without virtual calls---in practice do have cognates, but appearing as patterns of use rather than delineated language features. So I seem positing some sort of Chomskyan “universal language feature set” that is in programmers' minds if not explicitly in all languages. That feels a bit strong; I'll have to take some time to see whether I agree with myself there.

(As an aside: of course, many languages have too many features, in that they are mutually cognate: do I use static variables, or a singleton object, e.g. in Java? Do I use typeclasses or traits, e.g. in Scala? Do I specialise using template specialisation, overloading or overriding in C++? These self-cognate features usually have arbitrary limitations, and their diversity exists for implementation reasons. Exposing a choice among them leaks implementation details to the programmer, along with consequent performance characteristics. So, forcing the programmer to select among these early on, as these languages all do, is an evil akin to the evil of premature optimisation. Fortunately, it's an evil that refactoring exists to fight!

(Continuing the aside: we perceive “clean” languages to have few mutually cognate features. Conversely, most mutually-cognate features differ along some potentially-separable dimensions: each feature mixes a particular setting on each dimension. For the “static versus singleton”, having a chunk of data that is “one per program instance” is the main concern, and dynamic-versus-static allocation is the orthogonal issue that is unhelpfully mixed with it. In a Java-like implementation, object identity is another mixed concern: it's something you get in the singleton case, not the static field case, and effectively for reasons of implementation leakage. Conversely, in C and C++, statically-allocated stuff can still have its address taken, so there is better separation of concerns in that case.)

Non-behaviour-preserving transformations

Digging deeper than language-independent refactoring, it seems that refactoring's main value is in its ability to improve code by reversing bad decisions that were somehow expedient earlier. But underneath that, there are two cases. Firstly there are cases where you refactor because you got the abstract design wrong earlier (e.g. you assumed there was only one Frob per Widget, and in fact there might be many). Secondly are the cases where you got the abstract design right, but the code-level design wrong, i.e. you didn't map the abstract design optimally onto language features (with respect to maintainability, efficiency, ...). To me, it feels like there is too much emphasis on the second case, while the first one is harder and more interesting.

I think this is because automated refactorings aim to be behaviour-preserving. But since the two problems are very close---they both arise from forced premature commitment and the programmer's failure to anticipate the future---we should perhaps use the same tools to combat both of them. In other words, the requirement that refactorings should be behaviour-preserving actively limits what we can do. So how about some bolder approaches that might sidestep the problem? While these approaches might violate the letter of the definition of refactoring, for me, they retain the most useful charateristic of refactoring: by a single localised change, we can effect global changes on our codebase.

The only work I know that does automated non-local code edits that can change program behaviour is Coccinelle, based on the “semantic patch” idea. Aspect-oriented programming is a similar technique, but works by effectively (and controversially) delocalising run-time semantics rather than performing non-local code edits. I'd like to know if there are others more like Coccinelle already in existence.

So, suppose we discard the restriction of limiting ourselves to behaviour-preserving edits. One direction under this auspice is to creep closer towards model-driven development. I want the ability to change my “model” (either in my head, or in some modelling notation) and see changes reflected in source code. And also, vice-versa: if we do have a modelling notation, code changes should update the model. This is a hard but interesting problem in bidirectional transformation, which has something of a currency at the moment (witness the BX workshop).

Logic metaprogramming

A final thought is about logic metaprogramming. This is a very cool idea that I have not yet got up to speed on. In fact, almost all I know about it is from the abstract of a paper I saw at SPLASH last year, which said: “In logic metaprogramming, programs are... derived from a deductive database.” But this one sentence is so intriguing that I want to run for a while with what I think it might entail, before I find out what is actually done in existing systems (of which there are few!).

I've often wanted to program not by writing code directly---since I'm often aware that the code I'm writing will probably turn out “wrong” or “bad” once I've done a bunch more coding---but by making a sequence of simpler statements that I have more confidence in. Each statement should be small, freestanding and less of a commitment than writing a line of code would be. These statements might be such that none of them, by itself confers enough information to write a "known good" piece of source code. E.g. I might write that each instance of class A “has a[n]” associated instance of class B, but I don't yet know whether this association should be expressed as pointers, or by some associative data structure, say. This decision could be determined later, by solving constraits originated by other small statements. Ties could be broken (i.e. multiple candidate solutions selected among) by extrafunctional requirements such as performance (which might favour pointers over associative structures).

This is related to program synthesis and refinement methodologies, I guess. But I am particularly interested in making it exploratory. By having a tool explore the implications of the programmer's statements, we can potentially refine our understanding of the problem (a.k.a. “debug the design”) without going through the circuitous path of first writing some “bad” code and then either finding it's not the right design (and cleaning it up by heavyweight code changes) or finding it's incidentally messy (and cleaning it up, just by automatic refactoring if we're lucky). We can also have a tool tell us what the “right way” to code something is, but early. If the only solution to a particular set of requirements is to use a particular language feature, then the tool can tell us this, rather than letting us find it out by making the wrong choice and then backtracking. Of course, we need to get the requirements right up front, so this technique will only ever be a complement to more backtracking-oriented techniques.

Multi-dimensional representations of software

It is a very classical notion that programs have one true form, being their source code in the form of a string of symbols. Refactoring sticks with this idea but tries to make it easier to alter that form, by abstracting and automating certain common complex non-local edit patterns. But we can go further by rejecting the notion of “one true form” altogether, at least in the sense that that form is manipulated by programmers.

Of course, this is the MDSoC holy grail. I think the idea is just slightly too big for its own good, at present. Ironically, or fittingly, it has not been decomposed properly: aspects, refactoring and typeclasses are the main programming weapons that share its spirit, but none has its power or elegance. It's a shame that work on the idea seems to have fizzled out. (It's also a shame that the paper isn't on more reading lists!)

Somewhat relatedly, there's been some interesting work on subjective/transient/dual encodings of language features, as with the registration-based stuff at last year's Onward!, or Rob Ennals' Jekyll. But I'm still not aware of any mature tools that can really rip apart the modular structure of code and transform it on demand. Perhaps one problem is that we need to be able to define what primitive entities these queries “select”, and how they reformulate them into the appropriate bigger chunks---ideally in a language-agnostic way. So it's back to the shared metamodel. Again, better understanding of “cognate” language features, and indeed of less intuitive correspondences between language features (like the nontrivial correspondences between algebraic data types and class hierarchies), will help here.

[/research] permanent link contact

Guided by folklore

In a recent chat with my internal examiner, Andy Rice, I had a few thoughts which I decided to write down. It turns out he reads my blog---along with (his words) “everyone in the department” ---so, hi Andy and everyone. One day I might stop writing as if my audience consists only of myself, but not right now.

In summary, I want to rant about two weird things that go on in the research world. One is that there are some memes that seem to have a real influence on how PhDs are examined, but seem to have no origin other than folklore, and are different from the standards used to judge other research. The second rant, and perhaps the more oft-repeated, is that we actively encourage boring research.

(I should add that although this post is rather ranty, the chat was not an argumentative one. So, this is mostly post-hoc ranting about related topics, and not a direct reflection of our conversation.)

A thesis is judged on criteria from folklore, beyond what applies to “normal” research. At various points in my PhD, I heard it said that “a thesis should... [do X]”. Usually, X was something to do with telling a complete story, strongly substantiating a succinct hypothesis, and so on. And now I have heard the same from my examiners. Unfortunately, these statements continue to be just that---hearsay. They're different from the ways in which other research is judged. There are no regulations or official guidance to support them. There are no clear scientific or moral justifications for them either. The research community happily publishes many papers that do not tick these boxes, and at good venues. My own OOPSLA '10 paper is one example, but there are lots of others. But despite this, PhD examination seems to give a lot of currency to these criteria, for apparently no reason other than their having been handed down through the generations.

During my PhD I didn't worry myself much about this, since, like most researchers, I don't put much weight on unsourced claims. Besides, there seemed to be enough data downplaying their significance anyhow---several other theses seemed to break the rules, and plenty of published, respected research papers did too. Surely if a PhD is training for research, the qualifying criterion should be focused on doing good research? From my very limited experience, and from what I gather from listening to others, this is not how things currently are. Fortunately, I am of the bloody-minded type. I was aware that I might be “creating trouble” for myself, but I personally preferred to risk creating that trouble, thereby at least gathering some evidence about it, rather than swerving to avoid an obstacle that was at best nonexistent (I didn't know it would cause trouble) and at worst, worth challenging. So, consider it challenged! If you think a thesis needs to be anything more or different than good research, I challenge you to justify that position.

Now, on to my second rant. The evaluability problem has an irrational hold on many practical computer scientists, to the extent that research into many important problems is deliberately avoided. I spoke to many experienced researchers about my PhD work as it went along. Several of them suggested that I might have some trouble at examination. This seemed odd to me, for the reasons I just ranted about. Nevertheless, I didn't disbelieve them. But I had no intention of applying the fix they suggested. Rather than developing an alternative evaluation strategy or (the best advice in hindsight) to maximise the persuasiveness of the presentation of whatever evaluation data I did have, the only “advice” I ever received on this point was a not-so-veiled encouragement to abandon my current problem and work on something else. “Up and to the right” was what one researcher told me---about the kind of graph that should be in my evaluation chapter. (My evaluation chapter has no graphs, and is staying that way.)

This attitude is the tail wagging the dog. If a problem is important, and we do some research that is not conclusive, we should damn well work harder at it, not give up. The problems and curiosities of humankind are not regulated by how easy it is to collect data and draw graphs about them. If we avoid working on important but difficult-to-evaluate problems, or discourage such work, it shows the worst kind of ivory tower mentality. It is far from a pragmatic position, despite how (I'm sure) many of its adopters would try to spin it. What is pragmatic about ignoring the real problems?

I'm not downplaying the importance of evaluation. It goes without saying that measuring the value of innovations is important. Moreover, our ability to measure is something we need to work on actively. After all, many of those physicists and other “hard” scientists seem to spend nearly all their time working out ways of measuring stuff. So I'm completely in favour of rigorous evaluation. On the other hand, I'm not sure that a lot of evaluation that currently passes muster is really rigorous anyway. We need to recognise evaluation as a problem in its own right, whose hardness varies with the problem---and make allowances for that. For many hard problems, evaluation of a solution is comparably hard. That shouldn't mean that we give up any attempt to tackle those problems. The preference for conclusive results in published research has a deceptive influence, being essentially the same phenomenon as the “decline effect”, described in this very interesting article from the New Yorker.

There are some other problems with evaluation in particular kinds of CS research. One is what I call “evaluation by irrelevant measurement”: if you develop something that is supposed to help programmers, but you can't measure that, how about measuring its performance or proving its type-soundness? It says nothing about whether you've achieved your goals, but it still ticks those evaluation boxes. And of course we have a big problem with reproducibility of experimental results---at the VMIL workshop at SPLASH, Yossi Gil gave a great talk about the non-reproducibility of VM-based microbenchmarks, and Jeremy Singer's Literate experimentation manifesto was a nice counterblast to the wider problem.

I have found programming language researchers to be more sympathetic than “systems” researchers to work “towards” a goal, as distinct from work telling a complete story about some problem. This is partly because the nature of programming language research makes reliable evaluation a very high-latency endeavour. In other words, until real programmers have used your idea in a large number of projects, there will be no clear experience about how well it works. So, being computer scientists, we mitigate that latency, using pipelining. Rather than a slow stop-and-forward algorithm which waits 20 years between research projects, we have to be more amenable to two approaches: argument, in the sense of paying attention to the reasoning that justifies the approach of a particular piece of work, and speculation, meaning allowing the research discourse to explore many alternative approaches concurrently, and letting time tell which ones will “stick” out of the many that have been given a chance. The job of the researcher is less to conclusively show a problem as solved, but to show that a technique is feasible and has some potential for wide and successful application.

Going back to the first point, perhaps I should add that I'm not saying that my thesis would have stood up any more strongly by “good research” criteria. But having said that, a very large chunk of it appeared at a top-tier venue, so it can't be all that bad. Both of my examiners seemed to miss this fact, so the lesson is: always put a prominent summary of your publications in your thesis! Personally I can be very critical of my thesis work. But it seems bizarre to me that folklore should have so much sway in the way that theses are examined.

[/research] permanent link contact

Sat, 12 Nov 2011

Static versus dynamic analysis---an illusory distinction?

When writing a recent talk, I found myself arguing that static and dynamic analysis are not really that different. At least, people don't really agree on the distinction. Model checking people frequently argue that what they're doing is dynamic analysis, because it directly explores paths through a system's state space. Meanwhile, abstract interpretation people would argue the opposite, since clearly model checking is an instance of abstract interpretation, and so is all other static analysis.

I'd much rather avoid the debate entirely. Since model checking is a far cry from run-time checking or testing, my sympathies initially lay with the abstract interpretation camp on this particular issue. But the distinction becomes even more ill-defined in other cases. In particular, I've been thinking a lot about symbolic execution, of the kind done by KLEE and other tools. Is it doing a static or a dynamic analysis? I'd challenge you to defend either position.

(Meanwhile, execution environments which are somewhat speculative, like transactional memories, lazy evaluators, or even plain old branch prediction, can be considered as partial runs of a static analysis. But I should save that line of thinking for another post.)

So rather than talking about static or dynamic analyses, here are some dimensions of analyses that I think are more interesting.

That's all for now. Let me know if you can think of any more noteworthy dimensions!

[/research] permanent link contact

Mon, 22 Aug 2011

Pipelines (are lazy functional composition with recombination)

Unix pipelines are often held up as a paragon of compositional virtue. It's amazing how much they can do, how efficiently and how (relatively) simple to use they can be.

Programming languages people especially like this sort of thing, and functional programming clearly has some synergy with pipelines---which are, clearly, a kind of functional composition. Microsoft's F# language even uses the pipe to denote functional composition.

But pipes are not just compositions of functions mapped over lists. They have two other properties that are critical. The first is well-known, and the second less so.

First is laziness. Since each process in the pipeline explicitly forces its input, pipelines can be used on infinite streams (or portions thereof) just the same as with finite data.

The second is what I call recombination. Each stage in the pipeline can have radically different ideas about the structure of data. For example, pipelines typically combine characterwise (tr) linewise (sed) and whole-file (sort) operations in one go.

This is much harder to achieve in a functional setting because you don't have access to an underlying representation (cf. character streams on Unix): if your part of the pipeline understands data differently than the previous one, you have to map one abstraction to the other by writing a function.

Meanwhile, the shell programmer has a lower-level set of tools: familiar recipes for dealing with lower-level idioms such as line-breaking and field separation that recur across many different kinds of abstract data but may be recombined using the same tricks in each.

The upside of the abstract, functional approach is that you preserve the meaning of your data, even in the face of some heavyweight transformations. Meanwhile, shell programmers are often driven to frustration by their separator-munging not working properly once things get complex.

The downside is that it's more work, and more repeated work, because you might (in the worst case) have the quadratic problem of having to map n abstractions to n-1 other abstractions. By contrast, simple pipelines can often be quickly hacked together using a few standard idioms. They can even be written to be robust to changes in the underlying abstraction (such as numbers of fields), although this takes some extra care.

A middle ground might be to characterise those idioms and formalise them as a functional abstraction, into which many higher-level abstractions could be serialized and deserialized, but without going all the way down to bytes. Perhaps such a system already exists... it sounds a bit like a reflective metamodel of functional records or somesuch. Hmm... perhaps Lispers have been doing this for decades?

[/research] permanent link contact

Fri, 08 Jul 2011

In praise of (good) workshops

Publishing has at least two roles in science. On the one hand, it is a means of disseminating results, concretising the “progress” made by completed work. On the other hand, it is a source of feedback: it's a pillar of the peer review system that rejected papers receive feedback. Meanwhile, the idea behind conferences and workshops is, at least in theory, that each presentation will stimulate discussion and further feedback.

During my PhD I learnt the value of a pattern which seemed to suit my style of work, as follows. When work is under way, write an in-progress style of paper. This gathers early feedback. If it gets accepted (which it usually will, if the idea is good), you also get the benefit of the presentation and the subsequent feedback and discussion. Later, once you have results, write a full research paper to present them. Inevitably, you will have a lot more to say this time round. Some things will have changed, too. You will be able to write a better paper than before, because the earlier paper gave you some idea how to present the work and how to address its perceived weaknesses. (I admit I've only completed this bit of the cycle once so far! There is more in the pipeline....)

When I first went to a conference, I was surprised at how little conferring was going on. Many talks received only a couple of questions. Workshops, on the other hand---at least the good ones---set aside more time for discussion. Smaller audiences at workshops make it more likely that people will initiate discussion as a talk goes along. The lunch and other break times tend to have a more discussion-heavy vibe than those between conference sessions. This is perhaps, again, because a small number encourages more discussion. Also. the workshop group tends to “stick together”, rather than in a conference where people diffuse between sessions. I guess single-track conferences are better in this respect, but I've only been to one of those, and I don't recall a lot of high-quality discussion that time.

(Poster sessions are not bad either, for discussion, if your poster can grab people's attention. But they are painful to present at... never again, I have vowed.)

Recently I had it put to me by an experienced researcher that workshops are not worth bothering with: they're just for people who are starting out, or for less good work, and they stop you from publishing at a better venue. I sympathise because I've been to some bad workshops, and seen some decidedly poor “research” presented at them. But that's an argument for greater workshop participation, not less. Submitting interesting ideas to workshops, for discussion, is exactly what's supposed to happen. The reason that they degenerate into small conferences for mediocre-or-worse work is precisely because they don't get enough good submissions by good people. Some workshops are established, and get good participation, and work very well in that form.

Prior publication at workshops is a subtle thing, but in short, is not something I worry about. I have certainly seen workshops having (online) digital proceedings but from which it's common to see follow-up papers appear later at conferences. I'm not sure whether this is because workshop papers, being more preliminary presentations of work, simply “don't count” (an opinion I've heard voiced) or because those follow-up papers present quite a large delta. For the kind of work I do, a big delta is not hard to achieve anyhow---the contributions of the workshop paper would mostly be in argument, “position” or “idea”, together with perhaps some motivating experiments and preliminary results. Implementation and ensuing experimental work is saved for a full paper. Archival is cheap nowadays, so the convenience of having a printed proceedings accessible from the same place where we can find all the other papers shouldn't be seen as giving equal contribution-weight to these papers. (Suggesting otherwise seems to me to be endorsing a “numbers game” approach to the evaluation of research. Heaven forbid that we actually decide for ourselves what the contribution of some paper is, by reading it.)

I can see this split being less applicable for more theoretical work. The more abstract the formulation of the problem, the less there is to argue about. For practical work, discussing the problem set-up and high-level approach is very valuable. Even when work seeks to build big systems, the idea behind some work is often much bigger than the part that you are actually able to implement in practice. It's nice to have an opportunity for the bigger ideas to be discussed, reviewed and recognised.

A final reason for me to enthuse about workshops is that I'm one of the “little guys”. So far I've worked only on my own. I don't have big collaborative projects whose conference papers I can parachute onto. And I don't have very many coworkers who I can discuss my ideas in detail with. Workshops are a support infrastructure that I particularly need---for feedback, and also, perhaps slightly cynically, to maximise the exposure my work gets. Ultimately I want to convince people that my research vision is worth investing in. It's important that I take up opportunities for conveying my potential---which I believe to be great!---as well as what I've achieved, which will never match up to those who are habitual collaborators. Of course I'm not opposed to collaborating---far from it, but I just can't seem to find the right person....

[/research] permanent link contact

Wed, 06 Jul 2011

Functionality, not (just) reliability

I'm still a newcomer to verification. The more reading on verification I do, the more evident it is that most work in the area is interested in checking fairly boring properties of programs: usually that it doesn't crash, don't corrupt memory, or that some simple assertions hold. In this post I'll argue that thinking of a provable absence of these properties as “job done”, or even a primary goal, is counterproductive: it overlooks the hidden costs of verification, and overstates the value of such proof. There are far more important top-level goals than verifying these properties, yet there seems to be a real danger that research is overlooking these almost entirely.

To explain myself, I'm going to highlight three distinctions I've come up with about different kinds of work that can loosely be called “verification” or “formal reasoning about programs”. None of them are commonly mentioned, and certainly not by the names I've given them. But they seem fairly important to me, and in each case, one side of the divide is sorely neglected.

Positive specification versus negative specification

I call this kind of “doesn't fail” property “negative specifications”---they say what the software shouldn't do. I'm not trying to denigrate any work that verifies against these specs. It's often more than difficult enough to check these “boring” properties statically, without making things any more “interesting”. Nevertheless, focusing only on negative properties seems to neglect the classical goal of verification, which is to check that an implementation satisfies a specification which captures its intended functionality.

Tackling this means addressing “positive specifications”: what the software should do. This is a similar distinction to that between liveness properties and safety properties. But it is not the same: specifying liveness in the sense of “not deadlocking” or “not hanging” is still a negative specification. Really, what defines a positive property is that it has something to do with functionality, and not whether it's stated in a way that uses logical negation. We should aim to specify positive properties that capture an isolated facet of our system's intended functionality, expressed in application-level terms: perhaps small behaviours that a program simulates (but doesn't bisimulate), or small functional dependencies that a program reproduces.

Conservative versus “liberal” software construction

I had a hard time thinking of the right antonym to “conservative”, and “liberal” will have to do (as “reckless” was my other candidate).

Some classical approaches to programming have a highly conservative property that I call compulsory proof: the programmer cannot make progress without ensuring that some specification remains satisfied. Static typing is the most familiar example: type-checking is, from Curry-Howard, proving a property about your program. You can't do much with a program that doesn't type-check. Refinement-based synthesis methods are another example: a refinement step is only valid if it preserves the specification, and provably so.

As we scale up to larger and more complex programs, and larger and more complex specifications, these approaches start to inhibit progress. It's interesting how dependent types are still a “nearly there” feature---nobody has made them usable yet. I'm told that making programs type-check becomes harder and harder under increasingly dependent types. This is unsurprising: we're making the specifications (a.k.a. types) more complex, so harder to prove satisfaction for. It seems that the most pragmatic solution so far, is to relinquish the insistence on proof at all times, a compromise adopted by Edwin Brady's Idris.

The idea of compulsory proof is flawed because programming is usually exploratory. As a programmer, I often start coding before I fully understand the problem. The process of programming provides necessary illumination. So, presupposing any specification that I must doggedly adhere to at each stage, whether from the tyranny of a type-checker or the tyranny of a refinement procedure, is anathema---it actively stops me from acquiring the understanding I need to complete the task. In the worst case, compulsory proof is a net loss: it slows us down far more than it helps us. What's unfortunate is that our research evaluation methods don't account for this. No programming research considers the net worth to a human programmer. Instead it prefers the mathematical orthodoxy of derivable properties such as preservation of type safety. I suspect these are poorly correlated.

The polar opposite to these conservative approaches, which I call “liberal”, is exemplified by dynamic languages. Here no proof of adherence to any specification (type-based or otherwise) is statically enforced. The programmer is free to break his abstractions at any time---even at run time. The only specifications of that are enforced are those of pre-existing machine-level abstractions---integers, pointers and floating-point numbers---whose integrity is enforced by dynamic checks only.

Like most polar extremes, neither fully-conservative nor fully-liberal approaches are often optimal in practice. I think the consensus from mainstream languages (C, Java, C++) is that static checking using a simple type system is a good idea (and I'm sure even most die-hard Python programmers often yearn for some basic static checking). The jury is still out on the cost/benefit ratio of more complex type systems, even such as Java's generics. Ken Arnold vociferously argues against Java generics , and when I first read his opinion, I was sceptical of his view---surely more static checking can't hurt? Nowadays I can see his points about the cost in complexity and, particularly, in the proof burden on programmers.

Meanwhile, from the liberal end of the spectrum, bug-finding tools like KLEE are interesting: we liberally (or recklessly) allow bugs into our program, then try to whittle them down. It's bug-finding and not verification because we can happily specify properties that the tool can't conclusively show are not violated. A consequence is that in practice KLEE doesn't terminate for most nontrivial programs. On the other hand, the space of assertions that we can use KLEE to check is large: at least in theory it subsumes type-safety (noting that a specification of type-safety can be encoded by instrumenting an untyped program with assertions using a suitably-defined instanceof-like operator) and arbitrary application-specific assertions. There are a few problems with KLEE: it's limited by what we can express as assertions (or built-in properties); it doesn't terminate in enough cases; and it tends to find uninteresting bugs which, while not technically false positives, might well be if we could write a more refined input specification. Nevertheless, I like it because it treats perfection (i.e. proof) as the limit case, not the immediate goal. (It's no coincidence that I'm basically working on addressing these weaknesses right now, including making it terminate for more programs---or at least, I would be if I wasn't wrestling with LLVM bugs the whole time.)

Constructed reliability versus emergent reliability

Using any of these techniques takes time. Writing specifications takes time. Even if you say “my verifier requires no annotations!” (like a lot of Dawson Engler's work, including KLEE), you probably rely on assertions. Even if you restrict yourself to “belief”-style latent specifications (like Engler's 2001 SOSP paper), they got there by somebody writing code. If you rely on “mined” specifications, as recovered by a bunch of work (like this and this and this) you have to gather a corpus and run a mining algorithm and check for false positives and then use some other technique to hunt the false negatives.

In other words, despite half-claims to the contrary, we make our software reliable by pouring some amount of labour into it. New, exciting techniques are exciting because they deliver more with less labour.

But there is an appalling lack of holism here. (Aside: Thomas Ball's call for holism is an interesting read and I applaud it; here I like to think I'm going much further!) What other approaches do we have for making software reliable? How about making it easier to write the code, so that we have more labour to pour into the reliability side?

In other words, nonfunctional concerns trade off against each other, and also against functional concerns! Time spent coding new features is time taken away from static analysis or testing or any other reliability-focused work, and vice-versa. It's also time taken away from profiling and optimisation, from refactoring and documentation, and so on. So in other words, it all helps. Software's value isn't a sum of independent parts. The parts are interdependent; what benefits one benefits all. Reliability can be deliberately constructed by applying specific reliability-focused tools, and hacking at code until it passes muster by these tools. But also, it can emerge from superior development processes that made it easier for programmers to build in reliability in the first place.

Now let me lament my own position. In the research world, only a relatively narrow selection of approaches get any funding. Reliability is a perennially hot topic. It's unquestionedly considered sound research motivation to trot out lines about the importance of reliable cars or reliable trains or reliable nuclear power plants. Similarly, it's routine to trot out analogies with civil engineering, bridge-building and the like. Reliability is important, for sure. But thinking holistically, that doesn't mean we have to attack these problems by building tools with the narrow remit of making sure nothing bad happens in whatever code they're given. Improving all parts of the development process can contribute to these goals, and have untold other economic benefits in the process. Too many researchers' mission statements list reliable software as top priority. But reliability is just a means to economic (or “value”) gain. Hardly any researchers will say their goals are “quality software”, or “functioning software”, or “economical software development”. Why not?

A senior researcher here in Oxford recently pitched his work (to industrial visitors) by saying that verification is ensuring that “nothing happens”. I hope that was a gross simplification for pitching purposes, because we can do a lot better than that.

To finish, let me shamelessly bring on some of my current “background” research interest and activities. I'm interested in good ways of re-using existing code; good ways of adopting programming language innovations without rewriting the world; good editing tools (including refactoring and a whole lot more---I'll blog shortly there); good dynamic analysis tools, including a good debugger (again, I'll blog more shortly). Of course, I didn't manage to find a job on any of these ideas. Since my PhD, days, I've felt as though I was perhaps the only programming researcher in the world whose top priority is not specifically code that is reliable, per se, but the bigger yet seemingly more obvious goal of code that does what you want it to.

So, am I crazy? I took a job in verification because I thought I'd learn some program analysis skills that would be generally useful, including (later) application to the problems nearest my heart, not just to approaches sanctioned by the reliability orthodoxy. But it's a draining experience to be railing against orthodoxy all the time, especially when you feel like the only lunatic in the asylum. I'm not sure how much longer I can take it.

[/research] permanent link contact

Tue, 14 Jun 2011

Post post viva

I blogged previously about my PhD viva. I've finally got the examiners' reports, and was quite surprised by the difference between the two. Suffice it to say that one was much more positive than the other, and reassuringly for me, the more positive one was also from the examiner who is both more experienced in the role, and more familiar with my research area. (It's probably not hard for most readers to work out which one is which.)

I'm only kicking myself since I could, given this information, perhaps have steered quite a different path through the viva that would have resulted in far less extra work being demanded of me. Nevertheless, the same points in favour of doing the “corrections” that I am doing (a.k.a. self-financed development work) still stand from my last post, so I shouldn't kick myself too hard.

[/research] permanent link contact

Wed, 13 Apr 2011

PhD examination

So I passed my PhD viva a couple of weeks ago. I do, however, have a lot of corrections to do. In fact I have about the most corrections I could have, in hours of work terms, without having to resubmit my thesis. Thank God I don't have to do that. As it happens, the actual corrections to my thesis are not very many. I have to add the odd paragraph here and there, and collect a small amount of extra data. The killer is the non-thesis bit. I'll talk about that in a moment.

There's a lot I could say to summarise my feelings about the viva. Here are the two words I've been using most when people have asked me how it went: “reasonable” and “annoying”.

For the “reasonable” part, I have to thank my examiners, Andy Rice and Alex Wolf, who deserve credit for the depth at which they got to grips with my thesis. I was quite impressed with their attention to detail. Although I can (and will, shortly) disagree with their take on what is necessary or sufficient to substantiate my thesis, I also appreciate how my doing so is very much challenging a norm... and the examination process isn't the right place to do this. Examination is a pragmatic business, and when considered less on intellectual high ground and more in terms of personal risk and reputation, I could not reasonably have expected (at least not with high probability) their taking a different position.

For the “annoying” part, in short, I was far too idealistic in my conception of the PhD examination process. Of course it has some room for intellectual rigour; but virtually no research in any practical field has such unthreatened validity that examination doesn't fall back on “due diligence” to some extent. Another word for “due diligence” is “hoop-jumping”, and that really sums up why I think my thesis attracted the complaints that it did: it didn't jump enough established hoops to make the examiners feel comfortable rubber-stamping it. I'm not saying my thesis is great; it's fairly weak really---but it's no weaker than a lot of other theses which seem to pass without problem. I suppose the examiners did rubber-stamp it in the end, given that I passed---but subject to corrections which, unsurprisingly, make it jump an additional hoop. I don't feel that jumping this hoop substantiates the thesis any more strongly, and this is the centre of my annoyance.

A new rant about an old problem

My problem is not a new one. Introducing a new language is a relatively common thing for a CS researcher to do. Assuming the claimed benefit of the language is a practical one, rather than a theoretical one, then evaluating the language is a huge problem. PhD students don't have the time or the budget to carry out large field studies. Anyway, instead of this, the usual approaches are to prove something about the language, to show that it has reasonable performance, and/or to apply it to case studies. I'm going to be bold and claim that the first two are hoop-jumping in most cases. It's a rare case indeed where a language's goal is actually to realise the theoretical property in question or to “do X really fast”. (Of course, other kinds of work, in theory and systems respectively, do have these as express goals, but I'm talking about languages here, where “language” is distinct from “calculus”.)

It's reasonable to set for your language a performance or theoretical goals in addition to your main goal, as this can be a source of interesting problems and brings the work closer to applicability in practice or interest in theory. However, it really annoys me when people confuse these goals. I hate seeing papers which introduce some new language feature that is claimed to help programmers---the usual end goal of any language---and then evaluate it either by an irrelevant proof or irrelevant performance measurement. This has the effect of encouraging both a confusion between the main goal of a language and these side-goals, and moreover, encouraging a culture where evaluating the main goal is neglected in favour of the side-goals, or where the side goals are seen to imply the main goals.

Trouble with case studies

Case study evaluation is unsurprisingly the approach I chose. This might have passed muster, except that the other hoop I didn't jump through was producing a complete working implementation. This doesn't mean I didn't implement anything: I did a lot of implementation work during my PhD. But for various reasons, my reach had exceeded my grasp. I had plenty of working examples of the techniques I wrote about, but the code generation side of my compiler had got hairy enough that I decided that it should suffice to show implementability rather than implementation. I think I did this, and I don't think my examiners doubted it either, although they did mince some words on the subject. In the end, they were reluctant to accept this implementability evidence as sufficient defence of the thesis. I couldn't put my finger on why, and I wouldn't say they could, either. Instead, I only got some quite vague questions, in essentially four forms.

The first was: “How do you know your language features are sufficient?” Of course, I don't. Since I spent a whole chapter talking about cases that aren't handled, clearly I make no such claim (although I do identify what needs fixing and how this doesn't break the key abstractions of the language). I do claim that they're sufficient for the case studies, and that since these are representative of other code, that they will be sufficient for a wider range of code. This is demonstrated by argument and careful analysis of code rather than saying “we ran it”. But saying “we ran it” is still subject to error---since realistically, how much testing did you do, and how can you be sure it was enough? The case the examiners seemed to worry most about was the one where, by failing to account for some unspecified detail, some new language feature or altered semantics would be necessary just to handle the case studies themselves, never mind other examples to which I claimed it generalised. I think I provided quite a weight of evidence that this wasn't going to happen. However, even if it did, it would still a matter of error bars, not validity.

The second was: “How do you know you haven't forgotten something in your implementation?” Again, I don't, but I have implemented enough that the implementability isn't in doubt. Even if a fully working version would turn up one or two extra details that need addressing, this wouldn't undermine the thesis.

A final question: “How do you know your language features are necessary?” I still find this question bizarre. The language features exist to handle common cases in a way that saves programmer effort. Every feature is illustrated with a plausibly common example, and any experienced programmer would recognise its usefulness. This doesn't mean they couldn't be eliminated, but doing so would bring a strict degradation in what the language offers the programmer.

What didn't help was that the examiners didn't ask me these questions one at a time, but rather rotated among them with dizzying speed. It was though they themselves hadn't yet separated them in their own heads. Without this, I might have been able to fend them off better, along the above lines. As it was, I can't help feel I did well not to get too put out by it all. I nearly did lose my cool at one point where one examiner suddenly claimed that I needed to do a performance evaluation. I had very explicitly and very clearly excluded performance from any but informal consideration very early in the dissertation, precisely in order to prevent my task from blowing up even further than it already had. Fortunately I managed to argue this one down, although annoyingly, I still have to gather some (meaningless, but fairly trivial to collect) performance data for my corrections.

The “solution”

So, how did the examiners propose that I answer their objections? In the time-honoured hoop-jumping way: to finish the implementation, of course, so that I can say “I ran it”! Actually I only have to get it up to a certain level, rather than finishing everything, which I suppose is something to be glad about. But I had failed to complete my implementation for very good reasons: it's a ton of work, and it was already past the point where its feasibility was established. In hindsight I could have written up this fact better. But I think it was still clear that what remains is a matter of development---which I wasn't prepared to spend any more of my own money to fund, given that I'd already spent 6 months living off savings and consultancy work. Fortunately, circumstances now mean that I have a job which pays enough that by going part-time I can get it done while remaining solvent. (It also had to happen this way round, since if I hadn't been able to submit my thesis without a full implementation, I wouldn't have been able to get the job that is now, indirectly, paying for the implementation's completion.) Of course, my financial situation is an irrelevance as far examination goes, and it has to be that way. The moral is that there is no safety net, and nobody who is truly responsible for your thesis than yourself. The system is accountable to nobody, and it has no incentive for self-improvement... except maybe to the extent that (and over the timescales by which) PhD examinees who suffer negative experiences become examiners who can still remember them. “It's not fair!” as Jennifer Connolly once declaimed, “... but that's the way it is”.

The role of empirical rigour

At the moment, and probably since time immemorial, there is a cohort of CS researchers in the fields of programming languages and software engineering who are vociferously advocating greater empirical rigour in research. Early on in my PhD, I thought that this movement could only be bad news for little old me. I barely had the resources to produce an implementation within one PhD, never mind do an empirically rigorous user study. However, now I think that this movement is actually on my side (as well as the side of “truth” and good science, which I didn't doubt). The hoop-jumping that would have satisfied my examiners---producing a working implementation and running it---doesn't actually strengthen my thesis, and in an empirically rigorous discipline, this would be clear. In turn, it would probably be a more “done thing” to submit theses that don't tell a complete story---because telling a complete story on something complex as complex as a practical programming language, and doing so with empirical rigour, is too much work for one PhD. Perhaps it would be more acceptable to package research “towards” a goal, evidence but not yet conclusive evidence, with its outstanding threats to validity clearly explained, yet unresolved. Instead, in our empirically immature discipline, we try to brush these unresolved threats aside by arbitrary hoop-jumping.

The downside of a more empirically rigorous discipline would of course be that each researcher can't race ahead quite so fast. Within the scope of one PhD, there is far less prospect of telling a neat, complete story. In my case, this would have been both good and bad. For empirical rigour's sake, I would have to have spent much longer on case study, including (probably) starting my thesis with an empirical study. Perhaps all implementation would have to be left for the future, and my thesis's contribution would mostly be on understanding the problem empirically, with a paper sketch of the solution validated by detailed analysis of examples. Of course, this paper sketch would have a weight of evidence behind it. The downside is that I actually like the idea of implementing stuff, and even though I haven't (yet) finished the job (and I am now working on it, again), I would have found it frustrating to embark on a PhD with no intention of completing an implementation.


This post probably sounds like a lot of sour grapes, although I hope it doesn't. It's actually a positive thing for me that circumstances have conspired to give me a chance to finish the Cake implementation, since it will be a useful springboard for future work and perhaps even (gasp) impact. Previously, when I was resigned to not finishing it, it was looking like this would become an albatross. More generally though, I can't pretend not to be a little bit sour about the course my PhD took. Despite making what were defensible and reasonable moves at each stage, the whole thing turned into a bit of a mess and has caused me a lot of pain. However, the mess of the work (which could have been better, but I think was comfortably “good enough”) is a different mess from that of the examination. I am now very strongly convinced that there really is a problem with the attitudes to evidence, rigour and the mythical “completeness” in computer science. If I last long enough in this line of work, perhaps I can help do something about it.

[/research] permanent link contact

Thu, 17 Mar 2011

Everything is a somehow-interpreted file

My last post was ranting about the difficulty of maintaining and debugging configuration of complex software. I somewhat naively advocated the position of using the filesystem to the greatest degree possible: as the structuring medium as well as a container of data. This is good because it maximises the range of existing familiar tools that can be used to manage configuration. But at least in some cases, the filesystem---as an implementation, although sometimes as an interface too---is manifestly not good enough (e.g. in terms of space-efficiency for small files). Instead, we want to make our old tools capable of accessing, through the filesystem interface they already know and love, these diverse and wonderful implementations of structured data.

I have heard many researchers claim that binary encodings of data are bad, that XML is good, or even that revision control systems are bad ways of storing code, for the same recurring semi-reason that good things are things that conventional tools work with---sometimes the phrase “human-readable” crops up instead---and bad things are things they don't work with. You can search XML or plain source code using grep, or so the reasoning goes; you can't generally do the same for binary or diffed-text or otherwise munged encodings. This argument is classic tail-wagging-dog material. Plain text is only “human-readable” because there is some widely deployed software that knows how to decode binary data representing characters (in ASCII or UTF-8 or some other popular encoding) into glyphs that can be displayed graphically on a screen or on a terminal. If other encodings of data do not have this property, it's foremost a problem with our software and not with the encoding.

Unix is arguably to blame, as it is obsessed with byte-streams and the bizarre claim that byte streams are “universal”. The myriad encodings that a byte-stream might model are less often mentioned. I'd argue that a major mistake of Unix is a failure to build in any descriptive channel in its model of communication and storage. Without this, we're left with what I call the “zgrep problem”: each file encoding requires its own tool, and the system offers no help in matching files with tools. If I have a mixture of gzipped and uncompressed text files in a directory---like /usr/share/doc on any Debian system---recursively searching is a pain because for some files we need a different grep than for others. Some approach combining analogues of HTTP's Content-Type and Accept-Encoding with make's inference of multi-hop file transformations, could allow the operating system to transparently construct the correct input filters for searching this mixture of files from a single toplevel instance of our plain old grep tool.

For this to work we need an actually sane way of describing data encodings. (Call them “file formats” if you will, although I prefer the more generic term.) Later attempts at describing the structure of byte-streams, notably file extensions or MIME types, have been ill-conceived and half-baked, mainly because they are based on a simplistic model where a file has “a type” and that's that. So we get the bizarre outcome that running file on /dev/sda1 says “block special” whereas taking an image of the same disk partition and running file on that will say something like “ext3 filesystem data”. Doing file -s might be the immediate workaround, but in reality, any given filesystem object has many different interpretations, any of which may be useful in different contexts. A zip file, say, is a bunch of bytes; it's also a directory structure; it might also be a software package, or a photo album, or an album of music.

Another interesting observation is that these encodings layer on top of one another: describing a way of encoding an album of music as a directory structure needn't target a particular encoding of that structure, but presumes some encoding is used---any that conceptually models the album encoding is sufficient. So we want to capture them in a mixin-like form, and have some tool support for deriving different compositions of them. What would be really neat is if a tool like file, instead of doing relatively dumb inode-type and magic-number analyses, actually did a search for encodings that the file (or files, or directory) could satisfy. Each encoding is a compositional construction, so the search is through a potentially infinite domain---but one that can usually be pruned to something small and finite by the constraints provided by the file data. But by giving it more mixins to play with, as configuration data, it could find more valid interpretations of our data. This sort of discovery process would solve the zgrep problem in far more complex cases than the one I described. Implementing (and, er, somehow evaluating) these ideas might make a nice student project at some point.

These ideas have been with me right from my earliest forays into research-like work. I've yet to really nail them though. My bachelor's dissertation was about a filesystem exposing arbitrary in-memory data. And my PhD work addressed some of the problems in making software accessible through interfaces other than the ones it exposes as written. I've yet to do anything with persistent data or other IPC channels so far, but it has been on the list for a long time. (It's become clear with hindsight that this agenda is a characteristically postmodern one: it's one of building tools and systems that provide room for multiple interpretations of artifacts, that don't insist on a grand and neatly-fitting system being constructed in one go, and that accommodate incomplete, open or partially described artifacts.)

The power-management problem that provoked my previous post actually gets worse, because even when not on AC power, closing the lid only usually makes the laptop suspend. Sometimes it just stays on, with the lid closed, until the battery runs down. For my next trick, I may be forced to come up with some bug-finding approaches that apply to scripts, filesystems and the like. If we're allowed to assert that a file has a particular abstract structure, and check that separately, then we can probably factor out large chunks of state space from real software. In turn, that might shift our focus away from “inputs of death”, fuzz-testing and other crash-focussed bug-finding techniques, and towards the harder but more interesting ground of finding functionality bugs. I'll rant about that in a forthcoming post.

[Update, 2011-3-18: I just ran into an interesting example of this same debate, where Ted Ts'o is actively advocating using a database in preference to maintaining lots of small files, for performance and robustness reasons, in relation to ext4's semantic differences from ext3. Understandably, he provokes a lot of discussion, notably here, where people complain that binary files are difficult to maintain. There is a mix of views and a wide variance of clue level, but the real issue is that no one interface---where “interface” includes robustness and performance characteristics--- is the globally “right” one.]

[/research] permanent link contact

Wed, 16 Mar 2011

Config filesystems, not config files

Like most computer scientists, I really hate tinkering with computers. Actually, that's not true. Like many computer scientists, I like tinkering with computers in a constructive way that achieves something interesting and novel (read: practical CS research). But I hate tinkering that is provoked by stuff not working. I use a lot of software that has episodes of the latter kind---most of it is free software (and, erm, arguably most free software is of that kind, but that's another story).

One recurring pain is that I learn how to configure software the way I like it, but then all that hard-learnt knowledge is eroded by change: change in what software is “current” or “supported”, and also change in the way any given bit of software manages its configuration. If you like, software's semantics often change, particularly at the configuration level.

So often, I'm faced with a bunch of hassle just to keep my configuration working the way I like it. Recent examples have included: KDE 4 clobbering my X resources when it starts up (in a way that KDE 3 didn't); Xorg forcing me now to use xrandr to set up multiple monitors; wireless networks now being preferentially handled using Network Manager not ifupdown.

In dealing with this complexity, one recurring principle has been clear to me. The closer a configuration system stays to the Unix filesystem abstraction, and the less abstraction it tries to re-build on top of the filesystem, the easier it is to deal with. This is because using a direct filesystem encoding, I can use standard tools, from a shell, to inspect and search and modify (and even generate) my configuration, and to debug problems. This is also why gconf sucks, just as the Windows registry sucks: they represent hierarchical data using a custom encoding layered on flat files, rather than embracing the hierarchy already given to them by the filesystem. (This approach is even less excusable on Unix than on Windows, because Unix filesystems are somewhat optimised for small files in a way that Windows-land filesystems traditionally weren't, as exemplified by FAT.)

In some quarters there's been a drive to actively embrace the filesystem as a means of capturing the structure of configuration data. Configuration directories (like xorg.conf.d) are one example, although it has now gone away; symlink structures like the traditional System V init runlevel directories are another; the run-parts idea of directories encoding control structures is a third. Configuration is easy to understand and modify when it's laid out transparently in the filesystem. When it's instead recorded as opaque data in random files somewhere, this is not true.

Unfortunately this drive towards transparency is not universal. Today I've been debugging a configuration issue with power management on the recent Ubuntu. When I close my laptop lid with no AC power, it suspends to RAM. When I close the lid on AC power, it doesn't---but I want it to. I had assumed the matter was under the control of the scripts in /etc/acpi/, but a quick inspection of the lid.sh script revealed that it didn't deal with suspending to RAM at all. It turns out that KDE 4 has something called “PowerDevil” and that this is responsible. I can configure it using KDE's graphical systemsettings application. But this whole set-up is unsatisfactory. How does it interact with other system settings, such as the /etc/acpi/ scripts? Why is a KDE-specific tool replicating the functionality that is already provided at a lower level? And now I have one more chunk of configuration to deal with, at one more path in the filesystem, and one more model of the settings domain to understand---squirreled away inside its own configuration file (mercifully in plain-text format).

Now, the researcher will say that there's a problem here: why should a simple need, such as the gconf-like desire to implement a familiar abstraction (or something close to it) with different performance characteristics, bring such a huge cost in terms of tool support, convenience and integration cost? It's not really an answer, as I have proposed, to “just use the filesystem, stupid”. For the same reason, even filesystem-embracing approaches don't go so far as to have one file per setting, say---there is some amount of filesystem-opaque flat structure. I'll save some comments on this for a (near)-future post.

[/research] permanent link contact

Wed, 09 Mar 2011

Program specialization (is not just partial evaluation)

I've been thinking a lot about various techniques in program analysis, transformation and verification recently. There's certainly a lot to think about.

One idea I'm exploring is looking at verification problems as program specialization exercises. There is a recurring two-stage process in verification. First, transform your program so that a single execution captures all possible inputs. For an explicit-state model checker like CMC, we do this by putting our program in a harness that systematically explores its state space. Alternatively, for approaches based on predicate abstraction, we replace all input-dependent transitions in the program with nondeterministic choice. The effect is the same: we now have one program encoding all possible behaviours. The second step is then to specialize our program for answering the question we care about, such as “does this assertion ever fail?”. We rely on this specialization to give us a new, simpler, faster program that we can exhaustively check, or can check to greater depth, without exhausting resources (time or memory).

It's the specialization step I'm thinking about right now. How much of our program's computation can we throw away, while still computing the answer to our question? CEGAR approaches work from the bottom up: we start from a trivial abstraction and refine it compute something close to the smallest program which finds either an absence of bugs or at least one non-spurious bug. This process need not terminate; I'm not yet clear on its other failure modes, but am fairly sure there are some. Meanwhile, a top-down approach also exists. CMC is a useful tool even though it doesn't do any specialization of the computation per se. (It does support some other kinds of abstraction for reducing the state space by defining equivalences, which have a similar effect but are of limited applicability.) To improve on this, we could exploit the fact that throwing away unwanted computation is something we know something about. Compilers have been doing this since compilers began. “Program specialization” is a term used mainly by compiler-minded people rather than verification people. Can we apply ideas from one world to the other?

“Program specialization” in the literature is often used to mean partial evaluation. With partial evaluation, we take a program of n inputs, say, and then produce a smaller, simpler, faster version where some of these inputs are replaced by fixed values. This is typical of optimisation problems, where “faster” is the key requirement, and the input constraints have usually been derived from some other analysis. However, there is a converse case of program specialization which the same literature often ignores. This is where we take a program of n outputs, and then produce a smaller, simpler, faster version where we “don't care” about some of these outputs. This is typical of verification problems, where “simpler” is the key requirement, and the selection of don't-care outputs is a consequence of the specification being considered.

Predicate abstraction is doing this, but with some added room for manoeuvre---since it's open to finding sound approximations rather than precise specializations---and also with some added constraints, since it's interested in predicates that can be input to a SAT or SMT solver to perform the abstraction-refinement. Dave provided a valuable link in a productive coffee break this morning, by noting that program slicing is also an instance of specializing for don't-care outputs. What happens if we use slicing techniques to do a top-down specialization? I'm worried the answer is “not enough” or “strictly worse than abstraction-refinement”, but I'll keep thinking about it.

[/research] permanent link contact

Greek talk

One of the reasons why I'm not a theoretical computer scientist is that I am very, very averse to mathematical notation. “It's like Greek to me!”---no applause, please. Certainly, it's common to see very highly abbreviated notation that takes some serious cognitive gear-turning to decode. If I'm faced with a Greek-heavy paper, I usually skim over the symbolic stuff and look for an explanation in words. Sometimes it's there, and sometimes it isn't. In the cases where it's not, I rarely have the stamina to wade through the Greek.

Natural language, for all its imprecision, is---unsurprisingly---more “natural”! In fact, I'll wager that most of the infamous imprecision found in natural language specifications could be fixed by more precise natural language. Perhaps a semantic checker for English is in order. Diagrams are even better than natural language, of course, although they rarely stand alone.

It strikes me that formalism is primarily useful for avoiding mistakes. By turning complex reasoning into simple pattern-recognition and symbol-pushing, correctness can be checked fairly dumbly. The cost is that although it's hard to make mistakes, it's hard to make progress: there are reams of applicable rules, and expressing anything complex requires a whole lot of symbols. So I'm going to go out on a limb and claim that formalism is notably not very good for acquiring understanding. In a lecture, diagrams and examples and words have always been far more useful to me than slides full of Greek. I'm also going to assert (without proof!) that formalism is not useful for artifact construction, except where mistake-avoidance is paramount. We should allow programmers to make imprecise statements, and refine them later, because humans can be a lot more productive this way. In particular, we can make progress before we fully understand the problem! Only when the cost of the smallest mistake is so great that we really want to rein things in should we resort to fully rigorous constructive methods (such as formal refinement processes, the B method, etc.). This argument also encompasses many of the usual arguments in favour of dynamic languages over statically typed ones.

Of course, that doesn't mean that any formal notation is to be avoided. For whatever quirk of evolution, humans have some aptitude for written language---and that includes more mathematical-style symbolic notations just as well as plain old words made of letters. So mathematical notation is fine if it stays within a particular comfort zone. I can read basic logic and basic algebra without much cognitive burden. Only when the formal notation passes a certain density threshold do I suddenly hit problems. I suspect that most theoretical computer scientists (and mathematicians) have a much higher threshold than I do.

[/research] permanent link contact

Thu, 03 Mar 2011

The end-to-end razor

Most of us have heard of Occam's razor. Usually it is invoked as the principle that given two plausible theories, explanations or solutions to a problem, we should prefer to believe the simpler one.

I've always been a fan of “Einstein's razor”, which paraphrases a longer quotation of Einstein by the snappy dictum “Everything should be made as simple as possible, but no simpler.”. The appeal is in its counterbalancing: there is value in simplicity, but there is harm in oversimplification.

A third razor-like object occurs more often in system design. Most practical CS researchers will have read the “End-to-end arguments” paper. Usually, the end-to-end arguments are dumbly invoked to criticise any design which pushes a feature into the lower layers of a complex system (notably the Internet) when it could be implemented higher up. This interpretation is unfortunate. For one, it overlooks at least two subtleties expounded in the original paper: that a key criterion is whether the feature can be implemented completely and correctly at the lower layer, and also whether doing so brings any compulsory overheads (detrimental to applications not requiring the feature). But more importantly, it omits a vital counterbalancing concern: by implementing features higher up, we nearly always end up with not one but many variants of the same feature. Agreeing on which one to use is a hopeless problem of distributed (human) consensus, so we end up with a huge mess of interoperability problems brought on by this unnecessary diversity. So in fact, there are very real incentives for implementing functionality at the lowest sensible layer. The traditional end-to-end arguments don't bring these incentives out.

In fact we should have been paying more attention to Occam all along, because his original statement that entia non sunt multiplicanda praeter necessitatem---“entities must not be multiplied beyond necessity”---is extremely suggestive of the cost of unnecessary diversity. Combining this razor and Einstein's, I prefer a different formulation of the end-to-end arguments, which I hereby name the “end-to-end-razor” (with apologies to anyone who's used that name previously to mean something else). “Everything should be implemented at the lowest sensible layer, but no lower.” You can argue about what's “sensible”, but the criteria are the same as in the original end-to-end arguments. The difference is that the counterbalancing of considerations is explicit: there may be both value and harm in building at lower levels.

Personally, as a programming researcher, I relish the challenge of working at lower levels. Solving a general problem by building a system which is tied to one programming language, for example, seems unsatisfying to me. Not only did the decision to make Cake target object code mean that it provides a somwhat language-independent solution to its problem, but, for me at least, it was just a lot more fun hacking around the OS, linker and C library than it would have been tinkering with a JVM or munging source code. I'm not entirely sure why....

[/research] permanent link contact

Mon, 28 Feb 2011

Why I am not (yet) a functional programming enthusiast -- part 1

I suffer from a particular disposition which, for a programming languages researcher is quite an unfortunate one. When I hear my fellow researchers expounding the virtues of functional programming, I start to feel grumbly. Functional programming is really neat in a lot of ways. But there are some in which I find it unpalatable. Here is my first selection of complaints. They are mostly to do with the generally poor comprehensibility of functional code. I have more complaints in reserve, which will follow in due course when I'm feeling sufficiently grumpy.

I concede that these are all elements of style, not language features per se. It's possible to write clean functional code which doesn't suffer from any of the problems I've mentioned. This hints at the fact that part of my problem is the culture among functional programmers, rather than the technology itself. That's still a showstopper though, because in practice we are reliant on other programmers. Without other programmers' having written tools and libraries that relieve us from writing everything from scratch, and documentation to explain them to us, then no programming language is of practical use. Accordingly, I'll be back later with a more concrete example where this went wrong for me in an earlier foray into functional programming.

[/research] permanent link contact

Sun, 14 Nov 2010


As researchers, we are naturally wary of addressing problems which don't have a well-defined end point. We like to be able to say, “we've solved it”, or perhaps “we've solved it for the following formally-characterized subclass of problem”. In my work, that doesn't really make sense. Cake is a practical tool for tackling interface diversity. No such tool can be complete, because for any nontrivial language, or interface encoding scheme, there is no limit to the crazy encodings that could be dreamed up. When Kent enumerated “the many forms of a single fact”, he wasn't making a complete list, which would clearly be impossible---but he was trying to come up with a catalogue that included a reasonably high proportion of what people actually do in practice.

Although I could try to formally characterise the class of mismatch which my current Cake language is capable of reconciling, I've never seen much worth in doing so. It's hard, if not impossible, to come up with a formal characterisation that actually has practical meaning. In any such exercise, what tends to happen is that a bunch of practically meaningful cases are excluded, and a bunch of practically meaningless cases are included. One reason for this is that humans brains are messy, and don't respect the formal boundaries induced by conventional mathematical thinking. Mathematical formalisms strive to tackle complexity primarily by purposely engineered compositionality; human evolution has done so by a random walk. The point is that humans are full of specialisations, resource limitations and arbitrary cut-offs. Too often, we embrace the formal concerns and ignore the human reality. This is the formalism wagging the research, and it's something that irritates me.

One of my favourite examples of formalism-versus-reality is our use of nesting in sentences, and in particular, what linguists call “centre-embedded” or “self-embedding” structures. (The concepts are distinct, but the distinction is somewhat confused in the literature.) Sentences such as “The rat the cat the dog chased ate died.” are perfectly grammatical according to any formal grammar of English you might reasonably come up with---that is, any grammar that could also generate the sentences we actually use in real English. If you buy that the role of grammars is to model the languages that people actually use, then this is clearly a failure of modelling---yet is a consequence of the recursively generative nature of grammars loved by formalists. In practice, we humans don't work in this neatly recursive way---the complex, messy architecture of our brains means that only some kinds of embedding are easily processed, and it turns out, as expounded by Richard Hudson, that the actual criteria are far more complex than anyone would have thought. For example, it appears to make a difference whether the subjects of the clauses being nested are pronouns or not.

A related and much more familiar example for most readers will be the halting problem. We know that no tool which attempts to answer an undecidable question can be complete. Therefore, a lot of researchers just avoid those problems. Byron Cook, who in recent years has been doing stellar and pioneering work on proving program termination, has been known to talk about the variety of bafflement, disdain and ridicule which his work provoked in its inception. It's simply anathema to conventional CS research approaches to attack problems for which it's known that there can be no complete solution. In fact, far too many self-professing computer scientists don't even understand the distinction between a partial and total solution. We have to get over this, simply because lots of important problems are in this class! I like to think my work on Cake is doing its bit. The halting problem is somewhat different from Cake-like problems, in that its' a formally defined, provably-undecidable problem, whereas the incompleteness of systems like Cake lies in that we can't even completely define the problem space in any formal way. I should add that I'm not putting my work in the same bracket as Byron's! But there is at least this small thematic similarity in the flavour of criticism that tends to come up. Speaking of which, I could do with remembering this argument when it's time for my viva....

[/research] permanent link contact

Wed, 02 Jun 2010

Making a SPLASH

So my paper about Cake was accepted for OOPSLA at SPLASH---hooray! You can find a preprint in my publications section. Overall the reviews were positive, modulo a few legitimate grumbles about related work and evaluation. I still have implementation work to do, but I'm hoping to make a big software release at the end of the summer---as well as, hopefully, submitting my dissertation. The acceptance is a nice vindication of my work (and, not being completely unvindictive, I have to say it's a welcome rebuttal to the naysayers!).

This is the first full-length research paper I've published, and has taken me a despairingly long time, but I finally feel as though I'm getting the hang of it all. I'm also finding that the infrastructure I've built can be applied to many different problems, and I have a giant and growing list of ideas to pursue in future work. The one thing that lets me down is my ability to implement my ideas to any reasonable schedule! Perhaps the knack is being very careful (not to mention experienced) about choosing which ideas to pursue, and how. Oh, another thing that lets me down is not having a job lined up for after I finish this pesky PhD....

[/research] permanent link contact

Wed, 21 Apr 2010

Multi-core madness

A while back I posted a rant to our netos mailing list a while ago, which I think says enough about my attitude to multi-core programming research that I should blog it here.

My rant was prompted by the following call for papers for a special issue of IEEE software.

“Software for the Multiprocessor Desktop: Applications, Environments, Platforms”

Guest Editors:

  • Victor Pankratius (Karlsruhe Institute of Technology)
  • Wolfram Schulte (Microsoft Research)
  • Kurt Keutzer (Univ. California Berkeley)

Multicore processors, like Nehalem or Opteron, and manycore processors, like Larrabee or GeForce, are becoming a de facto standard for every new desktop PC. Exploiting the full hardware potential of these processors will require parallel programming. Thus, many developers will need to parallelize desktop applications, ranging from browsers and business applications to media processors and domain-specific applications. This is likely to result in the largest rewrite of software in the history of the desktop. To be successful, systematic engineering principles must be applied to parallelize these applications and environments.


My rant: what desktop applications actually are there which need to take advantage of this wonderful new hardware? The CfP eagerly suggests rewriting a shedload of existing software, but that seems like a giant waste of effort -- at least in the common case where the existing software runs perfectly well enough on not-so-many-core hardware. This is true of pretty much all existing desktop software as far as I can see.

There might be new application classes out there, or new compute-intensive features that'd benefit existing applications, but that wouldn't be rewriting, and in any case the CfP doesn't identify any....

[/research] permanent link contact

Mon, 19 Apr 2010

Separating computation from storage

One of the nice things about laziness is that it eliminates the distinction between stored and computed values. You can write the same code without caring whether the values it manipulates were discovered by on-demand computation or were retrieved from some storage location. In purely functional languages, this works by throwing away all explicit notion of storage, then relying on the runtime to come up with a sensible execution strategy which exploits the underlying machine's storage. Experience shows that runtimes aren't yet clever enough to do especially good jobs of this: Haskell programs tend to use lots of memory, and/or to include programmer-inserted strictness annotations which hint at a better strategy.

The distinction between computation and storage in languages is well-known to be problematic. Stored data representations are a very change-prone design decision, hence justifying the common practice in object-oriented code (and other!) of using getters to access remote modules' state, rather than having them expose member variables directly. The computation-oriented interface is more general in that intervening computation can be inserted, or not---if the “got” representation matches what's in memory, the getter can just redirect to storage. Conversely, interposing on storage accesses, while possible using memory protection techniques (like the pairing of mprotect() and an appropriate SIGSEGV handler on Unix platforms) is inefficient on conventional hardware, violates most languages' abstraction boundaries and is not easy for user-level programmers to get working. This interposability motivation for getters and setters (and and the change-resilience it enables) is far stronger than a purely abstraction-oriented motivation. The argument is essentially the same as Parnas's from 1972, but this line of thinking still evades some bloggers.

Recently I've been writing some code using ptrace() and libunwind and finding myself with a particular requirement: implementing data structures that can work with various implementations of storage. Specifically, one feature of libunwind is that it can unwind the stack in your current process, or in another, using the same interface in each case. This kind of design is a Good Thing in much runtime infrastructure and debugging support generally, because you may or may not want a process separation in the picture: separation is good for isolation, but badly affects performance. Now, libunwind abstracts storage using a read--write pair of memory access functions. This is fine for simple reads and writes. Unfortunately I want something more demanding: I want to traverse some data structure residing in the target process. (As it happens, this data structure is some heap bookkeeping information that is maintained by a set of glibc malloc hooks I wrote as part of the Cake runtime.)

Generic programming ought to be great for this. Unfortunately, at least in the form of contemporary C++, it isn't enough. In C++, the notion of memory is so pervasive that it can't be fully abstracted. That's not to say you can't try to get most of the way there, and the STL's allocators go some way---but not far enough. Although we can alter how they allocate storage, since STL containers are not parameterised in the pointer types they use internally, we can't make them access their own implementation-specific data structures in a customised way. (See this snippet about overloading the & operator, from an article by Andrei Alexandrescu, for more evidence and examples.)

If we fall back on hand-coding data structures and algorithms ourselves, we can make some headway. The first step is to define a C++ “pointer-like thing” which actually uses the accessor functions. Here's my first attempt. Among other limitations, it can only read, not write, its pointed-to data.

template <typename Target>
class unw_read_ptr
    unw_addr_space_t as;
    Target *ptr;
    mutable Target buf; // HACK: temporary to make operator-> work
    typedef unw_read_ptr<Target> self_type;
    unw_read_ptr(unw_addr_space_t as, Target *ptr) : as(as), ptr(ptr) {}
    Target operator*() const 
        Target tmp; 
        assert(sizeof tmp % sizeof (unw_word_t) == 0); // simplifying assumption
        unw_word_t *tmp_base = reinterpret_cast<unw_word_t*>(&tmp);
        for (unw_word_t *tmp_tgt = reinterpret_cast<unw_word_t*>(&tmp);
            tmp_tgt - tmp_base < sizeof tmp / sizeof (unw_word_t);
            off_t byte_offset 
             = reinterpret_cast<char*>(tmp_tgt) - reinterpret_cast<char*>(tmp_base);
                reinterpret_cast<unw_word_t>(reinterpret_cast<char*>(ptr) + byte_offset), 
        return tmp;
    // HACK: operator-> brokenly demands return of a real pointer...
    // ... so use a per-object temporary. FIXME
    Target *operator->() const { this->buf = this->operator*(); return &this->buf; } 
    self_type& operator++() // prefix
    { ptr++; return *this; }
    self_type  operator++(int) // postfix ++
    { Target *tmp; ptr--; return self_type(as, tmp); }
    self_type& operator--() // prefix
    { ptr++; return *this; }
    self_type  operator--(int) // postfix ++
    { Target *tmp; ptr--; return self_type(as, tmp); }
    // we have two flavours of equality comparison: against ourselves,
    // and against unadorned pointers (risky, but useful for NULL testing)
    bool operator==(const self_type arg) { 
    	return this->as == arg.as
        && this->ptr == arg.ptr; 
    bool operator==(void *arg) { return this->ptr == arg; }
    bool operator!=(const self_type arg) { return !(*this == arg); }
    bool operator!=(void *arg) { return !(this->ptr == arg); }

	// default operator= and copy constructor work for us
    // but add another: assign from a raw ptr
    self_type& operator=(Target *ptr) { this->ptr = ptr; return *this; }

    self_type operator+(int arg)
    { return self_type(as, ptr + arg); }

    self_type operator-(int arg)
    { return self_type(as, ptr - arg); }

    ptrdiff_t operator-(const self_type arg)
    { return this->ptr - arg.ptr; }
    operator void*() { return ptr; }

This has got me as far as being able to traverse a linked list residing in the target process using the same code you'd use to traverse a local one. Unfortunately, a linked list doesn't cut it for my performance requirements: the target process heap contains many thousands of allocated blocks, and I need to be able to resolve a heap address to a particular block quickly. So, perhaps a hash table or a red--black tree would be a good choice. This is where the pain hits: I really don't want to create my own implementations of either of these. I could cannibalise the source of existing one (and I think that's just what I'm going to do) but it'd be nice to take an STL-like container and just use it as-is. (I am planning to use google-sparsehash, and create a hacked version of the lookup function, using my special pointer class above, for the “separate process” case.)

A final conclusion is that polymorphism is all very well, but only when the programmer can be oblivious to it. Polymorphism is after all a very low-level concept. Why should we require separate implementations of a function for operating on different kinds of data? From low-level programming we have got used to the idea that data comes in different forms, like ints and floats and lists and arrays and so on, and that these are treated separately unless some polymorphic cleverness unifies them. But in a truly high-level programming language, it should be a given that when your code is abstract with respect to your data structures' representations, or with respect to any other underlying logic (such as memory access, in my example), then you can mix-and-match any implementation you like for that underlying logic.

Under this criterion, ML-style parametric polymorphism wins nicely, because type inference means that the programmer doesn't need to care about the machinery surrounding polymorphism. In languages where programmer anticipation is required---such as by adding particular type parameters in the case of C++ templates, or by writing particular type annotations as one might in Haskell---then we are forcing the programmer to be aware of these low-level distinctions, so have not yet delivered obliviousness. (I distinguish Haskell from ML because idiomatically, my limited experience suggests that Haskell programs seem to contain an awful lot more type annotations than ML programs do. I will stand corrected if this turns out not so or not important!) Even ML inflicts polymorphism on the programmer in its indecipherable compile-time type errors, but maybe someone can write a compiler which makes things comprehensible.

[/research] permanent link contact

Tue, 12 Jan 2010

Thinking time

A few years ago I attended an excellent seminar by Brad Karp about how to do a (successful) PhD. I really should have listened more carefully. However, one of the things I do remember was a quotation from his advisor, H.T. Kung, which in paraphrase was “if you can make time for two hours' thinking each day, your career is made”. It's a nice quotation.

Of course what I didn't notice then is that the statement is ambiguous. I thought it meant that if you make a point of reserving two hours' thinking time each day, you'd be certain to have a successful career. Now that I know how impossible it is to schedule two hours' thinking per day, I am forced into the converse interpretation: if you can afford two hours' thinking time per day, you must already have job security.

Okay, so I'm somewhat embittered by my career's non-starting (and the remaining 10% of me is joking). Like I said, I should have listened more carefully....

[/research] permanent link contact

Mon, 23 Nov 2009

Back in the USSR

My work aims to show that object code is more tractable than you think---tractable enough that a reasonable number of programming tasks could be abstracted from this level rather than returning to source level. In particular, rewiring references in object files is a powerful technique for interposing new logic to adapt and compose existing code in interesting ways. I recently discovered this excellent talk by Kevin Pulo (slides linked from this page) motivating some very practical uses of LD_PRELOAD.

One weakness of LD_PRELOAD is that libraries' internal references are out-of-bounds---so, for example, if your C library implements execv and friends as wrappers around execve(), and you want to change the behaviour of the whole family of exec functions, it's probably not enough to wrap execve(), because references to execve() are internal to the .text section in libc.so, so are not subject to dynamic linking. Here's the objdump -Rd to prove it.

009bb2e0 <execv>:
  9bb2e0:       53                      push   %ebx
  9bb2e1:       e8 ea 2e f8 ff          call   93e1d0 <__i686.get_pc_thunk.bx>
  9bb2e6:       81 c3 0e 1d 0c 00       add    $0xc1d0e,%ebx
  9bb2ec:       83 ec 0c                sub    $0xc,%esp
  9bb2ef:       8b 83 9c ff ff ff       mov    0xffffff9c(%ebx),%eax
  9bb2f5:       8b 00                   mov    (%eax),%eax
  9bb2f7:       89 44 24 08             mov    %eax,0x8(%esp)
  9bb2fb:       8b 44 24 18             mov    0x18(%esp),%eax
  9bb2ff:       89 44 24 04             mov    %eax,0x4(%esp)
  9bb303:       8b 44 24 14             mov    0x14(%esp),%eax
  9bb307:       89 04 24                mov    %eax,(%esp)
  9bb30a:       e8 71 fe ff ff          call   9bb180 <execve>
  9bb30f:       83 c4 0c                add    $0xc,%esp
  9bb312:       5b                      pop    %ebx
  9bb313:       c3                      ret    

Notice that the call to execve has no relocation record---it's already bound.

What if we want to interpose on these pre-bound references? It's quite easy to identify these references, as the disassembler has done above, when they're in code. But suppose we want to interpose on some data--code or data--data references? Say we want to add a new element to a statically-defined pointer structure (maybe a linked list or something). Given some bytes in a data section, it's intractable to deduce what the interpretation of those bytes will be at run time (in contrast to code sections), so we can't work out which ones are pointers that we might want to interpose on and which are just some bytes that happen to look like an address.

But wait! Actually we can do it. The reference shown above can only be bound before link-time because PC-relative branches are available... which is only in text sections. In data sections, even internal references need relocation records, because pointers are always represented as absolute addresses. This means that it's only code--code references that we need to unpick without the aid of relocation records---and we can do this by disassembling the instructions.

In my existing work I have some extensions modifications to GNU binutils which can put an object file into what I call “unbound static single reference form” (although I'm working on a name that contracts to something less embarrassing)---a form where every reference in the file (internal or external) has its own relocation record with a unique symbol. This is neat because these symbols can then be bound independently using a simple linker invocation (ld -r --defsym name=value), allowing each reference in the file's linkage to be independently controlled. The discovery that the same can probably be done for shared libraries is heartening. Watch this space, or contact me to find out more.

[/research] permanent link contact

Sat, 07 Nov 2009

OOPSLA 2009: the bits that stood out

I was fortunate enough to go to OOPSLA this year---the last year of its present headline form, before it slips undner the federated SPLASH banner, alongside Onward!, another event I was fortunate enough to participate in this year.

Curricula for Concurrency

On the Monday I was at the Curricula for Concurrency workshop. It was a really interesting venue which I learnt something by attending. Firstly I gained a clearer distinction in my mind between so-called “deterministic concurrency” (essentially data-parallel computation, as far as I could work out) and the more usual shared-resource concurrency problems. It's not that I hadn't differentiated them before, but my experience is mainly with the latter, and I'd written my position targetting it almost exclusively.

My talk went down fairly well, as far as I could tell. Mid-talk I found I was focussing on a balding grey-haired man in the middle of the audience, who nodded along to some of my points. I hadn't noticed him before. Only after my talk did my face recognition circuits reactivate, and I realised that it had been Bjarne Stroustrup. I'm sure he didn't come for my talk per se, but it was nice to have him in the room. The event itself was chock-full of distinguished people (Guy Steele, Doug Lea, Vijay Saraswat) which was a privilege. I intend to keep a presence (albeit fairly passive) in that community to see what comes about... it's a really interesting topic with a diversity of viewpoints. In particular, Peter Welch's talk (and a subsequent chat) has forced me to rethink the times when concurrency has a simplifying power. I still think that sequential reasoning is easier for most humans, but I'll grant that concurrency is often a very nice abstraction and does model the universe fairly directly. I don't yet buy Peter's claim that it's a truer model than a state-based model, necessarily, but I'm open to argument.


Barbara Liskov's keynote was an interesting retrospective, and gave me a reading list to catch up on, but a bit unremarkable as far as future plans go. She even gave the impression, wrongly I believe, that she hadn't been interested in programming language advances since she began concentrating on systems. I don't think she meant that, so never mind.

There was some fairly interesting stuff at the first two Onward! session, but nothing that struck a major chord with me, although Pi (recently featured on LtU) is something I'd like to understand better. It sounds too much like a mathematician's language to me, hence probably no good for most programmers, but there are some interesting principles at work (meta-completeness made an impression on me in the talk, but there are others). It's a pity they called it Pi---surely they know something about the preexisting process calculus of a very similar name?

The Onward! keynote by Thomas Malone was pretty interesting. In the spirit of Onward!, it left more questions than answers, mainly about the preconditions for the various “genes” of collective and collaborative structures that he described. His talk also reminded me that Chris Dellarocas had proposed the core idea of my own work, a separation of functionality from integration (although in a rather different context, and he didn't describe it in those terms) in his mid-90s work. Malone was a great speaker and handled some slightly hostile questioning about his political beliefs rather well. In response to an aggressive lefty questioner, he criticised business schools for treating “making money” as the sole aim of business, as opposed to more human-oriented values. The only foot he put wrong in my eyes was saying that “making money is a valid goal, but not the only one”. Call me radical, but making money is only ever a means to an end, and so never a goal, except among those who've really lost their perspective---tokens of wealth as an end in themselves can only make life a sad and meaningless game of Monopoly. Most people are interested in what money can buy them, of course....

The OOPSLA session on concurrency was interesting enough: JChorus (from Penn State) is a neat adaptive heap-partitioning scheme which allows sequential logic to run amid exclusively-owned heap regions which grow and split as the aliassing structure of the heap changes. They used only one fairly obscure algorithm as an example throughout (some triangulation thing) which wasn't terribly convincing, but it's at least a neat idea. The second talk, however, from Emery Berger of UMass, delivered far less than it promised. It promised safe concurrent programming for C and C++, but as far as the talk made out, it delivered only an umimpressive serializing commit manager (a bit like TSO) which, unless I'm missing something, can't possibly perform as well as an STM (and STM performance still isn't great).

Next it was my talk at the Onward! short presentations session. As usual I was underwhelmed by the response---for what I consider to be a big idea, the reaction I got was pathetically tiny. I must be doing something wrong. The audience was very small though, and talking at 5pm is never going to draw a huge crowd.


Jeanette Wing's invited talk was a fairly US-centric look at funding issues, although it did remind me to do something to follow-up on Pamela Zave's ICSE invited talk about next-generation internet funding. More interesting was the Onward! panel about green software. I particularly enjoyed the contributions of Vivian Loftness, an architect from CMU who was soundly scathing about contemporary building practices and very robustly advocated a return to passively heated, passively lit, passively ventilated buildings. Steve Easterbrook, among other very sensible contributions, demoed a “collective intelligence“ tool for modelling carbon costs of various everyday actions (e.g. “should I print this paper or read it on my screen?”---you may be surprised).

Gerd Holzmann's invited talk made the interesting observation that space missions of 40 years ago implemented software that performed the same task (at least in that the mission was the same) but with an order of magnitude less code and using an order of magnitude less in resources. I'm not sure whether that's because today's software does more in non-essential matters (cf. the amount done by hardware and humans, which might be correspondingly less), but still, it's a frightening change. Unfortunately, most of his talk focussed not on the clever verification techniques of his SPIN model checker, but on rather more mundane code quality measures (if effective, as far as they go) which his team check using shell scripts and the like. Like many high-assurance developers he eschews dynamic memory allocation, which I find hard to understand since even without a heap allocator, in a purely static-allocation scenario, a program has to account for the used-or-ununsed state of its resources.

I saw an interesting demo of the “anti-Goldilocks debugger” from Virginia Tech, a tool which recovers an original source-level debug-time view of Java programs that have been bytecode-munged by various weaving tools. The tool itself, while fine in its concept, revolves around a decidedly clunky language describing what the mungers do, and I wasn't convinced that the right solution wasn't for mungers to munge the debug information also. Of course, my usual take applies, namely that debuggers need “interpretation” (ability to recover a customised abstract view of running code) and “subjectivity” (ability to view many such abstractions simultaneously). I should write a position or something.

Continuing the debugging theme, Blink (from various people at various places, the precise list of which I forget, but UTA was definitely one) is a solution to multi-language debugging in which, say, gdb and jdb are run concurrently with a master control process mediating between the two. Again, I'd rather just have “language” as a custom interpretation of object code, but that does demand some homogeneity in language implementation. I should read the paper.

Finally it was back to Onward! and some interesting talks, in particular Tom Mullen's about elements of psychology in programming. He advocated language design with an awareness of “chunking”, “analogies” and other cognitive phenomena. It was very interesting, and another welcome bringing-together of programming and psychology. I need to read this paper.


I loved the paper about describing points-to analyses in Datalog... the “meta-engineering” of recovering order, orthogonality and principle within such a messy domain really deserves credit, and the performance figures were superb.

A real highlight was William Cook's talk about data abstraction versus objects. His take is that objects are a purely behavioural concept, in that they are observed through an interface. Meanwhile, ADTs are a dual, constructive sort of thing, in that they are described by a concrete representation with functions defined in its terms. Hence objects are nice and dynamic, but terribly difficult to verify, whereas ADTs are verifiable but rigid. He also claimed that one advance of objects concerning inheritance is “this”, and its ability to maintain correct self-reference across inheritance-based extension. He called self-reference “Y”, presumably after the combinator, although I'm not yet sure whether it's exactly analogous to the one we know and love from lambda calculus. Anyway, it was a very stimulating and engaging talk, so I really must read the essay.

Finally, I enjoyed the talk about Google's work on “optimisation using intended semantics”. The idea is that programmers know much more about how their program should behave, and hence how it could be optimised, than the language semantics allow the compiler to infer safely. So, the work provides annotations that let programmers assert constraints on their program which the optimiser can use. I wasn't terribly convinced by the examples, particularly after a chat with Eric Eide later, but I'm sure good examples do exist. Again, I must read the paper.

Bjarne Stroustrup also gave a talk about a language implementation trick for code-sharing in parametrically-polymorphic languages which do ahead-of-time elaboration (so C++ and sometimes C#, but not Java). The idea was the iterators pointing inside instances of multiply-parameterised classes needn't necessarily depend on all the type parameters, so implementations could be shared if the definitions allowed. Unfortunately, the typedefs for these iterators are implementation-defined in C++, with no guarantee that they'll support this sharing, but a future standard should fix that. Bjarne is more compelling as a writer than as a speaker, which might be yet again a reason to say “I'll read the paper”, but without meaning to sound harsh I may not get around to that one.


OOPSLA is big... but not ridiculous. Industrial participation is a good thing... even ICSE didn't have the same extent of practitioner participation, and apparently this year was a very down year as far as OOPSLA's industry attendance went (it's normally fifty-fifty, but maybe only half the normal industry participation happened this year).

The food and event arrangements are unusual: lunch isn't paid for (except, as a surprise, on the last day), but the poster reception was done as a very nice paid-for buffet dinner. Student registrants don't get to go to go to the main social event (except if, as some did, they blag a returned/unwanted ticket). It was, as I was told (by Yvonne), a comically imbalanced “beach party” event with optimistic proposals of dancing (with the conference's “impressive”, with regard to my expectations, 14% female attendance).

There's a slickness about OOPSLA... the keynote auditorium featured rockin' music before and after talks at main (keynote) venue. There's also a sense of fun: TOOTS, the OOPSLA trivia show, was rather a fun way to end the Tuesday evening, and there was even something called “OOPSLA Idol” which I didn't have the privilege of witnessing. People seem to enjoy themselves at OOPSLA, and that's a valuable thing.

Work-wise, the emphasis on language (not “automatic tool” or “mega-efficient runtime”, although there is a little of that) fits my viewpoint. Tools are domain-specific languages... so let's make them good ones, by understanding languages, and not insisting on automation (which inherently rules out many of the really hard problems). I suppose what I'm saying is that the theme of languages, even though it's only one theme, brings in a nice mix of both practical and theoretical academics (Phil Wadler was a prominent and sometimes long-winded question-asker) as well as highly concrete practitioners. Of course there are tool papers, and good tools... but the desire to understand programming through the deep and pervasive concept of languages is definitely there, and definitely positive.

And finally, to counter my misgivings before attending, OOPSLA is much more than just an OOP conference. The SPLASH reformulation is interesting and welcome. I think I've found my community... so I really hope I'll be able to attend in Reno next year. Thinking further ahead, I wonder how long conferences will be physical get-togethers anyway... there were at least two instances of “remote participation” this year (one panel participation by Skype and one talk by pre-recorded presentation), and surely virtual worlds are the long-term sustainable way to do conferences. By chance, on the weekend following the conference I had my first taste of World of Warcraft, which was interesting. Anyway, it'll be a while before they're as fun or as engaging as the physical meet-up experience.

[/research] permanent link contact

Tue, 06 Oct 2009

Management bonus

My PhD is still mired in what has been a fairly protracted “development phase”---doing lots of coding. I'm not the quickest coder in the world at the best of times, and for some reason my current progress feels especially slow. At various points I've found myself thinking something a bit unexpected: I wish I had a manager.

What do I mean by a “manager”? I suppose it means someone who knows the technical path that I'm following: the overall structure (dare I say “architecture”) of what I'm building, and what my milestones are. This is useful because when you get stuck in a development rut, such a person can either prod you towards the way out, or suggest an alternative strategy that backs out of it and then avoids it. Motivationally, they also fill the role of someone to report your successes to. It's hard to sustain self-motivation when you're the only person who cares about your “little” practical achievements. That contrasts with the “big” research achievements, which of course your supervisor (and the wider community) should definitely care about. But this is one way in which “manager” is not fulfilled by “supervisor”---I think most PhD supervisors, at least in CS, don't take a close enough view to see those internal milestones (and for sound enough reasons, I'd probably add).

Being concrete for a moment, I'm doing a lot of work with DWARF debugging information right now. It was only through a chance chat with a fellow PhD student that the idea of using debugging info occurred to me, whereas previously I'd been wasting time experimenting with mining my own binary metadata by exercising the C compiler. Recently, I had got into a major rut in my consumption of DWARF data. I made a major resolution which, had I had a manager, I might have found myself doing much earlier: firstly, that building a better abstraction layer over DWARF is a jolly good idea, and secondly, that I can make all-important progress in my mainline Cake compiler without doing a complete job on this up-front, simply by choosing a different example use-case as my first target. I'm feeling a lot more optimistic thanks to these resolutions, but it took me far too long to make them---I was too close to the coal face to see them.

It might sound like a weakness that I'm not succeeding in “being my own manager”. Doesn't that mean I'd be a poor manager myself? There might be some truth in this, but in my defence, I think the gain is mainly from having two heads to knock together. Being the developer, it's very easy to get close enough to what you're doing that the longer-term rut-avoidance path just isn't very apparent. Maintaining the further-in and further-out vantage points is hard to do in one head, which is probably why management roles exist in the first place.

[/research] permanent link contact

Thu, 28 May 2009

ICSE day 3, and subsequent talks

It's been a few days since the end of the conference already, so I should probably write up my reflections on the final day. Friday began with a very interesting keynote by Pamela Zave, arguing that more software engineers should be paying attention to the current work (and funding drive) on next-generation internet architectures. The rationale is that the networking community doesn't pay much attention to the concerns of application authors, and arguably don't necessarily come up with designs that have good engineering properties in the sense of compositionality, abstraction and so on. I liked the sentiment but was less sure about her argument... it was more at a tentative “possible approaches” stage than an inspirational, rabble-rousing keynote. I'm about to write up a summary of her argument and send it round the netos list at Cambridge to see what those hungry hounds make of it.

Later sessions were a mixed bag, as always, but contained some nice stuff. In the code generation session, some guys from MIT presented a lower-effort, higher-assurance way of generating correct implementations for “standard” methods found in OO languages (like equals() or hashCode()). I missed the start of the talk, so didn't quite follow the details, and am not sure how significant a contribution this is, but it was thought-provoking nonetheless: was it a poor-value decision for the Java and .NET library designers to mandate these methods, given that they're difficult to get right? What can we learn from this? Well, probably something, but I should read the paper first. Second in the code generation session was a paper from Peking University about applying some analyses to identify which string constants, in a large program, make it to user output devices (i.e. a certain set of UI library calls), accounting for substrings and concatenation and various other complications. It was a decent paper tackling a small but nontrivial problem, although again I wasn't sure whether there was a huge contribution in bolting together a few preexisting analyses for solving such a specific problem. And lastly there was a prize-winning paper about using genetic programming techniques to automatically synthesise a patch for fixing a newly discovered bug, given only a test case triggering the bug. It works by finding snippets of code that already exist elsewhere in the source, and randomly splicing them in near your bug site, or similarly randomly removing lines not found on test-passing paths, then applying an iterative process to get closer to a solution, using the test case and a control-path weighting derived from test case executions, to guide the mutations. This is very clever stuff and deservedly won a couple of prizes. If I had to make a criticism, then rather like the (admittedly awesome) Steve Reiss paper of the first day, these approaches feel somewhat unsatisfactory in that they're proposing the use of source code that is the output of essentially random processes. But of course, human programmers can be modelled as random processes fairly well, in that they naturally insert errors every so often for no clear reason, so rationally there's no reason to object to these approaches on those grounds.

The concurrency session was pretty interesting. First there was something from Hong Kong University of Science and Technology about using aspects to insert synchronization logic. There was more to it than that, and the talk was delivered fairly well, but for some reason I lost it early on so should really read the paper. Secondly was a prize-winning paper about Effective Static Deadlock Detection, by some Intel/UC Berkeley guys. This was a pretty neat tool which combines a bunch of well-known static analyses in surprisingly simple but neatly complementary ways to provide a decent static deadlock detector for Java programs. As often happens, the talk managed to make the work seem trivial, where in fact there's a major contribution simply in the problem set-up and approach (in this paper, the six necessary conditions for deadlock that their analyses target). Something which marks out the “software engineering” approach from a more “programming language” approach is that their tool is neither sound nor complete---it just does a decent job in practice for a decent range of input programs. Finally in this session was a talk by Danny Dig about a refactoring tool which can automatically make the necessary source-level updates for migrating a concurrent program to use a new set of concurrency-optimised library interfaces like those in java.util.concurrent. I found the example use-cases very limited. There were three: use of AtomicInteger instead of ++, use of the additional atomic operations in ConcurrentHashMap, and a fairly simplistic conversion of easily-detectable divide-and-conquer algorithms to use a parallel evaluation strategy. All fair enough, but it's not clear whether the refactoring support for these generalises in any way, and the detection of divide-and-conquer approach seemed terribly simplistic. (I'd argue that it's something best done on an intermediate representation, perhaps using some sort of CFG pattern-matching techniques much like a decompiler---rather than the easily-foiled source-level detection.)

In the Software Process session there was a paper from MSR attempting to answer the question of whether “distributed development harms software quality”. Unfortunately the presentation didn't really answer it---it could only say “not the way Microsoft did it for Windows Vista, but that's probably because they took several measures to minimise the problems, and we're not sure which ones are the important ones”. So all a bit unsatisfactory, though perhaps there's more in the paper. I then skipped over to the second Program Analysis session (happening concurrently) and caught a couple of talks there, but with it being the last session on the Friday, not much stayed with me, so I won't defame the authors by attempting to regurgitate any of their work here.

Reflecting on ICSE: it was good, although I didn't enjoy it quite as much as I enjoyed FSE in 2007, nor did I find quite so much exciting work in it. Perhaps it's because in those days I was a wide-eyed first-year student, whereas now I'm a hard-headed should-be-writing-a-thesis cynic. But I'd also venture to say that ICSE is slightly unwieldy in its size, and would be better with a smaller, slightly narrower program that reduces the concurrency factor in the session scheduling.

Following the conference I've been sticking around in Beautiful British Columbia (as the license plates rightly describe it) to give my talk about ongoing work on Cake. On Monday I went to UBC, hosted by Andrew Warfield, and today I've been at UVic hosted by Yvonne Coady after Andy very kindly put me in contact. Both talks were useful experiences, and between the two, the contrast could not have been more marked. After UBC I was left contemplating quitting my PhD. I turned up, gave my talk and felt like I did a reasonable job delivery-wise, but conspicuously absent was any sign from the audience of the slightest appreciation, or any nonnegative remark. Perhaps I'm paranoid, but the questions seemed to come with an undercurrent of contempt. The audience buggered off without any attempt at even a pretence of sociability, and I went home in a rather frayed state. Today was a much better story: following extrenely helpful pre-arrival e-mails, I received an almost (but not!) embarrassingly warm welcome at UVic. Even better, my ambition has been mostly restored thanks to the stunning generosity, enthusiasm and genuine helpfulness of Yvonne and the other UVic people. Victoria is a particularly beautiful place too, with a quaint character of its own, so it's very sad that I have to leave tomorrow. Hopefully I'll get time to poke around in the morning, and I'm also hoping it won't be terribly long before I come back here.

[/research] permanent link contact

Fri, 22 May 2009

ICSE days one and two: a story of two thirds

As I write, two of the three days of conference programme are over, so it's time for a quick report. Overall it's been good. As usual, I can both acclaim and lambast the breadth of subject area covered by this community that calls itself “software engineering”. On the one hand, it's great to take a wide view of things, and is in the best possible academic and scientific spirit. On the other hand, one can question the productivity of gathering together so many diverse people (with diverse interests) under the one roof. Three sessions in parallel is the norm, and that's rather too many for me, because it means that I inevitably end up missing some papers that I'd be interested in. No grouping is perfect, and of course, the tyranny of dominant decomposition means that there has to be one---and without the ability to be in two places at once it's fairly restrictive.

Oh well. I've still seen several really good papers. Yesterday in the first NIER session there was a huge amount of stuff, some fascinating, some less fascinating and some just a bit baffling from the six-minute presentations. Codebook is a fairly neat idea, although perhaps a bit bandwagonesque, and there's a thematically similar thing called Tesseract being demoed tomorrow. Also although I'm perhaps not a disinterested observer, the bug-reporting stuff of Silvi et al is quite promising (albeit very early days). In the program analysis session, there was a very interesting paper about “dimension inference” (type inference but focussing on providing a richer set of types for expressions denoting primitive values, like ints or floats, rather than structured values or functions) in the program analysis session. This kind of typing is an idea that has probably occurred to a lot of us, but it was done well in this paper, particularly thanks to the inference cleverness, and it's heartening that such ideas can still be novel enough to be publishable. (That's “novel enough” not necessarily “novel” in absolute terms---one questioner implied that some chap had done something very similar in his thesis in 2000 or 2001, and I know that one of the Nicks or Andrews at MSR has also done some work in a similar vein.)

In the development tools session were three very neat talks---first by Sven Apel about a system for composition by source-level superimposition (since which I've since been marshalling my thoughts into arguing why it's not a great idea, but thought-provoking nonetheless), second by Emily Hill about a smarter keyword search over source code, and last but not least, a truly unforgettable talk by Steve Reiss. Both the work he was presenting and the manner in which he presented it shared two properties: bonkers but extremely likeable. The idea is that rather than writing new code, we should be able to search for existing code which meets our requirements, and have a tool which automatically munges it to fit our specification (described mainly as unit tests), our compilation context and our coding style. His system gets raw results from code search engines like Google Code and then explores thousands of variations in parallel---seeing whether they compile, seeing whether they pass the unit tests and, if so, transforming them to fit a supplied description of code style. It's chock-full of ad-hoc kludges, but it seem to work at least sufficiently to be somewhat useful... I should really read the paper. (I also enjoyed his sense of humour, not only in his eccentric delivery but also the fact that his slide backgrounds were brown, or Brown, to reflect his affiliation.)

All that was yesterday. Today was slightly leaner for me, not helped by the six concurrent sessions in the 2pm slot, but there was a very interesting talk about a “genetic programming” approach to automatically generating patches for fixing observed bugs. Tools which automatically find and fix bugs are a favourite these days, not only in the SE community but also elsewhere (I forget where Dawson Engler publishes his stuff but it appears not to be here). “Genetic programming” seems just to mean speculative or pseudo-random mutation of code, combined with selection based on some fitness function (i.e. whether it passes the test). The approach is fairly clever and seems to have been thoroughly worked through---the distinguished paper award appears deserved. Later in the awards plenary session, I was particularly interested by the retrospective on the famous Multi-Dimensional Separation of Concerns paper of Peri Tarr et al. All four of the authors were present and participated jointly in the presentation, and although the technical content is of course excellent, more unexpectedly I found some other messages particularly interesting and encouraging on a personal level. Firstly was the suggestion that “if you have an idea you believe in, you should go ahead and write it up even if you're not sure you know what you're talking about”, in reference to the hurried and somewhat evaluation-free nature of the submission, which nevertheless deservedly managed both to get accepted and to exert huge influence. Secondly was William Harrison's critique of the current state of programming languages and the importance of revisiting the distinction between language and middleware. This is particularly close to the heart of my own work and attitude, so it was nice to hear him coming out with some of the same arguments I've made to myself.

Of course, conferences are also a social occasion, and yesterday I wasn't really enjoying this aspect. I became suddenly aware of how few people I knew andh how hard it is to “get in” on meeting more in such situations. Following the NIER session---after having sat through one too many presentation about requirements engineering---I was also feeling that my work was marginal and uninteresting to the crowd present. Essentially my worry was that I didn't belong here---and not because there was any more appropriate research community for my work, but just that few people are interested in what I do full stop. I'm not sure how true this is, but it prevailed in my mind into the poster session, as I was initially disappointed in the apparent lack of interest in my poster. But by the end, a half-dozen people or so had come up and shown interest---some even said some nice words---so it wasn't so bad. Somewhat less enjoyably though, the session was held outside in the pool area, rather overcrowded with too many posters and too many people in too small a space, while the breeze was getting up and leaving the easels in danger of blowing over. Food and drinks were circulating among the audience, but us poor presenters were left starving with only the latter, and a mini-disaster struck as the wind blew my poster and easel towards me---I instinctively raised my right hand to stop it, thereby managing to spill a lot of wine down the side of my poster. That didn't help my frame of mind, and as the temperature declined rapidly around my jumperless self, I couldn't wait to get out of there and get some dinner. A large crew of us, mostly previously unacquainted, eventually found the Cactus Cafe, and by the time I finally ate it was approximately a starving 10pm.

Ultimately though, no matter how discouraged I was feeling on the first day, you can't beat a conference for getting ideas and encouragement. I've already mentioned the encouragement I got out of the influential paper session. More generally I've seen and heard plenty to get me thinking, and have had plenty of thoughts about my own work to jot down as a result. At lunch today I chatted to Sudheendra Hangal, author of the dimension-checking paper, and he's keen to come to Cambridge to give a talk when in the UK later this year. So at least that's some evidence that I'm not having a completely networking-free trip. Hopefully there will be more of this kind of thing in store tomorrow.

Oh, and organisationally, the conference is not bad. The only grumbles are the aforementioned overcrowding, that the wireless doesn't work (neither for me nor for others I've spoken to) and that there's a bit too much crosstalk between the talk rooms (you can hear applause, and sometimes mic'd voices, from neighbouring rooms). Oh, and I was also irritated when, having turned up on Tuesday evening, I had no way of telling when I needed to get up the next morning, because I couldn't register and get my printed programme until the morning itself---which is rather too late to decide when to set one's alarm. However, the food is excellent, and in an enlightened move there's even decaf coffee available. Tonight's trip to the aquarium was fun, modulo the same hardly-knowing-anyone issues as usual, and the food was again excellent, although I would hardly call it a “banquet” given that it was fairly hard to find somewhere to sit down.

[/research] permanent link contact

Tue, 07 Apr 2009

The numbers game

I just ran across this truly excellent article by David Parnas about the folly of measuring success and impact by publication counts and citation counts. As you'd expect from the man, he says it all far better than I could, so I won't repeat anything. I couldn't let it go without blogging about it though. (I love the appropriateness of the one reference in the article, which is, very deliberately I'm sure, a negative citation.)

Early in one's career it's easy to feel insecure about one's competitive standing and, consequently, to end up participating to some extent in the various games he mentions. I think I'm fairly hard-line about “not playing the game” generally, but can't claim I haven't at least half-caved to one or two of the things he mentions (mainly the “submitting half-baked work” thing). So the article is quite helpful as a “sin list” to remind yourself why you're really doing research, and how it should work.

Of course I can't see things changing for the better any time soon on any of the counts he lists. But here's a thought: researchers typically want to work with like-minded individuals. Mismatched attitudes on this topic are easily detected---those who want tons of dubious publications will go and get them, achieving that flabbier CV that they always wanted. Employers who value this will immediately find the people they're looking for in this way. So in effect, attitudes to publicatinos become a filtering criterion by which mismatched pairings of researcher--institution or research--professor can be avoided. I may not have many publications, but if that just means I'm only likely to be employed by someone who understands the fraud of the game, and actually reads and likes my work, that suits me just fine. Of course I'm making the dangerous assumption that sufficiently many senior researchers actually have sensible views on the topic, so if I'm wrong, I may be out of a career....

[/research] permanent link contact

Sun, 05 Apr 2009

Algebraic data types and typecasing

Yes, that's typecasing (sic) in the title.

People often say that functional programming languages are great for writing compilers. (Hardly a coincidence, I always remark, since the people who design them tend to be compiler guys of one sort or another.) One of their strengths is that they provide algebraic data types, and have neat syntax for pattern-matching. An ADT is just a possibly-recursive data structure featuring one or more unions (discriminated, naturally).

Relative to functional languages, in conventional languages pattern-matching is tedious---we have to do switch, which in C-like languages can only be on integers. We can also only test one value at a time. In Java and the like, it's even worse because we don't have explicit unions---we only have subclassing. It's not hard to see how we can use subclassing to get the effect of discriminated unions. But then, supposing we have some ADT Expr in the form of a set of subclasses of class Expr, and want to pattern-match against the constructor (subclass) that our value is an instance of. How do we do this test? Well, as my wording gave away, we use instanceof. But wait a minute---isn't that bad form? Weren't we taught that we should be using virtual functions instead of a series of instanceof tests (which is what I mean by typecasing)? How can those functional programming guys get away with saying that algebraic data types and pattern matching are so great, if they amount to typecasing?

The answer is that typecasing isn't always bad---it depends on whether you're building a system that is open or closed. ADTs are “closed” in the sense that implicitly, adding another arm (constructor) to the union is very likely going to mean revisiting all the functions that know about your type, and probably adding another pattern-handler. Of course, the default case (the “_” pattern) might do the right thing, but in most cases it won't. If your functions are inherently tightly coupled to the complete ADT definition, then you have a closed system, and typecasing is okay. This is common where you want to perform some complex algorithm over a well-defined input domain---which is exactly the case for compilers, the classic application of functional languages.

Meanwhile, in OO-land, we usually push the functions into the definition of each arm. That way, our set of arms is “open”---we can add a new one, together with its own definitions of the various functions it provides (as overrides), and any algorithm that walks the structure and invokes the functions will work without change. On the other hand, we've moved the coupling to a difference place: we've closed the set of functions defined on the data structure. Our walker sees a common interface, the supertype of the ADT, and can only invoke the functions that are defined by it. This is common if you want to perform simple algorithms over some heterogeneous, evolving input domain---which is exactly the case for most business applications (domain modelling, process automation, resource management, and so on), the classic application of imperative languages.

The conclusion, if there is one, is simply that different problem domains are best tackled using different approaches. There's no reason why both functional and OO languages can't support both approaches equally well, but owing to their originating culture and target application classes, their language features have evolved to offer stronger support for one or the other. Imperative languages could definitely do better in their support for rich expression of algorithms, and languages like Python are leading the way---although much as I like Python, I'd be more of a fan if it provided type-checking and compilation to fast code. Meanwhile, functional languages make a lot of “closed”-style assumptions---the very idea that programs are functions, or Turing machines, is only true of a closed system (since a chunk of a Turing machine, communicating with an unspecified outside world, is a computationally different object, sometimes called an interaction machine; the debate rages on). I've yet to really “grok” (as they say) what's good and bad about functional languages in this regard, and particularly their handling of I/O---the monadic approach seems to be just a way of shoehorning explicit sequencing into a Haskell-like language semantics, which is hardly revolutionary, but maybe there's something else to it.

As I've been finding in my recent experiments with antlr, a tool deriving very much from the imperative and, by extension, OO schools, a lot could be done (but isn't) to generate OO syntax trees that are amenable to typecasing. It'd be a start if it actually generated type definitions (I mean classes) for each kind of node, so that I could use names rather than indices to access the child nodes. The best ANTLR does, as far as I've found, is something called “heterogeneous nodes”, which lets you specify, in the grammar definition, the type used for each node of the AST on a per-rule basis. You can optionally specify constructor arguments too. But this only seems to work for grammar rules using tree construction operators (rather than those falling back on the default behaviour of passing through their children in a list), which necessitates some extra grammar tweaking and boilerplate (usually adding more “imaginary tokens”). Even worse, the killer is that you have to define the types yourself! It wouldn't be hard to generate them automatically, especially if the grammar author was forced to provide a name for every top-level alternative in a rule. Tree rewrite rules already look so much like constructor invocations that we really shouldn't have to specify this separately.

Somewhat separately, there's an interesting debate about how tree-walking should be done. Visitors are one well-understood option, and despite being limited (because when the action is fired, you don't get any contextual information about the node being visited), they're still handy for certain tasks, so it'd be nice if antlr generated them. Meanwhile, Terence Parr says that tree grammars are the best approach. You get to match patterns on your AST and fire actions accordingly; since these patterns can span more than one node they're more precise than visitors, and can give you exactly the contextual information you need. On the other hand, Andy Tripp argues that manual tree walkers are better because it doesn't entail yet another language, and yet another set of snippets of your source language embedded into a grammar. (The latter is a practice I too find distasteful, and I've already gone to some length to avoid it both in my use of antlr and my rejection of other parser generators, as I ranted about in an earlier post.) Anyway, in this particular debate, as usual, I can see both arguments and it really depends what you're trying to do. Parr seems to be talking about the specific case of writing a source-to-source translator, whereas Tripp is talking about a more general case.

Now that I've just wasted half an hour writing this post, it's back to some real work. I mean lunch.

[/research] permanent link contact

Powered by blosxom

validate this page