Rambles around computer science

Diverting trains of thought, wasting precious time

Tue, 30 Jun 2015

A position on licensing research software

I have a habit of putting code on my github page without any statement of license. Sometimes this is oversight or laziness, but sometimes it's entirely deliberate. It's not because I don't want to open-source or otherwise license my code; I do. But, as this post covers in some depth, the Devil is in the details.

In particular, I believe that software whose purpose is scientific or researchy—to exist, to prove a concept, to be extended by other researchers perhaps, but not to acquire users—can be better off without an explicit license, at least while it remains relatively immature and has only a small number of contributors. In short, this is because there's little practical loss from not licensing at this stage, whereas by forcing a choice of license, it's possible to choose wrongly.

For some reason, this belief, at least when compressed into 140-character snippets, attracted quite a lot of opprobrium. So allow me to explain my justification and respond to the criticisms. Before I go further, I should add that I'm definitely not pretending to be an expert on licenses, law, tech transfer, aggregate properties of big companies' IP policies, or anything else I'm writing about. Everything you're about to read is informed by limited personal experience and anecdote. In fact that's how I got here: I'm very aware that licensing is something I don't fully understand. For me, that's a strong motivation for avoiding making premature commitments that I might later regret. You have the opportunity of enlightening me, and might even change my position—but first please understand what my position is and why I hold it.

So, here goes: I claim that early-stage scientific proof-of-concept software is often better off without an explicit license. Naysayers (some hypothetical, but mostly real) claim the following.

“But then people can't legally use your software?” They can, but they should ask first.

“But people rarely bother to ask.” That's fine; this software is not aiming at getting lots of users.

“But you might miss out on users.” I don't want users.

“But you might miss out on collaborators.” I doubt it. Collaborators have to communicate, not just consume. They get in touch. Or, more likely, I meet them through other channels than their browsing my repositories online.

“But people aren't allowed to look at it without a license.” No, that's not how copyright works—quite the opposite. The code is published, so you're allowed to look.

“But company policies mean employees won't be allowed look at your code, even though it's allowed by law.” I've worked at two large IP-paranoid companies, and in both cases employees would happily exercise discretion about whether to look at code lacking a permissive license. (This was true even if official guidance left no room for their discretion.) Dealing with third-party code was also a well-established part of corporate culture. Among other things, this makes these developers much more likely to ask for permission than others. During my three months at one such company, in a very junior position, I was encouraged to make enquiries not only about licensing third-party software that appeared useful, but also about perhaps contracting the developer to support our effort. This surprised me at the time. But being a corporate entity dealing in non-permissively-licensed IP makes you more amenable to dealing with other people's IP in whatever way is appropriate. As I'll cover shortly, the well-known copyright paranoia (including what I'll call “GPL terror”) arises because of cases where the license on offer doesn't suit the prospecting company, and where there's good cause to fear litigation.

“But if they look at your code, they risk creating a derived work, so could get sued by your employer!” Companies know the pragmatics of IP better than this. If I were talking about Big Company Labs's github page, I'd agree that explicit licensing would be highly desirable. Big companies tend to be litigious. In all cases, the question is: who is liable to get sued by whom? But we're not talking about that. The risk to a company from simply looking at my code (emphasis important) is miniscule. The risk to me and to the public good, from premature commitment to a given license, is non-trivial, because it risks closing off opportunities later.

“Why not GPL your code?” I'm not against doing so in principle. In fact, even though I'm told it's falling out of popular favour, the GPL is one of my favourite licenses. But if I GPL'd my code, many more companies' employees would be scared off. Many big companies' legal departments have spawned corporate policies which foster a terror of GPL'd code. Partly this terror is justified: much GPL'd code is copyrighted by large competing companies, who don't want to relicense it and do want to sue infringers. However, the terror is not justified in all cases. I'm not a competing company, I am amenable to relicensing, and I'm not likely to sue infrigers anyway. So companies have no reason to be terrified of looking at my code. Unfortunately, many company procedures are written in a blanket style, with rules like “don't look at GPL'd code!”. that are not sensitive to these nuances.

“How is not putting any license possibly better? It confers strictly fewer rights!” I know it's perverse. But since these “don't look at GPL” policies exist, the absence of any GPL notice would actually make it more defensible, according to these policies, for employees to look at my code in any depth. The real problem here is that, as far as my anecdotes can see, many corporate legal people don't even understand the basic properties of the domain. Software doesn't “have” a license; it is licensed and can be licensed differently to different people. I sometimes say that licensing is “bilateral”, in an attempt to convey this multi-party “edge, not vertex” nature. Being GPL'd doesn't exclude other licensing possibilities. Sadly, corporate legal types tend to act as though it does. They're not the only ones. Some of my peers appear keen to (as I'll call it) “licensify” all scientific software. They seem to be acting on similar misunderstandings. I've even seen it written that one should not use the CRAPL because it allegedly makes it “impossible” to re-use the software. If you actually read and understand the CRAPL, it does no such thing (notwithstanding that as a license it is basically pointless). Even if the CRAPL were restrictive, the kind of software it's intended for—scientific software, as in my case—would almost always be available for relicensing anyway.

“Why not say “GPL, or other licenses by negotiation”?” It's not a terrible idea, but again, lack of nuance among corporate legal departments makes me believe that this will scare people off unnecessarily. I could be wrong about that.

“Why not MIT- or BSD-license your code?” I'm not against doing so in principle. But as a blanket policy, I worry that it'd weaken my ability to successfully do tech transfer later. In particular, the option of dual-licensing code as useful if you have already gone with a permissive license. This might weaken the position of any company set up to do the tech transfer. As I see it, this is the key difference between BSD and GPL from a researcher's point of view. In particular, note that pursuing tech transfer can be in the public interest (and bigger competitors are likely to be less public-spirited than a small start-up founded by scientists). I've nothing against permissive licenses in general; they just feel like too great a commitment to make in the early stages of a project. I happily release non-research software under these licenses (or even public domain) from time to time, as you can see. And I'll happily contribute under these licenses to other projects that have an established tech transfer model. The key points are that my early-stage projects don't yet have this, that there's more than one way to do it, and that licenses make a difference to what options are available.

“So you might later close off your research outputs for your own commercial gain?” Not at all. I'm dedicated to using my research outputs to create the greatest possible public social good. The question is how to do that. Whether BSD-style (permissive) or GPL-style (copyleft) licensing creates the most social good is still a matter of debate. It almost certainly varies case by case. Meanwhile, two other social goods need to be considered: being able to do more research, and actually getting research results transferred into practice. The commercial model can be useful in both of those regards, even though, ideologically speaking, it's not the model I prefer. It would be foolish to jeopardise these opportunities at the moment I publish my code, especially since nowadays the norm is to publish code early (no later than publishing the paper). Very often, at that stage it won't yet be clear how best to harness the code for social good.

“Aren't you inviting people to take the unreasonable risk of using your code without a license, which is breaking the law?” No. Anyone concerned with staying within the law is welcome to contact me about licensing terms. Anyone worried that I might sue them is welcome to do the same. This is no hardship: anyone interested in running or building on my experimental team-of-one research code almost certainly needs to contact me anyway—if not to build it, then to do anything useful with it. There's also an important separate point, which I'll return to shortly: scientific endeavour is built on personal contact and trust relationships, far more than either commercial or open-source software are built on those things. Even in the case of people who don't bother contacting me, the “reasonableness” of the alleged risk needs to be assessed in the context of the scientific culture.

“But in practice, if everyone followed your approach, won't people end up breaking the law more often?” Perhaps so. Most people break the law just going about their everyday business. Few of my countryfolk over the age of 30 have not transgressed our laws in countless ways: copying a tape, keeping a recording of a TV programme for longer than 28 days, wheeling a bike along a public footpath, borrowing their neighbour's ladder without asking, backing up a shop-bought DVD to their hard disk, and so on. Fortunately, the process of law enforcement involves discretion at the point of prosecution. This discretion is kept in check by social dynamics which, while mostly outside legal system, are very real. If my neighbour prosecuted me for borrowing his ladder, it'd render future reciprocal altruism considerably less likely. Suing good causes creates bad press, regardless of whether it can legally succeed. The law is overapproximate out of necessity, so that the genuinely problematic infringers can be pursued, not so that absolutely all contravening activity can be shut down. It presents the option to prosecute, not the imperative to do so. That's not an argument for egregious overapproximation in our laws. Obviously, legal overapproximateness can be vulnerable to abuse, and should be minimised by a process of continuous refinement. But aspiring towards eventually reaching a perfectly precise law is a hopeless ideal. The law is a practical tool, not a formal system—it is both ambiguous and inconsistent. The letter of the law needs to be viewed in the context of the legal process—one carried out with the frailty and imprecision that come from human weakness, and the malleability that comes from human discretion. In short, we are doomed to living with a background level of lawbreaking, and it's in our interest to live this way. With licensing, the right question to ask is: how can our use of licenses affect scientific research, and how should it? This means thinking about process.

“But don't big companies have good reason to be conservative? Won't this harm your relationship with them?” Yes, they do, but no, I don't believe so. Conservative policies are justified in a corporate context. For big companies in competitive markets, the stakes are high, and the capital cost of pursuing legal cases is affordable. If an equally large competitor had such a case, it could turn into a very costly court battle (regardless of who wins). That's why big companies are paranoid about IP: they're afraid of competitors. But they're not afraid of small guys, like researchers or universities. If I had a credible case that a big company were infringing my copyright, the company would very likely settle with me out of court. In any case, as I covered earlier, if they want to use my software, they will want to negotiate terms with me anyway.

“Isn't your attitude socially irresponsible?” I believe I have a stronger social conscience than most people I meet. Everything I've just said is a position I hold in the interest of maximising public good, whether or not my beliefs are correct or effective. So no, I don't believe I'm socially irresponsible at all. Here are some things that I believe are socially irresponsible: to allow corporate capture of publicly funded research outputs; to make licensing commitments of publicly funded work without very careful consideration; to needlessly jeopardise the future potential to exploit publicly-funded work to public-good ends; to allow the practices of big-company legal departments to get in the way of scientific endeavour.

“Licences are good, m'kay?” This seems to be some people's position, but it's not an argument.

“All sensible software projects have licenses!” Software that seeks users is one thing. I've already covered why I'm talking about something else.

“Why are you resisting doing the normal, “responsible” thing?” My situation isn't normal. In this abnormal situation, the normal thing is not necessarily the most responsible thing.

“GPL is not free, troll troll...” I'm trying not to get into that. But I do suspect that this whole debate is really GPL versus BSD in disguise. If you don't believe in copyleft, then the only open-source licenses you're likely to use are BSD or MIT, and there's not much useful flexibility to be gained by deferring the decision. So you can just license all your code straight away. BSD in effect chooses a fixed attitude to commercialisation (“anything is fine; what goes around comes around”) whereas GPL tends to create a fork in the road. I want to hang on to my option on that fork, until I know that doing away with it is the right thing to do. The fact that the permissive-versus-copyleft debate never dies is a powerful argument against coming down firmly on either side. It's better to take decisions in context, case by case. That's exactly the position I'm defending. Note that Bruce Montague's article advocating the BSD license model for tech transfer is right if you're talking about big projects—it's harder to commercialise GPL'd code than BSD'd code, and deliberately so—but completely wrong for most small research projects written by a small group of collaborators. In those cases, it's much easier to relicense the lot, in which case your options for transfer include a dual-license arrangement in the manner of Qt or MySQL. This still permits GPLing of the ongoing commercial development (or not), while retaining the option to license it for commercial development. It says a lot about how subtle the issues are that a simple tweak of context can more-or-less invert the up- and down-sides of the two licenses. (I actually find Montague's article quite fuzzy, even FUDdy, about the details—it only really says that BSD is “simpler”, which any old Einstein should be quick to challenge.)

“I once had licensing trouble, bad enough to require lawyers. Therefore, clearly explicit licensing is essential!” This is a complete non-sequitur. I don't doubt how problematic these things can get. But it doesn't mean that any blanket policy will improve things. If IP lawyers have come into a scene where science is supposed to be happening, it means something has already gone wrong. The culture clash is between science—where trust, good faith and cooperation are routinely assumed—and corporate licensing, where you can't trust anyone beyond what their contract or license binds them to. In short, this happens when big companies get involved and impose inappropriate licensing terms. These companies are where you should be directing your opprobrium—not at me.

To properly explain the latter point, I need to ditch the question-and-answer format (which was already becoming tedious, I'm sure). The argument came up in reference to the Richards benchmark. Let me summarise my understanding of it (corrections welcome). Martin Richards wrote and distributed some benchmarking code. It was distributed as a tarball, in the scientific spirit, with no license stated. It turned out (later) that he was happy to release it into the public domain. Trust, goodwill, reputation, responsibility, ethicality are currency among scientists. Martin's release occurred within that culture. My github repositories do likewise.

Some time later, Mario Wolczko produced a Java version of the Richards benchmark, while working at Sun Labs. Working for a big company makes a person more IP-fastidious than average. Mario very conscientiously sought permission from Martin (perhaps unnecessarily—I'm not convinced he was actually making a derived work, but perhaps he was; no matter). He was of course granted Martin's blessing. He circulated his new code, including to Martin. Martin even began distributing it in his tarball.

The catch was that Mario's employer was a large company, and would only allow the code to be released under some restrictive licensing terms. You can still read them. They come in two parts; the first page is statutory US export regulations while the second page contains Sun's own licensing terms. These include the following restriction.

“Licensee may not rent, lease, loan, sell, or distribute the Software, or any part of the Software.”

This is a severe restriction. It's especially inappropriate for a work of this kind—a benchmark. Benchmarking is a concern that cuts across research areas: it's not just people working on a particular technique or problem that want benchmarks. Anyone working on any aspect of a language implementation might want to apply some “standard” benchmarks. A benchmark is more useful the more things it has been applied to. That's why we often like to group them into suites and circulate them widely. This license is effectively incompatible with that use.

I'm sure Mario tried his best against the might of Sun legal. It's telling that his current web page advises the “caveat emptor” approach: make your own mind up about how likely you are to be sued, and act accordingly. I'd agree it's not good at all that we are being forced to take this approach around a big, litigious company (as Sun's new owner Oracle certainly is). But it seems bizarre to blame this on Martin's original sans-license release. The blame is entirely on Sun. The problem is not a lack of license statement from the little guy; it's an inappropriate license imposed by the big guy. The right response is to decline to use Sun's code. (That said, it's worth noting that Martin's redistribution of the Sun code, although technically a breach of the license terms, has not resulted in Sun or Oracle suing him. So perhaps they, too, understand the value of scientific culture and have no interest in disrupting it.)

To put all this more succinctly, the onus to license doesn't weigh equally on all parties. For artifacts coming out of big players, there's good reason to demand licenses—not just any license, but fit-for-purpose licenses—precisely because big players can afford to sue. By contrast, even the University of Cambridge is a comparatively little player. It is vanishingly unlikely to sue anyone pursuing public-good scientific or research agendas, not only because it's expensive to do so, but because the reputational damage would be huge. And even if you disagree that this particular university isn't a big player, that's immaterial because it doesn't own my research work's copyright anyway; I do.

I, like Martin, work within the scientific culture as a non-litigious “little guy”. It should go without saying that redistribution of my software towards scientific ends will always attract my blessing, not lawsuits—even if people don't ask first (mildly rude as that may be). Meanwhile, for commercial use, I'd like to defer fixing the exact terms until there's serious interest—so that I have some idea of how it's going to be used, and can choose terms appropriately. The pragmatics of “no license specified” and “restrictive license specified” are wildly different, even though a hard-headed legal interpretation makes the former a degenerate case of the latter.

The push towards “licensification” of scientific software actually works against this culture, since it erodes the norm that says that permission will be granted for doing reasonable things for scientific purposes. That's yet another reason why I resist this way of working. Licensification is also viral. If you believe strongly that all released artifacts should “have” “a license”, then you have to follow this to its transitive closure: to release anything that builds on anything, you must obtain a precise statement of license from all derived-from works. This is justified for commercially valuable software, but I don't believe it's the way science needs to work. And if you share my belief that premature commitment can do more harm than good, then we agree that science ought not to work that way.

You might complain that this “scientific culture” I've been talking about is nebulous. I'd agree. But capacity and inclination to sue are reasonable proxies. Call me naive, but if you really worry about the legality of doing science with artifacts released publicly by a reputable scientist working for a university, either you worry too much or the world has gone very wrong. On the other hand, it is reasonable to worry about artifacts released by big companies, because the culture they operate in is very different. That's why we need to pressure them into offering reasonable terms for the things they release, and decline to use their artifacts if they don't. That's what conspicuously failed to happen in the Sun/Richards case.

As more and more scientific research involves code, licensing of code among scientists is going to become much more of an issue. Misunderstandings are already rife: a quick bit of web searching will easily turn up things like this and this. We need to have a frank discussion about what the real issues are, and an open acknowledgement of their subtlety. I believe that blanket licensification is not the answer. It forces a decision to be taken too early, when too little information and expertise are available. I also fear that it will encourage a culture of litigiousness and bureaucracy that is neither desirable nor affordable within science.

As one tentative suggestion for how to make the law work better for us, I note that US copyright law's “fair use” allowance already covers educational purposes. Ideally it should also cover public-good non-commercial scientific purposes. In fact, this already is a factor in assessing fair use. But it's ill-defined, and I'm not aware of any useful case law on that front (are you?). If we want something to work towards, this kind of fair use law is what it should be—not the up-front licensification of every published scientific artifact.

Just as a coda: the above is what I currently believe, based on limited study and my very limited awareness of evidence, case law and the like. Although that belief has withstood a certain exposure to others telling me how wrong I am, it's not a conviction I'm strongly attached to. I went to the effort of writing the above not because I'm convinced I'm right, but because I don't like being branded “socially irresponsible”. Challenge my argument if you like, but don't doubt that I treat this issue with due care and attention. I'm still waiting for the counter-argument that will make me change my position. You might well have it. If you do, please let me know.

[/research] permanent link contact

Powered by blosxom

validate this page