Open Source in CL
The official motivation behind DELPH-IN is described on its
web pages, but I would like to explain my personal perspective here
(in fact, this considerably predates DELPH-IN since
we decided to make the work we were doing on the CSLI ERGO/LinGO
project Open Source in the mid-1990s). So, here are
the main reasons I can think of for making resources Open Source:
- University research should be open in principle:
  "... the principle of openness in research - the principle of freedom of access by all interested persons to the underlying data, to the processes, and to the final results of research - is one of overriding importance."
  (Openness in Research, Stanford University, Research Policy Handbook 2.6)
- Making resources Open Source
ensures that the researchers
do not run into any institutional difficulties
in distributing work or in continuing work at another institution
(academic or otherwise). This is of most concern to those without
permanent academic jobs.
- CL needs serious resources, which cannot be
built and maintained
by a single university group (or even industry group, unless
the group is of reasonable size and persists over many years).
Making resources Open Source is the only practical way of achieving
flexible multi-group collaboration, especially if groups are
on different continents.
- CL needs standards and historically
the successful ones have been created bottom-up.
Once a system is available and becomes reasonably widely used,
other systems may well be built which use the same interfaces etc.
At this point there is a de facto standard.
- Seeing one's work used by other groups is very rewarding. Students
and other researchers are generally enthusiastic about contributing
to work that is going to be used outside a small group. While this
can be achieved without making the software Open Source, the question
of individual rights can be very complex if the work is not open.
- Academics are evaluated by the impact that their work has.
Open Source work generally has more impact.
- Licensing attempts are extremely time-consuming.
Some of the possible downsides are as follows:
- Open Source systems cannot be licensed to companies for monetary
gain. However, I think that the number of times an opportunity to
make a worthwhile gain has been missed is massively outweighed by the
amount of time that has been wasted in chasing licensing deals. Very
few academic resources are ever licensed for worthwhile sums of money.
Nearly all require the active participation of the developers to be of
use. Given this, I think that it makes more sense to ask companies
for consultancy and/or research support for the developers than to
expect them to pay just to license technology.
- University personnel in research administration can become very
unhappy about Open Source. It is very important to know your
university's official policy and to be prepared to stand your ground.
Remember that, although administrators like to think of academics as
unworldly, the chances are that you know far more about commercial
prospects in your area of research than they do, simply because they
are not specialists. If you want to make a resource that you are
developing Open Source, specify this on any connected grant applications.
- Some researchers are worried that others may take advantage of their
work and compete for grants or get earlier publication of results.
This is a possibility, but in most cases, it will be too difficult for
other researchers to exploit work in this way without the active
participation of the developers. Note that Open Source does not mean
that the author loses copyright and obviously use of someone else's
research work requires acknowledgement in any event.
- For Open Source to be of full benefit, researchers have to allow
others to make modifications to their code and accept that they will
sometimes screw up.
There are, however, some serious issues to think about in terms of
ongoing support. A system is only really useful if there is a
commitment to keep maintaining it, including supporting users who are
making use of aspects of the system that you are no longer interested
in yourself. Some users will have completely unrealistic expectations;
others may have reasonable expectations that you simply cannot meet
because of time pressure. Development of code and `lingware' does not
have the same status as writing papers. A piece of code used by 1000
people does not seem to carry the same number of brownie points as a
published paper cited by 10 people even though it may require many
times more effort to produce. One possible (though admittedly
partial) solution is to write decent documentation for the software
and an accessible explanation of the underlying theory and to publish
that as a book. (In this context, please see the CSLI CL series: we are very interested
in publishing books with associated software.) Another (partial)
solution is to link up with other people doing related work
and agree to jointly maintain the software.