Rambles around computer science

Diverting trains of thought, wasting precious time

Tue, 26 Nov 2013

(Tell me why) I don't like Java

I can just about cast my mind back to when I learnt Java. It was 2001, when I was a 17-year-old C++ programmer. (Actually the particular version of Java I first learnt was called C#, but let's not complicate things.) I remember being fairly impressed. It struck me as a C-family language that had kept enough of C++'s features to do abstraction neatly enough, modulo the occasional grumble, while making some impressive complexity savings by carefully-chosen omissions. (I knew nothing about Smalltalk at the time.)

At the time, many of these omissions were bold, and the way they worked together was, to my 17-year-old self, ingenious. Lack of subobject structure keeps garbage collection simple. In turn, garbage collection (with compaction) allows faster heap allocation, clawing back some of the cost of all these extra objects. Even doing away with templates, by relying on checked downcasts and a common supertype, had its elegance, despite now being supplanted by generics. Grumble-wise, I won't say I wasn't alarmed by the lack of multiple inheritance and operator overloading. But overall I wasn't unsympathetic to what Java was doing.

Nowadays, the mention of Java makes me groan (in my mind). It seems to be a combination of the intrinsic and extrinsic properties that makes it so. Thinking extrinsically, Java has a huge, huge mindshare. Like market share, mindshare is dangerous once one player gets hold of too much of it. Many intrinsic details of Java would be unremarkable warts in another context, but have a giant significance, because they're what a large proportion of programmers think are “the way programming is”. In reality, Java contains a selection of somewhat-arbitrary design choices made by clever but flawed human beings some time within the last couple of decades. By letting them become ingrained, we are closing people's minds and inhibiting adoption of better alternatives. What follows is a quick run-down of how I see the problems.

As a language, it's boring. This is actually one of the least bothersome things about Java, but it needs mentioning here. As a researcher, doing any work that is specific to the Java language feels like work not done properly, because it avoids a lot of potentially more interesting cases. (I'm being a little facetious here, but I hope the underlying point is visible.) Against today's competition, Java seems remarkably inexpressive as a language. C++ is cleaner than ever, Scala is achieving a C++-like complexity curve with nicer syntax and marginally less unclear semantics, and dynamic languages are faster than ever. What keeps Java going is its self-perpetuating ubiquity. It also helps that Java is in a sweet spot regarding IDE assistance: Java code has enough statically-derivable properties to do automations like refactoring and autocompletion reasonably precisely, while being simple enough—unlike C++ and, I venture, Scala—to implement them reasonably correctly without giant effort. Is this a good thing or not? It has no doubt delivered certain innovations to practitioners faster than would otherwise be possible. But right now, to me, its net effect seems to be to stifle innovation in better languages.

As a learning device, it's a dog's breakfast. This is probably my number-one beef. Java is too complex to cleanly expose concepts like procedures, structured data, or pure object-oriented programming—never mind functional styles of programming. Yet it is too semantically simplified (constrained) to reveal essential contrasts and orthogonalities, such as between by-value versus by-reference, heap versus stack, dynamic versus static type checking, and inheritance versus subtyping. Various conflations of these are the deliberate features of Java's design that make it simple (relative to C++, say). But now that Java is ubiquitous, they start to be harmful too. I concede that just because a language isn't the ideal vehicle for exploring conceptual distinctions doesn't make it worthless as a programming tool—far from it. But the effect of mindshare is huge. The concepts of programming are harder to teach in a world where we must teach Java, because it is not a vehicle for cleanly conveying any of these concepts. My impression is that today's programmers learn fewer languages than ever, so it is harder to establish contrasts. Even diverse languages are being shoehorned onto the JVM, further enshrining Java's [bytecode's] limited vocabulary as a set of fundamentals. It's like some perverse “worse is better” situation, except that the usual point in favour of the “worse” solution, namely simplicity, it not in evidence much. As I'll rant in a moment, Java is a very complex beast below the language level.

Portability is a myth. Java's portability was its key selling point, but it's not clear that it has succeeded. Although Java makes it hard to write code with nonportable semantics, that is only a small part of the portability puzzle. Libraries are a big issue. JVMs are complex enough to suffer compatibility problems among each other, too. So now we just have portabilty problems one level up. I've been sceptical about the drive for portability because in general, no two portability requirements are quite the same. Having “the same” behaviour among every program built using technology X”, which is my paraphrase of Java's portability sell, is primarily a benefit to the authors of technology X, not to the software's users, nor even to the application developers. For example, as Ian Lance Taylor pithily blogged (talking about Tk), saying that applications look the same on any platform means they look odd on every platform. Attempting to insulate the JVM and its libraries from all details of the host system has become a dogma pursued far beyond its usefulness. This theme continues....

As a deployment infrastructure, it's a mess. Again in the name of “portability”, the Java platform tries to hide all trace of the underlying system, defining its own conventions for configuration (classpath, system properties), resource limits (Java heap size), its own archive formats, its own security model, and so on. The problem is that the underlying system always has some way of doing these, and by duplicating this functionality, the end result is excess complexity and hard-to-understand interactions. Even despite massive uptake, the JVM isn't the only runtime. The approach of eliminating complexity by defining “one platform to rule them all” is a hopeless modernist ideal. In practice, it just adds one more platform to the mix, causing a strict increase in complexity. This is the key paradox of attempting to achieving simplicity through portability. It can only succeed if the portable thing completely hides the underlying complexity. This is less and less likely the higher up the stack you try it. Instead we should aim to build systems out of simple (but not necessarily portability-providing) pieces, then achieve systemic assurances by reasoning about whole compositions. Typically, the kind of “whole composition” I'm talking about would be a runtime plus an operating system plus an instruction set architecture. (Interestingly, I contend that stopping at the ISA is sensible; we don't have to go all the way down to microarchitecture. Unlike JVMs, ISAs are low-down enough that they can hide the underlying complexity fairly well.)

As a development infrastructure, it is diabolical. Debugging and dynamic analysis on the Java platform are hugely, horribly flawed. I wrote a paper about one aspect of this a while back. Primary evidence is how the leading implementation (namely Hotspot) implements various profiling and debugging tools (like hprof) using private interfaces (namely the “serviceability agent”), because the pseudo-standard interfaces aren't good enough. And that's in spite of their unfathomable complexity. These interfaces—JVMTI, JDI and friends—are not officially part of the JVM specification, and are pseudo-standards in that no two implementations are quite alike. They also offer inherently poor coverage, because two large chunks of the process's code—namely natives and the VM itself—are respectively not covered, or covered in limited fashion using a disjoint mechanism (which may or may not be implemented). As a result, if you want to do something as simple as observing all allocations of Java objects in your program, you have to do three different (and fiddly) things: bytecode instrumentation for a selection of bytecodes (creating objects and creating arrays), handle JVMTI's VMObjectAlloc callback, write JNI function inteceptors to catch creation from native code. Even then, you're a long way from understanding the memory behaviour of your program, since, surprise surprise, native allocations—including (on most VMs) those made by the VM itself—are actually very significant. There was even a paper at OOPSLA 2010 about this. There are analogous problems in other kinds of dynamic analysis in Java. Try implementing an information flow analysis at the JVM level, and you will be stymied, because information is continually flowing through native code and through the VM itself, and these are wide, complex, undocumented interfaces. (By contrast, doing this at the whole-process level requires only modelling the system call interface, which is manageably-sized and stable.) Java-specific interfaces simply cannot cover, by definition, the whole behaviour of your program.

Debugging technology has gone backwards This is a bit of a specialist rant, so forgive me. It builds on what I just wrote about dynamic analysis. The conventional approach to debugging is takes a simple and high-coverage abstraction, namely the memory image of the debugged process, as the baseline. On top of this, we selectively describe how source-level language features are realised. We does this by documenting the compiler's (and runtime's) implementation decisions, using standard (albeit quirky and complex) formats like DWARF. Apart from GNU Java, no Java implementation I know does this. Instead, they rely on Java-specific interfaces. In so doing, they take a big technological step backwards, for no better reason than expedience. I can excuse VM prototype developers for taking the short cuts of knocking up an in-process debug server with a fixed wire protocol, and building shim “tool interfaces” as thin abstractions of their internal APIs. But in a platform that is the outcome of billions-of-dollars product development, there is no such excuse. As with portability, Java adopts an approach which can only work if it owns the entire system. Since it does not, it fails to eliminate any complexity, and instead just adds more.

As a culture, it's conspicuously herd-like. I suppose this is a universal property of communities. Languages as practical tools tend to take on the properties of the individuals using them. That's why Haskell is over-clever, Python is hack-filled, C is rarely well commented and Java is bureaucratic and verbose. In each case you could say the same for these languages' advocates. My distate for Java owes partly to the fact that it is favoured by not-so-good programmers and middle managers. The effect permeates the community. Java people love reinventing things which existed before—only this time, it'd “pure Java!”. They also seems to love XML, a technology I despise. They outdid themselves by combining these two properties to spectacularly ill effect, as known to anyone who's ever edited a file called build.xml. Reading about Eclipse plugins makes me jaded. I don't know whether it's the chicken (programmers) or egg (Java language) that's to blame for all these problems. Either way, practice has forgotten far too many principles: of keeping it simple, of not duplicating mechanisms, of using information hiding to hide what is change-prone rather than just to make your code more complicated. The principles of object-orientation itself are long forgotten. To illustrate this latter point, I defer to the very nice (developer-focused) talk by Kevlin Henney titled after William Cook's careful statement that “it is possible to do object-oriented programming in Java”.

Here's an anecdote to finish. I just read a research paper containing the following two sentences.

The first kind of behaviors is interface behaviors. (Please note that the name has nothing to do with Java Interfaces.)

Sadly, I can understand why the qualification is necessary. But it really shouldn't be. If we're doing research, what we're doing should transcend the immediate technology, and we should be accustomed to a presentation which reflects that. But even in research communities, too many people equate doing X “for software” with doing X “for Java”, and this is a sad state of affairs.

[/research] permanent link contact


Powered by blosxom

validate this page