Rambles around computer science

Diverting trains of thought, wasting precious time

Wed, 14 Dec 2011

Heterogeneity or homogeneity: what's the problem?

My attention was recently drawn to a problem that some web developers call the “language heterogeneity problem”. I'm not sure where the term comes from; in fact it is not as widely used as I was led to believe. But still, most people who have done web programming know that there are a lot of languages that people use for the web, not usually out of choice per se, and that this is somehow a problem.

The phrase “language heterogeneity problem” immediately jarred with me, since some of my work has been looking at heterogeneity of language as a goal, not a problem. Surely, we want to choose the best language for each part of our program, and not pay any unnecessary cost when doing so? Of course, the problem is about choice versus imposition. It's not that the ability to use multiple languages is bad. It's that in any given context, we don't have that ability! Consequently, you're forced to use particular languages for a given piece of code. This is the true “heterogeneity problem”. I'd couch the problem as lots of small homogeneity problems, not one big heterogeneity problem.

One of my recent student project ideas, sadly not yet attempted (or indeed advertised), is to develop a compiler back-end and run-time library that would let us compile vanilla C programs into web applications. So, for example, if I do printf() it will write some text to the page, and if I do fgets(..., stdin), it will generate a form field whose submission action is to activate the continuation of the program. There are some interesting extensions to this project. How do we partition a program into its client- and server-side halves, or its client-, server- and database-side thirds? Can we tune the partitioning given a set of requirements for security, interaction latency, client code size, and so on?

(There is also an interesting converse to this problem. Most programming languages' standard libraries are designed around a Unix-like model of I/O. And the first programs we teach---like the Hello World program---use this facility explicitly, by printing to streams and reading from them. But we now live in a world where most familiar forms of computing don't have an obvious terminal- or stream-style of I/O evident in their interface. So perhaps clinging to these examples is creating a barrier in front of potential students---who won't relate to the idea of a program doing I/O through a terminal?)

At SPLASH, I discovered that one chunk of my proposed student project effort has been scooped by Emscripten, an LLVM-to-Javascript compiler. However, since writing such a compiler would be too much work for a single project anyway, this might actually be helpful in enabling a single student project to achieve more working stuff. In other words, they could focus on matters other than the compiler, or on doing interesting domain-specific analyses on user code. Alternatively, perhaps a keen student could try to make their own compiler that does a better job than Emscripten, in some way. Hopefully I will managed to advertise the project in time for the next academic year.

[/research] permanent link contact

validate this page