Rambles around computer science

Diverting trains of thought, wasting precious time

Fri, 05 Sep 2008

One True Language

Trying to write a parser in Haskell, I'd have thought there would be some nice parser generator available. Well, there are at least two attempts: Happy and Frown. Sadly, Happy is a bit on the inexpressive side: there is no syntactic sugar for sequences, for example. Frown seems a lot more expressive, though it isn't particularly well maintained (it currently fails its own self-tests).

The really annoying thing, however, is that I'm going against the grain by doing this: nobody in the Haskell community seems to like using parser generators much. Instead, they prefer “parser combinators”: monadic values which, composed together, essentially implement a recursive-descent parser directly in Haskell. Writing the parser in the host language is touted as an advantage, e.g. by the Parsec library, which proudly declares that “users learn just one language!”. This argument is sometimes valid, but not here: anybody who is even thinking about writing a parser will know BNF. If the Haskell code were simple, everything would be fine. However, I already have a combinator-based parser for the grammar I'm trying to parse, and it's over 500 lines of almost comment-free Haskell, whereas the grammar itself is only 150 lines of EBNF-like notation. The killer problem is that the BNF is presented in the program documentation, and I want both the documentation and the parser to be generated from the same source. That's a fairly sane requirement, but I've had no luck meeting it so far.
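To make the combinator style concrete, here is a minimal, self-contained sketch that uses no library at all. The names Parser, satisfy, char, number, expr and term are my own illustration, not Parsec's API. Each EBNF rule turns into one Haskell definition, and `many`/`some` from Control.Applicative stand in for EBNF's `{ ... }` repetition.

```haskell
import Control.Applicative (Alternative (..))
import Control.Monad (ap)
import Data.Char (isDigit)

-- A parser consumes a prefix of the input and returns a result together
-- with the unconsumed rest, or Nothing if it fails.
newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

instance Functor Parser where
  fmap f (Parser p) = Parser $ \s -> fmap (\(a, rest) -> (f a, rest)) (p s)

instance Applicative Parser where
  pure x = Parser $ \s -> Just (x, s)
  (<*>)  = ap

-- Sequencing: run p, then feed its result and the remaining input to k.
instance Monad Parser where
  Parser p >>= k = Parser $ \s -> case p s of
    Nothing        -> Nothing
    Just (a, rest) -> runParser (k a) rest

-- Choice: if the left parser fails, try the right one on the same input.
-- The class defaults then give us `many` (EBNF { x }) and `some` (x { x }).
instance Alternative Parser where
  empty                 = Parser (const Nothing)
  Parser p <|> Parser q = Parser $ \s -> maybe (q s) Just (p s)

-- Accept one character satisfying a predicate.
satisfy :: (Char -> Bool) -> Parser Char
satisfy ok = Parser $ \s -> case s of
  (c : cs) | ok c -> Just (c, cs)
  _               -> Nothing

char :: Char -> Parser Char
char c = satisfy (== c)

-- number ::= digit { digit }
number :: Parser Int
number = read <$> some (satisfy isDigit)

-- expr ::= term { "+" term }
expr :: Parser Int
expr = do
  t  <- term
  ts <- many (char '+' *> term)
  return (sum (t : ts))

-- term ::= number | "(" expr ")"
term :: Parser Int
term = number <|> (char '(' *> expr <* char ')')

main :: IO ()
main = do
  print (runParser expr "1+(2+3)+4") -- Just (10,"")
  print (runParser expr "+1")        -- Nothing
```

The grammar proper is only the last three definitions. Everything before them is plumbing that a library such as Parsec would normally supply.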

More generally, when will people give up on the ridiculous idea of “one true language” and accept that different jobs require different tools? The state of the art in language interoperability is truly lamentable, and yet new languages are being invented every day. Now, inventing a new language for research purposes, as a proof of concept for new features, is fine. But I have far, far more time for people who then work on integrating those features into existing, well-supported languages than for those who persist in advocating an entirely new language. (Microsoft seem to be doing a pretty good job of the former with C#, whatever my doubts about a few of their proposed features.) Most language designers' idea of “interoperability” is an interface to C, and a painful one at that, but this just isn't good enough. I like the Haskell language, and respect a lot of the people involved; they've taken these issues fairly seriously, but from my experience so far, they still fall well short. Haskell code from five years ago doesn't compile today; the standard of documentation is poor (try finding documentation for runP, an apparently “standard” function); tools and libraries are flaky or nonexistent (a good debugger, anyone?); and there's a distinct lack of comprehensive texts (though Real World Haskell might be changing that as I write; it looks good). I've yet to try the FFI, but I'm not getting my hopes up.
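For reference, the simplest case of Haskell's C interface looks like the sketch below: a `foreign import` declaration that binds a C function as an ordinary Haskell function. It assumes GHC and the standard Haskell 2010 FFI, and `c_cos` is a name chosen purely for illustration. It covers only a C function over scalars. Pointers, structs and callbacks need considerably more marshalling code (via Foreign.Ptr, Foreign.Marshal and friends).

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
-- The FFI is part of Haskell 2010 and enabled by default in GHC;
-- the pragma just makes the dependency explicit.

import Foreign.C.Types (CDouble (..))

-- Bind cos() from the C maths library as a pure Haskell function.
-- `unsafe` is acceptable here: cos neither blocks nor calls back into Haskell.
foreign import ccall unsafe "math.h cos"
  c_cos :: CDouble -> CDouble

main :: IO ()
main = print (realToFrac (c_cos 0) :: Double) -- 1.0
```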

When Bjarne Stroustrup last came to talk at the Computer Lab, I was surprised by the hostility with which some (nameless) programming-languages researchers greeted his talk. He was essentially arguing that inventing a new language is not something to be done lightly if you can get the same effect by extending an existing toolchain. He advocated an approach where you prototype your new features as a library for an existing base language, and then capture the additional semantic constraints (plus convenient syntax, useful error messages and the like) by writing a preprocessor or frontend for the base language. His mistake was perhaps that he didn't quite nail down the context of his argument: it was clearly aimed at languages which expect some industrial uptake (he cited R as an example of a failure), but this context remained unstated, and duly prompted undeserved bile. What's ironic is that, although many Haskell-heads would be horrified at the suggestion, the designs of C++ and Haskell have an awful lot in common. They both seek (successfully) to separate algorithms from the data structures they operate on, and are designed around the important idea that built-in datatypes should not be special. Haskell goes further than C++, in that most of its standard operators are defined in ordinary library code on top of a very small kernel, whereas in C++ you can only get “very close” to the same effect. It's also interesting that Haskell's typeclasses correspond very closely to C++'s concepts (admittedly a new-ish feature), despite the braindead attempts of much introductory Haskell literature to liken them to C++ classes, to which they are not at all similar.
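A small Haskell sketch of the two design points above follows. It shows a generic algorithm written once against a typeclass constraint, and a user-defined type that gets the standard `(+)` and Prelude's `sum` on the same footing as the built-in Int. `V2` and `sumOfSquares` are hypothetical names for illustration. The component-wise `Num` instance is simply a convenient choice for the example, and this shows only the Haskell side, not C++ concepts.

```haskell
-- A user-defined 2-D vector type.
data V2 = V2 Int Int deriving (Show, Eq)

-- (+), (*) etc. are just methods of the Num class, so a user type can
-- supply them exactly as Int does. Here the arithmetic is component-wise.
instance Num V2 where
  V2 a b + V2 c d = V2 (a + c) (b + d)
  V2 a b * V2 c d = V2 (a * c) (b * d)
  negate (V2 a b) = V2 (negate a) (negate b)
  abs    (V2 a b) = V2 (abs a) (abs b)
  signum (V2 a b) = V2 (signum a) (signum b)
  fromInteger n   = V2 (fromInteger n) (fromInteger n)

-- A generic algorithm, written once against the Num constraint and
-- independent of any particular data type.
sumOfSquares :: Num a => [a] -> a
sumOfSquares xs = sum [x * x | x <- xs]

main :: IO ()
main = do
  print (sumOfSquares [1, 2, 3 :: Int])     -- 14
  print (sumOfSquares [V2 1 2, V2 3 4])     -- V2 10 20
```

Nothing in `sumOfSquares` mentions Int or V2. It is type-checked once against `Num`, which is roughly the role a concept plays as a constraint on a C++ template.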

That's enough ranting for now. All I want is a parser generator that works....
