Rambles around computer science

Diverting trains of thought, wasting precious time

Thu, 19 May 2011

Namespace problems

It's always been a theoretical problem with C that there is no namespacing. I'd often wondered how much of a practical problem this really was, with “nothing major” my tentative answer. I've finally run into my first bona-fide gotcha arising out of this problem. In short: wxWidgets and GLib both define a GSocket.
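As a rough illustration of this kind of clash, and of one common workaround, here is a minimal sketch. The two "headers" are stand-ins with made-up contents, inlined so the example compiles on its own; they are not the real wxWidgets or GLib declarations, and LibB_GSocket is a hypothetical name. The trick is to rename the identifier with a macro while the second header is being processed.

```c
/* Stand-in for the first library's header (hypothetical contents). */
typedef struct { int fd; } GSocket;

/* Stand-in for the second library's header, which also wants the name
   GSocket. Defining a macro around it renames the type for this
   translation unit only. */
#define GSocket LibB_GSocket
typedef struct { int fd; int flags; } GSocket;  /* actually declares LibB_GSocket */
#undef GSocket

/* Both types can now be used side by side. */
int sockets_coexist(void)
{
    GSocket a = { 3 };           /* first library's type */
    LibB_GSocket b = { 4, 1 };   /* second library's type, renamed */
    return a.fd + b.fd + b.flags;
}
```

This only papers over a clash between names at the source level. If both libraries also export a linker symbol with the same name, renaming in the client's translation unit does not help, because the library binaries still carry the original name.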

Annoying as this is, it wouldn't be in my top ten grumbles about the experience of programming in C. A far bigger related problem is versioning. This doesn't get cited as a weakness of C because it's also a weakness of most other programming languages. The very reason I ran into the namespace problem was that I had to compile wxWidgets 2.6, rather than using the 2.8 revision that's packaged for my distribution. Version mismatches can be seen as namespace collisions too. Instead of getting the version you want, you find the namespace populated with slightly different stuff that, despite its close relationship to what you actually require, is still incompatible, much as if the namespace were polluted with random third-party stuff.

Versioning issues could perhaps be brought more under the programmer's control. Most programming languages don't have an explicit notion of “version” when importing stuff. But when explicitly consuming some target API, you are always assuming at least something about its version. Having the programmer declare which version of a set of declarations they want to import would be straightforward. In C, it could even be done quite neatly with just the preprocessor: say, #define __LIBFOO_REQUESTED_VERSION 4.2 before the relevant #include.
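A minimal sketch of what the client side might look like follows. Everything here is hypothetical: libfoo is not a real library, and its header is inlined so the example stands alone. Two liberties are taken with the macro. The version is encoded as an integer (major * 100 + minor), so that the preprocessor can later compare it, and the leading double underscore is dropped because C reserves such names for the implementation.

```c
/* Hypothetical encoding: version 4.2 becomes 402. */
#define LIBFOO_VERSION_ENCODE(major, minor) ((major) * 100 + (minor))

/* ---- client code: say which API version this file was written against ---- */
#define LIBFOO_REQUESTED_VERSION LIBFOO_VERSION_ENCODE(4, 2)
/* #include <libfoo.h> would go here; its relevant part is inlined below. */

/* ---- part of the hypothetical libfoo.h ---- */
#ifndef LIBFOO_REQUESTED_VERSION
/* The client made no request: assume it wants whatever this header provides
   (4.3 in this made-up example). */
#define LIBFOO_REQUESTED_VERSION LIBFOO_VERSION_ENCODE(4, 3)
#endif

/* Exposes the request so that it can be inspected at run time. */
int libfoo_requested_version(void)
{
    return LIBFOO_REQUESTED_VERSION;
}
```

Because the client's #define comes before the header text, the header's fallback is skipped and the request stays at 402.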

Of course, pessimistically refusing to link across nominal mismatches of version would be a bad solution. We want a more structural and, indeed, behavioural or “semantic” approach. With the C preprocessor approach I outlined, it becomes the header file author's responsibility to embed a test about which prior API version the associated implementation is compatible with, most likely using a simple #if test. (Since #if only evaluates integer expressions, the version would in practice have to be encoded as an integer, say 402 rather than 4.2.) This responsibility is not unreasonable, I'd say: the developers are in the best place to say what has changed with a new revision. And since it's in a header file, if the maintainers are lazy, the client programmer can override it.
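The header-side test could be sketched roughly as follows. Again, libfoo and every macro name are made up for illustration, the version numbers are arbitrary, and the header text is inlined so that the example is self-contained. The #ifndef guard around the oldest compatible version is one possible override hook, assumed here rather than taken from any real library: a client who knows better can define that macro before including the header.

```c
#define LIBFOO_VERSION_ENCODE(major, minor) ((major) * 100 + (minor))

/* ---- client code ---- */
#define LIBFOO_REQUESTED_VERSION LIBFOO_VERSION_ENCODE(4, 2)

/* ---- part of the hypothetical libfoo.h, shipped with release 4.5 ---- */
#define LIBFOO_PROVIDED_VERSION LIBFOO_VERSION_ENCODE(4, 5)

#ifndef LIBFOO_OLDEST_COMPATIBLE_VERSION
/* The maintainers' claim: clients written against 4.0 or later still work. */
#define LIBFOO_OLDEST_COMPATIBLE_VERSION LIBFOO_VERSION_ENCODE(4, 0)
#endif

#if LIBFOO_REQUESTED_VERSION < LIBFOO_OLDEST_COMPATIBLE_VERSION
#error "libfoo: requested API version is older than this release supports"
#elif LIBFOO_REQUESTED_VERSION > LIBFOO_PROVIDED_VERSION
#error "libfoo: requested API version is newer than this release"
#endif

/* Run-time mirror of the compile-time check, handy for trying other values. */
int libfoo_accepts(int requested)
{
    return requested >= LIBFOO_OLDEST_COMPATIBLE_VERSION
        && requested <= LIBFOO_PROVIDED_VERSION;
}
```

With these numbers the request for 4.2 falls inside the range 4.0 to 4.5, so the file compiles. A request for 3.9 or 4.6 would stop compilation at the corresponding #error.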

One shortcoming of this approach is that the client programmer might be too lazy to work out which is the earliest library version their code will work with, and will instead select whatever version they are developing with. This is safe, but prevents some valid compositions. On a different system with a slightly older version of the library, the header might conservatively conclude that it's not compatible with the client, even though it could work. Anyway, I don't worry about this too much. Lots of researchers have thought about versioning before, so there are probably some good solutions knocking around.

Back to the sockets example, it's perhaps unsurprising that the name collision occurred when linking two chunks of infrastructure code. Name collisions are most likely between pieces of code that abstract the same domain and so share the same natural-language vocabulary (namely sockets, in this case). This is much more likely to happen in infrastructure software (i.e. modelling system resources) than application-level software (modelling circles or ellipses or airline reservations or health records and so on), simply because you're less likely to link multiple instances of the latter together. Whereas application-level code is at or near the top of the software dependency graph, the infrastructure stuff is lower down, and so more likely to get sucked into a program through dependency.

I was interested to note Nick Nethercote's recent blog entry about a problem with duplication (generally) and bloat (specifically) associated with multiple wrapper layers for system calls and other nonportable interfaces. He was talking about mmap(), but the socket abstraction is another example. I have some research proto-ideas that might help with this problem. Essentially I'm interested in recovering a finer-grained style of interface description from code, based on the idea of “relational interfaces”. You could then use this description to infer that two sets of functions had very similar behaviour, and factor out the duplication (with appropriate refactoring or adaptation tools).

This whole problem is another consequence of our fragile direct-interfacing, in-order methods for constructing software. If we had a more flexible way of constructing software, the problem wouldn't arise. Rather than slavishly building on predefined interfaces that are specific to one underlying component (like one mmap() abstraction layer, or one socket abstraction), we need smarter tools for specifying our requirements abstractly and finding customised ways of satisfying them using a range of “found” code. This is what my Onward! '09 proto-paper was ranting about. I guess it's good that I'm still ranting. Interface hiding is as good an idea as ever, and more work on it will happen, when I get time...
