Rambles around computer science

Diverting trains of thought, wasting precious time

Wed, 19 Nov 2014

How to write vaguely acceptable makefiles

It's not hard to write a clear, orderly makefile if you respect it as a programming job in its own right, and adopt some principles. I rarely see such principles written down. Here's what I've gravitated towards as I've learnt to use make more effectively.
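
By way of a small, concrete illustration (a sketch only, with placeholder file names, and not a substitute for the principles themselves), the shape I aim for looks something like this: variables declared up front, one pattern rule per kind of build step, and phony targets declared as such.

# illustrative skeleton only -- names are placeholders
CFLAGS += -g -Wall
objs := foo.o bar.o

.PHONY: default clean

default: prog

prog: $(objs)
	$(CC) $(LDFLAGS) -o "$@" $(objs) $(LDLIBS)

%.o: %.c
	$(CC) $(CFLAGS) -c -o "$@" "$<"

clean:
	rm -f prog $(objs)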

[/devel] permanent link contact

Tue, 01 Jul 2014

Drop-in debugging

If you're debugging a problem that occurs in some process deep in a process tree, getting a debugger onto that process can be hard. I recently wrote a script called gdbrun that can help with this. The idea is that for any command cmd args, you can do gdbrun cmd args and it will run the program under a debugger, while externally behaving just like the original program. The exception is if the program doesn't terminate properly: in that case the debugger sticks around for you to interact with. (If you need a more complex condition than “doesn't terminate properly”, you can code that into the script too.)

This isn't ideal—say, if you want to debug only one invocation of that binary among many. But it works in many cases. (I'm also working on a solution for the harder case, but one thing at a time.) The script is deceptively simple. Sadly it needs GNU bash, but for any non-GNU people it should be easy enough to port.

#!/bin/bash

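# keep copies of our standard streams on fds 7, 8 and 9, so that gdb's 'run'
# command can hand the original streams to the inferior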
exec 7<&0
exec 8>&1
exec 9>&2

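# escape characters that are special inside double quotes, so each argument
# survives being re-parsed by gdb's 'run' command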
quote () {
    sed "s/\([\`\\\\\\\"\\\$]\)/\\\\\\1/g"
}

exe="$1"
shift
argstring=""
ctr=0
while true; do
    if [[ -n "$1" ]]; then
        # don't use 'echo' to generate input for quote, otherwise "-e" will be disappear'd
        argstring="${argstring:+${argstring} }\"$( cat <<<"$1" | quote )\""
    fi
    ctr=$(( $ctr + 1 ))
    shift || break
done

exit_status_file=$(mktemp)

# can add -hold here to help debugging
xterm -e gdb \
    --eval-command "file $exe" \
    --eval-command "run $argstring </dev/fd/7 >/dev/fd/8 2>/dev/fd/9" \
    --eval-command "set logging file ${exit_status_file}" \
    --eval-command "set logging on" \
    --eval-command "print \$_exitcode" \
    --eval-command "quit \$_exitcode" </dev/null

inferior_status=$(cat ${exit_status_file} | sed 's/.* = //' )
exit ${inferior_status:-0}

What was hard? Firstly, it took some head-bashing to realise that sharing a terminal between the debugger and inferior wasn't going to work. That's why the script uses xterm—popping up terminal windows for an instant is a minor annoyance, but tolerable. I can't say for certain that a shared-terminal solution isn't possible, but the usual trick of stty -tostop wasn't enough to stop child processes hanging from contention for the terminal, so I gave up.

The second hard thing, and the hardest, was realising that I could thread duplicated file descriptors right down to the inferior, and open them with /dev/fd/7 and friends. This was non-obvious to poor old me. I first tried a solution with named pipes and cat processes forwarding data from/to the gdbrun script's streams to the inferior's streams. After much gnashing of teeth, this turned out to be unworkable. The reason is that pipes are asynchronous channels, meaning they're buffered. A consequence is that even if the inferior doesn't read from its input, some input data will be buffered in the pipe when it gets opened (actually it's opened by the shell that launches the inferior, using gdb's run <infile syntax), and then discarded when the inferior dies. So if you have a command like echo blah |(gdbrun true; cat) your blah will get gobbled into oblivion. The moral of the tale is that introducing cats and pipes is a recipe for mischief.

[/devel] permanent link contact

Sat, 23 Feb 2013

A curiously recurring explanation

In conversation recently(-ish), I tried to explain the infamous Curiously Recurring Template Pattern (CRTP) of C++ by relating it to mixins. That just turned one surprisingly tricky problem into two. Here I will try to rectify both problems by providing a somewhat coherent explanation of how to realise mixins using CRTP.

/* defining a mixin */
template <class Instantiator>
class my_mixin
{
	int some_state;
public:
	void invoke_feature() { /* using some_state, somehow */ }
};

/* "mixing in" a mixin */
class built_from_mixins
 : public 
      my_mixin<built_from_mixins> /* , other mixins ... */
{
    /* ... */
};

Unlike normal base classes, a mixin is designed to attach at an arbitrary point in the subtyping hierarchy. It's an orthogonal feature that can be “mixed in” to many potential classes, rather than an increment on a particular class. Therefore, rather than referencing a specific base class, a mixin leaves this reference unbound. Nevertheless, it still gives a name to the class it is extending. Here it is called Instantiator. This can be thought of as a “forward reference” to the class that is “mixing it in”. The name is a placeholder for the class that the programmer will extend using the mixin.

(There are other variations on this theme in other mixin-like constructs. Anyone interested in the many meanings of “mixin” could do worse than to start with Richard Gabriel's essay which is based around this subject— though I note that its actual point about incommensurability is deeper, completely distinct, and very interesting!)

Looking at the code, we see there is a cyclical reference chain: from the mixin user built_from_mixins to a specialisation of the mixin itself my_mixin<built_from_mixins> and back (via Instantiator, which is instantiated to built_from_mixins). These references are a bit strange. We haven't even defined built_from_mixins at the point where we use it to parameterise our mixin instance. Why does the compiler even allow this?

The answer is that of course it's allowed, and the compiler allows it simply by virtue of the usual C++ (and C) rules about incomplete types. Cyclic reference among data type definitions is not unique to mixins. Consider a linked list, where it's no problem to create a recursive “next” pointer inside the list node type. Since pointers-to-incompletes are allowed, and the list node type is just another incomplete type at that location in the code, the code compiles fine.

It takes only a small generalisation to apply this not just to incomplete pointer target types, but more generally to incomplete types used as template parameter instances. Of course we can refer to built_from_mixins inside its own inheritance clause—but we can only do things that we can do with an incomplete type. Using it as a template parameter is one such thing—so long as the template's definition is consistent with its parameter being incomplete. In particular, possible usages of Instantiator inside my_mixin, above, are limited if we want to use the incomplete built_from_mixins as our Instantiator: we can only do the things we can do with any other incomplete types inside a class definition. Happily, my_mixin's definition sticks to this regime, so is well-formed. Moreover, it itself is a complete data type! (Similarly, if you defined a mutually recursive pair of data types using pointers, whichever one of them you defined first in your source file would be complete immediately, even though it contains a pointer to the yet-to-be-defined second data type.) Being complete, our instantiation of my_mixin is fair game for deriving from. This is what allows us to derive built_from_mixins from my_mixin<built_from_mixins>: the latter is complete, even though its type parameter built_from_mixins (known as Instantiator inside the mixin) isn't.
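
To make the parenthetical concrete, here is the plain-pointer analogue (a throwaway example, not part of the mixin code): node_a is complete as soon as its closing brace is reached, even though node_b is still incomplete at that point.

struct node_b;                 /* declared but not yet defined: incomplete */

struct node_a
{
	struct node_b *next_b;     /* fine: pointer to an incomplete type */
};                             /* node_a is now complete */

struct node_b
{
	struct node_a *next_a;     /* also fine; node_a happens to be complete already */
};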

In fact, we haven't used Instantiator at all inside my_mixin. So, why include it? What can we do with it? Well, we can safely use it in any way we can use any other incomplete type: as a pointer (or reference) target type, or as a type parameter. An example is boost's enable_shared_from_this, a mixin which adds the necessary state and member functions for allowing a class to provide a shared_ptr version of its this pointer. You can't safely create a shared_ptr from a regular pointer in general because you don't know where the target object's reference count lives. The enable_shared_from_this mixin fixes this by embedding a pointer to the refcount, in the guise of a weak_ptr subobject, into the mixing-in class. The guts of enable_shared_from_this are basically as follows.

template<class T> class enable_shared_from_this
{
private:
    mutable weak_ptr<T> weak_this_;
public:
    shared_ptr<T> shared_from_this()
    {
        shared_ptr<T> p( weak_this_ );
        return p;
    }
};

Just as in our first example, we have some private state and a public interface which implement an orthogonal feature that can be “mixed in” to any class. The mixin-instantiating class T is referenced only in an incompleteness-tolerant way, to instantiate other templates and (eventually, inside the definition of weak_ptr, which is not shown) to define a pointer target type.
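
By way of a usage sketch (the class name session is invented here), the mixing-in class passes itself as the type parameter in the now-familiar way, and an object already owned by a shared_ptr can then hand out further shared_ptrs to itself.

#include <boost/shared_ptr.hpp>
#include <boost/enable_shared_from_this.hpp>

class session : public boost::enable_shared_from_this<session>
{
public:
	boost::shared_ptr<session> handle()
	{
		/* shares the refcount with whatever shared_ptr already owns us */
		return shared_from_this();
	}
};

/* e.g.:  boost::shared_ptr<session> s(new session());
          boost::shared_ptr<session> h = s->handle();  */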

I've also seen CRTP described as “virtual functions without polymorphic behaviour”. What does that mean? Since our mixin has a reference to its instantiating class, it can call its methods—even though the binding to that specific class has not yet been formed. In other words, we have deferred the binding of methods—but not until run time. Rather, we have deferred them to later in compile time, when our templates are elaborated. Let's look at an example.

Let's try to get this deferred binding without using CRTP, but also without using virtual functions. Unsurprisingly, it doesn't work. The best we can do is to try non-polymorphic inheritance, by writing something like this.

class X
{
public:
	void f() { /* unimplemented */ }
	void g() { f(); }
};

class Y : public X
{
public:
	void f() { cout << "Done something"; }
};

Of course, if we call X::g(), it calls X::f() and not Y::f(). Using CRTP, we can get it to call Y::f() without resorting to virtual functions.

template <class Derived>
class X
{
public:
	void f() { /* unimplemented */ }
	void g() { Derived::f(); }
};
class Y : public X<Y>
{
public:
	void f() { cout << "Done something"; }
};

CRTP allows the base class to leave certain functions undefined, for definition later in many possible derived classes. The derived classes are not derived the usual way, though: they are derived using CRTP, passing the derived class back to the base class as a type parameter.

This sets up a now-familiar kind of cyclic reference: the base class refers to its (potentially many) derived classes, through its template parameter. Having this “forward” reference, down the inheritance hierarchy, as well as the usual backward reference up it, is what allows derivation-time binding. It's also limiting: we can't subsequently “re-override” Y::f(). Y's method bindings are fixed at derivation time. We have to create a new specialization of X and derive immediately from that, using some other means to get at Y's functionality if we need it.

Interestingly, note that it's okay for us to do Derived::f() in our X class template. This is surprising because at the point where we instantiate X, Derived is an incomplete type. I mentioned earlier that we were restricted in what we could do with incomplete template parameters, but here we are, happily calling one of Derived's member functions. At definition time, there are at least two possibilities for the code that must be generated at the site of our call to Derived::f(), because f() might be an instance member function or a static. (It could also be a function object overloading operator().) When we instantiate X, if we read the code strictly top-to-bottom, we haven't yet got to the body of Y, so it is not yet decided whether it will define f() as a static or an instance member function. Somehow, the compiler is looking ahead at the definition of Y at the point where it elaborates X<Y>, even though Y cannot be complete yet (because we're in the middle of elaborating its base class). I must confess, this is where my understanding of the C++ language runs out. The standard contains complicated rules about name lookup in templates—principally the infamous “two-phase name lookup”. In short, “unqualified names” in templates are looked up at template definition time, whereas “qualified names” are looked up at instantiation time. Clearly our use of Derived::f() is a qualified name. No doubt there is some part of the standard detailing how the second-phase lookup is permitted (and required) to look into an incomplete type, namely Y in our example (incomplete at the time of X's instantiation), to understand the meaning of Y::f(). I haven't yet found a reference to it though. If anyone can point me to it, I'd be much obliged.

[/devel] permanent link contact

Wed, 27 Jun 2012

32 bits should be enough for anyone

For a brief while, 32-bit Intel machines were the de facto standard in commodity hardware, and life was simple. Certainly, it's an ugly architecture, gigantically overcomplicated by backwards-compatibility. Its virtual addressing features are terrifying. But the subset of it which user-level programmers on modern OSes use is fairly comprehensible. There is an ISA-defined stack with its own registers and a well-defined calling convention. Pointers and most integers are both 32 bits, meaning that the “word” is a useful and well-understood unit of storage.

All this changed in the early 2000s as AMD's 64-bit extension of the ISA came into popularity. Upon us were forced bigger integers, bigger pointers, and a more complicated stack and calling convention (in the name of “performance”, but at huge cost in complexity). I believe these were completely retrograde steps. Now that pointers are 64 bits, our software's memory footprint and disk footprint are bloated considerably. To “alleviate” this, and to avoid certain paradoxes about the size relationships between short, int and long, an int in most C compilers stayed at 32 bits. Unfortunately, this is completely braindead, because int is very much supposed to be a word-sized type. This is the reason that C's “defaults to int” semantics, as applying to unprototyped functions and untyped variables, are sane.

Does this matter? Yes! Here is some code that was mysteriously segfaulting for me this morning. It's from DTrace, or more specifically, Paul Fox's Linux port of it.

if ((P->rap = rd_new(P, P->pid)) != NULL)
  (void) rd_loadobj_iter(P->rap, map_iter, P);

Somehow, the pointer returned by rd_new()---which just wraps a simple calloc() call---gets corrupted immediately after returning. Suspiciously, said corruption is that the top four bytes are 0xffffffff, whereas the lower four bytes are those of the pointer returned by calloc(). Inspecting the disassembly around the rd_new call, we see something suspicious.

   0x000000000047ed12 <+150>:   callq  0x462bc6 <rd_new>
=> 0x000000000047ed17 <+155>:   cltq   

What's this cltq thing? Well, it takes the lower 32 bits of %rax (the 64-bit register holding the return value from rd_new()) and sign-extends them to fill the full 64 bits. This is exactly the corruption I was seeing. Why did the compiler insert this unwanted instruction? The answer is revealed if we recompile the file with -Wall.

Psymtab.c:645:3: warning: implicit declaration of function `rd_new' [-Wimplicit-function-declaration]

The function is not prototyped, so its return value defaults to int. But because int is now 32 bits wide, and the register holding the return value is 64 bits wide, the compiler helpfully obliterates the top 32 bits of the return value for us by sign-extending the lower 32 bits into them. If the compiler implementors had stuck with the intention of the int data type, that it be exactly a word in size, and therefore that “defaults to int” is sensible, this would not have arisen.
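
The same failure is easy to reproduce in miniature (a contrived example; the names are made up): put the callee in a separate translation unit, omit the prototype, and on a 64-bit platform the returned pointer gets sign-extended from 32 bits at the call site, exactly as above.

/* callee.c */
#include <stdlib.h>
void *get_buf(void) { return malloc(16); }

/* caller.c -- no prototype for get_buf in scope, so it "defaults to int" */
int main(void)
{
	char *p = (char *) get_buf(); /* 64-bit pointer squeezed through a 32-bit int */
	p[0] = 'x';                   /* liable to fault, just like the rd_new() case */
	return 0;
}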

Now, this is clearly sloppy code. We should just fix it so that rd_new() is prototyped. It probably seems a bit of a non sequitur that I am blaming this problem on 64-bit architectures. But on the other hand, how often in your code have you actually wanted integers that can store values in excess of 2^32? If you are a systems programmer, you might value the ability to encode large offsets. But we already had long long for this. In other cases, the vast bulk of our code deals with small integers, characters and pointers. Giving us an extra 32 bits of width in our ALU operations is a waste of transistors.

Why did we waste them this way? Well, we had to waste them somehow. In the early 2000s, we didn't really know what else to do with them, because (I suspect) there was little perceived demand for multiple cores in the commodity market (outside of servers). Nowadays, we have even more transistors, and even hardware guys realise that giving us 128-bit architectures would be a pointless waste. So, they spent some effort convincing us that we really did want multiple cores after all. And now we are busy complicating our software so that we can “exploit” these too. I have ranted before about how that risks generating a generation's worth of bad software. Perhaps I should say “another generation's worth”.

By the way, I'll concede that 64-bit address spaces can be useful, if they are used to support persistence or sharing. No need for pointer swizzling! But AMD's 64-bit x86 extensions do not provide the separation of protection from mapping to realise the sharing use-case. In other words, switching protection domains still means invalidating the TLB entries of shared mappings. Meanwhile, I haven't seen anyone using the extra address space for accessing persistent storage in a radically new way, although I'd love to see some approaches in this space.

I don't completely doubt the value of multiple cores either. The right way to see parallelism is as an enabler for radically more computation-intensive applications---likely to be in domains such as scientific computation, machine learning, and simulation---than what we can currently support. As I have also ranted about before, I am deeply disturbed by the fervour for mass rewriting of everyday software, and the disruption to the infrastructure it runs on, that is resulting from mindless multicore mania, in the same way that the 64-bit shift has disrupted our infrastructure. It's all in the name of performance, but it costs us far more of human beings' time and energy than it saves.

[/devel] permanent link contact

Fri, 01 Jun 2012

Link order

Initialization order of static state is a thorny problem. It's particularly tricky to get right portably. But until recently I didn't realise how tricky it could be even when restricting oneself to GNU tools on Unix platforms. Consider the following three-part program, consisting of an executable prog and two shared libraries lib1 and lib2. The dependency order is left-to-right in that list: prog depends on lib1 which depends on lib2.

/* prog.c */
#include <stdio.h>

/* from lib1 */
void greeting(void);

/* constructor */ 
static void init(void) __attribute__((constructor));
static void init(void)
{
	fprintf(stderr, "Initializing prog\n");
}

int main(void)
{
	greeting();
	return 0;
}

/* end prog.c */

/* lib1.c */
#include <stdio.h>

/* from lib2 */
void hello(void);

/* constructor */ 
static void init(void) __attribute__((constructor));
static void init(void)
{
	fprintf(stderr, "Initializing lib1\n");
}

void greeting(void)
{
	hello();
}

/* end lib1.c */

/* lib2.c */
#include <stdio.h>

/* constructor */ 
static void init(void) __attribute__((constructor));
static void init(void)
{
	fprintf(stderr, "Initializing lib2\n");
}

void hello(void)
{
	printf("Hello, world!\n");
}

/* end lib2.c */

Here's a GNU Makefile to tie it all together.

CFLAGS := -g -fPIC
LDFLAGS := -L$$(pwd) -Wl,-R$$(pwd)
LDLIBS := -l1 -l2

default: lib1.so lib2.so prog

%.so: %.c
	$(CC) $(CFLAGS) -shared -o "$@" "$<"

clean:
	rm -f lib1.so lib2.so prog

Now when you do make (or gmake) it will build a program that initializes its libraries in right-to-left order: from the “most depended on” to the “least depended on”. We can verify this by running the program.

$ ./prog
Initializing lib2
Initializing lib1
Initializing prog
Hello, world!

Moreover, if you try flipping around the link order in the LDLIBS line, the link will fail with undefined reference to `hello', because the reference to hello (in lib1) is introduced after the reference to lib2, and the linker's defined behaviour is to avoid re-scanning for new undefined references---it's up to the invoker to order the libraries so that this works.

Let's try this on a BSD system. I have a NetBSD 5.0 VM hanging around, so I'll try that. It has recent GNU make, GCC and GNU binutils installed.

$ gmake
cc -g -fPIC -shared -o "lib1.so" "lib1.c"
cc -g -fPIC -shared -o "lib2.so" "lib2.c"
cc -g -fPIC  -L$(pwd) -Wl,-R$(pwd)  prog.c  -l1 -l2 -o prog
$ ./prog
Initializing lib1
Initializing lib2
Initializing prog
Hello, world!

Strangely, our initialization order is flipped. This doesn't matter for our program, but if lib1 consumed some static state in lib2, it would matter quite a bit. What happens if we flip the link order around to compensate? We edit the LDLIBS line and re-make.

$ nano Makefile
$ gmake clean && gmake
rm -f lib1.so lib2.so prog
cc -g -fPIC -shared -o "lib1.so" "lib1.c"
cc -g -fPIC -shared -o "lib2.so" "lib2.c"
cc -g -fPIC  -L$(pwd) -Wl,-R$(pwd)  prog.c  -l2 -l1 -o prog
$ ./prog
Initializing lib2
Initializing lib1
Initializing prog
Hello, world!

This has done what we want. But what's going on? This link order didn't even work on GNU/Linux. Not only does it work on BSD, but it's required if we want a sensible initialization order. Our initializers run in left-to-right order, so we need to put the “most depended on” libraries first, not last. This isn't a BSD quirk per se, because we're using the GNU linker in both cases. I suspect the linker scripts are nevertheless different in the two cases. However, I haven't had time to look into the details of why. I'd be interested to hear, if anyone knows. I guess this is the sort of peculiarity that gives libtool a reason to exist.

[/devel] permanent link contact

Tue, 13 Dec 2011

Load addresses

For reasons I will only hint at, I want to predict the load address of a set of shared libraries, given an executable that links against them. Naturally, I have turned off address space layout randomization.

At first, I thought I could use ldd for this. It seems to work.

$ ldd /usr/local/src/git-1.7.5.4/git
        linux-vdso.so.1 =>  (0x00007ffff7fdd000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007ffff7dc3000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007ffff7ba5000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ffff7806000)
        /lib64/ld-linux-x86-64.so.2 (0x00007ffff7fde000)

But also, there is an environment variable called LD_TRACE_LOADED_OBJECTS that is supposed to have the same effect. As it happens, ldd is just a small wrapper script which sets this variable and invokes the dynamic linker, which on my system is /lib64/ld-linux-x86-64.so.2. Let's try doing this directly.

$ LD_TRACE_LOADED_OBJECTS=1 /usr/local/src/git-1.7.5.4/git
        linux-vdso.so.1 =>  (0x00007ffff7ffb000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007ffff7bc4000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007ffff79a7000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ffff7608000)
        /lib64/ld-linux-x86-64.so.2 (0x00007ffff7ddc000)

That seems to work too. But wait! It's given us different load addresses than ldd did. Have I really turned off randomization? Well, yes. In fact, repeating either of these commands will reliably yield the output above, and they are reliably different from one another. What is going on?

Let's hack ldd so that it prints exactly what command it is going to execute.

$ ldd /usr/local/src/git-1.7.5.4/git
About to eval:  LD_TRACE_LOADED_OBJECTS=1 LD_WARN= LD_BIND_NOW= LD_LIBRARY_VERSION= LD_VERBOSE= /lib64/ld-linux-x86-64.so.2 /usr/local/src/git-1.7.5.4/git
verify_out is 
        linux-vdso.so.1 =>  (0x00007ffff7fdd000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007ffff7dc3000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007ffff7ba5000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ffff7806000)
        /lib64/ld-linux-x86-64.so.2 (0x00007ffff7fde000)

So, it has set a bunch of other environment variables to empty strings. They look innocuous enough. But also, it is invoking the loader directly, whereas we were just letting execve call the loader for us. Can we reproduce the result of ldd by running the same command it does?

$ LD_TRACE_LOADED_OBJECTS=1 LD_WARN= LD_BIND_NOW= LD_LIBRARY_VERSION= LD_VERBOSE= /lib64/ld-linux-x86-64.so.2 /usr/local/src/git-1.7.5.4/git
        linux-vdso.so.1 =>  (0x00007ffff7fdd000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007ffff7dc3000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007ffff7ba5000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ffff7806000)
        /lib64/ld-linux-x86-64.so.2 (0x00007ffff7fde000)

Yes, we can. Now, the big question: which one is correct? Let's run our program under gdb and inspect the memory map.

$ gdb --args /usr/local/src/git-1.7.5.4/git
... snipped ...
(gdb) break main
Breakpoint 1 at 0x404bf0: file git.c, line 509.
(gdb) run
Starting program: /usr/local/src/git-1.7.5.4/git 
[Thread debugging using libthread_db enabled]

Breakpoint 1, main (argc=1, argv=0x7fffffffddd8) at git.c:509
509     {
(gdb) print getpid()
$1 = 27023
(gdb) shell cat /proc/27023/maps | grep 'lib.*\.so.*'
7ffff7608000-7ffff779d000 r-xp 00000000 08:07 356590                     /lib/x86_64-linux-gnu/libc-2.13.so
... snipped ...

So, libc-2.13.so has been loaded at address 0x7ffff7608000, which is what we got from running with just the LD_TRACE_LOADED_OBJECTS flag set, and not what we got with ldd or with invoking ld-linux.so.2 specially.

Why the difference? Clearly, first execveing the loader perturbs the address assignment. It's not clear why this should be---isn't the loader itself the first thing to be loaded anyway? I'm not yet sure what is going on.

Another question: is predicting the load address even a sound thing to do? Given that we had to disable randomization in the first place, it seems like a bad idea. In my case, this approach will do for now, but ultimately I should defer my work until application start-up time. Then we can discover the actual loaded addresses of the various libraries, which is much more robust.

[/devel] permanent link contact

Thu, 01 Dec 2011

Weak dynamic symbols

Although I know more than the average bear about linkers, there's always things I don't know. Until now I never had cause to understand the following: how does the linker know which symbols to link dynamically, and which to link statically?

A bit of head-scratching reveals the only possible answer. Given a command line, the linker does the usual thing: it looks at the command-line options and performs the standard library look-up procedure, gathering a list of object files---some static, some dynamic. If a symbol is defined by a dynamic library, it becomes dynamic. Otherwise, it stays static.

That sounds fairly sensible. But it can mean surprises. Suppose you have a C program that wants to support an optional feature to be linked in at load time. You might write something like the following.
int optional_function(int arg1, void *arg2) __attribute__((weak));

/* ... */

void do_something(void)
{
    if (optional_function) optional_function(42, &some_obj);
    /* else skip the optional part... */
}

If you pull this sort of trick within a shared library, it works fine. But inside an executable: no dice! If you compile this into an executable and look for optional_function in your dynamic symbol table, you'll be disappointed.

$ objdump -T my-program | grep optional_function
$

What is going on? Well, it's in the static symbol table, silly.

$ objdump -t my-program | grep optional_function
0000000000000000  w      *UND*  0000000000000000          optional_function
$

What does it mean to have an undefined symbol in your executable's static symbol table? It means it will silently take the value zero! In fact, the relocation records referencing your symbol have already been discarded.

$ objdump -rRd my-program | grep -A1 -B1 callq
(snip)
--
  400549:      bf 2a 00 00 00        mov    $0x2a,%edi
  40054e:      e8 ad fa bf ff        callq  0 <__init_array_end>
  400553:      b8 00 00 00 00        mov    $0x0,%eax
--
(snip)

Cheerily, the linker has inserted a direct-bound call to address zero in your code. That's not what we want! So, how can we fix it?

The trick is in the linker's (or at least the GNU linker's) --dynamic-list option. First, create a file called whatever you like (mine's called dynamic-list), containing the following.

{ optional_function; };

Now link your program passing --dynamic-list <your-dynamic-list> to the linker.

gcc -Wl,--dynamic-list -Wl,<your-dynamic-list> -o my-program my-program.c

Hey presto! You should now have your weak symbol in the dynamic symbol table.

$ objdump -t my-program | grep optional_function
0000000000000000  w   D  *UND*  0000000000000000          optional_function
$

That's a bit ugly. Recalling the linker behaviour I described at the beginning, the simpler way to do it is just to link your executable against a shared library defining optional_function.

You might wonder (as I do): what is the point of putting undefined symbols in an executable's static symbol table? Once the executable is output, it's too late to link anything with them. Surely they should all be “promoted” to dynamic symbols? [Update, 2012-5-19: there is of course a linker option for doing this, which in the GNU case is --export-dynamic. Still, I'm not sure why it isn't the default.]

It would also be nice to have an objcopy option for adding dynamic symbols in this way, so we can do it after the fact, rather than changing the linker command like we did above. However, this is nontrivial for the reason I mentioned---the relocation records that you would want have already been eliminated. So, we would need to re-create them. This is similar to something I began work on before. At some point I might resurrect my objcopy patches and try to repurpose them to this problem. For now, I will just hack in the extra linker options.

[/devel] permanent link contact

Thu, 06 Oct 2011

LLVM structural typing

I'm learning about the LLVM compiler infrastructure at the moment.

LLVM bitcode includes a notion of data types. These are used to control implicitly the size and encoding of values generated by various operations, to hint about mappings to underlying machine data types (e.g. on architectures that distinguish floating-point from integer registers) and to implicitly cause certain transformations to be effected, such as padding or sign extension. (I'm not yet sure whether all such operations need to be explicitly rendered as an LLVM “bitcast” operation or not. At least, LLVM's notion of types can be used to define validity of these operations, whether or not they happen implicitly.)

Moreover, addresses (pointers) are typed according to the type of the values they reference (point to). The data types are in this sense higher-order. (This is a weaker case of “higher-order” than types specifying the behaviour of functions. But it has some things in common. I will blog about this more in the future.) These data types control implicitly how much data is read or written by indirect loads and stores.

A typical C front-end will encode C data types directly into this type system. However, this is just a convenience. The encoding discards some of the semantics of the C type system, because in LLVM, composite types are treated purely structurally, whereas in C, they are always treated nominally. Consider this program.

#include <stdlib.h>

struct Foo {
  int a;
  int b;
};

struct Bar {
  int x;
  int y;
};

int main(void)
{
  struct Foo *f = (struct Foo *) malloc(sizeof (struct Foo));
  struct Bar *b = (struct Bar *) malloc(sizeof (struct Bar));

  free(f);
  free(b);

  return 0;
}

In LLVM bitcode, using llvm-gcc, we get the following.

; ModuleID = 'test.o'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-f128:128:128-n8:16:32:64"
target triple = "x86_64-unknown-linux-gnu"

%struct.Bar = type { i32, i32 }
%struct.Foo = type { i32, i32 }

define i32 @main() nounwind {
entry:
  %retval = alloca i32
  %0 = alloca i32
  %f = alloca %struct.Bar*
  %b = alloca %struct.Bar*
  %"alloca point" = bitcast i32 0 to i32
  %1 = call noalias i8* @malloc(i64 8) nounwind
  %2 = bitcast i8* %1 to %struct.Bar*
  store %struct.Bar* %2, %struct.Bar** %f, align 8
  %3 = call noalias i8* @malloc(i64 8) nounwind
  %4 = bitcast i8* %3 to %struct.Bar*
  store %struct.Bar* %4, %struct.Bar** %b, align 8
  %5 = load %struct.Bar** %f, align 8
  %6 = bitcast %struct.Bar* %5 to i8*
  call void @free(i8* %6) nounwind
  %7 = load %struct.Bar** %b, align 8
  %8 = bitcast %struct.Bar* %7 to i8*
  call void @free(i8* %8) nounwind
  store i32 0, i32* %0, align 4
  %9 = load i32* %0, align 4
  store i32 %9, i32* %retval, align 4
  br label %return

return:                                           ; preds = %entry
  %retval1 = load i32* %retval
  ret i32 %retval1
}

declare noalias i8* @malloc(i64) nounwind

declare void @free(i8*) nounwind

Notice that although the compiler has emitted two LLVM type definitions, one for each of our struct types, it then proceeds to use only the first one of them. The second is redundant, because the two are structurally equivalent. This starts to look even more peculiar when we make our data types recursive.

#include <stdlib.h>

struct Foo {
  int a;
  int b;
};

struct Bar {
  int x;
  int y;
};

struct FooRecursive {
  int a;
  struct FooRecursive *next;
};

struct BarRecursive {
  int a;
  struct BarRecursive *next;
};

int main(void)
{
  struct Foo *f = (struct Foo *) malloc(sizeof (struct Foo));
  struct Bar *b = (struct Bar *) malloc(sizeof (struct Bar));

  struct FooRecursive *fr = (struct FooRecursive *) malloc(sizeof (struct FooRecursive));
  struct BarRecursive *br = (struct BarRecursive *) malloc(sizeof (struct BarRecursive));
  
  free(f);
  free(b);
  free(fr);
  free(br);
  
  return 0;
}

This gives us the following.

; ModuleID = 'test.o'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-f128:128:128-n8:16:32:64"
target triple = "x86_64-unknown-linux-gnu"

%struct.Bar = type { i32, i32 }
%struct.BarRecursive = type { i32, %struct.BarRecursive* }
%struct.Foo = type { i32, i32 }
%struct.FooRecursive = type { i32, %struct.BarRecursive* }

define i32 @main() nounwind {
entry:
  %retval = alloca i32
  %0 = alloca i32
  %f = alloca %struct.Bar*
  %b = alloca %struct.Bar*
  %fr = alloca %struct.BarRecursive*
  %br = alloca %struct.BarRecursive*
  %"alloca point" = bitcast i32 0 to i32
  %1 = call noalias i8* @malloc(i64 8) nounwind
  %2 = bitcast i8* %1 to %struct.Bar*
  store %struct.Bar* %2, %struct.Bar** %f, align 8
  %3 = call noalias i8* @malloc(i64 8) nounwind
  %4 = bitcast i8* %3 to %struct.Bar*
  store %struct.Bar* %4, %struct.Bar** %b, align 8
  %5 = call noalias i8* @malloc(i64 16) nounwind
  %6 = bitcast i8* %5 to %struct.BarRecursive*
  store %struct.BarRecursive* %6, %struct.BarRecursive** %fr, align 8
  %7 = call noalias i8* @malloc(i64 16) nounwind
  %8 = bitcast i8* %7 to %struct.BarRecursive*
  store %struct.BarRecursive* %8, %struct.BarRecursive** %br, align 8
  %9 = load %struct.Bar** %f, align 8
  %10 = bitcast %struct.Bar* %9 to i8*
  call void @free(i8* %10) nounwind
  %11 = load %struct.Bar** %b, align 8
  %12 = bitcast %struct.Bar* %11 to i8*
  call void @free(i8* %12) nounwind
  %13 = load %struct.BarRecursive** %fr, align 8
  %14 = bitcast %struct.BarRecursive* %13 to i8*
  call void @free(i8* %14) nounwind
  %15 = load %struct.BarRecursive** %br, align 8
  %16 = bitcast %struct.BarRecursive* %15 to i8*
  call void @free(i8* %16) nounwind
  store i32 0, i32* %0, align 4
  %17 = load i32* %0, align 4
  store i32 %17, i32* %retval, align 4
  br label %return

return:                                           ; preds = %entry
  %retval1 = load i32* %retval
  ret i32 %retval1
}

declare noalias i8* @malloc(i64) nounwind

declare void @free(i8*) nounwind

Notice that the self-referencing structure of FooRecursive has been lost, again because a different type is structurally equivalent.

Now for a final experiment: what about singleton structs? Are they structurally equivalent to a single element? I'll throw in a typedef too, to see whether that appears.

#include <stdlib.h>

struct Foo {
  int a;
};
typedef int Baz;

int main(void)
{
  struct Foo *f = (struct Foo *) malloc(sizeof (struct Foo));
         Baz *b =        (Baz *) malloc(sizeof        (Baz));
  
  free(f);
  free(b);
  
  return 0;
}

Here's the code it generates.

; ModuleID = 'test.o'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-f128:128:128-n8:16:32:64"
target triple = "x86_64-unknown-linux-gnu"

%struct.Foo = type { i32 }

define i32 @main() nounwind {
entry:
  %retval = alloca i32
  %0 = alloca i32
  %f = alloca %struct.Foo*
  %b = alloca i32*
  %"alloca point" = bitcast i32 0 to i32
  %1 = call noalias i8* @malloc(i64 4) nounwind
  %2 = bitcast i8* %1 to %struct.Foo*
  store %struct.Foo* %2, %struct.Foo** %f, align 8
  %3 = call noalias i8* @malloc(i64 4) nounwind
  %4 = bitcast i8* %3 to i32*
  store i32* %4, i32** %b, align 8
  %5 = load %struct.Foo** %f, align 8
  %6 = bitcast %struct.Foo* %5 to i8*
  call void @free(i8* %6) nounwind
  %7 = load i32** %b, align 8
  %8 = bitcast i32* %7 to i8*
  call void @free(i8* %8) nounwind
  store i32 0, i32* %0, align 4
  %9 = load i32* %0, align 4
  store i32 %9, i32* %retval, align 4
  br label %return

return:                                           ; preds = %entry
  %retval1 = load i32* %retval
  ret i32 %retval1
}

declare noalias i8* @malloc(i64) nounwind

declare void @free(i8*) nounwind

Predictably, the typedef has gone away entirely, because it introduces no new structure. However, our singleton struct has stayed around. This isn't surprising either, because LLVM has instructions for accessing field members, whose semantics are affected by these structural differences. Composite types are not just sugar for arrays of bytes or words.

This does mean that if we wanted to encode nominal types into our LLVM bitcode, we could do it by wrapping nominally distinct types in differing depths of layered singleton structs. This would affect the bitcode that came out, e.g. inserting extra GetElementPtr operations, but shouldn't affect the optimised compiler output.
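
For example (a hypothetical encoding, not something llvm-gcc does for you), two structurally identical payloads can be kept apart in the bitcode by wrapping them in different depths of singleton struct:

struct payload { int a; int b; };

/* same data, nominally distinct types... */
struct cartesian { struct payload val; };                      /* wrapped once */
struct polar     { struct { struct payload val; } wrapped; };  /* wrapped twice */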

Overall, we can say that LLVM's data types are an annotation useful primarily for propagating and/or inferring data size and encoding information contextually through load and store operations. They are also used for checking: the bitcode is checked for first-order type errors. Since they are raised at the point of pointer use (e.g. a pointer assignment without appropriate bitcast is an error), they can catch likely first-order errors early, i.e. when a pointer is generated or stored, rather than later when it is dereferenced and its target data is misinterpreted. Here by “first-order type errors” I roughly mean those at the machine level, meaning they always concern the misinterpretation of bit-patterns encoding primitive values (integers, addresses, floating-point numbers). Since nominally distinct data types are conflated when they are structurally equivalent, then without using the encoding trick I mentioned, the bitcode will not capture, say, that one variable encodes polar coordinates by x and r, and another by r and theta. Detecting violation of these abstractions is beyond the reach of any analysis based (only) on LLVM data types using the standard encoding.

This includes dynamic analyses. Right now I'm writing an is_a function for KLEE. KLEE is (effectively) a fancy LLVM interpreter. Without recourse to the source code and/or hacking the C front-end, we can only do structural is_a, which is slightly disappointing. I should add that I'm not criticising LLVM here at all. Intermediate representations are not the place for fancy type systems, and the structural approach works nicely. It just means more work for me, when it looked for a moment as though I could abuse pre-existing functionality.

[/devel] permanent link contact

Wed, 01 Jun 2011

Memtable again

I've finally got a nicely packaged implementation of memtables, the data structure I introduced in a previous blog post. It's in a single header---memtable.h. I've also fixed a couple of stupid bugs that crept into malloc_hooks.c just before I released the initial version. You can see an example of combining these two in heap_index_hooks.c---which you can compile into a simple LD_PRELOADable shared library that will instrument the glibc malloc to keep an indexed table of allocated chunks, keyed on address. It's pretty easy to search the memtable to find the heap chunk for a given pointer anywhere into an object. I'll integrate this into my Cake runtime implementation soon, and the whole lot will appear here in due course (i.e. eventually).

If you use any of these files, please drop me a line to say how you got on---I'd really appreciate it.

[/devel] permanent link contact

Thu, 19 May 2011

Namespace problems

It's always been a theoretical problem with C that there is no namespacing. I'd often wondered how much of a practical problem this really was, with “nothing major” my tentative answer. I've finally run into my first bona-fide gotcha arising out of this problem. In short: wxWidgets and GLib both define a GSocket.

Annoying as this is, it wouldn't be in my top ten grumbles about the experience of programming in C. A far bigger related problem is versioning. This doesn't get cited as a weakness of C because it's also a weakness of most other programming languages. The very reason I ran into the namespace problem was because I had to compile wxWidgets 2.6, rather than using the 2.8 revision that's packaged for my distribution. Version mismatches can be seen as namespace collisions too. Instead of getting the version you want, the namespace has been populated with slightly different stuff that is, despite its close relationship to what you actually require, still incompatible, much the same as if the namespace were polluted with random third-party stuff.

Versioning issues could perhaps be brought more under the programmer's control. Most programming languages don't have an explicit notion of “version” when importing stuff. But when explicitly consuming some target API, you are always assuming at least something about its version. Having the programmer declare which version of a set of declarations they want to import would be straightforward. In C, it could even be done quite neatly with just the preprocessor---say, writing #define __LIBFOO_REQUESTED_VERSION 4.2 before the relevant #include.

Of course, pessimistically refusing to link across nominal mismatches of version would be a bad solution. We want a more structural and, indeed, behavioural or “semantic” approach. With the C preprocessor approach I outlined, it becomes the header file author's responsibility to embed a test about which prior API version the associated implementation is compatible with, most likely using a simple #if test. This responsibility is not unreasonable I'd say---the developers are in the best place to say what has changed with a new revision. And since it's in a header file, if the maintainers are lazy, the client programmer can override it.
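
A sketch of how that might look (libfoo and its version numbers are invented here; integer-encoded versions keep the comparison within what #if can evaluate):

/* client code */
#define __LIBFOO_REQUESTED_VERSION 42   /* i.e. "I was written against libfoo 4.2" */
#include <libfoo.h>

/* inside libfoo.h, maintained by the library authors */
#ifndef __LIBFOO_REQUESTED_VERSION
#define __LIBFOO_REQUESTED_VERSION 50   /* assume clients want the current version */
#endif
#if __LIBFOO_REQUESTED_VERSION < 30
#error "libfoo 5.x is not source-compatible with clients expecting pre-3.0 APIs"
#endif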

One shortcoming of this approach is that the client programmer might be too lazy to work out which is the earliest library version their code will work with, and will instead select whatever version they are developing with. This is safe, but prevents some valid compositions. On a different system with a slightly older version of the library, the header might conservatively conclude that it's not compatible with the client, even though it could work. Anyway, I don't worry about this too much. Lots of researchers have thought about versioning before, so there's probably some good solutions knocking around.

Back to the sockets example, it's perhaps unsurprising that the name collision occurred when linking two chunks of infrastructure code. Name collisions are most likely when abstracting the same domain, having the same natural language vocabulary---namely sockets in this case. This is much more likely to happen in infrastructure software (i.e. modelling system resources) than application level software (modelling circles or ellipses or airline reservations or health records and so on), simply because you're less likely to link multiple instances of the latter together. Whereas application-level code is at or near the top of the software dependency graph, the infrastructure stuff is lower down so more likely to get sucked into a program through dependency.

I was interested to note Nick Nethercote's recent blog entry about a problem with duplication (generally) and bloat (specifically) associated with multiple wrapper layers for system calls and other nonportable interfaces. He was talking about mmap(), but the socket abstraction is another example. I have some research proto-ideas that might help with this problem. Essentially I'm interested in recovering a finer-grained style of interface description from code, based on the idea of “relational interfaces”. You could then use this description to infer that two sets of functions had very similar behaviour, and factor out the duplication (with appropriate refactoring or adaptation tools).

This whole problem is another consequence of our fragile direct-interfacing, in-order methods for constructing software. If we had a more flexible way of constructing software, the problem wouldn't arise. Rather than slavishly building on predefined interfaces that are specific to one underlying component---like one mmap() abstraction layer, or one socket abstraction---we need smarter tools for specifying our requirements abstractly and finding customised ways of satisfying them using a range of “found” code. This is what my Onward! '09 proto-paper was ranting about. I guess it's good that I'm still ranting. Interface hiding is as good an idea as ever, and more work on it will happen when I get time....

[/devel] permanent link contact

Memtables

At the MMNet workshop in Glasgow last week, I talked about memtables. These are an efficient associative data structure, built using virtual memory support on modern OSes (currently implemented for Linux only), that are useful whenever you want to key a table on addresses in memory. See my slides for more.

Since entries with numerically similar keys are stored close to each other, memtables are, like certain other associative data structures, amenable to searching within a key range as well as exact-match lookups. By contrast, hash tables can't do this. (That said, a hash table supporting duplicate keys can be used to store items grouped into small equivalence classes. This is sometimes good enough, and could be made to work in my case. Nonuniform key duplication will mess up the O(1) nature of hash tables though.)

Memtables seem like they could be useful in lots of places. I invented them for DwarfPython as a fast way of storing and retrieving metadata given a key that may be an interior pointer (hence the searching requirement). I'm also (soon) using them in Cake as a fast way of tracking what objects are associated with what other objects.

The key space doesn't have to be addresses. It's possible we could even use memtables for free chunk binning, since large sizes are sparsely used. I need to do some more experiments to establish this.

The implementation comes in two parts:

  • A generic set of malloc hooks for glibc: these hooks are “generic” in that they're designed to be easily specialised for various conceivable kinds of instrumentation. They're not generic with respect to the allocator---sadly they're specific to glibc, but most mature allocators should have some similar mechanism. The key usefulness of these hooks is factoring the various cases of the malloc API---specifically the complex behaviour of realloc, but also other annoyances including memalign and null frees---into an easy-to-use set of higher-level hooks. These are likely (but not guaranteed) to be a better match for whatever your instrumentation is doing. For example, defining a post_successful_alloc() function will hook all events that allocate a new heap block, whether they originated in a malloc(), a realloc() or a memalign().
  • A generic memtable library: this will appear soon! It's a set of hook definitions that maintain a memtable, and a lookup function.
Memtables are strictly faster than a hash table, at least for lookups, because they are basically a hash table without the hashing. At least for most applications of memtables, the table itself acts as an index for a bunch of linked lists---call them bins or buckets. Rather than mapping the key space onto a smaller hash space in order to keep the index small, we index directly by the key, and rely on the virtual memory trick to keep the index small. Since we can only save page-sized chunks of space, the key-space really needs to be used in a clustered and/or very sparse fashion. Just one used key per page of index is enough to allocate the whole table in physical memory, which we don't want. So if your table has four-byte entries, say, uniform key usage should be a lot less than one per thousand possible keys---but clusters of many are okay, so long as they're thousands apart.

[/devel] permanent link contact

Tue, 22 Mar 2011

How much memory could an mmap() map...

...if an mmap() could map memory? I've had to ask this question of Linux on 64-bit x86 platforms recently.

For reasons I will only hint at, I want to allocate a huge region of virtual address space for a structure a bit like a linear page table (called a “virtualized page table” on Wikipedia). We rely on a particular behaviour of Linux's mmap(): that mapping some memory isn't the same as committing to the availability of any underlying physical memory. Passing the MAP_NORESERVE flag means that memory will only be allocated when written to, hence allowing us to create a sparse table easily.

I decided my table should have one word per 4KB of memory. For a 32-bit machine, which has 4-byte words and 2^20 such (aligned) regions, this means I need 4MB of virtual address space for my table (i.e. about a thousandth of the VAS). If we ask mmap() for such a region, it will clearly oblige us. On a 64-bit machine, which has 8-byte words and 2^52 such regions, I need 2^55 bytes of virtual address space for my table---32 petabytes, or about eight billion times as much as in the 32-bit case, but again, only a small fraction of the total address space (in this case a 512th, because words are twice as big).

Here's a quick program you can run to test whether you can do an mmap() of a given size.

    #include <stdio.h>
    #include <errno.h>
    #include <string.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <assert.h>
    
    int main(int argc, char **argv)
    {
        assert(argc > 1);
        size_t mapping_size = strtol(argv[1], NULL, 0);
        assert(errno != ERANGE);
        assert(mapping_size > 0);
        assert(sizeof(size_t) == 8);
            
        void *ret = mmap(NULL, mapping_size, PROT_READ|PROT_WRITE, 
            MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0);
    
        if (ret == MAP_FAILED)
        {
            fprintf(stderr, "error: %s\n", strerror(errno));
            return 1;
        }
        else 
        {
            fprintf(stderr, "success!\n");
            return 0;
        }
    }
    

And here's a shell script to drive it with powers of two until it fails.

    #!/bin/bash
    
    for exp in `seq 10 50`; do
        ./test $(( 2 ** $exp )) || break;
        echo "Succeeded for 2 ^ $exp"
    done
    

I'd be curious to know whether anyone on an x86-64 Linux machine maxes out anywhere different than 2^46 bytes. The kernel source will have the answer, but I can't be bothered wading through it right now. Interestingly, turning off the overcommit check (i.e. writing "1" to /proc/sys/vm/overcommit_memory) doesn't increase the limit for me.

By the way, I'm using strtol because atol seemed to be narrowing the result to 32 bits even though a long is 64 bits. Instead of 2^31 I got -2^31, which unsurprisingly made mmap() fail. This seems like a bug, but probably isn't for some reason (possibly including a stupid mistake by me).

As you might have guessed, I'm using this huge region of memory as a big flat structure to record metadata about memory. The main trick of a linear page table is that we can use virtual memory to encode large sparse arrays, without allocating memory for page-sized regions of the table that are empty. This generalises to sparse tables other than page tables. The one I'm building is for tracking allocated heap regions.
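
To sketch how such a table then gets used (the entry type and names here are invented for illustration), a lookup is just a shift and an index, and pages of the table that nothing ever touches never get backed by physical memory:

#include <stdint.h>

#define REGION_SIZE 4096UL      /* one entry per 4KB of address space */

typedef uint64_t entry_t;       /* one word of metadata per region */
entry_t *table;                 /* the big MAP_NORESERVE mapping */

entry_t *entry_for_addr(const void *addr)
{
    return &table[(uintptr_t) addr / REGION_SIZE];
}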

[Update, 2011-3-23: thanks to Malcolm Scott who pointed out that my problem might be more tractable, because current x86-64 processors only implement a 48-bit address space. This also means that the 46-bit limit makes more sense---my mmap() successfully allocated a quarter of the usable virtual address space! Now I'm wondering: are those 48 bits something I can rely on for the nominal x86-64 architecture, or will running the same binaries on future hardware silently issue addresses from the larger 64-bit space? For now it doesn't really matter, but send answers if you have them (on a postcard, if you like) please.]

[/devel] permanent link contact

Fri, 19 Feb 2010

C++ Gotme number 4: constant surprise

When I was less experienced with C++, I would avoid creating any sort of constness in my code because it invariably led to headaches. When I'd tried, all too often I'd found myself trawling through the source code either adding or removing constness all over the place in order to make things compile. It's certainly true that constness has to be done right or not at all. Like a lot of C++, it's not resilient either to mistakes or to change.

These days I usually make a reasonable job of it. I've found the best way to understand constness is as follows. Every object in a C++ program exposes exactly two variants of its interface: the “full” one and the const one. The actual boundary between these, at least in the case of user-defined types, is somewhat flexible (as witnessed by mutable, and by your forgetfulness to add const to method definitions!).
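
A small invented example of what I mean (nothing to do with the DWARF code below): the const variant of the interface is whatever you remembered to mark const, stretched a little by whatever you declared mutable.

class counter
{
	int value;
	mutable int reads; /* bookkeeping we permit even through the const interface */
public:
	counter() : value(0), reads(0) {}
	int get() const { ++reads; return value; } /* part of the const interface */
	void set(int v) { value = v; }             /* full interface only */
};

void observe(const counter& c)
{
	c.get();         /* fine */
	/* c.set(42); */ /* would not compile: set() is not in the const interface */
}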

Grokking the difference between a const pointer and pointer-to-const is of course a rite of passage in C and C++ programming. A more obscure subtlety about const in C++ is that const references let you pass around temporaries' lvalues, but you can't do this in a non-const fashion. It's not clear why not, except that it's something you'd rarely want to do---unless, of course, you had omitted to add the const annotation in the signature of whatever function you wanted to pass the lvalue to. That's one reason why getting const right is something you can't really opt out of.

Today I was wrestling with an unfortunate side of const---untraceable compilation errors. I'm using the boost graph library so I can run graph algorithms over my DWARF data structure. Since all my “nodes” live in a std::map, I was specialising graph_traits, giving the map's value_type as the node type. Here's what the compiler said.

    /local/scratch/srk31/opt/bin/../lib/gcc/i686-pc-linux-gnu/4.4.3/../../../../include/c++/4.4.3/bits/stl_pair.h: In member function 'std::pair<const long long unsigned int, boost::shared_ptr<dwarf::encap::die> >& std::pair<const long long unsigned int, boost::shared_ptr<dwarf::encap::die> >::operator=(const std::pair<const long long unsigned int, boost::shared_ptr<dwarf::encap::die> >&)':
    /local/scratch/srk31/opt/bin/../lib/gcc/i686-pc-linux-gnu/4.4.3/../../../../include/c++/4.4.3/bits/stl_pair.h:68:   instantiated from 'boost::concepts::VertexListGraph<G>::~VertexListGraph() [with G = dwarf::encap::dieset]'
    /home/srk31/opt/include/boost/graph/graph_concepts.hpp:166:   instantiated from 'static void boost::concept::requirement<Model>::failed() [with Model = boost::concepts::VertexListGraphConcept<dwarf::encap::dieset>]'
    /home/srk31/opt/include/boost/concept_check.hpp:43:   instantiated from 'void boost::function_requires(Model*) [with Model = boost::concepts::VertexListGraphConcept<dwarf::encap::dieset>]'
    test-6.cpp:8:   instantiated from here
    /local/scratch/srk31/opt/bin/../lib/gcc/i686-pc-linux-gnu/4.4.3/../../../../include/c++/4.4.3/bits/stl_pair.h:68: error: non-static const member 'const long long unsigned int std::pair<const long long unsigned int, boost::shared_ptr<dwarf::encap::die> >::first', can't use default assignment operator
    In file included from test-6.cpp:1:
    /home/srk31/opt/include/boost/graph/graph_concepts.hpp: In destructor 'boost::concepts::VertexListGraph<G>::~VertexListGraph() [with G = dwarf::encap::dieset]':
    /home/srk31/opt/include/boost/graph/graph_concepts.hpp:166:   instantiated from 'static void boost::concept::requirement<Model>::failed() [with Model = boost::concepts::VertexListGraphConcept<dwarf::encap::dieset>]'
    /home/srk31/opt/include/boost/concept_check.hpp:43:   instantiated from 'void boost::function_requires(Model*) [with Model = boost::concepts::VertexListGraphConcept<dwarf::encap::dieset>]'
    test-6.cpp:8:   instantiated from here
    /home/srk31/opt/include/boost/graph/graph_concepts.hpp:188: note: synthesized method 'std::pair<const long long unsigned int, boost::shared_ptr<dwarf::encap::die> >& std::pair<const long long unsigned int, boost::shared_ptr<dwarf::encap::die> >::operator=(const std::pair<const long long unsigned int, boost::shared_ptr<dwarf::encap::die> >&)' first required here
    make: *** [test-6.o] Error 1
    

    The bottom error refers to line 188 of graph_concepts.hpp, which is doing an innocuous assignment from a vertex iterator: v = *p.first;. The variable v is an instance of my map entry type. Somehow, the compiler can't synthesise an assignment operator for the pointed-to vertex type. This was incredibly difficult to track down because it wasn't due to any use of const in my code directly. I scoured my code for suspicious const, but to no avail. What had escaped me is that the value_type in a std::map is as follows (from Stroustrup, C++PL, section 17.4.1.1).

    typedef pair<const Key, T> value_type;
    

    In other words, maps are designed so that you can't update the key of an entry once it's been created. This is very sane, because doing so might violate some invariant of the containing structure (imagine the popular red-black tree implementation). I hadn't noticed this limitation before because, although it's common to initialise new pair objects, it's rare to update an existing one. Even though I had purposely avoided making the whole pair const in my code, there was already a const lurking in the header.
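
    Here's a minimal reproduction of the same error, using only standard library types (none of my code).

    #include <map>
    #include <string>

    int main()
    {
        // value_type is pair<const int, std::string>: the key member is const
        typedef std::map<int, std::string>::value_type entry;
        entry a(1, "one");
        entry b(2, "two");
        // a = b; // error: the implicitly-declared assignment operator is unusable,
        //        // because entry::first is const -- the same complaint as above
        return 0;
    }
    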

    That the compiler couldn't give me a better error message (i.e. one actually pointing to the code at fault) is certainly disappointing, and will be fixed in a better compiler. Whether all this fiddly constness is a weakness of the C++ language I'm not sure. It's certainly self-consistent and well-motivated. Other languages might not have me worrying about const, but they also might not catch certain errors. Is C++'s trade-off good value? It's also not clear whether a better definition of map might have lifted the const into the individual references it hands out to the value_type... or not. I'd like to be able to rail and propose some better language design, but part of me suspects that sometimes in programming, the devil really is inescapably in the details.

    [/devel] permanent link contact

    Mon, 01 Feb 2010

    Smart pointers (C++ Gotme number 3)

    I've been writing some C++ code using boost's smart pointers recently. Naturally, I tripped myself up. Memory corruption appeared, and I noticed that my shared objects were the problem. They were getting destructed even though I had a shared_ptr pointing to them the whole time. Here's how it happened.

    boost::shared_ptr<Die_encap_subprogram>
    create_subprogram(Die_encap_base& parent, boost::optional<const std::string&> name)
    {
        return boost::shared_ptr<Die_encap_subprogram>(new Die_encap_subprogram(parent, name));
    }
    

    The return statement is evaluated roughly as follows: new constructs the Die_encap_subprogram object on the heap; then a shared_ptr is constructed from the raw pointer, creating a brand-new reference count (of one) for the object; finally, that shared_ptr is copied out to the caller as the return value.

    So far, so good. Here's the calling code.

    	std::cerr << "dies size before: " << ds().size();
    	abstract::factory::get_factory(get_spec()).create_subprogram(
    		(**all_cus.compile_units_begin()),
    		std::string(symname));
    	std::cerr << "dies size after: " << ds().size();
    

    Although my factory method “helpfully” returns a shared pointer, the calling code discards that return value. So, for the object to stay alive, we are relying on there being a copy of the boost::shared_ptr<Die_encap_base> somewhere else. The code I've shown doesn't take a copy, so where could one come from? Well, one is created inside the object's constructor, where it registers itself with a containing data structure. Here's the code which creates that second shared pointer.

    Die_encap_base(Dwarf_Half tag,
            Die_encap_base& parent, 
            boost::optional<const std::string&> name)
         :	die(parent.m_ds, parent.m_offset, tag, parent.m_ds.next_free_offset(), 0, attribute_map(), die_off_list()) // FIXME
        {
            m_ds.insert(std::make_pair(m_offset, boost::shared_ptr<dwarf::encap::die>(this)));
            parent.m_children.push_back(m_offset);
            m_attrs.insert(std::make_pair(DW_AT_name, dwarf::encap::attribute_value(
                std::string(*name))));
    	}
    

    If this second shared pointer is going to be sufficient to keep the object alive, we require some “action at a distance”: the first shared pointer must somehow become aware that we have just created another shared pointer to its object. How might this work? The shared_ptr implementation would have to keep its reference counts in a shared table, accessible to all shared pointers, so that each could look up the relevant refcount from the pointer alone. The table would have to be shared between all shared pointer instances, not just those of a given type parameter, to allow upcasting and downcasting---notice how the target type is different in this snippet from the earlier one, because we're in the base class. For the same reason, our hypothetical table implementation would have to handle pointers-to-subobjects, probably by storing address ranges rather than just object start addresses. Even that is tricky if the shared pointer was created from an imprecisely-typed pointer (i.e. one whose static type is a supertype), because we might be oblivious to some larger enclosing allocation (i.e. the superobject actually allocated). This is doable, albeit not portably, given boost::shared_ptr's caveat that it only applies to pointers that were returned by new---so we could get the full address range from the heap metadata.

    Instead of all this table business, boost::shared_ptr just uses a shared count created when constructing a smart pointer from a non-smart pointer. Each pointer carries the address of a separate object holding the shared refcount. When you copy the shared pointer, it increments the count and propagates the shared count's address into the new copy of the pointer. Naturally, since it relies on continuity of propagation through shared pointers, it doesn't work when that chain is broken---you end up with multiple counts for the same object. My code suffers exactly that problem: it creates a completely separate shared pointer to what is “coincidentally” the same object referenced by some shared_ptr (i.e. communicated via one or more intervening non-shared pointers---in my case the this pointer in the constructor call).
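
    Boiled down to its essence, the failure mode looks like the following (a made-up example, not my factory code).

    #include <boost/shared_ptr.hpp>

    struct Widget {};

    int main()
    {
        Widget *raw = new Widget();
        boost::shared_ptr<Widget> p1(raw);     // first count: starts at 1
        {
            boost::shared_ptr<Widget> p2(raw); // second, unrelated count: also 1
        }   // p2's count hits zero, so the Widget is deleted -- p1 now dangles
        return 0;
    }   // p1's count hits zero: a second delete, i.e. undefined behaviour
    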

    The fix is simple enough: have the factory return a regular pointer or reference to the constructed object. Of course, what if some client code innocently wants to use shared_ptr on that object? That's not necessarily a problem, but it won't magically extend the life of objects that would otherwise be reclaimed according to the object lifetime policies which my own code's use of shared_ptr creates. This ability to create, out of innocuous-looking code, multiple conflicting policies about when an object should be reclaimed, is a limitation of the smart pointer approach, and is stronger than simply “shared_ptr won't collect cycles”.

    Slogan-wise, this is perhaps an argument in favour of “smart objects, not smart pointers”. True per-object reference counts can be implemented in at least two ways: either a Python-like in-band solution, where each object has a refcount built-in, or else the unconventional out-of-band table solution I outlined above. The former isn't an option for C++, because we can't retroactively add a count field to an object, and don't want all instances of all classes to pay the overhead. However, the latter solution is quite appealing to me. Keeping object descriptions out-of-band is a powerful technique for retaining binary compatibility while adding run-time facilities, and is the basis of my proposed implementation of Python, which I might just outline in a future post.
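
    To make the out-of-band idea slightly more concrete, here's a rough sketch of my own---not a real implementation, and ignoring the subobject and address-range complications mentioned above: counts live in one global table keyed by object address, so independently-created handles to the same object share a single count.

    #include <map>

    // one table shared by every handle, whatever its type parameter
    inline std::map<void*, unsigned>& handle_counts()
    { static std::map<void*, unsigned> table; return table; }

    template <typename T>
    struct counted_handle
    {
        explicit counted_handle(T *ptr) : p(ptr) { ++handle_counts()[p]; }
        counted_handle(const counted_handle& other) : p(other.p) { ++handle_counts()[p]; }
        ~counted_handle()
        { if (--handle_counts()[p] == 0) { handle_counts().erase(p); delete p; } }
        T *get() const { return p; }
    private:
        T *p;
        counted_handle& operator=(const counted_handle&); // omitted for brevity
    };
    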

    [/devel] permanent link contact

    Wed, 14 Oct 2009

    C++ Gotme number 2: templates, typename and specialization syntax

    Here are some innocuous-looking template definitions in C++.

    // specialize this template!
    template <unsigned spec_id> class Extended_By 
    {
    public:
        typedef struct {} spec;
    };
            
    // specialization for Dwarf 3
    template <> class Extended_By<0U>
    {
    public:
        typedef empty_def spec;
    };
    
    
    template <unsigned spec_id> 
    class table_def : public Extended_By<spec_id>::spec
    {
    public:
        typedef Extended_By<spec_id>::spec Extended;
    // ...
    };
    

    I am implementing a delegation hierarchy of tables: lookup methods on the tables delegate to their base implementation if the lookup fails. The hierarchy is rooted at a special-cased “empty” definition (not shown). Each non-root table has a similar implementation, but must be a separate class (as it may want to override certain member functions), hence my use of templates. To define a new table, we specialize the latter template with a new integer argument denoting the concrete table being defined. (In case you're wondering, tables are singletons---but they get pointed-to at runtime, hence virtual dispatch.) The Extended_By template is simply a compile-time mapping from integers (which denote tables) to the type of the underlying table (i.e. the table which that table delegates to, if its lookup fails).

    Unfortunately, the code above doesn't compile.

    g++ -Wall -O0 -g3 -I/home/srk31/opt/include -I/usr/include/python2.5 -I/usr/include -I../../../libsrk31c++ -c -o "spec.o" "spec.cpp"
    In file included from spec.cpp:8:
    spec.hpp:242: error: type 'dwarf::spec::Extended_By<spec_id>' is not derived from type 'dwarf::spec::table_def<spec_id>'
    

    The error message makes no sense at all. In fact, the problem is that the compiler can't tell that Extended_By<spec_id>::spec names a type. If you insert the typename keyword into the latter template definition, as follows....

    template <unsigned spec_id> 
    class table_def : public Extended_By<spec_id>::spec
    {
    public:
        typedef typename Extended_By<spec_id>::spec Extended;
    // ...
    };
    

    ...then it all magically starts working. I must confess I'm not entirely sure why---surely the usual name lookup on Extended_By would work just fine? (My best understanding: Extended_By<spec_id>::spec is a dependent name, and until the template is instantiated the compiler can't know whether some specialisation of Extended_By will make spec a type or something else, so the language demands typename to force the “this is a type” reading when parsing the template.)

    On a similar theme, I was specializing some predicates defined by the table template, using code as follows.

    template<unsigned spec_id> 
    bool table_def<0U>::attr_describes_location(int attr) const
    {
    	return this->Extended::attr_describes_location(attr)
    		|| attr == DW_AT_location
    		|| attr == DW_AT_data_member_location
    		|| attr == DW_AT_vtable_elem_location
    		|| attr == DW_AT_string_length
    		|| attr == DW_AT_use_location
    		|| attr == DW_AT_return_addr;
    }
    

    This also fails to compile, with a very confusing message.

    g++ -Wall -O0 -g3 -I/home/srk31/opt/include -I/usr/include/python2.5 -I/usr/include -I../../../libsrk31c++ -c -o "spec.o" "spec.cpp"
    spec.cpp:272: error: prototype for 'bool dwarf::spec::table_def<0u>::attr_describes_location(int) const' does not match any in class 'dwarf::spec::table_def<0u>'
    spec.hpp:278: error: candidate is: bool dwarf::spec::table_def<spec_id>::attr_describes_location(int) const [with unsigned int spec_id = 0u]
    make: *** [spec.o] Error 1
    
    The solution is to eliminate the template parameter.

    template<> 
    bool table_def<0U>::attr_describes_location(int attr) const
    {
    	return this->Extended::attr_describes_location(attr)
    		|| attr == DW_AT_location
    		|| attr == DW_AT_data_member_location
    		|| attr == DW_AT_vtable_elem_location
    		|| attr == DW_AT_string_length
    		|| attr == DW_AT_use_location
    		|| attr == DW_AT_return_addr;
    }
    

    It's not really clear at first sight why the latter is valid syntax for specialization while the former isn't. (As best I can tell, the former declares a fresh template with its own parameter spec_id, rather than defining a member of the existing specialization for 0U; since that specialization leaves no template parameters to supply, the parameter list after template must be empty.) However, a useful rule of thumb would seem to be that you should only list template arguments up-front (after “template”) if your definition depends on them. It doesn't matter if the originating class template uses more arguments. By contrast, the argument list attached to the classname is what nails down the particular template instance you're specializing for---this argument list should correspond directly to that of the template class definition.

    [/devel] permanent link contact

    Tue, 29 Sep 2009

    Shell Gotme number 1 -- the Heisenbergian process trees

    On my Lab machine, the X login system doesn't clean up stray child processes that your X session may have left behind. (I've a feeling the Debian xinit scripts do this, but I'm not sure.) This was bothering me, because I start some background jobs in my .xsession which I want to die naturally when the session ends. So I put the following towards the end of my .xsession.

    echo -n "End of .xsession: detected living child pids: " 1>&2
    ps --no-header f -o pid --ppid $$ | \
    	while read pid; do kill ; done 2>/dev/null

    Unfortunately, I found that those pesky children were still alive. (Can you tell what's wrong? Yes, that's right.) Both the ps process and the while subshell are among the children which are being killed. So one way or another, the pipeline gets broken before the loop has managed to kill the children I actually wanted to kill. A version which doesn't suffer this problem is as follows.

    child_pids=$( ps --no-header f -o pid --ppid $$ )
    for pid in $child_pids; do kill $pid 2>/dev/null; done

    It's another instance of the familiar Heisenbergian nature of inspecting process trees from the shell: doing many basic things from the shell entails creating processes, so inspecting the shell's own process tree is likely to yield perturbed results. Usually it's just an unwanted entry in ps (as with the old ps | grep foo gotcha) but the above shows it sometimes causes subtler problems.

    [/devel] permanent link contact

    Standalone ksmserver, part 2: getting it right

    I blogged previously about my attempts to use KDE's session manager in a standalone fashion (with fvwm, in my case). What I presented there isn't entirely satisfactory, mainly because not all window sizes and positions get restored when restoring the session. I've finally worked out why. Unfortunately, this has also revealed that some content in the previous post is wrong. Allow me to correct myself here.

    So, all you need in your fvwm config is something like the following.

    AddToFunc SessionInitFunction
     + I Exec dcop klauncher klauncher autoStart 3; \
    dcop ksmserver ksmserver autoStart0Done; \
    dcop ksmserver ksmserver kcmPhase1Done; \
    dcop ksmserver ksmserver autoStart1Done; \
    dcop ksmserver ksmserver kcmPhase2Done; \
    sleep 4; dcop ksmserver ksmserver autoStart2Done
    
    # assuming you quit using a Quit-Verify menu...
    AddToMenu Quit-Verify   "Quit"  SaveQuitSession
    

    [/devel] permanent link contact

    Fri, 18 Sep 2009

    Python, threading and the madness of language implementors

    I just stumbled across this video of a talk by David Beazley about the implementation of Python and, in particular, threading. In a nutshell, threading in Python is implemented as a complete minimal-effort hack on top of an interpreter that is essentially single-threaded by design. Static state is again the culprit---there's a big lock, called the Global Interpreter Lock, much like Linux used to have the Big Kernel Lock. But, rather than just protecting some carefully-selected critical regions, it protects all Python computation!

    So, alarmingly, the performance of Python threading is absolutely disastrous. It's ironic that Google are heavy users of Python, given that they work with some of the largest datasets on the planet and generally have to know a thing or two about optimisation and concurrency. Of course they don't use Python for their core systems, and are sponsoring an improvement effort called Unladen Swallow.

    I have some research ideas that are predicated heavily on the position that language implementors often don't do a great job, particularly in the case of dynamic languages. So if we have to rewrite a bunch of dynamic language implementations, that's not really a terrible thing. I'm glad to have yet more evidence of this.

    [/devel] permanent link contact

    Tue, 15 Sep 2009

    clone() and the unusual process dynamics

    I had a weird problem with my X login scripts recently on my Lab machine---I noticed that for X sessions, the LD_LIBRARY_PATH environment variable wasn't being set, even though every other variable set from my ˜/.profile was still appearing correctly. After a bit of digging, I discovered why, but that only opened up an extra can of mystery.

    The basic problem was that ssh-agent is a setuid program, so at the start of loading, the Linux loader removes all dynamic linker options (including LD_PRELOAD and LD_LIBRARY_PATH) from the environment to avoid running user code with elevated privileges. (It would make more sense just to ignore them, but still to propagate them to children... but anyway.) I was still a bit puzzled though, because I wasn't knowingly running my session as a child of ssh-agent---my login scripts do start one up if it's not already running, but it's supposed to run as a sibling of my main X session script, rather than using the ssh-agent <command> mechanism to start a child session. And ps agreed with me---my session wasn't descended from an ssh-agent process... but I did have two running, where I thought my scripts went to pains to ensure there was only one.

      PID TTY      STAT   TIME COMMAND
    19233 ?        Ss     0:00 /bin/bash -l /home/srk31/.xsession
    19573 ?        S      0:00  \_ ssh-agent
    19589 ?        S      0:00  \_ fvwm
    19593 ?        Ss     0:00      \_ <... more X clients>
    19433 ?        Ssl    0:00 /bin/dbus-daemon --fork --print-pid 4 --print-address
    19432 ?        S      0:00 /usr/bin/dbus-launch --exit-with-session /home/srk31/

    The explanations for this were somewhat convoluted. Firstly, the Fedora xinit scripts (FC7) do this (in /etc/X11/xinit/xinitrc-common, sourced from /etc/X11/xinit/Xsession):

    # Prefix launch of session with ssh-agent if available and not already running.
    SSH_AGENT=
    if [ -x /usr/bin/ssh-agent -a -z "$SSH_AGENT_PID" ]; then
        if [ "x$TMPDIR" != "x" ]; then
            SSH_AGENT="/usr/bin/ssh-agent /bin/env TMPDIR=$TMPDIR"
        else
            SSH_AGENT="/usr/bin/ssh-agent"
        fi
    fi

    ...and later (in /etc/X11/xinit/Xsession)...

    # otherwise, take default action
    if [ -x "$HOME/.xsession" ]; then
        exec -l $SHELL -c "$SSH_AGENT $DBUS_LAUNCH $HOME/.xsession"
    elif [ -x "$HOME/.Xclients" ]; then
        exec -l $SHELL -c "$SSH_AGENT $DBUS_LAUNCH $HOME/.Xclients"
    elif [ -x /etc/X11/xinit/Xclients ]; then
        exec -l $SHELL -c "$SSH_AGENT $DBUS_LAUNCH /etc/X11/xinit/Xclients"
    else
        # should never get here; failsafe fallback
        exec -l $SHELL -c "xsm"
    fi

    In other words, they test whether an ssh-agent is running, and arrange to start one if not. But in between testing and starting one, they run a shell---which naturally starts my login scripts. These check for ssh-agent themselves and, finding none, start one. Then later, the Fedora scripts start another one. It's a classic “unrepeatable read” race condition, but without any concurrency---just interleaving of foreign code (my login scripts).

    Next, why wasn't my session showing up as a child of one of the ssh-agent processes? ps's output was doubly confusing because the top of my process tree was a bash -l .xsession process, when that's the last to be launched by the sequence initiated in the Fedora scripts! Well, strace revealed that my processes were using the clone() system call to spawn new processes (which is subtly different from fork(), in that it allows shared address spaces and hence multithreaded programming). As we know, when a process clones itself in order to start a new process, one of the resulting pair replaces itself with the new process image, while the other continues on its merry way. In the case of both ssh-agent and dbus-launch, the parent process was the one which replaced itself, leaving the child to continue the work of SSH agentery or DBUS launchery or whatever. This is really confusing because it contradicts the usual expectations about causal ordering from parent to child processes---but it's perfectly allowed, and has always been possible in Unix.
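
    The pattern is easy to reproduce. Here's a tiny stand-in of my own (nothing to do with the real ssh-agent or dbus-launch source): the parent execs the user's command while the child stays behind as the “daemon”, so the command ends up above the daemon in the process tree.

    #include <sys/types.h>
    #include <unistd.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        if (argc < 2) { fprintf(stderr, "usage: %s cmd [args...]\n", argv[0]); return 1; }
        pid_t child = fork();          /* the real programs use clone(), as strace showed */
        if (child == 0)
        {
            /* child: lives on as the long-running agent/daemon */
            for (;;) pause();          /* stand-in for the daemon's real work */
        }
        /* parent: replaces itself with the user's command, so the command
           becomes the parent of the daemon, not its child */
        execvp(argv[1], &argv[1]);
        perror("execvp");
        return 1;
    }
    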

    What was the fix? Sadly, there isn't a good one---I don't have permission to edit the Fedora scripts on my Lab machine, and there's no configurable flexibility for disabling the ssh-agent launching or fixing the racey logic. So I added a hack to my .xsession shell script which detects the case where SSH_AGENT_PID is already set to a child of the shell process (since the ssh-agent my scripts created is a sibling) and if so, kills that process and re-sets SSH_AGENT_PID to the one I created earlier (which, handily, I keep stored in ${HOME}/.ssh/agent-$(uname -n)). As usual, completely horrible.

    [/devel] permanent link contact

    Sat, 22 Aug 2009

    C++ Gotme number 1: operator visibility

    I'm doing some C++ coding at the moment. It's a language well-known for “gotchas”, so although it'd be presumptuous for me to assume they'd get you as well, dear reader, I'm going to document some of the things that've caught me out.

    namespace dwarf {
    	namespace encap {
    		typedef Dwarf_Loc expr_instr;
    		bool operator==(const expr_instr& e1, const expr_instr& e2);
    		bool operator!=(const expr_instr& e1, const expr_instr& e2);
    		typedef struct expr
    		{
    			Dwarf_Half hipc;
    			Dwarf_Half lopc;
    			std::vector<Dwarf_Loc> m_expr;
    			expr(const Dwarf_Locdesc& desc) : hipc(desc.ld_hipc), lopc(desc.ld_lopc),
    				m_expr(desc.ld_s, desc.ld_s + desc.ld_cents) {}
    			bool operator==(const expr& e) const 
    			{ 
    				//expr_instr e1; expr_instr e2; // test
    				return hipc == e.hipc &&
    					lopc == e.lopc &&
    					//e1 == e2; // test
    					m_expr == e.m_expr; // error!
    			}
    			bool operator!=(const expr& e) const { return !(*this == e); }
    		} loc_expr;
    	}
    }
    

    Here we have some code from my C++ binding of libdwarf. I'm trying to use std::vector's builtin == operator to define equality on my struct expr, which is essentially a wrapper for vectors of Dwarf_Loc objects, where Dwarf_Loc is a structure defined by libdwarf. Here's what g++ makes of the above code:

    /local/scratch/srk31/opt/bin/../lib/gcc/i686-pc-linux-gnu/4.3.3/../../../../include/c++/4.3.3/bits/stl_algobase.h: In static member function 'static bool std::__equal<_BoolType>::equal(_II1, _II1, _II2) [with _II1 = const Dwarf_Loc*, _II2 = const Dwarf_Loc*, bool _BoolType = false]':
    /local/scratch/srk31/opt/bin/../lib/gcc/i686-pc-linux-gnu/4.3.3/../../../../include/c++/4.3.3/bits/stl_algobase.h:824:   instantiated from 'bool std::__equal_aux(_II1, _II1, _II2) [with _II1 = const Dwarf_Loc*, _II2 = const Dwarf_Loc*]'
    /local/scratch/srk31/opt/bin/../lib/gcc/i686-pc-linux-gnu/4.3.3/../../../../include/c++/4.3.3/bits/stl_algobase.h:956:   instantiated from 'bool std::equal(_II1, _II1, _II2) [with _II1 = __gnu_cxx::__normal_iterator<const Dwarf_Loc*, std::vector<Dwarf_Loc, std::allocator<Dwarf_Loc> > >, _II2 = __gnu_cxx::__normal_iterator<const Dwarf_Loc*, std::vector<Dwarf_Loc, std::allocator<Dwarf_Loc> > >]'
    /local/scratch/srk31/opt/bin/../lib/gcc/i686-pc-linux-gnu/4.3.3/../../../../include/c++/4.3.3/bits/stl_vector.h:1111:   instantiated from 'bool std::operator==(const std::vector<_Tp, _Alloc>&, const std::vector<_Tp, _Alloc>&) [with _Tp = Dwarf_Loc, _Alloc = std::allocator<Dwarf_Loc>]'
    dwarfpp.hpp:341:   instantiated from here
    /local/scratch/srk31/opt/bin/../lib/gcc/i686-pc-linux-gnu/4.3.3/../../../../include/c++/4.3.3/bits/stl_algobase.h:795: error: no match for 'operator==' in '* __first1 == * __first2'
    make: *** [dwarfpp.o] Error 1
    

    Clearly, my definition for operator== isn't being found. But if I uncomment the blamed line and replace it with some dummy code (commented above) which invokes the operator for two dummy objects, rather than vectors, it works fine. Why can I compare two Dwarf_Locs but not two vectors thereof, when vector defines a perfectly good operator== that should just invoke mine?

    The answer is that vector's operator== can't see my operator== definition, because of namespaces. According to Stroustrup (C++PL, third edition, section 11.2.4):

    Consider a binary operator @. If x is of type X and y is of type Y, x@y is resolved like this: look for operator@ as a member of X (or of a base of X); look for declarations of operator@ in the context surrounding x@y; and look for declarations of operator@ in the namespaces in which X and Y were defined. (I'm paraphrasing his list.)

    So the code in std::vector clearly can't see my definitions in dwarf::encap::. However, the reason that this isn't such a common problem is that I'm defining operators on a type, namely Dwarf_Loc, that was defined not by me but in the pre-existing C library that I'm wrapping. I lazily dumped all of libdwarf's definitions into the global namespace, so the quick fix is to define my operator in the global namespace too.

    namespace dwarf {
    	namespace encap { typedef Dwarf_Loc expr_instr; }
    }
    bool operator==(const dwarf::encap::expr_instr& e1, const dwarf::encap::expr_instr& e2);
    bool operator!=(const dwarf::encap::expr_instr& e1, const dwarf::encap::expr_instr& e2);
    namespace dwarf {
    	namespace encap {
    		//typedef Dwarf_Loc expr_instr;
    		typedef struct expr
    		{
    			Dwarf_Half hipc;
    			Dwarf_Half lopc;
    			std::vector<Dwarf_Loc> m_expr;
    			expr(const Dwarf_Locdesc& desc) : hipc(desc.ld_hipc), lopc(desc.ld_lopc),
    				m_expr(desc.ld_s, desc.ld_s + desc.ld_cents) {}
    			bool operator==(const expr& e) const 
    			{ 
    				//expr_instr e1; expr_instr e2;
    				return hipc == e.hipc &&
    					lopc == e.lopc &&
    					//e1 == e2;
    					m_expr == e.m_expr; // okay
    			}
    			bool operator!=(const expr& e) const { return !(*this == e); }
    		} loc_expr;
    	}
    }
    

    Note that if I'd done the right thing and wrapped libdwarf's definitions into some non-global namespace, say dwarf::lib, I'd still need to do something similar, because my operator definition won't be found if I put it in a different namespace like dwarf::encap, even though that's the namespace containing the code which actually needs the operator to be defined.
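
    A stripped-down reproduction, with made-up types rather than the dwarfpp ones, shows the same behaviour.

    #include <vector>

    struct Thing { int x; };        // defined in the global namespace, like Dwarf_Loc

    // The element's operator== must be visible from the element's own namespace
    // (here, the global one). Had I put this inside some other namespace, say
    // namespace wrapper { ... }, std::vector's operator== would not find it and
    // the comparison below would fail to compile.
    bool operator==(const Thing& a, const Thing& b) { return a.x == b.x; }

    int main()
    {
        std::vector<Thing> v1, v2;
        return v1 == v2 ? 0 : 1;    // OK: instantiates vector's operator==, which finds ours
    }
    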

    Well, it certainly got me... now, on with coding.

    [/devel] permanent link contact

    Mon, 17 Aug 2009

    Ruby fails

    I'm sure Ruby is a nice language, but I get slightly annoyed by two things whenever I try to install a Ruby program. One is that it has its own build system, as a replacement for Make et al---there are things called Rakefiles in the source distributions. The other is that it also has its own configuration and deployment system, based on the setup.rb script. These things annoy me because they're yet more tools that I need to learn, and for no good reason. Python is guilty of the same sins too. A new language shouldn't entail a whole new build system. (I won't go into why, but hopefully you don't need me to. I would also complain that there are too many languages, but I'll save that too.)

    What really annoys me about Ruby is that setup.rb is broken, because it doesn't deal with prefixes properly. If I do ruby setup.rb config --prefix=${HOME}/opt, it still tries to install most of the program under /usr. So I tried giving the --prefix option to ruby setup.rb install too, but that doesn't do the right thing either. Instead it creates me a ${HOME}/opt/usr hierarchy and puts the stuff it was putting in /usr there.

    I might as well come clean and admit that the only Ruby program I routinely install is iplayer-dl. Anyway, my next attempt: configure everything using paths relative to ./, then supply the overall prefix at install time. That doesn't work either---setup.rb interprets the ./ passed with --prefix, so you get install destinations relative to your configure directory. But only for files affected by --prefix, which isn't all of them.

    Next attempt: configure everything relative to the root directory, then supply the prefix at install time. This does work, but you wouldn't guess from the output of ruby setup.rb install.

    $ rubyver=$( ruby --version | sed 's/ruby \([0-9]*\.[0-9]*\)\..*/\1/' )  # horrible
    $ rubyarch=$( uname -i )-$( uname -s | tr '[:upper:]' '[:lower:]' )           # horrible
    $ ruby ./setup.rb config --prefix='/' --sysconfdir=/etc \
    --libruby=/lib/ruby --librubyver=/lib/ruby/${rubyver} --librubyverarch=/lib/ruby/${rubyver}/${rubyarch} \
    --siteruby=/lib/ruby/site_ruby --siterubyver=/lib/ruby/site_ruby/${rubyver} --siterubyverarch=/lib/ruby/site_ruby/${rubyver}/${rubyarch}
    $ ruby ./setup.rb install --prefix=${HOME}/opt
    rm -f InstalledFiles
    ---> bin
    mkdir -p /home/srk31/opt//bin
    install iplayer-dl //bin//iplayer-dl
    install iplayer-dl-gui //bin//iplayer-dl-gui
    <--- bin
    ---> lib
    mkdir -p /home/srk31/opt/lib/ruby/site_ruby/1.8
    install iplayer.rb /lib/ruby/site_ruby/1.8/
    ---> lib/iplayer
    mkdir -p /home/srk31/opt/lib/ruby/site_ruby/1.8/iplayer
    install downloader.rb /lib/ruby/site_ruby/1.8/iplayer
    install subtitles.rb /lib/ruby/site_ruby/1.8/iplayer
    install metadata.rb /lib/ruby/site_ruby/1.8/iplayer
    install preferences.rb /lib/ruby/site_ruby/1.8/iplayer
    install browser.rb /lib/ruby/site_ruby/1.8/iplayer
    install version.rb /lib/ruby/site_ruby/1.8/iplayer
    install errors.rb /lib/ruby/site_ruby/1.8/iplayer
    ---> lib/iplayer/gui
    mkdir -p /home/srk31/opt/lib/ruby/site_ruby/1.8/iplayer/gui
    install main_frame.rb /lib/ruby/site_ruby/1.8/iplayer/gui
    install app.rb /lib/ruby/site_ruby/1.8/iplayer/gui
    <--- lib/iplayer/gui
    <--- lib/iplayer
    <--- lib
    
    So, handily it's told me that it installed a bunch of stuff in /lib/ruby, which is exactly what I didn't want it to do. But, since I ran it without elevated permissions, I know that it can't have succeeded in doing that---yet there were suspiciously no error messages. Lo! Despite what it printed out, it has actually put the lib files in ${HOME}/opt/lib/ruby just as I wanted. Now, why was that so difficult?

    To make matters worse, you of course have to set your Ruby path to get the deployed program to work, and that is also horrifying:

    $ export RUBYLIB=${HOME}/opt/lib/ruby:${HOME}/opt/lib/ruby/site_ruby:${HOME}/opt/lib/ruby/site_ruby/1.8:
    
    ---disgusting, especially embedding the 1.8 version number in the path, which will be seen (and interpreted) by any version of Ruby at all. It's following the pattern established by Python, of course, and since Python has some reasonably sane people behind it I'm tempted to suspect that this ludicrous scheme has been selected for a reason---but even if this suspicion is correct, it'll have to be a very good one, and I somehow doubt that.

    [/devel] permanent link contact

    Fri, 13 Mar 2009

    ANTLR recipes

    I've been fiddling with ANTLR recently, trying to create a parser for my Cake language. Unfortunately I found it trickier than I was hoping, and this forced me to take a step back and create some simpler grammars so I could get the hang of how to realise some common grammatical features in ANTLR. Specifically, I looked at four features: recursive prefixing (i.e. a recursive unary prefix operator, so right-recursive), recursive suffixing (the same but suffixing, left-recursive), right-associative binary operators and left-associative binary operators.

    The appeal of ANTLR has been its ability to build ASTs in a special-purpose syntax (agnostic towards the host language) rather than relying on semantic actions. Unfortunately it took me a while to get the hang of these tree construction features. The basics are here, but here are some helpful extra snippets and points of emphasis that took me a while to discover.

    I think that's all for now. I still have to get a good handle on operator precedence, which may or may not spawn another post.

    [/devel] permanent link contact

    Fri, 27 Feb 2009

    Standalone ksmserver, part 1

    I've been looking for an X11 session manager that actually works recently (since sadly xsm doesn't fit that bill) and was experimenting with KDE's session manager. It's peculiarly underdocumented but seemed like it should have all the functionality I needed, and should also be reasonably well-used and well-tested. So I was a bit disappointed when my basic set-up attempt at integrating it with fvwm appeared not to work. I was simply replacing the fvwm command in my .xsession with the following:

    ksmserver -r -w fvwm

    which should launch fvwm rather than kwin. Then in my fvwm configuration, to end the session properly when I quit the window manager, I tried the following.

    AddToFunc SessionInitFunction
     + I Exec echo hello from SessionInitFunction
    
    AddToFunc SessionExitFunction # kill ksmserver when we exit
     + I Exec dcop ksmserver ksmserver logout 0 2 2  
    
    ## from http://andrejserafim.wordpress.com/2008/05/16/kde-shutdown-logout-restart/
    #First parameter: confirm
    #Obey the user's confirmation setting: -1
    #Don't confirm, shutdown without asking: 0
    #Always confirm, ask even if the user turned it off: 1
    #Second parameter: type
    #Select previous action or the default if it's the first time: -1
    #Only log out: 0
    #Log out and reboot the machine: 1
    #Log out and halt the machine: 2
    #Third parameter: mode
    

    (Thanks to Andrej Kazakov for the summary of logout's invocation that I pasted above, in turn extracted from the KDE source code.)

    Of course, it didn't work, so I put my developer hat on and had a look. Attaching gdb revealed that the session manager was entering a (non-tight) endless loop when I requested the logout---setting a timer which fired after one second, disabled itself and then recursively re-started the process. The problem was that the session manager has an internal state machine, and in my case it was stuck in state AutoStart0, whereas it was expected to end up in Idle after a while---the only state from which it can initiate a shutdown.

    To get out of AutoStart0, and subsequent start-up states, you have to manually call a bunch of other DCOP methods with names like autoStart0Done, kcmPhase1Done and so on. This is among what startkde does for KDE users, after performing various KDE-specific configuration updates. (These updates are exactly what a non-KDE user doesn't want to happen, at least in my case---since one major reason I don't use KDE is that I like to stick with the simple X11-provided ways of controlling backgrounds, cursors, input devices and so on.) We can manually invoke the DCOP methods to signal that it's time for the next state.

    We successfully avoid most of the KDE interference in this way, because the relevant other processes (like kcminit) haven't been launched. However, we do get some, because the klauncher process does exist -- it's started when running any KDE app if not already running. In particular, ksmserver's signals to klauncher have the unwanted consequence of starting up a bunch of “autostart” applications, like the KDE desktop background and panel, that have shortcuts in ~/.kde/Autostart and /share/autostart/. To avoid this, we tell the launcher to bump our start-up “phase”, which is just an integer, to a high value---since autostart apps are marked with a “phase” attribute, and are only started when moving through that phase. ksmserver only covers as far as phase 2, so we can set the phase to 3 and they won't be started up. So, here's my final fvwm config for use with ksmserver.

    [Update, 2009-03-02: not so fast! What I had here earlier didn't quite work, because fvwm doesn't necessarily execute each part of a compound function in sequence. So instead, it needs to all be rolled into one command. What's more, I had to hack in an arbitrary sleep because, depending on how long it takes to start up all the saved clients, the kcmPhase2Done call may come too soon (state machine not yet progressed into FinishingStartup, I think) and be ignored (causing ksmserver to get stuck in the latter state). Now, what follows does seem to work.]

    AddToFunc SessionInitFunction
     + I Exec dcop klauncher klauncher autoStart 3; \
    dcop ksmserver ksmserver autoStart0Done; \
    dcop ksmserver ksmserver kcmPhase1Done; \
    dcop ksmserver ksmserver autoStart1Done; \
    dcop ksmserver ksmserver kcmPhase2Done; \
    sleep 4; dcop ksmserver ksmserver autoStart2Done
    
    # signal ksmserver when we exit
    AddToFunc SessionExitFunction 
     + I Exec dcop ksmserver ksmserver saveCurrentSessionAs "saved at previous logout"; \
     dcop ksmserver ksmserver logout 0 0 3 
    

    I still haven't actually made the session management functionality work, which I think is going to take some more commands along “save” and “restore” lines. [Update, 2009-03-02: turns out to need just the one “save” command at exit, as shown above -- restoring happens by default, using the magic session name shown.] My ultimate goal is the ability to start multiple ksmserver processes, so that I can independently save and restore separate groups of clients within a single X login session. Fingers crossed... if it's nontrivial I'll write a follow-up post. There's also the joy of KDE 4 to think about later, which has exchanged DCOP for D-BUS and may well alter the undocumented behaviour I'm relying on.

    The Important Science to take away from all this is, I suppose, that interface protocol specifications and timing specifications are useful! Not just for checking, which seems a long way off here, but just for understanding. It'd also be useful to have more convenient support for interposing on DCOP communication, by having client-specific message routing (or name-bindings) so that the “phase” hack could be handled a bit more elegantly by cutting off the connection to klauncher.

    [/devel] permanent link contact

    Thu, 19 Feb 2009

    Initialization order

    I just spent an age tracking down a bug in a very simple tool I'm writing using the GNU BFD. It turns out that when opening a file for reading with BFD, various API calls are not valid until bfd_check_format_matches has been called and has returned successfully. This doesn't appear to be documented in the current texinfo manual. In fact, even the need to call bfd_init is not very clearly documented---but at least that one is fairly predictable. Most libraries, particularly those which maintain a lot of state behind the scenes, have an init function. Additional initialization steps, like the one I ran afoul of, are not at all guessable and should really be clearly signposted in the documentation. It was only by careful examination of what objcopy does that I could track down the bug.
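
    For the record, the call ordering that works for me looks roughly like the following sketch (not my actual tool, and hedged accordingly: these are the calls I believe are required, but check the headers for your BFD version).

    #include <bfd.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        if (argc < 2) { fprintf(stderr, "usage: %s <object-file>\n", argv[0]); return 1; }

        bfd_init();                               /* the predictable initialisation call */

        bfd *abfd = bfd_openr(argv[1], NULL);     /* NULL: let BFD guess the target */
        if (!abfd) { bfd_perror("bfd_openr"); return 1; }

        char **matching;
        if (!bfd_check_format_matches(abfd, bfd_object, &matching))
        {
            /* ...without this call succeeding, many later API calls aren't valid */
            bfd_perror("bfd_check_format_matches");
            bfd_close(abfd);
            return 1;
        }

        printf("%s looks like an object file\n", argv[1]);
        bfd_close(abfd);
        return 0;
    }
    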

    It's well-known in the research world that interfaces come with protocol constraints, and it'd be nice to check client code against these just as we already do with function signatures. Initialization constraints are by far the most common example of protocol constraint. But rather than just checking satisfaction of these constraints, why should we even have to write the code at all? Why can't our toolchain insert initialization calls in the right places automatically? In fact at least one research linkage tool (Knit) can already do this.

    How far can we generalise this constraint-solving? Usually protocol specifications use a finite-state model of component state, which are clearly good for a wide range of protocols but not all (consider a stack). Of course, we want our checking/solving to be decidable, and more complex models quickly lose this property, although I'm not sure whether a pushdown automaton model would necessarily be undecidable. More practically, Knit essentially supports a single bit of state (initialized or not) for each component (or actually a tri-state, because it supports finalizers too). If we wanted to generalise this even to a general finite-state model, we'd get ambiguity: say we had to call function A one or more times before calling function B, how many calls would we insert? Maybe “the smallest string” would be a sensible policy, but that would be ambiguous in some cases, and it's not clear it's always a good choice anyway. In protocol adaptation, there is invariably some programmer-provided “specification” to answer these questions, usually by constraining the generated automaton.

    The BFD example is unusual in another way, in that it requires a successful call to the init function, not just any call. So our protocol checker/solver has to understand return values (and of course arguments) as well as function names. It's about time I re-examined the research literature on protocol adaptation and worked out how much of it has actually been turned into practical tools that address these issues. In due course I'm hoping to build a practical subset (or superset) of protocol adaptation functionality into Cake. One thing at a time though... first I must master that pesky BFD.

    [/devel] permanent link contact

    Fri, 23 Jan 2009

    Library path strangenesses

    It's about time I began sharing the hard-learnt arcane development knowledge I've managed to pick up. One of the most annoying “features” I've had to reverse-engineer is in the behaviour of Linux's dynamic linker, ld-linux. I have a lot of directories in my LD_LIBRARY_PATH environment variable.

    srk31@font:~/scratch/kde$ echo $LD_LIBRARY_PATH
    /home/srk31/scratch/opt/lib:/home/srk31/scratch/kde/lib:/usr/lib/mysql:/usr/lib/opensync/:/usr/lib/opensync/formats/:/usr/lib/opensync/plugins/:/usr/lib/qt-3.3/lib:/usr/lib/wine/:/home/srk31/opt/lib:/lib:/usr/lib
    

    They all get searched as you'd expect. Each directory is expanded to include searches for various architecture-specific variant subdirectories like tls, sse2, nosegneg etc.

    srk31@font:~/scratch/kde$ LD_DEBUG=libs ldd bin/konsole 2>&1 | head
          5666:     find library=libtinfo.so.5 [0]; searching
          5666:      search path=/home/srk31/scratch/opt/lib/tls/i686/sse2/nosegneg:/home/srk31/scratch/
    opt/lib/tls/i686/sse2:/home/srk31/scratch/opt/lib/tls/i686/nosegneg:/home/srk31/scratch/opt/lib/tls/
    i686:/home/srk31/scratch/opt/lib/tls/sse2/nosegneg:/home/srk31/scratch/opt/lib/tls/sse2:/home/srk31/
    scratch/opt/lib/tls/nosegneg:/home/srk31/scratch/opt/lib/tls:/home/srk31/scratch/opt/lib/i686/sse2/n
    

    (snipped some -- view source for more)

    osegneg:/home/srk31/opt/lib/i686/sse2:/home/srk31/opt/lib/i686/nosegneg:/home/srk31/opt/lib/i686:/ho
    me/srk31/opt/lib/sse2/nosegneg:/home/srk31/opt/lib/sse2:/home/srk31/opt/lib/nosegneg:/home/srk31/opt
    /lib            (LD_LIBRARY_PATH)
          5666:       trying file=/home/srk31/scratch/opt/lib/tls/i686/sse2/nosegneg/libtinfo.so.5
          5666:       trying file=/home/srk31/scratch/opt/lib/tls/i686/sse2/libtinfo.so.5
          5666:       trying file=/home/srk31/scratch/opt/lib/tls/i686/nosegneg/libtinfo.so.5
    

    Notice that in my LD_LIBRARY_PATH, /usr/lib and /lib are right at the end. Once I tried tweaking the ordering so that some paths came after these -- they were “backup” libraries I'd compiled myself in case the machine I was using didn't have them, but I wanted to pick up any local ones in preference. Unfortunately, this doesn't work. ld-linux will ignore all directories after the first directory it encounters that is part of the “system library path”.

    srk31@font:~/scratch/kde$ LD_LIBRARY_PATH=/usr/lib:${LD_LIBRARY_PATH} LD_DEBUG=libs ldd bin/konsole 2>&1 | head
          5687:     find library=libtinfo.so.5 [0]; searching
          5687:      search path=/usr/lib/tls/i686/sse2/nosegneg:/usr/lib/tls/i686/sse2:/usr/lib/tls/i68
    6/nosegneg:/usr/lib/tls/i686:/usr/lib/tls/sse2/nosegneg:/usr/lib/tls/sse2:/usr/lib/tls/nosegneg:/usr
    /lib/tls:/usr/lib/i686/sse2/nosegneg:/usr/lib/i686/sse2:/usr/lib/i686/nosegneg:/usr/lib/i686:/usr/li
    b/sse2/nosegneg:/usr/lib/sse2:/usr/lib/nosegneg:/usr/lib                (system search path)
          5687:       trying file=/usr/lib/tls/i686/sse2/nosegneg/libtinfo.so.5
          5687:       trying file=/usr/lib/tls/i686/sse2/libtinfo.so.5
          5687:       trying file=/usr/lib/tls/i686/nosegneg/libtinfo.so.5
          5687:       trying file=/usr/lib/tls/i686/libtinfo.so.5
          5687:       trying file=/usr/lib/tls/sse2/nosegneg/libtinfo.so.5
          5687:       trying file=/usr/lib/tls/sse2/libtinfo.so.5
          5687:       trying file=/usr/lib/tls/nosegneg/libtinfo.so.5
          5687:       trying file=/usr/lib/tls/libtinfo.so.5
    

    I have no idea why it does this, but would guess it's intended behaviour for some reason. It's annoying though.

    Oh, and another gotcha relating to LD_LIBRARY_PATH is that if you're using gdb, it seems you must set solib-search-path to match your LD_LIBRARY_PATH, because gdb seems to ignore the latter (again, probably for some reason or other). And if you ever use a frontend to gdb, like DDD, that probably has its own setting too. There is so much fun to be had with these things.

    [/devel] permanent link contact


    Powered by blosxom
