Rambles around computer science

Diverting trains of thought, wasting precious time

Tue, 13 Dec 2011

Load addresses

For reasons I will only hint at, I want to predict the load address of a set of shared libraries, given an executable that links against them. Naturally, I have turned off address space layout randomization.

At first, I thought I could use ldd for this. It seems to work.

$ ldd /usr/local/src/git-1.7.5.4/git
        linux-vdso.so.1 =>  (0x00007ffff7fdd000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007ffff7dc3000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007ffff7ba5000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ffff7806000)
        /lib64/ld-linux-x86-64.so.2 (0x00007ffff7fde000)

But also, there is an environment variable called LD_TRACE_LOADED_OBJECTS that is supposed to have the same effect. As it happens, ldd is just a small wrapper script which sets this variable and invokes the dynamic linker, which on my system is /lib64/ld-linux-x86-64.so.2. Let's try doing this directly.

$ LD_TRACE_LOADED_OBJECTS=1 /usr/local/src/git-1.7.5.4/git
        linux-vdso.so.1 =>  (0x00007ffff7ffb000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007ffff7bc4000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007ffff79a7000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ffff7608000)
        /lib64/ld-linux-x86-64.so.2 (0x00007ffff7ddc000)

That seems to work too. But wait! It's given us different load addresses than ldd did. Have I really turned off randomization? Well, yes. In fact, repeating either of these commands will reliably yield the output above, and they are reliably different from one another. What is going on?

Let's hack ldd so that it prints exactly what command it is going to execute.

$ ldd /usr/local/src/git-1.7.5.4/git
About to eval:  LD_TRACE_LOADED_OBJECTS=1 LD_WARN= LD_BIND_NOW= LD_LIBRARY_VERSION= LD_VERBOSE= /lib64/ld-linux-x86-64.so.2 /usr/local/src/git-1.7.5.4/git
verify_out is 
        linux-vdso.so.1 =>  (0x00007ffff7fdd000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007ffff7dc3000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007ffff7ba5000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ffff7806000)
        /lib64/ld-linux-x86-64.so.2 (0x00007ffff7fde000)

So, it has set a bunch of other environment variables to empty strings. They look innocuous enough. But also, it is invoking the loader directly, whereas we were just letting execve call the loader for us. Can we reproduce the result of ldd by running the same command it does?

$ LD_TRACE_LOADED_OBJECTS=1 LD_WARN= LD_BIND_NOW= LD_LIBRARY_VERSION= LD_VERBOSE= /lib64/ld-linux-x86-64.so.2 /usr/local/src/git-1.7.5.4/git
        linux-vdso.so.1 =>  (0x00007ffff7fdd000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007ffff7dc3000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007ffff7ba5000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ffff7806000)
        /lib64/ld-linux-x86-64.so.2 (0x00007ffff7fde000)

Yes, we can. Now, the big question: which one is correct? Let's run our program under gdb and inspect the memory map.

$ gdb --args /usr/local/src/git-1.7.5.4/git
... snipped ...
(gdb) break main
Breakpoint 1 at 0x404bf0: file git.c, line 509.
(gdb) run
Starting program: /usr/local/src/git-1.7.5.4/git 
[Thread debugging using libthread_db enabled]

Breakpoint 1, main (argc=1, argv=0x7fffffffddd8) at git.c:509
509     {
(gdb) print getpid()
$1 = 27023
(gdb) shell cat /proc/27023/maps | grep 'lib.*\.so.*'
7ffff7608000-7ffff779d000 r-xp 00000000 08:07 356590                     /lib/x86_64-linux-gnu/libc-2.13.so
... snipped ...

So, libc-2.13.so has been loaded at address 0x7ffff7608000, which is what we got from running with just the LD_TRACE_LOADED_OBJECTS flag set, and not what we got with ldd or with invoking ld-linux.so.2 specially.

Why the difference? Clearly, first execveing the loader perturbs the address assignment. It's not clear why this should be---isn't the loader itself the first thing to be loaded anyway? I'm not yet sure what is going on.

Another question: is predicting the load address even a sound thing to do? Given that we had to disable randomization in the first place, it seems like a bad idea. In my case, this approach will do for now, but ultimately I should defer my work until application start-up time. Then we can discover the actual loaded addresses of the various libraries, which is much more robust.

[/devel] permanent link contact


Powered by blosxom

validate this page