Rambles around computer science

Diverting trains of thought, wasting precious time

Mon, 30 Jan 2017

Debugging with the natives, part 2

Some aeons ago, I wrote a piece giving a rough outline of how a native debugger like gdb works, and promised a follow-up that looks at the same pile of techniques in a more principled way. I can't really excuse the delay, but if anyone's still listening, here goes.

Source-level debugging of native code is supported on (I make it) four different different levels in a complete hardware–software stack.

We can split these four into two pairs. The first two implement machine-level primitives and their “upward mapping” (virtualisation) into the operating system's process abstraction. The second two implement the source-level view that the programmer usually prefers, again mapping upwards from binary- to source-level.

Let's take the machine-level pair to begin with. From the operating system's point of view, all debugging support is designed around the following principles. (Perhaps I shouldn't say “designed” since in reality they “grew”— and only became principles later.)

Some surprisingly strong properties result from this design. Firstly, debugging can be done from a remote process, perhaps on a separate machine from the debuggee, perhaps even a machine of a different architecture. Secondly, debugging can be done post-mortem. Thirdly, the same infrastructure works for many source languages—albeit trivially so far, since we've only seen how to get an assembly-level view. There are some contrasts here with most language virtual machines (think JVM): these implement debugging using in-VM debug servers. These can work across the network, but don't support post-mortem debugging, and typically bake in concepts from source languages.

That's enough about the machine-level part. To go from machine- or assembly-level debugging to source-level debugging, we need help from the compiler. This is designed around the following principles.

Let's do a more concrete run-through of how it works. So far I've been fairly generic, but let's fix on GNU/Linux as our modern Unix—though all ELF-based systems are pretty similar—and a familiar architecture (x86-64) and specific metadata format (DWARF).

When I compile a program with -g it means “please generate metadata”. First, let's try without.

$ cc -o hello hello.c
$ readelf -S hello | grep debug  # no output! no debugging sections

You can still debug this program, at the assembly level, because the OS debugging mechanisms remain available. It's as if the compiler-generated assembly is code that you wrote manually by yourself. You can set breakpoints, watchpoints, single step, and so on.

$ gdb -q --args ./hello
Reading symbols from ./hello...(no debugging symbols found)...done.
(gdb) break main
Breakpoint 1 at 0x40052a
(gdb) run
Starting program: /tmp/hello 

Breakpoint 1, 0x000000000040052a in main ()
(gdb) disas
Dump of assembler code for function main:
   0x0000000000400526 <+0>:     push   %rbp
   0x0000000000400527 <+1>:     mov    %rsp,%rbp
=> 0x000000000040052a <+4>:     mov    $0x4005c4,%edi
   0x000000000040052f <+9>:     callq  0x400400 <puts@plt>
   0x0000000000400534 <+14>:    mov    $0x0,%eax
   0x0000000000400539 <+19>:    pop    %rbp
   0x000000000040053a <+20>:    retq 

Now let's compile with debug information.

$ cc -g -o hello hello.c
$ readelf -S hello | grep debug
  [28] .debug_aranges    PROGBITS         0000000000000000  00001081
  [29] .debug_info       PROGBITS         0000000000000000  000010b1
  [30] .debug_abbrev     PROGBITS         0000000000000000  00001142
  [31] .debug_line       PROGBITS         0000000000000000  00001186
  [32] .debug_frame      PROGBITS         0000000000000000  000011c8
  [33] .debug_str        PROGBITS         0000000000000000  00001210

What's in these debug sections? There are three main kinds of information. Firstly, there are files and line numbers (.debug_line). These encode a mapping from object code addresses to source coordinates (file, line, column). You can dump it fairly readably, as follows.

$ readelf -wL hello 
Decoded dump of debug contents of section .debug_line:

CU: hello.c:
File name                            Line number    Starting address
hello.c                                        4            0x400526
hello.c                                        5            0x40052a
hello.c                                        6            0x400534
hello.c                                        7            0x400539

Secondly, there is frame information (often this comes out in a section called .eh_frame so I cheated a bit above; to get exactly the above with a current gcc, you should add the -fno-dwarf2-cfi-asm switch). This tells the debugger how to walk the stack. What does walking the stack mean? Roughly it means getting a sequence of stack pointers paired with their program counter values, for each call site that is active on the callchain. The old stack pointer and program counter are always saved somewhere, otherwise we couldn't return from the call. To walk the stack we start from the current “live” register file given to use by ptrace(), which holds the “end” stack pointer and program counter. The DWARF then describes how to “rewind” these register values, and/or any other registers whose job is to record the callchain (rbp on x86-64; other callee-saves are often included too) back to their state at the previous call site in the chain. The description of this unwinding is logically a table, which you can see below for the main function. The cells are expressions describing how to compute the caller's value for a register, here rbp value (frame pointer) and also the caller's program counter, i.e. the return address (given as ra). The computations are both factored into two. Firstly we calculate a “canonical frame address” from the current frame's register values (see the CFA column): it's a fixed offset from rsp and rbp, and is actually a fixed address on the stack, but the expression changes from instruction to instruction as the stack pointer gets adjusted. Secondly we obtain the saved values we want by reading from the stack at fixed offsets from that address (c-8 means 8 bytes down). This factoring helps compactness, because the CFA-relative offsets don't change when the stack pointer moves; only the CFA column needs to describe that. However, although “stored at some offset from the CFA” covers a lot of cases, sometimes more complex computations are required, which usually appear as DWARF bytecode expressions.

$ readelf -wF hello
00000088 000000000000002c 0000001c FDE cie=00000070 pc=0000000000400526..000000000040053b
   LOC           CFA      rbp   ra    
0000000000400526 rsp+8    u     c-8 
0000000000400527 rsp+16   c-16  c-8 
000000000040052a rbp+16   c-16  c-8 
000000000040053a rsp+8    c-16  c-8 

The .debug_info section is the biggie. It describes the structural detail of the source program along both source and binary dimensions. It has a list of source files, but also a list of compilation units. The latter is where most of the structure is. It describes functions/methods, data types, and all the language-implementation decisions that the compiler took when generating binary code: how data types are laid out, which registers or stack slots hold each local variable over its lifetime, and so on. Although not shown much in the simple case shown below, addresses of program variables are described in a Turing-powerful stack machine language which is essentially a bytecode; the DW_OP_call_frame_cfa below is one operation, which simply says “push the address of the frame base, as recorded by the frame info”. The tree-like structure of the information also describes detailed static structure of code, including function inlining, the in-memory locations corresponding to particular lexical blocks in the code, and so on. (It's worth asking whether DWARF info it might usefully bundle the source code itself. I've never seen this done, but it would make a lot of sense to me.)

$ readelf -wi hello
Contents of the .debug_info section:

  Compilation Unit @ offset 0x0:
   Length:        0x8d (32-bit)
   Version:       4
   Abbrev Offset: 0x0
   Pointer Size:  8
 <0><b>: Abbrev Number: 1 (DW_TAG_compile_unit)
    <c>   DW_AT_producer    : (indirect string, offset: 0x2f): GNU C 4.9.2 -mtune=generic -march=x86-64 -g -fno-dwarf2-cfi-asm -fstack-protector-strong
    <10>   DW_AT_language    : 1        (ANSI C)
    <11>   DW_AT_name        : (indirect string, offset: 0x88): hello.c
    <15>   DW_AT_comp_dir    : (indirect string, offset: 0xb5): /tmp
    <19>   DW_AT_low_pc      : 0x400526
    <21>   DW_AT_high_pc     : 0x15
    <29>   DW_AT_stmt_list   : 0x0
 <1><2d>: Abbrev Number: 2 (DW_TAG_base_type)
    <2e>   DW_AT_byte_size   : 8
    <2f>   DW_AT_encoding    : 7        (unsigned)
    <30>   DW_AT_name        : (indirect string, offset: 0x0): long unsigned int
 <1><34>: Abbrev Number: 2 (DW_TAG_base_type)
    <35>   DW_AT_byte_size   : 1
    <36>   DW_AT_encoding    : 8        (unsigned char)
    <37>   DW_AT_name        : (indirect string, offset: 0xa2): unsigned char
 <1><3b>: Abbrev Number: 2 (DW_TAG_base_type)
    <3c>   DW_AT_byte_size   : 2
    <3d>   DW_AT_encoding    : 7        (unsigned)
    <3e>   DW_AT_name        : (indirect string, offset: 0x12): short unsigned int
 <1><42>: Abbrev Number: 2 (DW_TAG_base_type)
    <43>   DW_AT_byte_size   : 4
    <44>   DW_AT_encoding    : 7        (unsigned)
    <45>   DW_AT_name        : (indirect string, offset: 0x5): unsigned int
 <1><49>: Abbrev Number: 2 (DW_TAG_base_type)
    <4a>   DW_AT_byte_size   : 1
    <4b>   DW_AT_encoding    : 6        (signed char)
    <4c>   DW_AT_name        : (indirect string, offset: 0xa4): signed char
 <1><50>: Abbrev Number: 2 (DW_TAG_base_type)
    <51>   DW_AT_byte_size   : 2
    <52>   DW_AT_encoding    : 5        (signed)
    <53>   DW_AT_name        : (indirect string, offset: 0x25): short int
 <1><57>: Abbrev Number: 3 (DW_TAG_base_type)
    <58>   DW_AT_byte_size   : 4
    <59>   DW_AT_encoding    : 5        (signed)
    <5a>   DW_AT_name        : int
 <1><5e>: Abbrev Number: 2 (DW_TAG_base_type)
    <5f>   DW_AT_byte_size   : 8
    <60>   DW_AT_encoding    : 5        (signed)
    <61>   DW_AT_name        : (indirect string, offset: 0xb0): long int
 <1><65>: Abbrev Number: 2 (DW_TAG_base_type)
    <66>   DW_AT_byte_size   : 8
    <67>   DW_AT_encoding    : 7        (unsigned)
    <68>   DW_AT_name        : (indirect string, offset: 0xb9): sizetype
 <1><6c>: Abbrev Number: 2 (DW_TAG_base_type)
    <6d>   DW_AT_byte_size   : 1
    <6e>   DW_AT_encoding    : 6        (signed char)
    <6f>   DW_AT_name        : (indirect string, offset: 0xab): char
 <1><73>: Abbrev Number: 4 (DW_TAG_subprogram)
    <74>   DW_AT_external    : 1
    <74>   DW_AT_name        : (indirect string, offset: 0xc2): main
    <78>   DW_AT_decl_file   : 1
    <79>   DW_AT_decl_line   : 3
    <7a>   DW_AT_prototyped  : 1
    <7a>   DW_AT_type        : <0x57>
    <7e>   DW_AT_low_pc      : 0x400526
    <86>   DW_AT_high_pc     : 0x15
    <8e>   DW_AT_frame_base  : 1 byte block: 9c         (DW_OP_call_frame_cfa)
    <90>   DW_AT_GNU_all_tail_call_sites: 1
 <1><90>: Abbrev Number: 0

That's it for the tour. Let me finish with some reflections on what's good and bad about this way of doing things.

Descriptive debugging: the good

Why do debugging this convoluted, long-winded way, with all this metadata, instead of the apparently simpler VM way of doing things? In VMs, debug servers are integrated into the runtime, and offer a fixed, high-level command interface. This allows the VM's compile-time implementation decisions to stay hidden. There is no need to tell the debugger how storage is laid out, where local variables live, and so on, because the command interface is pitched at a higher level of abstraction; those details remain internal to the VM. This is convenient for the VM implementer, since generation of debugging information is onerous to implement. But it also requires cooperation from the debuggee, couples debuggers to a fixed command language or wire protocol, and presents strictly less information to the developer. While VM debuggers are designed around the abstraction boundaries of the source languages, metadata-based debugging actively enables descending through these boundaries. It is sometimes very useful for debugging tools to expose implementation details in this way. The most obvious case is when faced with a compiler or VM bug; the user would like to “shift down” to the lower level to inspect the assembly code or VM state. At other times, there are performance bugs that the developer has a hunch are about cache or paging effects; being able to see the raw addresses and raw memory contents can help here, even when the program is running on a VM.

Being highly descriptive, debugging metadata documents a large number of implementation decisions taken by the compiler, so is useful not only to debuggers but also to profilers, language runtimes (C++ exception handling is usually built on DWARF frame information), other dynamic analysis tools such as Valgrind family, and so on.

Debugging optimised code (without deoptimising)

Debugging metadata must describe optimised code. By contrast, VM debug servers typically arrange that debug-server operations only need to deal with unoptimised stack frames and at most simply-generated code (e.g. from a template-based JIT). Confusingly, even the “full-speed debugging” feature of HotSpot uses dynamic deoptimisation to get back to unoptimised code—the earlier approach was to run the whole program under the interpreter whenever you wanted a debuggable execution. In general, a debuggable VM instance must either refrain from optimisation, or know how to dynamically undo that optimisation when a debugger is attached. So, dynamic deoptimisation is not exactly “full speed”—unlike with native debuggers, execution still slows down significantly when a debugger is attached. Having the VM implement debug operations only over unoptimised code is a restriction that helps make the debug server simple, at some cost in debug-time performance.

The flip side is that VM debugging is pretty good at precisely maintaining the source-level abstraction (modulo VM bugs), without complicating the task of implementing optimisations. Meanwhile, in Unix-land, the debugging experience remains best-effort and only as good as the compiler-generated metadata, which is sometimes wrong or incomplete following complex transformations. When optimisation and debuggability are in tension, debuggability usually takes the hit, so a smooth debugging experience still sometimes relies on a separate unoptimised “debug build”. Tail call optimisations are a classic debugging-hindering optimisation, since they rely on eliding stack frames, meaning the debugger cannot know how many recursive calls are logically on the (source-level) stack. Instruction scheduling is another: the order of operations in the executed code need not match source program order, and this can make for debug-time confusion.

The control problem

Some other properties are incidental but tend to be true of current Unix-style debugging. Debugger support for exercising the target language's control features (exceptions, threads, allocations, ...) is uneven, because this can't be described purely as metadata; there needs to be some interaction with the host runtime. DWARF and similar debugging information is good at the “statics” of describing how to decode the program state, but not good at describing the protocols of interaction with a language runtime, necessar for performing operations such as spawning a thread, allocating an object or throwing an exception. These tend to be difficult unless these happen to be cleanly exposed as entry points in the language runtime. In practice debuggers usually achieve these things by having magic knowledge of particular implementations.

At least one semi-portable interface has emerged with the aim of encapsulating run-time control operations for debuggers' benefit. I'm thinking of libthread_db, best described by Joe Damato's excellent article. Unfortunately it's an abomination, because it violates the principle that implementation details are described by architecture-independent metadata. An odd but cleaner and more consistent alternative would be to bundle snippets of DWARF bytecode for doing these runtime interactions—perhaps in the debug information of a language runtime, either simply calling into the runtime (for cleanly abstracted operations) or doing something more complex. But that is only a technical possibility; there are no proposals or working demos of that as far as I'm aware (maybe I'll make one). This might sound wacky, but if you know about the early history of Java, in Oak and the Green Project, you'll see a certain uncanny similarity in these ideas.

Levels of abstraction

Debugging at multiple levels of abstraction is a neat facility of Unix-style debugging, but is also a difficult power to control. It can be useful to switch down to the assembly-level view, or to switch languages, but this capability doesn't generalise to the case where many abstraction layers are built up within the same language (think C++). The debugger will let you “continue to the next source line”, but it doesn't know how to keep you at the same abstraction level. If the next source line is deep inside a library implementing something fairly basic like a smart pointer, it will skip only as far as that line, whereas you probably wanted to stay roughly at the same level of abstraction, or perhaps within the same codebase. Things get particularly bad when there is a lot of inlining (again with C++). The traditional “step over” and “step into” are hints at this need, but are too crude.

Doing better is currently beyond the debugger's ken, but this problem could be solved: perhaps by bringing in the knowledge of libraries and source file trees that the debugger already has, or perhaps most simply by allowing programmers to manually mark out the boundaries between layers. This could be a simple partitioning over source files and binary objects, or could be something more complex, perhaps sensitive to calling context or argument values (consider the case of the same library used from two places in the same application). “Next”-style operations could then be defined in terms of these layers. I'd love to see this, although the details would take a lot of working out.

To be continued?

There's plenty of research to be done on more debuggable language implementations. This research area doesn't seem to get the attention it deserves. One problem is that debugging is usually an afterthought for researchers, even though it is essential for programmers. Another problem is that not many PL researchers have sight of the design space; they're familiar with either the Unix-style approach or the VM-style one. I hope that in the future we can figure out how to get the best of both worlds.

[/devel] permanent link contact

Powered by blosxom

validate this page