Rambles around computer science

Diverting trains of thought, wasting precious time

Mon, 06 Oct 2014

Project extra

I just thought of another nice project idea, so here it is.

A generic fuzz-tester, using DWARF

Fuzz-testing is a technique for randomised software testing. The software under test is run with randomly modified inputs, starting from existing test inputs. This can cover more code paths, and hence potentially find more bugs, than human-crafted test inputs alone.
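
As a concrete illustration (not part of the project itself), here is a minimal sketch of the mutation-based flavour of fuzzing, in Python. The program under test ./prog and the seed input seed.bin are hypothetical stand-ins; the loop flips a few bytes of an existing test input and watches for crashes.

    # Minimal mutation-based fuzzing sketch. './prog' and 'seed.bin' are
    # hypothetical: a program taking an input file, and an existing test input.
    import random
    import subprocess
    import tempfile

    def mutate(data: bytes, n_flips: int = 8) -> bytes:
        """Randomly overwrite a few bytes of an existing test input."""
        buf = bytearray(data)
        for _ in range(n_flips):
            buf[random.randrange(len(buf))] = random.randrange(256)
        return bytes(buf)

    with open('seed.bin', 'rb') as f:
        seed = f.read()

    for i in range(1000):
        with tempfile.NamedTemporaryFile() as tmp:
            tmp.write(mutate(seed))
            tmp.flush()
            result = subprocess.run(['./prog', tmp.name])
            # A negative return code means the process died on a signal
            # (e.g. SIGSEGV), which is a likely bug.
            if result.returncode < 0:
                print('crash on iteration', i, 'signal', -result.returncode)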

Typically, fuzzers are built around particular input domains. For example, we might write one fuzzer for C compilers which generates randomised C source files. We might write another fuzzer for X11 applications which randomly modifies the packets sent over the wire. (In fact an early fuzzer, xjig, did exactly this.) We might write yet another fuzzer for a particular library API, as John Regehr describes here. This works... but can we build a more powerful, more general fuzzing system than these per-domain solutions?

This project is about building a general tool for the latter scenario: fuzzing of library APIs. We want to be able to fuzz any library whose API we have a description of. For us, this will mean an API described by compiler-generated DWARF debugging information. There are several technical steps to this. Firstly, we need the ability to observe and (optionally) capture API traces, typically from running an existing test suite. This can be done using something reasonably off-the-shelf like ltrace, although we might want to make some modifications to suit our purposes. Secondly, we need to perturb these traces somehow to generate randomised versions. These can be randomised both in terms of the calls made and the arguments passed; the project would need to investigate various randomisation strategies. Thirdly, we then execute these randomised traces and attempt to detect any errors that occur: perhaps assertion failures reported by the program itself, memory errors reported by Memcheck (of Valgrind fame), or type errors reported by libcrunch.
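
For the API-description step, a short sketch (using the pyelftools library; libfoo.so is a hypothetical shared object built with -g) shows the kind of information DWARF gives us for free: each function and its formal parameters, from which a fuzzer could synthesise or perturb calls.

    # Sketch: enumerate a library's functions and their formal parameters
    # from DWARF, using pyelftools. 'libfoo.so' is a hypothetical shared
    # object compiled with -g; a real tool would also resolve parameter types.
    from elftools.elf.elffile import ELFFile

    with open('libfoo.so', 'rb') as f:
        elf = ELFFile(f)
        if elf.has_dwarf_info():
            dwarf = elf.get_dwarf_info()
            for cu in dwarf.iter_CUs():
                for die in cu.iter_DIEs():
                    if die.tag != 'DW_TAG_subprogram':
                        continue
                    name_attr = die.attributes.get('DW_AT_name')
                    if name_attr is None:
                        continue  # skip anonymous or abstract entries
                    params = [c for c in die.iter_children()
                              if c.tag == 'DW_TAG_formal_parameter']
                    print(name_attr.value.decode(), 'takes', len(params), 'parameters')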

For evaluation, we would like to measure how much we improve test coverage relative to the existing test suites. We could also compare this against the improvement obtained by API-specific fuzzers, like Regehr's, hoping to do almost as well (or perhaps even better!).

One problem with fuzzing is false positives. It might be that our randomised changes yield traces which exercise the API in ways that aren't supposed to work. Normally we'd want only to generate traces in which the client stays in-spec, while perhaps leading the library itself to go out-of-spec (that's what a bug is!). (In security-flavoured fuzzing, the exposed attack surface is what's important, not the documented interface per se, but the same principle applies.) Such specifications are rarely written down precisely!

So, extension-wise, an obvious direction is to refine our model of APIs to allow for more semantic constraints. To pick a trivial example, if we were testing the C library's malloc implementation, one constraint is that we shouldn't double-free a chunk. That's a little subtle: it's okay to free the same chunk of memory a second time, if and only if malloc has issued it again in the meantime! There is a lot of scope for investigating this kind of constraint and, in general, for producing more sophisticated semantic models of APIs. There is a wealth of research about “API usage patterns”, but it often uses only a naive finite-state formalism that struggles to capture whole-trace properties (like the malloc one I just described). We could also investigate using Daikon, or some similar invariant generator, to infer such invariants from collections of harvested traces.
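
To show what such a whole-trace constraint might look like operationally, here is a small sketch of a checker for the double-free rule, over a hypothetical trace representation of (function, arguments, return value) tuples such as an ltrace-style capture might yield.

    # Sketch: checking the double-free constraint over a whole trace.
    # The (function, args, return value) tuple format is hypothetical.
    def violates_double_free(trace):
        live = set()  # addresses issued by malloc and not yet freed
        for func, args, ret in trace:
            if func == 'malloc' and ret != 0:
                live.add(ret)
            elif func == 'free' and args[0] != 0:
                if args[0] not in live:
                    return True  # freeing an address that is not currently live
                live.discard(args[0])
        return False

    # Freeing the same address twice is fine iff malloc issued it again in between.
    ok_trace  = [('malloc', (16,), 0x1000), ('free', (0x1000,), None),
                 ('malloc', (16,), 0x1000), ('free', (0x1000,), None)]
    bad_trace = [('malloc', (16,), 0x1000), ('free', (0x1000,), None),
                 ('free', (0x1000,), None)]
    assert not violates_double_free(ok_trace)
    assert violates_double_free(bad_trace)

Note that whether a given free is legal here depends on the whole history of that address, which is exactly what a per-call finite-state model struggles to express.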


