libcrunch: run-time type- and bounds-checking for C (and maybe other languages...)

libcrunch is a system for run-time checks in unsafe languages. That mostly means C, with some support for C++ and perhaps other unsafe languages in future.

Although C and C++ are not safe languages, there is nothing to stop implementations of them from being dynamically safe—or at least, safe modulo certain assumptions and restrictions. The research hypothesis behind libcrunch is that this can be achieved without unreasonable assumptions or impractical restrictions, at least when targeting modern Unix-like operating systems (rather than embedded platforms).

What does libcrunch do?

Libcrunch consists of a run-time library and some toolchain extensions. The user compiles with crunchcc, which invokes the host toolchain's C compiler and linker but adds pre- and post-processing. The most significant preprocessing step is instrumentation of the source code with run-time checks. Once built, the binary may be run with or without checks enabled.

$ crunchcc -o myprog ...           # + other front-ends
$ ./myprog                         # runs normally
$ LD_PRELOAD=libcrunch.so ./myprog # does checks
myprog: Failed __is_a_internal(0x5a1220, 0x413560
 a.k.a. "uint$32") at 0x40dade, allocation was a 
 heap block of int$32 originating at 0x40daa1

crunchcc inserts only type checks, not bounds checks; crunchxcc inserts both type checks and bounds checks.

Why is libcrunch different?

Various other run-time checking tools exist. Most C programmers have used Memcheck and, increasingly, ASan. Libcrunch is different in several ways.

Type checking

No other tool, to my knowledge, checks pointer casts, va_arg or union accesses.
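
As a rough illustration, the failure message shown earlier (a "uint$32" check against a heap block of int$32) is the kind of report one would expect from a cast like the one below. This is only a sketch: the function names make_int and view_as_unsigned are hypothetical, and exactly which casts are flagged depends on libcrunch itself, not on this example.

```c
#include <stdlib.h>

/* Allocate one int on the heap. Writing the size as sizeof(int) is what
 * lets an allocation-tracking tool associate the block with type int. */
int *make_int(int value)
{
    int *p = malloc(sizeof(int));
    if (p != NULL)
        *p = value;
    return p;
}

/* The cast a type checker would question: the block was allocated as
 * int, but from here on it is viewed as unsigned int. The C compiler
 * accepts this silently. */
unsigned *view_as_unsigned(void *p)
{
    return (unsigned *) p;
}
```

Without checking, the program runs on and simply reads the int's bits as an unsigned value; with a type check at the cast, the mismatch between the allocation's type and the cast's target type can be reported.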

Precise bounds checking

Memcheck, ASan and other bounds checkers are based on approximate techniques, such as allocation tables and redzones. These are fooled by simple C constructs like nested arrays, and can still miss other wild accesses if you're unlucky. By contrast, libcrunch's checks are precise. In this way it is more like SoftBound, Safe-C and similar research systems, although they lack most of its other advantages (below).
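
To see why nested arrays defeat redzones, consider the sketch below. The struct and function names are hypothetical, and it assumes the usual layout with no padding between the two fields. An index one past the end of the inner array lands on the neighbouring field of the same object, so no allocation boundary is crossed and a redzone placed around the allocation is never touched.

```c
#include <stddef.h>

struct packet {
    int header[4];   /* inner array: valid indices are 0..3 */
    int checksum;    /* follows header inside the same allocation */
};

/* Byte offset that header[index] would have inside struct packet. */
size_t header_elem_offset(size_t index)
{
    return offsetof(struct packet, header) + index * sizeof(int);
}

/* Returns 1 if an out-of-range header index still falls entirely
 * inside the enclosing struct, i.e. the overflow stays within one
 * allocation; returns 0 otherwise. */
int overflow_stays_inside_object(size_t index)
{
    size_t off = header_elem_offset(index);
    return index >= 4 && off + sizeof(int) <= sizeof(struct packet);
}
```

A precise checker needs the bounds of the header subobject, not just those of the whole allocation, to reject index 4 here.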

Track allocations, not pointers

Existing precise bounds checkers propagate base and bound information with every pointer value. This is mostly fine, but is vulnerable to false-positive bounds errors in code that does pointer casts, and it cannot deal with uninstrumented libraries. By associating metadata with allocations (pointees), not pointers, libcrunch avoids these problems, making it much more robust to real code and real usage scenarios.
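
The idea of per-allocation metadata can be sketched with a toy lookup table. Everything here (alloc_record, register_alloc, lookup_alloc, the fixed-size table) is a hypothetical illustration and not libcrunch's actual data structure. The point is only that the metadata is found from the address, so it survives casts and trips through code that knows nothing about it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-allocation record, for illustration only. */
struct alloc_record {
    const void *base;       /* start of the allocation */
    size_t      size;       /* size in bytes */
    const char *type_name;  /* element type, as a string */
};

#define MAX_ALLOCS 16
static struct alloc_record table[MAX_ALLOCS];
static size_t n_allocs;

/* Record an allocation; returns 0 if the toy table is full. */
int register_alloc(const void *base, size_t size, const char *type_name)
{
    if (n_allocs == MAX_ALLOCS)
        return 0;
    table[n_allocs++] = (struct alloc_record){ base, size, type_name };
    return 1;
}

/* Find the allocation containing p, given nothing but the address.
 * Returns NULL if p is not inside any registered allocation. */
const struct alloc_record *lookup_alloc(const void *p)
{
    uintptr_t addr = (uintptr_t) p;
    for (size_t i = 0; i < n_allocs; ++i) {
        uintptr_t lo = (uintptr_t) table[i].base;
        if (addr >= lo && addr - lo < table[i].size)
            return &table[i];
    }
    return NULL;
}
```

A pointer that has been cast to void * or returned from an uninstrumented function can still be looked up this way, whereas base and bound values carried alongside the pointer would have been lost or gone stale.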

Check derivation, not use

Libcrunch does not check pointer uses (dereferences). Instead, it checks less frequent operations which create pointers. This makes its approach to bounds checking the fastest yet. For array-based programs, it is competitive with SoftBound et al. For programs primarily using linked structures, it is faster.
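
The difference between checking derivation and checking use can be sketched as follows. The names int_bounds, derive_elem and sum_repeated are hypothetical, and returning NULL on failure is a simplification chosen for this example rather than libcrunch's behaviour. The check runs once, where the pointer is created by indexing; the dereferences that follow carry no check at all.

```c
#include <stddef.h>

/* Hypothetical bounds for illustration: elements base[0..count-1]. */
struct int_bounds {
    int   *base;
    size_t count;
};

/* Counts how many derivation checks have run. */
static unsigned long derivation_checks;

/* Checked derivation: the only place a bounds check happens.
 * Returns NULL if index is out of bounds. */
int *derive_elem(struct int_bounds b, size_t index)
{
    ++derivation_checks;
    return index < b.count ? b.base + index : NULL;
}

/* Uses of an already-derived pointer are plain loads with no checks,
 * however many times the pointer is dereferenced. */
int sum_repeated(const int *p, int times)
{
    int total = 0;
    for (int i = 0; i < times; ++i)
        total += *p;
    return total;
}
```

Code that follows links between structures mostly loads and dereferences existing pointers rather than deriving new ones by arithmetic, which is consistent with the advantage claimed above for linked structures.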

No ABI changes

Like SoftBound, but unlike earlier approaches that were based on “fat pointers”, libcrunch goes to great lengths to avoid changes to data representation or other ABI details. This is necessary to allow uninstrumented libraries to be linked with instrumented code.

No loss of precision with libraries

By avoiding per-pointer metadata, libcrunch not only allows uninstrumented libraries to be linked in, but can still check code that uses them. Of course, no checks are done within the uninstrumented code. But unlike pointer-based systems, in libcrunch you can pass pointers through uninstrumented code without any loss of bounds or type information. This is because bounds are calculated from the pointed-to allocation's type information; they are not associated with the pointer, except as a caching optimisation.

Less rebuilding

Rather than two builds of your code, with libcrunch a single build often suffices. This is because checking is enabled only at load time. Binaries running without the run-time library run as the original code, with reduced slowdown. Relative to an uninstrumented build, this slowdown is negligible in many cases, although it does depend heavily on the code concerned and on which checks are enabled.

No problem with custom allocators

Libcrunch supports custom allocators. Even better, you don't have to change your source code to tell libcrunch about them. Instead, you tell it about them at build time. This allows the right functions to be instrumented, mostly using link-time techniques.

No aborts

When libcrunch detects an error, it doesn't have to abort your program. That doesn't sound too special, but it's important for practicality, and is in tension with performance. If a failure can be assumed to abort, the stronger postcondition allows the compiler to optimise subsequent checks more aggressively. Libcrunch goes to some lengths to get the best of both worlds: allow continuing on check failure, but provide a fast path that the compiler can optimise on the basis of “no failures yet”.
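
A minimal sketch of the "continue on failure" half of this trade-off is shown below. The names check_index, report_failure and failure_count are hypothetical; this is not libcrunch's check code, and it does not attempt to reproduce the compiler-level optimisation described above. It shows only a check whose common case is a single comparison and whose failure case records the problem and returns rather than aborting.

```c
#include <stddef.h>
#include <stdio.h>

/* Number of check failures seen so far in this run. */
static unsigned long failures_so_far;

/* Slow path: report and count the failure, then return to the caller
 * instead of calling abort(). */
static void report_failure(const char *what)
{
    ++failures_so_far;
    fprintf(stderr, "check failed: %s\n", what);
}

/* Returns 1 if index is in bounds and 0 otherwise; never aborts. */
int check_index(size_t index, size_t count)
{
    if (index < count)
        return 1;                 /* fast path: one comparison */
    report_failure("index out of bounds");
    return 0;
}

/* Lets the caller ask whether any check has failed yet. */
unsigned long failure_count(void)
{
    return failures_so_far;
}
```

Because the program keeps running after a failed check, a single run can report several distinct errors rather than stopping at the first.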

Building and installing

Currently, libcrunch is only for x86-64 GNU/Linux systems (although ports are welcome and very feasible). For code and build instructions, please see the GitHub page.

Contact

Please do send me e-mail if you have questions or comments.

Acknowledgments

This work has been supported by EPSRC Programme Grant EP/K008528/1, REMS: Rigorous Engineering of Mainstream Systems.

A preliminary version was created with funding from the Oxford Martin School Institute for the Future of Computing.

Content updated at Tue 17 May 12:27:00 BST 2016.