Project Suggestions 2024-25

These might suit good students interested in combining semantics, systems, and/or mechanised reasoning. Some are probably best suited for an MPhil/Part III project rather than a Part II project, though it might be possible to craft a Part II project in a similar vein on a smaller scale. Members of the REMS group might be interested in supervising other projects, using or building semantics for aspects of real-world computer systems that have been developed in the group.

Testing and extension of a mechanised executable specification of DWARF debug information

Supervisors: Peter Sewell and Stephen Kell (King's College London)

The DWARF format is widely used to express debug information in compiled binaries to support source-level debugging, e.g. via gdb. It includes information about:

DWARF is defined by a prose standard. The full debug information for a binary can be very lengthy, so the format uses several small custom languages to express it - to render the debug information, one has to evaluate expressions in these languages.

Previous work in the REMS project developed an executable specification for a large fragment of DWARF, as an extension to our specification for ELF files and compile-time ('static') linking published at OOPSLA 2016 (The missing link: explaining ELF static linking, semantically. Stephen Kell, Dominic P. Mulligan, and Peter Sewell). These are expressed in the pure functional specification language Lem, which is similar to the pure functional fragment of core OCaml and can be translated into OCaml for execution, and to theorem-prover definitions for reasoning. Previous work also developed tooling to render the debug information, as computed by this specification, into a convenient human-readable format, e.g. this example, and used the debug information for translation validation, to help prove the correspondence between source and binary for very small programs.

There are now several interesting challenges:

Testable executable specification of ELF-based dynamic linking

Supervisors: Stephen Kell (King's College London) and Peter Sewell

A dynamic linker is involved in the execution of most programs today. However, dynamic linking remains poorly understood, both conceptually (by programmers) and semantically (in its specification). This project is about producing a testable, executable specification of ELF-based dynamic linking, or at least some subset of it, and applying it in one or more ways.

The core of the project will involve learning about dynamic linking and producing a basic specification, executable in some convenient fashion e.g. as a checker that runs immediately before control passes to user code. This would check that linker-performed mappings (of needed libraries) and bindings (applying relocations etc. to plumb data and code together) meet expectations. It might also run again later, if dynamic loading occurs. Initial testing could focus on straightforward cases and "known deficiencies" (e.g. that copy-relocation of a protected symbol does not work correctly on older versions of glibc, that DF_1_INTERPOSE is unimplemented by glibc and musl, and other examples that can be found in bug trackers).

Extension aspects could include some of the following.

Various starting points are available: a Lem specification for compile-time ('static') linking published at OOPSLA 2016 (The missing link: explaining ELF static linking, semantically. Stephen Kell, Dominic P. Mulligan, and Peter Sewell), and a prior attempt at a load-time checker, in Simon Ser's Paris MPRI project.

Since the specification would be written in Lem, a purely functional ML-family specification language, functional programming skills are required. Familiarity with systems programming and outline familiarity with linking are also important, although it is not important to understand linking deeply at the start of the project.