3 Task 3: Systems Programming Languages
The C language, originating around 1970, remains at the heart of a great deal of security-critical systems software, including operating systems, hypervisors, and embedded and Internet-of-Things devices – and many security flaws ultimately come down to C code. Ambiguities and differences of opinion about what the language is thus have considerable potential for harm – but it has never been well-defined. In earlier (pre-REMS) work, we collaborated closely with the ISO WG21 C++ and WG14 C committees to clarify their concurrency models, with our mathematical models (in prose form) incorporated into those international standards.
Our work on the sequential aspects of C semantics proceeds on several fronts: investigating the de facto standards of C as it is implemented and used; addressing and contributing to the ISO standard produced by the ISO WG14 C committee; and building an executable formal model for a substantial fragment of C, Cerberus [27, 28]. Our initial paper on this  was awarded a PLDI 2016 Distinguished Paper Award. We intend to release Cerberus under a permissive open-source licence in due course.
We are participating in the ISO WG14 C standards committee: we attended the April 2016 (London), October 2016 (Pittsburgh), April 2018 (Brno), and October 2018 (Pittsburgh) committee meetings, producing committee discussion papers ([29, 30, 31, 32], [33, 34, 35], [36, 37, 38, 39, 40], ), and instigated the WG14 C Memory Object Model Study Group; we are also engaging with the various compiler and developer communities, with talks at EuroLLVM 2018 and the GNU Tools Cauldron 2018, to Red Hat, and at 35c3.
3.2 CHERI C
At the C language level, just as at the architecture level, we have another strong collaboration with the CHERI work, with their experience in systems software (and specifically in porting it to the C supported by CHERI) informing our semantic understanding, and with this discussion identifying issues with the CHERI design. This made substantial contributions to the two Cerberus papers cited above [27, 28], and we have also contributed to work on the CHERI ABI, enforcing pointer provenance and minimising pointer privilege in a POSIX C environment .
We also developed a conceptual framework for application compartmentalization, a vulnerability mitigation technique employed in programs such as OpenSSH and the Chromium web browser, which decomposes software into isolated components to limit privileges leaked or otherwise available to attackers. This is embodied in an LLVM-based tool: the Security-Oriented Analysis of Application Programs (SOAAP), that allows programmers to reason about compartmentalization using source-code annotations (compartmentalization hypotheses) .
We are developing a system for run-time type checking in C, libcrunch, extending it to additionally check array/pointer bounds [51, 52, 53]. Using run-time type information, we have found it possible to tolerate real C idioms not supportable by software fat-pointer schemes. Early indications suggest that the available run-time performance is comparable or better than competing approaches.
We have continued to refine the run-time type infrastructure underlying libcrunch into a more general-purpose infrastructure with a variety of other applications  including debugging, interfacing native code with dynamic languages, and dealing with memory-mapped files.
Libcrunch github repo: https://github.com/stephenrkell/libcrunch/
Systems software in reality is written not just in more-or-less conventional programming languages, like C and C++, but also using a complex menagerie of techniques to control the linking process. This linker speak is often poorly understood by working programmers and has largely been neglected by language researchers. We have surveyed the many use-cases that linkers support and developed the first validated formalisation of a realistic executable and linkable format, ELF, together with an executable specification of linking with substantial coverage . This also incorporates a semantics for DWARF debug information. The model is written in Lem, which generates an executable OCaml version. This ELF model is used in our Task 2 RMEM architectural emulation tool. The model is available under a permissive open-source licence.
ELF semantics github repo: https://github.com/rems-project/linksem.
3.5 Concurrent C/C++
Turning to the concurrency semantics of C-based languages, in earlier (pre-REMS) work, we collaborated closely with the ISO WG21 C++ and WG14 C committees to clarify their concurrency models, with our mathematical models (in prose form) incorporated into those ISO C/C++11 international standards. (Batty et al., POPL 2011). We have built on those, and also considered potential future models that would fix the so-called ‘thin-air problem’ that C/C++11 suffers from.
Batty’s 2014 Cambridge PhD thesis The C11 and C++11 Concurrency Model was awarded the 2015 ACM SIGPLAN John C. Reynolds Doctoral Dissertation Award, presented annually to the author of the “outstanding doctoral dissertation in the area of Programming Languages”, and was the winner of the 2015 CPHC/BCS Distinguished Dissertations competition, selected as “the best British PhD/DPhil dissertations in computer science”. The WG14 C committee also approved his remaining defect reports w.r.t. the C11 concurrency model.
We developed a C11 operational semantics that is proved equivalent (in Isabelle/HOL) to the Batty et al. axiomatic model . This was also integrated into our Cerberus sequential C model to allow litmus tests to be executed, though that integration is not currently maintained, to simplify Cerberus development.
We showed that no straightforward adaptation of the C/C++11 axiomatic model, phrased as a consistency predicate over candidate executions, can exclude thin-air reads while also admitting some standard optimisations, identified additional difficulties with undefined behaviour and with mixtures of atomic and nonatomic accesses. We also proved (in HOL4) that the C/C++11 model actually provides the DRF-SC guarantee .
Along with the extension of our ARM and IBM POWER hardware models to support mixed-size accesses, described above, we proposed a corresponding extension to the C/C++11 axiomatic model to support mixed-size non-atomic accesses .
We discussed a new problem with C/C++11, uncovered by groups in Princeton and MPI-SWS/SNU, that contradicts the soundness of compilation schemes from C/C++11 to ARM and POWER (pre-REMS, we had published hand proofs of those that now turn out to be flawed); this required additional changes to C/C++11, developed by the latter group.
We proposed a safe, portable, and efficient implementation of co-routines .
3.6 Verified ML implementation: CakeML
We continue to contribute to the CakeML development of a verified ML implementation, especially w.r.t. the backend ISA models and proof, as discussed in Task 2.
This work has been presented in keynotes in CADE 2017 and FSEN 2017, and at an invited talk at Emerging Technologies, the 2017 New York ECMAScript meeting.
WebAssembly is a new low-level language currently being implemented in all major web browsers. It is designed to become the universal compilation target for the web, obsoleting previous solutions such as asm.js and Native Client. We have developed a mechanised Isabelle specification for WebAssembly, including a verified executable interpreter and type-checker, and a mechanised proof of type soundness . This exposed several issues with the official WebAssembly specification and influenced its development. We have also worked on constant-time computation in WebAssembly .
WebAssembly AFP Isabelle development: https://www.isa-afp.org/entries/WebAssembly.html
Working with our OCaml Labs colleagues, we have proposed a concurrency semantics for OCaml  and contributed to various OCaml infrastructure: a generic approach to defining partially-static data and corresponding operations ; conex , a tool which allows developers to cryptographically sign their released software, and users to verify them, ensuring integrity and authenticity without a centralised trust authority; and a modular foreign function interface for OCaml .