Cerberus

C remains central to our computing infrastructure but still lacks a clear and complete semantics. Programmers lack tools to explore the range of behaviours they should expect; compiler development lacks test oracles; and formal verification and analysis must make (explicitly or implicitly) many choices about the specific C they target. The ISO standards for C have been developed over many years but they remain unclear in some important respects, and in some areas there are differences between the properties of C that are relied on by the corpus of systems code in production, the properties that compiler implementers aim to provide, and the properties of C as it is specified in ISO C11.

The Cerberus project is developing semantic models for a substantial fragment of C. It has several distinctive features:

Where the ISO C11 standard is clear and corresponds with practice, Cerberus aims to follow that.
Where there are ambiguities or differences, chiefly in the memory layout model (the behaviour of pointers, pointer arithmetic, uninitialised values, etc.), we aim to clarify the de facto standards and understand how the ISO standard could be reconciled with them.
Cerberus precisely defines the range of allowed behaviour, not just that of some specific implementation.
It is executable, to explore either all behaviours or single paths of small test programs.
Its thread-local semantics is factored via an elaboration into a simpler Core language, to make it readable and conceptually and mathematically tractable; the dynamic semantics of Core can be linked with various memory object models
The Cerberus front-end is written from scratch to closely follow the C11 standard, including a parser that follows the C11 standard grammar, and a typechecker.
The Cerberus BMC tool supports bounded model checking (for small examples), combining support for much of the Cerberus thread-local semantics, a modern memory object model, and an arbitrary axiomatic concurrency model.
A previous version of Cerberus supported integration with an operational concurrency model, proved equivalent to the C/C++11 axiomatic concurrency model of Batty et al.

This is a step towards a clear, consistent, and unambiguous semantics for C.

Cerberus Web Interface

The Cerberus web interface lets one interactively, randomly, or exhaustively explore the behaviour of small sequential C test programs in the Cerberus semantics.

Cerberus BMC Web Interface

The Cerberus BMC web interface lets one explore the behaviour of small concurrent C test programs with respect to an arbitrary axiomatic concurrency model.

Github repo

github repo

PhD Thesis

The Cerberus C semantics. Kayvan Memarian. Technical Report UCAM-CL-TR-981, University of Cambridge, Computer Laboratory, May 2023. PhD thesis. [ bib | doi | .pdf | abstract ]

Papers

CN: Verifying systems C code with separation-logic refinement types. Christopher Pulte, Dhruv C. Makwana, Thomas Sewell, Kayvan Memarian, Peter Sewell, and Neel Krishnaswami. In POPL 2023. [ bib | doi | project page | pdf | abstract ]
VIP: Verifying Real-World C Idioms with Integer-Pointer Casts. Rodolphe Lepigre, Michael Sammler, Kayvan Memarian, Robbert Krebbers, Derek Dreyer, and Peter Sewell. In POPL 2022, Proc. ACM Program. Lang. 6, POPL, Article 20. [ bib | doi | supplementary material | project page | pdf | abstract ]
RefinedC: Automating the Foundational Verification of C Code with Refined Ownership Types. Michael Sammler, Rodolphe Lepigre, Robbert Krebbers, Kayvan Memarian, Derek Dreyer, and Deepak Garg. In PLDI 2021. [ bib | pdf | abstract ]
Cerberus-BMC: a Principled Reference Semantics and Exploration Tool for Concurrent and Sequential C. Stella Lau, Victor B. F. Gomes, Kayvan Memarian, Jean Pichon-Pharabod, and Peter Sewell. In CAV 2019. [ bib | project page | pdf | abstract ]
Exploring C Semantics and Pointer Provenance. Kayvan Memarian, Victor B. F. Gomes, Brooks Davis, Stephen Kell, Alexander Richardson, Robert N. M. Watson, and Peter Sewell. In POPL 2019, Proc. ACM Program. Lang. 3, POPL, Article 67. Also available as ISO/IEC JTC1/SC22/WG14 N2311. [ bib | doi | supplementary material | project page | pdf | abstract ]
Into the depths of C: elaborating the de facto standards. Kayvan Memarian, Justus Matthiesen, James Lingard, Kyndylan Nienhuis, David Chisnall, Robert N.M. Watson, and Peter Sewell. In PLDI 2016, PLDI 2016 Distinguished Paper award. [ bib | doi | project page | pdf | http | abstract ]
An operational semantics for C/C++11 concurrency. Kyndylan Nienhuis, Kayvan Memarian, and Peter Sewell. In OOPSLA 2016. [ bib | doi | pdf | http | abstract ]

C Memory Object Model Study Group

github repo

The C/C++ Memory Object Models: ISO WG21 papers

WG21 Cologne meeting, July 2019

C/C++ Memory Object Model Papers - Introduction (P1797R0). Peter Sewell ISO/IEC JTC1/SC22/WG21 P1797R9. June 2019.
Effective Types: Examples (P1796R0). Peter Sewell, Kayvan Memarian, Victor B. F. Gomes, Jens Gustedt, Hubert Tong. June 2019

The C Memory Object Model: ISO WG14 papers

WG14 virtual meeting, 2022-07

N3005: A Provenance-aware Memory Object Model for C. Working Draft Technical Specification. Jens Gustedt, Peter Sewell, Kayvan Memarian, Victor B. F. Gomes, and Martin Uecker. ISO/IEC JTC1/SC22/WG14 N3005. 2022-06-15.

WG14 virtual meeting, 2021-03

cmom-0006: Clarifying uninitialised reads v5 - working draft Kayvan Memarian, Peter Sewell. University of Cambridge 2021-03-09

WG14 virtual meeting, 2020-10

N2577: A Provenance-aware Memory Object Model for C. Working Draft Technical Specification. Jens Gustedt, Peter Sewell, Kayvan Memarian, Victor B. F. Gomes, and Martin Uecker. ISO/IEC JTC1/SC22/WG14 N2577. September 30, 2020.

WG14 London meeting, 2019-04

N2378: C provenance semantics: slides (extracts from N2363). Peter Sewell, Kayvan Memarian, Victor B. F. Gomes, Jens Gustedt, and Martin Uecker. ISO/IEC JTC1/SC22/WG14 N2378. April 2019.
N2362: Moving to a provenance-aware memory object model for C, Jens Gustedt, Peter Sewell, Kayvan Memarian, Victor B. F. Gomes, and Martin Uecker. ISO/IEC JTC1/SC22/WG14 N2362 v1, http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2362.pdf, March 2019.
N2363: C provenance semantics: examples, Peter Sewell, Kayvan Memarian, Victor B. F. Gomes, Jens Gustedt, and Martin Uecker. ISO/IEC JTC1/SC22/WG14 N2363, http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2363.pdf, April 2019.
tarball of N2363 examples
N2364: C provenance semantics: detailed semantics (for PNVI-plain, PNVI address-exposed, PNVI address-exposed user-disambiguation, and PVI models), Peter Sewell, Kayvan Memarian, and Victor B. F. Gomes. ISO/IEC JTC1/SC22/WG14 N2364, http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2364.pdf, April 2019.
N2369: Pointer lifetime-end zap, Paul E. McKenney, Maged Michael, and Peter Sewell. ISO/IEC JTC1/SC22/WG14 N2369, http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2369.pdf, April 2019.

N2311

N2311: Exploring C Semantics and Pointer Provenance. Kayvan Memarian, Victor B. F. Gomes, Brooks Davis, Stephen Kell, Robert N. M. Watson, Peter Sewell. Identical to the above POPL 2019 paper text.

WG14 Pittsburgh meeting 2018-10

N2294: C Memory Object Model Study Group: Progress Report. Peter Sewell. 2018-09-16
N2263: Clarifying Pointer Provenance v4 (N2263 at WG14)

WG14 Brno meeting working drafts 2018-04

WG14 Brno pre-meeting mailing, 2018-04

N2223: Clarifying the C Memory Object Model: Introduction to N2219 - N2222
N2219: Clarifying Pointer Provenance (Q1-Q20) v3
N2220: Clarifying Trap Representations (Q47) v3
N2221: Clarifying Unspecified Values (Q48-Q59) v3
N2222: Further Pointer Issues (Q21-Q46)
Structure and Union Padding (Q60-Q72): Section 3.3 of our revised N2013
Effective Types (Q73-Q81): Section 4 of our revised N2013

WG14 Pittsburgh meeting, 2016-10

WG14 London meeting, 2016-04

Here is a tarball of our test programs, including a tool to run them and log the compile- and run-time output:

defacto_tarball.tar

Surveys

In 2013, we disseminated a web form made of 42 questions, ranging over the semantics of pointers, the representation of objects, and the interaction of the two with other values (e.g. through the use of cast operators). Each question consisted of a prose description of a programming idiom, followed by a concrete C program. We targeted this to a small number of experts, but this survey demanded a lot offrom the respondents; it was best done by discussion in person over several hours - so this was limited to 16 participants. A (no longer active) snapshot of the questions is here.

In April-September 2015 we distributed a survey of 15 questions about C (What is C in practice? (Cerberus survey v2)). We were asking what C is in current mainstream practice: the behaviour that programmers assume they can rely on, the behaviour provided by mainstream compilers, and the idioms used in existing code, especially systems code. We were explicitly not asking what the ISO C standard permits, which is often more restrictive, or about obsolete or obscure hardware or compilers. We focussed on the behaviour of memory and pointers. We had responses from 323 people, including many compiler and OS developers. Here is a summary of the results and the (anonymised) detailed comments:

What is C in practice? (Cerberus survey v2): Analysis of Responses, Kayvan Memarian and Peter Sewell. (WG14 N2014)
What is C in practice? (Cerberus survey v2): Analysis of Responses - with Comments, Kayvan Memarian and Peter Sewell. (WG14 N2015)

People

Main contributors:

The main Cerberus developer is Kayvan Memarian. Victor Gomes made substantial contributions across the system, and Stella Lau was the main developer of Cerberus BMC. Kyndylan Nienhuis worked on the operational semantics for C11 concurrency. Cerberus originated with Justus Matthiesen's 2010-11 Part II project dissertation and his 2011-12 MPhil dissertation. James Lingard's 2013-14 MPhil dissertation developed a certifying translation validator for simple C programs for the Clang front-end, w.r.t. the Cerberus and Vellvm semantics. David Chisnall and Robert N. M. Watson contributed to the discussion of CHERI semantics. The experimental RefinedC verification system, which uses the Cerberus front-end, is by Michael Sammler, Rodolphe Lepigre, Robbert Krebbers, Kayvan Memarian, Derek Dreyer, and Deepak Garg. The experimental CN verification system, which uses the Cerberus front-end and elaboration into Core (adapted), is by Christopher Pulte, Dhruv C. Makwana, Thomas Sewell, Kayvan Memarian, Peter Sewell, and Neel Krishnaswami.

Funding

This software was developed largely within the Rigorous Engineering of Mainstream Systems (REMS) group at the University of Cambridge. It has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No 789108, ELVER); the EPSRC Programme Grant REMS: Rigorous Engineering of Mainstream Systems (EP/K008528/1); an EPSRC Leadership Fellowship EP/H005633 (Sewell); a Gates Cambridge Scholarship (Nienhuis); an MIT EECS Graduate Alumni Fellowship (Lau); and Google. The work is also part of the CTSRD projects sponsored by the Defense Advanced Research Projects Agency (DARPA) and the Air Force Research Laboratory (AFRL), under contract FA8750-10-C-0237. The views, opinions, and/or findings contained in the papers are those of the authors and should not be interpreted as representing the official views or policies, either expressed or implied, of the Department of Defense or the U.S. Government.