: Code-Specify-Test-Debug-Prove: Flexibly Integrating Separation Logic Specification into Conventional Workflows. Zain K Aamer¹, Rini Banerjee¹, Hiroyuki Katsura¹, David Kaloper-Meršinjak¹, Dimitrios J. Economou, Kayvan Memarian, Dhruv Makwana, Neel Krishnaswami, Benjamin C. Pierce, Christopher Pulte, and Peter Sewell. Proc. ACM Program. Lang., 10(PLDI), June 2026. ¹ these authors contributed equally. [ bib | doi | pdf | http ]
We seek to enable more flexible use of rich specifications, in a variety of ways that smoothly extend conventional software development practice: we show how a single specification language based on separation logic, that can capture the subtle ownership disciplines of systems code, can be used for runtime assertion checking, for property-based testing, and for formal machine-checked proof---with each of these complementing and supporting the others. We demonstrate this on a challenging example: a component of a production hypervisor, running both stand-alone at user level and in situ in the hypervisor.
: An Axiomatic Basis for Computer Programming on Relaxed Hardware Architectures: The AxSL Logics. Zongyuan Liu, Angus Hammond, Thibaut Pérami, Peter Sewell, Lars Birkedal, and Jean Pichon-Pharabod. ACM TOPLAS, January 2026. [ bib | doi | pdf | http ]
Very relaxed concurrency memory models, like those of the Arm-A, RISC-V, and IBM Power hardware architectures, underpin much of computing but break a fundamental intuition about programs, namely that syntactic program order and the reads-from relation always both induce order in the execution. Instead, out-of-order execution is allowed except where prevented by certain pairwise dependencies, barriers, or other synchronisation. This means that there is no notion of the ‘current’ state of the program, making it challenging to design (and prove sound) syntax-directed, modular reasoning methods like Hoare logics, as usable resources cannot implicitly flow from one program point to the next.We present AxSL, a family of separation logics for relaxed hardware memory models, and instantiate it on sequential consistency and on the Arm-A memory model. The Arm-A instance captures the fine-grained reasoning underpinning the low-overhead synchronisation idioms used by high-performance systems code. We mechanise AxSL in the Iris separation logic framework, illustrate it on key examples, and prove it sound with respect to the axiomatic memory model of Arm-A.By instantiating AxSL on different memory models, we demonstrate the generality of our approach, and show that it is largely generic in the axiomatic model and in the instruction-set semantics, offering a potential way forward for compositional reasoning for other models, and for the combination of production concurrency models and full-scale ISAs.
: ArchSem: Reusable rigorous semantics of relaxed architectures. Thibaut Pérami, Thomas Bauereiss, Brian Campbell, Zongyuan Liu, Nils Lauermann, Alasdair Armstrong, and Peter Sewell. In POPL 2026. [ bib | doi | pdf | github ]
The specifications of mainstream processor architectures, such as Arm, x86, and RISC-V, underlie modern computing, as the targets of compilers, operating systems, and hypervisors. However, despite extensive research and tooling for instruction-set architecture (ISA) and relaxed-memory semantics, recently including systems features, there still do not exist integrated mathematical models that suffice for foundational formal verification, of concurrent architecture properties or of systems software. Previous proof-assistant work has had to substantially simplify the ISA semantics, the concurrency model, or both.
We present ArchSem, an architecture-generic framework for architecture semantics, modularly combining ISA and concurrency models along a tractable interface of instruction-semantics effects, that covers a range of systems aspects. To do so, one has to handle many issues that were previously unclear, about the architectures themselves, the interface, the proper definition of reusable models, and the Rocq and Isabelle idioms required to make it usable. We instantiate it to the Arm-A and RISC-V instruction-set architectures and multiple concurrency models.
We demonstrate usability for proof, despite the scale, by establishing that the Arm architecture (in a particular configuration) provides a provable virtual memory abstraction, with a combination of Rocq, Isabelle, and paper proof. Previous work provides further confirmation of usability: the AxSL program logic for Arm relaxed concurrency was proved sound above an earlier version of ArchSem.
This establishes a basis for future proofs of architecture properties and systems software, above production architecture specifications.
: Ghost in the Android Shell: Pragmatic Test-oracle Specification of a Production Hypervisor. Kayvan Memarian, Ben Simner, David Kaloper-Meršinjak, Thibaut Pérami, and Peter Sewell. In SOSP 2025. [ bib | doi | pdf ]
Developing systems code that robustly provides its intended security guarantees remains very challenging: conventional practice does not suffice, and full functional verification, while now feasible in some contexts, has substantial barriers to entry and use.
In this paper, we explore an alternative, more lightweight approach to building confidence for a production hypervisor: the pKVM hypervisor developed by Google to protect virtual machines and the Android kernel from each other. The basic approach is very simple and dates back to the 1970s: we specify the desired behaviour in a way that can be used as a test oracle, and check correspondence between that and the implementation at runtime. The setting makes that challenging in several ways: the implementation and specification are intertwined with the underlying architecture; the hypervisor is highly concurrent; the specification has to be loose in certain ways; the hypervisor runs bare-metal in a privileged exception level; naive random testing would quickly crash the whole system; and the hypervisor is written in C using conventional methods. We show how all of these can be overcome to make a practically useful specification, finding a number of critical bugs in pKVM along the way.
This is not at all what conventional developers (nor what formal verifiers) normally do -- but we argue that, with the appropriate mindset, they easily could and should.
: Morello-Cerise: A Proof of Strong Encapsulation for the Arm Morello Capability Hardware Architecture. Angus Hammond, Ricardo Almeida, Thomas Bauereiss, Brian Campbell, Ian Stark, and Peter Sewell. In PLDI 2025. [ bib | doi | pdf ]
When designing new architectural security mechanisms, a key question is whether they actually provide the intended security, but this has historically been very hard to assess. One cannot gain much confidence by testing, as such mechanisms should provide protection in the presence of arbitrary unknown code. Previously, one also could not gain confidence by mechanised proof, as the scale of production instruction-set architecture (ISA) designs, many tens or hundreds of thousands of lines of specification, made that prohibitive.
We focus in this paper especially on the secure encapsulation of software components, as supported by CHERI architectures in general and by the Arm Morello prototype architecture and hardware design in particular. Secure encapsulation is an essential security mechanism, for fault isolation and to constrain untrusted third-party code. It has previously often been implemented using virtual memory, but that does not scale to large numbers of compartments. Morello provides capability-based mechanisms that do scale, within a single address space.
We prove a strong secure encapsulation property for an example of encapsulated code running on Morello, that holds in the presence of arbitrary untrusted code, above a full-scale sequential model of the Morello ISA. To do so, we build on, extend, and unify three orthogonal lines of previous work: the Cerise proof of such an encapsulation property for a highly idealised capability machine, expressed using a logical relation in Iris; the Islaris approach for reasoning about known code in production-scale ISAs; and the T-CHERI security properties of arbitrary Morello code, previously proved only for executions up to domain crossing.
This demonstrates how one can prove such strong properties of security mechanisms for full-scale industry architectures.
: Precise exceptions in relaxed architectures. Ben Simner, Alasdair Armstrong, Thomas Bauereiss, Brian Campbell, Ohad Kammar, Jean Pichon-Pharabod, and Peter Sewell. In ISCA 2025, Best paper award. [ bib | doi | errata | pdf ]
To manage exceptions, software relies on a key architectural guarantee, precision: that exceptions appear to execute between instructions. However, this definition, dating back over 60 years, fundamentally assumes a sequential programmers model. Modern architectures such as Arm-A with programmer-observable relaxed behaviour make such a naive definition inadequate, and it is unclear exactly what guarantees programmers have on exception entry and exit.
In this paper, we clarify the concepts needed to discuss exceptions in the relaxed-memory setting -- a key aspect of precisely specifying the architectural interface between hardware and software. We explore the basic relaxed behaviour across exception boundaries, and the semantics of external aborts, using Arm-A as a representative modern architecture. We identify an important problem, present yet unexplored for decades: pinning down what it means for exceptions to be precise in a relaxed setting. We describe key phenomena that any definition should account for. We develop an axiomatic model for Arm-A precise exceptions, tooling for axiomatic model execution, and a library of tests. Finally we explore the relaxed semantics of software-generated interrupts, as used in sophisticated programming patterns, and sketch how they too could be modelled.
: ISO/IEC TS 6010:2025 Programming languages --- C --- A provenance-aware memory object model for C, ISO/IEC JTC 1/SC 22. ISO, May 2025. Based on N3005: A Provenance-aware Memory Object Model for C. Working Draft Technical Specification ISO/IEC TS 6010:2023 (E). Jens Gustedt, Peter Sewell, Kayvan Memarian, Victor B. F. Gomes, Martin Uecker. http://www.open-std.org/jtc1/sc22/wg14/www/docs/n3005.pdf. June 2022. [ bib | .html ]
This document specifies the form and establishes the interpretation of programs written in the C programming language. It is not a complete specification of that language but builds upon ISO/IEC 9899:2018 by constraining and clarifying the Memory Object Model.
The resolution of DR260 confirmed the concept of provenance of pointers, introduced as means to track and distinguish pointer values that represent storage instances with the same address. Implementations started to use that concept in optimisations relying on provenance-based alias analysis, without it ever being clearly or formally defined, and without it being integrated consistently with the rest of the C standard. This document provides a solution for this: a provenance-aware memory object model for C to put C programmers and implementers on a solid footing in this regard.
In addition to this document, https://cerberus.cl.cam.ac.uk/cerberus provides an executable version of the semantics, with a web interface that allows one to explore and visualise the behaviour of small test programs.
This document does not address subobject provenance.
: Arm systems semantics. Ben Simner. PhD thesis, University of Cambridge, April 2025. [ bib | doi | errata | pdf | http ]
Computing relies on architecture specifications to decouple hardware and software development. Historically these have been prose documents, with all the problems that entails, but research over the last ten years has developed rigorous and executable-as-test-oracle specifications of mainstream architecture instruction sets and `user-mode' concurrency, clarifying architectures and bringing them into the scope of programming-language semantics and verification.
However, the system semantics, of address translation and TLB maintenance, instruction-fetch and its required cache maintenance, and exceptions and interrupts, remains mostly obscure, leaving us without a solid foundation for verification of security-critical systems software.
We develop precise mathematical models, for those aspects of the Arm A-class architecture. We implement these models as executable models, in both microarchitectural-flavoured operational and declarative axiomatic style formats. We validate these models, against currently available hardware through the production and evaluation of hardware test harnesses and test suites, and against the architectural intent through discussions with Arm architects. We give a variety of hand-written and machine-generated litmus tests, exercising parts of the architecture previously unexplored.
We discuss the nature of developing such models, and the challenges that writing specifications of existing systems entails. We briefly touch on how these models have evolved over time, and how we imagine they will evolve in the future as the remaining questions are resolved.
: Comprehensive Formal Verification of Observational Correctness for the CHERIoT-Ibex Processor, Louis-Emile Ploix, Alasdair Armstrong, Tom Melham, Ray Lin, Haolong Wang, and Anastasia Courtney, February 2025. [ bib | arXiv | http ]
The CHERI architecture equips conventional RISC ISAs with significant architectural extensions that provide a hardware-enforced mechanism for memory protection and software compartmentalisation. Architectural capabilities replace conventional integer pointers with memory addresses bound to permissions constraining their use. We present the first comprehensive formal verification of a capability extended RISC-V processor with internally `compressed' capabilities -- a concise encoding of capabilities with some resemblance to floating point number representations. The reference model for RTL correctness is a minor variant of the full and definitive ISA description written in the Sail ISA specification language. This is made accessible to formal verification tools by a prototype flow for translation of Sail into SystemVerilog. Our verification demonstrates a methodology for establishing that the processor always produces a stream of interactions with memory that is identical to that specified in Sail, when started in the same initial state. We additionally establish liveness. This abstract, microarchitecture-independent observational correctness property provides a comprehensive and clear assurance of functional correctness for the CHERIoT-Ibex processor's observable interactions with memory.
: Towards general white-box automation: a typeclass-guided context cleaner. Thibaut Pérami. In CoqPL 2025. [ bib | http ]
: A CHERI C Memory Model for Verified Temporal Safety. Vadim Zaliva, Kayvan Memarian, Brian Campbell, Ricardo Almeida, Nathaniel Wesley Filardo, Ian Stark, and Peter Sewell. In CPP 2025. [ bib | doi | http ]
: It Is Time to Standardize Principles and Practices for Software Memory Safety. Robert N.M. Watson, John Baldwin, David Chisnall, Tony Chen, Jessica Clarke, Brooks Davis, Nathaniel Filardo, Brett Gutstein, Graeme Jenkinson, Ben Laurie, Alfredo Mazzinghi, Simon Moore, Peter G. Neumann, Hamed Okhravi, Alex Richardson, Alex Rebert, Peter Sewell, Laurence Tratt, Murali Vijayaraghavan, Hugo Vincent, and Konrad Witaszczyk. Communications of the ACM, 68(2):40–45, January 2025. [ bib | doi | http ]
Twenty-one authors, spanning academia and industry, with expertise in memory-safety research, deployment, and policy, argue that standardization is an essential next step to achieving universal strong memory safety.
: Fulminate: Testing CN Separation-Logic Specifications in C. Rini Banerjee, Kayvan Memarian, Dhruv Makwana, Christopher Pulte, Neel Krishnaswami, and Peter Sewell. In POPL 2025. [ bib | doi | pdf ]
Separation logic has become an important tool for formally capturing and reasoning about the ownership patterns of imperative programs, originally for paper proof, and now the foundation for industrial static analyses and multiple proof tools. However, there has been very little work on program testing of separation-logic specifications in concrete execution. At first sight, separation-logic formulas are hard to evaluate in reasonable time, with their implicit quantification over heap splittings, and other explicit existentials.
In this paper we observe that a restricted fragment of separation logic, adopted in the CN proof tool to enable predictable proof automation, also has a natural and readable computational interpretation, that makes it practically usable in runtime testing. We discuss various design issues and develop this as a C+CN source to C source translation, Fulminate. This adds checks -- including ownership checks and ownership transfer -- for C code annotated with CN pre- and post-conditions; we demonstrate this on nontrivial examples, including the allocator from a production hypervisor. We formalise our runtime ownership testing scheme, showing (and proving) how its reified ghost state correctly captures ownership passing, in a semantics for a small C-like language.