Multiprocessors are now pervasive and concurrent programming is becoming mainstream, but typical multiprocessors (x86, Sparc, Power, ARM, Itanium) and programming languages (C, C++, Java) do not provide the sequentially consistent shared memory that has been assumed by most work on semantics and verification. Instead, they have subtle relaxed (or weak) memory models, exposing behaviour that arises from hardware and compiler optimisations to the programmer. Moreover, these memory models have usually described only in ambiguous (and sometimes flawed) prose, leading to widespread confusion. This page collects work by a group of people working to develop mathematically rigorous and usable semantics for multiprocessor programs. We are focussing on three processor architectures (x86, Power, and ARM), on the recent revisions of the C++ and C languages (C++11 and C11), and on reasoning and verification using these models.
University of Cambridge:
University of Kent:
University of York:
University of Leicester:
University College London:
Microsoft and MSR Cambridge:
University of Pennsylvania
For x86, we have produced the x86-TSO model. To the best of our knowledge, x86-TSO is sound with respect to actual processor behaviour, matches the current vendor intentions, and is a good model to program above. The model is described in the CACM paper below, written to be accessible for a broad audience, and formally defined in the TPHOLs 2009 paper below. The POPL 2009 paper gives a rather different model, x86-CC, that was based on the misleading vendor documentation of the time but that is not sound with respect to observable behaviour.
The low-level assembly-language implementations of the abstractions that support language-level concurrency, such as locks and concurrent datastructures, are particularly interesting and challenging to reason about because they invariably contain data races. The ECOOP 2010 paper below develops a novel principle for reasoning about assembly programs on our previous x86-TSO memory model, uses it to analyze concurrency abstraction implementations of two spinlocks (from Linux), a non-blocking write protocol, the double-checked locking idiom; and java.util.concurrent's Parker. This triangular race freedom principle strengthens the usual data-race freedom style of reasoning.
Power and ARM have broadly similar relaxed-memory behaviour, as described in the tutorial below. We have produced several formal models. The PLDI 2011 paper below gives an abstract-machine model for Power that, to the best of our knowledge, accurately captures the architectural intent and observable processor behaviour for a wide range of subtle examples. This model was principally validated against Power but we believe is broadly applicable also to ARM. It was revised and extended to support the Power load-reserve/store-conditional operations as described in Sections 2 and 3 of our PLDI 2012 paper below. The CAV 2012 paper gives an axiomatic model that is provably equivalent to the PLDI 2011 operational model. The TACAS 2011 tool paper describes our Litmus experimental testing tool for testing the behaviour of hardware; the associated diy tool suite supports generating and running litmus tests for Power, ARM, and x86. The CAV 2010 paper gives an earlier axiomatic model that is sound with respect to current POWER processor behaviour, as far as we know, but does not match the architectural intent for some examples (as described in the PLDI 2011 paper). It is based on a general framework developed in Alglave's thesis. The DAMP 2009 paper gives a very preliminary axiomatic model, based on a naive reading of the vendor documentation. See also more details of the CAV and DAMP work.
C and C++ are defined by standards, but those standards have historically not covered the behaviour of concurrent programs, motivating an ongoing effort to specify concurrent behaviour in the new revisions of C and C++, C11 and C++11 [WG14, WG21, Boehm and Adve, PLDI 2008]. The key issue here is the multiprocessor relaxed-memory behaviour induced by hardware and compiler optimisations. The proposals aim to provide strong guarantees for race-free programs, together with new (but subtle) relaxed-memory atomic primitives for high-performance concurrent code. However, the draft standards (e.g. the Final Committee Draft (N3092), are prose documents: while the result of careful deliberation, they have almost inevitably been unclear on some points, and have been subject to some subtle semantic flaws. The POPL 2011 paper below gives a mathematically rigorous semantics for C++ concurrency. To the best of our knowledge, this captures the intent of the revised standards texts, incorporating changes that we suggested; see the further details for the current full version of the model. We hope that this will aid discussion of any further changes to the standard, provide an unambiguous correctness condition for compilers, and give a basis for analysis and verification of concurrent C and C++ programs. As a starting point for the latter, the paper also proves correctness of a compilation scheme from C/C++11 concurrency to x86-TSO. The PPDP 2011 paper uses the Isabelle Nitpick tool to efficiently explore C/C++11 executions.
In these POPL 2012 and PLDI 2012 papers we prove correctness of the proposed compilation scheme from C/C++11 concurrency primitives to Power (again, the situation for ARM is similar). The first deals with the various forms of C/C++11 atomic accesses with relaxed memory orders. The second extends this to cover other synchronisation operations including C/C++11 read-modify-write operations, locks, and fences; it includes an operational model for the Power load-reserve/store-conditional operations which are used to implement the former.
CompCertTSO is a compiler that generates x86 assembly code from ClightTSO, a large subset of the C programming language enhanced with concurrency primitives for thread management and synchronisation, and with a TSO relaxed memory model. The development is based on the CompCert 1.5 compiler from sequential Clight to PowerPC and ARM (developed by Leroy et al., INRIA). The CompCertTSO compiler is written mostly within the specification language of the Coq proof assistant, and its correctness --- the fact that, for well-defined source programs, any behaviour of the generated code (with x86-TSO semantics) is permitted by the source-language semantics --- has been proved within Coq. The POPL 2011 paper and draft below describe some of this work, and the SAS 2011 paper below describes further work on verified fence elimination.
Current proposals for concurrent shared-memory languages, including C++ and C, give well-defined semantics (in fact, sequential consistency) only for programs without data races; this is known as the DRF guarantee. The correctness of compiler optimisations under the DRF guarantee is not completely clear, and experience with Java shows that this area is error-prone. The PLDI 2011 paper below gives a rigorous study of optimisations that involve both reordering and elimination of memory reads and writes, covering many practically important optimisations. It also discusses some surprising limitations of the DRF guarantee. Earlier work by Ševčík and Aspinall, in ECOOP 2008, showed that the Java Memory Model disallows some standard optimisations, including some performed by the Hotspot JVM. See also more details.
The paper below develops a program logic for x86-TSO assembly programs, extending rely-guarantee reasoning. It is formalised in HOL4 and illustrated with Simpson's 4-slot algorithm.
We acknowledge funding from: