The Semantics of x86 Multiprocessor Programs
Multiprocessors are now dominant, but real multiprocessors do not provide the sequentially consistent memory that is assumed by most work on semantics and verification. Instead, they have subtle relaxed (or weak) memory models, usually described only in ambiguous (and sometimes flawed) prose, leading to widespread confusion.
We are working to develop mathematically rigorous and usable semantics for multiprocessor programs, focussing on three architectures: x86 (Intel 64/IA-32 and AMD64), Power, and ARM. Such models should provide a solid intuition for low-level programming, and a sound foundation for future work on verification, static analysis, and compilation of low-level concurrent code, and for the design of high-level concurrent programming languages.
This work is associated with a broader project on Relaxed-Memory Concurrency.
For x86, we have developed the x86-TSO model.
To the best of our knowledge, x86-TSO is sound with respect to actual processor
behaviour, matches the current vendor intentions, and is a good
model to program above. We give two
equivalent definitions of x86-TSO: an intuitive operational model
based on local write buffers, and an axiomatic total store ordering
model, similar to that of the SPARCv8. Both are adapted to handle
x86-specific features. We implemented the axiomatic model in our
The x86-TSO model was introduced and is formally defined in the TPHOLs paper below, together with its accompanying techreport and HOL4 definitions.
The paper below introduces a reasoning principle for programs running above
the x86-TSO model: a generalisation, triangular race freedom
of a data race freedom property. Of particular interest are the low-level implementations of the abstractions that
support language-level concurrency - especially because they invariably
contain data races. Triangular race freedom is used to analyze five concurrency abstraction implementations: two spinlocks (from
Linux); a non-blocking write protocol; the double-checked locking idiom;
The paper below develops a program logic for x86-TSO assembly programs, extending rely-guarantee reasoning. It is formalised in HOL4 and illustrated with Simpson's 4-slot algorithm.
Our earlier x86-CC model, presented at POPL 2009, was based on the then-current Intel and AMD documentation: the Intel 64 and IA-32 Architectures Software Developer's Manual [up to revision 28, Sept. 2008] and AMD64 Architecture Programmer's Manual [vol 2, rev. 3.14, Sept. 2007], and also with the earlier Intel 64 Architecture Memory Ordering White Paper [Aug. 2007]. We tested the semantics against actual processors and the vendor litmus-test examples, and gave an equivalent abstract-machine characterisation of our axiomatic memory model. We proved that the memory model is equivalent to a simpler (`nice execution') definition. For programs that are, in some precise sense, data-race free, we proved in HOL that their behaviour is sequentially consistent. However, that vendor documentation, and hence x86-CC, turned out to be misleading or unsound with respect to the behaviour of actual x86 processors: it forbids behaviour that actual processors exhibit, as detailed in the Addendum to the x86-CC paper below. More recent vendor documentation (revisions 29ff of the Intel SDM) fixes that unsoundness but gives surprisingly weak guarantees to programmers. Accordingly, while the paper may still be of some interest (and the instruction semantics and testing material is still relevant), for most purposes one should focus on the x86-TSO model above.
To reason about multiprocessor machine-code or assembly code programs,
the memory model must be integrated with a semantic definition of the
behaviour of machine instructions.
The HOL developments above for x86 (x86-CC) and for the Power and ARM
(DAMP'09) include such definitions for modest fragments of the
In the absence of fully verified processor implementations, we
establish confidence in these definitions by empirical testing,
comparing the behaviours permitted by the HOL definition against the
behaviours observed in practice running randomly generated
instructions on actual processors. Our
See also the work by Fox and Myreen in the same vein on testing Fox and Gordon's ARM semantics.
Computer Laboratory, University of Cambridge:
University of Leicester:
We acknowledge funding from the EPSRC (grants EP/H005633/1 and EP/F036345), the INRIA equipes associees MM, and ANR grant ANR-06-SETI-010-02.