Synchronising C/C++ and POWER

Shared memory concurrency relies on synchronisation primitives: compare-and-swap, load-reserve/store-conditional (aka LL/SC), language-level mutexes, and so on. In a sequentially consistent setting, or even in the TSO setting of x86 and Sparc, these have well-understood semantics. But in the very relaxed settings of Power, ARM, or C/C++, it remains surprisingly unclear exactly what the programmer can depend on.

This paper studies relaxed-memory synchronisation. On the hardware side, we give a clear semantic characterisation of the load-reserve/store-conditional primitives as provided by Power multiprocessors, for the first time since they were introduced 20 years ago; we cover their interaction with relaxed loads, stores, barriers, and dependencies. Our model, while not officially sanctioned by the vendor, is validated by extensive testing, comparing actual implementation behaviour against an oracle generated from the model, and by detailed discussion with a senior IBM Power designer/architect. We believe the ARM semantics to be similar.

On the software side, we prove sound a proposed compilation scheme of the C/C++ synchronisation constructs to Power, including C/C++ spinlock mutexes, fences, and read-modify-write operations, together with the simpler atomic operations for which soundness is already known; this is a first step in verifying concurrent algorithms that use load-reserve/store-conditional w.r.t. a realistic semantics. We also build confidence in the C/C++ model in its own terms, fixing some omissions and contributing to the C standards committee adoption of the C++11 concurrency model.

Papers

Synchronising C/C++ and POWER. Susmit Sarkar, Kayvan Memarian, Scott Owens, Mark Batty, Peter Sewell, Luc Maranget, Jade Alglave, Derek Williams. In PLDI 2012.

Errata

See here.

Supplementary material

The models are expressed in Lem code, a lightweight language for executable mathematical definitions.

POWER Model (for more details of the model without load-reserve/store-conditional, see Understanding POWER Multiprocessors)

C++11 Models (for more details of the core model see Mathematizing C++ Concurrency, though this is revised)

atomic_deadlock.lem

Proofs

statement.lem The main correctness statement of compilation, Theorem 3
intermediate_actions.lem Defining a type of “intermediate actions”, to relate C++11 actions and POWER trace events
power_thread_lemmas.lem Key properties of the POWER thread semantics used extensively in the proofs below
build_sc_order.lem We can always build a linear SC order from the POWER trace with properties necessary for a C++11 SC order
power_to_consistent_ex.lem Analysis of a POWER trace with respect to derived C++11 relations
c_add_read.lem Lemmas about adding reads to a partial C++11 execution
positive.txt The positive direction
proof.lem The top level of the proof, using the preceding results
locks.lem The proof of Theorem 2
mapping.lem The proof of Theorem 1
cppproof.lem Equivalence proof for C++ model with simplified lock order relation

Examples mentioned in the paper

TODO: collect links for those examples

...linux add-return example
C++11 source and Power version of store-conditional forwarding example
- rmw-dob.c
links to test pages: MP+lwsync+porr MP+poaa+addr MP+eieioaa+addr 2+2W+lwsync+rfi-data MP+eieio+addr WRC+eieio+addr
C++11 sources for the 7 examples of Section 4 mp-mtx.c rmw-mo.c mp-acq-fence.c mp-rel-fence.c sb-sc.c sb-1fence-sc.c sb-2fence-sc.c
SB+locks

Experimental results We compare our model of the Power architecture and three generations of Power machines: Testing load-reserve and store conditional Testing the eieio instruction People University of Cambridge: Mark Batty Kayvan Memarian Scott Owens Susmit Sarkar Peter Sewell Oxford University: Jade Alglave INRIA Paris-Rocquencourt: Luc Maranget IBM: Derek Williams [validate]