Susmit Sarkar - ss265@st-andrews.ac.uk
Ali Sezgin - ali.sezgin@cl.cam.ac.uk

We begin by recalling the previous result stating that the compilation
scheme as given in [POPL'12] is correct.

"Theorem" [POPL'12, PLDI'12]:
Given a C11 program C which is free of data races,
unsequenced races, and unitialised races [etc], for a threadwise good
compiler following the mapping, the compiled Power program P has no
more behaviour (in the old Power model) than is allowed under the
(old) C11 model.

This is now (between submission and publication of the POPL 2017
paper) known to be incorrect (see [47, 48, 49] from the POPL 2017
paper), but the following argument, that built on top of it, should
not be affected by any reasonable fix.


The POPL'12/PLDI'12 result was limited to the case where all accesses
were of a single size.

In going from single-sized accesses to mixed-size accesses, there are
(conceptually) two kinds of new features: first, not all accesses are
of the same size, and second, two accesses may have overlapping but
not identical footprints. Let us handle them step by step by defining
three classes of C11 programs:

1. All accesses are of the same size (call this Unisize);

2. All accesses to the same location (atomic or non-atomic) are of the
same size as that location (call this Nonoverlapping);

3. Accesses to a non-atomic location may be differently sized (Full Mixed-Size).


In [POPL'12,PLDI'12], the correctness of compilation in the case
where all programs were Unisize was shown, i.e. Given a Unisized C11 program
Cuni which is free of data races, unsequenced races, and unitialised
races [etc], for a threadwise good compiler following the mapping, the
compiled Power program Puni has no more behaviour (in the unisized
Power model) than is allowed under the (old, unisize) C11 model.


By inspection of the proof structure, we can actually show:

Lemma 1: Correctness of compilation for Nonoverlapping:

Given a Nonoverlapping C11 program Cno which is free of data races,
unsequenced races, and unitialised races [etc], for a threadwise good
compiler following the mapping, the compiled Power program Pno has no
more behaviour (in the new Power model) than is allowed under the (new
mixed-size) C11 model.

Proof sketch:
This is basically straightforward, rechecking all the cases in the new
Power or C11 model as appropriate. It is a crucial simplification that
the new mixed-size C11 does not allow mixed-size atomic accesses.
QED

So now we need the same correctness theorem for Full Mixed-Size
programs.

Main Theorem:
Given a Mixed-Size C11 program Cms which is free of data races,
unsequenced races, and unitialised races [etc], for a threadwise good
compiler following the mapping, the compiled Power program Pms has no
more behaviour (in the new Power model) than is allowed under the (new
mixed-size) C11 model.

Proof:

For any such Cms, we can create a corresponding Nonoverlapping
program Cno by appropriately splitting each overlapping non-atomic
access (in extremis, this would be split into bytes, as mentioned in 
the main text).

Claim I: If Cms is free of data races, unsequenced races (etc) then so
is Cno.

See later for proof of Claim I.

Claim II: The Power compilation Pno of Cno following the scheme has no
more behaviour than Cno.

Proof of Claim II immediate by Lemma 1, using Claim I to satisfy the
premise.

Claim III: The behaviour (really, set of Power traces in new Power
model) of Pno always includes at least one element of a certain set of
traces called good traces of Pno.

Definition: "Good traces" are traces of the program Pno where all
transitions corresponding to the same (mixed-size C11) access are
together.

Proof of Claim III is by the observation that all the accesses created
in Cno by splitting a single access of Cms are program-order adjacent
to each other on the same thread. So it is always possible to do all
transitions corresponding to that set as a block. Note that not all
traces are good.

Claim IV: All behaviour (set of Power traces as above) of Pms (the
Power compilation of Cms) are observationally equivalent to good
traces of Pno. 

Proof of Claim IV by induction on the traces. At each step of the
induction, for each transition that a state can make, there is one or
more transitions that can be made such that the resulting state is
observationally equivalent to the corresponding resulting state. Thus
in fact we have a bisimulation.

Claim V: Any (C11) consistent execution of Cno corresponds to a (C11)
consistent execution of Cms.

Proof of Claim V is a pure C11 level proof, by checking all the
different clauses of the consistent-execution predicate piecewise.

Putting together Claims I-V, we get the required proof. Just to spell
that out: by Claim IV, for any (new Power) traces of Pms, we can get a
corresponding good new-Power trace of Pno, at least one of which
exists by Claim III. By Claim II, any such good trace can be related
to a consistent (new C11) execution of Cno, which by Claim V is also a
consistent new-C11 execution of Cms.

QED (with Claim I left to prove)


So, the remaining part of the proof is the one of Claim I.

It is obvious that a race in Cms implies a race in Cno, but the
reverse direction is the one we need.

The point is that Cno can have a consistent but racy execution which
cannot be related to a execution of Cms. Eg a byte of access A1 and a
byte of access A2 have a data race between them in one execution, but
other parts of A1 and A2 are related somehow by happens-before. We
have to show that in this case we can always find a different C11
execution in which all of A1 is unrelated to all of A2.

Proof of Claim I:

So consider a racy execution of Cno. There are various cases
(unsequenced, uninitialised, data races), but the argument is similar
in all. For the data race case, there are two accesses A1s and A2s, at
least one of which is non-atomic, at least one of which is a write,
they are to the same location, and they are happens-before
unrelated. For the accesses in Cms they came from (A1 and A2),
trivially at least one of *those* is non-atomic, at least one is a
write, and they are to overlapping locations. To show that we can make
A1 and A2 happens-before unrelated, we have to argue about what
happens if some other part of A1 and A2 (say A1h and A2h) are related
by happens-before in the racy execution of Cno. The argument is that
the happens-before cannot be "forced" in any way, since if A1h and A2h
are sequenced-before related, so will be A1s and A2s. Alternatively,
if A1h and A2h are on different threads, there must be a inter-thread
happens-before shape between them. Any such will also relate A1s and
A2s, again leading to a contradiction.

QED


References:
[POPL'12] - Clarifying and Compiling C/C++ Concurrency: from C++11 to
POWER, by Batty, Memarian, Owens, Sarkar, Sewell. POPL 2012.
[PLDI'12] - Synchronising C/C++ and POWER, by Sarkar, Memarian, Owens,
Batty, Sewell, Maranget, Alglave, Williams. PLDI 2012.