Susmit Sarkar - ss265@st-andrews.ac.uk Ali Sezgin - ali.sezgin@cl.cam.ac.uk We begin by recalling the previous result stating that the compilation scheme as given in [POPL'12] is correct. "Theorem" [POPL'12, PLDI'12]: Given a C11 program C which is free of data races, unsequenced races, and unitialised races [etc], for a threadwise good compiler following the mapping, the compiled Power program P has no more behaviour (in the old Power model) than is allowed under the (old) C11 model. This is now (between submission and publication of the POPL 2017 paper) known to be incorrect (see [47, 48, 49] from the POPL 2017 paper), but the following argument, that built on top of it, should not be affected by any reasonable fix. The POPL'12/PLDI'12 result was limited to the case where all accesses were of a single size. In going from single-sized accesses to mixed-size accesses, there are (conceptually) two kinds of new features: first, not all accesses are of the same size, and second, two accesses may have overlapping but not identical footprints. Let us handle them step by step by defining three classes of C11 programs: 1. All accesses are of the same size (call this Unisize); 2. All accesses to the same location (atomic or non-atomic) are of the same size as that location (call this Nonoverlapping); 3. Accesses to a non-atomic location may be differently sized (Full Mixed-Size). In [POPL'12,PLDI'12], the correctness of compilation in the case where all programs were Unisize was shown, i.e. Given a Unisized C11 program Cuni which is free of data races, unsequenced races, and unitialised races [etc], for a threadwise good compiler following the mapping, the compiled Power program Puni has no more behaviour (in the unisized Power model) than is allowed under the (old, unisize) C11 model. By inspection of the proof structure, we can actually show: Lemma 1: Correctness of compilation for Nonoverlapping: Given a Nonoverlapping C11 program Cno which is free of data races, unsequenced races, and unitialised races [etc], for a threadwise good compiler following the mapping, the compiled Power program Pno has no more behaviour (in the new Power model) than is allowed under the (new mixed-size) C11 model. Proof sketch: This is basically straightforward, rechecking all the cases in the new Power or C11 model as appropriate. It is a crucial simplification that the new mixed-size C11 does not allow mixed-size atomic accesses. QED So now we need the same correctness theorem for Full Mixed-Size programs. Main Theorem: Given a Mixed-Size C11 program Cms which is free of data races, unsequenced races, and unitialised races [etc], for a threadwise good compiler following the mapping, the compiled Power program Pms has no more behaviour (in the new Power model) than is allowed under the (new mixed-size) C11 model. Proof: For any such Cms, we can create a corresponding Nonoverlapping program Cno by appropriately splitting each overlapping non-atomic access (in extremis, this would be split into bytes, as mentioned in the main text). Claim I: If Cms is free of data races, unsequenced races (etc) then so is Cno. See later for proof of Claim I. Claim II: The Power compilation Pno of Cno following the scheme has no more behaviour than Cno. Proof of Claim II immediate by Lemma 1, using Claim I to satisfy the premise. Claim III: The behaviour (really, set of Power traces in new Power model) of Pno always includes at least one element of a certain set of traces called good traces of Pno. Definition: "Good traces" are traces of the program Pno where all transitions corresponding to the same (mixed-size C11) access are together. Proof of Claim III is by the observation that all the accesses created in Cno by splitting a single access of Cms are program-order adjacent to each other on the same thread. So it is always possible to do all transitions corresponding to that set as a block. Note that not all traces are good. Claim IV: All behaviour (set of Power traces as above) of Pms (the Power compilation of Cms) are observationally equivalent to good traces of Pno. Proof of Claim IV by induction on the traces. At each step of the induction, for each transition that a state can make, there is one or more transitions that can be made such that the resulting state is observationally equivalent to the corresponding resulting state. Thus in fact we have a bisimulation. Claim V: Any (C11) consistent execution of Cno corresponds to a (C11) consistent execution of Cms. Proof of Claim V is a pure C11 level proof, by checking all the different clauses of the consistent-execution predicate piecewise. Putting together Claims I-V, we get the required proof. Just to spell that out: by Claim IV, for any (new Power) traces of Pms, we can get a corresponding good new-Power trace of Pno, at least one of which exists by Claim III. By Claim II, any such good trace can be related to a consistent (new C11) execution of Cno, which by Claim V is also a consistent new-C11 execution of Cms. QED (with Claim I left to prove) So, the remaining part of the proof is the one of Claim I. It is obvious that a race in Cms implies a race in Cno, but the reverse direction is the one we need. The point is that Cno can have a consistent but racy execution which cannot be related to a execution of Cms. Eg a byte of access A1 and a byte of access A2 have a data race between them in one execution, but other parts of A1 and A2 are related somehow by happens-before. We have to show that in this case we can always find a different C11 execution in which all of A1 is unrelated to all of A2. Proof of Claim I: So consider a racy execution of Cno. There are various cases (unsequenced, uninitialised, data races), but the argument is similar in all. For the data race case, there are two accesses A1s and A2s, at least one of which is non-atomic, at least one of which is a write, they are to the same location, and they are happens-before unrelated. For the accesses in Cms they came from (A1 and A2), trivially at least one of *those* is non-atomic, at least one is a write, and they are to overlapping locations. To show that we can make A1 and A2 happens-before unrelated, we have to argue about what happens if some other part of A1 and A2 (say A1h and A2h) are related by happens-before in the racy execution of Cno. The argument is that the happens-before cannot be "forced" in any way, since if A1h and A2h are sequenced-before related, so will be A1s and A2s. Alternatively, if A1h and A2h are on different threads, there must be a inter-thread happens-before shape between them. Any such will also relate A1s and A2s, again leading to a contradiction. QED References: [POPL'12] - Clarifying and Compiling C/C++ Concurrency: from C++11 to POWER, by Batty, Memarian, Owens, Sarkar, Sewell. POPL 2012. [PLDI'12] - Synchronising C/C++ and POWER, by Sarkar, Memarian, Owens, Batty, Sewell, Maranget, Alglave, Williams. PLDI 2012.