We extend the proof for load and store operations by adding the C++
locks, RMWs and fences. We introduce a new POWER trace relation to
track the presence of RMWs. These additions change the definition of
happensbefore and therefore affect its caracterisation. The proof
goes by showing that despite these modifications of machine_ithb,
propBefore_lemma and its corollaries still hold.
################################## New labels ##################################
We now have loadreserves and storeconditionals in POWER. This introduces two
new kinds of labels: "loadreserve commit labels" and "successful
storeconditional commit labels". From now on, when referring to a "commit
label" we include these; when referring to a "write commit label" we
include successful storeconditional commits, and similarly for loads and
loadreserves.
################################# New relation #################################
[rmw]
(R*,W*) IN rmw iff
1) R* is a loadreserve commit label
2) W* is a successful storeconditional commit label
3) (R*,W*) IN po
4) there is no successful commit label of a store conditional E, such that
(R*,E) IN po and (E,W*) IN po
######################## Assumptions about the compiler ########################
(1) The only POWER loads/stores writing to a lock location are the ones
implementing locks and unlocks.
(2) The only stores writing the value stating that a lock is free are the ones
implementing unlocks.
(3) The only stores writing the value stating that a lock is taken are the
storeconditionals implementing locks.
(4) At the beginning of any execution, all lock locations are initialised with
the value stating that a lock is free.
################################## Some lemmas #################################
LEMMA rf_rmw_co:
Consider a POWER trace with W a write commit label, R* a loadreserve commit
label and W* a storeconditional commit label such that:
(R*,W*) IN rmw and (W, R*) IN rf
then
(W,W*) IN co.
PROOF
By the action 4a of the "Accept a successful writeconditional request"
transition.
QED
LEMMA rmw_to:
Consier a POWER trace with R* a loadreserve commit label and W* a
successful storeconditional commit label such that:
(R*,W*) IN rmw
then
R* ==to==> W*
PROOF
Since W* succeed, the programorder previous loadreserve must be at the same
location. By the definition of the rmw relation, R* is the programorder
previous loadreserve. We conclude by the 4th precondition of the "Commit
inflight instruction" transition.
QED
################################################################################
################################################################################
Now, for each additions we show the new characterisation of ithb, show that
propBefore_lemma and its corollaries hold, and verify the new properties
specified by the C++ memory model.
1 LOCKS ########################################################################
Let the register r3 contain a lock location, register r4 contain the value
stating that a lock is free and register r5 contain the value stating that a
lock is taken. Lock and unlock actions at the location stored in r3 are
implemented as follows:
 Lock
loop:
lwarx r6,0,r3,1
cmpw r4,r6
bne wait
stwcx. r5,0,r3
bne loop
isync
 Unlock
lwsync
stw r4,0(r3)

When reconstructing a C++ execution from a POWER trace, a total "lockorder"
must be built for each lock location. We take the restriction of the
coherenceorder to the storeconditionals of locks and the stores of unlock
and linearize it. We write (lo) the disjunct union of these orders. We now have
to show some properties from [consistent_locks]:

** Consider L, L' two successful lock actions such that (L,L') IN lo, then
there exists an unlock action U such that (L,U) IN lo and (U,L') IN lo:
By translating that statement to the POWER level, we have:
(R1*, W1*) IN rmw and (R2*, W2*) IN rmw (W1*,W2*) in co
where R1*, W1* and R2*, W2* are respectively the loadreserve and
successful storeconditional commit labels corresponding to L and L'.
In the implementation of a lock, the loadreserve is followed by a test
making sure that the value read states that the lock is free. If this is not
the case, the program wait and the loadreserve is reexecuted later. Here
we know that the storeconditional W2* is committed, hence at some point the
loadreserve read the value stating that the lock is free. The commit
corresponding to that "last" load is R2*, this is direct by the definition
of rmw which makes sure that R2* is the last loadreserve commit label
before W2*.
But also the implementation of locks, W1* stores the value stating that
a lock is taken. Hence R2* cannot read from W1*. Let's name W the store read
by R2*: (W,R2*) IN rf
Note that compilation mapping makes sure that the stores writing at a lock
location the value stating that a lock is free, are the stores implementing
an unlock, hence W correspond to an unlock.
It remains to show that (W1*,W) IN co and (W,W2*) IN co.
The first is exactly "rf_rmw_co_co_1", and the second is direct by lemma
"rf_rmw_co".

** lo is a strict total order of mutex actions:
By the "threadwise_good_theorem" hypothesis and the assumption that any
execution of the original C++ program respects the "good_mutex_use"
predicate, in the POWER trace any store implementing an unlock must be
programorder after a storeconditional implementing a lock. Since memory
accesses on a same thread and at a same location must be committed in order,
we have that at any lock location the coherenceorder starts with a
storeconditional implementing a lock. Then from the previous point, we know
that in the lockorder a lock must follow an unlock.
We translate the reasonning to the POWER level. The reflexivity and
transitivity directly comes from the coherenceorder. It is clear the
reduction we take do not relate nonmutex actions.
When a store W reaches its coherence point, it becomes coherencebefore any
store at the same address seen by the storagesubsystem but that has not yet
reached its coherence point. Also, when a new store to the same address is
committed, it directly becomes coherenceafter W. Hence, for a given
location, by the time the last store at the location seen by the
storagesubsystem reaches its coherence point, the coherence relation
(restricted to the stores at that given location) is total. By the 3rd
action of the "Accept a successful writeconditional request", the
storeconditionals implementing locks reaches their coherence point during
their commit transition. Hence, if for each lock location, the coherence
relation ends with a storeconditional, the derived C++ lockorder is
directy total over mutex actions. However the coherence may end with normal
stores implementing unlocks. For this reason, we need to linearize the
coherence relation when building the C++ lockorder. However note that
during the linearization, we do not add any coherence edge going to a store
that has reached its coherence point, hence no edge going to a
storeconditional (implementing a lock) is added. Nor to a coherence
predecessor of a storeconditional.

** the restriction to mutexes of the happensbefore is a subset of lockorder:
Consider in the reconstructed C++ execution, (E1, E2) IN hb, such that E1
and E2 are mutex actions.
There are two ways for the hbedge to exists. Either from (E1, E2) IN sb or
from (E1, E2) IN lo with E1 an unlock and E2 a successful lock. In the
second case we are done. In the first case, we have three subcases:
1) E1 and E2 are both unlocks:
we know that lo is total over mutex. If we translate the situation to
the POWER level, then we have W1 and W2 the stores respectively
corresponding to E1 and E2, such that (W1, W2) IN po. When W2 commit,
the storagesubsystem adds a coherence edge from W1 to W2. Hence, in
C++ we must have (E1,E2) IN po.
2) E1 is a successful lock and E2 an unlock:
the argument is the same as for 1).
3) E1 and E2 are both successful locks
the argument is the same as for 1).

* The addition of locks, introduces a new way of producing an swedge:
U lo> L
where U is an unlock, L a lock.
At the POWER level, following the discussion of the first point (**) regarding
the values writen and read by the instructions implementing locks and unlocks,
a "loedge" must be a sequence of synchronisation between unlocks and locks
(aternating). By the good_mutex_use predicate satisfied by the original C++
program, any unlock must follow in sequencebefore by a successful lock at
the same location. Hence, by the hypothesis of "good compiler", any store
implementing an unlock must be programorder after a storeconditional
implementing a lock at the same location. In term of the POWER trace
relations using the protection placed before an unlock and after a lock, this
corresponds to a chain of: lwsync; rf;ctrlisync. It is clear that this
included in machine_ithb. Hence the addition of locks doesn't change the
characterization or break the propBefore_lemma.
2 RMWs #########################################################################
* NOTE 1: RMWs do satisfy the "is_read" and "is_write" predicates, but since the
mapping places the same protections as for the corresponding atomic
loads and stores this does not affect the definition of machine_ithb.
* The addition of RMWs however changes the shape of releasesequences, and
therefore affect the corresponding part of machine_ithb. Previously the
relation was:
( Xsync^refl; coi^refl; rfe; (ctrlisync + dd)^refl )^+
the modified version is:
( Xsync^refl; coi^refl; rfe; (rmw; rf)^*; (ctrlisync + dd)^refl )^+
Justification of the modification 
RMWs can appear inside of a release sequence, that is a RMW can be followed by
a normal write in a release sequence. But that following write (W') must be
from the thread of the release head (W), hence the corresponding commit labels
are such that: (W, W') IN coi.
RMWs can also be on the same thread as the release head, but in that case its
storeconditional is coirelated to the store corresponding to the release
head.
Hence we can forget about these two kinds of RMWs.
Either there is no other kind of RMWs in the release sequence, and the "rfe"
points to the acquire read, or there is a RMWs chain ending the
releasesequence without coming back the thread of release head. In that case,
the "rfe" points to the loadreserve of the first RMW of the chain and
(rmw; rf)^* is not empty. The "rmw" related the loadreserve and
storeconditional pair of a single RMW and the "rf" relates a
storeconditional to the loadreserve of the next RMW in the chain. The final
"rf" points to the acquire read. Note that the last RMW can be on the thread
of the acquire read.

* [propBefore_lemma] still holds
LEMMA 1:
In any POWER trace, the relation rf; (rmw; rf)*; rmw is a subset of co.
PROOF
The proof is by induction on the (.)*.
* base case:
rf
W1 > R2*

 rmw
v
W3*

This is exactly the result of "rf_rmw_co".
* inductive step:
rf
W1 > R2*

rmw 
v rf
W3* > R4*
 (rmw; rf)*
 .> R5*
_/ 
 rmw
v
W6*

By the induction hypothesis we have:
[1] (W3*, W6*) IN co
By "rf_rmw_co" we have that (W1, W3*) IN co, we then conclude by
transitivity of the coherence relation.
QED
LEMMA [xsync_groupA_2]
Consider a POWER trace with E1 and E2, two memory commit labels such that:
(E1,E2) IN xsync
and W1 a write commit labels W1 and W2* a storeconditional commit label
such that:
(a) W1 = E1 or (W1,E1) IN propBefore
(b) (E2,W2*) IN po; rfe; (rmw; rf)*; rmw
Then for any thread t,
if the label W2*_prop_t exists, then:
there exits W a write commit label either equal to W1 or (W1, W) IN co,
such that the label W_prop_t exists and W_prop_t ==to==> W2*_prop_t.
PROOF
The proof is by structural induction over (rmw; rf)^*:
* base case:
W1 .
 propBefore + =
. E1 <'
xsync 
'> E2

po 
v rfe
W3 > R4*

 rmw
v
W2*

By the definition of the Xsync, there exists a barrier commit label S from
the same thread as E1 and E2 such that :
E1 ==to==> S ==to==> E2
From the definition of the propBefore relation, "lastSat_to" and the
transitivity od the trace order, it is easy to see that W1 is in the
group A of S.
Since S and E2 are on the same thread they are programorder related, and
using (b) and by transitivity of the programorder relation, we have:
(S, W2) IN po
We must have S ==to==> W2 by the CommitPrevBarrierLS precondition of
Commit inflight instruction.
By "rf_propBefore" on (W3,R4*), we have:
(W3,R4*) IN propBefore
Hence, by the 3rd precondition of the transition "Propagated write to
another thread", we have that the barrier S propagates to the thread of
R4* and W2*, before W3 does.
Hence, by the 2nd precondition of the "Propagated a barrier to another
thread" transition, we have that there exists some write commit label W,
either equal to W1 or (W1, W) IN co, such that W propagates to the thread
of R4* and W2*, before S does.
By "rmw_to", we have:
R4* ==to==> W2*
We conclude by using the 3rd precondition of the "Propagated write to
another thread" and the 2nd precondition of the "Propagated a barrier to
another thread" transitions regarding a possible propagation of W2* to
some thread t.
* inductive step:
W1 .
 propBefore + =
. E1 <'
xsync 
'> E2

po 
v rfe
W3 > R4*
 (rmw; rf)*
 .> R5*
_/ 
rmw 
v rf
W6* > R7

 rmw
v
W2

By the definition of the rmw relation, we have:
(R5*, W6*) IN co
By "rf_propBefore" on (W6*,R7), we have:
(W6*,R7) IN propBefore
Hence using the induction hypothesis, we have:
there exits W a write commit label either equal to W1 or (W1, W) IN co,
such that the label W_prop_"thread of R7, W2" exists and
W_prop_"thread of R7, W2" ==to==> W6*_prop_"thread of R7, W2".
Using the proof of the base case and the Acumulativity of POWER, we know
that there is a barrier E1 and E2 that propagates to the thread of R7
and W2, before W6* does. Then we show similarly to the base case to
show the propagation of W1 or a coherence successor to the thread of R7* and
W2 before W6* does. We conclude using "rmw_to" and the transitivity of the
traceorder.
QED
LEMMA [base_propBefore']:
Consider a POWER trace with E a memory commit label, W1 and W2 two write
commit labels, R3* a loadreserve commit label and R a read commit label
such that:
 (E,W1) IN xsync + =
 (W1,W2) IN coi + =
 (W2, R3*) IN rfe
 (R3*,R) IN (rmw; rf)^*
Then:
* if E is a write, then (E, R) IN propBefore
* if E is a read, for any write commit label W', such that:
(W',E) IN propBefore, then (W',R) IN propBefore
PROOF
The proof is by case analysis on the structure of [.]^* relating R3* to R.
* reflexive case:
E .
 xsync + =
. W1 <'
coi + =  rfe
'> W2 > R

The is exactly the base case of the original "propBefore_lemma"
* step case
E .
 xsync + =
. W1 <'
coi + =  rfe
'> W2 > R3*
 (rmw; rf)*
 .> R4*
_/ 
rmw 
v rf
W5* > R

Using "rf_propBefore" on (W5*,R), we have:
[1] (W5*, R) IN propBefore
If E is a write, then either we have E=W1 or there (E,W1) IN xync. In the
first case, W2 is a coherence successor of E, and by "lemma 1" W5* is a
coherence successor of W2, hence by transivity W5* is a coherence
successor of E and we are done by [1]. In the second case, by
"xsync_groupA_2" and [1] we have:
there exists W a write commit label either equal to E or (E,W) IN co,
such that the label W_prop_"thread of R" exists and
W_prop_"thtrad of R" ==to==> W5*_prop_"thread of R".
The corresponds to the definition of the propBefore relation.
If E is a read, then we must have (E,W1) IN xsync. Using "xsync_groupA_2",
we have that:
for any W such that (W,E) IN propBefore, there exists W' a write commit
label either equal to W or (W,W') IN co, such that the label
W'_prop_"thread of R" exists and
W'_prop_"thread of R" ==to==> W5*_prop_"thread of R".
QED
The previous corresponds to the base case of propBefore_lemma (one step of the
message passing relation). To show the inductive step, we do exactly like in
the original propBefore_lemma.

* [rmw_atomicity] a RMW must read from its last modificationorder predecessor.
We translate the reasonning to the POWER level. Consider W a write commit
label and R*, W* two loadreserve and storeconditional commit labels,
such that:
(W,R*) IN rf (R*,W*) IN rmw
We first show that (W,W*) IN co. This is direct by lemma "rf_rmw_co".
Second we need to show that at the C++ there is nothing in the modification
order between the action corresponding to W and the one corresponding to W*.
However since modification order is built as a linearization of co, it is
not sufficient to show there is no write commit label between W and W* in
co. Instead it is sufficent to show the two following lemmas:
Consider W' a write commit level.
1) If (W',W*) IN co, then (W',W) IN co
PROOF
By the 2nd precondition of the "Accept a successful writeconditional
request" transition, we know that W reaches its coherence point before W*
is committed.
There are four possible ways for a coherenceedge to be produced:
a) by action 3 of "accept write request";
b) through a "partial coherence commitment";
c) by the action 2 of "write reaches its coherence point";
d) by the action 4a of "accept a successful writeconditional request".
Since W* is a storeconditional, the transitions from a) and b) can only
correspond to W' and they would procude a coherenceedge from W* to W',
which contradict our hypothesis.
The "partial coherence commitment" transition in c) can only perform once
W* is visible to the storagesubsystem. Hence after W* is committed, which
is also the transition during the which W* reaches its coherence point.
But "partial coherence commitment" cannot add an edge going to a write who
has already reached its coherence point.
Hence the coherenceedge from W' to W* must have been produced by the
action 4a of the "accept a successful writeconditional request" of W*,
and therefore W' must reach its coherence point before W* is committed.
As a result, it is clear the W' must be committed before W* is committed.
We now show that W' must be committed before W reaches its coherence point.
Assume the contrary. If W' is a normal store, then action 3 of the
corresponding "accept write request" transition adds a coherenceedge from
W to W'. But then since W* succeed, by the 3rd precondition of its "accept
a successful writeconditional request", W' cannot reach its coherence
point before W* is committed. This is a contradiction. If W' is a
storeconditional then the coherenceedge from W to W' is added by the
corresponding "accept a successful writeconditional request" transition,
and the same argument applies.
Now we can show that in fact W' must also reach its coherence point before
W does. Assume the contrary. If W is a normal store, then action 2 of its
"write reaches its coherence point" adds the same faulty coherenceedge as
in the previous paragraph. If W is storeconditional then already have
shown that it must be committed before W reaches its coherence point, and
a storeconditional reaches its coherence point at its commit transition.
Now either W is committed before W' reaches its coherence point, in which
case if W' is a normal store, then action 2 of its "write reaches its
coherence point" transition adds a coherenceedge from W' to W and we are
done. If W' is a storeconditional, the same edge is added by action 4b of
the corresponding "accept a successful writeconditional request"
transition.
Or W is committed after W' reaches its coherence point and then if W is a
normal store, the coherenceedge we are looking for is added either by
action 3 of the its "accept write request" transition, and if W is a
storeconditional, then the edge is added by action 4a of its "accept a
successful writeconditional request" transition.
QED
2) If (W,W') IN co, then (W*,W) IN co
PROOF
First we show that W is committed before W' reaches its coherence point:
Assume the contrary, either W is a normal store and then its "accept
write request" transition adds a coherenceedge from W' to W which
contradicts our hypothesis; or W is a storeconditional and then
action 4a of its "accept a successful writeconditional request"
transition adds the same bad coherenceedge.
We now show that in fact W reaches its coherence point before W' does:
Assume the contrary, either W' is a normal store and then the action 2
of its "write reaches its coherence point" transition adds the same bad
coherenceedge as in the previous paragraph; or W' is a
storeconditional and then the bad coherenceedge is added by action 4b
of its "accept a successful writeconditional request" transition.
This implies that W' is committed after W reaches its coherence point:
Consider the contrary, then the coherenceedge from W to W' is created
when W reaches its coherence point and we know that W reaches its
coherence point before W* is committed. But then we have a
coherencesuccessor of W reaching its coherence point before W* is
committed which contradict the fact that W* succeed.
This implies that W' is committed after W* is committed:
We just so that when W' is committed, W has already reached its
coherence point. Hence by action 3 of the "accept write request"
transition or by action 4a of the "accept a successful writeconditional
request", the coherenceedge from W to W' is added at the commit of W'.
If W' is committed before W*, then we have the same contradiction as in
the previous paragraph.
Now, either W' is normal store and then action 3 of its "accept write
request" gives us the coherenceedge we want; or W' is a storeconditional
and in that case the coherenceedge is given by action 4a of its "accept a
successful writeconditional request" transition.
QED
3 FENCES #######################################################################
* NOTE 1: Even so a fence do satisfy the "is_acquire" and "is_release"
predicates, the definition of "release_sequence" ensures that a fence
cannot start a release sequence (by the check on the location kind).
Also since a fence cannot be on an rfedge, no fence can play part in
the original rel/acq swedge.
* NOTE 2: Other then for the next point, a fence cannot appear in any of the
instances of "is_acquire" and "is_release" because of NOTE 1.
* The addition of fences, introduces three new ways of producing an swedge:
(1) (2)
a: F_rel a: F_rel
 
 sb  sb
v v
x: W x x: W x
 
 hrs  hrs
v rf v rf
z: W' x > y: R x z: W' x > b: R_acq x

 sb
v
b: F_acq
(3)
a: W_rel x

 rs
v rf
z: W' x > y: R x

 sb
v
b: F_acq
Using the POWER trace relations and the compilation mapping for fences, the
sequence of POWER relations corresponding to these swedges are:
(1) lwsync^refl; po; coi^*; rfe; po; lwsync^refl
(2) lwsync^refl; po; coi^*; rfe; lwsync^refl
(3) lwsync^refl; coi^*; rfe; po; lwsync^refl
By its definition, the lwsyncedges can absorb the poedges, hence the three new
swedges correspond to the following POWER trace relation:
lwsync^refl; coi^*; rfe; lwsync^refl
We therefore need to make another modification (add the ending lwsync instead of
a ctrlisync or a dd) to machineithb:
( Xsync^refl; coi^refl; rfe; (rmw; rf)^*; (ctrlisync + dd + lwsync)^refl )^+
Since the left part of the relation does not change, hence the validity of
propBefore_lemma is not affected. The semantics of the lwsync instruction being
stronger than that of ctrlisync, the two corollaries still hold.
* [sc_fences_heeded] "by construction of the scorder".