We extend the proof for load and store operations by adding the C++
locks, RMWs and fences. We introduce a new POWER trace relation to
track the presence of RMWs. These additions change the definition of
happens-before and therefore affect its caracterisation. The proof
goes by showing that despite these modifications of machine_ithb,
propBefore_lemma and its corollaries still hold.

################################## New labels ##################################
We now have load-reserves and store-conditionals in POWER. This introduces two
new kinds of labels: "load-reserve commit labels" and  "successful
store-conditional commit labels". From now on, when referring to a "commit
label" we include these; when referring to a "write commit label" we
include successful store-conditional commits, and similarly for loads and
load-reserves.
################################# New relation #################################
[rmw]
  (R*,W*) IN rmw iff
  
    1) R* is a load-reserve commit label
    2) W* is a successful store-conditional commit label
    3) (R*,W*) IN po 
    4) there is no successful commit label of a store conditional E, such that
         (R*,E) IN po  and  (E,W*) IN po
######################## Assumptions about the compiler ########################
  (1) The only POWER loads/stores writing to a lock location are the ones
      implementing locks and unlocks.
  
  (2) The only stores writing the value stating that a lock is free are the ones
      implementing unlocks.
  
  (3) The only stores writing the value stating that a lock is taken are the
      store-conditionals implementing locks.
  
  (4) At the beginning of any execution, all lock locations are initialised with
      the value stating that a lock is free.
################################## Some lemmas #################################
LEMMA rf_rmw_co:
  Consider a POWER trace with W a write commit label, R* a load-reserve commit
  label and W* a store-conditional commit label such that:
    (R*,W*) IN rmw  and  (W, R*) IN rf
  then
    (W,W*) IN co.
PROOF
  By the action 4a of the "Accept a successful write-conditional request"
  transition.
QED
LEMMA rmw_to:
  Consier a POWER trace with R* a load-reserve commit label and W* a
  successful store-conditional commit label such that:
    (R*,W*) IN rmw
  then
    R* ==to==> W*
PROOF
  Since W* succeed, the program-order previous load-reserve must be at the same
  location. By the definition of the rmw relation, R* is the program-order
  previous load-reserve. We conclude by the 4th precondition of the "Commit
  in-flight instruction" transition.
QED
################################################################################
################################################################################
Now, for each additions we show the new characterisation of ithb, show that
propBefore_lemma and its corollaries hold, and verify the new properties
specified by the C++ memory model.
1 LOCKS ########################################################################
  
  Let the register r3 contain a lock location, register r4 contain the value
  stating that a lock is free and register r5 contain the value stating that a
  lock is taken. Lock and unlock actions at the location stored in r3 are
  implemented as follows:
  ------------------------------------------------------------------------- Lock
  loop:
         lwarx r6,0,r3,1
         cmpw r4,r6
         bne- wait
         stwcx. r5,0,r3
         bne- loop
         isync
  ----------------------------------------------------------------------- Unlock
  lwsync
  stw r4,0(r3)
  ------------------------------------------------------------------------------
  
  When reconstructing a C++ execution from a POWER trace, a total "lock-order"
  must be built for each lock location. We take the restriction of the
  coherence-order to the store-conditionals of locks and the stores of unlock
  and linearize it. We write (lo) the disjunct union of these orders. We now have
  to show some properties from [consistent_locks]:
  
  ------------------------------------------------------------------------------
  ** Consider L, L' two successful lock actions such that (L,L') IN lo, then
     there exists an unlock action U such that (L,U) IN lo and (U,L') IN lo:
    
    By translating that statement to the POWER level, we have:
      (R1*, W1*) IN rmw  and  (R2*, W2*) IN rmw  (W1*,W2*) in co
    
    where R1*, W1* and R2*, W2* are respectively the load-reserve and
    successful store-conditional commit labels corresponding to L and L'.
    
    In the implementation of a lock, the load-reserve is followed by a test
    making sure that the value read states that the lock is free. If this is not
    the case, the program wait and the load-reserve is reexecuted later. Here
    we know that the store-conditional W2* is committed, hence at some point the
    load-reserve read the value stating that the lock is free. The commit
    corresponding to that "last" load is R2*, this is direct by the definition
    of rmw which makes sure that R2* is the last load-reserve commit label
    before W2*.
    But also the implementation of locks, W1* stores the value stating that
    a lock is taken. Hence R2* cannot read from W1*. Let's name W the store read
    by R2*: (W,R2*) IN rf
    Note that compilation mapping makes sure that the stores writing at a lock
    location the value stating that a lock is free, are the stores implementing
    an unlock, hence W correspond to an unlock.
    It remains to show that (W1*,W) IN co and (W,W2*) IN co.
    
    The first is exactly "rf_rmw_co_co_1", and the second is direct by lemma
    "rf_rmw_co".
    
  ------------------------------------------------------------------------------
  ** lo is a strict total order of mutex actions:
    
    By the "threadwise_good_theorem" hypothesis and the assumption that any
    execution of the original C++ program respects the "good_mutex_use"
    predicate, in the POWER trace any store implementing an unlock must be
    program-order after a store-conditional implementing a lock. Since memory
    accesses on a same thread and at a same location must be committed in order,
    we have that at any lock location the coherence-order starts with a
    store-conditional implementing a lock. Then from the previous point, we know
    that in the lock-order a lock must follow an unlock. 
    
    We translate the reasonning to the POWER level. The reflexivity and
    transitivity directly comes from the coherence-order. It is clear the
    reduction we take do not relate non-mutex actions.
    
    When a store W reaches its coherence point, it becomes coherence-before any
    store at the same address seen by the storage-subsystem but that has not yet
    reached its coherence point. Also, when a new store to the same address is
    committed, it directly becomes coherence-after W. Hence, for a given
    location, by the time the last store at the location seen by the
    storage-subsystem reaches its coherence point, the coherence relation
    (restricted to the stores at that given location) is total. By the 3rd
    action of the "Accept a successful write-conditional request", the
    store-conditionals implementing locks reaches their coherence point during
    their commit transition. Hence, if for each lock location, the coherence
    relation ends with a store-conditional, the derived C++ lock-order is
    directy total over mutex actions. However the coherence may end with normal
    stores implementing unlocks. For this reason, we need to linearize the
    coherence relation when building the C++ lock-order. However note that
    during the linearization, we do not add any coherence edge going to a store
    that has reached its coherence point, hence no edge going to a
    store-conditional (implementing a lock) is added. Nor to a coherence
    predecessor of a store-conditional.
    
  ------------------------------------------------------------------------------
  ** the restriction to mutexes of the happens-before is a subset of lock-order:
    
    Consider in the reconstructed C++ execution, (E1, E2) IN hb, such that E1
    and E2 are mutex actions.
    
    There are two ways for the hb-edge to exists. Either from (E1, E2) IN sb or
    from (E1, E2) IN lo with E1 an unlock and E2 a successful lock. In the
    second case we are done. In the first case, we have three subcases:
    
    1) E1 and E2 are both unlocks:
         we know that lo is total over mutex. If we translate the situation to
         the POWER level, then we have W1 and W2 the stores respectively
         corresponding to E1 and E2, such that (W1, W2) IN po. When W2 commit,
         the storage-subsystem adds a coherence edge from W1 to W2. Hence, in
         C++ we must have (E1,E2) IN po.
    
    2) E1 is a successful lock and E2 an unlock:
         the argument is the same as for 1).
    
    3) E1 and E2 are both successful locks
         the argument is the same as for 1).
    
  ------------------------------------------------------------------------------
* The addition of locks, introduces a new way of producing an sw-edge:
    U --lo--> L
  where U is an unlock, L a lock.
  
  At the POWER level, following the discussion of the first point (**) regarding
  the values writen and read by the instructions implementing locks and unlocks,
  a "lo-edge" must be a sequence of synchronisation between unlocks and locks
  (aternating). By the good_mutex_use predicate satisfied by the original C++
  program, any unlock must follow in sequence-before by a successful lock at
  the same location. Hence, by the hypothesis of "good compiler", any store
  implementing an unlock must be program-order after a store-conditional
  implementing a lock at the same location. In term of the POWER trace
  relations using the protection placed before an unlock and after a lock, this
  corresponds to a chain of: lwsync; rf;ctrlisync. It is clear that this
  included in machine_ithb. Hence the addition of locks doesn't change the
  characterization or break the propBefore_lemma.

2 RMWs #########################################################################
* NOTE 1: RMWs do satisfy the "is_read" and "is_write" predicates, but since the
          mapping places the same protections as for the corresponding atomic
          loads and stores this does not affect the definition of machine_ithb.
* The addition of RMWs however changes the shape of release-sequences, and
  therefore affect the corresponding part of machine_ithb. Previously the
  relation was:
    
    ( Xsync^refl; coi^refl; rfe; (ctrlisync + dd)^refl )^+
  
  the modified version is:
    
    ( Xsync^refl; coi^refl; rfe; (rmw; rf)^*; (ctrlisync + dd)^refl )^+
  
  Justification of the modification --------------------------------------------
  
  RMWs can appear inside of a release sequence, that is a RMW can be followed by
  a normal write in a release sequence. But that following write (W') must be
  from the thread of the release head (W), hence the corresponding commit labels
  are such that: (W, W') IN coi.
  RMWs can also be on the same thread as the release head, but in that case its
  store-conditional is coi-related to the store corresponding to the release
  head.
  Hence we can forget about these two kinds of RMWs.
  Either there is no other kind of RMWs in the release sequence, and the "rfe"
  points to the acquire read, or there is a RMWs chain ending the
  release-sequence without coming back the thread of release head. In that case,
  the "rfe" points to the load-reserve of the first RMW of the chain and
  (rmw; rf)^* is not empty. The "rmw" related the load-reserve and
  store-conditional pair of a single RMW and the "rf" relates a
  store-conditional to the load-reserve of the next RMW in the chain. The final
  "rf" points to the acquire read. Note that the last RMW can be on the thread
  of the acquire read.
  ------------------------------------------------------------------------------
* [propBefore_lemma] still holds
  
  LEMMA 1:
    In any POWER trace, the relation rf; (rmw; rf)*; rmw is a subset of co.
  PROOF
    The proof is by induction on the (.)*.
    * base case:
          rf
      W1 ---> R2*
               |
               | rmw
               v
              W3*
      --------------------------------------------------------------------------
      This is exactly the result of "rf_rmw_co".
      
    * inductive step:
          rf
      W1 ---> R2*
               |
           rmw |
               v    rf
               W3* ---> R4*
                         |    (rmw; rf)*
                         |  .----------> R5*
                         |_/              |
                                          | rmw
                                          v
                                         W6*
      --------------------------------------------------------------------------
      By the induction hypothesis we have:
        [1] (W3*, W6*) IN co
      
      By "rf_rmw_co" we have that (W1, W3*) IN co, we then conclude by
      transitivity of the coherence relation.
  QED
  
  
  LEMMA [xsync_groupA_2]
    Consider a POWER trace with E1 and E2, two memory commit labels such that:
      (E1,E2) IN xsync
    and W1 a write commit labels W1 and W2* a store-conditional commit label
    such that:
      (a) W1 = E1 or (W1,E1) IN propBefore
      (b) (E2,W2*) IN po; rfe; (rmw; rf)*; rmw
    
    Then for any thread t,
      if the label W2*_prop_t exists, then:
       there exits W a write commit label either equal to W1 or (W1, W) IN co,
       such that the label W_prop_t exists and W_prop_t ==to==> W2*_prop_t.
  PROOF
    The proof is by structural induction over (rmw; rf)^*:
    * base case:
                                       W1 ---.
                                             | propBefore + =
                                     .- E1 <-'
                               xsync |
                                     '-> E2
                                         |
                                      po |
                                         v    rfe
                                         W3 -----> R4*
                                                    |
                                                    | rmw
                                                    v
                                                    W2*
      --------------------------------------------------------------------------
      By the definition of the Xsync, there exists a barrier commit label S from
      the same thread as E1 and E2 such that :
        E1 ==to==> S ==to==> E2
      
      From the definition of the propBefore relation, "lastSat_to" and the
      transitivity od the trace order, it is easy to see that W1 is in the
      group A of S.
      
      Since S and E2 are on the same thread they are program-order related, and
      using (b) and by transitivity of the program-order relation, we have:
        (S, W2) IN po
      
      We must have S ==to==> W2 by the CommitPrevBarrierLS precondition of
      Commit in-flight instruction.
      
      By "rf_propBefore" on (W3,R4*), we have:
        (W3,R4*) IN propBefore
      
      Hence, by the 3rd precondition of the transition "Propagated write to
      another thread", we have that the barrier S propagates to the thread of
      R4* and W2*, before W3 does.
      
      Hence, by the 2nd precondition of the "Propagated a barrier to another
      thread" transition, we have that there exists some write commit label W,
      either equal to W1 or (W1, W) IN co, such that W propagates to the thread
      of R4* and W2*, before S does.
      
      By "rmw_to", we have:
        R4* ==to==> W2*
      
      We conclude by using the 3rd precondition of the "Propagated write to
      another thread" and the 2nd precondition of the "Propagated a barrier to
      another thread" transitions regarding a possible propagation of W2* to
      some thread t.
    
    * inductive step:
                          W1 ---.
                                | propBefore + =
                        .- E1 <-'
                  xsync |
                        '-> E2
                             |
                          po |
                             v  rfe
                            W3 -----> R4*
                                       |    (rmw; rf)*
                                       |  .------------> R5*
                                       |_/                |
                                                      rmw |
                                                          v   rf
                                                         W6* ---> R7
                                                                   |
                                                                   | rmw
                                                                   v
                                                                  W2
      --------------------------------------------------------------------------
      By the definition of the rmw relation, we have:
        (R5*, W6*) IN co
      
      By "rf_propBefore" on (W6*,R7), we have:
        (W6*,R7) IN propBefore
      
      Hence using the induction hypothesis, we have:
        there exits W a write commit label either equal to W1 or (W1, W) IN co,
        such that the label W_prop_"thread of R7, W2" exists and
        W_prop_"thread of R7, W2" ==to==> W6*_prop_"thread of R7, W2".
      
      Using the proof of the base case and the A-cumulativity of POWER, we know
      that there is a barrier E1 and E2 that propagates to the thread of R7
      and W2, before W6* does. Then we show similarly to the base case to
      show the propagation of W1 or a coherence successor to the thread of R7* and
      W2 before W6* does. We conclude using "rmw_to" and the transitivity of the
      trace-order.
  QED
  
  
  LEMMA [base_propBefore']:
    Consider a POWER trace with E a memory commit label, W1 and W2 two write
    commit labels, R3* a load-reserve commit label and R a read commit label
    such that:
      - (E,W1) IN xsync + =
      - (W1,W2) IN coi + =
      - (W2, R3*) IN rfe
      - (R3*,R) IN (rmw; rf)^*
    Then:
      * if E is a write, then (E, R) IN propBefore
      * if E is a read, for any write commit label W', such that:
          (W',E) IN propBefore, then (W',R) IN propBefore
  PROOF
    The proof is by case analysis on the structure of [.]^* relating R3* to R.
    * reflexive case:
                                        E ---.
                                             | xsync + =
                                     .- W1 <-'
                             coi + = |       rfe
                                     '-> W2 -----> R
      --------------------------------------------------------------------------
      The is exactly the base case of the original "propBefore_lemma"
      
    * step case
                           E ---.
                                | xsync + =
                        .- W1 <-'
                coi + = |       rfe
                        '-> W2 -----> R3*
                                       |    (rmw; rf)*
                                       |  .------------> R4*
                                       |_/                |
                                                      rmw |
                                                          v   rf
                                                         W5* ---> R
      --------------------------------------------------------------------------
      Using "rf_propBefore" on (W5*,R), we have:
        [1] (W5*, R) IN propBefore
      
      If E is a write, then either we have E=W1 or there (E,W1) IN xync. In the
      first case, W2 is a coherence successor of E, and by "lemma 1" W5* is a
      coherence successor of W2, hence by transivity W5* is a coherence
      successor of E and we are done by [1]. In the second case, by
      "xsync_groupA_2" and [1] we have:
        there exists W a write commit label either equal to E or (E,W) IN co,
        such that the label W_prop_"thread of R" exists and
        W_prop_"thtrad of R" ==to==> W5*_prop_"thread of R".
      The corresponds to the definition of the propBefore relation.
      
      If E is a read, then we must have (E,W1) IN xsync. Using "xsync_groupA_2",
      we have that:
        for any W such that (W,E) IN propBefore, there exists W' a write commit
        label either equal to W or (W,W') IN co, such that the label
        W'_prop_"thread of R" exists and
        W'_prop_"thread of R" ==to==> W5*_prop_"thread of R".
  QED
  
  The previous corresponds to the base case of propBefore_lemma (one step of the
  message passing relation). To show the inductive step, we do exactly like in
  the original propBefore_lemma.
--------------------------------------------------------------------------------
* [rmw_atomicity] a RMW must read from its last modification-order predecessor.
    
    We translate the reasonning to the POWER level. Consider W a write commit
    label and R*, W* two load-reserve and store-conditional commit labels,
    such that:
      (W,R*) IN rf  (R*,W*) IN rmw
    
    We first show that (W,W*) IN co. This is direct by lemma "rf_rmw_co".
    
    Second we need to show that at the C++ there is nothing in the modification
    order between the action corresponding to W and the one corresponding to W*.
    However since modification order is built as a linearization of co, it is
    not sufficient to show there is no write commit label between W and W* in
    co. Instead it is sufficent to show the two following lemmas:
    
    Consider W' a write commit level.
    
    1) If (W',W*) IN co, then (W',W) IN co
    PROOF
      By the 2nd precondition of the "Accept a successful write-conditional
      request" transition, we know that W reaches its coherence point before W*
      is committed.
            
      There are four possible ways for a coherence-edge to be produced:
        a) by action 3 of "accept write request";
        b) through a "partial coherence commitment";
        c) by the action 2 of "write reaches its coherence point";
        d) by the action 4a of "accept a successful write-conditional request".
      
      Since W* is a store-conditional, the transitions from a) and b) can only
      correspond to W' and they would procude a coherence-edge from W* to W',
      which contradict our hypothesis.
      The "partial coherence commitment" transition in c) can only perform once
      W* is visible to the storage-subsystem. Hence after W* is committed, which
      is also the transition during the which W* reaches its coherence point.
      But "partial coherence commitment" cannot add an edge going to a write who
      has already reached its coherence point.
      Hence the coherence-edge from W' to W* must have been produced by the
      action 4a of the "accept a successful write-conditional request" of W*,
      and therefore W' must reach its coherence point before W* is committed.
      
      As a result, it is clear the W' must be committed before W* is committed.
      We now show that W' must be committed before W reaches its coherence point.
      Assume the contrary. If W' is a normal store, then action 3 of the
      corresponding "accept write request" transition adds a coherence-edge from
      W to W'. But then since W* succeed, by the 3rd precondition of its "accept
      a successful write-conditional request", W' cannot reach its coherence
      point before W* is committed. This is a contradiction. If W' is a
      store-conditional then the coherence-edge from W to W' is added by the
      corresponding "accept a successful write-conditional request" transition,
      and the same argument applies.
      
      Now we can show that in fact W' must also reach its coherence point before
      W does. Assume the contrary. If W is a normal store, then action 2 of its
      "write reaches its coherence point" adds the same faulty coherence-edge as
      in the previous paragraph. If W is store-conditional then already have
      shown that it must be committed before W reaches its coherence point, and
      a store-conditional reaches its coherence point at its commit transition.
      
      Now either W is committed before W' reaches its coherence point, in which
      case if W' is a normal store, then action 2 of its "write reaches its
      coherence point" transition adds a coherence-edge from W' to W and we are
      done. If W' is a store-conditional, the same edge is added by action 4b of
      the corresponding "accept a successful write-conditional request"
      transition.
      Or W is committed after W' reaches its coherence point and then if W is a
      normal store, the coherence-edge we are looking for is added either by
      action 3 of the its "accept write request" transition, and if W is a
      store-conditional, then the edge is added by action 4a of its "accept a
      successful write-conditional request" transition.
    QED
    
    2) If (W,W') IN co, then (W*,W) IN co
    PROOF
      First we show that W is committed before W' reaches its coherence point:
        Assume the contrary, either W is a normal store and then its "accept
        write request" transition adds a coherence-edge from W' to W which
        contradicts our hypothesis; or W is a store-conditional and then        
        action 4a of its "accept a successful write-conditional request"
        transition adds the same bad coherence-edge.
      
      We now show that in fact W reaches its coherence point before W' does:
        Assume the contrary, either W' is a normal store and then the action 2
        of its "write reaches its coherence point" transition adds the same bad
        coherence-edge as in the previous paragraph; or W' is a
        store-conditional and then the bad coherence-edge is added by action 4b
        of its "accept a successful write-conditional request" transition.
      
      This implies that W' is committed after W reaches its coherence point:
        Consider the contrary, then the coherence-edge from W to W' is created
        when W reaches its coherence point and we know that W reaches its
        coherence point before W* is committed. But then we have a
        coherence-successor of W reaching its coherence point before W* is
        committed which contradict the fact that W* succeed.
      
      This implies that W' is committed after W* is committed:
        We just so that when W' is committed, W has already reached its
        coherence point. Hence by action 3 of the "accept write request"
        transition or by action 4a of the "accept a successful write-conditional
        request", the coherence-edge from W to W' is added at the commit of W'.
        If W' is committed before W*, then we have the same contradiction as in
        the previous paragraph.
      
      Now, either W' is normal store and then action 3 of its "accept write
      request" gives us the coherence-edge we want; or W' is a store-conditional
      and in that case the coherence-edge is given by action 4a of its "accept a
      successful write-conditional request" transition.
    QED
3 FENCES #######################################################################
* NOTE 1: Even so a fence do satisfy the "is_acquire" and "is_release"
          predicates, the definition of "release_sequence" ensures that a fence
          cannot start a release sequence (by the check on the location kind).
          Also since a fence cannot be on an rf-edge, no fence can play part in
          the original rel/acq sw-edge.
* NOTE 2: Other then for the next point, a fence cannot appear in any of the
          instances of "is_acquire" and "is_release" because of NOTE 1.
* The addition of fences, introduces three new ways of producing an sw-edge:
(1)                                 (2)
                                    
a: F_rel                            a: F_rel
   |                                   |
   | sb                                | sb
   v                                   v
x: W x                              x: W x
   |                                   |
   | hrs                               | hrs
   v      rf                           v      rf
z: W' x ------> y: R x              z: W' x ------> b: R_acq x
                   |
                   | sb
                   v
                b: F_acq
(3)
a: W_rel x
   |
   | rs
   v      rf
z: W' x ------> y: R x
                   |
                   | sb
                   v
                b: F_acq
Using the POWER trace relations and the compilation mapping for fences, the
sequence of POWER relations corresponding to these sw-edges are:
  
  (1) lwsync^refl; po; coi^*; rfe; po; lwsync^refl
  (2) lwsync^refl; po; coi^*; rfe; lwsync^refl
  (3) lwsync^refl; coi^*; rfe; po; lwsync^refl
By its definition, the lwsync-edges can absorb the po-edges, hence the three new
sw-edges correspond to the following POWER trace relation:
  lwsync^refl; coi^*; rfe; lwsync^refl
We therefore need to make another modification (add the ending lwsync instead of
a ctrlisync or a dd) to machine-ithb:
  ( Xsync^refl; coi^refl; rfe; (rmw; rf)^*; (ctrlisync + dd + lwsync)^refl )^+
Since the left part of the relation does not change, hence the validity of
propBefore_lemma is not affected. The semantics of the lwsync instruction being
stronger than that of ctrlisync, the two corollaries still hold.
* [sc_fences_heeded] "by construction of the sc-order".