We extend the proof for load and store operations by adding the C++ locks, RMWs and fences. We introduce a new POWER trace relation to track the presence of RMWs. These additions change the definition of happens-before and therefore affect its caracterisation. The proof goes by showing that despite these modifications of machine_ithb, propBefore_lemma and its corollaries still hold. ################################## New labels ################################## We now have load-reserves and store-conditionals in POWER. This introduces two new kinds of labels: "load-reserve commit labels" and "successful store-conditional commit labels". From now on, when referring to a "commit label" we include these; when referring to a "write commit label" we include successful store-conditional commits, and similarly for loads and load-reserves. ################################# New relation ################################# [rmw] (R*,W*) IN rmw iff 1) R* is a load-reserve commit label 2) W* is a successful store-conditional commit label 3) (R*,W*) IN po 4) there is no successful commit label of a store conditional E, such that (R*,E) IN po and (E,W*) IN po ######################## Assumptions about the compiler ######################## (1) The only POWER loads/stores writing to a lock location are the ones implementing locks and unlocks. (2) The only stores writing the value stating that a lock is free are the ones implementing unlocks. (3) The only stores writing the value stating that a lock is taken are the store-conditionals implementing locks. (4) At the beginning of any execution, all lock locations are initialised with the value stating that a lock is free. ################################## Some lemmas ################################# LEMMA rf_rmw_co: Consider a POWER trace with W a write commit label, R* a load-reserve commit label and W* a store-conditional commit label such that: (R*,W*) IN rmw and (W, R*) IN rf then (W,W*) IN co. PROOF By the action 4a of the "Accept a successful write-conditional request" transition. QED LEMMA rmw_to: Consier a POWER trace with R* a load-reserve commit label and W* a successful store-conditional commit label such that: (R*,W*) IN rmw then R* ==to==> W* PROOF Since W* succeed, the program-order previous load-reserve must be at the same location. By the definition of the rmw relation, R* is the program-order previous load-reserve. We conclude by the 4th precondition of the "Commit in-flight instruction" transition. QED ################################################################################ ################################################################################ Now, for each additions we show the new characterisation of ithb, show that propBefore_lemma and its corollaries hold, and verify the new properties specified by the C++ memory model. 1 LOCKS ######################################################################## Let the register r3 contain a lock location, register r4 contain the value stating that a lock is free and register r5 contain the value stating that a lock is taken. Lock and unlock actions at the location stored in r3 are implemented as follows: ------------------------------------------------------------------------- Lock loop: lwarx r6,0,r3,1 cmpw r4,r6 bne- wait stwcx. r5,0,r3 bne- loop isync ----------------------------------------------------------------------- Unlock lwsync stw r4,0(r3) ------------------------------------------------------------------------------ When reconstructing a C++ execution from a POWER trace, a total "lock-order" must be built for each lock location. We take the restriction of the coherence-order to the store-conditionals of locks and the stores of unlock and linearize it. We write (lo) the disjunct union of these orders. We now have to show some properties from [consistent_locks]: ------------------------------------------------------------------------------ ** Consider L, L' two successful lock actions such that (L,L') IN lo, then there exists an unlock action U such that (L,U) IN lo and (U,L') IN lo: By translating that statement to the POWER level, we have: (R1*, W1*) IN rmw and (R2*, W2*) IN rmw (W1*,W2*) in co where R1*, W1* and R2*, W2* are respectively the load-reserve and successful store-conditional commit labels corresponding to L and L'. In the implementation of a lock, the load-reserve is followed by a test making sure that the value read states that the lock is free. If this is not the case, the program wait and the load-reserve is reexecuted later. Here we know that the store-conditional W2* is committed, hence at some point the load-reserve read the value stating that the lock is free. The commit corresponding to that "last" load is R2*, this is direct by the definition of rmw which makes sure that R2* is the last load-reserve commit label before W2*. But also the implementation of locks, W1* stores the value stating that a lock is taken. Hence R2* cannot read from W1*. Let's name W the store read by R2*: (W,R2*) IN rf Note that compilation mapping makes sure that the stores writing at a lock location the value stating that a lock is free, are the stores implementing an unlock, hence W correspond to an unlock. It remains to show that (W1*,W) IN co and (W,W2*) IN co. The first is exactly "rf_rmw_co_co_1", and the second is direct by lemma "rf_rmw_co". ------------------------------------------------------------------------------ ** lo is a strict total order of mutex actions: By the "threadwise_good_theorem" hypothesis and the assumption that any execution of the original C++ program respects the "good_mutex_use" predicate, in the POWER trace any store implementing an unlock must be program-order after a store-conditional implementing a lock. Since memory accesses on a same thread and at a same location must be committed in order, we have that at any lock location the coherence-order starts with a store-conditional implementing a lock. Then from the previous point, we know that in the lock-order a lock must follow an unlock. We translate the reasonning to the POWER level. The reflexivity and transitivity directly comes from the coherence-order. It is clear the reduction we take do not relate non-mutex actions. When a store W reaches its coherence point, it becomes coherence-before any store at the same address seen by the storage-subsystem but that has not yet reached its coherence point. Also, when a new store to the same address is committed, it directly becomes coherence-after W. Hence, for a given location, by the time the last store at the location seen by the storage-subsystem reaches its coherence point, the coherence relation (restricted to the stores at that given location) is total. By the 3rd action of the "Accept a successful write-conditional request", the store-conditionals implementing locks reaches their coherence point during their commit transition. Hence, if for each lock location, the coherence relation ends with a store-conditional, the derived C++ lock-order is directy total over mutex actions. However the coherence may end with normal stores implementing unlocks. For this reason, we need to linearize the coherence relation when building the C++ lock-order. However note that during the linearization, we do not add any coherence edge going to a store that has reached its coherence point, hence no edge going to a store-conditional (implementing a lock) is added. Nor to a coherence predecessor of a store-conditional. ------------------------------------------------------------------------------ ** the restriction to mutexes of the happens-before is a subset of lock-order: Consider in the reconstructed C++ execution, (E1, E2) IN hb, such that E1 and E2 are mutex actions. There are two ways for the hb-edge to exists. Either from (E1, E2) IN sb or from (E1, E2) IN lo with E1 an unlock and E2 a successful lock. In the second case we are done. In the first case, we have three subcases: 1) E1 and E2 are both unlocks: we know that lo is total over mutex. If we translate the situation to the POWER level, then we have W1 and W2 the stores respectively corresponding to E1 and E2, such that (W1, W2) IN po. When W2 commit, the storage-subsystem adds a coherence edge from W1 to W2. Hence, in C++ we must have (E1,E2) IN po. 2) E1 is a successful lock and E2 an unlock: the argument is the same as for 1). 3) E1 and E2 are both successful locks the argument is the same as for 1). ------------------------------------------------------------------------------ * The addition of locks, introduces a new way of producing an sw-edge: U --lo--> L where U is an unlock, L a lock. At the POWER level, following the discussion of the first point (**) regarding the values writen and read by the instructions implementing locks and unlocks, a "lo-edge" must be a sequence of synchronisation between unlocks and locks (aternating). By the good_mutex_use predicate satisfied by the original C++ program, any unlock must follow in sequence-before by a successful lock at the same location. Hence, by the hypothesis of "good compiler", any store implementing an unlock must be program-order after a store-conditional implementing a lock at the same location. In term of the POWER trace relations using the protection placed before an unlock and after a lock, this corresponds to a chain of: lwsync; rf;ctrlisync. It is clear that this included in machine_ithb. Hence the addition of locks doesn't change the characterization or break the propBefore_lemma. 2 RMWs ######################################################################### * NOTE 1: RMWs do satisfy the "is_read" and "is_write" predicates, but since the mapping places the same protections as for the corresponding atomic loads and stores this does not affect the definition of machine_ithb. * The addition of RMWs however changes the shape of release-sequences, and therefore affect the corresponding part of machine_ithb. Previously the relation was: ( Xsync^refl; coi^refl; rfe; (ctrlisync + dd)^refl )^+ the modified version is: ( Xsync^refl; coi^refl; rfe; (rmw; rf)^*; (ctrlisync + dd)^refl )^+ Justification of the modification -------------------------------------------- RMWs can appear inside of a release sequence, that is a RMW can be followed by a normal write in a release sequence. But that following write (W') must be from the thread of the release head (W), hence the corresponding commit labels are such that: (W, W') IN coi. RMWs can also be on the same thread as the release head, but in that case its store-conditional is coi-related to the store corresponding to the release head. Hence we can forget about these two kinds of RMWs. Either there is no other kind of RMWs in the release sequence, and the "rfe" points to the acquire read, or there is a RMWs chain ending the release-sequence without coming back the thread of release head. In that case, the "rfe" points to the load-reserve of the first RMW of the chain and (rmw; rf)^* is not empty. The "rmw" related the load-reserve and store-conditional pair of a single RMW and the "rf" relates a store-conditional to the load-reserve of the next RMW in the chain. The final "rf" points to the acquire read. Note that the last RMW can be on the thread of the acquire read. ------------------------------------------------------------------------------ * [propBefore_lemma] still holds LEMMA 1: In any POWER trace, the relation rf; (rmw; rf)*; rmw is a subset of co. PROOF The proof is by induction on the (.)*. * base case: rf W1 ---> R2* | | rmw v W3* -------------------------------------------------------------------------- This is exactly the result of "rf_rmw_co". * inductive step: rf W1 ---> R2* | rmw | v rf W3* ---> R4* | (rmw; rf)* | .----------> R5* |_/ | | rmw v W6* -------------------------------------------------------------------------- By the induction hypothesis we have: [1] (W3*, W6*) IN co By "rf_rmw_co" we have that (W1, W3*) IN co, we then conclude by transitivity of the coherence relation. QED LEMMA [xsync_groupA_2] Consider a POWER trace with E1 and E2, two memory commit labels such that: (E1,E2) IN xsync and W1 a write commit labels W1 and W2* a store-conditional commit label such that: (a) W1 = E1 or (W1,E1) IN propBefore (b) (E2,W2*) IN po; rfe; (rmw; rf)*; rmw Then for any thread t, if the label W2*_prop_t exists, then: there exits W a write commit label either equal to W1 or (W1, W) IN co, such that the label W_prop_t exists and W_prop_t ==to==> W2*_prop_t. PROOF The proof is by structural induction over (rmw; rf)^*: * base case: W1 ---. | propBefore + = .- E1 <-' xsync | '-> E2 | po | v rfe W3 -----> R4* | | rmw v W2* -------------------------------------------------------------------------- By the definition of the Xsync, there exists a barrier commit label S from the same thread as E1 and E2 such that : E1 ==to==> S ==to==> E2 From the definition of the propBefore relation, "lastSat_to" and the transitivity od the trace order, it is easy to see that W1 is in the group A of S. Since S and E2 are on the same thread they are program-order related, and using (b) and by transitivity of the program-order relation, we have: (S, W2) IN po We must have S ==to==> W2 by the CommitPrevBarrierLS precondition of Commit in-flight instruction. By "rf_propBefore" on (W3,R4*), we have: (W3,R4*) IN propBefore Hence, by the 3rd precondition of the transition "Propagated write to another thread", we have that the barrier S propagates to the thread of R4* and W2*, before W3 does. Hence, by the 2nd precondition of the "Propagated a barrier to another thread" transition, we have that there exists some write commit label W, either equal to W1 or (W1, W) IN co, such that W propagates to the thread of R4* and W2*, before S does. By "rmw_to", we have: R4* ==to==> W2* We conclude by using the 3rd precondition of the "Propagated write to another thread" and the 2nd precondition of the "Propagated a barrier to another thread" transitions regarding a possible propagation of W2* to some thread t. * inductive step: W1 ---. | propBefore + = .- E1 <-' xsync | '-> E2 | po | v rfe W3 -----> R4* | (rmw; rf)* | .------------> R5* |_/ | rmw | v rf W6* ---> R7 | | rmw v W2 -------------------------------------------------------------------------- By the definition of the rmw relation, we have: (R5*, W6*) IN co By "rf_propBefore" on (W6*,R7), we have: (W6*,R7) IN propBefore Hence using the induction hypothesis, we have: there exits W a write commit label either equal to W1 or (W1, W) IN co, such that the label W_prop_"thread of R7, W2" exists and W_prop_"thread of R7, W2" ==to==> W6*_prop_"thread of R7, W2". Using the proof of the base case and the A-cumulativity of POWER, we know that there is a barrier E1 and E2 that propagates to the thread of R7 and W2, before W6* does. Then we show similarly to the base case to show the propagation of W1 or a coherence successor to the thread of R7* and W2 before W6* does. We conclude using "rmw_to" and the transitivity of the trace-order. QED LEMMA [base_propBefore']: Consider a POWER trace with E a memory commit label, W1 and W2 two write commit labels, R3* a load-reserve commit label and R a read commit label such that: - (E,W1) IN xsync + = - (W1,W2) IN coi + = - (W2, R3*) IN rfe - (R3*,R) IN (rmw; rf)^* Then: * if E is a write, then (E, R) IN propBefore * if E is a read, for any write commit label W', such that: (W',E) IN propBefore, then (W',R) IN propBefore PROOF The proof is by case analysis on the structure of [.]^* relating R3* to R. * reflexive case: E ---. | xsync + = .- W1 <-' coi + = | rfe '-> W2 -----> R -------------------------------------------------------------------------- The is exactly the base case of the original "propBefore_lemma" * step case E ---. | xsync + = .- W1 <-' coi + = | rfe '-> W2 -----> R3* | (rmw; rf)* | .------------> R4* |_/ | rmw | v rf W5* ---> R -------------------------------------------------------------------------- Using "rf_propBefore" on (W5*,R), we have: [1] (W5*, R) IN propBefore If E is a write, then either we have E=W1 or there (E,W1) IN xync. In the first case, W2 is a coherence successor of E, and by "lemma 1" W5* is a coherence successor of W2, hence by transivity W5* is a coherence successor of E and we are done by [1]. In the second case, by "xsync_groupA_2" and [1] we have: there exists W a write commit label either equal to E or (E,W) IN co, such that the label W_prop_"thread of R" exists and W_prop_"thtrad of R" ==to==> W5*_prop_"thread of R". The corresponds to the definition of the propBefore relation. If E is a read, then we must have (E,W1) IN xsync. Using "xsync_groupA_2", we have that: for any W such that (W,E) IN propBefore, there exists W' a write commit label either equal to W or (W,W') IN co, such that the label W'_prop_"thread of R" exists and W'_prop_"thread of R" ==to==> W5*_prop_"thread of R". QED The previous corresponds to the base case of propBefore_lemma (one step of the message passing relation). To show the inductive step, we do exactly like in the original propBefore_lemma. -------------------------------------------------------------------------------- * [rmw_atomicity] a RMW must read from its last modification-order predecessor. We translate the reasonning to the POWER level. Consider W a write commit label and R*, W* two load-reserve and store-conditional commit labels, such that: (W,R*) IN rf (R*,W*) IN rmw We first show that (W,W*) IN co. This is direct by lemma "rf_rmw_co". Second we need to show that at the C++ there is nothing in the modification order between the action corresponding to W and the one corresponding to W*. However since modification order is built as a linearization of co, it is not sufficient to show there is no write commit label between W and W* in co. Instead it is sufficent to show the two following lemmas: Consider W' a write commit level. 1) If (W',W*) IN co, then (W',W) IN co PROOF By the 2nd precondition of the "Accept a successful write-conditional request" transition, we know that W reaches its coherence point before W* is committed. There are four possible ways for a coherence-edge to be produced: a) by action 3 of "accept write request"; b) through a "partial coherence commitment"; c) by the action 2 of "write reaches its coherence point"; d) by the action 4a of "accept a successful write-conditional request". Since W* is a store-conditional, the transitions from a) and b) can only correspond to W' and they would procude a coherence-edge from W* to W', which contradict our hypothesis. The "partial coherence commitment" transition in c) can only perform once W* is visible to the storage-subsystem. Hence after W* is committed, which is also the transition during the which W* reaches its coherence point. But "partial coherence commitment" cannot add an edge going to a write who has already reached its coherence point. Hence the coherence-edge from W' to W* must have been produced by the action 4a of the "accept a successful write-conditional request" of W*, and therefore W' must reach its coherence point before W* is committed. As a result, it is clear the W' must be committed before W* is committed. We now show that W' must be committed before W reaches its coherence point. Assume the contrary. If W' is a normal store, then action 3 of the corresponding "accept write request" transition adds a coherence-edge from W to W'. But then since W* succeed, by the 3rd precondition of its "accept a successful write-conditional request", W' cannot reach its coherence point before W* is committed. This is a contradiction. If W' is a store-conditional then the coherence-edge from W to W' is added by the corresponding "accept a successful write-conditional request" transition, and the same argument applies. Now we can show that in fact W' must also reach its coherence point before W does. Assume the contrary. If W is a normal store, then action 2 of its "write reaches its coherence point" adds the same faulty coherence-edge as in the previous paragraph. If W is store-conditional then already have shown that it must be committed before W reaches its coherence point, and a store-conditional reaches its coherence point at its commit transition. Now either W is committed before W' reaches its coherence point, in which case if W' is a normal store, then action 2 of its "write reaches its coherence point" transition adds a coherence-edge from W' to W and we are done. If W' is a store-conditional, the same edge is added by action 4b of the corresponding "accept a successful write-conditional request" transition. Or W is committed after W' reaches its coherence point and then if W is a normal store, the coherence-edge we are looking for is added either by action 3 of the its "accept write request" transition, and if W is a store-conditional, then the edge is added by action 4a of its "accept a successful write-conditional request" transition. QED 2) If (W,W') IN co, then (W*,W) IN co PROOF First we show that W is committed before W' reaches its coherence point: Assume the contrary, either W is a normal store and then its "accept write request" transition adds a coherence-edge from W' to W which contradicts our hypothesis; or W is a store-conditional and then action 4a of its "accept a successful write-conditional request" transition adds the same bad coherence-edge. We now show that in fact W reaches its coherence point before W' does: Assume the contrary, either W' is a normal store and then the action 2 of its "write reaches its coherence point" transition adds the same bad coherence-edge as in the previous paragraph; or W' is a store-conditional and then the bad coherence-edge is added by action 4b of its "accept a successful write-conditional request" transition. This implies that W' is committed after W reaches its coherence point: Consider the contrary, then the coherence-edge from W to W' is created when W reaches its coherence point and we know that W reaches its coherence point before W* is committed. But then we have a coherence-successor of W reaching its coherence point before W* is committed which contradict the fact that W* succeed. This implies that W' is committed after W* is committed: We just so that when W' is committed, W has already reached its coherence point. Hence by action 3 of the "accept write request" transition or by action 4a of the "accept a successful write-conditional request", the coherence-edge from W to W' is added at the commit of W'. If W' is committed before W*, then we have the same contradiction as in the previous paragraph. Now, either W' is normal store and then action 3 of its "accept write request" gives us the coherence-edge we want; or W' is a store-conditional and in that case the coherence-edge is given by action 4a of its "accept a successful write-conditional request" transition. QED 3 FENCES ####################################################################### * NOTE 1: Even so a fence do satisfy the "is_acquire" and "is_release" predicates, the definition of "release_sequence" ensures that a fence cannot start a release sequence (by the check on the location kind). Also since a fence cannot be on an rf-edge, no fence can play part in the original rel/acq sw-edge. * NOTE 2: Other then for the next point, a fence cannot appear in any of the instances of "is_acquire" and "is_release" because of NOTE 1. * The addition of fences, introduces three new ways of producing an sw-edge: (1) (2) a: F_rel a: F_rel | | | sb | sb v v x: W x x: W x | | | hrs | hrs v rf v rf z: W' x ------> y: R x z: W' x ------> b: R_acq x | | sb v b: F_acq (3) a: W_rel x | | rs v rf z: W' x ------> y: R x | | sb v b: F_acq Using the POWER trace relations and the compilation mapping for fences, the sequence of POWER relations corresponding to these sw-edges are: (1) lwsync^refl; po; coi^*; rfe; po; lwsync^refl (2) lwsync^refl; po; coi^*; rfe; lwsync^refl (3) lwsync^refl; coi^*; rfe; po; lwsync^refl By its definition, the lwsync-edges can absorb the po-edges, hence the three new sw-edges correspond to the following POWER trace relation: lwsync^refl; coi^*; rfe; lwsync^refl We therefore need to make another modification (add the ending lwsync instead of a ctrlisync or a dd) to machine-ithb: ( Xsync^refl; coi^refl; rfe; (rmw; rf)^*; (ctrlisync + dd + lwsync)^refl )^+ Since the left part of the relation does not change, hence the validity of propBefore_lemma is not affected. The semantics of the lwsync instruction being stronger than that of ctrlisync, the two corollaries still hold. * [sc_fences_heeded] "by construction of the sc-order".