Allowed by Flat-operational => allowed by ARMv8.2-axiomatic ----------------------------------------------------------- ******************************************************************************** This proof assumes 0. Finite candidate executions and finite Flat traces. 1. That if a write-exclusive is successfully paired with a read-exclusive, then they are to the same address. 2. Axiomatic and Operational model use the same instruction semantics. 3. (E,E') in addr iff: E is a read and E' is a read or a write and E feeds into a register write that affects the address of E'. Note: this means addr(E,E') does not hold if E is a write-exclusive. 4. (E,E') in data iff: E is a read and E' is a write and E feeds into a register write that affects the value written by E'. Note: this means data(E,E') does not hold if E is a write-exclusive. 5. (E,E') in ctrl iff: E is a read and E' is a write and E feeds into a register write that affects a conditional branch or a computed branch instruction program-order before E'. Note: this means ctrl(E,E') does not hold if E is a write-exclusive. ******************************************************************************** Note: In herd a successful write-exclusive's success bit introduces dependencies *from its matching read-exclusive*: For example, in the program-fragment a: LDXR W1,[X0] b: STXR W2,W1,[X0] c: STR W2,[X3] where c stores the success bit of b into some location in memory, in herd it will be (a,c) in data. This does not match ARMv8's concurrency semantics where it is neither (a,c) in data, nor (b,c) in data. ******************************************************************************** Let (po,rf,co,rmw) be a candidate execution. Ignoring weak acquire loads ("Q"), the ARMv8.2 model is this: (* Coherence-after *) let ca = fr | co (* Per-location coherence requirement *) acyclic po-loc | ca | rf as coherence (* ca only used here *) (* Observed-before *) let obs = rfe | fre | coe (* Dependency-ordered-before *) let dob = addr | data | ctrl; [W] | (ctrl | (addr; po)); [ISB]; po; [R] | addr; po; [W] | (addr | data); rfi | (ctrl | data); coi (* Atomic-ordered-before *) let aob = rmw | [range(rmw)]; rfi; [A] (* Barrier-ordered-before *) let bob = po; [dmb.full]; po | [L]; po; [A] | [R]; po; [dmb.ld]; po (* if there's a cycle using this edge, there will also be one restricting to [R|W] on the right *) | [A]; po (* if there's a cycle using this edge, there will also be one restricting to [R|W] on the right *) | [W]; po; [dmb.st]; po; [W] | po; [L] (* if there's a cycle using this edge, there will also be one restricting to [R|W] on the left *) | po; [L]; coi (* if there's a cycle using this edge, there will also be one restricting to [R|W] on the left *) (* Ordered-before *) let rec ob = obs | dob | aob | bob | ob; ob irreflexive ob as ob (* ob is only used here, so can remove ob's recursion and check for acyclicity *) (* Atomic: Basic LDXR/STXR constraint to forbid intervening writes. *) empty rmw & (fre; coe) as atomic This is equivalent to this, according to the comments. acyclic po-loc | fr | co | rf as coherence (* ca only used here *) let obs = rfe | fre | coe let dob = addr | data | ctrl; [W] | (ctrl | (addr; po)); [ISB]; po; [R] | addr; po; [W] | (addr | data); rfi | (ctrl | data); coi (* Atomic-ordered-before *) let aob = rmw | [range(rmw)]; rfi; [A] let bob = po; [dmb.full]; po | [L]; po; [A] | [R]; po; [dmb.ld]; po; [R|W] | [A]; po; [R|W] | [W]; po; [dmb.st]; po; [W] | [R|W]; po; [L] | [R|W]; po; [L]; coi let ob = (obs | aob | dob | bob)+ irreflexive ob as ob empty rmw & (fre; coe) as atomic Lemma 2a: Let TR be a valid trace of Flat-operational that induces the relations po, co, rf, and rmw. Then there is a valid trace TR' that induces the same po, co, rf, and rmw relations but has no restarts. Lemma 2b: Let TR be a valid trace of Flat-operational that induces the relations po, co, rf, and rmw. Then there is a valid trace TR' that induces the same po, co, rf, and rmw relations but has no discarded instruction tree branches ("always speculates the correct successor instruction of a branch"). Lemma 3. Let TR be a valid trace of the operational model that has no restarts or discarded instruction tree branches, which induces the relations po, co, rf, and rmw. Let EDGE be an edge from ARMv8.2's "ob". Then the following property holds TR: for - EDGE: barrier E -> barrier E'. In TR E is committed before E'. - EDGE: barrier E -> write E'. In TR E is committed before E' is propagated. - EDGE: write E -> barrier E'. In TR E is propagated before E' is committed. - EDGE: write E -> write E'. In TR E is propagated before E'. - EDGE: barrier E -> read R. In TR E is committed before R is satisfied. - EDGE: write E -> read R. In TR E is propagated before R is satisfied. - EDGE: read R -> barrier E. In TR R is satisfied before E is committed. - EDGE: read R -> write E. In TR R is satisfied before E is propagated. - EDGE: read R -> read R'. In TR R is satisfied before R'. Proof by induction on the definition of ob. Induction step: EDGE in obs | aob | dob | bob. Then there are multiple cases: EDGE = rfe Then EDGE: W -> R and W and R are from different threads. In Flat-operational for R to read from R, W must be propagated to memory, so the induction statement holds. EDGE = coe Then EDGE: W -> W', and W and W' in co. By assumption the coherence order induced by TR is the same as co. By definition for (W,W') in the operational model's coherence order W has to propagated to memory before W'. So the induction statement holds. EDGE = fre Then EDGE: R -> W, R and W are from different threads, and there is W' with (W',R) in rf and (W',W) in co. Now there are two cases: R is satisfied by forwarding from W' or not. a) By forwarding: then R is satisfied before W' propagates in TR and by the proof for EDGE = coe the write W' propagates to memory before W. So R is satisfied before W propagates to memory and the induction statement holds. b) Then when R is satisfied W' is already propagated and in TR the write W' propagates before W. Since R reads from the last write to R's location and (W',R) in rf it must be that W' is in memory and W is not yet in memory. So R satisfies before W is propagated to memory. EDGE = rmw Then EDGE: R -> W. Then R and W are a successful load/store-exclusive pair and by definition of pop_commit_store_cand R must be finished, and therefore satisfied, for W to commit, hence propagate. EDGE = [range(rmw)]; rfi; [A] Then EDGE: W -> R. Then W is a successful store exclusive and R an acquire read A. Since by definition of Flat-operational R cannot read from W by forwarding W must commit to memory before R can satisfy. EDGE = addr Then a) EDGE: R -> R' or b) EDGE R -> W. a) In TR R' can only be satisfied if initiated, so after its address is available, so R must be satisfied before R' is. b) In TR W can only propagate if initiated, so after its address is available, so R must be satisfied before W propagates. EDGE = data Then EDGE: R -> W. W can only propagate after committing, and commit if its data-feeding memory reads are finished, including being satisfied. So R is satisfied before W propagates in TR. EDGE = ctrl; [W] Then EDGE : R -> W. W can only propagate when committed, and by pop_commit_store_cand commit when the memory reads feeding into the register reads of conditional branches are finished, including being satisfied. So R is satisfied before W propagates in TR. EDGE = (ctrl | (addr; po)); [ISB]; po; [R] Then EDGE: R -> R'. Before R' can finish it has to satisfy. By definition of pop_memory_read_request_cand R' can only satisfy if all po-earlier ISBs are finished and therefore committed. By pop_commit_barrier_cand any such ISB can only commit if the memory reads feeding into the conditional branches po-before the ISB are finished, including being satisfied, and if all po-earlier memory accesses have their address-feeding memory reads finished, including being satisfied. Therefore R must be satisfied before R' can be satisfied in TR. EDGE = addr; po; [W] Then EDGE: R -> W. By pop_commit_cand before W can only propagate when committed, and commit only when the address-feeding memory reads of all po-earlier memory accesses are finished, including being satisfied. Therefore R must be satisfied in TR before W propagates. EDGE = (addr | data); rfi Then EDGE: R -> R'. For R' to satisfy, the write it reads from must have both address and data available, so any memory read feeding into the address or data register reads of this write has to be satisfied. Therefore R must be satisfied before R' can satisfy. EDGE = (ctrl | data); coi Then EDGE: R -> W and there is W' such that (R,W') in (ctrl|data) and (W',W) in coi. By definition of how the operational model induces coherence W' must propagate to memory before W does. Before W' propagates it must commit, and therefore all memory reads feeding into the data registers reads of W' and all memory reads feeding into the register reads by conditional branch instructions po-before W' must be finished, including being satisfied. Therefore R must be satisfied before W' can propagate, before W can propagate in TR. EDGE = po; [dmb.full]; po Let DMB be the dmb. Case a) EDGE: RBW (read/write/barrier) -> R. R can only satisfy when DMB is finished, hence committed. DMB can only commit when all po-earlier instructions are finished, including being satisfied (for reads)/committed (for barriers)/propagated (for writes). Case b) EDGE: RBW (read/write/barrier) -> B. B can only commit when DMB is finished, hence committed. DMB can only commit when all po-earlier instructions are finished, including being satisfied (for reads)/committed (for barriers)/propagated (for writes). Case c) EDGE: RBW (read/write/barrier) -> W. W can only propagate when committed, and only commit when DMB is finished, hence committed. DMB can only commit when all po-earlier instructions are finished, including being satisfied (for reads)/committed (for barriers)/propagated (for writes). EDGE = [L]; po; [A] Then EDGE: W -> R. By definition of pop_memory_read_request_cand, the acquire read R can only initiate when all po-earlier releases are finished and hence propagated. So W propagated before R satisfied in TR. EDGE = [R]; po; [dmb.ld]; po; [R|W] Let B be the DMB.LD Case a) EDGE: R -> W Then B can only commit after R is finished, including being satisfied, and W can only commit and hence propagate after B is finished and hence committed. So R is satisfied before W propagates in TR. Case b) EDGE: R -> R' Then B can only commit after R is finished, including being satisfied, and by definition of read_request_cand R' can only satisfy, if B is finished and therefore committed. So R is satisfied before R' in TR. EDGE = [A]; po; [R|W] Case EDGE: R -> R') Then by definition of read_request_cand R must be completed and hence satisfied before R' can satisfy. Case EDGE: R -> W) Then by definition of pop_commit_cand W can only commit and hence propagate after R is finished, including being satisfied in TR. EDGE = [W]; po; [dmb.st]; po; [W] Then EDGE: W -> W'. Let B be the dmb.st. Then by definition of pop_commit_barrier_cand B can only commit when W is finished and hence propagated and W' can only commit and hence propagate when B is finished and hence committed. So W propagates before W' in TR. EDGE = [R|W]; po; [L] Case EDGE: R -> W) Then by definition of pop_commit_store_cand W can only commit and hence propagate when R is finished and hence satisfied in TR. Case EDGE: W -> W') Then by definition of pop_commit_store_cand W' can only commit and hence propagate when W is finished and hence propagated in TR. EDGE = [R|W]; po; [L]; coi Let L be the write release. Case EDGE: R -> W) As shown above L can only commit when R is satisfied, and by definition of pop_write_co_check W can only propagate when L is propagated. So R must be satisfied before W can propagate in TR. Case EDGE: W -> W') As shown above L can only commit when W is propagated, and by definition of pop_write_co_check W' can only propagate when L is propagated. So W must propagate before W' in TR. Induction step: EDGE = EDGE'; EDGE'' in ob; ob. By transitivity of implication. Let TR be a (partial) trace of Flat-operational. By Lemma 2a and Lemma 2b there is an equivalent trace TR' for TR with the same co, rf, and rmw relations, with no restarts and no discarded instruction tree branches. So let TR' be such a trace of Flat-operational trace with no restarts and no discarded instruction tree branches and let rf, co, rmw, and po be such that (W,R) in rf if the read R is satisfied by the write W in TR, where (W,W') in co if W is propagated to memory before W' in TR, where (RE,WE) in rmw if RE and WE are a successful read-exclusive/write-exclusive pair in TR, and where po is the (linear) instruction tree viewed as a relation. Then the execution is allowed by the ARMv8.2-axiomatic model. Proof. Assume in the ARMv8.2 model there is a cycle in ob. Let EDGE be the cycle. So EDGE : R -> R or EDGE : B -> B or EDGE : W -> W. Case EDGE: R -> R). But from Lemma 3 follows that in TR' R is satisfied before R is satisfied. But since TR' has no restarts or discarded branches, R can only be satisfied once, contradiction. Case EDGE: B -> B). But from Lemma 3 follows that in TR' B is committed before B is committed. But every barrier is only committed once, contradiction. Case EDGE: W -> W). But from Lemma 3 follows that in TR' W is propagtated before W is propagated. But every write is only propagated once, contradiction. Remains to show: by definition of Flat-operational's thread semantics: po-loc | fr | co | rf acyclic, and rmw & (fre; coe) empty. Assume a cycle in po-loc | fr | co | rf as induced by the trace TR' and let C be the cycle that is derived using a minimal number of po-loc edges, and among the cycles with the same number of po-loc edges minimises the total number of edges. Then C has exactly one po-loc edge. Assume C has no po-loc edge. Then C in (rf | co | fr)+ = ([W]; rf; [W] | [W]; co; [W] | [R]; rf^-1; co; [W])+. But then C is in (rf | co): assume it is not, so it includes at least one fr edge. (By type rf^-1; co itself is acyclic.) Then C = (E1,E2); ...; (E3,E4); (E4,E5); rf^-1; co. Then E5 is a read and by type (E4,E5) in rf. So C = (E1,E2); ...; (E3,E4); rf; rf^-1; co. But then C' = (E1,E2); ...; (E3,E4); co is a shorter cycle with no po-loc edges. So C in (rf | co)+. But then by type C in co+: No edge in (rf | co)+ starts from a read, so rf cannot participate in C. By definition of Flat-operational's co relation it is (W,W') in co if W propagates to memory before W' in TR'. But since every write only propagates once, co+ is acyclic. So assume C has at least one po-loc edge. Show it cannot have multiple. Assume C has multiple po-loc edges. Then it must be C = (E1,E2); ...; (E3,E4); po-loc; (E5,E6); ...; (E7,E8); po-loc, without loss of generality assuming (E5,E6); ...; (E7,E8) not containing a po-loc edge. Since po-loc; po-loc <= po-loc assume also (E5,E6); ...; (E7,E8) non-empty. So (E5,E6); ...; (E7,E8) <= (co|rf|fr)+. Now there are different cases for the type of (E4,E5) in po-loc: E4 -> E5: W -> W'. Then it must be (E4,E5) in co, otherwise it is (E5,E4) in co and (E5,E4); (E4,E5) is a cycle in co; po-loc with fewer po-loc edges. But then C can be constructed without using (E4,E5) as po-loc but using (E4,E5) as co, contradicting the minimality of C's po-loc edges. E4 -> E5: W -> R. If it is (E4,E5) in rf then (E4,E5) can be derived using rf instead of po-loc, contradicting the minimality of po-loc edges in C. So assume (E4,E5) not in rf. So (W',E5) in rf for some W'. If (W,W') in co it is (E4,E5) in co; rf; contradicting the minimality of po-loc edges in C. So assume (W',W) in co. But then it is (E5,E4) in fr and (E5,E4); (E4,E5) is a cycle in fr; po-loc with a smaller number of po-loc edges than C. E4 -> E5: R -> W. Let W' be such that (W',R) in rf. Now it is either W' = W or not. If W'= W then (E5,E4); (E4,E5) is a cycle in rf; po-loc with fewer po-loc edges. So assume W <> W'. Then it is either (W,W') in co or (W',W) in co. If (W',W) in co then (E4,E5) in fr and (E4,E5) and the same cycle can be derived with fewer po-loc edges, contradiction. If (W,W') in co then (W',E4); (E4,E5); (E5,W') is a cycle in rf; po-loc; co with fewer po-loc edges. E4 -> E5: R -> R'. Now there are two cases: R and R' read the same write or not. Assume R and R' read from the same write W. By type (E5,E6) must be fr. But since R and R' read from the same write it is also (E4,E6) in fr, contradiction, since then (E1,E2); ...; (E3,E4); (E4,E6); ...; (E7,E8); po-loc is a cycle with fewer po-loc edges. So assume (W,R) in rf and (W',R') in rf for W <> W'. Now there are two cases: (W,W') in co or (W',W) in co. If (W,W') in co it is (R,W') in fr, so (E4,E5) can be derived using fr; rf contradicting the minimality of po-loc edges in C. So assume (W',W) in co. But then (R',W) in fr and (E5,W); (W,E4); (E4,E5) is a cycle in fr; rf; po-loc with fewer po-loc edges, contradiction. So assume C has exactly one po-loc edge. Then C in (co|rf|fr)+; po-loc, so C = C'; P for C in (co|rf|fr)+ and P in po-loc. Now look at all possible cases for the type of P. ## Case P: W -> W'. Then C': W' -> W in co+. Assume otherwise. Then by type it must be C' = C''; rf; fr; C''' = C''; rf; rf^-1; co; C''' = C''; co; C'''. Contradiction to the assumption that C is the shortest cycle. If C' in co+, then by definition of the operational model's coherence it must be that W' propagates to memory before W. But by pop_write_co_check and (W,W') in po-loc W' can only propagate after W is propagated to memory, contradiction. ## Case P: W -> R. Then C': R -> W in fr. Assume otherwise. Then by type either C' in fr; co+ = rf^-1; co; co+ = rf^-1; co+ = rf^-1; co = fr as by definition of the operational model's coherence relation co is transitive. or C' = fr; C''; rf; fr; C''' for fr; C'' in co* and C''' in (co | rf | fr)* but then C' = fr; C''; rf; rf^-1; co; C''' = fr; C''; co; C'''. Contradiction to the assumption of C's minimality. So C': R -> W in fr. Let W' be the write R reads from. So we have (W',R) in rf and (W',W) in co. Now there are two cases: a) R is satisfied by forwarding or b) in storage. a) Then (W',R) in po-loc because by definition of the satisfy-read-by-forwarding transition W' must be before R in the instruction tree. Now assume there is another write po-between W' and R to the address. Let W'' be the po-closest predecessor to R. Then when W'' propagates, by definition of pop_memory_write_propagate_action_restart_roots it restarts R since R did not read from W and not from a po-succcessor of W''. Contradiction to the assumption of TR' not having restarts. So W' must be the closest po-predecessor of R to the same address. Now there are two cases: a1) (W,W') in po or a2) (W',W) in po. a1) Then it is (W,W') and (W',R) in po, so (W,W') in po, But then W' by definition of pop_write_co_check can only propagate when W is propagated, so it is (W,W') in co. Contradiction to the assumption that (W',W) in co. a2) (W',W) in po. Since W' is the closest po-predecessor of R to the same address it must be (R,W) in po. Contradiction to (W,R) in po-loc. b) Then W' is the most recent write to the same address in memory when R is satisfied. By (W',W) in co W' propagates before W. So R is satisfied before W propagates. But when W propagates by definition of pop_memory_write_propagate_action_restart_roots R is restarted since R is to W's address and read from a different write but not by forwarding. Contradiction to the assumption that TR' has no restarts. ## Case P: R -> W Then C': W -> R in co?; rf. Proof. Assume otherwise. Then by type C' = C''; rf; fr; co+; rf = C''; rf; rf^-1; co+; rf = C''; co+; rf. Contradiction to the assumption of the minimality of C. So it is W -co?-> W' -rf-> R for some W'. Now there are two cases: a) W = W' or b) W <> W'. a) If R is satisfied by forwarding it is (W,R) in po, contradiction to the assumption. So assume R is satisfied in memory. Then W must be propagated before R is satisfied. But by pop_write_co_check R must be satisfied before W can propagated, and will not be restarted and satisfied again later, contradiction. b) Now there are two cases: (b1) R satisfied by forwarding or (b2) from storage. (b1) Then it is (W',R) in po-loc and by (R,W) in po-loc also (W',W) in po-loc. But then W' can only propagate when W is propagated and it must be (W',W) in co, contradiction. (b2) Then W' reaches memory before R is satisfied. By assumption of no restarts in TR' R is only satisfied once. By definition of pop_write_co_check R must be satisfied before W propagates. So W' propagates before W and it must be (W',W) in co, contradiction. # Case P: R -> R' Then C': R' -> R in fr; rf. Proof: assume otherwise. By type C' starts with fr and ends with rf. So either a) C' = fr; co+; rf or b) C' = fr; C''; rf; fr; C'''; rf. a) But then C' = rf^-1; co; co+; rf = rf^-1; co+; rf = rf^-1; co; rf = fr; rf. b) But then C' = fr; C''; rf; rf^-1; co; C'''; rf = fr; C''; co; C'''; rf. Contradiction to the assumption of C's minimality. Let (W,R) in rf and (W',R') in rf. Then it is (W',W) in co by definition of fr. W is either from the same thread as R and R' or not. Assume W is from the same thread as R and R'. Then it must be either (R,W) in po or (W,R) in po. If (R,W) in po then there is a cycle in rf;po-loc using only R and W that has already been dealt with in ## Case P: R -> W. If (W,R) in po, then it is (W,R') = (W,R);(R,R') in po-loc and there is a smaller cycle in po-loc;fr using only R' and W that has been dealt with in ## Case P: W -> R. So assume W is from a different thread than R and R'. Then R must read from W in memory. It cannot be (R,W') in po since otherwise there would be a smaller cycle of R, W', and W in po-loc;co;rf. Now there are two cases: R' is satisfied before R or after. If R' is satisfied before R, at the point where R is satisfied by definition of satisfy_read_action_restart_roots R' is restarted since it reads from a different write from R that is not po-after R, contradicting the assumption of no restarts in TR'. So assume R is satisfied before R'. By (W',W) in co it must be that W' propagates to memory before W does, and since R reads from W in memory, W propagates to memory before R satisfies. So W' propagates, and then W propagates, before R' satisfies. But then R' cannot read from W': it cannot read from W' by thread-local forwarding, since W' is already propagated when R' is satisfied; and it cannot read from W' in memory since W propagated to memory after W' before R' is satisfied, overwriting W'. Thus in TR' by definition of the thread semantics: po-loc | fr | co | rf acyclic. Now assume rmw & (fre; coe) non-empty. So there exists a successful load/store-exclusive pair (RE,WE) such that RE reads from a write W and there exists a write W' to the same address as RE but from a different thread such that (W,W'),(W',WE) in co. Then it must be that W propagates to memory before W', and W' propagates to memory before WE. Now there are two cases: a) RE is satisfied by thread-internal forwarding or b) not. a) Here there are again two cases: a1) At the point where W propagates to memory WE has promised its success or a2) not. a1) Then when W propagates to memory it adds t=(RE,[(W,_)]) to flat_ss_exclusive_reads and by definition of Flat-operational t can only be removed by the unique write-exclusive WE paired with RE. Since W' propagates after W and before WE, when W' propagates t is in flat_ss_exclusive_reads. But since RE and W' are from a different thread and to the same address W' cannot propagate, contradiction. a2) WE must promise its success before propagating. When WE promises its success W is already propagated to memory, by assumption. For WE to be able promise its success there can be no write coherence-after W (overlapping W) from a different thread than RE in memory. So W' cannot be propagated yet and must propagate after the promise-write-success transition of WE. WE's promise-write-success transition now adds (RE,[(W,_)]) to flat_ss_exclusive_reads. Since (RE,[(W,_)]) can only be removed from flat_ss_exclusive_reads when propagating WE, and since W' propagates before WE by assumption, this element must be in flat_ss_exclusive_reads when W' propagates. But since W' overlaps W and is from a different thread than RE, the write W' cannot propagate. Contradiction. b) Since RE reads from W in memory it has to enter memory after W. RE's satisfy-read-transition returns the last memory write written to its location, and since W' propagates after W, the read RE satisfies before W' propagates, which in turn is before WE propagates. Now there are two cases: (b1) when RE satisfies WE has promised its success or (b2) not. b1) Then when RE reads from W in memory it adds t=(RE,[(W,_)]) to flat_ss_exclusive_reads. Since t can only be removed when propagating WE and since W' propagates before WE, when W' propagates t must be in flat_ss_exclusive_reads. But since W' is to the same address as RE and W but from a different thread than RE, the write W' cannot propagate, contradiction. b2) WE must promise its success before propagating. For WE to promise its success after RE is satisfied there must be no write overlapping W from a different thread coherence-after W in memory. So W' cannot be propagated to memory when promising the success of WE. When WE promises its success the promise-write-success transition adds t=(RE,[(W,_)]) to flat_ss_exclusive_reads. Since t can only be removed when propagating WE and since W' propagates before WE, when W' propagates, t must be in flat_ss_eclusives. But since W' is to the same address as RE and W but from a different thread than RE, the write W' cannot propagate, contradiction. Therefore rmw & (fre;coe) empty.