Allowed by Flat-axiomatic => allowed by Flat-operational -------------------------------------------------------- ******************************************************************************** This proof assumes 0. Finite candidate executions and finite Flat traces. 1. That if a write-exclusive is successfully paired with a read-exclusive, then they are to the same address. 2. Axiomatic and Operational model use the same instruction semantics. 3. (E,E') in addr iff: E is a read and E' is a read or a write and E feeds into a register write that affects the address of E'. Note: this means addr(E,E') does not hold if E is a write-exclusive. 4. (E,E') in data iff: E is a read and E' is a write and E feeds into a register write that affects the value written by E'. Note: this means data(E,E') does not hold if E is a write-exclusive. 5. (E,E') in ctrl iff: E is a read and E' is a write and E feeds into a register write that affects a conditional branch or a computed branch instruction program-order before E'. Note: this means ctrl(E,E') does not hold if E is a write-exclusive. ******************************************************************************** Note: In herd a successful write-exclusive's success bit introduces dependencies *from its matching read-exclusive*: For example, in the program-fragment a: LDXR W1,[X0] b: STXR W2,W1,[X0] c: STR W2,[X3] where c stores the success bit of b into some location in memory, in herd it will be (a,c) in data. This does not match ARMv8's concurrency semantics where neither (a,c) in data nor (b,c) in data. ******************************************************************************** Lemma A. Let (po,rf,co,rmw) be a candidate execution of Flat-axiomatic. Let S be a linearisation of 'Order'. Let (R,W) be in rmw with R and W to the same address, WR the write such that (WR,R) in rf. Then 1. (WR,W) in S, 2. (R,W) in S, 3. and there is no W' with (WR,W'), (W',W) in S for a write W' to the same address as W but from a different thread than W. Proof. It is (R,W) in rmw <= RC_WC <= S, so 2 holds. Now either (WR,R) in rfi, then (WR,R) in po by "Internal" axiom and by (R,W) in po also (WR,W) in po. So (WR,W) in [W];po-loc;[W] <= WC_WC <= S (1). Or (WR,R) in rfe <= S and by (R,W) in rmw <= RC_WC <= S it is (WR,W) in S. Since co totally orders same-address writes and co <= S it is (WR,W) in co by (1). (3) Now assume there is such a write W'. Then since co <= S and co totally orders same-address writes, (WR,W') in co and (W',W) in co. But then (R,W') in fre and (W',W) in coe so that (R,W') in (fre;coe)&rmw. Contradiction to the "exclusives" axiom. Lemma B. Let (R,W) and (R',W') in rmw and R,R',W,W' all from the same thread to the same address. Let (RW,R),(RW',R') in rf. Then it cannot be RW = RW'. Proof. Assume RW = RW'. Then it is (RW,R),(RW,R'),(R,W),(R',W') in po. Assume without loss of generality (R,R') in po. Now because of (R,W) in rmw it cannot be (R,R'),(R',W). So po must be RW,R,W,R',W'. But then it must be (RW,W) in co, so (R',W) in fr. Contradiction to "internal" axiom: cycle in fr|po. Theorem. If a candidate execution (po,rf,co,rmw) is allowed in Flat-axiomatic-plus, then there exists a trace TR of Flat-operational where for each (W,R) in rf the read R is satisfied by the write W, where for each (W,W') in co the write W is propagated before W', where for each (R,W) in rmw R and W are a successful read-write-exclusive pair, and where the instruction tree viewed as a relation is po. Proof. Let S be a linearisation of 'Order' for this candidate execution and define rfs = Order & rf, rft = rf \ rfs, Rs = range rfs, Rt = range rft. Then rfe subset rfs since rfe <= Order. Now construct TR as follows: Start with an empty trace. Fetch all instructions one-by-one following po of the candidate-execution. In program order, for each write-exclusive W: - if W in range rmw, promise the success of the write-exclusive - if W not in (range rmw) promise the failure of the write-exclusive Take all enabled eager transitions. For each next element E of S: 1. - If E is a read R from Rt: satisfy R by forwarding from the unique W with (W,R) in rf using the transition T = T_mem_forward_write. - If E is a read R from Rs: satisfy R from Storage using the transition T = T_Flat_mem_satisfy_read with the write W with (W,R) in rf. - If E is a write W: Take the transition T = Propagate W to memory. - If B is a barrier: Take the transition T = Commit B. 2. Take all enabled eager non-fetch, non-barrier-commit transitions (these include the pseudocode-internal transitions, register reads and writes, initiating of reads and writes, completing loads and stores that have done all their respective reads and writes, and finishing instructions). Now show by induction on n: if TR is the trace constructed for S[0..n], then: 0. Assume (RE,[(W,_)]) in flat_ss_exclusive_reads after executing TR. Then (W,RE) in rf and there exists a write-exclusive WE such that (RE,WE) in rmw. Let WE be this write-exclusive. Then W in S[0..n] and WE not in S[0..n] and RE satisfied after TR. 1. For all satisfy-read transitions in TR for reads R (whether by forwarding or from memory), the write W it reads from is the write in (W,R) in the rf relation from the candidate execution above. For all write-propagate transitions in TR for writes W, they occur in TR consistent with the co relation in the candidate execution above. For each (R,W) in rmw in the candidate execution above, the Write-Exclusive W is successfully paired with R in TR. The instruction tree viewed as a relation is po. 2. TR involves no restarts and no discarding of branches in the instruction tree. 3. TR is a valid (partial) trace of Flat-operational. 4. If there are any enabled transitions after TR, they can only be one of: - satisfy memory read by forwarding - satisfy memory read from storage - propagate memory write - commit barrier - fetch next instruction Induction start, empty prefix of S. 0. After executing only fetch and promise-write-success/failure-transitions flat_ss_exclusive_reads is empty. 1. Since the fetch and promise-write-exclusive-success/failure-transitions do not satisfy reads or commit writes 1 holds for rf and co. Since by construction of TR the success of a write-exclusive W is promised if-and-only-if W in range rmw, 1 holds for rmw as well. By construction the instruction-tree unfolding matches po, since by proof of 2 the instruction tree is not pruned. 2. By definition of Flat-operational fetching and the promise-write-exclusive-success transition cannot cause restarts. The promise-write-exclusive-failure transition of a write WE can only restart reads that have read from WE. But by construction no reads are satisfied yet. Neither fetching nor the promise-write-exclusive-success/failure transition discard instruction tree branches. 3. By definition of the Flat-operational thread semantics for "constant" conditional branches both possible branch targets can be fetched, for computed branches any address can be fetched. 4. By definition, all the eager transitions (all except the ones in the inductive hypotheses) are taken. For each (R,W) in rmw it is (R,W) in po with no other exclusive instruction inbetween R and W in program order. So the promise-write-exclusive-success-transition for W is enabled. For all unsuccessful write-exclusive the promise-write-exclusive-failure-transition is enabled. Induction step: n => n+1. Now extend TR to TR' for the next element E = S[n+1]. ## Case E is a read R from Rt. 0. Only the promise-write-success, satisfy-read-in-memory and propagate-memory-write transitions change the flat_ss_exclusive_reads field. Since E is a read in Rt and satisfied by forwarding, 0 still holds by induction hypothesis. 1. By induction hypothesis the trace TR induces the rf, co, and rmw relations from the candidate execution, the instruction tree matches po, extending TR to TR' preserves this. Only have to show that extending TR to TR' for E preserves this as well. But this follows from the construction of TR': R is satisfied by W for (W,R) in rf. po, co and rmw are unaffected. 2. Have to show that the satisfaction of R does not cause the restart or discarding of any instructions. By definition T does not discard instruction tree branches. Let restart_roots be the result of calling pop_memory_read_action_restart_roots. These and their dependent reads or writes are the instructions that will be restarted as part of the memory read action. Then restart_roots is the set of all reads R' with (R,R') in po-loc for which R' has been satisfied, but by neither W nor a write po-between R and R' after executing TR'. Show that restart_roots for this memory read action T is empty. From that follows that extending the trace for E preserves (2). Assume there is such a read R' that is satisfied by a write W' that is neither W nor po-between R and R' after executing TR'. (In the following <= is the subset relation.) We have (R,R') in po-loc, and therefore (R,R') in [R];(po-loc\(rf^-1;rf)\(po; rf));[R] <= RS_RS <= S. By construction R' can only have already been satisfied if (R',R) in S. But already have (R,R') in S. Contradiction to the acyclicity of S. 3. To be able satisfy R from W by thread-internal forwarding, - 3.1 R must be in MOS_pending_mem_read state and read-request-cand hold, - 3.2 W must be in state MOS_potential_mem_write with data available - 3.3 W must be before R in the instruction tree - 3.4a there must be no W' to the same address in between R and W in the instruction tree - 3.4b there must be no R' to the same address in between R and W in the instruction tree that read from another write W <> W' - 3.5 if R is a read-acquire, then W is not a store-exclusive 3.1) After executing TR R is in MOS_pending_mem_read state and read-request-cand holds. The reads feeding into the register reads of R have been satisfied and therefore eagerly completed: for any read R' whose read value feeds into the address of R it is (R',R) in RS_RS <= Order <= S; therefore by construction R' is satisfied and eagerly completed after executing TR; the register writes of any write-exclusive WE's success bits feeding into the address of R are eagerly done, since by construction WE has been promised to be successful; any other, non-memory, instructions feeding into R's registers reads have been done eagerly. Remains to show that read-request-cand holds and that R is not already satisfied after TR. Then, since register read transitions and load initiation are eager, 3.1 follows. Show pop_memory_read_request_cand holds. This is true if a) All po-earlier DMB.SYs, ISBs, DMB.LDs are finished. b) If R is an acquire read then all po-earlier write releases are finished. c) All po-earlier acquire reads are completed. a) It is [DMB.SY|ISB|DMB.LD]; po; [R] <= BC_RS <= S. So by construction of TR all DMB.SYs, ISBs, DMB.LDs are committed after executing TR, and therefore eagerly finished after TR. b) It is [Rel]; po; [Acq] in WC_RS <= S. So by construction all po-earlier write releases are propagated if R is an acquire, and therefore eagerly completed and finished after TR. c) It is [Acq]; po; [R] <= RS_RS <= S. So by construction all acquires po-before R are satisfied and eagerly completed after executing TR. R cannot be satisfied after TR: by construction R could only be satisfied after TR if (R,R) in S, but S is acyclic. 3.2) After executing TR W is in state MOS_potential_mem_write state with data available. The memory reads feeding into the register reads of W have been completed: for any read R' that feeds into W's address or data it is (R',R) in RS_RS <= Order <= S; therefore by construction R' is satisfied after executing TR' and therefore eagerly completed; any register write of a write-exclusive's success bit feeding into the address or data of W has been done eagerly, since all write-exclusives have already promised success or failure after TR; all non-memory instructions feeding into W' registers reads have been done eagerly. To show W is in state MOS_potential_mem_write after executing TR, remains to show that W has not propagated yet. By construction W can only be propagated if (W,R) in S, so assume (W,R) in S. But by definition of rft it is (R,W) in S. Since S acyclic, contradiction: (W,W) in S. 3.3) This follows by definition of the operational model if (W,R) in po. As shown above, rfe subset rfs. Therefore, since rft = rf\rfs it (W,R) in rfi, so W and R from the same thread. And it must be (W,R) in po because the coherence axiom requires the acyclicity of po | rf. 3.4a) This follows by definition of the operational model if there is no (W,W') and (W',R) in po for a W' to the same address. There cannot be such (W,W') and (W',R) in po since by the coherence axiom in the candidate execution po | co | rf | fr acyclic. 3.4b) Assume there is such a read R'. Then by construction (R',R) in S, and by induction hypothesis (W',R') in rf of the candidate execution. Now in the candidate execution W and W' must be coherence related. If it is (W',W) in co, then (R',W');(W',W) in rf^-1;co=fr and there is a cycle in fr;po, contradiction to the assumption that Flat-axiomatic's 'internal' axiom holds for the candidate execution. So assume it is (W,W') in co. By assumption it is (W,R) in rf, so (R,W') in fr of the candidate execution. But then there is a cycle in fr;rf;po in the candidate execution. Contradiction to the assumption that the 'internal' axiom holds. 3.5) Assume R is a read-acquire and W a store-exclusive. But then by definition of rfs the read R is required to be in Rs, since [Xw];rfi;[A] in WC_RS <= Order <= S, contradiction. ## Case E is a read R from Rs. 0. The transition T for R only changes the flat_ss_exclusive_reads field in case it is a read-exclusive RE with a program-order following write-exclusive WE for which TR contains the promise-write-exclusive-success transition. By construction in that case (RE,WE) in rmw. Let W be the write with (W,RE) in rf. Then T adds (RE,[(W,_)]) to flat_ss_write_exclusives. It is (W,RE) in rfs <= S, so W in S[0..n+1]. By construction, after TR' the read R = RE is satisfied. Left to show that WE not in S[0..n+1]. This follows by (RE,WE) in fr <= S. Since R is a read, for the other elements in flat_ss_exclusive_reads 0 still holds. 1. By induction hypothesis the trace TR induces the rf and co relation from the candidate execution, the instruction tree matches po. Have to show that extending TR to TR' for E preserves this. But this follows from the construction: R is satisfied by W for (W,R) in rf. po, co, and rmw are unchanged. 2. Have to show that the satisfaction of R does not cause the restart or discarding of any instructions. By definition T does not discard instruction-tree branches. Let restart_roots be the result of calling pop_memory_read_action_restart_roots. These and their dependent reads or writes are the instructions that will be restarted as part of the memory read action. Then restart_roots is the set of all reads R' with (R,R') in po-loc for which R' has read neither from W nor from a write po-between R and R'. Show that after executing TR restart_roots for this memory read action T is empty. From that follows that extending the trace to TR' for E preserves (2). Assume there is such a read R' that is satisfied by a write W' that is neither W nor po-between R and R' after executing TR'. Then (R,R') in [R];(po-loc\(rf^-1;rf)\(po;rf));[R] <= RS_RS <= S. But by construction R' can only have been satisfied if it was already satisfied, so if (R',R) in S. Contradiction to the acyclicity of S. 3. To be able to satisfy R from W in memory after TR', - 3.1 R must be in MOS_pending_mem_read state and read-request-cand hold, - 3.2 W must be propagated to memory. - 3.3 There must not be W' to the same address that propagated after W - 3.4 If R is an exclusive read there must be no (RE',[(W',_)]) in flat_ss_exclusive_reads where RE' and R are to the same address but from different threads. 3.1) After executing TR R is in state MOS_pending_mem_read state and read-request-cand holds. The memory reads feeding into the register reads of R have been completed: for any read R' whose read value feeds into R's address it is (R',R) in RS_RS <= Order <= S; therefore by construction R' is satisfied and therefore eagerly completed after executing TR; the register writes of the success bit of any store-exclusive feeding into R's register reads have been done eagerly since all write-exclusives have promised success or failure after TR; all non-memory data dependent instructions feeding into R's registers reads have been done eagerly. Remains to show pop_memory_read_request_cand holds and that R is not already satisfied after TR. Since register read transitions and load initiation are eager, then 3.1 follows. Show pop_memory_read_request_cand holds. This is true if a) All po-earlier DMB.SYs, ISBs, DMB.LDs are finished. b) If R is an acquire read then all po-earlier write releases are finished. c) All po-earlier acquires are completed. a) It is [DMB.SY|ISB|DMB.LD]; po; [R] <= BC_RS <= S. So by construction of TR all DMB.SYs, ISBs, DMB.LDs are committed after executing TR, and therefore eagerly finished. b) It is [Rel]; po; [Acq] in WC_RS <= S. So by construction all po-earlier write releases are propagated if R is an acquire, and therefore eagerly completed and finished. c) It is [Acq]; po; [R] in RS_RS <= S. So by construction all acquires po-before R are satisfied and eagerly completed after executing TR. R cannot be satisfied after TR: by construction R could only have been satisfied if there was (R,R) in S. But S is acyclic. 3.2) After executing TR W is propagated. By definition of rfs it is (W,R) in rfs <= S, so by construction W has already been propagated. 3.3) Assume after executing TR there is a write W' to the same address that propagated after W. Then this must be because (W,W') and (W',R) in S. Since co <= S and co totally orders same-address writes it must be (W,W') in co and therefore (R,W') in fr <= S. But then S cyclic. Contradiction. 3.4) Assume R is a read exclusive R = RE and (RE,WE) in rmw. Assume there is such a (RE',[(W',_)]) in flat_ss_exclusive_reads. Then by induction hypothesis it is (W',RE') in rf, there is WE' such that (RE',WE') in rmw, W' in S[0..n], RE' satisfied after TR', and WE' not in S[0..n]. Then we have (W',RE) in S. Also have (W,RE) in rfs <= S. Since RE' is satisfied, by construction it must be RE' in S[0..n] and therefore (RE',RE) in S. And it is (RE,WE') in S. Also have (RE,WE) in fr <= S. Since by assumption RE and RE' from different threads it must also be that WE and WE' are from different threads. Now there are two cases: (a) (WE,WE') in S or (b) (WE',WE) in S. (a). Then it is (W',RE),(RE,WE),(WE,WE') in S. But WE and WE' are to the same address from different threads, contradicting Lemma A for rmw(R',WE'). (b). Then it is (W,RE),(RE,WE'),(WE',WE) in S. But WE and WE' are to the same address from different threads, contradicting Lemma A for rmw(R,WE). ####### Lemma 1: Let ST be a state of Flat-operational. For any read R: if R is satisfied, all writes and barriers E with (E,R) in (WC_RC | BC_RC); RC_RC* are finished, and all reads R' with (R',R) in RC_RC+ are satisfied, and all eager transitions taken, then R is finished. Proof by induction on the instruction tree. Induction start: if the instruction tree is empty there is no such read R in it. Induction hypothesis: the statement holds for some instruction tree IT. Show it also holds for adding a leaf II to IT. By the induction assumption the induction hypothesis holds for all satisfied reads R in IT also with II added. (The pop_finish_load_cand condition is unaffected by po-later instructions.) So remains to show that the induction hypothesis holds for II. Let R be the leaf II and assume R is satisfied, all writes and barriers E with (E,R) in (WC_RC | BC_RC); RC_RC* are finished and all reads R' with (R',R) in RC_RC+ are satisfied. Have to show that R is finished. For R to finish the following have to hold: a) commitDataflow. The memory reads feeding into the register reads of R have to be finished. Let R' be one such read. We have (R',R) in [R]; addr; [R] <= RC_RC. Therefore R' is satisfied in ST, all reads (R'',R') in RC_RC+ are satisfied since (R'',R'); (R',R) in RC_RC+; RC_RC = RC_RC+, and all writes E' are propagated and barriers E' committed and E' therefore eagerly finished for (E',R') in (WC_RC | BC_RC); RC_RC* since (E',R'); (R',R) in (WC_RC | BC_RC); RC_RC*; RC_RC = (WC_RC | BC_RC); RC_RC*. Then by induction hypothesis R' is finished. b) commitControlflow. Conditional branches po-before R have to be finished. Assume there is an uncommitted branch instruction po-before R, let BR be the po-earliest one. BR's finish-transition is taken eagerly, so if BR is not finished, then it is because BR cannot finish yet. Since BR is the po-earliest unfinished branch its control flow is finished. So it must be BR's dataflow that is unfinished: there is at least one read R' that feeds into BR's register reads that is unfinished. But then (R',R) in [R]; ctrl; [R] <= RC_RC and therefore R' satisfied and all writes E' are propagated and barriers E' committed and E' therefore eagerly finished for (E',R') in (WC_RC | BC_RC); RC_RC*, since (E',R'); (R',R) in (WC_RC | BC_RC); RC_RC*, and all reads R'' with (R'',R') in RC_RC+ satisfied, since (R'',R'); (R',R) in RC_RC+. Therefore by induction hypothesis R' is finished, contradiction. c) All po-earlier DMB.SYs, DMB.LDs, ISBs are finished. Because of [DMB.SY|ISB|DMB.LD]; po; [R] in BC_RC by assumption all of these are finished in ST. d) All po-earlier acquire reads are finished. Let R' be a po-earlier acquire. So (R',R) in [Acq]; po; [R] <= RC_RC. Then R' satisfied. And all writes E' are propagated and barriers E' committed and E' therefore eagerly finished for (E',R') in (WC_RC | BC_RC); RC_RC*, since (E',R'); (R',R) in (WC_RC | BC_RC); RC_RC*, and all reads R'' with (R'',R') in RC_RC+ are satisfied, since (R'',R'); (R',R) in RC_RC+. Then by induction hypothesis R' is finished. e) If R is an acquire then all po-earlier releases are finished. This is true in ST, since by [Rel]; po; [Acq] in WC_RC these write releases are propagated and therefore eagerly finished. f) If the closest po-earlier write W to the same address was forwarded to R the memory reads feeding into W's register reads must be finished. Let R' be one such read. Then (R',R) in [R]; (addr|data); rfi in RC_RC. Therefore R' is satisfied, all reads R'' with (R'',R') in RC_RC+ are satisfied, since (R'',R'); (R',R) in RC_RC+, and all writes E' are propagated and barriers E' committed and therefore E' eagerly finished for (E',R') in (WC_RC | BC_RC); RC_RC*, since (E',R'); (R',R) in (WC_RC | BC_RC); RC_RC*. Then by induction hypothesis R' is finished. g) If the closest po-earlier write W to the same address was not forwarded to R it must be propagated. By assumption R is satisfied in ST. Now there are two cases: g1) (W,R) in rf or g2) not. g1) Since by assumption W was not forwarded to R, the read R read from W in memory, so W propagated. g2) Then (W,R) in [W];(po-R-loc\rf);[R] <= WC_RC, so by assumption W is finished and therefore propagated. h) All memory accesses between this write W and R have their address-feeding memory reads finished and are initiated. Let R' be such an address-feeding read. Then (R',R) in [R]; addr; po-no-W-loc; [R] in RC_RC <= S. Therefore R' is satisfied, all reads R'' with (R'',R') in RC_RC+ are satisfied, since (R'',R'); (R',R) in RC_RC+, and all writes E' are propagated and barriers E' committed and therefore E' eagerly finished for (E',R') in (WC_RC | BC_RC); RC_RC*, since (E',R'); (R',R) in (WC_RC | BC_RC); RC_RC*. Then by induction hypothesis R' is finished. Moreover, since all such R' are satisfied, all memory accesses between W and R have eagerly done the register reads necessary to determine their address and eagerly initiated. i) All reads R' to the same address between W and R must be not-restartable. Let R' be one such read. Then (R',R) in [R]; po-R-loc; [R] in RC_RC. Therefore R' is satisfied, all reads R'' with (R'',R') in RC_RC+ are satisfied, since (R'',R'); (R',R) in RC_RC+, and all writes E' are propagated and barriers E' committed and therefore E' finished for (E',R') in (WC_RC | BC_RC); RC_RC*, since (E',R'); (R',R) in (WC_RC | BC_RC); RC_RC*. Then by induction hypothesis R' is finished. Now have: R is satisfied, so the load is eagerly completed. Since all the conditions for finishing R hold in ST and the read-finish transition is an eager transition, R is finished and the induction hypothesis holds. ####### ## Case E is a write W 0. Assume t = (RE',[(W',_)]) in flat_ss_exclusive_reads after executing TR'. Now there are two possibilities: a) t in flat_ss_exclusive_reads before T or b) not. a) Then by induction hypothesis there it is (W',RE') in rf and there exists WE' such that (RE',WE') in rmw with W' in S[0..n] and WE' not in S[0..n] and RE' satisfied after TR'. Have to show T preserves this. rf and rmw are unaffected by T, after T it will still be W' in S[0..n+1] and RE' still satisfied. So have to show WE' not in S[0..n+1]. Assume it is. Then W = WE'. But then T = T_propagate_write W is annotated with RE' and (RE',[(W',_)]) deleted by T from flat_ss_exclusive_reads, contradiction. b) Then RE' is a read exclusive that was satisfied by thread-internal forwarding from W and it is paired with a write-exclusive WE' for which TR contains the promise-write-success transition. Then we have (W,RE') in rf. By construction it is (RE',WE') in rmw. RE' is satisfied and it is W in S[0..n+1]. Still have to show that WE' not in S[0..n+1]. By (RE',WE') in rmw it is (RE',WE') in po. Since RE' read from W by thread-internal forwarding it is also (W,RE') in po, so (W,WE') in po and they are writes to the same location. But then it is (W,WE') in [W];po-loc;[W] <= WC_WC <= S, and since W = S[n+1] it must be that WE' not in S[0..n+1]. 1. By induction hypothesis the trace TR induces the rf, co, and rmw relation from the candidate execution, the instruction tree matches po. Have to show that extending TR' to TR'' for E preserves this. Since in the operational model the coherence relation is determined by the order in which writes reach memory, have to show: a) for all (W',W) in co W' has already been propagated after TR and b) for all W' to the same address as W that have been propagated before W in TR it is (W',W) in co. a) Since co <= S by construction of TR all such writes W' have already been propagated. b) Let W' be any same-address write that is propagated after in TR. Then by construction of TR it must be (W',W) in S. But since co <= S, since co totally orders all same-address writes, and since S is acyclic W' it must be (W',W) in co. po, rf, and rmw are unaffected by T, so 1 follows. 2. Have to show that the propagation of W does not cause the restart or discarding of any instructions. By definition T does not discard instruction-tree branches. Let restart_roots be the result of calling propagate_write_action_restart_roots. These and their dependent reads or writes are the instructions that will be restarted as part of the memory write propagate action. Then restart_roots is the set of all reads R with (W,R) in po-loc for which R has been satisfied, but neither from W nor from a write po-between W and R after TR. Show that after executing TR restart_roots for this memory write action is empty. From that follows that extending the trace for E preserves (2). Assume there is such a read R that is satisfied by a write W' that is neither W nor po-between W and R after executing TR. Then (W,R) in [W];(po-loc\rf\(po;rf));[R] <= WC_RS <= S. By construction of TR R can only have already been satisfied if (R,W) in S. Contradiction to the acyclicity of S. 3. To propagate W, its store has to be committed and pop_write_co_check has to hold. Since write commitment transitions are eager only have to show: - 3.1 W must be in state MOS_potential_mem_write with its data available - 3.2 pop_commit_store_cand and pop_write_co_check must hold - 3.3 There exists no (RE',[(W',_)]) in flat_ss_exclusive_reads after TR' where RE' and W are to the same address but from different threads. 3.1) After executing TR W is in state MOS_potential_mem_write state. The memory reads feeding into the register reads of W have been satisfied: for any read R whose read value feeds into the address and data of W it is (R,W) in RC_WC <= order <= S; therefore by construction of TR R is satisfied and eagerly completed after executing TR; the register writes of the success bit of any write-exclusives are done eagerly since all write-exclusives have promised their success or failure; all non-memory data dependent instructions that feed into W's register reads have been done eagerly. 3.2) a) All po-earlier DMB.SYs, ISBs, DMB.LDs, DMB.STs are finished. b) If W is a release then all po-earlier reads and writes are finished. c) All po-earlier read acquires are finished. d) commitDataflow holds. e) commitControlflow holds. f) pop_write_co_check holds. g) If W is a successful store exclusive WE paired with a load exclusive RE, then RE is finished, and if RE read from a same-thread write W', that write is propagated. a) It is [F]; po; [W] <= BC_WC <= Order <= S. So by construction of TR all po-earlier DMB.SYs, ISBs, DMB.LDs, DMB.STs are committed, and therefore eagerly finished. b) Assume W is a write release. Let W' be a po-earlier write. Then by (W',W) in [W];po;[Rel] <= WC_WC <= S and by construction of TR, W' is propagated and therefore eagerly finished. Let R be a po-earlier read. Then by (R,W) in [R];po;[Rel] <= RC_WC <= S, and by construction of TR R is satisfied. Moreover, by construction of TR all reads R' with (R',R) in RC_RC+ are satisfied, since (R',R); (R,W) in RC_RC+; RC_WC <= S, and all writes E' are propagated and barriers E' are committed and E' therefore eagerly finished for (E',R) in (WC_RC | BC_RC); RC_RC*, since (E',R);(R,W) in (WC_RC | BC_RC); RC_RC*; RC_WC <= S. Then by Lemma 1 R is finished. c) Let R be a read acquire po-before W. Then (R,W) in [Acq];po;[W] <= RC_WC <= S. So by construction of TR R is satisfied, all reads R' with (R',R) in RC_RC+ are satisfied, since (R',R); (R,W) in RC_RC+; RC_WC <= S, and all writes E' are propagated and barriers E' are committed and E' therefore eagerly finished for (E',R) in (WC_RC | BC_RC); RC_RC*, since (E',R); (R,W) in (WC_RC | BC_RC); RC_RC*; RC_WC. Then by Lemma 1, R is finished. d) The memory reads feeding into the register reads of W have to be finished. Let R be one such read. Then R's read value feeds into the address or data of W and we have (R,W) in [R];(addr|data);[W] <= RC_WC <= S. Then by construction R is satisfied and eagerly completed, all reads (R',R) in RC_RC+ are satisfied since (R',R); (R,W) in RC_RC+; RC_WC and all writes E are propagated and barriers E committed and E therefore eagerly finished for (E,R) in (WC_RC | BC_RC); RC_RC* since (E,R); (R,W) in (WC_RC | BC_RC); RC_RC*; RC_WC. Then by Lemma 1 R is finished. e) Conditional branches po-before W have to be finished. Assume there is an unfinished branch instruction po-before W, let BR be the po-earliest one. BR's finish-transition is taken eagerly, so if BR is unfinished, then it is because BR cannot finish yet. Since BR is the po-earliest unfinished branch its control flow is finished. So it must be BR's dataflow that is unfinished: there is at least one read R that feeds into BR's register reads that is unfinished. But then (R,W) in [R];ctrl;[W] <= RC_WC <= S. So by construction of TR R is satisfied, all writes E are propagated and barriers E are committed for (E,R) in (WC_RC | BC_RC); RC_RC* are therefore eagerly finished, since (E,R); (R,W) in (WC_RC | BC_RC); RC_RC*; RC_WC <= S, and all reads R' with (R',R) in RC_RC+ are satisfied, since (R',R); (R,W) in RC_RC*; RC_WC <= S. So by Lemma 1 R is finished. f0) All po-previous same-address writes have to be propagated. Have [W];po-loc;[W] in WC_WC <= S, so all po-earlier writes to W's address are propagated and therefore eagerly finished. f1) All po-previous memory accesses have their address-feeding memory reads finished and are initiated. Let R be such an address-feeding read. Then (R,W) in [R];addr;po;[W] <= RC_WC <= S. So by construction R is satisfied, all reads R' with (R',R) in RC_RC+ are satisfied, since (R',R); (R,W) in RC_RC+; RC_WC <= S, and all writes E' are propagated and barriers E' are committed and E' therefore eagerly finished for (E',R) in (WC_RC | BC_RC); RC_RC*, since (E',R); (R,W) in (WC_RC | BC_RC); RC_RC*; RC_WC. Then by Lemma 1 R is finished. Moreover, since all such R are satisfied, all memory accesses po-before W have eagerly done the register reads necessary to determine their address and have eagerly initiated. f2) All po-previous memory reads to the same address must be satisfied, and unrestartable. Let R be one such read. Then it is (R,W) in fr <= S and by construction of S the read R is satisfied. (By f0 have that all same-address writes po-before W are propagated.) Remains to show that all po-earlier memory reads to the same address are not-restartable. Show instead that they are all already finished by induction on the program order prefix of W. Induction start. For the empty program-order prefix all reads to W's location are trivially finished. Induction hypothesis. Assume the statement holds for PREFIX, show appending an instruction II to the prefix preserves the statement. Only have to show that if II is a same-address read it is finished. For R to finish the following have to hold: f21) commitDataflow The memory reads feeding into the register writes R reads from have to be finished. Since R's register reads are all used to determine the address of R these memory reads are all finished by proof of f1). f22) commitControlflow Since any instruction that R is control-flow dependent on, W is also control-flow dependent on, these instructions have to be finished by proof of e). f23) All po-earlier DMB.SYs, DMB.LDs, ISBs are finished. By proof of a) all po-earlier DMB.SYs, DMB.LDs, ISBs are finished. f24) All po-earlier acquire reads are finished. By proof of c) all po-earlier read acquires are finished. f25) If R is an acquire then all po-earlier releases are finished. If R is an acquire then it has to be finished by proof of c). f26) If the closest po-earlier write W' to the same address was forwarded to R the memory reads feeding into the register reads of W' must be finished. As proved above all po-earlier writes W' to the same address as W are propagated, hence committed, which includes having the memory reads feeding into W' finished. f27) If the closest po-earlier write W' to the same address was not forwarded to R it must be propagated. As proved above all po-earlier writes W' to the same address as W are propagated. f28) All memory accesses between W' and R have their address-feeding memory reads finished. As proved for f1) all memory accesses po-before W have their address-feeding memory reads finished. f29) All reads R' to the same address between W' and R must not be restartable anymore. Since all these reads are po-before W and to the same address as W by the induction hypothesis they are finished, so not restartable anymore. Now since R is satisfied it can be completed. Since pop_finish_load_cand holds R can finish. Since memory-read-finish transitions are eager, R is finished. f3) Any read R that was partially satisfied from W must have requested its unsatisfied slices from storage. By assumption all memory accesses have the same size, so if R was partially satisfied from W it is completely satisfied and there are no such unsatisfied slices. g) Assume W is a successful store exclusive WE that is paired with a load exclusive RE. Then RE must be po-before WE. By proof of f2) all po-earlier memory reads to the same address are already finished, so RE finished. Assume RE read from a same-thread write W'. Then it is (W',RE),(RE,W) in po. And since W' and W have the same address, by proof of f0 the write W' is propagated. 3.3) Assume there exists (RE',[(W',_)]) in flat_ss_exclusive_reads after TR' where RE' and W are to the same address but from different threads. Then by 0 of the induction hypothesis it is (W',RE') in rf and there exists a write-exclusive WE' such that (RE',WE') in rmw, W' in S[0..n], and WE' not in S[0..n]. Since RE' and W from different threads by assumption, also W and WE' from different threads, so WE' <> W and WE' not in S[0..n+1] either, and therefore (W,WE') in S. Since it is W' in S[0..n] it is also (W',W) in S. So (W',W),(W,WE') in S where W and WE' are from different threads but to the same address. Contradiction to Lemma A for rmw(RE',WE'). ## Case E is a barrier B 0. Committing a barrier does not change flat_ss_exclusive_reads, and since E is not a write, 0 still holds. 1. By induction hypothesis the trace TR for E induces the po, rf, co, and rmw relations from the candidate execution. Since B does not fetch, satisfy reads, propagate writes, or promise the success/failure of write-exclusives this is still true for TR'. 2. Have to show that committing B does not cause the restart of any instructions or discard instruction-tree branches. But this follows from the definition of pop_commit_barrier_action. 3. TR' is a valid trace of Flat-operational. Have to show that committing B is enabled after TR. a) commitDataflow Since B has no data this is vacuously true. b) commitControlflow Conditional branches po-before B have to be finished. Assume there is an unfinished branch instruction po-before B, let BR be the po-earliest one. BR's finish transition is taken eagerly, so if BR is unfinished, then it is because BR cannot finish yet. Since BR is the po-earliest unfinished branch its control flow is finished. So it must be BR's dataflow that is unfinished: there is at least one read R that feeds into BR's register reads that is unfinished. But then (R,B) in [R]; ctrl; [F] <= RC_BC <= S. So by construction of TR R is satisfied, all writes E are propagated and barriers E are committed and E therefore eagerly finished for (E,R) in (WC_RC | BC_RC); RC_RC*, since (E,R); (R,B) in (WC_RC | BC_RC); RC_RC*; RC_BC, and all reads R' with (R',R) in RC_RC+ are satisfied, since (R',R); (R,B) in RC_RC*; RC_BC <= S. Then R is finished. c) If B is a DMB.SY all barriers, reads, and writes, are finished. Let B' be a barrier po-before B. Then (B',B) in [F]; po; [DMB.SY] <= BC_BC <= S, so by construction of TR B is committed and therefore eagerly finished. Let W be a po-earlier write. then (W,B) in [W]; po; [DMB.SY] <= WC_BC <= S. So by construction of TR W is propagated and therefore eagerly completed and finished. Let R be a po-earlier read. Then (R,B) in [R]; po; [DMB.SY] <= RC_BC <= S. So by construction R is satisfied, all writes E are propagated and barriers E are committed and E therefore eagerly finished for (E,R) in (WC_RC | BC_RC); RC_RC*, since (E,R); (R,B) in (WC_RC | BC_RC); RC_RC*; RC_BC <= S, and all reads R' with (R',R) in RC_RC+ are satisfied, since (R',R); (R,B) in RC_RC+; RC_BC. Then by Lemma 1 R is finished. d) All po-earlier DMB.SYs are finished. Let B' be a DMB.SY po-before B. Then (B',B) in [DMB.SY]; po; [F] <= BC_BC <= S. So by construction of TR' B' is committed and therefore eagerly finished. e) If B is an ISB all po-earlier memory accesses have their address-feeding memory reads finished and have initiated. Let R be a memory read feeding into the address of a po-earlier memory access. Then (R,B) in [R]; addr; po; [ISB] <= RC_BC <= S. So by construction R is satisfied, all writes E are propagated and barriers E committed and E therefore eagerly finished for (E,R) in (WC_RC | BC_RC); RC_RC*, since (E,R); (R,B) in (WC_RC | BC_RC); RC_RC*; RC_BC <= S, and all reads R' with (R',R) in RC_RC+ are satisfied, since (R',R); (R,B) in RC_RC+; RC_BC. Then by Lemma 1 R is finished. Moreover, since all such R are satisfied, all memory accesses program-order before B have eagerly done the register reads necessary to determine their address and have eagerly initiated. f) If B is a DMB.LD all po-earlier memory loads are finished. Let R be a po-earlier read. Then (R,B) in [R]; po; [DMB.LD] <= RC_BC <= S. So by construction R is satisfied, all writes E are propagated and barriers E are committed and E therefore eagerly finished for (E,R) in (WC_RC | BC_RC); RC_RC*, since (E,R); (R,B) in (WC_RC | BC_RC); RC_RC*; RC_BC <= S, and all reads R' with (R',R) in RC_RC+ are satisfied, since (R',R); (R,B) in RC_RC+; RC_BC. Then by Lemma 1 R is finished. g) If B is a DMB.ST all po-earlier memory stores are finished. Let W be a po-earlier write. Then (W,B) in [W]; po; [DMB.ST] <= WC_BC <= S. So by construction of TR the write W is propagated and therefore eagerly finished. Repeatedly extend the trace TR' to TR'' for enabled eager transitions T, until there are no more enabled eager transitions. 0. Only the promise-write-success, satisfy-read-in-memory and propagate-memory-write transitions change the flat_ss_exclusive_reads field. These are not eager, so flat_ss_exclusive_reads unchanged. And since the prefix of S has not changed: 0 still holds. 1. By definition of transition-eagerness, T does not fetch, satisfy a read, propagate a write, or promise success or failure of a write exclusive, so 1 is preserved. 2. Restarts are caused only by the promise-write-exclusive-failure, satisfy-read-by-forwarding, satisfy-read-from-memory, and propagate-memory-write transitions. By definition these transitions are not eager, so T does not cause restarts. Only finishing of branch instructions causes instruction tree branches to be discarded. A branch BR can only finish when the memory reads feeding into its register reads are finshed. By induction hypothesis for these reads the induced reads-from relation is a subset of rf of the candidate execution, so BR's successor is termined as in the candidate execution's po, and by induction hypothesis the instruction tree viewed as a relation matches po. Therefore, finishing such a branch BR does not discard any instruction tree branches. 3. Since by assumption T is enabled after TR', TR'' is a valid trace. When no more eager transitions are enabled, we establish the inductive hypothesis 4: 4. The eager transitions have all been taken by construction.