Allowed by Flat-axiomatic => allowed by Flat-operational
--------------------------------------------------------
********************************************************************************
This proof assumes
0. Finite candidate executions and finite Flat traces.
1. That if a write-exclusive is successfully paired with a read-exclusive, then
they are to the same address.
2. Axiomatic and Operational model use the same instruction semantics.
3. (E,E') in addr iff: E is a read and E' is a read or a write and E feeds into
a register write that affects the address of E'. Note: this means addr(E,E')
does not hold if E is a write-exclusive.
4. (E,E') in data iff: E is a read and E' is a write and E feeds into a register
write that affects the value written by E'. Note: this means data(E,E') does not
hold if E is a write-exclusive.
5. (E,E') in ctrl iff: E is a read and E' is a write and E feeds into a register
write that affects a conditional branch or a computed branch instruction
program-order before E'. Note: this means ctrl(E,E') does not hold if E is a
write-exclusive.
********************************************************************************
Note: In herd a successful write-exclusive's success bit introduces dependencies
*from its matching read-exclusive*: For example, in the program-fragment
a: LDXR W1,[X0]
b: STXR W2,W1,[X0]
c: STR W2,[X3]
where c stores the success bit of b into some location in memory, in herd it
will be (a,c) in data.
This does not match ARMv8's concurrency semantics where neither (a,c)
in data nor (b,c) in data.
********************************************************************************
Lemma A.
Let (po,rf,co,rmw) be a candidate execution of Flat-axiomatic. Let
S be a linearisation of 'Order'. Let (R,W) be in rmw with R and W
to the same address, WR the write such that (WR,R) in rf. Then
1. (WR,W) in S,
2. (R,W) in S,
3. and there is no W' with (WR,W'), (W',W) in S for a write W' to
the same address as W but from a different thread than W.
Proof.
It is (R,W) in rmw <= RC_WC <= S, so 2 holds.
Now either (WR,R) in rfi, then (WR,R) in po by "Internal" axiom and by (R,W) in
po also (WR,W) in po. So (WR,W) in [W];po-loc;[W] <= WC_WC <= S (1).
Or (WR,R) in rfe <= S and by (R,W) in rmw <= RC_WC <= S it is (WR,W) in S.
Since co totally orders same-address writes and co <= S it is (WR,W) in co by
(1).
(3) Now assume there is such a write W'. Then since co <= S and co totally
orders same-address writes, (WR,W') in co and (W',W) in co. But then (R,W') in
fre and (W',W) in coe so that (R,W') in (fre;coe)&rmw. Contradiction to the
"exclusives" axiom.
Lemma B.
Let (R,W) and (R',W') in rmw and R,R',W,W' all from the same thread to the
same address. Let (RW,R),(RW',R') in rf. Then it cannot be RW = RW'.
Proof.
Assume RW = RW'. Then it is (RW,R),(RW,R'),(R,W),(R',W') in po. Assume
without loss of generality (R,R') in po. Now because of (R,W) in rmw it
cannot be (R,R'),(R',W). So po must be RW,R,W,R',W'. But then it must be
(RW,W) in co, so (R',W) in fr. Contradiction to "internal" axiom: cycle
in fr|po.
Theorem.
If a candidate execution (po,rf,co,rmw) is allowed in Flat-axiomatic-plus,
then there exists a trace TR of Flat-operational where for each (W,R) in rf
the read R is satisfied by the write W, where for each (W,W') in co the write
W is propagated before W', where for each (R,W) in rmw R and W are a
successful read-write-exclusive pair, and where the instruction tree viewed
as a relation is po.
Proof.
Let S be a linearisation of 'Order' for this candidate execution and define rfs
= Order & rf, rft = rf \ rfs, Rs = range rfs, Rt = range rft. Then rfe subset
rfs since rfe <= Order. Now construct TR as follows:
Start with an empty trace.
Fetch all instructions one-by-one following po of the candidate-execution. In
program order, for each write-exclusive W:
- if W in range rmw, promise the success of the write-exclusive
- if W not in (range rmw) promise the failure of the write-exclusive
Take all enabled eager transitions.
For each next element E of S:
1. - If E is a read R from Rt: satisfy R by forwarding from the unique W with
(W,R) in rf using the transition T = T_mem_forward_write.
- If E is a read R from Rs: satisfy R from Storage using the
transition T = T_Flat_mem_satisfy_read with the write W with (W,R) in rf.
- If E is a write W: Take the transition T = Propagate W to memory.
- If B is a barrier: Take the transition T = Commit B.
2. Take all enabled eager non-fetch, non-barrier-commit transitions
(these include the pseudocode-internal transitions, register reads and
writes, initiating of reads and writes, completing loads and stores
that have done all their respective reads and writes, and finishing
instructions).
Now show by induction on n: if TR is the trace constructed for S[0..n], then:
0. Assume (RE,[(W,_)]) in flat_ss_exclusive_reads after executing TR. Then
(W,RE) in rf and there exists a write-exclusive WE such that (RE,WE) in
rmw. Let WE be this write-exclusive. Then W in S[0..n] and WE not in S[0..n]
and RE satisfied after TR.
1. For all satisfy-read transitions in TR for reads R (whether by forwarding or
from memory), the write W it reads from is the write in (W,R) in the rf
relation from the candidate execution above. For all write-propagate
transitions in TR for writes W, they occur in TR consistent with the co
relation in the candidate execution above. For each (R,W) in rmw in the
candidate execution above, the Write-Exclusive W is successfully paired with
R in TR. The instruction tree viewed as a relation is po.
2. TR involves no restarts and no discarding of branches in the instruction
tree.
3. TR is a valid (partial) trace of Flat-operational.
4. If there are any enabled transitions after TR, they can only be one of:
- satisfy memory read by forwarding
- satisfy memory read from storage
- propagate memory write
- commit barrier
- fetch next instruction
Induction start, empty prefix of S.
0. After executing only fetch and promise-write-success/failure-transitions
flat_ss_exclusive_reads is empty.
1. Since the fetch and promise-write-exclusive-success/failure-transitions do
not satisfy reads or commit writes 1 holds for rf and co. Since by
construction of TR the success of a write-exclusive W is promised
if-and-only-if W in range rmw, 1 holds for rmw as well. By construction the
instruction-tree unfolding matches po, since by proof of 2 the instruction
tree is not pruned.
2. By definition of Flat-operational fetching and the
promise-write-exclusive-success transition cannot cause restarts. The
promise-write-exclusive-failure transition of a write WE can only restart
reads that have read from WE. But by construction no reads are satisfied yet.
Neither fetching nor the promise-write-exclusive-success/failure transition
discard instruction tree branches.
3. By definition of the Flat-operational thread semantics for "constant"
conditional branches both possible branch targets can be fetched, for
computed branches any address can be fetched.
4. By definition, all the eager transitions (all except the ones in
the inductive hypotheses) are taken.
For each (R,W) in rmw it is (R,W) in po with no other exclusive instruction
inbetween R and W in program order. So the
promise-write-exclusive-success-transition for W is enabled. For all
unsuccessful write-exclusive the promise-write-exclusive-failure-transition
is enabled.
Induction step: n => n+1.
Now extend TR to TR' for the next element E = S[n+1].
## Case E is a read R from Rt.
0. Only the promise-write-success, satisfy-read-in-memory and
propagate-memory-write transitions change the flat_ss_exclusive_reads
field. Since E is a read in Rt and satisfied by forwarding, 0 still holds by
induction hypothesis.
1. By induction hypothesis the trace TR induces the rf, co, and rmw relations
from the candidate execution, the instruction tree matches po, extending TR to
TR' preserves this. Only have to show that extending TR to TR' for E preserves
this as well. But this follows from the construction of TR': R is satisfied by
W for (W,R) in rf. po, co and rmw are unaffected.
2. Have to show that the satisfaction of R does not cause the restart or
discarding of any instructions. By definition T does not discard instruction
tree branches. Let restart_roots be the result of calling
pop_memory_read_action_restart_roots. These and their dependent reads or writes
are the instructions that will be restarted as part of the memory read action.
Then restart_roots is the set of all reads R' with (R,R') in po-loc for which R'
has been satisfied, but by neither W nor a write po-between R and R' after
executing TR'.
Show that restart_roots for this memory read action T is empty. From that
follows that extending the trace for E preserves (2).
Assume there is such a read R' that is satisfied by a write W' that is neither W
nor po-between R and R' after executing TR'.
(In the following <= is the subset relation.) We have (R,R') in po-loc, and
therefore (R,R') in [R];(po-loc\(rf^-1;rf)\(po; rf));[R] <= RS_RS <= S. By
construction R' can only have already been satisfied if (R',R) in S. But already
have (R,R') in S. Contradiction to the acyclicity of S.
3. To be able satisfy R from W by thread-internal forwarding,
- 3.1 R must be in MOS_pending_mem_read state and read-request-cand hold,
- 3.2 W must be in state MOS_potential_mem_write with data available
- 3.3 W must be before R in the instruction tree
- 3.4a there must be no W' to the same address in between R and W in the
instruction tree
- 3.4b there must be no R' to the same address in between R and W in the
instruction tree that read from another write W <> W'
- 3.5 if R is a read-acquire, then W is not a store-exclusive
3.1) After executing TR R is in MOS_pending_mem_read state and read-request-cand
holds.
The reads feeding into the register reads of R have been satisfied and therefore
eagerly completed: for any read R' whose read value feeds into the address of R
it is (R',R) in RS_RS <= Order <= S; therefore by construction R' is satisfied
and eagerly completed after executing TR; the register writes of any
write-exclusive WE's success bits feeding into the address of R are eagerly
done, since by construction WE has been promised to be successful; any other,
non-memory, instructions feeding into R's registers reads have been done
eagerly.
Remains to show that read-request-cand holds and that R is not already satisfied
after TR. Then, since register read transitions and load initiation are eager,
3.1 follows.
Show pop_memory_read_request_cand holds. This is true if
a) All po-earlier DMB.SYs, ISBs, DMB.LDs are finished.
b) If R is an acquire read then all po-earlier write releases are finished.
c) All po-earlier acquire reads are completed.
a) It is [DMB.SY|ISB|DMB.LD]; po; [R] <= BC_RS <= S. So by construction of TR
all DMB.SYs, ISBs, DMB.LDs are committed after executing TR, and therefore
eagerly finished after TR.
b) It is [Rel]; po; [Acq] in WC_RS <= S. So by construction all po-earlier write
releases are propagated if R is an acquire, and therefore eagerly completed and
finished after TR.
c) It is [Acq]; po; [R] <= RS_RS <= S. So by construction all acquires po-before
R are satisfied and eagerly completed after executing TR.
R cannot be satisfied after TR: by construction R could only be satisfied after
TR if (R,R) in S, but S is acyclic.
3.2) After executing TR W is in state MOS_potential_mem_write state with data
available.
The memory reads feeding into the register reads of W have been completed: for
any read R' that feeds into W's address or data it is (R',R) in RS_RS <= Order
<= S; therefore by construction R' is satisfied after executing TR' and
therefore eagerly completed; any register write of a write-exclusive's success
bit feeding into the address or data of W has been done eagerly, since all
write-exclusives have already promised success or failure after TR; all
non-memory instructions feeding into W' registers reads have been done eagerly.
To show W is in state MOS_potential_mem_write after executing TR, remains to
show that W has not propagated yet.
By construction W can only be propagated if (W,R) in S, so assume (W,R) in
S. But by definition of rft it is (R,W) in S. Since S acyclic, contradiction:
(W,W) in S.
3.3) This follows by definition of the operational model if (W,R) in po. As
shown above, rfe subset rfs. Therefore, since rft = rf\rfs it (W,R) in rfi, so W
and R from the same thread. And it must be (W,R) in po because the coherence
axiom requires the acyclicity of po | rf.
3.4a) This follows by definition of the operational model if there is no (W,W')
and (W',R) in po for a W' to the same address. There cannot be such (W,W') and
(W',R) in po since by the coherence axiom in the candidate execution po | co |
rf | fr acyclic.
3.4b) Assume there is such a read R'.
Then by construction (R',R) in S, and by induction hypothesis (W',R') in rf of
the candidate execution. Now in the candidate execution W and W' must be
coherence related. If it is (W',W) in co, then (R',W');(W',W) in rf^-1;co=fr and
there is a cycle in fr;po, contradiction to the assumption that Flat-axiomatic's
'internal' axiom holds for the candidate execution.
So assume it is (W,W') in co. By assumption it is (W,R) in rf, so (R,W') in fr
of the candidate execution. But then there is a cycle in fr;rf;po in the
candidate execution. Contradiction to the assumption that the 'internal' axiom
holds.
3.5) Assume R is a read-acquire and W a store-exclusive. But then by definition
of rfs the read R is required to be in Rs, since [Xw];rfi;[A] in WC_RS <= Order
<= S, contradiction.
## Case E is a read R from Rs.
0. The transition T for R only changes the flat_ss_exclusive_reads field in case
it is a read-exclusive RE with a program-order following write-exclusive WE for
which TR contains the promise-write-exclusive-success transition. By
construction in that case (RE,WE) in rmw. Let W be the write with (W,RE) in
rf. Then T adds (RE,[(W,_)]) to flat_ss_write_exclusives. It is (W,RE) in rfs <=
S, so W in S[0..n+1]. By construction, after TR' the read R = RE is
satisfied. Left to show that WE not in S[0..n+1]. This follows by (RE,WE) in fr
<= S. Since R is a read, for the other elements in flat_ss_exclusive_reads 0
still holds.
1. By induction hypothesis the trace TR induces the rf and co relation
from the candidate execution, the instruction tree matches po. Have to
show that extending TR to TR' for E preserves this. But this follows
from the construction: R is satisfied by W for (W,R) in rf. po, co,
and rmw are unchanged.
2. Have to show that the satisfaction of R does not cause the restart or
discarding of any instructions. By definition T does not discard
instruction-tree branches. Let restart_roots be the result of calling
pop_memory_read_action_restart_roots. These and their dependent reads or writes
are the instructions that will be restarted as part of the memory read action.
Then restart_roots is the set of all reads R' with (R,R') in po-loc for which R'
has read neither from W nor from a write po-between R and R'.
Show that after executing TR restart_roots for this memory read action T is
empty. From that follows that extending the trace to TR' for E preserves (2).
Assume there is such a read R' that is satisfied by a write W' that is neither W
nor po-between R and R' after executing TR'.
Then (R,R') in [R];(po-loc\(rf^-1;rf)\(po;rf));[R] <= RS_RS <= S. But by
construction R' can only have been satisfied if it was already satisfied, so if
(R',R) in S. Contradiction to the acyclicity of S.
3. To be able to satisfy R from W in memory after TR',
- 3.1 R must be in MOS_pending_mem_read state and read-request-cand hold,
- 3.2 W must be propagated to memory.
- 3.3 There must not be W' to the same address that propagated after W
- 3.4 If R is an exclusive read there must be no (RE',[(W',_)]) in
flat_ss_exclusive_reads where RE' and R are to the same address but from
different threads.
3.1) After executing TR R is in state MOS_pending_mem_read state and
read-request-cand holds.
The memory reads feeding into the register reads of R have been completed: for
any read R' whose read value feeds into R's address it is (R',R) in RS_RS <=
Order <= S; therefore by construction R' is satisfied and therefore eagerly
completed after executing TR; the register writes of the success bit of any
store-exclusive feeding into R's register reads have been done eagerly since all
write-exclusives have promised success or failure after TR; all non-memory data
dependent instructions feeding into R's registers reads have been done eagerly.
Remains to show pop_memory_read_request_cand holds and that R is not already
satisfied after TR. Since register read transitions and load initiation are
eager, then 3.1 follows.
Show pop_memory_read_request_cand holds. This is true if
a) All po-earlier DMB.SYs, ISBs, DMB.LDs are finished.
b) If R is an acquire read then all po-earlier write releases are finished.
c) All po-earlier acquires are completed.
a) It is [DMB.SY|ISB|DMB.LD]; po; [R] <= BC_RS <= S. So by construction of TR
all DMB.SYs, ISBs, DMB.LDs are committed after executing TR, and therefore
eagerly finished.
b) It is [Rel]; po; [Acq] in WC_RS <= S. So by construction all po-earlier write
releases are propagated if R is an acquire, and therefore eagerly completed and
finished.
c) It is [Acq]; po; [R] in RS_RS <= S. So by construction all acquires po-before
R are satisfied and eagerly completed after executing TR.
R cannot be satisfied after TR: by construction R could only have been satisfied
if there was (R,R) in S. But S is acyclic.
3.2) After executing TR W is propagated. By definition of rfs it is (W,R) in rfs
<= S, so by construction W has already been propagated.
3.3) Assume after executing TR there is a write W' to the same address that
propagated after W. Then this must be because (W,W') and (W',R) in S. Since co
<= S and co totally orders same-address writes it must be (W,W') in co and
therefore (R,W') in fr <= S. But then S cyclic. Contradiction.
3.4) Assume R is a read exclusive R = RE and (RE,WE) in rmw. Assume there is
such a (RE',[(W',_)]) in flat_ss_exclusive_reads. Then by induction hypothesis
it is (W',RE') in rf, there is WE' such that (RE',WE') in rmw, W' in S[0..n],
RE' satisfied after TR', and WE' not in S[0..n]. Then we have (W',RE) in S. Also
have (W,RE) in rfs <= S. Since RE' is satisfied, by construction it must be RE'
in S[0..n] and therefore (RE',RE) in S. And it is (RE,WE') in S.
Also have (RE,WE) in fr <= S. Since by assumption RE and RE' from different
threads it must also be that WE and WE' are from different threads.
Now there are two cases: (a) (WE,WE') in S or (b) (WE',WE) in S.
(a). Then it is (W',RE),(RE,WE),(WE,WE') in S. But WE and WE' are to the same
address from different threads, contradicting Lemma A for rmw(R',WE').
(b). Then it is (W,RE),(RE,WE'),(WE',WE) in S. But WE and WE' are to the same
address from different threads, contradicting Lemma A for rmw(R,WE).
#######
Lemma 1: Let ST be a state of Flat-operational. For any read R: if R is
satisfied, all writes and barriers E with (E,R) in (WC_RC | BC_RC); RC_RC*
are finished, and all reads R' with (R',R) in RC_RC+ are satisfied, and all
eager transitions taken, then R is finished.
Proof by induction on the instruction tree.
Induction start: if the instruction tree is empty there is no such read R in it.
Induction hypothesis: the statement holds for some instruction tree IT. Show it
also holds for adding a leaf II to IT.
By the induction assumption the induction hypothesis holds for all
satisfied reads R in IT also with II added. (The pop_finish_load_cand
condition is unaffected by po-later instructions.)
So remains to show that the induction hypothesis holds for II. Let R be the leaf
II and assume R is satisfied, all writes and barriers E with (E,R) in (WC_RC
| BC_RC); RC_RC* are finished and all reads R' with (R',R) in RC_RC+ are
satisfied. Have to show that R is finished.
For R to finish the following have to hold:
a) commitDataflow. The memory reads feeding into the register reads of R
have to be finished.
Let R' be one such read. We have (R',R) in [R]; addr; [R] <= RC_RC.
Therefore R' is satisfied in ST, all reads (R'',R') in RC_RC+ are satisfied
since (R'',R'); (R',R) in RC_RC+; RC_RC = RC_RC+, and all writes E' are
propagated and barriers E' committed and E' therefore eagerly finished for
(E',R') in (WC_RC | BC_RC); RC_RC* since (E',R'); (R',R) in (WC_RC |
BC_RC); RC_RC*; RC_RC = (WC_RC | BC_RC); RC_RC*. Then by induction
hypothesis R' is finished.
b) commitControlflow. Conditional branches po-before R have to be finished.
Assume there is an uncommitted branch instruction po-before R, let BR be
the po-earliest one. BR's finish-transition is taken eagerly, so if BR is
not finished, then it is because BR cannot finish yet.
Since BR is the po-earliest unfinished branch its control flow is
finished. So it must be BR's dataflow that is unfinished: there is at least
one read R' that feeds into BR's register reads that is unfinished. But
then (R',R) in [R]; ctrl; [R] <= RC_RC and therefore R' satisfied and all
writes E' are propagated and barriers E' committed and E' therefore eagerly
finished for (E',R') in (WC_RC | BC_RC); RC_RC*, since (E',R'); (R',R) in
(WC_RC | BC_RC); RC_RC*, and all reads R'' with (R'',R') in RC_RC+
satisfied, since (R'',R'); (R',R) in RC_RC+. Therefore by induction
hypothesis R' is finished, contradiction.
c) All po-earlier DMB.SYs, DMB.LDs, ISBs are finished.
Because of [DMB.SY|ISB|DMB.LD]; po; [R] in BC_RC by assumption all of these
are finished in ST.
d) All po-earlier acquire reads are finished.
Let R' be a po-earlier acquire. So (R',R) in [Acq]; po; [R] <= RC_RC. Then
R' satisfied. And all writes E' are propagated and barriers E' committed
and E' therefore eagerly finished for (E',R') in (WC_RC | BC_RC); RC_RC*,
since (E',R'); (R',R) in (WC_RC | BC_RC); RC_RC*, and all reads R'' with
(R'',R') in RC_RC+ are satisfied, since (R'',R'); (R',R) in RC_RC+. Then by
induction hypothesis R' is finished.
e) If R is an acquire then all po-earlier releases are finished.
This is true in ST, since by [Rel]; po; [Acq] in WC_RC these write releases
are propagated and therefore eagerly finished.
f) If the closest po-earlier write W to the same address was forwarded to R
the memory reads feeding into W's register reads must be finished.
Let R' be one such read. Then (R',R) in [R]; (addr|data); rfi in
RC_RC. Therefore R' is satisfied, all reads R'' with (R'',R') in RC_RC+ are
satisfied, since (R'',R'); (R',R) in RC_RC+, and all writes E' are
propagated and barriers E' committed and therefore E' eagerly finished for
(E',R') in (WC_RC | BC_RC); RC_RC*, since (E',R'); (R',R) in (WC_RC |
BC_RC); RC_RC*. Then by induction hypothesis R' is finished.
g) If the closest po-earlier write W to the same address was not forwarded
to R it must be propagated.
By assumption R is satisfied in ST. Now there are two cases: g1) (W,R) in rf
or g2) not.
g1) Since by assumption W was not forwarded to R, the read R read from W in
memory, so W propagated.
g2) Then (W,R) in [W];(po-R-loc\rf);[R] <= WC_RC, so by assumption W is
finished and therefore propagated.
h) All memory accesses between this write W and R have their
address-feeding memory reads finished and are initiated.
Let R' be such an address-feeding read. Then (R',R) in [R]; addr;
po-no-W-loc; [R] in RC_RC <= S. Therefore R' is satisfied, all reads R''
with (R'',R') in RC_RC+ are satisfied, since (R'',R'); (R',R) in RC_RC+,
and all writes E' are propagated and barriers E' committed and therefore E'
eagerly finished for (E',R') in (WC_RC | BC_RC); RC_RC*, since (E',R');
(R',R) in (WC_RC | BC_RC); RC_RC*. Then by induction hypothesis R' is
finished. Moreover, since all such R' are satisfied, all memory accesses
between W and R have eagerly done the register reads necessary to determine
their address and eagerly initiated.
i) All reads R' to the same address between W and R must be
not-restartable.
Let R' be one such read. Then (R',R) in [R]; po-R-loc; [R] in RC_RC.
Therefore R' is satisfied, all reads R'' with (R'',R') in RC_RC+ are
satisfied, since (R'',R'); (R',R) in RC_RC+, and all writes E' are
propagated and barriers E' committed and therefore E' finished for (E',R')
in (WC_RC | BC_RC); RC_RC*, since (E',R'); (R',R) in (WC_RC | BC_RC);
RC_RC*. Then by induction hypothesis R' is finished.
Now have: R is satisfied, so the load is eagerly completed. Since all the
conditions for finishing R hold in ST and the read-finish transition is an
eager transition, R is finished and the induction hypothesis holds.
#######
## Case E is a write W
0. Assume t = (RE',[(W',_)]) in flat_ss_exclusive_reads after executing
TR'. Now there are two possibilities: a) t in flat_ss_exclusive_reads before
T or b) not.
a) Then by induction hypothesis there it is (W',RE') in rf and there exists
WE' such that (RE',WE') in rmw with W' in S[0..n] and WE' not in S[0..n] and
RE' satisfied after TR'. Have to show T preserves this. rf and rmw are
unaffected by T, after T it will still be W' in S[0..n+1] and RE' still
satisfied. So have to show WE' not in S[0..n+1]. Assume it is. Then W =
WE'. But then T = T_propagate_write W is annotated with RE' and
(RE',[(W',_)]) deleted by T from flat_ss_exclusive_reads, contradiction.
b) Then RE' is a read exclusive that was satisfied by thread-internal
forwarding from W and it is paired with a write-exclusive WE' for which TR
contains the promise-write-success transition. Then we have (W,RE') in rf. By
construction it is (RE',WE') in rmw. RE' is satisfied and it is W in
S[0..n+1]. Still have to show that WE' not in S[0..n+1].
By (RE',WE') in rmw it is (RE',WE') in po. Since RE' read from W by
thread-internal forwarding it is also (W,RE') in po, so (W,WE') in po and
they are writes to the same location. But then it is (W,WE') in
[W];po-loc;[W] <= WC_WC <= S, and since W = S[n+1] it must be that WE' not in
S[0..n+1].
1. By induction hypothesis the trace TR induces the rf, co, and rmw relation
from the candidate execution, the instruction tree matches po. Have to show that
extending TR' to TR'' for E preserves this. Since in the operational model the
coherence relation is determined by the order in which writes reach memory, have
to show: a) for all (W',W) in co W' has already been propagated after TR and b)
for all W' to the same address as W that have been propagated before W in TR it
is (W',W) in co.
a) Since co <= S by construction of TR all such writes W' have already been
propagated.
b) Let W' be any same-address write that is propagated after in TR. Then by
construction of TR it must be (W',W) in S. But since co <= S, since co totally
orders all same-address writes, and since S is acyclic W' it must be (W',W) in
co.
po, rf, and rmw are unaffected by T, so 1 follows.
2. Have to show that the propagation of W does not cause the restart or
discarding of any instructions. By definition T does not discard
instruction-tree branches. Let restart_roots be the result of calling
propagate_write_action_restart_roots. These and their dependent reads or writes
are the instructions that will be restarted as part of the memory write
propagate action. Then restart_roots is the set of all reads R with (W,R) in
po-loc for which R has been satisfied, but neither from W nor from a write
po-between W and R after TR.
Show that after executing TR restart_roots for this memory write action is
empty. From that follows that extending the trace for E preserves (2).
Assume there is such a read R that is satisfied by a write W' that is neither W
nor po-between W and R after executing TR. Then (W,R) in
[W];(po-loc\rf\(po;rf));[R] <= WC_RS <= S.
By construction of TR R can only have already been satisfied if (R,W) in
S. Contradiction to the acyclicity of S.
3. To propagate W, its store has to be committed and pop_write_co_check has to
hold. Since write commitment transitions are eager only have to show:
- 3.1 W must be in state MOS_potential_mem_write with its data available
- 3.2 pop_commit_store_cand and pop_write_co_check must hold
- 3.3 There exists no (RE',[(W',_)]) in flat_ss_exclusive_reads after TR' where
RE' and W are to the same address but from different threads.
3.1) After executing TR W is in state MOS_potential_mem_write state.
The memory reads feeding into the register reads of W have been satisfied: for
any read R whose read value feeds into the address and data of W it is (R,W) in
RC_WC <= order <= S; therefore by construction of TR R is satisfied and eagerly
completed after executing TR; the register writes of the success bit of any
write-exclusives are done eagerly since all write-exclusives have promised their
success or failure; all non-memory data dependent instructions that feed into
W's register reads have been done eagerly.
3.2)
a) All po-earlier DMB.SYs, ISBs, DMB.LDs, DMB.STs are finished.
b) If W is a release then all po-earlier reads and writes are finished.
c) All po-earlier read acquires are finished.
d) commitDataflow holds.
e) commitControlflow holds.
f) pop_write_co_check holds.
g) If W is a successful store exclusive WE paired with a load exclusive RE, then
RE is finished, and if RE read from a same-thread write W', that write is
propagated.
a) It is [F]; po; [W] <= BC_WC <= Order <= S. So by construction of TR all
po-earlier DMB.SYs, ISBs, DMB.LDs, DMB.STs are committed, and therefore eagerly
finished.
b) Assume W is a write release. Let W' be a po-earlier write. Then by (W',W) in
[W];po;[Rel] <= WC_WC <= S and by construction of TR, W' is propagated and
therefore eagerly finished. Let R be a po-earlier read. Then by (R,W) in
[R];po;[Rel] <= RC_WC <= S, and by construction of TR R is satisfied.
Moreover, by construction of TR all reads R' with (R',R) in RC_RC+ are
satisfied, since (R',R); (R,W) in RC_RC+; RC_WC <= S, and all writes E' are
propagated and barriers E' are committed and E' therefore eagerly finished for
(E',R) in (WC_RC | BC_RC); RC_RC*, since (E',R);(R,W) in (WC_RC | BC_RC);
RC_RC*; RC_WC <= S. Then by Lemma 1 R is finished.
c) Let R be a read acquire po-before W. Then (R,W) in [Acq];po;[W] <= RC_WC <=
S. So by construction of TR R is satisfied, all reads R' with (R',R) in RC_RC+
are satisfied, since (R',R); (R,W) in RC_RC+; RC_WC <= S, and all writes E' are
propagated and barriers E' are committed and E' therefore eagerly finished for
(E',R) in (WC_RC | BC_RC); RC_RC*, since (E',R); (R,W) in (WC_RC | BC_RC);
RC_RC*; RC_WC. Then by Lemma 1, R is finished.
d) The memory reads feeding into the register reads of W have to be finished.
Let R be one such read. Then R's read value feeds into the address or data of W
and we have (R,W) in [R];(addr|data);[W] <= RC_WC <= S. Then by construction R
is satisfied and eagerly completed, all reads (R',R) in RC_RC+ are satisfied
since (R',R); (R,W) in RC_RC+; RC_WC and all writes E are propagated and
barriers E committed and E therefore eagerly finished for (E,R) in (WC_RC |
BC_RC); RC_RC* since (E,R); (R,W) in (WC_RC | BC_RC); RC_RC*; RC_WC. Then by
Lemma 1 R is finished.
e) Conditional branches po-before W have to be finished.
Assume there is an unfinished branch instruction po-before W, let BR be the
po-earliest one. BR's finish-transition is taken eagerly, so if BR is
unfinished, then it is because BR cannot finish yet.
Since BR is the po-earliest unfinished branch its control flow is finished. So
it must be BR's dataflow that is unfinished: there is at least one read R that
feeds into BR's register reads that is unfinished.
But then (R,W) in [R];ctrl;[W] <= RC_WC <= S. So by construction of TR R is
satisfied, all writes E are propagated and barriers E are committed for (E,R) in
(WC_RC | BC_RC); RC_RC* are therefore eagerly finished, since (E,R); (R,W) in
(WC_RC | BC_RC); RC_RC*; RC_WC <= S, and all reads R' with (R',R) in RC_RC+ are
satisfied, since (R',R); (R,W) in RC_RC*; RC_WC <= S. So by Lemma 1 R is
finished.
f0) All po-previous same-address writes have to be propagated.
Have [W];po-loc;[W] in WC_WC <= S, so all po-earlier writes to W's address are
propagated and therefore eagerly finished.
f1) All po-previous memory accesses have their address-feeding memory reads
finished and are initiated.
Let R be such an address-feeding read. Then (R,W) in [R];addr;po;[W] <= RC_WC <=
S. So by construction R is satisfied, all reads R' with (R',R) in RC_RC+ are
satisfied, since (R',R); (R,W) in RC_RC+; RC_WC <= S, and all writes E' are
propagated and barriers E' are committed and E' therefore eagerly finished for
(E',R) in (WC_RC | BC_RC); RC_RC*, since (E',R); (R,W) in (WC_RC | BC_RC);
RC_RC*; RC_WC. Then by Lemma 1 R is finished. Moreover, since all such R are
satisfied, all memory accesses po-before W have eagerly done the register reads
necessary to determine their address and have eagerly initiated.
f2) All po-previous memory reads to the same address must be satisfied, and
unrestartable.
Let R be one such read. Then it is (R,W) in fr <= S and by construction of S the
read R is satisfied.
(By f0 have that all same-address writes po-before W are propagated.)
Remains to show that all po-earlier memory reads to the same address are
not-restartable. Show instead that they are all already finished by induction
on the program order prefix of W.
Induction start. For the empty program-order prefix all reads to W's location
are trivially finished.
Induction hypothesis. Assume the statement holds for PREFIX, show appending an
instruction II to the prefix preserves the statement. Only have to show that if
II is a same-address read it is finished.
For R to finish the following have to hold:
f21) commitDataflow
The memory reads feeding into the register writes R reads from have to be
finished. Since R's register reads are all used to determine the address of R
these memory reads are all finished by proof of f1).
f22) commitControlflow
Since any instruction that R is control-flow dependent on, W is also
control-flow dependent on, these instructions have to be finished by proof of
e).
f23) All po-earlier DMB.SYs, DMB.LDs, ISBs are finished.
By proof of a) all po-earlier DMB.SYs, DMB.LDs, ISBs are finished.
f24) All po-earlier acquire reads are finished.
By proof of c) all po-earlier read acquires are finished.
f25) If R is an acquire then all po-earlier releases are finished.
If R is an acquire then it has to be finished by proof of c).
f26) If the closest po-earlier write W' to the same address was forwarded to R
the memory reads feeding into the register reads of W' must be finished.
As proved above all po-earlier writes W' to the same address as W are
propagated, hence committed, which includes having the memory reads feeding into
W' finished.
f27) If the closest po-earlier write W' to the same address was not forwarded to
R it must be propagated.
As proved above all po-earlier writes W' to the same address as W are propagated.
f28) All memory accesses between W' and R have their address-feeding memory
reads finished.
As proved for f1) all memory accesses po-before W have their address-feeding
memory reads finished.
f29) All reads R' to the same address between W' and R must not be restartable
anymore.
Since all these reads are po-before W and to the same address as W by the
induction hypothesis they are finished, so not restartable anymore.
Now since R is satisfied it can be completed. Since pop_finish_load_cand holds R
can finish. Since memory-read-finish transitions are eager, R is finished.
f3) Any read R that was partially satisfied from W must have requested its
unsatisfied slices from storage. By assumption all memory accesses have the same
size, so if R was partially satisfied from W it is completely satisfied and
there are no such unsatisfied slices.
g) Assume W is a successful store exclusive WE that is paired with a load
exclusive RE. Then RE must be po-before WE. By proof of f2) all po-earlier
memory reads to the same address are already finished, so RE finished. Assume RE
read from a same-thread write W'. Then it is (W',RE),(RE,W) in po. And since W'
and W have the same address, by proof of f0 the write W' is propagated.
3.3) Assume there exists (RE',[(W',_)]) in flat_ss_exclusive_reads after TR'
where RE' and W are to the same address but from different threads.
Then by 0 of the induction hypothesis it is (W',RE') in rf and there exists a
write-exclusive WE' such that (RE',WE') in rmw, W' in S[0..n], and WE' not in
S[0..n].
Since RE' and W from different threads by assumption, also W and WE' from
different threads, so WE' <> W and WE' not in S[0..n+1] either, and therefore
(W,WE') in S. Since it is W' in S[0..n] it is also (W',W) in S. So
(W',W),(W,WE') in S where W and WE' are from different threads but to the same
address. Contradiction to Lemma A for rmw(RE',WE').
## Case E is a barrier B
0. Committing a barrier does not change flat_ss_exclusive_reads, and since E is
not a write, 0 still holds.
1. By induction hypothesis the trace TR for E induces the po, rf, co, and rmw
relations from the candidate execution. Since B
does not fetch, satisfy reads, propagate writes, or promise the success/failure
of write-exclusives this is still true for TR'.
2. Have to show that committing B does not cause the restart of any instructions
or discard instruction-tree branches. But this follows from the definition of
pop_commit_barrier_action.
3. TR' is a valid trace of Flat-operational.
Have to show that committing B is enabled after TR.
a) commitDataflow
Since B has no data this is vacuously true.
b) commitControlflow
Conditional branches po-before B have to be finished. Assume there is an
unfinished branch instruction po-before B, let BR be the po-earliest one. BR's
finish transition is taken eagerly, so if BR is unfinished, then it is because
BR cannot finish yet.
Since BR is the po-earliest unfinished branch its control flow is finished. So
it must be BR's dataflow that is unfinished: there is at least one read R that
feeds into BR's register reads that is unfinished.
But then (R,B) in [R]; ctrl; [F] <= RC_BC <= S. So by construction of TR R is
satisfied, all writes E are propagated and barriers E are committed and E
therefore eagerly finished for (E,R) in (WC_RC | BC_RC); RC_RC*, since (E,R);
(R,B) in (WC_RC | BC_RC); RC_RC*; RC_BC, and all reads R' with (R',R) in RC_RC+
are satisfied, since (R',R); (R,B) in RC_RC*; RC_BC <= S. Then R is finished.
c) If B is a DMB.SY all barriers, reads, and writes, are finished.
Let B' be a barrier po-before B. Then (B',B) in [F]; po; [DMB.SY] <= BC_BC <= S,
so by construction of TR B is committed and therefore eagerly finished. Let W
be a po-earlier write. then (W,B) in [W]; po; [DMB.SY] <= WC_BC <= S. So by
construction of TR W is propagated and therefore eagerly completed and
finished.
Let R be a po-earlier read. Then (R,B) in [R]; po; [DMB.SY] <= RC_BC <= S. So by
construction R is satisfied, all writes E are propagated and barriers E are
committed and E therefore eagerly finished for (E,R) in (WC_RC | BC_RC); RC_RC*,
since (E,R); (R,B) in (WC_RC | BC_RC); RC_RC*; RC_BC <= S, and all reads R' with
(R',R) in RC_RC+ are satisfied, since (R',R); (R,B) in RC_RC+; RC_BC. Then by
Lemma 1 R is finished.
d) All po-earlier DMB.SYs are finished.
Let B' be a DMB.SY po-before B. Then (B',B) in [DMB.SY]; po; [F] <= BC_BC <= S.
So by construction of TR' B' is committed and therefore eagerly finished.
e) If B is an ISB all po-earlier memory accesses have their address-feeding
memory reads finished and have initiated.
Let R be a memory read feeding into the address of a po-earlier memory
access. Then (R,B) in [R]; addr; po; [ISB] <= RC_BC <= S. So by construction R
is satisfied, all writes E are propagated and barriers E committed and E
therefore eagerly finished for (E,R) in (WC_RC | BC_RC); RC_RC*, since (E,R);
(R,B) in (WC_RC | BC_RC); RC_RC*; RC_BC <= S, and all reads R' with (R',R) in
RC_RC+ are satisfied, since (R',R); (R,B) in RC_RC+; RC_BC. Then by Lemma 1 R is
finished. Moreover, since all such R are satisfied, all memory accesses
program-order before B have eagerly done the register reads necessary to
determine their address and have eagerly initiated.
f) If B is a DMB.LD all po-earlier memory loads are finished.
Let R be a po-earlier read. Then (R,B) in [R]; po; [DMB.LD] <= RC_BC <= S. So by
construction R is satisfied, all writes E are propagated and barriers E are
committed and E therefore eagerly finished for (E,R) in (WC_RC | BC_RC); RC_RC*,
since (E,R); (R,B) in (WC_RC | BC_RC); RC_RC*; RC_BC <= S, and all reads R' with
(R',R) in RC_RC+ are satisfied, since (R',R); (R,B) in RC_RC+; RC_BC. Then by
Lemma 1 R is finished.
g) If B is a DMB.ST all po-earlier memory stores are finished.
Let W be a po-earlier write. Then (W,B) in [W]; po; [DMB.ST] <= WC_BC <= S. So
by construction of TR the write W is propagated and therefore eagerly finished.
Repeatedly extend the trace TR' to TR'' for enabled eager transitions T, until
there are no more enabled eager transitions.
0. Only the promise-write-success, satisfy-read-in-memory and
propagate-memory-write transitions change the flat_ss_exclusive_reads
field. These are not eager, so flat_ss_exclusive_reads unchanged. And since the
prefix of S has not changed: 0 still holds.
1. By definition of transition-eagerness, T does not fetch, satisfy a read,
propagate a write, or promise success or failure of a write exclusive, so 1 is
preserved.
2. Restarts are caused only by the promise-write-exclusive-failure,
satisfy-read-by-forwarding, satisfy-read-from-memory, and propagate-memory-write
transitions. By definition these transitions are not eager, so T does not cause
restarts. Only finishing of branch instructions causes instruction tree branches
to be discarded. A branch BR can only finish when the memory reads feeding into
its register reads are finshed. By induction hypothesis for these reads the
induced reads-from relation is a subset of rf of the candidate execution, so
BR's successor is termined as in the candidate execution's po, and by induction
hypothesis the instruction tree viewed as a relation matches po. Therefore,
finishing such a branch BR does not discard any instruction tree branches.
3. Since by assumption T is enabled after TR', TR'' is a valid trace.
When no more eager transitions are enabled, we establish the inductive
hypothesis 4:
4. The eager transitions have all been taken by construction.