Allowed by Flat-operational => allowed by ARMv8.2-axiomatic
-----------------------------------------------------------


********************************************************************************

This proof assumes

0. Finite candidate executions and finite Flat traces.

1. That if a write-exclusive is successfully paired with a read-exclusive, then
they are to the same address.

2. Axiomatic and Operational model use the same instruction semantics.

3. (E,E') in addr iff: E is a read and E' is a read or a write and E feeds into
a register write that affects the address of E'. Note: this means addr(E,E')
does not hold if E is a write-exclusive.

4. (E,E') in data iff: E is a read and E' is a write and E feeds into a register
write that affects the value written by E'. Note: this means data(E,E') does not
hold if E is a write-exclusive.

5. (E,E') in ctrl iff: E is a read and E' is a write and E feeds into a register
write that affects a conditional branch or a computed branch instruction
program-order before E'. Note: this means ctrl(E,E') does not hold if E is a
write-exclusive.

********************************************************************************

Note: In herd a successful write-exclusive's success bit introduces dependencies
*from its matching read-exclusive*: For example, in the program-fragment

a: LDXR W1,[X0]   
b: STXR W2,W1,[X0]
c: STR W2,[X3]

where c stores the success bit of b into some location in memory, in herd it
will be (a,c) in data.

This does not match ARMv8's concurrency semantics where it is neither (a,c) in
data, nor (b,c) in data.

********************************************************************************


Let (po,rf,co,rmw) be a candidate execution.


Ignoring weak acquire loads ("Q"), the ARMv8.2 model is this:

(* Coherence-after *)
let ca = fr | co

(* Per-location coherence requirement *)
acyclic po-loc | ca | rf as coherence (* ca only used here *)

(* Observed-before *)
let obs = rfe | fre | coe

(* Dependency-ordered-before *)
let dob = addr | data
	| ctrl; [W]
	| (ctrl | (addr; po)); [ISB]; po; [R]
	| addr; po; [W]
	| (addr | data); rfi
	| (ctrl | data); coi

(* Atomic-ordered-before *)
let aob = rmw
	| [range(rmw)]; rfi; [A]

(* Barrier-ordered-before *)
let bob = po; [dmb.full]; po
	| [L]; po; [A]
	| [R]; po; [dmb.ld]; po (* if there's a cycle using this edge, there will also be one restricting to [R|W] on the right *)
	| [A]; po               (* if there's a cycle using this edge, there will also be one restricting to [R|W] on the right *)
	| [W]; po; [dmb.st]; po; [W]
	| po; [L]               (* if there's a cycle using this edge, there will also be one restricting to [R|W] on the left *)
	| po; [L]; coi          (* if there's a cycle using this edge, there will also be one restricting to [R|W] on the left *)

(* Ordered-before *)
let rec ob = obs
	| dob
        | aob
	| bob
	| ob; ob

irreflexive ob as ob    (* ob is only used here, so can remove ob's recursion and check for acyclicity *)

(* Atomic: Basic LDXR/STXR constraint to forbid intervening writes. *)
empty rmw & (fre; coe) as atomic


This is equivalent to this, according to the comments.

acyclic po-loc | fr | co | rf as coherence (* ca only used here *)

let obs = rfe | fre | coe

let dob = addr | data
	| ctrl; [W]
	| (ctrl | (addr; po)); [ISB]; po; [R]
	| addr; po; [W]
	| (addr | data); rfi
	| (ctrl | data); coi

(* Atomic-ordered-before *)
let aob = rmw
	| [range(rmw)]; rfi; [A]

let bob = po; [dmb.full]; po
	| [L]; po; [A]
	| [R]; po; [dmb.ld]; po; [R|W]
	| [A]; po; [R|W]
	| [W]; po; [dmb.st]; po; [W]
	| [R|W]; po; [L]
	| [R|W]; po; [L]; coi

let ob = (obs | aob | dob | bob)+

irreflexive ob as ob
empty rmw & (fre; coe) as atomic


Lemma 2a: Let TR be a valid trace of Flat-operational that induces the relations
po, co, rf, and rmw. Then there is a valid trace TR' that induces the same po,
co, rf, and rmw relations but has no restarts.

Lemma 2b: Let TR be a valid trace of Flat-operational that induces the relations
po, co, rf, and rmw. Then there is a valid trace TR' that induces the same po,
co, rf, and rmw relations but has no discarded instruction tree branches
("always speculates the correct successor instruction of a branch").

Lemma 3. Let TR be a valid trace of the operational model that has no restarts
or discarded instruction tree branches, which induces the relations po, co, rf,
and rmw.

Let EDGE be an edge from ARMv8.2's "ob". Then the following property holds TR: for
- EDGE: barrier E -> barrier E'. In TR E is committed before E'.
- EDGE: barrier E -> write E'. In TR E is committed before E' is propagated.
- EDGE: write E -> barrier E'. In TR E is propagated before E' is committed.
- EDGE: write E -> write E'. In TR E is propagated before E'.
- EDGE: barrier E -> read R. In TR E is committed before R is satisfied.
- EDGE: write E -> read R. In TR E is propagated before R is satisfied.
- EDGE: read R -> barrier E. In TR R is satisfied before E is committed.
- EDGE: read R -> write E. In TR R is satisfied before E is propagated.
- EDGE: read R -> read R'. In TR R is satisfied before R'.


Proof by induction on the definition of ob.

Induction step: EDGE in obs | aob | dob | bob.

Then there are multiple cases:
EDGE = rfe

Then EDGE: W -> R and W and R are from different threads. In Flat-operational
for R to read from R, W must be propagated to memory, so the induction statement
holds.

EDGE = coe

Then EDGE: W -> W', and W and W' in co. By assumption the coherence order
induced by TR is the same as co. By definition for (W,W') in the operational
model's coherence order W has to propagated to memory before W'. So the
induction statement holds.

EDGE = fre

Then EDGE: R -> W, R and W are from different threads, and there is W' with
(W',R) in rf and (W',W) in co. Now there are two cases: R is satisfied by
forwarding from W' or not.

a) By forwarding: then R is satisfied before W' propagates in TR and by the
proof for EDGE = coe the write W' propagates to memory before W. So R is
satisfied before W propagates to memory and the induction statement holds.

b) Then when R is satisfied W' is already propagated and in TR the write W'
propagates before W. Since R reads from the last write to R's location and
(W',R) in rf it must be that W' is in memory and W is not yet in memory. So R
satisfies before W is propagated to memory.

EDGE = rmw

Then EDGE: R -> W. Then R and W are a successful load/store-exclusive pair and
by definition of pop_commit_store_cand R must be finished, and therefore 
satisfied, for W to commit, hence propagate.

EDGE = [range(rmw)]; rfi; [A]

Then EDGE: W -> R.

Then W is a successful store exclusive and R an acquire read A. Since by
definition of Flat-operational R cannot read from W by forwarding W must commit
to memory before R can satisfy.

EDGE = addr

Then a) EDGE: R -> R' or b) EDGE R -> W.

a) In TR R' can only be satisfied if initiated, so after its address is
available, so R must be satisfied before R' is.

b) In TR W can only propagate if initiated, so after its address is available,
so R must be satisfied before W propagates.

EDGE = data

Then EDGE: R -> W. W can only propagate after committing, and commit if its
data-feeding memory reads are finished, including being satisfied. So R is
satisfied before W propagates in TR.

EDGE = ctrl; [W]

Then EDGE : R -> W.  W can only propagate when committed, and by
pop_commit_store_cand commit when the memory reads feeding into the register
reads of conditional branches are finished, including being satisfied. So
R is satisfied before W propagates in TR.

EDGE = (ctrl | (addr; po)); [ISB]; po; [R]

Then EDGE: R -> R'.

Before R' can finish it has to satisfy. By definition of
pop_memory_read_request_cand R' can only satisfy if all po-earlier ISBs are
finished and therefore committed. By pop_commit_barrier_cand any such ISB can
only commit if the memory reads feeding into the conditional branches po-before
the ISB are finished, including being satisfied, and if all po-earlier memory
accesses have their address-feeding memory reads finished, including being
satisfied. Therefore R must be satisfied before R' can be satisfied in TR.

EDGE = addr; po; [W]

Then EDGE: R -> W. By pop_commit_cand before W can only propagate when
committed, and commit only when the address-feeding memory reads of all
po-earlier memory accesses are finished, including being satisfied. Therefore R
must be satisfied in TR before W propagates.

EDGE = (addr | data); rfi

Then EDGE: R -> R'. For R' to satisfy, the write it reads from must have both
address and data available, so any memory read feeding into the address or data
register reads of this write has to be satisfied. Therefore R must be satisfied
before R' can satisfy.

EDGE = (ctrl | data); coi

Then EDGE: R -> W and there is W' such that (R,W') in (ctrl|data) and (W',W) in
coi. By definition of how the operational model induces coherence W' must
propagate to memory before W does. Before W' propagates it must commit, and
therefore all memory reads feeding into the data registers reads of W' and all
memory reads feeding into the register reads by conditional branch instructions
po-before W' must be finished, including being satisfied. Therefore R must
be satisfied before W' can propagate, before W can propagate in TR.


EDGE = po; [dmb.full]; po

Let DMB be the dmb.

Case a) EDGE: RBW (read/write/barrier) -> R. R can only satisfy when DMB is
finished, hence committed. DMB can only commit when all po-earlier instructions
are finished, including being satisfied (for reads)/committed (for
barriers)/propagated (for writes).

Case b) EDGE: RBW (read/write/barrier) -> B. B can only commit when DMB is
finished, hence committed. DMB can only commit when all po-earlier instructions
are finished, including being satisfied (for reads)/committed (for
barriers)/propagated (for writes).

Case c) EDGE: RBW (read/write/barrier) -> W. W can only propagate when
committed, and only commit when DMB is finished, hence committed. DMB can only
commit when all po-earlier instructions are finished, including being satisfied
(for reads)/committed (for barriers)/propagated (for writes).

EDGE = [L]; po; [A]

Then EDGE: W -> R. By definition of pop_memory_read_request_cand, the acquire
read R can only initiate when all po-earlier releases are finished and hence
propagated. So W propagated before R satisfied in TR.

EDGE = [R]; po; [dmb.ld]; po; [R|W]

Let B be the DMB.LD

Case a) EDGE: R -> W

Then B can only commit after R is finished, including being satisfied, and W can
only commit and hence propagate after B is finished and hence committed. So R is
satisfied before W propagates in TR.

Case b) EDGE: R -> R'

Then B can only commit after R is finished, including being satisfied, and by
definition of read_request_cand R' can only satisfy, if B is finished and
therefore committed. So R is satisfied before R' in TR.

EDGE = [A]; po; [R|W]

Case EDGE: R -> R')

Then by definition of read_request_cand R must be completed and hence satisfied
before R' can satisfy.

Case EDGE: R -> W)

Then by definition of pop_commit_cand W can only commit and hence propagate
after R is finished, including being satisfied in TR.

EDGE = [W]; po; [dmb.st]; po; [W]

Then EDGE: W -> W'. Let B be the dmb.st. Then by definition of
pop_commit_barrier_cand B can only commit when W is finished and hence
propagated and W' can only commit and hence propagate when B is finished and
hence committed. So W propagates before W' in TR.

EDGE = [R|W]; po; [L]

Case EDGE: R -> W) Then by definition of pop_commit_store_cand W can only commit
and hence propagate when R is finished and hence satisfied in TR.

Case EDGE: W -> W') Then by definition of pop_commit_store_cand W' can only commit
and hence propagate when W is finished and hence propagated in TR.

EDGE = [R|W]; po; [L]; coi

Let L be the write release.

Case EDGE: R -> W) As shown above L can only commit when R is satisfied,
and by definition of pop_write_co_check W can only propagate when L is
propagated. So R must be satisfied before W can propagate in TR.

Case EDGE: W -> W') As shown above L can only commit when W is propagated, and
by definition of pop_write_co_check W' can only propagate when L is
propagated. So W must propagate before W' in TR.


Induction step: EDGE = EDGE'; EDGE'' in ob; ob. By transitivity of implication.

Let TR be a (partial) trace of Flat-operational. By Lemma 2a and Lemma 2b there
is an equivalent trace TR' for TR with the same co, rf, and rmw relations, with
no restarts and no discarded instruction tree branches. 

So let TR' be such a trace of Flat-operational trace with no restarts and no
discarded instruction tree branches and let rf, co, rmw, and po be such that
(W,R) in rf if the read R is satisfied by the write W in TR, where (W,W') in co
if W is propagated to memory before W' in TR, where (RE,WE) in rmw if RE and WE
are a successful read-exclusive/write-exclusive pair in TR, and where po is the
(linear) instruction tree viewed as a relation. Then the execution is allowed by
the ARMv8.2-axiomatic model.

Proof.

Assume in the ARMv8.2 model there is a cycle in ob. Let EDGE be the cycle.

So EDGE : R -> R or EDGE : B -> B or EDGE : W -> W.

Case EDGE: R -> R).  But from Lemma 3 follows that in TR' R is satisfied before
R is satisfied. But since TR' has no restarts or discarded branches, R can only
be satisfied once, contradiction.

Case EDGE: B -> B). But from Lemma 3 follows that in TR' B is committed before B
is committed. But every barrier is only committed once, contradiction.

Case EDGE: W -> W). But from Lemma 3 follows that in TR' W is propagtated before
W is propagated. But every write is only propagated once, contradiction.


Remains to show: by definition of Flat-operational's thread semantics: po-loc |
fr | co | rf acyclic, and rmw & (fre; coe) empty.

Assume a cycle in po-loc | fr | co | rf as induced by the trace TR' and let C be
the cycle that is derived using a minimal number of po-loc edges, and among the
cycles with the same number of po-loc edges minimises the total number of edges.

Then C has exactly one po-loc edge.

   Assume C has no po-loc edge.

   Then C in (rf | co | fr)+ = ([W]; rf; [W] | [W]; co; [W] | [R]; rf^-1; co; [W])+.

   But then C is in (rf | co): assume it is not, so it includes at least one fr
   edge. (By type rf^-1; co itself is acyclic.) Then C = (E1,E2); ...; (E3,E4);
   (E4,E5); rf^-1; co.  Then E5 is a read and by type (E4,E5) in rf.  So C =
   (E1,E2); ...; (E3,E4); rf; rf^-1; co.  But then C' = (E1,E2); ...; (E3,E4);
   co is a shorter cycle with no po-loc edges.

   So C in (rf | co)+. But then by type C in co+: No edge in (rf | co)+ starts
   from a read, so rf cannot participate in C. By definition of
   Flat-operational's co relation it is (W,W') in co if W propagates to memory
   before W' in TR'. But since every write only propagates once, co+ is acyclic.

   So assume C has at least one po-loc edge. Show it cannot have
   multiple. Assume C has multiple po-loc edges.

   Then it must be C = (E1,E2); ...; (E3,E4); po-loc; (E5,E6); ...; (E7,E8);
   po-loc, without loss of generality assuming (E5,E6); ...; (E7,E8) not
   containing a po-loc edge. Since po-loc; po-loc <= po-loc assume also (E5,E6);
   ...; (E7,E8) non-empty. So (E5,E6); ...; (E7,E8) <= (co|rf|fr)+.

   Now there are different cases for the type of (E4,E5) in po-loc:

   E4 -> E5: W -> W'. Then it must be (E4,E5) in co, otherwise it is (E5,E4) in
   co and (E5,E4); (E4,E5) is a cycle in co; po-loc with fewer po-loc edges. But
   then C can be constructed without using (E4,E5) as po-loc but using (E4,E5)
   as co, contradicting the minimality of C's po-loc edges.

   E4 -> E5: W -> R. If it is (E4,E5) in rf then (E4,E5) can be derived using rf
   instead of po-loc, contradicting the minimality of po-loc edges in C. So
   assume (E4,E5) not in rf. So (W',E5) in rf for some W'. If (W,W') in co it is
   (E4,E5) in co; rf; contradicting the minimality of po-loc edges in C. So
   assume (W',W) in co. But then it is (E5,E4) in fr and (E5,E4); (E4,E5) is a
   cycle in fr; po-loc with a smaller number of po-loc edges than C.

   E4 -> E5: R -> W. Let W' be such that (W',R) in rf. Now it is either W' = W
   or not. If W'= W then (E5,E4); (E4,E5) is a cycle in rf; po-loc with fewer
   po-loc edges. So assume W <> W'. Then it is either (W,W') in co or (W',W) in
   co. If (W',W) in co then (E4,E5) in fr and (E4,E5) and the same cycle can be
   derived with fewer po-loc edges, contradiction. If (W,W') in co then (W',E4);
   (E4,E5); (E5,W') is a cycle in rf; po-loc; co with fewer po-loc edges.

   E4 -> E5: R -> R'. Now there are two cases: R and R' read the same write or
   not. Assume R and R' read from the same write W. By type (E5,E6) must be
   fr. But since R and R' read from the same write it is also (E4,E6) in fr,
   contradiction, since then (E1,E2); ...; (E3,E4); (E4,E6); ...; (E7,E8);
   po-loc is a cycle with fewer po-loc edges. So assume (W,R) in rf and (W',R')
   in rf for W <> W'.

   Now there are two cases: (W,W') in co or (W',W) in co. If (W,W') in co it is
   (R,W') in fr, so (E4,E5) can be derived using fr; rf contradicting the
   minimality of po-loc edges in C. So assume (W',W) in co. But then (R',W) in
   fr and (E5,W); (W,E4); (E4,E5) is a cycle in fr; rf; po-loc with fewer po-loc
   edges, contradiction.

So assume C has exactly one po-loc edge.

Then C in (co|rf|fr)+; po-loc, so C = C'; P for C in (co|rf|fr)+ and P in
po-loc. Now look at all possible cases for the type of P.

## Case P: W -> W'.

Then C': W' -> W in co+.

   Assume otherwise. Then by type it must be

   C' = C''; rf; fr; C''' = C''; rf; rf^-1; co; C''' = C''; co; C'''. Contradiction
   to the assumption that C is the shortest cycle.

If C' in co+, then by definition of the operational model's coherence it must be
that W' propagates to memory before W. But by pop_write_co_check and (W,W') in
po-loc W' can only propagate after W is propagated to memory, contradiction.


## Case P: W -> R.

Then C': R -> W in fr.

   Assume otherwise. Then by type

   either C' in fr; co+ = rf^-1; co; co+ = rf^-1; co+ = rf^-1; co = fr as by
   definition of the operational model's coherence relation co is transitive.

   or C' = fr; C''; rf; fr; C''' for fr; C'' in co* and C''' in (co | rf | fr)*
   but then C' = fr; C''; rf; rf^-1; co; C''' = fr; C''; co; C'''. Contradiction
   to the assumption of C's minimality.

So C': R -> W in fr.

Let W' be the write R reads from. So we have (W',R) in rf and (W',W) in co. 
Now there are two cases: a) R is satisfied by forwarding or b) in storage.

a) Then (W',R) in po-loc because by definition of the satisfy-read-by-forwarding
transition W' must be before R in the instruction tree.

Now assume there is another write po-between W' and R to the address. Let W'' be
the po-closest predecessor to R. Then when W'' propagates, by definition of
pop_memory_write_propagate_action_restart_roots it restarts R since R did not read
from W and not from a po-succcessor of W''. Contradiction to the assumption of
TR' not having restarts.

So W' must be the closest po-predecessor of R to the same address. Now there are
two cases: a1) (W,W') in po or a2) (W',W) in po.

a1) Then it is (W,W') and (W',R) in po, so (W,W') in po, But then W' by
definition of pop_write_co_check can only propagate when W is propagated, so it is
(W,W') in co. Contradiction to the assumption that (W',W) in co.

a2) (W',W) in po. Since W' is the closest po-predecessor of R to the same
address it must be (R,W) in po. Contradiction to (W,R) in po-loc.

b) Then W' is the most recent write to the same address in memory when R is
satisfied. By (W',W) in co W' propagates before W. So R is satisfied before W
propagates. But when W propagates by definition of
pop_memory_write_propagate_action_restart_roots R is restarted since R is to W's
address and read from a different write but not by forwarding. Contradiction to
the assumption that TR' has no restarts.


## Case P: R -> W

Then C': W -> R in co?; rf.

   Proof. Assume otherwise. Then by type

   C' = C''; rf; fr; co+; rf = C''; rf; rf^-1; co+; rf = C''; co+;
   rf. Contradiction to the assumption of the minimality of C.

So it is W -co?-> W' -rf-> R for some W'.

Now there are two cases: a) W = W' or b) W <> W'.

a) If R is satisfied by forwarding it is (W,R) in po, contradiction to the
assumption. So assume R is satisfied in memory. Then W must be propagated before
R is satisfied. But by pop_write_co_check R must be satisfied before
W can propagated, and will not be restarted and satisfied again later,
contradiction.

b) Now there are two cases: (b1) R satisfied by forwarding or (b2) from storage.

(b1) Then it is (W',R) in po-loc and by (R,W) in po-loc also (W',W) in
po-loc. But then W' can only propagate when W is propagated and it must be
(W',W) in co, contradiction.

(b2) Then W' reaches memory before R is satisfied. By assumption of no restarts
in TR' R is only satisfied once. By definition of pop_write_co_check R must be
satisfied before W propagates. So W' propagates before W and it must be (W',W)
in co, contradiction.


# Case P: R -> R'

Then C': R' -> R  in fr; rf.

Proof: assume otherwise. By type C' starts with fr and ends with rf.

   So either a) C' = fr; co+; rf or b) C' = fr; C''; rf; fr; C'''; rf.

   a) But then C' = rf^-1; co; co+; rf = rf^-1; co+; rf = rf^-1; co; rf = fr;
   rf. 

   b) But then C' = fr; C''; rf; rf^-1; co; C'''; rf = fr; C''; co; C''';
   rf. Contradiction to the assumption of C's minimality.

Let (W,R) in rf and (W',R') in rf. Then it is (W',W) in co by definition of fr.

W is either from the same thread as R and R' or not. Assume W is from the same
thread as R and R'. Then it must be either (R,W) in po or (W,R) in po. If (R,W)
in po then there is a cycle in rf;po-loc using only R and W that has already
been dealt with in ## Case P: R -> W. If (W,R) in po, then it is (W,R') =
(W,R);(R,R') in po-loc and there is a smaller cycle in po-loc;fr using only R'
and W that has been dealt with in ## Case P: W -> R.

So assume W is from a different thread than R and R'. Then R must read from W in
memory.

It cannot be (R,W') in po since otherwise there would be a smaller cycle of R,
W', and W in po-loc;co;rf. 

Now there are two cases: R' is satisfied before R or after. 

If R' is satisfied before R, at the point where R is satisfied by definition of
satisfy_read_action_restart_roots R' is restarted since it reads from a
different write from R that is not po-after R, contradicting the assumption of
no restarts in TR'. So assume R is satisfied before R'.

By (W',W) in co it must be that W' propagates to memory before W does, and since
R reads from W in memory, W propagates to memory before R satisfies. So W'
propagates, and then W propagates, before R' satisfies. But then R' cannot read
from W': it cannot read from W' by thread-local forwarding, since W' is already
propagated when R' is satisfied; and it cannot read from W' in memory since W
propagated to memory after W' before R' is satisfied, overwriting W'.

Thus in TR' by definition of the thread semantics: po-loc | fr | co | rf
acyclic.


Now assume rmw & (fre; coe) non-empty. So there exists a successful
load/store-exclusive pair (RE,WE) such that RE reads from a write W and there
exists a write W' to the same address as RE but from a different thread such
that (W,W'),(W',WE) in co.

Then it must be that W propagates to memory before W', and W' propagates to
memory before WE. 

Now there are two cases: a) RE is satisfied by thread-internal forwarding or b)
not.

a) Here there are again two cases: a1) At the point where W propagates to memory
WE has promised its success or a2) not.

a1) Then when W propagates to memory it adds t=(RE,[(W,_)]) to
flat_ss_exclusive_reads and by definition of Flat-operational t can only be
removed by the unique write-exclusive WE paired with RE.

Since W' propagates after W and before WE, when W' propagates t is in
flat_ss_exclusive_reads. But since RE and W' are from a different thread and to
the same address W' cannot propagate, contradiction.

a2) WE must promise its success before propagating. When WE promises its success
W is already propagated to memory, by assumption. For WE to be able promise its
success there can be no write coherence-after W (overlapping W) from a different
thread than RE in memory. So W' cannot be propagated yet and must propagate
after the promise-write-success transition of WE.

WE's promise-write-success transition now adds (RE,[(W,_)]) to
flat_ss_exclusive_reads. Since (RE,[(W,_)]) can only be removed from
flat_ss_exclusive_reads when propagating WE, and since W' propagates before WE
by assumption, this element must be in flat_ss_exclusive_reads when W'
propagates. But since W' overlaps W and is from a different thread than RE, the
write W' cannot propagate. Contradiction.


b) Since RE reads from W in memory it has to enter memory after W. RE's
satisfy-read-transition returns the last memory write written to its location,
and since W' propagates after W, the read RE satisfies before W' propagates,
which in turn is before WE propagates.

Now there are two cases: (b1) when RE satisfies WE has promised its success or
(b2) not.

b1) Then when RE reads from W in memory it adds t=(RE,[(W,_)]) to
flat_ss_exclusive_reads. Since t can only be removed when propagating WE and
since W' propagates before WE, when W' propagates t must be in
flat_ss_exclusive_reads. But since W' is to the same address as RE and W but
from a different thread than RE, the write W' cannot propagate, contradiction.

b2) WE must promise its success before propagating. For WE to promise its
success after RE is satisfied there must be no write overlapping W from a
different thread coherence-after W in memory. So W' cannot be propagated to
memory when promising the success of WE.

When WE promises its success the promise-write-success transition adds
t=(RE,[(W,_)]) to flat_ss_exclusive_reads. Since t can only be removed when
propagating WE and since W' propagates before WE, when W' propagates, t must be
in flat_ss_eclusives. But since W' is to the same address as RE and W but from a
different thread than RE, the write W' cannot propagate, contradiction.

Therefore rmw & (fre;coe) empty.