Weak Memory Ordering Notes

Comp Arch 2005/6 DJG

Sequential Consistency

Assume we have cache consistency, but is this enough ?
Sequential consistency -
- All operations happen in the order specified by the code
- All CPUs' views of these operations are consistent with a global ordering of the combined operations
Expensive to maintain on modern CPUs
Tantamount to an implicit memory barrier (fence) instruction between each memory operation (load or store).
Actually, need to wait for all possible invalidations or exceptions from a previous instruction to have settled before starting the next.

Single CPU Guarantees

1. A given CPU always perceives its own memory operations as occurring in program order. That is, memory-reordering issues arise only when a CPU is observing other CPUs' memory operations.

2. An operation is reordered with a store only if the operation accesses a different location from the store.

3. Aligned simple loads and stores are atomic.

Self-Modifying Code

Not supported since I/D cache separation.

Needed for JIT and OS loader.

Flush whole I cache gets expensive if frequent.

Used deliberate alias techniques for very small, known caches

Better to use an OS primitive that maps to partial flush/invalidate instructions.

Possible Consistency Models

Three (at least) main models should be considered:

1. Sequential Consistency

2. Processor consistency (W to R ordering): Writes from a given processor are always seen by others as in that processor's order but stores by different processors may not be seen in true order.

2. Weak Consistency:

All operations on synchronisation variables are sequentially consistent,
No synchronisation variables may be accessed until data operations are complete,
No data variables may be accessed until synchronisation operations are complete.

A write is 'complete' when no subsequent read can find a different value.

(A read is 'complete' when no subsequent write can change the value read.)

Do we need cache consistency between barriers ?

OO Race Problems

Java has synchronised keyword

C C++ has volatile keyword

Processor 1 executes its code first. Its cache happens to flush the modification of Foo.bar back to main memory, but none of the other writes.

Then, processor 2 executes its code, it loads its cache only with the new value of Foo.bar from main memory (finding old values in the cache for all other memory locations).

Will processor 2 see garbage for integer x ? What if it were an object handle ?