### A word on C11/C++11 low-level atomics

```
#include <atomic>
#include <thread>

// Atomic variable declaration
std::atomic<int> flag0(0), flag1(0), turn(0);

void lock(unsigned index) {
    if (0 == index) {
        // New syntax for memory accesses: store/exchange/load member
        // functions, each taking an explicit memory-order argument.
        flag0.store(1, std::memory_order_relaxed);
        turn.exchange(1, std::memory_order_acq_rel);
        while (flag1.load(std::memory_order_acquire)
               && 1 == turn.load(std::memory_order_relaxed))
            std::this_thread::yield();
    } else {
        flag1.store(1, std::memory_order_relaxed);
        turn.exchange(0, std::memory_order_acq_rel);
        while (flag0.load(std::memory_order_acquire)
               && 0 == turn.load(std::memory_order_relaxed))
            std::this_thread::yield();
    }
}

void unlock(unsigned index) {
    if (0 == index) {
        // Qualifier: the memory order attached to the access
        flag0.store(0, std::memory_order_release);
    } else {
        flag1.store(0, std::memory_order_release);
    }
}
```

#### Low-level atomics



##### MO\_SEQ\_CST

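The compiler must ensure that MO\_SEQ\_CST accesses have sequentially consistent semantics. (In these examples, MO\_X abbreviates `std::memory_order_x`, e.g. MO\_SEQ\_CST for `std::memory_order_seq_cst`.)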

| Thread 0                | Thread 1                |
|-------------------------|-------------------------|
| x.store(1,MO_SEQ_CST)   | y.store(1,MO_SEQ_CST)   |
| r1 = y.load(MO_SEQ_CST) | r2 = x.load(MO_SEQ_CST) |

The program above cannot end with r1 = r2 = 0.
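The two-thread program above can be written out as a small C++11 test. This is only a sketch: the function names `store_buffering_saw_both_zero` and `count_forbidden_outcomes` are hypothetical helpers introduced for illustration, and a run that observes no forbidden outcome is consistent with the guarantee rather than a proof of it.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs the two-thread program once and reports whether the forbidden
// outcome r1 == 0 && r2 == 0 was observed. (Hypothetical helper name.)
bool store_buffering_saw_both_zero() {
    std::atomic<int> x(0), y(0);
    int r1 = -1, r2 = -1;

    std::thread t0([&] {
        x.store(1, std::memory_order_seq_cst);
        r1 = y.load(std::memory_order_seq_cst);
    });
    std::thread t1([&] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });

    // join() synchronises with thread completion, so r1 and r2 can be
    // read safely afterwards.
    t0.join();
    t1.join();
    return r1 == 0 && r2 == 0;
}

// Repeats the experiment and counts forbidden outcomes.
// (Hypothetical helper name.)
int count_forbidden_outcomes(int iterations) {
    assert(iterations > 0);
    int forbidden = 0;
    for (int i = 0; i < iterations; ++i)
        if (store_buffering_saw_both_zero())
            ++forbidden;
    return forbidden;
}
```

With sequentially consistent accesses, `count_forbidden_outcomes(n)` should return 0 for any `n`. If all four accesses were weakened to `std::memory_order_relaxed`, the standard would permit r1 = r2 = 0.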

|       | Sample compilation on x86 | Sample compilation on Power |
|-------|---------------------------|-----------------------------|
| store | MOV; MFENCE               | HWSYNC; ST                  |
| load  | MOV                       | HWSYNC; LD; CMP; BC; ISYNC  |

##### MO\_RELEASE / MO\_ACQUIRE

Supports a fast implementation of the message passing idiom:

| Thread 0              | Thread 1                |
|-----------------------|-------------------------|
| x.store(1,MO_RELAXED) | r1 = y.load(MO_ACQUIRE) |
| y.store(1,MO_RELEASE) | r2 = x.load(MO_RELAXED) |

The program above cannot end with r1 = 1 and r2 = 0.

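Accesses to the data structure itself (here x) can be MO\_RELAXED, so the compiler and the hardware remain free to reorder and optimise them; only the flag y needs the stronger orderings.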
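The message passing idiom can be sketched in C++11 as follows. The function name `message_passing_once` is a hypothetical helper introduced for illustration. The reader spins until it sees the flag, which corresponds to the case r1 = 1 in the table.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs the message passing idiom once and returns the value of x that
// the reader observes after it has seen y == 1. (Hypothetical helper.)
int message_passing_once() {
    std::atomic<int> x(0), y(0);
    int r2 = -1;

    std::thread writer([&] {
        x.store(1, std::memory_order_relaxed);  // the data
        y.store(1, std::memory_order_release);  // the flag
    });
    std::thread reader([&] {
        // Spin until the acquire load reads the value written by the
        // release store; from then on the store to x is visible.
        while (y.load(std::memory_order_acquire) != 1)
            std::this_thread::yield();
        r2 = x.load(std::memory_order_relaxed);
        assert(r2 == 1);  // r1 = 1 and r2 = 0 is forbidden
    });

    writer.join();
    reader.join();
    return r2;
}
```

The release/acquire pair on `y` is what orders the relaxed accesses to `x`. If the load of `y` were relaxed as well, the assertion would no longer be guaranteed to hold.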

|       | Sample compilation on x86 | Sample compilation on Power |
|-------|---------------------------|-----------------------------|
| store | MOV                       | LWSYNC; ST                  |
| load  | MOV                       | LD; CMP; BC; ISYNC          |

##### MO\_RELEASE / MO\_CONSUME

Supports a fast implementation of the message passing idiom on Power:

| Thread 0               | Thread 1                    |
|------------------------|-----------------------------|
| x.store(1,MO_RELAXED)  | r1 = y.load(MO_CONSUME)     |
| y.store(&x,MO_RELEASE) | r2 = (*r1).load(MO_RELAXED) |

The program above cannot end with r1 = &x and r2 = 0.

The two loads have an address dependency (the second load's address is the value returned by the first), so Power will not reorder them.
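A sketch of the pointer-publishing pattern in C++11 is given below. The function name `consume_once` is a hypothetical helper introduced for illustration. One caveat, stated as an assumption about current toolchains rather than something the slides claim: mainstream compilers are generally understood to implement `std::memory_order_consume` by promoting it to acquire, and later C++ standards discourage its use, so the cheaper Power code shown for consume may not be what a compiler actually emits.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Publishes a pointer to x with a release store and reads through it
// with a consume load. Returns the value read through the pointer.
// (Hypothetical helper name.)
int consume_once() {
    std::atomic<int> x(0);
    std::atomic<std::atomic<int>*> y(nullptr);
    int r2 = -1;

    std::thread writer([&] {
        x.store(1, std::memory_order_relaxed);
        y.store(&x, std::memory_order_release);  // publish the pointer
    });
    std::thread reader([&] {
        std::atomic<int>* r1;
        // Spin until the pointer has been published.
        while ((r1 = y.load(std::memory_order_consume)) == nullptr)
            std::this_thread::yield();
        // The address of this load depends on r1, so it is ordered
        // after the consume load.
        r2 = r1->load(std::memory_order_relaxed);
        assert(r2 == 1);  // r1 = &x and r2 = 0 is forbidden
    });

    writer.join();
    reader.join();
    return r2;
}
```

The ordering here only covers accesses that carry a dependency from `r1`. A load of some unrelated variable after the consume load would not be ordered by it, which is the difference from MO\_ACQUIRE.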

|       | Sample compilation on x86 | Sample compilation on Power |
|-------|---------------------------|-----------------------------|
| store | MOV                       | LWSYNC; ST                  |
| load  | MOV                       | LD                          |