MPhil course in Multicore Programming - Exercise Sheet 1 Nov 2017 Please hand to student admin as usual. Deadline: 1/12/2017 12:00 (last day of Michaelmas term) 1. Read the chapters of the Intel SDM and AMD APM about memory ordering and atomicity: https://software.intel.com/en-us/articles/intel-sdm (vol.3A Ch.8.1-8.3) http://developer.amd.com/resources/developer-guides-manuals/ (APM vol.2 Ch. 7.1-7.3) Consider this x86 example: Initially all registers and [x] and [y] are 0. Thread 0 Thread 1 Thread 2 MOV [x] <- 1 MOV EBX <- [x] MOV [y] <- 1 MOV ECX <- [y] MOV EAX <- [x] Finally: Thread 1: EBX=1 Thread 1: ECX=0 Thread 2: EAX=0 a) Is this allowed with respect to an SC semantics? b) Can you argue whether this is allowed or not with respect to each of those chapters? c) Prove whether or not it is allowed with respect to the x86-TSO abstract machine. d) Write a version of this in the litmus test format (by analogy with the rmem "Hand-written" litmus test library examples). e) Run your test exhaustively in rmem (http://www.cl.cam.ac.uk/users/pes20/rmem, best used in chromium) using the TSO model and discuss the results. f) Run your test on actual hardware using the litmus tool (part of the diy7 tool suite, diy.inria.fr) and discuss the results. 2. Using rmem, for each of the following ARM tests (in the on-line library, "Tutorial" section), either explain why it doesn't have any non-SC behaviour or describe the abstract-machine trace of one non-SC behaviour (explain what's going on, but don't write out the full state after each step of the trace in excruciating detail). Include a shortened-URL link to your trace (using the rmem "link to this state" dropdown). When doing exhaustive search in rmem for ARM, you'll want the "Flat" model (the default), the search option "hash prune" on, and the execution option "all eager" on. a) MP b) MP+dmb.sy+ctrl c) WRC d) WRC+addrs e) PPOAA 3. Either prove or give a counterexample for the conjecture that any candidate execution in which the reads and writes are all to the same location and that does not contain any of the five coherence shapes is sequentially consistent. 4. Explain the intuition for C11 release/acquire synchronisation and explain how it is implemented on x86 and on ARM.