MPhil ACS / Part III / Part II course in Multicore Semantics and Programming Exercise Sheet 1 Nov 2018 Deadline: Friday 2018/11/30 1. Read the chapters of the Intel SDM and AMD APM about memory ordering and atomicity: https://software.intel.com/en-us/articles/intel-sdm (vol.3A Ch.8.1-8.3) http://developer.amd.com/resources/developer-guides-manuals/ (APM vol.2 Ch. 7.1-7.3) Consider this x86 example: Initially all registers and [x] and [y] are 0. Thread 0 Thread 1 Thread 2 MOV [x] <- 1 MOV EBX <- [x] MOV [y] <- 1 MOV ECX <- [y] MOV EAX <- [x] Finally: Thread 1: EBX=1 Thread 1: ECX=0 Thread 2: EAX=0 a) Is this allowed with respect to an SC semantics? b) Can you argue whether this is allowed or not with respect to each of those chapters? c) Prove whether or not it is allowed with respect to the x86-TSO abstract machine. d) Write a version of this in the litmus test format (by analogy with the rmem "Hand-written" litmus test library examples). e) Run your test exhaustively in rmem (http://www.cl.cam.ac.uk/users/pes20/rmem, best used in chromium) using the TSO model and discuss the results. f) Run your test on actual hardware using the litmus tool (part of the diy7 tool suite, diy.inria.fr) and discuss the results. 2. Using rmem, for each of the following ARM tests (in the on-line library, "Tutorial" section), either explain why it doesn't have any non-SC behaviour or describe the abstract-machine trace of one non-SC behaviour (explain what's going on, but don't write out the full state after each step of the trace in excruciating detail). Include a shortened-URL link to your trace (using the rmem "link to this state" dropdown). When doing exhaustive search in rmem for ARM, you'll want the "Flat" model (the default), the search option "hash prune" on, and the execution option "all eager" on. a) MP b) MP+dmb.sy+ctrl c) WRC d) WRC+addrs e) PPOAA 3 Read the ARM archictecture specification "Definition of the ARMv8 memory model B2.3.1 .. B2.3.3" https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile (ignoring the speculative barriers and limited ordering regions) and Sections 5-7 of the paper http://www.cl.cam.ac.uk/~pes20/armv8-mca/armv8-mca-draft.pdf (especially Section 7). For the first three tests (a,b, and c) of Question 2, explain their allowed behaviours (i) according to the ARM architecture specification, and (ii) according to the formal axiomatic model of Section 7 of the paper. 4 (optional for Part II students). Explain the intuition for C11 release/acquire synchronisation and explain how can be implemented on x86 and on ARM.