MPhil course in Multicore Programming - Exercise Sheet 1 Version of Dec 3 2012 Due: 28 January 2013. Please hand to student admin, with a coversheet. 1. Read the chapters of the Intel SDM and AMD APM about memory ordering: Intel Vol. 3 Chapter 8: http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html AMD Vol. 2 Chapter 7: http://developer.amd.com/Resources/documentation/guides/Pages/default.aspx Consider this x86 example: Initially all registers and [x] and [y] are 0. Thread 0 Thread 1 Thread 2 MOV [x] <- 1 MOV EAX <- [x] MOV [y] <- 1 MOV EBX <- [y] MOV ECX <- [x] Finally: Thread 1: EAX=1 Thread 1: EBX=0 Thread 2: ECX=0 a) Is this allowed with respect to an SC semantics? b) Can you argue whether this is allowed or not with respect to each of those chapters? c) Prove whether or not it is allowed with respect to the x86-TSO abstract machine. 2. Using ppcmem, http://www.cl.cam.ac.uk/~pes20/ppcmem, for each of the following tests, either explain why it doesn't have any non-SC behaviour or describe the abstract-machine trace of one non-SC behaviour (explain what's going on, but don't write out the full state after each step of the trace in excruciating detail). a) WRC+lwsync+isync b) IRIW+lwsync+lwsync c) MP+lwsync+addr d) SB+sync+lwsync 3. Give two different implementations (code implementing lock and unlock) for a simple spinlock in C11, using different atomic primitives (and not using the native C11 lock and unlock!). Explain why they are correct, being clear about what that means. You may want to use cppmem to explore their behaviour. 4. Read the deque algorithm in Figure 3 of Thread Scheduling for Multiprogrammed Multiprocessors Nimar S. Arora Robert D. Blumofe C. Greg Plaxton http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.50.9754 which is the basis for the pseudocode algorithm in Tim Harris's slides. The algorithm presumes a sequentially consistent memory. a) Describe what changes (e.g. of additional barriers) that one would have to use to make this work correctly above the POWER memory model, including an informal argument for why they are sufficient and necessary. a) Describe what changes (e.g. of memory order annotations) that one would have to use to make this work correctly above the C11 memory model, including an informal argument for why they are sufficient and necessary. You might like to produce compilable C11 code, following the example code and Makefile on the course web page.