MPhil course in Multicore Programming - Exercise Sheet 1 Version of Mon Oct 31 21:02:18 GMT 2011 Due: 31 November 2011. Please hand to student admin, with a coversheet. 1. In terms of the semantics on Slides 11-21: a) Give two different transitions, showing how they were derived using the rules in the notes (in the same style as the bottom part of Slide 18), of the process t1:(*x=(*y=*z)). b) Give a complete transition sequence (a sequence of transitions ending in a state that has no more transitions) of the whole-system state . Is it unique? (There is no need to give the derivations of each transition, but you should at least be thinking about what they are.) c) By enumerating the possible whole-system transitions (without giving their derivations in detail), or otherwise, prove that cannot reach a state of the form . 2. Read the chapters of the Intel SDM and AMD APM cited on Slide 45. Consider this x86 example: Initially all registers and [x] and [y] are 0. Thread 0 Thread 1 Thread 2 MOV [x] <- 1 MOV EAX <- [x] MOV [y] <- 1 MOV EBX <- [y] MOV ECX <- [x] Finally: Thread 1: EAX=1 Thread 1: EBX=0 Thread 2: ECX=0 a) Is this allowed with respect to an SC semantics? b) Can you argue whether this is allowed or not with respect to each of those chapters? c) Prove whether or not it is allowed with respect to the x86-TSO abstract machine. 3. Consider the n6 example from Slide 58. a) Give a transition sequence of the x86-TSO abstract machine (as defined on slides 70-80) that proves that this behaviour is allowed there. b) Give an E and X that show that it is allowed with respect to the x86-TSO axiomatic model (as defined on Slides 88-101), checking each of the axioms carefully. 4. Discuss the issues (pro and con) involved in adopting TSO as a programming-language memory model. 5. Using ppcmem, http://www.cl.cam.ac.uk/~pes20/ppcmem, for each of the following tests, either explain why it doesn't have any non-SC behaviour or give the abstract-machine trace (as a list of choices, as produced by the tool) of one non-SC behaviour. a) WRC+lwsync+isync b) IRIW+lwsync+lwsync c) MP+lwsync+addr d) SB+sync+lwsync 6. Give two different implementations (code implementing lock and unlock) for a simple spinlock in C++11, using different atomic primitives (and not using the native C++11 lock and unlock!). Explain why they are correct, being clear about what that means. A. (Optional) a) Think of a new interesting x86 example that would be worth testing on actual hardware. b) Is your specified behaviour allowed or not with respect to x86-TSO? c) Test it using the Litmus tool, distributed as part of diy here: http://diy.inria.fr/