MPhil course in Multicore Programming - Exercise Sheet 1

      Version of Mon Oct 31 21:02:18 GMT 2011

Due: 31 November 2011. Please hand to student admin, with a coversheet.


1.   In terms of the semantics on Slides 11-21:

a) Give two different transitions, showing how they were derived using
the rules in the notes (in the same style as the bottom part of Slide
18), of the process t1:(*x=(*y=*z)).

b) Give a complete transition sequence (a sequence of transitions
ending in a state that has no more transitions) of the whole-system state 
<t1:(*x=(*y=*z)), {x=1,y=2,z=3}>.  Is it unique?    
(There is no need to give the derivations of each transition, but you
should at least be thinking about what they are.)

c) By enumerating the possible whole-system transitions (without
giving their derivations in detail), or otherwise, prove that
<t1:(*x=1);*y | t2:(*y=1);*x, {x=0,y=0}> cannot reach a state of the
form <t1:0 | t2:0, M>.


2.  Read the chapters of the Intel SDM and AMD APM cited on Slide 45. 

Consider this x86 example:

  Initially all registers and [x] and [y] are 0. 

    Thread 0         Thread 1           Thread 2

    MOV [x] <- 1     MOV EAX <- [x]     MOV [y] <- 1
                     MOV EBX <- [y]     MOV ECX <- [x]

  Finally: 
    Thread 1: EAX=1
    Thread 1: EBX=0
    Thread 2: ECX=0

a) Is this allowed with respect to an SC semantics?

b) Can you argue whether this is allowed or not with respect to each of
those chapters?

c) Prove whether or not it is allowed with respect to the x86-TSO
abstract machine.


3. Consider the n6 example from Slide 58.

a) Give a transition sequence of the x86-TSO abstract machine (as
defined on slides 70-80) that proves that this behaviour is allowed
there.

b) Give an E and X that show that it is allowed with respect to the
x86-TSO axiomatic model (as defined on Slides 88-101), checking each
of the axioms carefully.


4. Discuss the issues (pro and con) involved in adopting TSO as a
programming-language memory model. 


5. Using ppcmem, http://www.cl.cam.ac.uk/~pes20/ppcmem, for each of
the following tests, either explain why it doesn't have any non-SC
behaviour or give the abstract-machine trace (as a list of choices, as
produced by the tool) of one non-SC behaviour. 

a) WRC+lwsync+isync
b) IRIW+lwsync+lwsync
c) MP+lwsync+addr
d) SB+sync+lwsync


6. Give two different implementations (code implementing lock and
unlock) for a simple spinlock in C++11, using different atomic
primitives (and not using the native C++11 lock and unlock!).  Explain
why they are correct, being clear about what that means.


A. (Optional)

a) Think of a new interesting x86 example that would be worth testing
on actual hardware.

b) Is your specified behaviour allowed or not with respect to x86-TSO?

c) Test it using the Litmus tool, distributed as part of diy here:
   http://diy.inria.fr/