MPhil course in Multicore Programming - Exercise Sheet 1

      Version of Dec 3 2012

Due: 28 January 2013. Please hand to student admin, with a coversheet.


1.  Read the chapters of the Intel SDM and AMD APM about memory ordering:

Intel Vol. 3 Chapter 8:
http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html

AMD Vol. 2 Chapter 7:
http://developer.amd.com/Resources/documentation/guides/Pages/default.aspx

Consider this x86 example:

  Initially all registers and [x] and [y] are 0. 

    Thread 0         Thread 1           Thread 2

    MOV [x] <- 1     MOV EAX <- [x]     MOV [y] <- 1
                     MOV EBX <- [y]     MOV ECX <- [x]

  Finally: 
    Thread 1: EAX=1
    Thread 1: EBX=0
    Thread 2: ECX=0

a) Is this allowed with respect to an SC semantics?

b) Can you argue whether this is allowed or not with respect to each of
those chapters?

c) Prove whether or not it is allowed with respect to the x86-TSO
abstract machine.


2. Using ppcmem, http://www.cl.cam.ac.uk/~pes20/ppcmem, for each of
the following tests, either explain why it doesn't have any non-SC
behaviour or describe the abstract-machine trace of one non-SC
behaviour (explain what's going on, but don't write out the full state
after each step of the trace in excruciating detail).

a) WRC+lwsync+isync
b) IRIW+lwsync+lwsync
c) MP+lwsync+addr
d) SB+sync+lwsync


3. Give two different implementations (code implementing lock and
unlock) for a simple spinlock in C11, using different atomic
primitives (and not using the native C11 lock and unlock!).  Explain
why they are correct, being clear about what that means.
You may want to use cppmem to explore their behaviour. 


4. Read the deque algorithm in Figure 3 of 

Thread Scheduling for Multiprogrammed Multiprocessors
Nimar S. Arora Robert D. Blumofe C. Greg Plaxton
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.50.9754

which is the basis for the pseudocode algorithm in Tim Harris's
slides.  The algorithm presumes a sequentially consistent memory.

a) Describe what changes (e.g. of additional barriers) that one would
have to use to make this work correctly above the POWER memory model,
including an informal argument for why they are sufficient and necessary.

a) Describe what changes (e.g. of memory order annotations) that one
would have to use to make this work correctly above the C11 memory
model, including an informal argument for why they are sufficient and
necessary.  You might like to produce compilable C11 code, following
the example code and Makefile on the course web page.