13th Feb 2015 - Mini Task Notes


Using integer arithmetic only, write a very short C program that can run in two modes: 
  1. all work is done in registers or very top of the stack
  2. the same work is spread out over an array in memory
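
A minimal sketch of the two modes, assuming a simple multiply-and-add
recurrence as the workload. The names work_in_registers, work_in_memory
and ARRAY_WORDS are hypothetical and do not come from the task
materials. The volatile qualifier is one way of stopping the compiler
from folding the array traffic back into registers at higher -O levels.

```c
#include <stdint.h>

#define ARRAY_WORDS 1024u   /* hypothetical array size, for illustration only */

/* Mode 1: a few integer locals that the compiler can keep in registers. */
uint32_t work_in_registers(uint32_t iterations)
{
    uint32_t acc = 1u;
    for (uint32_t i = 0; i < iterations; i++) {
        acc = acc * 3u + i;             /* integer arithmetic only */
    }
    return acc;
}

/* Mode 2: the same recurrence, but every step loads its input from the
 * array and stores its output back to the array.  volatile makes each
 * access a real load or store. */
uint32_t work_in_memory(volatile uint32_t *buf, uint32_t iterations)
{
    buf[0] = 1u;
    for (uint32_t i = 0; i < iterations; i++) {
        uint32_t src = i % ARRAY_WORDS;
        uint32_t dst = (i + 1u) % ARRAY_WORDS;
        buf[dst] = buf[src] * 3u + i;   /* one load and one store per step */
    }
    return buf[iterations % ARRAY_WORDS];
}
```

Both functions return the same value for the same iteration count, so
the result can be used as a check that the two modes really do the same
work. The buffer passed to work_in_memory must hold at least ARRAY_WORDS
words.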

Ideally the program can also be run under pthreads with two parallel cores in use.

Run the program on the simulator with caches enabled and with caches disabled, at each of four gcc -O optimisation levels.

Run it on the real board.  Information for that will be available later in the week.

Compare the execution time and energy use.

Look at the VPPET paper figures and see whether your results tie up with them.

Division of work
   DG :  (get cachebench working)
   MP :  (mp program)
   HX :  (hx program)                 

This does not really involve modifying abench1.h, so we will ultimately
also need an extension that does.  The DRAM controller on the Zynq
has a mini L3 cache in it (fully associative) used for access scheduling.
(General DRAM notes: http://www.cl.cam.ac.uk/teaching/1415/SysOnChip/SP1-intro_and_socparts/zhp343475fdd.html)
It might be good to implement this component and see how it affects the system model, although there
will not be much difference for small, single-threaded programs.


-------------------

Questions Arising

Q. How can I ensure that the work I am trying to do is done in
registers or in memory? I guess I could use malloc() to allocate
variables in memory (and the "register" keyword to hint to the
compiler that variables should live in registers?). But might the
compiler change things when optimising, and so change where the work
is actually done? Or is there a way to force the compiler to do all
the work purely in registers or purely in memory?


A. You can be fairly sure that if your function uses only a few integer
variables (fewer than 5) and has only two arguments then all the work
will be done in registers, whereas if you call malloc to allocate heap
memory then accesses to struct fields will be loads and stores to
memory rather than register operations.
I suggest you write very short pieces of code, with one or two lines in
the function body, and look at the resulting disassembly or .S files to
see what is going on.
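
As an illustration of that advice, here is a sketch of two very short
functions whose generated code can be compared. The names (sum_regs,
sum_heap, struct counters) and the file name used below are
hypothetical.

```c
#include <stdlib.h>

struct counters { int a; int b; };   /* hypothetical heap-allocated struct */

/* Two arguments and one local: a candidate for staying in registers. */
int sum_regs(int x, int y)
{
    int t = x + y;
    return t * 2 - x;
}

/* The same arithmetic routed through fields of a heap-allocated struct. */
int sum_heap(int x, int y)
{
    struct counters *c = malloc(sizeof *c);
    if (c == NULL) {
        return 0;                    /* allocation failed */
    }
    c->a = x + y;
    c->b = c->a * 2 - x;
    int r = c->b;
    free(c);
    return r;
}
```

Saved as, say, regs_vs_heap.c, the assembly can be produced with
"gcc -O0 -S regs_vs_heap.c -o regs_vs_heap.s" (substituting whichever
cross-compiler targets the board) and again at the other -O levels.
Be aware that at higher optimisation levels the compiler may remove the
malloc/free pair altogether when the pointer never escapes the
function, which is exactly the sort of thing the disassembly will
reveal.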


Q. If our program runs "under pthreads with two parallel cores in
use", what work is actually done on the two cores?

A. The simple bare-metal pthreads implementation called djgthreads.c is
non-preemptive and runs one thread per core.

Q. Are we supposed to run each mode (register or memory) on each core?

A. Yes, the number of cores in use is an orthogonal axis of
exploration and so all combinations should be looked at in turn, with
the non-idle cores doing identical work at any one time.
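
A minimal sketch of "identical work on each non-idle core", written
against the standard POSIX pthreads calls pthread_create and
pthread_join. It is an assumption that djgthreads.c offers the same
calls. The names worker, run_identical_work, struct job and MAX_WORKERS
are hypothetical, and the workload is an arbitrary integer recurrence.

```c
#include <pthread.h>
#include <stdint.h>

#define MAX_WORKERS 2       /* hypothetical: one worker per core, two cores */

struct job {
    uint32_t iterations;    /* input: amount of work */
    uint32_t result;        /* output: written by the worker */
};

/* Thread body: every worker runs the same integer-only loop. */
static void *worker(void *arg)
{
    struct job *j = arg;
    uint32_t acc = 1u;
    for (uint32_t i = 0; i < j->iterations; i++) {
        acc = acc * 3u + i;
    }
    j->result = acc;
    return NULL;
}

/* Start nworkers threads doing identical work and collect their results.
 * Returns 0 on success, -1 on a bad argument or a pthread failure. */
int run_identical_work(int nworkers, uint32_t iterations, uint32_t results[])
{
    pthread_t tid[MAX_WORKERS];
    struct job jobs[MAX_WORKERS];
    int started = 0;
    int status = 0;

    if (nworkers < 1 || nworkers > MAX_WORKERS) {
        return -1;
    }
    for (int k = 0; k < nworkers; k++) {
        jobs[k].iterations = iterations;
        jobs[k].result = 0u;
        if (pthread_create(&tid[k], NULL, worker, &jobs[k]) != 0) {
            status = -1;
            break;
        }
        started++;
    }
    /* Join only the threads that were actually started. */
    for (int k = 0; k < started; k++) {
        if (pthread_join(tid[k], NULL) != 0) {
            status = -1;
        }
        results[k] = jobs[k].result;
    }
    return status;
}
```

Calling run_identical_work with nworkers set to 1 and then 2 gives the
one-core and two-core points on that axis. Under a hosted pthreads
library the placement of threads on cores is up to the scheduler,
whereas the notes above say djgthreads.c runs one thread per core.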