13th Feb 2015 - Mini Task Notes

Using integer arithmetic only, write a very short C program that can run in two modes:

1. all work is done in registers or at the very top of the stack
2. the same work is spread out over an array in memory

Ideally the program can also be run under pthreads with two parallel cores in use.

Run the program on the simulator with and without caches on and with four levels of the gcc -O flag. Run it on the real board. Information for that will be available later in the week. Compare the execution time and energy use. Look at the VPPET paper figures and see if they tie up.

Division of work

DG : (get cachebench working)
MP : (mp program)
HX : (hx program)

This does not really involve modifying abench1.h, so we will ultimately also need an extension that does.

The DRAM controller on the Zynq has a mini L3 cache in it (fully associative) used for access scheduling. (General DRAM notes http://www.cl.cam.ac.uk/teaching/1415/SysOnChip/SP1-intro_and_socparts/zhp343475fdd.html) It might be good to implement this component and see how it affects the system model, although there will not be much difference for small, single-threaded programs.

-------------------
Questions Arising

Q. How can I ensure that the work I am trying to do is done in registers or in memory? I guess I may use malloc() to allocate variables in memory (and the "register" keyword to give the compiler a hint to allocate variables to registers?). But is it possible that the compiler may change things for optimisation and change the location where I actually do the work? Or is there a way to force the compiler to do all the work purely in registers or in memory?

A. You can be fairly sure that if your function uses only a few integer variables (fewer than 5) and only has two arguments then all will be done in registers, whereas if you call malloc to allocate heap memory then the loads and stores to struct fields will not be in registers.
I suggest you write very short pieces of code, one or two lines in the function body, and look at the resulting disassembly or .S files to see what is going on.

Q. If our program runs "under pthreads with two parallel cores in use", what work is actually done on the two cores? Are we supposed to run each mode (register or memory) on each core?

A. The simple bare-metal pthreads implementation called djgthreads.c is non-preemptive and runs one thread per core.

Q. Are we supposed to run each mode (register or memory) on each core?

A. Yes, the number of cores in use is an orthogonal axis of exploration and so all combinations should be looked at in turn, with the non-idle cores doing identical work at any one time.
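
As a possible starting point for the two modes described in the task, here is a minimal sketch in C. It is not the mp or hx program, and every name in it (work_registers, work_memory, run_mode, ARRAY_WORDS) is hypothetical. The register mode follows the rule of thumb above: two arguments and fewer than five integer locals. The memory mode runs the same recurrence through a heap buffer. The volatile qualifier is an extra assumption of this sketch, added so that higher -O levels are less likely to forward the stored value through a register; whether either mode behaves as intended should still be checked in the .S output (for example from gcc -S) at each optimisation level.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical buffer size in words; must be a power of two so that the
 * index can wrap with a mask rather than a division. */
#define ARRAY_WORDS 4096u

/* Mode 1: two arguments and two integer locals, so the compiler can
 * normally keep everything in registers. Unsigned overflow wraps. */
static unsigned work_registers(unsigned n, unsigned seed)
{
    unsigned acc = seed;
    for (unsigned i = 0; i < n; i++)
        acc = acc * 3u + i;
    return acc;
}

/* Mode 2: the same recurrence, but the running value is stored to and
 * reloaded from successive words of a buffer on every iteration.
 * volatile is this sketch's assumption, not something the notes require. */
static unsigned work_memory(unsigned n, unsigned seed,
                            volatile unsigned *buf, unsigned words)
{
    assert(words != 0u && (words & (words - 1u)) == 0u);
    unsigned idx = 0;
    buf[0] = seed;
    for (unsigned i = 0; i < n; i++) {
        unsigned next = (idx + 1u) & (words - 1u); /* wrap without '%' */
        buf[next] = buf[idx] * 3u + i;
        idx = next;
    }
    return buf[idx];
}

/* Runs one mode and writes the final value to *result.
 * Returns 0 on success, -1 if the heap buffer could not be allocated. */
static int run_mode(int use_memory, unsigned n, unsigned seed,
                    unsigned *result)
{
    if (!use_memory) {
        *result = work_registers(n, seed);
        return 0;
    }
    unsigned *buf = malloc(ARRAY_WORDS * sizeof *buf);
    if (buf == NULL)
        return -1;
    *result = work_memory(n, seed, buf, ARRAY_WORDS);
    free(buf);
    return 0;
}
```

A main function could select the mode from a command-line argument or a compile-time define and call run_mode once per configuration. Both modes compute the same value for the same n and seed, which gives a cheap check that the two variants really are doing identical work before their time and energy figures are compared.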