Computer Laboratory

ACS P35 Persistent Course Material

This year (2014/15) we will be combining Accellera and Parallella!

We will use a high-level simulation model of this hardware platform and compare energy and performance figures predicted by the model with measurements on the cards.

I hope you have all looked through the Vacation Slide Pack. We will review all of it in the first session. Please let me know which slides you'd like to go through in more detail.

Please also sign up for or browse the online magazine: DESIGN AND REUSE.

This year we will be looking at energy measurements on the Parallella card, but before we dive into that we all need to be comfortable with the 2.0-TOY-ESL SYSTEM, and this will be our target for the first week or so. We will perhaps try to do some of the work plan of spEEDO2.

Local Online Resources (For ACS)

  • Set up paths and find local resources using TOOL INFO

  • A very useful resource is the lecture notes for the part II SoC course. Last year's slides are HERE. The slides for this year will be roughly the same, but with less emphasis on low-level SystemC and more emphasis on energy modelling.

  • We all need to be comfortable with the 2.0-TOY-ESL SYSTEM, and this will be our target for the first week or so.

  • Documents and Reference Materials (members of Computer Lab only) DOCUMENTS FOLDER.

  • Online magazine: DESIGN AND REUSE.

  • Design and Layout of 8-bit Kogge Stone Adder 8 Pages (PDF).

  • MOSIS AMI 0.5 micron Cell Library 98 Pages (PDF).

Week-by-Week

  • Week 1: In the first week, please work through the TOY ESL classes 1 to 4 (for now you can leave the last few sections that address TLM), then make a start on the first Assessed Exercise, which is the same as the one from last year.

  • Week 2: We covered the basic TLM coding style in SystemC.

  • Week 3: We will talk about annotated TLM coding styles for timing. We will have a brief discussion of the first four papers on the reading list.

  • Week 4: Energy estimation techniques: we'll go through slide pack 4.2 Power. We'll talk about the papers on the reading list. I'll give an introduction to, and a demo of, the Parallella/Zynq BTLM SystemC.

  • Week 5: (13th Feb). Look at reading list. Look at BTLM simulator. Discuss experiments.

  • Week 6: (20th Feb). Assertion-Based Design (ABD).

  • Week 7: HLS perhaps

  • Week 8:

Zynq BTLM Reference System

The main testbed for this course is a SystemC TLM 2.0 model of the Parallella card. This can run Linux, but we will mostly run bare metal for simplicity.

The code can be checked out from git if I grant you permission. The git repo (containing x86_64, Zynq, MIPS64 and OpenRISC) is

git clone https://bitbucket.org/prazorvhls/prazor-virtual-platform

The code is also installed on the Computer Lab file server and can be copied or linked to at

/usr/groups/han/clteach/btlm/prazor-virtual-platform/vhls/src/arm
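As a minimal sketch of the "linked to" option, the shell function below creates a symbolic link to the installed model in your own file space. The function name link_model and the link name in the usage comment are hypothetical, chosen only for illustration; the installed path is the one given above.

```shell
# Sketch only: link_model and the link name below are hypothetical.
BTLM_ARM=/usr/groups/han/clteach/btlm/prazor-virtual-platform/vhls/src/arm

link_model() {
    model="$1"   # installed model folder on the file server
    link="$2"    # name of the symbolic link to create
    if [ ! -d "$model" ]; then
        echo "not found: $model (are you on a Computer Lab machine?)" >&2
        return 1
    fi
    ln -s "$model" "$link"
}

# Example use on a lab machine (link name is an assumption):
#   link_model "$BTLM_ARM" "$HOME/p35-arm"
```

A link only gives you a view of the shared folder, so anything that needs to write files there will still fail; take a copy instead if you intend to modify or rebuild the model.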

Instructions will be added at this link: PRAZOR TEMPORARY MANUAL.

Reading List

The reading list is on this page: Reading List.


THE MATERIAL BELOW THIS LINE IS FROM PREVIOUS YEARS. IT WILL BE EDITED AND PLACED ABOVE THE LINE AS WE PROCEED.

OLDER 2013/14 Information

The main practical work will use the 'OR1200 Blocking TLM Testbed' and a new version of that will be installed (from git this year) on about 20th January 2014.

Possible investigations for this year:

  • Energy efficiency of hardware transactional memory.
  • Getting accurate per-process energy use figures for a multithreaded operating system (with hardware accelerators in use).
  • Exporting energy use measurements through the OS so that an application can optimise itself.

Slide Packs

In general: please read through the next set of slides in advance of the lecture session and be prepared to say which bits you want lectured in detail. If this is all of it then that's ok!

Practical Systems

Toy ESL System

Please first become familiar with this very simple system. It is a stepping stone to the main OR1200 system: TOY ESL. It can be copied from the filesystem at /usr/groups/han/clteach/socdam/toyclasses.

Main OR1200 Blocking TLM Testbed



We will use this blocking TLM implementation as the basis for the majority of the exercises.

  • The simulator lives in the folder /usr/groups/han/clteach/btlm-baseline/vhls/src/or1ksmp/

  • It is sensible to set up a link to the provided TLM resources such as
    PRAZOR=/usr/groups/han/clteach/btlm-baseline
    

  • The precompiled binary for the simulator can then be found in
    SIMULATOR=$(PRAZOR)/vhls/src/or1ksmp/vhls-or1ksmp
    But note that you will later modify the simulator and some other contents of the or1ksmp folder, so it's good to take a copy of this in your own file space. Report files cannot be written in my folder, so you will get protection errors if you try to run the simulator there.

  • Some get-started demo programs are compiled for the simulator in here:
    SW=$(PRAZOR)/openrisc-sw

  • The simplest possible application is hello world found in
    $(SW)/hello-world.

    But note THIS DOES NOT USE LIBC AND SO PRINTF AND PTHREADS WILL NOT WORK.

  • A more interesting application is a machine code monitor found in
    $(SW)/mixbug

    that uses a toy version of libc.

  • Other programs use uClibc which is a full-featured version of the C library (but the filing system backdoors are not implemented). Command line args and env variables may be accessed as usual for C programs.

  • Other applications and the SPLASH-2 benchmarks and the linux kernel can also be run if you like. Please ask.

A minimal first step to using the precompiled system is as follows (note: omit the parentheses around macro names outside of makefiles):

export PRAZOR=/usr/groups/han/clteach/btlm-baseline
export SW=$PRAZOR/openrisc-sw
export SIMULATOR=$PRAZOR/vhls/src/or1ksmp/vhls-or1ksmp
cp $PRAZOR/vhls/src/or1ksmp/Makefile.djg .
make -f Makefile.djg

Adjust the IMAGE setting in the Makefile to run other programs, or start writing your own.
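Since the simulator cannot write report files in the shared folder, and you will later modify the contents of or1ksmp, a private copy is needed at some point. The following is a minimal sketch of taking one; the function name copy_sim_tree and the destination folder name are hypothetical, not part of the provided system.

```shell
# Sketch only: copy_sim_tree and the destination name are hypothetical.
copy_sim_tree() {
    src="$1"   # shared, read-only simulator folder
    dst="$2"   # private working copy to create
    if [ ! -d "$src" ]; then
        echo "source folder not found: $src" >&2
        return 1
    fi
    if [ -e "$dst" ]; then
        echo "refusing to overwrite existing: $dst" >&2
        return 1
    fi
    # Copy the tree, then make sure the copy is writable by you.
    cp -r "$src" "$dst" && chmod -R u+w "$dst"
}

# Example use on a lab machine (destination name is an assumption):
#   copy_sim_tree "$PRAZOR/vhls/src/or1ksmp" "$HOME/or1ksmp-work"
```

Whether Makefile.djg then runs your copied simulator or the shared one depends on how it uses the SIMULATOR setting, so check that before relying on your modifications.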

Other scraps for discussion

Scraps for this year so far (will be reorganised):

Local Online Resources (For ACS)

  • Preparatory Work (SystemC): Please become familiar with this material over the Michaelmas Vacation so that it can be covered fairly rapidly in the first week (or two) of the Lent Term 2013: PREPARATION.

  • LG 1 slides (not all will be used): 1.1-RTL, 1.2-SystemC-Basic, 1.3-SoC-Parts-RTL.

  • LG 2 slides: 2.0-TOY-ESL, 2.1-ESL.

  • LG 4 slides: 4.1 Bus NoC (I'll just present the switch fabrics quickly so we can discuss contention modelling, then the DRAM slides), 4.2 Power.

  • LG 3 slides (lectured after LG4): 3.2 Higher-level design etc. (HLS).

  • Week ?? slides: Safety Critical Systems and Fault Tolerance: PERHAPS TO BE ADDED.

Investigations for 2013

  1. Single-core algorithms are not the best on parallel architectures? We shall investigate how well the textbook algorithms work on contemporary and future architectures:
    • Tree Quicksort (Guy Blelloch, CMU) ML FORM.
    • Polyphase radix sort. From the SPLASH benchmarks.

  2. Virtual machine versus real machine: which takes the most power when the L1 I-cache struggles? I will provide you with an application program coded in three ways: bytecode for the dotnet VM, a natively compiled version and a compiled-to-gates form. Which uses the least power or scales the best as cores are added? Is it worth having hardware support for the dotnet VM?

  3. Repeat last year's exercise (power consumption for CRC co-processor compared with software).

Other (sketchy) Investigations for 2013

These are some of my research ideas that we can perhaps take further as exercises? Basically you will explore the behaviour of some application as it is partitioned differently: using various processor cores or hardware assist. You will perhaps be allocated tasks in pairs, with one writing the hardware and the other the software. This means you have to agree on the specification in advance.

  • RTL is King? The King is dead? For too long RTL has been the narrow waist of chip design, connecting the back-end flow to the front-end flow. Can we do place and route while re-pipelining the design?

  • Single-core algorithms are not the best on parallel architectures? We shall investigate how well the textbook algorithms work on contemporary and future architectures.

  • Anomalies in the ASUS motherboard power probe? Can we get to the bottom of it (needs a kernel hacker)?