Computer Laboratory

ACS P35 Persistent Course Material

Lent Term 2016/17

Under construction ... in late Dec 2016 and early Jan 2017.

There will be a warm-up tool run to try before the first session: I will email those reading the module and post it here shortly.

This year we shall concentrate on modelling software accelerators that are generated by High-Level Synthesis (HLS) and performing high-level modelling of the composite system consisting of conventional CPU cores and custom accelerators.

Specifically we will be exploring the tools and techniques for solving the following problem:

I want to compare a number of hardware accelerator designs for a problem that has two parts, one being CPU bound and the other being DRAM random access bound. I want to use high-level synthesis to generate my accelerators. The problem specification is not likely to change. How can I work out the energy use and performance advantage I get from my accelerators pre tape-out?
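
One rough way to frame the question above is a first-order, Amdahl-style estimate: treat the workload as two sequential phases, each with a run time and an average power, and scale one phase by an assumed accelerator speedup. The sketch below is only an illustration. The `Phase` and `Accel` types, the function names and all numbers are hypothetical and are not taken from the course tools; a real study would take its figures from the virtual platform or from measurements.

```cpp
#include <cassert>

// Hypothetical first-order model; names and numbers are illustrative only.
struct Phase {
    double seconds;  // run time of this phase
    double watts;    // average power while this phase runs
};

struct Accel {
    double speedup;  // assumed factor by which the phase gets shorter
    double watts;    // assumed average power while the accelerated phase runs
};

// The phases run one after the other, so their times simply add.
double total_seconds(const Phase& a, const Phase& b) {
    return a.seconds + b.seconds;
}

// Energy is power multiplied by time, summed over both phases.
double total_joules(const Phase& a, const Phase& b) {
    return a.seconds * a.watts + b.seconds * b.watts;
}

// Replace a phase by its accelerated version.
Phase accelerate(const Phase& p, const Accel& acc) {
    assert(acc.speedup > 0.0);
    return Phase{p.seconds / acc.speedup, acc.watts};
}

// Overall speedup of the whole workload.
double overall_speedup(double base_seconds, double new_seconds) {
    return base_seconds / new_seconds;
}
```

For example, with a 6 s CPU-bound phase at 2 W and a 4 s DRAM-bound phase at 1 W, accelerating only the CPU-bound phase by a factor of 3 at 1 W gives 6 s and 6 J in place of 10 s and 16 J, after which the DRAM-bound phase dominates.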


Week-by-Week Lent Term 2015/2016

  • Week 1: In the first week, please work through the 2.0-TOY-ESL SYSTEM TOY ESL classes (you can leave the final parts that use TLM until week 2 if you wish), then make a start on the first Assessed Exercise, which will be the same as the one from last year. Please read the entries on this year's Reading List 1 and be ready to discuss them briefly at this week's session on 22nd Jan, probably leaving fuller discussion to next week's session.

  • Week 2: We (tried to) define the terms Reconfigurable Computing and Hardware Acceleration. We will talk about TLM coding for high-performance system modelling. We will look at how the Toy ESL system implemented TLM in its final classes. We looked at the centre slides in the ESL-2.1 Slide Pack. We also discussed the innards of a fax machine.

  • Week 3: We will each do 5 minute presentations of a selected paper from Reading List 1. We will introduce the Prazor/VHLS virtual platform as configured for the Zynq platform.

  • Week 4: A lecture on protocols, interfaces, interconnect styles and primitives and glue logic synthesis. We will create accounts on the Parallella card and run the same binary on the simulator as on the real platform. Slide packs: 4.1 Bus NoC and the cross-product synthesis method.

  • Week 5: (12th Feb): IP-block data sheets and test programmes: IP-XACT, OVM/UVM, Assertion-Based Design (ABD) and integration with Chisel discussions.

  • Week 6: (19th Feb): Higher-level design and High-Level Synthesis (HLS).

  • Week 7: (26th Feb): Performance Prediction - High-level modelling. See also the end of SP6-ESL undergraduate lectures.

  • Week 8: (4th March): Reconfigurable Computing

Lent 2016 Reading Lists

Reading lists will be on this page: Reading Lists.

  • Documents and Reference Materials (members of Computer Lab only) DOCUMENTS FOLDER.

Main Virtual Platform: Zynq BTLM Reference System

The main virtual platform for this course this year is again a SystemC TLM2.0 model of the ZedBoard or Parallella card. This can run Linux, but we will mostly run bare metal for simplicity.

The main platform is checked out on the file server or you can build your own live copy. If you are not a C++ and automake expert, it can be a struggle to build, so do seek help. You will need to be able to edit it, so you will need at least your own makefile for the final arm7smp binary. If you give me your bitbucket user identifier you can clone a private copy of the virtual platform.

To run a demo of the checked out version, you can type 'make' in the following folder, but it may fail as you do not have relevant write permissions. I'll provide further help on this shortly.

Further instructions will be added at this link: PRAZOR TEMPORARY MANUAL.

# Set PRAZOR to one of the following; if both lines are run, the second overrides the first.
$ export PRAZOR=/usr/groups/han/clteach/btlm/pvp3/prazor-virtual-platform
$ export PRAZOR=/usr/groups/han/clteach/btlm/current/prazor-virtual-platform

$ cd $PRAZOR/vhls/images/hello-world
$ export TARCH=ARM32
$ make

To check out and build your own copy, it is best to clone the source code. It is not yet on open-source release. You will need a user ID at BitBucket, and if you pass that on to me (djg) I will grant you access permission. The git repo (containing ARM32/Zynq, x86_64, MIPS64 and OpenRISC) is

To build, use something like

   git clone
   export CLTEACH=/usr/groups/han/clteach
   export BOOST=$CLTEACH/boost/boost_1_48_0
   export SYSTEMC=$CLTEACH/systemc/systemc-current
   export TLM_POWER3=$CLTEACH/tlm-power3
   export LDFLAGS="-L$SYSTEMC/lib-linux64 -L/usr/local/lib -L$TLM_POWER3/src/.libs"
   export CXXFLAGS="-I$SYSTEMC/include/ -I$BOOST -I$SYSTEMC/include/tlm_core/tlm_2 -I$TLM_POWER3/include -g -O2"
   cd prazor-virtual-platform/vhls
   automake --add-missing
[Figure: TLM Parallella model]

Lent 2016: Possible investigations ?

The topic(s) for this year's investigations will be resolved by Week 4.

  • OVM/UVM and Chisel Integration: Chisel is new and powerful. How can it be incorporated in industry-standard verification methodologies like OVM/UVM ?

  • Adapteva Epiphany Chip Modelling for Energy and Performance: We have this chip on the Parallella card with supply instrumentation. We can run experiments for real and on a virtual platform, but we first have to incorporate an Epiphany ISS into the platform, since this is not yet modelled.

  • Model Checking Data Flows: PSL concerns itself with events, not abstract data flows. Let's write some assertions about data consistency for simple components like a FIFO.

  • RISC-V workshop: Design and perhaps implement a high-level model of the Orca platform.
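
For the FIFO item in the list above, one informal starting point is a runtime scoreboard check in simulation. This is not formal model checking and not PSL; it only shows what "data consistency" could mean as an executable assertion. The sketch below is an assumption about how such a check might look: `RingFifo` (the component under test) and `CheckedFifo` (the checker) are hypothetical names, not part of any course tool.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical component under test: a bounded ring-buffer FIFO.
template <typename T, std::size_t N>
class RingFifo {
public:
    bool push(const T& v) {
        if (count_ == N) return false;        // full: reject the write
        buf_[(head_ + count_) % N] = v;
        ++count_;
        return true;
    }
    bool pop(T& out) {
        if (count_ == 0) return false;        // empty: nothing to read
        out = buf_[head_];
        head_ = (head_ + 1) % N;
        --count_;
        return true;
    }
    std::size_t size() const { return count_; }
private:
    T buf_[N];
    std::size_t head_ = 0;
    std::size_t count_ = 0;
};

// Hypothetical checker: mirrors every accepted write in a reference queue
// and asserts that reads return the same values in the same order.
template <typename T, std::size_t N>
class CheckedFifo {
public:
    bool push(const T& v) {
        const bool ok = dut_.push(v);
        if (ok) ref_.push_back(v);
        assert(dut_.size() == ref_.size());   // occupancy is conserved
        return ok;
    }
    bool pop(T& out) {
        const bool ok = dut_.pop(out);
        if (ok) {
            assert(!ref_.empty());            // no data invented
            assert(out == ref_.front());      // value and order preserved
            ref_.pop_front();
        }
        assert(dut_.size() == ref_.size());
        return ok;
    }
    std::size_t size() const { return dut_.size(); }
private:
    RingFifo<T, N> dut_;
    std::deque<T> ref_;
};
```

The checks here are ordinary `assert` calls, so they disappear when `NDEBUG` is defined; a formal treatment would state the same properties over all possible input sequences rather than over one simulation run.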


ACS P35 Persistent Course Material - Last Year

This year (2014/15) we will be combining Accellera and Parallella !

We will use a high-level simulation model of this hardware platform and compare energy and performance figures predicted by the model with measurements on the cards.

I hope you have all looked through Vacation Slide Pack. We will review all of this in the first session. Please let me know which slides you'd like to go through in more detail.

Please also sign up for or browse the online magazine: DESIGN AND REUSE.

This year we will be looking at energy measurements on the Parallella card, but before we dive into that we all need to be comfortable with the 2.0-TOY-ESL SYSTEM and this will be our target for the first week or so. We will perhaps try to do some of the work plan of spEEDO2.

Local Online Resources (For ACS)

  • Set up paths and find local resources using TOOL INFO

  • A very useful resource is the lecture notes for the part II SoC course. Last year's slides are HERE. The slides for this year will be roughly the same, but with less emphasis on low-level SystemC and more emphasis on energy modelling.

  • We all need to be comfortable with the 2.0-TOY-ESL SYSTEM and this will be our target for the first week or so.

  • Documents and Reference Materials (members of Computer Lab only) DOCUMENTS FOLDER.

  • Online magazine: DESIGN AND REUSE.

  • Design and Layout of 8-bit Kogge Stone Adder 8 Pages (PDF).

  • MOSIS AMI 0.5 micron Cell Library 98 Pages (PDF).

  • Zynq 7000 Technical Reference Manual (PDF)

  • Zynq 7000 Documents Folder.

Week-by-Week 2015

  • Week 1: In the first week, please work through the TOY ESL classes 1 to 4 (for now you can leave the last few sections that address TLM), then make a start on the first Assessed Exercise, which is the same as the one from last year.

  • Week 2: We covered the basic TLM coding style in SystemC.

  • Week 3: We will talk about annotated TLM coding styles for timing. We will have a brief discussion of the first four papers on the reading list.

  • Week 4: Energy estimation techniques: we'll go through slide pack 4.2 Power. We'll talk about the papers on the reading list. I'll give an introduction to, and a demo of, the ZedBoard/Parallella/Zynq BTLM SystemC model.

  • Week 5: (13th Feb). Look at reading list. Look at BTLM simulator. Discuss experiments.

  • Week 6: (20th Feb) Assertion-Based Design (ABD).

  • Week 7: (27th Feb) High Level Synthesis (HLS) and Reconfigurable Computing: 3.2 Higher-level design etc.(HLS).

  • Week 8: TBD.
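
The annotated TLM coding style for timing mentioned under Week 3 above can be illustrated without SystemC. In the blocking style, the initiator makes one function call per transaction and passes a running delay by reference; the target performs the access and adds its own latency to that delay instead of advancing simulated time itself. The sketch below is plain C++ under stated assumptions: `Transaction`, `Memory` and `transport` are hypothetical names and are not the SystemC TLM-2.0 classes, and the latency value is made up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Plain C++ illustration only: these types are hypothetical and are NOT
// the SystemC TLM-2.0 classes.
struct Transaction {
    bool is_write;
    std::uint32_t address;  // byte address
    std::uint32_t data;     // word to write, or word read back
};

// A target with a fixed access latency (an assumed figure, in nanoseconds).
class Memory {
public:
    Memory(std::size_t words, std::uint64_t latency_ns)
        : store_(words, 0), latency_ns_(latency_ns) {
        assert(words > 0);
    }

    // Blocking call: complete the access, then annotate the caller's
    // running delay with this target's latency.
    void transport(Transaction& t, std::uint64_t& delay_ns) {
        const std::size_t index = (t.address / 4) % store_.size();
        if (t.is_write) {
            store_[index] = t.data;
        } else {
            t.data = store_[index];
        }
        delay_ns += latency_ns_;
    }

private:
    std::vector<std::uint32_t> store_;
    std::uint64_t latency_ns_;
};
```

An initiator that writes a word and reads it back through a `Memory` with a 10 ns latency ends up with an accumulated delay of 20 ns, which it can later hand to the simulation kernel in one step.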


OLDER 2013/14 Information

The main practical work will use the 'OR1200 Blocking TLM Testbed' and a new version of that will be installed (from git this year) on about 20th January 2014.

Possible investigations for this year:

  • Energy efficiency of hardware transactional memory.
  • Getting accurate per-process energy use figures for a multithreaded operating system (with hardware accelerators in use).
  • Exporting energy use measurements through the OS so that an application can optimise itself.

Slide Packs

In general: please read through the next set of slides in advance of the lecture session and be prepared to say which bits you want lectured in detail. If this is all of it then that's ok!

Practical Systems

Toy ESL System

Please first become familiar with this very simple system. It is a stepping stone to the main OR1200 system: TOY ESL. It can be copied from the filesystem at /usr/groups/han/clteach/socdam/toyclasses.

Main OR1200 Blocking TLM Testbed

We will use this blocking TLM implementation as the basis for the majority of the exercises.

  • The main binary for the simulator is /groups/han/clteach/btlm-baseline/vhls/src/or1ksmp/

  • It is sensible to set up a link to the provided TLM resources such as

  • The precompiled binary for the simulator can then be found in
    But note that you will later modify the simulator and some other contents of the or1ksmp folder, so it's good to take a copy of this in your own file space. Report files cannot be generated in my folder, so you will get protection errors if you try to run the simulator there.

  • Some get-started demo programs are compiled for the simulator in here:

  • The simplest possible application is hello world found in


  • A more interesting application is a machine code monitor found in

    that uses a toy version of libc.

  • Other programs use uClibc which is a full-featured version of the C library (but the filing system backdoors are not implemented). Command line args and env variables may be accessed as usual for C programs.

  • Other applications and the SPLASH-2 benchmarks and the linux kernel can also be run if you like. Please ask.

A minimal first step to using the precompiled system is as follows (note: omit the parentheses around macro names outside of makefiles):

export PRAZOR=/usr/groups/han/clteach/btlm-baseline
export SW=$PRAZOR/openrisc-sw
export SIMULATOR=$PRAZOR/vhls/src/or1ksmp/vhls-or1ksmp
cp $PRAZOR/vhls/src/or1ksmp/Makefile.djg .
make -f Makefile.djg

Adjust the IMAGE setting in the Makefile to run some other programs or now start to write your own.

Other scraps for discussion

Scraps for this year so far (will be reorganised) :

Local Online Resources (For ACS)

  • Preparatory Work (SystemC): Please become familiar with this material over the Michaelmas Vacation so that it can be covered fairly rapidly in the first week (or two) of the Lent Term 2013: PREPARATION.

  • LG 1 slides (not all will be used): 1.1-RTL, 1.2-SystemC-Basic, 1.3-SoC-Parts-RTL.

  • LG 2 slides: 2.0-TOY-ESL, 2.1-ESL.

  • LG 4 slides: 4.1 Bus NoC (I'll just present the switch fabrics quickly so we can discuss contention modelling, then the DRAM slides), 4.2 Power.

  • LG 3 slides (lectured after LG4): 3.2 Higher-level design etc. (HLS).

  • Week ?? slides: Safety Critical Systems and Fault Tolerance: PERHAPS TO BE ADDED.

Investigations for 2013

  1. Single-core algorithms are not the best on parallel architectures? We shall investigate how well the textbook algorithms work on contemporary and future architectures:
    • Tree Quicksort (Guy Blelloch, CMU) ML FORM.
    • Polyphase radix sort. From the SPLASH benchmarks.

  2. Virtual machine versus real machine: what takes the most power when the L1 I-cache struggles? I will provide you with an application program coded in three ways: bytecode for the dotnet VM, a natively compiled version and a compiled-to-gates form. Which uses the least power or scales the best as cores are added? Is it worth having hardware support for the dotnet VM?

  3. Repeat last year's exercise (power consumption for CRC co-processor compared with software).
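
For the CRC comparison in item 3 above, the software side needs a baseline routine to measure against. The course page does not say which CRC the exercise used, so the sketch below assumes the common CRC-32 (reflected polynomial 0xEDB88320, initial value and final XOR of 0xFFFFFFFF), computed one bit at a time. The function name `crc32_bitwise` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Assumed variant: standard CRC-32 with reflected polynomial 0xEDB88320.
// The exercise itself may have used a different CRC.
std::uint32_t crc32_bitwise(const std::string& message) {
    const std::uint32_t kPoly = 0xEDB88320u;
    std::uint32_t crc = 0xFFFFFFFFu;          // initial value
    for (unsigned char byte : message) {
        crc ^= byte;                          // fold the next byte into the low bits
        for (int bit = 0; bit < 8; ++bit) {
            if (crc & 1u) {
                crc = (crc >> 1) ^ kPoly;     // shift out a 1: apply the polynomial
            } else {
                crc >>= 1;                    // shift out a 0: plain shift
            }
        }
    }
    return crc ^ 0xFFFFFFFFu;                 // final XOR
}
```

The bit-serial inner loop is deliberately naive: it performs eight shift-and-conditional-XOR steps per byte, which is the kind of work a small hardware co-processor could do in parallel, so it makes a simple software reference point for an energy comparison.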

Other (sketchy) Investigations for 2013

These are some of my research ideas that we can perhaps take further as exercises? Basically you will explore the behaviour of some application as it is partitioned differently: using various processor cores or hardware assist. You will perhaps be allocated tasks in pairs, with one writing the hardware and the other the software. This means you have to agree on the specification in advance.

  • RTL is King ? The King is dead? For too long RTL has been the narrow waist of chip design, connecting the back-end flow to the front-end flow. Can we do place and route while re-pipelining the design ?

  • Anomalies in the ASUS motherboard power probe? Can we get to the bottom of it (needs a kernel hacker)?