Computer Laboratory


ACS P35 Persistent Course Material

Lent Term 2016/17

Take a mental note of this page's URL. Please don't get it mixed up with the SoC D&M Bachelor's course: some links lead away from the P35 course to that material, and the 'Home' links will then take you to the undergraduate course pages, which have a very similar name.

Useful Links

The outline plan for 2016/17: Plan17

This year we shall concentrate on modelling hardware accelerators generated by High-Level Synthesis (HLS), and on performing high-level modelling of the composite system consisting of conventional CPU cores and custom accelerators.

Specifically we will be exploring the tools and techniques for solving the following problem:

I want to compare a number of hardware accelerator designs for a problem that has two parts, one CPU bound and the other DRAM random-access bound. I want to use high-level synthesis to generate my accelerators. The problem specification is not likely to change. How can I work out the energy use and performance advantage I get from my accelerators before tape-out?
  1. Identify a problem that needs accelerating (database hashing?).
  2. Code up a toy demo of the inner loops in C.
  3. Run that C on the Prazor virtual platform, perhaps using several cores.
  4. Code the inner loops in C# and synthesise them to SystemC and RTL using Kiwi.
  5. Add the SystemC module to the Prazor platform.
  6. Optionally deploy it on the Zynq cards and see whether Prazor predicted the performance.
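Here is an illustration of step 2: a toy for the database-hashing example. It assumes a hash-table build and probe. The per-key integer mixing is the CPU-bound part, and the scattered table lookups are the DRAM random-access part. The table size, round count and mixing constants are arbitrary choices for illustration and are not part of the course material. Pick a table much larger than the caches so that the probes really do go to DRAM.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Illustrative sizes only: keep the table well above the L2 size so probes miss to DRAM. */
#define TABLE_WORDS (1u << 20)      /* 1M 32-bit buckets = 4 MiB; must be a power of two */
#define MIX_ROUNDS  8               /* raise this to make the CPU-bound phase heavier */

/* CPU-bound phase: several rounds of a xorshift-multiply integer mixer per key. */
static uint32_t mix(uint32_t x)
{
    for (int r = 0; r < MIX_ROUNDS; r++) {
        x ^= x >> 16;
        x *= 0x7feb352dU;
        x ^= x >> 15;
        x *= 0x846ca68bU;
        x ^= x >> 16;
    }
    return x;
}

/* Allocate a zeroed table; key value 0 is reserved to mean "empty bucket". */
static uint32_t *new_table(void)
{
    uint32_t *t = calloc(TABLE_WORDS, sizeof *t);
    assert(t != NULL);
    return t;
}

/* Build: store each key at its hashed bucket (a later key overwrites on collision). */
static void build_table(uint32_t *table, const uint32_t *keys, size_t n)
{
    for (size_t i = 0; i < n; i++)
        table[mix(keys[i]) & (TABLE_WORDS - 1)] = keys[i];
}

/* Probe: each lookup touches an effectively random bucket, so this phase is
 * dominated by DRAM accesses once the table is larger than the caches. */
static size_t probe_table(const uint32_t *table, const uint32_t *keys, size_t n)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        hits += (table[mix(keys[i]) & (TABLE_WORDS - 1)] == keys[i]);
    return hits;
}
```

A driver would fill a key array, then call build_table followed by probe_table. For step 3, the keys could perhaps be split across cores.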

This Year - Week-by-Week Lent Term 2016/2017

  • P35 2016/17 Preliminary Preparation (MichVac)
    Before the first session please go to MichVac Preparation.

  • Week 1 - 20th Jan 2017: This week in the formal session we shall
    1. Say hello to each other and discuss PLAN17.
    2. Look at this diagram LINK
    3. Discuss FPGAs in the cloud: look at the Amazon EC2 F1 announcement LINK.
    And afterwards
    1. Please work through the SystemC and 2.0-TOY-ESL SYSTEM classes (you can leave the final parts that use TLM until week 2 if you wish).
    2. Please read the entries on this year's Reading List 1 and be ready to discuss them briefly at next week's session, probably leaving fuller discussion to a following session.
    3. Review the first Assessed Exercise (not online yet).

  • Week 2 - 27th Jan 2017:
    1. We will talk about TLM coding for high-performance system modelling and how the Toy ESL system implemented TLM in its final classes (using the centre slides in the ESL-2.1 Slide Pack).
    2. We will briefly discuss the main approaches that hardware accelerators exploit to gain performance and save energy (slide pack 4.2 Power may be useful).
    3. We will briefly look at Reading List 1 in preparation for a group discussion next week.
    4. We will talk about the Assessed Exercise, which will be mostly like last year's.

  • Week 3 - 3rd Feb 2017:
    1. Quick Feedback on Assessed Exercise 1
    2. Reading List 1 discussion
    3. Introduction to the Prazor Virtual Platform.
    4. Practical work: create your own ELF bare-metal binary (variant of Hello World) and run it on the pre-built Prazor simulator.

  • Week 4 - 10th Feb 2017:

    We will not have any formal reading or lecturing this week.

    Instead we will make sure we have mastered some basic practical skills needed for Exercise 3. You don't all need to gain all of them, provided you are happy to help each other out. The skills are:

    1. P0: Compile an ARM binary and run the same .o file on Prazor and on the Zynq cards. You will need to link it differently for each platform. On the real card you will also need to loop 1000 times to get a measurable execution time for a workload of a size that is reasonable on Prazor. The energyshim.c file is useful to wrap around your program to collect performance and energy figures on both platforms. (Postscript: the Hello World program was meant to be as simple as possible. Other builds in the images folder are more realistic and a better starting point for your own work, e.g. beebs or dfsin.)
    2. P1: Compile your own version of Prazor and include a PERIPHERAL_DEVICE, or another such tiny TLM model of your own design. Demonstrate programmed-I/O access to it from your own C program.
    3. P2: Install the Verilog RTL version of PERIPHERAL_DEVICE (or whatever you chose) in the Zynq FPGA and demonstrate programmed-I/O access to it from the ARM. You can start from scratch using the Vivado GUI, or else use the ksubs2 Makefile, which is mouse-free. Replace the Kiwi-generated example RTL design in my folder with your own RTL target.
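For P1 and P2, the C side of programmed I/O can be very small. The register map below, with a single scratch register at word offset 0, is a hypothetical example and not the real PERIPHERAL_DEVICE layout, so adjust it to match your own model or RTL. The volatile qualifier makes every access a real bus transaction. The repeat count supplies the loop that P0 needs on the real card to get a measurable run time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register map (word offsets from the device base address); an
 * assumption for illustration, not PERIPHERAL_DEVICE's real layout. */
enum { REG_SCRATCH = 0 };

/* Every access goes through a volatile pointer, so the compiler must emit a real
 * load or store to the device each time instead of caching or removing it. */
static inline void reg_write(volatile uint32_t *base, unsigned reg, uint32_t value)
{
    base[reg] = value;
}

static inline uint32_t reg_read(volatile uint32_t *base, unsigned reg)
{
    return base[reg];
}

/* Write a changing pattern to the scratch register and read it back, `iters`
 * times. Returns the number of read-back mismatches (0 means the test passed). */
static unsigned pio_loopback_test(volatile uint32_t *base, unsigned iters)
{
    assert(base != NULL);
    unsigned errors = 0;
    for (unsigned i = 0; i < iters; i++) {
        uint32_t pattern = 0xA5A50000u ^ i;
        reg_write(base, REG_SCRATCH, pattern);
        if (reg_read(base, REG_SCRATCH) != pattern)
            errors++;
    }
    return errors;
}
```

On the target, you would pass a pointer to wherever the device is mapped, e.g. `pio_loopback_test((volatile uint32_t *)PERIPH_BASE, 1000)`. PERIPH_BASE is a hypothetical name for your device's base address. For a host-side check, an ordinary array can stand in for the registers.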

    Also, check you can log on to parcard-djg3.sm and parcard-djg1.sm - I will hand out passwords.

    Next week we will generate a SystemC model and the equivalent RTL implementation from HLS of an example application.

  • Week 5 - 17th Feb 2017:
    1. Quick Feedback on Assessed Exercise 1
    2. HLS Lecture: Higher-level design and High-Level Synthesis (HLS).
    3. Kiwi HLS - tool set up and demo - (SystemC output too?)
    4. Exercise 3 planning: perhaps based on 'Design of power-efficient parallel pipelined Bloom filter' by Deokho Kim.
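If Exercise 3 does follow the Bloom filter paper, a plain software Bloom filter is a natural C starting point and golden model before HLS. Here is a minimal sketch. The filter size, number of hashes, hash functions and double-hashing scheme (h_i = h1 + i*h2) are my own assumptions and are not taken from the paper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLOOM_BITS   4096u   /* m: filter size in bits */
#define BLOOM_HASHES 4u      /* k: number of bit positions per key */

_Static_assert((BLOOM_BITS & (BLOOM_BITS - 1)) == 0, "BLOOM_BITS must be a power of two");

typedef struct {
    uint8_t bits[BLOOM_BITS / 8];
} bloom_t;

/* Two base hashes (arbitrary mixing constants). hash2 is forced odd so that,
 * with a power-of-two m, the k positions h1 + i*h2 are all distinct. */
static uint32_t hash1(uint32_t x) { x ^= x >> 16; x *= 0x7feb352dU; x ^= x >> 15; return x; }
static uint32_t hash2(uint32_t x) { x *= 0x9e3779b1U; x ^= x >> 13; return x | 1u; }

static void bloom_init(bloom_t *b) { memset(b->bits, 0, sizeof b->bits); }

/* Set the k bits selected by double hashing. */
static void bloom_add(bloom_t *b, uint32_t key)
{
    uint32_t h1 = hash1(key), h2 = hash2(key);
    for (uint32_t i = 0; i < BLOOM_HASHES; i++) {
        uint32_t bit = (h1 + i * h2) & (BLOOM_BITS - 1);
        b->bits[bit >> 3] |= (uint8_t)(1u << (bit & 7));
    }
}

/* True if all k bits are set: false positives are possible, false negatives are not. */
static bool bloom_maybe_contains(const bloom_t *b, uint32_t key)
{
    uint32_t h1 = hash1(key), h2 = hash2(key);
    for (uint32_t i = 0; i < BLOOM_HASHES; i++) {
        uint32_t bit = (h1 + i * h2) & (BLOOM_BITS - 1);
        if (!(b->bits[bit >> 3] & (1u << (bit & 7))))
            return false;
    }
    return true;
}
```

The k bit positions for a key can be computed independently of one another. This is one reason parallel and pipelined hardware versions are attractive.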

  • Week 6 - 24th Feb 2017:
    1. Reading list 2 ...

OLD MATERIAL FOLLOWS FOR THE REST OF THIS PAGE:

Last Year - Week-by-Week Lent Term 2015/2016

  • Week 1: In the first week, please work through the 2.0-TOY-ESL SYSTEM TOY ESL classes, (you can leave the final parts that use TLM until week 2 if you wish) then make a start at the first Assessed Exercise which will be the same as the one from last year. Please read the entries on this year's Reading List 1 and be ready to discuss them briefly at this week's session on 22nd Jan, probably leaving fuller discussion to next week's session.

  • Week 2: We (tried to) define the terms Reconfigurable Computing and Hardware Acceleration. We will talk about TLM coding for high-performance system modelling. We will look at how the Toy ESL system implemented TLM in its final classes. We looked at the centre slides in the ESL-2.1 Slide Pack. We also discussed the innards of a fax machine.

  • Week 3: We will each do 5 minute presentations of a selected paper from Reading List 1. We will introduce the Prazor/VHLS virtual platform as configured for the Zynq platform.

  • Week 4: A lecture on protocols, interfaces, interconnect styles and primitives, and glue logic synthesis. We will create accounts on the Parallella card and run the same binary on the simulator as on the real platform. Slide packs: 4.1 Bus NoC and the cross-product synthesis method.

  • Week 5: (12th Feb): IP-block data sheets and test programmes: IP-XACT, OVM/UVM, Assertion-Based Design (ABD) and integration with Chisel discussions.

  • Week 6: (19th Feb): Higher-level design and High-Level Synthesis (HLS).

  • Week 7: (26th Feb): Performance Prediction - High-level modelling. See also the end of SP6-ESL undergraduate lectures.

  • Week 8: (4th March): Reconfigurable Computing

Main Virtual Platform: Zynq BTLM Reference System

The main virtual platform for this course this year is again a SystemC TLM 2.0 model of the Zynq FPGA on the ZedBoard or Parallella card. It can run Linux, but we will mostly run bare-metal code for simplicity.

The main platform is checked out on the file server, or you can build your own live copy. If you are not a C++ and automake expert it can be a struggle to build, so seek help. You will need to be able to edit it, so you will need at least your own makefile for the final arm7smp binary. If you give me your Bitbucket user identifier, you can clone a private copy of the virtual platform.

To run a simple demo of the checked-out version, you can type 'make' in the following folder, but it may fail because you do not have the relevant write permissions.

Further instructions are on this link PRAZOR TEMPORARY MANUAL.

 
  $ export PRAZOR=/usr/groups/han/clteach/btlm/current
  $ export STDLD=/usr/lib/gcc-cross/arm-linux-gnueabi/4.7
  $ cd $PRAZOR/vhls/images/hello-world
  $ export TARCH=ARM32
  $ make

2017 Prebuilt

The prebuilt version is in /usr/groups/han/clteach/btlm/pvp4, but please use /usr/groups/han/clteach/btlm/current instead. DJG updates current whenever changes are made.

You can use the prebuilt to get started, but you will have to compile your own copy by the middle of term.

2017: New Phabricator Repository

To build from git source you need a user ID on phabricator.xparch.com which DJG can organise.

Short form

    1. Get a user ID and password from me.

    2. Log in via the web interface (password required) at http://phabricator.xparch.com, then
    2a. change your password, and
    2b. install your public key on Phabricator.

    3. Use ssh-agent to put your private key in scope and then run

    git clone ssh://vcs@phabricator.xparch.com:22/diffusion/P/vhls.git

    4. cd to vhls and follow the README.md.

  
Longer form:
    git clone ssh://vcs@phabricator.xparch.com/diffusion/P/vhls.git
    export CLTEACH=/usr/groups/han/clteach
    export BOOST=$CLTEACH/boost/boost_1_48_0
    export BOOST_ROOT=$BOOST
    export SYSTEMC=$CLTEACH/systemc/systemc-current
    export TLM_POWER3=$CLTEACH/tlm-power3
    export LDFLAGS="-L$SYSTEMC/lib-linux64 -L/usr/local/lib -L$TLM_POWER3/src/.libs"
    export CXXFLAGS="-I$SYSTEMC/include/ -I$BOOST_ROOT -I$SYSTEMC/include/tlm_core/tlm_2 -I$TLM_POWER3/include -g -O2"
    cd vhls
    automake --add-missing
    autoconf
    automake
    ./configure
    make

    

If autoconf, automake or configure fails, run autoreconf and try again.

Old BitBucket

To check out and build your own copy, it is best to clone the source code. It has not yet been released as open source. You will need a user ID at BitBucket. If you pass that on to me (djg), I will grant you access permission. The git repo (containing ARM32/Zynq, x86_64, MIPS64 and OpenRISC) is

djg11@bitbucket.org/prazorvhls/prazor-virtual-platform

To build, use something like

   git clone https://djg11@bitbucket.org/prazorvhls/prazor-virtual-platform
   export CLTEACH=/usr/groups/han/clteach
   export BOOST=$CLTEACH/boost/boost_1_48_0
   export BOOST_ROOT=$BOOST
   export SYSTEMC=$CLTEACH/systemc/systemc-current
   export TLM_POWER3=$CLTEACH/tlm-power3
   export LDFLAGS="-L$SYSTEMC/lib-linux64 -L/usr/local/lib -L$TLM_POWER3/src/.libs"
   export CXXFLAGS="-I$SYSTEMC/include/ -I$BOOST_ROOT -I$SYSTEMC/include/tlm_core/tlm_2 -I$TLM_POWER3/include -g -O2"
   cd prazor-virtual-platform/vhls
   automake --add-missing
   autoconf
   automake
   ./configure
   make
(Figure: TLM Parallella model)

Lent 2016: Possible investigations ?

The topic(s) for this year's investigations will be resolved by Week 4.

  • OVM/UVM and Chisel Integration: Chisel is new and powerful. How can it be incorporated into industry-standard verification methodologies like OVM/UVM?
    http://chisel.eecs.berkeley.edu
    http://www.doulos.com/knowhow/sysverilog/uvm/

  • Adapteva Epiphany Chip Modelling for Energy and Performance: We have this chip on the Parallella card with supply instrumentation. We can run experiments for real and on a virtual platform. However, we would first have to incorporate an Epiphany ISS into the platform, since the Epiphany is not currently modelled.
    http://www.adapteva.com/epiphanyiv/

  • Model Checking Data Flows: PSL concerns itself with events, not abstract data flows. Let's write some assertions about data consistency for simple components like a FIFO.

  • RISC-V workshop: Design, and perhaps implement, a high-level model of the ORCA platform.
    http://riscv.org/workshop-jan2016.html
    http://riscv.org/workshop-jan2016/Wed1200%202016-01-05%20VectorBlox%20ORCA%20RISC-V%20DEMO.pdf
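As a starting point for the data-flow assertions, here is one simple data-consistency property for a FIFO. It is written as run-time C asserts around a toy ring-buffer FIFO, not in PSL or for a model checker. The producer pushes consecutive sequence numbers, so the consumer must see them in order, with none lost, duplicated or invented. The occupancy must also always equal pushes minus pops. The FIFO and checker are hypothetical illustrations, not an existing component.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FIFO_DEPTH 8u

typedef struct {
    uint32_t data[FIFO_DEPTH];
    unsigned head, tail, count;   /* head: next read slot; tail: next write slot */
} fifo_t;

static bool fifo_push(fifo_t *f, uint32_t v)
{
    if (f->count == FIFO_DEPTH) return false;   /* full: reject rather than overwrite */
    f->data[f->tail] = v;
    f->tail = (f->tail + 1) % FIFO_DEPTH;
    f->count++;
    return true;
}

static bool fifo_pop(fifo_t *f, uint32_t *v)
{
    if (f->count == 0) return false;            /* empty */
    *v = f->data[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return true;
}

/* Independent reference counts kept by the checker, not by the FIFO itself. */
typedef struct { uint32_t pushed, popped; } fifo_checker_t;

/* Push the next sequence number and check the occupancy invariant. */
static bool checked_push(fifo_t *f, fifo_checker_t *c)
{
    bool ok = fifo_push(f, c->pushed);
    if (ok) c->pushed++;
    assert(f->count == c->pushed - c->popped);
    return ok;
}

/* Pop, check the value is the oldest one not yet consumed, then check occupancy. */
static bool checked_pop(fifo_t *f, fifo_checker_t *c)
{
    uint32_t v;
    bool ok = fifo_pop(f, &v);
    if (ok) {
        assert(v == c->popped);     /* in order, nothing lost, duplicated or invented */
        c->popped++;
    }
    assert(f->count == c->pushed - c->popped);
    return ok;
}
```

A test driver can interleave checked_push and checked_pop calls in any pattern, including running the FIFO full and empty. A wrong pointer or count update in the FIFO would then trip one of the asserts.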

THE MATERIAL BELOW THIS LINE IS FROM PREVIOUS YEARS. SOME OF IT WILL BE RECYCLED AND PLACED ABOVE THE LINE AS WE PROCEED.

ACS P35 Persistent Course Material - Previous Year

This year (2014/15) we will be combining Accellera and Parallella!

We will use a high-level simulation model of this hardware platform and compare energy and performance figures predicted by the model with measurements on the cards.
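Comparing the model's predictions with card measurements involves two small, recurring calculations: turning sampled supply power into energy, and expressing the prediction error. Here is a minimal sketch, assuming power is sampled in watts at a fixed interval dt in seconds. The sampling arrangement is an assumption and not a description of the Parallella instrumentation.

```c
#include <assert.h>
#include <stddef.h>

/* Energy in joules from n power samples (watts) taken every dt seconds,
 * using the trapezoidal rule over the n-1 intervals between samples. */
static double energy_from_samples(const double *watts, size_t n, double dt)
{
    double e = 0.0;
    for (size_t i = 1; i < n; i++)
        e += 0.5 * (watts[i - 1] + watts[i]) * dt;
    return e;
}

/* Signed relative error of a model prediction against a measurement,
 * e.g. 0.05 means the model predicts 5% more than was measured. */
static double relative_error(double predicted, double measured)
{
    assert(measured != 0.0);
    return (predicted - measured) / measured;
}
```

For example, a constant 2 W sampled ten times at 0.1 s intervals spans 0.9 s and gives 1.8 J.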

I hope you have all looked through the Vacation Slide Pack. We will review all of this in the first session. Please let me know which slides you'd like to go through in more detail.

Please also sign up for or browse the online magazine: DESIGN AND REUSE.

This year we will be looking at energy measurements on the Parallella card, but before we dive into that we all need to be comfortable with the 2.0-TOY-ESL SYSTEM. This will be our target for the first week or so. We will perhaps try to do some of the work plan of spEEDO2.

Local Online Resources (For ACS)

  • Set up paths and find local resources using TOOL INFO

  • A very useful resource is the lecture notes for the part II SoC course. Last year's slides are HERE. The slides for this year will be roughly the same, but with less emphasis on low-level SystemC and more emphasis on energy modelling.

  • We all need to be comfortable with the 2.0-TOY-ESL SYSTEM, and this will be our target for the first week or so.

  • Documents and Reference Materials (members of Computer Lab only) DOCUMENTS FOLDER.

  • Online magazine: DESIGN AND REUSE.

  • Design and Layout of 8-bit Kogge Stone Adder 8 Pages (PDF).

  • MOSIS AMI 0.5 micron Cell Library 98 Pages (PDF).

  • Zynq 7000 Technical Reference Manual (PDF)

  • Zynq 7000 Documents Folder.

Week-by-Week 2015

  • Week 1: In the first week, please work through the TOY ESL classes 1 to 4 (for now you can leave the last few sections that address TLM), then make a start on the first Assessed Exercise, which is the same as the one from last year.

  • Week 2: We covered the basic TLM coding style in SystemC.

  • Week 3: We will talk about annotated TLM coding styles for timing. We will have a brief discussion of the first four papers on the reading list.

  • Week 4: Energy estimation techniques: we'll go through slide pack 4.2 Power. We'll talk about the papers on the reading list. I'll give an introduction to, and demo of, the ZedBoard/Parallella/Zynq BTLM SystemC model.

  • Week 5: (13th Feb). Look at reading list. Look at BTLM simulator. Discuss experiments.

  • Week 6: (20th FEB) Assertion-Based Design (ABD).

  • Week 7: (27th Feb) High Level Synthesis (HLS) and Reconfigurable Computing: 3.2 Higher-level design etc.(HLS).

  • Week 8: TBD.

THE MATERIAL BELOW THIS LINE IS FROM 2013/14 AND PREVIOUS YEARS. IT WILL BE RECYCLED AND PLACED ABOVE THE LINE AS WE PROCEED.

OLDER 2013/14 Information

The main practical work will use the 'OR1200 Blocking TLM Testbed' and a new version of that will be installed (from git this year) on about 20th January 2014.

Possible investigations for this year:

  • Energy efficiency of hardware transactional memory.
  • Getting accurate per-process energy use figures for a multithreaded operating system (with hardware accelerators in use).
  • Exporting energy use measurements through the OS so that an application can optimise itself.

Slide Packs

In general: please read through the next set of slides in advance of the lecture session and be prepared to say which bits you want lectured in detail. If this is all of it then that's ok!

Practical Systems

Toy ESL System

Please first become familiar with this very simple system, TOY ESL. It is a stepping stone to the main OR1200 system. It can be copied from the filesystem at /usr/groups/han/clteach/socdam/toyclasses.

Main OR1200 Blocking TLM Testbed



We will use this blocking TLM implementation as the basis for the majority of the exercises.

  • The main binary for the simulator is in /groups/han/clteach/btlm-baseline/vhls/src/or1ksmp/

  • It is sensible to set up a variable pointing to the provided TLM resources, such as
    PRAZOR=/usr/groups/han/clteach/btlm-baseline
    

  • The precompiled binary for the simulator can then be found in
    SIMULATOR=$(PRAZOR)/vhls/src/or1ksmp/vhls-or1ksmp
    But note that you will later modify the simulator and some other contents of the or1ksmp folder, so it's good to take a copy of this in your own file space. Report files cannot be written in my folder, so you will get permission errors if you try to run the simulator there.

  • Some get-started demo programs are compiled for the simulator in here:
    SW=$(PRAZOR)/openrisc-sw

  • The simplest possible application is hello world found in
    $(SW)/hello-world.

    But note THIS DOES NOT USE LIBC AND SO PRINTF AND PTHREADS WILL NOT WORK.

  • A more interesting application is a machine code monitor found in
    $(SW)/mixbug

    that uses a toy version of libc.

  • Other programs use uClibc which is a full-featured version of the C library (but the filing system backdoors are not implemented). Command line args and env variables may be accessed as usual for C programs.

  • Other applications, the SPLASH-2 benchmarks and the Linux kernel can also be run if you like. Please ask.

A minimal first step to using the precompiled system is as follows (note: omit the parentheses around macro names outside makefiles):

export PRAZOR=/usr/groups/han/clteach/btlm-baseline
export SW=$PRAZOR/openrisc-sw
export SIMULATOR=$PRAZOR/vhls/src/or1ksmp/vhls-or1ksmp
cp $PRAZOR/vhls/src/or1ksmp/Makefile.djg .
make -f Makefile.djg

Adjust the IMAGE setting in the Makefile to run some other programs, or start writing your own.

Other scraps for discussion

Scraps for this year so far (will be reorganised) :

Local Online Resources (For ACS)

  • Preparatory Work (SystemC): Please become familiar with this material over the Michaelmas Vacation so that it can be covered fairly rapidly in the first week (or two) of the Lent Term 2013: PREPARATION.

  • LG 1 slides (not all will be used): 1.1-RTL, 1.2-SystemC-Basic, 1.3-SoC-Parts-RTL.

  • LG 2 slides: 2.0-TOY-ESL, 2.1-ESL.

  • LG 4 slides: 4.1 Bus NoC (I'll just present the switch fabrics quickly so we can discuss contention modelling, then the DRAM slides), 4.2 Power.

  • LG 3 slides (lectured after LG4): 3.2 Higher-level design etc. (HLS). Week ?? slides: Safety Critical Systems and Fault Tolerance: PERHAPS TO BE ADDED.

Investigations for 2013

  1. Single-core algorithms are not the best on parallel architectures? We shall investigate how well the textbook algorithms work on contemporary and future architectures:
    • Tree Quicksort (Guy Blelloch, CMU) ML FORM.
    • Polyphase radix sort. From the SPLASH benchmarks.

  2. Virtual machine versus real machine: what takes the most power when the L1 I-cache struggles? I will provide you with an application program coded in three ways: bytecode for the dotnet VM, a natively compiled version and a compiled-to-gates form. Which uses the least power or scales the best as cores are added? Is it worth having hardware support for the dotnet VM?

  3. Repeat last year's exercise (power consumption for CRC co-processor compared with software).
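For investigation 3, the software side of the comparison could start from a plain bitwise CRC. This sketch assumes the common CRC-32 (reflected polynomial 0xEDB88320, with an initial value and final XOR of 0xFFFFFFFF). Last year's exercise may have used a different polynomial or width, so adapt it to match the co-processor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320). This is deliberately the
 * simple bit-at-a-time form: 8 shift/conditional-XOR steps per input byte. */
static uint32_t crc32_sw(const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            /* 0u - (crc & 1u) is all ones when the LSB is set, else zero */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}
```

The standard check value for this CRC, over the ASCII string "123456789", is 0xCBF43926. A table-driven version is the usual faster software alternative, at the cost of a 1 KiB lookup table.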

Other (sketchy) Investigations for 2013

These are some of my research ideas that we can perhaps take further as exercises? Basically you will explore the behaviour of some application as it is partitioned differently: using various processor cores or hardware assist. You will perhaps be allocated tasks in pairs, with one writing the hardware and the other the software. This means you have to agree on the specification in advance.

  • RTL is King ? The King is dead? For too long RTL has been the narrow waist of chip design, connecting the back-end flow to the front-end flow. Can we do place and route while re-pipelining the design ?

  • Single-core algorithms are not the best on parallel architectures? We shall investigate how well the textbook algorithms work on contemporary and future architectures.

  • Anomalies in the ASUS motherboard power probe? Can we get to the bottom of them (needs a kernel hacker)?