This is the old version - for reference only.

SoCDAM OLD Learners' Guide

This document was updated after each lecture to summarise the important points covered. Things are in flux because this is the first time this course has been given! After the last lecture this Learners' Guide will be expanded with a full web page for each lecture (local-only, Raven access).

If people email me questions then I am more than happy to answer them and I expect to place both the questions and the answers on this FAQ.

Lecture 1 (Friday)

Introduction to the course and what a SoC is

Review/revision of Verilog: structural RTL, synthesisable RTL and non-synthesisable constructs. Ideally, anything that is finite-state ought to be synthesisable. Ideally, the programmer should not be forced into such low-level expression or into excessively parallel thought patterns.

Lecture 2 (Monday 16th February 2009)

Continued with the RTL topic (LG1 - RTL)

An RTL program can be used both for simulation and synthesis.

The abstract syntax tree for synthesisable RTL supports a rich set of expression operators but just the assignment and branching commands (no loops). (Loops in synthesisable VHDL and Verilog are restricted to so-called structural generation statements that are fully unwound by the compiler front end and so have no data-dependent exit conditions).

Simulation is event-driven (event-driven simulation, EDS). Pointer to a toy implementation in ML. When using zero-delay models, we can use the compute/commit paradigm, which requires the EDS kernel to be augmented to support delta cycles.
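
As an illustration of the compute/commit idea, here is a minimal sketch of an EDS kernel with delta cycles. It is written in Python rather than ML and is not the toy implementation referred to above; the names Signal, Kernel and swap_demo are invented for this example. Each net holds a current and a pending value: processes read current values and write pending ones (compute), then the kernel copies pending to current and wakes the readers of any net that changed (commit).

```python
import heapq


class Signal:
    """A net with separate current and pending values (compute/commit)."""

    def __init__(self, name, value=0):
        self.name = name
        self.value = value      # value visible to readers during this delta cycle
        self.pending = value    # value written during the compute phase
        self.readers = []       # processes sensitive to this net

    def write(self, kernel, new_value):
        self.pending = new_value
        kernel.updated.add(self)


class Kernel:
    """Toy event-driven simulation kernel (illustrative only)."""

    def __init__(self):
        self.time = 0
        self.queue = []         # timed events: (time, sequence number, callback)
        self.seq = 0
        self.runnable = []      # processes to run in the next delta cycle
        self.updated = set()    # nets written during the compute phase
        self.deltas = 0         # number of delta cycles executed

    def schedule(self, delay, callback):
        heapq.heappush(self.queue, (self.time + delay, self.seq, callback))
        self.seq += 1

    def run(self):
        while self.queue or self.runnable:
            if not self.runnable:
                # Advance time and collect every event due at the new time.
                self.time = self.queue[0][0]
                while self.queue and self.queue[0][0] == self.time:
                    self.runnable.append(heapq.heappop(self.queue)[2])
            # Compute phase: every process sees the old (committed) values.
            processes, self.runnable = self.runnable, []
            for process in processes:
                process(self)
            # Commit phase: update changed nets and wake their readers.
            for sig in self.updated:
                if sig.pending != sig.value:
                    sig.value = sig.pending
                    for reader in sig.readers:
                        if reader not in self.runnable:
                            self.runnable.append(reader)
            self.updated = set()
            self.deltas += 1


def swap_demo():
    """Two zero-delay assignments at the same instant swap a and b."""
    k = Kernel()
    a, b = Signal("a", 1), Signal("b", 2)
    k.schedule(10, lambda kern: a.write(kern, b.value))
    k.schedule(10, lambda kern: b.write(kern, a.value))
    k.run()
    return a.value, b.value, k.time
```

Without the pending copy, the result of swap_demo would depend on the order in which the kernel happened to run the two assignments; with compute/commit the values are exchanged whichever runs first.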

Synthesis involves converting to a parallel form with one right-hand-side expression per variable. Then converting each expression to a logic tree, preferably taking into account sub-expression sharing and user speed/power/area requirements. Pointers to ML fragments that implement a basic form of both halves of this process.
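
The first half of this process can be sketched as follows. This is a Python illustration, not the ML fragments mentioned above, and the tuple encoding of statements and expressions is an assumption made for the example. Statements are executed symbolically; an if becomes a multiplexer on each variable that the two branches disagree about, leaving exactly one right-hand-side expression per variable.

```python
def subst(expr, env):
    """Rewrite expr so it refers only to values at the start of the block."""
    if isinstance(expr, str):                 # a variable name
        return env.get(expr, expr)
    if isinstance(expr, tuple):               # (operator, arg, ...)
        return (expr[0],) + tuple(subst(arg, env) for arg in expr[1:])
    return expr                               # an integer constant


def to_parallel(stmts, env=None):
    """Return {variable: expression} for a list of sequential statements.

    Statements are ("assign", var, expr) or ("if", cond, then_list, else_list).
    """
    env = dict(env or {})
    for stmt in stmts:
        if stmt[0] == "assign":
            _, var, rhs = stmt
            env[var] = subst(rhs, env)
        elif stmt[0] == "if":
            _, cond, then_part, else_part = stmt
            guard = subst(cond, env)
            env_t = to_parallel(then_part, env)
            env_f = to_parallel(else_part, env)
            for var in sorted(set(env_t) | set(env_f)):
                t = env_t.get(var, var)       # unassigned: keeps its old value
                f = env_f.get(var, var)
                env[var] = t if t == f else ("mux", guard, t, f)
        else:
            raise ValueError("unknown statement: %r" % (stmt,))
    return env


# Example: x = x + 1; if (x == 10) { x = 0; y = y + 1; }
PROGRAM = [
    ("assign", "x", ("+", "x", 1)),
    ("if", ("==", "x", 10),
     [("assign", "x", 0), ("assign", "y", ("+", "y", 1))],
     []),
]
```

Calling to_parallel(PROGRAM) gives one expression for x and one for y, each in terms of the values held at the start of the clock cycle. Note that the sub-expression x + 1 appears in several places in the result, which is where sub-expression sharing in the second half of the process pays off.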

RTL synthesis tools are not normally expected to re-time a design, or alter the amount of state or state encodings. Newer languages and flows (such as Bluespec and Kiwi) still encourage the user to express a design in parallel terms, yet provide easier to use constructs with the expectation that detailed timing and encoding might be chosen by the tool.

Some blank looks when I said 'barrel shifter is easy enough', so here it is: ML fragment
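
For readers who prefer Python to ML, the same structure can be sketched as below. This is not the ML fragment itself; the function names are invented for the example. A barrel shifter is a cascade of multiplexer stages, one per bit of the shift amount, where stage k either passes the word through or shifts it by 2^k places.

```python
def mux2(sel, a, b):
    """Two-input multiplexer: a when sel is 1, otherwise b."""
    return a if sel else b


def barrel_shift_left(bits, amount_bits):
    """Left shift with zero fill.

    bits:        list of 0/1 values, index 0 is the least significant bit.
    amount_bits: the shift amount as a list of 0/1 values, LSB first.
    """
    width = len(bits)
    for stage, sel in enumerate(amount_bits):
        distance = 1 << stage   # stage k shifts by 2**k when selected
        bits = [
            mux2(sel, bits[i - distance] if i >= distance else 0, bits[i])
            for i in range(width)
        ]
    return bits


def to_bits(value, width):
    """Integer to list of bits, LSB first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """List of bits (LSB first) back to an integer."""
    return sum(bit << i for i, bit in enumerate(bits))
```

An N-bit shifter built this way needs log2(N) stages of N multiplexers each, which is why it is 'easy enough': for example, from_bits(barrel_shift_left(to_bits(3, 8), to_bits(3, 3))) shifts 3 left by three places within an 8-bit word.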

Lecture 3 (Wednesday 18th February 2009)

Finished RTL topic (LG1 - RTL) with discussion of memories, pipeline hazards and automatic retiming.

Hazards with RAMs. RTL requires manual scheduling. FPGA tools do something automatically. Multipliers and FP units also typically present hazards. Retiming the program with holding registers is needed.

Notes on RAM testing: use embedded software, wafer probe or dedicated BIST logic.

Re-timing is also helpful for timing closure. D-type migration transforms. RTL is not as expressive for algorithms as software. Higher-level entry forms are ideally needed, perhaps scheduling within a thread at compile time and between threads at run time?

Started LG2 SystemC. Its history, overview, and how to use SystemC on the PWF.

Looked at sc_int integers of non-native C precision and structural net list example.
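
To show what 'non-native precision' means in practice, here is a small Python model of W-bit two's-complement wrap-around, the kind of truncation a fixed-width integer type performs when a result does not fit. It is a sketch of the arithmetic only, not of the SystemC class; the function names are invented for the example.

```python
def wrap_unsigned(value, width):
    """Keep only the low `width` bits of value."""
    return value & ((1 << width) - 1)


def wrap_signed(value, width):
    """Truncate to `width` bits and reinterpret as two's complement."""
    value = wrap_unsigned(value, width)
    if value >> (width - 1):          # top bit set, so the value is negative
        value -= 1 << width
    return value
```

For example, a 4-bit signed integer holds -8 to 7, so wrap_signed(7 + 1, 4) overflows to -8, and a 10-bit unsigned counter returns to 0 after 1023.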

Lecture 4 (Friday 20th February 2009)

Continued LG2: SystemC.

Comparing SC_THREADs with trampoline-style methods, we can see the two main programming styles: blocking and non-blocking.
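
The difference between the two styles can be sketched outside SystemC. In the Python below (an analogy only, with invented names), a generator stands in for a blocking thread, where each yield plays the part of a wait and the position in the code holds the state. The class stands in for a trampoline-style method, which is called once per event, must return at once, and so has to keep its state in an explicit variable.

```python
def blocking_style():
    """Thread-like: the program counter is the state; each yield is a wait."""
    while True:
        yield "red"
        yield "green"
        yield "amber"


class MethodStyle:
    """Trampoline-style: runs to completion on each event, state is explicit."""

    COLOURS = ("red", "green", "amber")

    def __init__(self):
        self.state = 0

    def on_event(self):
        colour = self.COLOURS[self.state]
        self.state = (self.state + 1) % len(self.COLOURS)
        return colour
```

Both produce the same sequence of outputs. The blocking form is usually easier to write, while the non-blocking form needs no per-process stack and is closer to what a synthesis tool would build.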

For faster system modelling, we do not want to enter the EDS kernel for every change of every net: let's pass larger objects around, or even send threads between components, like S/W does.

We can pass abstract data types inside an sc_signal if it implements the correct compute/commit virtual methods.

Alternatively, let's implement the OO S/W concept of adding an interface to a component by inheritance: hmmm, we soon run into the well-known OO problem with multiple instances of an interface: not often needed for S/W but common enough in H/W designs.

How can we convert between S/W and H/W styles: we will need a transactor. This is a small software entity that converts between the two modelling styles.

(NB: Question SC2 part c cannot be answered by everyone because the future developments material was not lectured.)

Lecture 5 (Monday 23rd February 2009)

Started LG3: System Design.

Presented overview of bus structure in an example SoC. Spoke about allocation of devices to busses.

Looked at a historic A16/D8 processor core and its address decode.
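
A sketch of address decode for a 16-bit address bus follows. The memory map is hypothetical, chosen only for illustration (ROM in the bottom 32 KB, RAM in the next 16 KB, a UART in the top 16 KB), and is not the map used in the lecture. Only the top two address bits are examined, as a simple hardware decoder would do.

```python
def decode(addr):
    """Return the chip-select lines for a 16-bit address.

    Hypothetical map: 0x0000-0x7FFF ROM, 0x8000-0xBFFF RAM, 0xC000-0xFFFF UART.
    """
    a15 = (addr >> 15) & 1
    a14 = (addr >> 14) & 1
    return {
        "rom": a15 == 0,
        "ram": a15 == 1 and a14 == 0,
        "uart": a15 == 1 and a14 == 1,
    }
```

Because the UART has only a few registers but is given a 16 KB region, its registers appear repeated throughout that region. This partial decoding is typical of small systems, where saving decode logic mattered more than saving address space.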

Then looked at simple I/O blocks and how they implement target-side, RTL-style bus interfaces and how they generate interrupts. Looked at circular buffer device driver for UART.
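
The transmit side of a circular-buffer driver can be sketched as follows. This is a Python model with invented names (CircularBuffer, FakeUart, UartDriver), not the driver shown in the lecture, and FakeUart is a stand-in for the real device registers. The application calls write(), and the interrupt routine drains the buffer one byte at a time as the device becomes ready.

```python
class CircularBuffer:
    """Fixed-size FIFO using in/out pointers that wrap around."""

    def __init__(self, size):
        self.data = [0] * size
        self.in_ptr = 0
        self.out_ptr = 0
        self.count = 0

    def put(self, byte):
        if self.count == len(self.data):
            return False                       # full: caller must retry later
        self.data[self.in_ptr] = byte
        self.in_ptr = (self.in_ptr + 1) % len(self.data)
        self.count += 1
        return True

    def get(self):
        if self.count == 0:
            return None                        # empty
        byte = self.data[self.out_ptr]
        self.out_ptr = (self.out_ptr + 1) % len(self.data)
        self.count -= 1
        return byte


class FakeUart:
    """Hypothetical device model: one holding register and a ready flag."""

    def __init__(self):
        self.tx_ready = True
        self.sent = []

    def write_data(self, byte):
        self.sent.append(byte)
        self.tx_ready = False                  # busy until the byte has gone

    def complete(self):
        self.tx_ready = True                   # device would raise an interrupt here


class UartDriver:
    def __init__(self, uart, size=8):
        self.uart = uart
        self.txbuf = CircularBuffer(size)

    def write(self, byte):
        """Called by the application. A real driver would mask interrupts here."""
        if self.uart.tx_ready and self.txbuf.count == 0:
            self.uart.write_data(byte)         # device idle: send directly
            return True
        return self.txbuf.put(byte)            # device busy: queue the byte

    def tx_interrupt(self):
        """Interrupt service routine: the device can accept another byte."""
        byte = self.txbuf.get()
        if byte is not None:
            self.uart.write_data(byte)
```

The buffer decouples the application, which produces bytes in bursts, from the device, which consumes them at the line rate. If the buffer fills, write() returns False and the application must wait.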

Lecture 6 (Wednesday 25th February 2009)

Look at further I/O blocks. Discuss overrun/underrun and the desire to minimise staging RAMs and FIFOs.

Look at details of an address decoder and simple bus structure in SystemC, with a full, worked example called nominalprocessor. Then a bridged bus.

Lecture 7 (Friday 27th February 2009)

Temporal decoupling of requests and acks for crossing clock domains (and also network on chip later): bus must have elidable idle states between transactions. (NB: This is not the same as the temporal decoupling in ESL modelling.)

Look at LG4 ESL: look at the history of ESL, with firmware and behavioural models being two types of IP divided from each other despite being in a common language. Look at architectural exploration using mixed-abstraction models.

See the ESL version of nominalprocessor making TLM calls to one memory, a TLM-style busmux and spoke about externally transacted calls to a pin-level RTL second memory.

Looked at blocking and non-blocking transaction styles and spoke briefly of time annotations to the various phases of the non-blocking style to give a detailed, approximately-timed system.

Then looked at much looser timing with the ESL models running ahead of each other and of the global simulation time, each keeping track of its local offset in a variable called delta and checkpointing with the EDS kernel when necessary or at intervals of a time quantum. Having a large quantum can expose design bugs (good) and leads to a fast model.
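
The bookkeeping involved can be sketched as below. This is a Python illustration with invented names; in a real model sync() would be a wait on the EDS kernel, whereas here a plain counter stands in for global simulation time. Each transaction adds its latency to the local offset delta, and the model only checkpoints with the kernel when delta reaches the quantum.

```python
class LooselyTimedInitiator:
    """Tracks a local time offset and synchronises once per quantum."""

    def __init__(self, quantum):
        self.quantum = quantum
        self.global_time = 0    # stand-in for the kernel's simulation time
        self.delta = 0          # how far this model has run ahead
        self.syncs = 0          # number of (expensive) kernel entries

    def local_time(self):
        return self.global_time + self.delta

    def transact(self, latency):
        """Account for one transaction without entering the kernel."""
        self.delta += latency
        if self.delta >= self.quantum:
            self.sync()

    def sync(self):
        """Checkpoint with the kernel (a real model would wait for delta here)."""
        self.global_time += self.delta
        self.delta = 0
        self.syncs += 1
```

With a quantum of 100 time units, ten transactions of 30 units each enter the kernel only twice, against ten times with a quantum of 1, and the local time reached is the same in both cases. The price is that other models may observe the transactions out of order within a quantum.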

Lecture 8 (Monday 2nd March 2009)

Started LG 5: Assertion based design, talking in general terms about safety and liveness properties of protocols, but emphasised that data conservation was something else we might want to reason about.

Had a brief tour of PSL in the abstract. Emphasised that a SERE should normally be used inside a suffix implication.

Future Lectures: Rough Time Plan and Narrative

Lecture 9 (Weds 4th March 2009)

Assertion based design: looked at some PSL examples online and other tools. Looked at combinational and sequential equivalence checking and mentioned a reachable state space checker. These formal methods can all be implemented in many ways, with different algorithms and explicit-state or symbolic-state representations.
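
The simplest explicit-state versions of two of these checks can be sketched in a few lines of Python. These are illustrations with invented names, not the tools discussed in the lecture: combinational equivalence by trying every input vector, and reachable-state exploration by breadth-first search from the reset state.

```python
from collections import deque
from itertools import product


def comb_equivalent(f, g, n_inputs):
    """True if f and g agree on all 2**n_inputs input vectors."""
    return all(f(*v) == g(*v) for v in product((0, 1), repeat=n_inputs))


def reachable_states(initial, successors):
    """Explicit-state breadth-first search over a next-state relation."""
    seen = {initial}
    frontier = deque([initial])
    while frontier:
        state = frontier.popleft()
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


def mod6_counter(state):
    """3-bit counter that counts 0..5; the enable input may be 0 or 1."""
    return {state, (state + 1) % 6}
```

Here reachable_states(0, mod6_counter) finds that states 6 and 7 of the 3-bit register are never entered, so a safety property such as 'the counter never holds 7' is true. Both functions enumerate everything, so their cost grows exponentially with the number of inputs or state bits, which is the motivation for symbolic representations.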

Dwell on Formality

Lecture 10 (Friday 6th March 2009)

LG6: Spoke about wiring complexity and throughput, dependent on traffic pattern in simple bus, arbitrated bus, bridged bus and network on a chip structures.

Spoke about dynamic clock gating, as a precursor to dynamic clock frequency and voltage scaling. Forgot to mention that clock gating is inserted by automatic tools!

The JTAG and test vector topics were not covered this year: lack of time. NB static timing analyser should be covered next year too.

A cell library in the public domain: TANNER AMI.

Lecture 11 (Monday 9th March 2009)

Will seek email suggestions for topics to go over again or visit for the first time in the last lecture: please email David.Greaves@cl.cam.ac.uk

Plan to cover LG 8.

Spoke about clock and power control: both can be switched on/off with global switches under manual or software control: useful for multi-function platform chips. Also, both can be varied over a linear range of frequency and voltage: must keep the voltage sufficient for the clock frequency using the technology de-rating functions (approximately linear). So a typical SoC uses not only dynamic clock gating, but also manual and automatic frequency and voltage variation.
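
To see why voltage is varied along with frequency, here is a small Python sketch using the usual first-order model of CMOS dynamic power, P = a C V^2 f, together with a linear de-rating function. The capacitance and the de-rating constants are hypothetical values chosen for illustration, not figures from the lecture or from any real process.

```python
def dynamic_power(capacitance, voltage, frequency, activity=1.0):
    """First-order dynamic power in watts: activity * C * V^2 * f."""
    return activity * capacitance * voltage ** 2 * frequency


def min_voltage(frequency, v_offset=0.6, volts_per_hz=3e-9):
    """Lowest supply voltage that sustains `frequency`.

    Linear de-rating with hypothetical constants: 0.6 V plus 3 V per GHz.
    """
    return v_offset + volts_per_hz * frequency


def power_at(frequency, capacitance=1e-9):
    """Power when running at `frequency` with the lowest safe voltage."""
    return dynamic_power(capacitance, min_voltage(frequency), frequency)
```

With these numbers, halving the clock from 200 MHz to 100 MHz at a fixed 1.2 V would only halve the power, but also dropping the supply to the 0.9 V that 100 MHz needs cuts it to under a third. The same work then takes twice as long yet uses less energy overall.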

Lecture 12 (Weds 11th March 2009)

Revision lecture: some topics may not have been clear or may have been covered too quickly in earlier lectures.

Plan perhaps also to cover some of LG 7 (as time permits): No Printed Notes and Not Examinable This Year.

Worked and Running Examples

The material covered this year is organised into six classes.

1. Structural Netlists in SystemC.

2. The simple FIFO example from the SystemC Library.

3. The nominal processor ISS, extended as an RTL component.

4. The nominal processor ISS, extended as a loose-timed TLM component (TLM 1.0 style).

5. Two nominal processors plus bus arbiter.

6. DMA Controller. PRACTICALS.