ACS SOC D/M P35: Ex 5 Mini-Project and Research Essay

Title:  System On Chip Design and Modelling (50 Credits).

The deadline for this work is the first day of Easter Term (Tues 26th
April 2011). Please feel free to submit (once only) a draft or outline
version for feedback one or two weeks earlier.


Part A: (Mini-project II): Give an overview of a multi-core system on
chip architecture that you have implemented and tested using
SystemC TLM modelling or another ESL methodology.  Use up to 500 words
and an appropriate number of diagrams. Demonstrate that it worked.

Please refer to part A as a source of examples for part B.

Part B: (Assessed Essay) Write an essay entitled 'System On Chip
Design and Modelling' that consists of six sections whose headings are
taken from the list below.  Each section should consist of 300 to 600 words
of text and an appropriate number of diagrams.

Section headings (choose six and place in the best order):

1. Relative roles of ISS and cycle-callable models of processor cores
and cache systems. 

    Discuss various abstraction levels for modelling processor cores, from
    cross compilation of firmware for the modelling workstation through
    to cycle-accurate models of every target instruction.  Mention styles
    for modelling caches (e.g. is the hit ratio estimated or measured?).
    (You might compare the performance or modelling style of OR1200.cpp,
    the verilated core, with the orsim_sc.cpp ISS.)

2. Use of assertions and temporal logic in SoC modelling.

    Using several example assertions that are potentially relevant at more
    than one level of abstraction, explain how they are re-applied at each
    level, or note where an assertion cannot be re-applied.


3. Ease of design re-partition and architectural exploration.

   Show how a target application can be 'run' (i.e. explored) on SoC models
   that vary in their level of abstraction, and show how having a
   lower-level model makes architectural changes, such as changing the number
   of processors, memories or bus/network-on-chip structures, more difficult.


4. Statistics collection and modelling contention and queueing.

   Show how performance can be estimated or measured using SoC
   models at various levels of abstraction, according to how many
   of the contended resources, such as memory ports or bus bridges,
   are modelled and the style of modelling used for them (e.g. actual
   queues versus estimated queuing delays).

5. Using direct calls between device drivers compared with abstract
and concrete bus/NoC models.

   Explain how firmware and high-level models of devices should
   be modified to run an application without any hardware model
   (i.e. with direct calls between h/w and s/w components) compared 
   with its final form. Include an interrupt service routine.

6. Clock frequency and power consumption modelling.

   Discuss how well a high-level SoC model can be used to
   estimate system clock frequency (i.e. critical path) and
   power consumption (including dynamic frequency and voltage 
   scaling) compared with pre-synthesis, post-synthesis and 
   post-layout RTL models.

7. The role of high-level synthesis and synthesis from formal
specifications in SoC design flow.

   Show how part of your example design could have looked
   if synthesised from a higher-level form. (You may have included
   this anyway.)

8. Evaluation and automatic generation of glue logic and/or SoC bus/NoC

   Explain how the components of your example design are
   connected to each other at the various levels of abstraction
   and perhaps discuss the potential for automatic generation of 
   address maps and automatic synthesis of the joining code or logic.
   (Perhaps refer to http://www.cl.cam.ac.uk/~djg11/pubs/joining-fdl10 or
    google for 'network on chip traffic generator')


9. A similar topic of your own choosing.


By 'various levels of abstraction' we refer to ESL models spanning:

  1. Application software and device drivers with no hardware model at all,
  2. High-level TLM modelling, loosely timed, with no models of bus or
     network structures,
  3. Lower-level TLM modelling that accurately models contention points,
  4. Cycle-accurate modelling.
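
As a rough illustration of the difference between levels 2 and 3 in
the list above, the following minimal sketch contrasts a fixed
estimated access delay with an actual queue at a single contended
memory port.  The function names, the single-port arrangement and the
cycle counts are all assumptions made for this example; it is plain
C++ rather than SystemC and is not taken from the course materials.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Level-2 style (assumed): every access costs a fixed estimated delay
// and accesses never wait for one another.
inline std::uint64_t estimated_finish(const std::vector<std::uint64_t>& arrivals,
                                      std::uint64_t est_delay) {
    std::uint64_t finish = 0;
    for (std::uint64_t t : arrivals) {
        finish = std::max(finish, t + est_delay);
    }
    return finish;
}

// Level-3 style (assumed): one port serves one access at a time, so a
// request that arrives while the port is busy queues behind the others.
inline std::uint64_t queued_finish(const std::vector<std::uint64_t>& arrivals,
                                   std::uint64_t service_cycles) {
    // This sketch expects arrival times in non-decreasing order.
    assert(std::is_sorted(arrivals.begin(), arrivals.end()));
    std::uint64_t port_free_at = 0;
    for (std::uint64_t t : arrivals) {
        const std::uint64_t start = std::max(t, port_free_at);
        port_free_at = start + service_cycles;
    }
    return port_free_at;
}
```

With arrivals at cycles 0, 0, 1 and 10 and a 4-cycle access, the
estimated model finishes at cycle 14 whereas the queued model finishes
at cycle 16, because the first three accesses contend for the port.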


Credit Matrix

Up to 10 credits will be awarded for Part A. These marks will be
awarded according to the range of techniques demonstrated from the ESL
methodology.  Credit will not be awarded for implementing an overly
large or complex system in itself.

For each of the six sections in the assessed essay, up to 6 credit
points will be awarded.

For overall presentation, up to 4 credit points will be awarded.

Total available credit: 50 points.

END


-----------------------------------------------------------------------------
Questions Arising

Q. In question one you use the term cycle-callable. Is this the same
as cycle-accurate?

A. A cycle-callable model is a cycle-accurate model of a subsystem,
implemented in a non-blocking style, in which one clock cycle is
executed for each call.
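
A minimal sketch of the cycle-callable style follows, written in plain
C++ rather than SystemC.  The timer subsystem, the class name
CycleCallableTimer and the method name clock_tick are hypothetical and
chosen only for this example.  The point is that each call advances
the model by exactly one clock cycle and returns immediately, with all
state held in member variables between calls.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical subsystem for illustration: a down-counting timer that
// raises an interrupt flag for one cycle when it wraps and reloads.
class CycleCallableTimer {
public:
    explicit CycleCallableTimer(std::uint32_t reload)
        : reload_(reload), count_(reload) {}

    // Execute exactly one clock cycle and return without blocking.
    // Everything that must survive between cycles is a member variable.
    void clock_tick() {
        irq_ = false;
        if (count_ == 0) {
            count_ = reload_;   // wrap: reload and flag the interrupt
            irq_ = true;
        } else {
            --count_;
        }
        ++cycles_;
    }

    bool irq() const { return irq_; }
    std::uint64_t cycles() const { return cycles_; }

private:
    std::uint32_t reload_;
    std::uint32_t count_;
    bool irq_ = false;
    std::uint64_t cycles_ = 0;
};

// The caller owns the clock: it calls clock_tick() once per cycle and
// samples the outputs afterwards.  Returns the number of interrupts seen.
inline unsigned run_cycles(CycleCallableTimer& timer, unsigned n) {
    unsigned interrupts = 0;
    for (unsigned i = 0; i < n; ++i) {
        timer.clock_tick();
        if (timer.irq()) ++interrupts;
    }
    return interrupts;
}

// Usage: a reload value of 3 gives an interrupt every fourth cycle.
inline void timer_demo() {
    CycleCallableTimer timer(3);
    assert(run_cycles(timer, 8) == 2);
}
```

A blocking model of the same timer would instead contain its own loop
and wait on a clock event inside it, which requires a thread per
model; the cycle-callable form avoids that.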


Q. Looking at the Part II course notes, does cross compiling the firmware
for the modelling workstation represent the "Functional Modelling"
level of abstraction?

A. In broad terms, yes. My notes define this term to mean that the
output of the simulation is correct.  It also implies that the same
algorithm is used to arrive at that output. Cross compiling firmware
should lead to correct output, but it also models further aspects of
the implementation beyond those needed just to get the output correct.
For instance, the UML diagram of the class instances used by the
cross-compiled code would be the same as that of the actual
implementation, whereas one can consider different implementations
that still use the same algorithm (e.g. minor variations in record
field structure, a different calling pattern between methods or
executing on a different number of CPU cores).

Q. Would I be correct in saying that if you cross compile the firmware
for the workstation then there is actually no model of the processor
core at all?

A. Yes, but one can still profile the code using gprof, valgrind,
oprofile or whatever to find out how many instructions it executed and
how many method calls it made, to get some idea of what the target
processor would have consumed.  Using oprofile you can get cache hits
and misses and other details.  This might be useful at the very early
stages of system development (e.g. for a new data coding scheme like
low-density parity checks or candidates for 4G mobile telephony or a
replacement for DES) to understand what class of core or number of
cores is going to be needed and to estimate the basic cost of a
product based on this technology.

Q. In question 5 you've asked for a description of an interrupt service
routine. Have you any instructions on how to write an ISR for the
or1k?

A. There is an example of an interrupt service routine in my notes and
this is the same as the SystemC UART with interrupts
(/home/djg11/example-uart/example-uart-with-interrupts/).  I have not
made this work on the OR1K personally: I did not get as far as finding
the OR1K instruction to enable interrupts, which needs adding to
crt.S.

The Linux kernel compiled for OR1K uses interrupts, of course, so
perhaps look in a kernel source folder such as
linux-2.6.24/arch/or32/kernel for real OR1K interrupt handlers.
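
For the 'no hardware model' end of question 5, an interrupt can be
modelled as a direct method call from the device model into the
driver.  The sketch below is plain C++ under stated assumptions: the
classes UartModel and UartDriver and all their method names are
hypothetical, it is not the example from my notes, and it contains no
OR1K-specific code (no vector table, no interrupt-enable instruction).

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <queue>
#include <string>
#include <utility>

// Hypothetical behavioural UART with no bus or processor model.
class UartModel {
public:
    void attach_irq(std::function<void()> handler) { irq_ = std::move(handler); }
    void enable_rx_interrupt(bool on) { rx_irq_enabled_ = on; }

    // Called by the test bench to model a character arriving on the line.
    void receive_from_line(std::uint8_t c) {
        rx_fifo_.push(c);
        // The direct call below stands in for the interrupt wire.
        if (rx_irq_enabled_ && irq_) irq_();
    }

    bool rx_ready() const { return !rx_fifo_.empty(); }

    std::uint8_t read_rx() {
        assert(!rx_fifo_.empty());  // the driver must check rx_ready() first
        const std::uint8_t c = rx_fifo_.front();
        rx_fifo_.pop();
        return c;
    }

private:
    std::queue<std::uint8_t> rx_fifo_;
    std::function<void()> irq_;
    bool rx_irq_enabled_ = false;
};

// Hypothetical device driver compiled natively for the workstation.
class UartDriver {
public:
    explicit UartDriver(UartModel& uart) : uart_(uart) {
        uart_.attach_irq([this] { isr(); });
        uart_.enable_rx_interrupt(true);
    }
    UartDriver(const UartDriver&) = delete;             // the handler captures 'this'
    UartDriver& operator=(const UartDriver&) = delete;

    // Interrupt service routine: drain the device into a software buffer.
    void isr() {
        while (uart_.rx_ready()) {
            received_.push_back(static_cast<char>(uart_.read_rx()));
        }
    }

    const std::string& received() const { return received_; }

private:
    UartModel& uart_;
    std::string received_;
};
```

In the final form the calls to rx_ready and read_rx would become
loads from device registers over the bus, and the direct call would
become an interrupt wire into the core that vectors to the handler.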


Q. "Show how part of your example design could have looked
   if synthesised from a higher-level form. (You may have included
   this anyway.)"  By this do you mean, say, how the ethernet TLM model
   would look if it had actually been implemented in, say, C# with Kiwi
   attributes?

A. This question potentially covers a great deal of ground.  For
instance, you might generate a protocol or packet checker by compiling
a formal spec to include in the system RTL.

You could also cite work regarding synthesis of memory maps, bus
structures and other glue logic needed to connect parts together.

Perhaps the most obvious thing to do is to talk about compiling a
behavioural model of the subsystem into synthesisable RTL for the
target implementation, commenting on what is likely to work (or even
trying out one or two experiments on Kiwic or one of the online
C-to-gate servers).

Considering TLM, which you mention, manually-coded TLM models of
devices are very much like high-level behavioural models written
specifically for synthesis by C-to-gates flows.  So if comparing these
two forms you would mostly comment on what parts of the TLM model can
and cannot be expected to be synthesisable to RTL implementations.
There are a number of research papers on this but no accepted
standards.

Generating a TLM model from Kiwic is not something I have considered:
although Kiwic can generate SystemC output, this is RTL-style
code, not TLM code.  I guess when you say the 'ethernet TLM' you
really mean the ethernet synthesisable RTL implementation that I spoke
of above under 'most obvious thing to do'?


END.