ACS SOC D/M P35: Ex 4 A/B 2015/16: Mini-Project and Structured Research Essay.

The deadline for all P35 work is the first day of Easter Term.

Notes: Please ensure you have completed earlier exercises and feel free to reuse text or results from earlier exercises for the Mini-Project (4a).

Collaboration is not allowed for the Research Essay. For the mini-project it is allowed only for parts that are borrowed from the term-time work, or with express permission, which will be granted only if the nature of the collaboration enables individual contributions to be clearly discriminated.

Your audience is the External Examiner, Second Assessor and readers of Design and Reuse or Electronics Times. It is therefore worthwhile explaining material that would perhaps be well known to others directly involved in this module.

Please feel free to contact DJG as much as you like for assistance and advice with Exercise 4 A/B over the Easter Vac.

---------------------------------------------------------------------
Exercise 4a: See companion sheet.
---------------------------------------------------------------------
Exercise 4b: Structured Research Essay Task

Ex4 Part B accounts for 40 credit points. Marking will be based on 5 marks for each of your SIX sections and a further 10 marks for presentation, overall argument and the extent to which you have referred to the mini-project as a running example.

Use the materials on the reading list, recent articles in Design+Reuse and EE Times, the undergraduate lecture notes for System on Chip, the more-advanced slides lectured for P35 and your own research.

Write an essay following the structure that is precisely defined here. Please use Ex 4a, the mini-project, as a source of example material to illuminate your argument. Your essay should be titled 'System On Chip Design and Modelling' and should consist of SIX sections whose headings are taken from the list below.
Each section should consist of 300 to 600 words of text and an appropriate number of diagrams. To reiterate: please refer to Ex 4a as a source of examples for Ex 4b where possible.

Section headings (choose SIX and place in the best order to construct an overall argument):

1. Architectural Exploration (AE) (and ease of design re-partition). Explain the need for AE. Describe techniques where a target application can be 'run' (i.e. explored) on SoC models that vary in their level of abstraction and show how having a lower-level model makes architectural changes, such as changing the number of processors, memories or bus/network-on-chip structures, more difficult. Mention the relative roles of ISS and cycle-callable models of processor cores and cache systems. Discuss various abstraction levels for modelling of processor cores, from cross compilation of firmware for the modelling workstation through to cycle-accurate models of every target instruction. Mention styles for modelling caches (e.g. is the hit ratio estimated or measured?).

2. Use of assertions and temporal logic in SoC modelling. Using several example assertions that are potentially relevant at several levels of abstraction, show how they are re-applied at each level, or mention where this is not possible.

3. Network On Chip (NoC). Explain how and why the old idea of a bus has almost always now been replaced with something more like a NoC. Why are tri-states not used? What must be done to contact 'the other side of the chip'? What techniques are used for flow control? What set of transactions is typically needed? How important is read and write ordering?

4. Statistics collection and modelling contention and queuing. Show how performance can be estimated or measured using SoC models at various levels of abstraction according to how many of the contended resources, such as memory ports or bus bridges, are modelled and the style of modelling used for them (e.g.
actual queues versus estimated queuing delays). Perhaps review Sniper, Gem5, Prazor/VHLS, ZSim, Qemu and other high-level virtual platforms in terms of the modelling approximations they use to efficiently exploit a multi-core workstation to model a multi-core SoC.

5. Virtualization of embedded software. Consider how firmware and high-level models of devices should be modified to run an application without any hardware model (i.e. with direct calls between h/w and s/w components) compared with its final form. Include an interrupt service routine. For example, the ESL slide pack illustrates using direct calls between device drivers compared with abstract and concrete bus/NoC models.

6. Clock frequency and power consumption modelling. Discuss how well a high-level SoC model can be used to estimate system clock frequency (i.e. critical path) and power consumption (including dynamic frequency and voltage scaling) compared with pre-synthesis, post-synthesis and post-layout RTL models.

7. The role of high-level synthesis and synthesis from formal specifications in SoC design flow. Explore how part of your example design could have looked if synthesised from a higher-level form. (You may have included this anyway.)

8. Evaluation and automatic generation of glue logic and/or SoC bus/NoC. Explain how the components of your example design are connected to each other at the various levels of abstraction and perhaps discuss the potential for automatic generation of address maps (via IP-XACT) and automatic synthesis of the joining code or logic. (Perhaps refer to http://www.cl.cam.ac.uk/~djg11/pubs/joining-fdl10 or Google for 'network on chip traffic generator'.)

9. Relative role of FPGA, CGRA and ASIC in contemporary products. Tape-out mask costs are ever increasing. FPGA has now eaten all of the low-volume ASIC market and the Structured ASIC has died. Which sectors still do SoC design? Is the CGRA going to be useful?
What style of re-configurable computing is best for embedded devices and what is best for HPC (high-performance computing) server blades?

10. A similar topic of your own choosing.

In the above, by 'various levels of abstraction' we refer to ESL models spanning:

1. Application software and device drivers with no hardware model at all,
2. High-level TLM modelling, loosely timed, with no models of bus or network structures,
3. Lower-level TLM modelling that accurately models contention points,
4. Cycle-accurate modelling.

END
---------------------------------------------------------------------------
Further Notes arising this year (March/April 2016):

An article or essay should argue some overall point. It should include at least a pair of paragraphs at the start and end that serve as an introduction and conclusion. Feel free to put these outside the required six-section structure if that is best.

Also, having to strictly use the title 'System On Chip Design and Modelling', which is completely generic, is contrary to writing an essay or article that argues a particular point. So please feel free to use a different title that reflects your argument or point of view.

If you state that a given approach is more suitable for FPGA than ASIC, please explain why, since many aspects of the design flow are normally similar.

When you write a phrase like 'higher-level model' the hyphen should be present. Grammatically this is a compound adjective.

Q. I was getting started on Ex4a; however, when comparing the requirements, there seem to be some oddities. The webpage states that it's required to "further write up your mini project" (which implies that you state the facts about your contribution), but the references text file mentions that you have to write "an argument" (which is necessarily an opinionated piece).

A. Sorry if it was not clear enough.
There is scope for flexibility, but the envisaged structure is that Ex 3 should demonstrate a basic level of technical proficiency with a specific example, including listings and details of practical work. Ex 4a can then discuss a state-of-the-art or evolving technique and generate text worthy of publication in EE Times, Design+Reuse etc., of interest to industrial SoC designers. It will (generally) use Ex3 as its example design and will contain an alternative presentation of the same practical work. Ex4b can then cover other topics that were not the main focus of 4a, but again using the earlier work as a basis example. It is easier to write interesting text for all of Ex4 if it uses an article or essay style that argues a specific viewpoint.

---------------------------------------------------------------------------
Questions Previously Arising (clarification points from email exchanges from previous years).

Q. In question one you use the term cycle-callable. Is this the same as cycle-accurate?

A. A cycle-callable model is a cycle-accurate model of a subsystem implemented in a non-blocking style, where one clock cycle is executed for each call (method invocation).

Q. Looking at the Part II course notes, does cross compiling the firmware for the modelling workstation represent the "Functional Modelling" level of abstraction?

A. In broad terms yes. My notes define this term to mean the output of the simulation is correct. It implies that the same algorithm is used to arrive at that output as well. Cross compiling firmware should lead to correct output but also models further aspects of the implementation beyond those needed to just get the output correct. For instance, the UML diagram of the class instances used by the cross-compiled code would be the same as the actual implementation, whereas there could also be different implementations that still use the same algorithm (e.g.
minor variations in record field structure, a different calling pattern between methods or executing on a different number of CPU cores).

Q. Would I be correct in saying that if you cross compile the firmware for the workstation then there is actually no model of the processor core at all?

A. Yes, but one can still profile the code using gprof, valgrind, oprofile or whatever to find out how many instructions it used and method calls it made, to get some idea of what the target processor would have consumed. Using oprofile you can get cache hits and misses and other details. This might be useful at the very early stages of system development (e.g. for a new data coding scheme like low-density parity checks or candidates for 4G mobile telephony or a replacement for DES) to understand what class of core or number of cores are going to be needed and to estimate the basic cost of a product based on this technology.

Q. In question 5 you've asked for a description of an interrupt service routine. Have you any instructions on how to write an ISR for the or1k?

A. There is an example of an interrupt service routine in my notes and this is the same as the SystemC UART with interrupts (/home/djg11/example-uart/example-uart-with-interrupts/). I have not made this work on the OR1K personally (I did not get as far as finding the OR1K instruction to enable interrupts, which needs adding to the crt.S). The Linux kernel compiled for OR1K uses them of course, so perhaps look in a kernel source folder such as linux-2.6.24/arch/or32/kernel for real OR1K interrupt handlers.

Q. "Show how part of your example design could have looked if synthesised from a higher-level form. (You may have included this anyway.)" By this do you mean, say, how would the Ethernet TLM model look if it had actually been implemented in, say, Csharp with Kiwi attributes?

A. This question potentially covers a great deal of ground.
For instance, you might generate a protocol or packet checker by compiling a formal spec to include in the system RTL. You could also cite work regarding synthesis of memory maps, bus structures and other glue logic needed to connect parts together. Perhaps the most obvious thing to do is to talk about compiling a behavioural model of the subsystem into synthesisable RTL for the target implementation, commenting on what is likely to work (or even trying out one or two experiments on KiwiC or one of the online C-to-gate servers).

Considering TLM, which you mention, manually-coded TLM models of devices are very much like high-level behavioural models written specifically for synthesis by C-to-gates flows. So if comparing these two forms you would mostly comment on what parts of the TLM model can and cannot be expected to be synthesisable to RTL implementations. There are a number of research papers on this but no accepted standards. Generating a TLM model from Kiwi is not something I have considered: although KiwiC can generate SystemC output, this is RTL-style code, not TLM code. I guess when you say the 'Ethernet TLM' you really mean the Ethernet synthesisable RTL implementation that I spoke of above under 'most obvious thing to do'?

Q/A: Some combinations of numbers of cores and quantum keeper settings can cause the or1k simulator to spend a lot of time doing failed load-links. The fix for this (if you are using condition variables or barriers) may be to edit djgthread.c cond_wait at line 212 by inserting a call to yield, as follows:

    while (_djg_Threads[tid].waiting == cond) { _core_pause(tid); ... }

I have made this edit on the copy installed in clteach/btlm-baseline.

END.
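
Illustration of the cycle-callable style: the answer under 'Questions Previously Arising' defines a cycle-callable model as a cycle-accurate, non-blocking model in which each call executes one clock cycle. The C++ sketch below shows what that might look like for a small device. Everything in it is an assumption made for illustration and is not taken from the course materials: the device (a serial transmitter sending a start bit, eight data bits LSB first and a stop bit) and the names SerialTxModel, load, step and send_byte are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical example device (not from the course notes): a serial
// transmitter that drives one bit onto its output line per clock cycle.
class SerialTxModel {
public:
    // Non-blocking request: accepted only when the transmitter is idle.
    bool load(std::uint8_t byte) {
        if (bits_left_ != 0) return false;  // busy: caller must retry later
        // Frame, sent LSB first: start bit (0), data[7:0], stop bit (1).
        shift_reg_ = (1u << 9) | (static_cast<unsigned>(byte) << 1);
        bits_left_ = 10;
        return true;
    }

    // Cycle-callable entry point: each call advances the model by exactly
    // one clock cycle and returns the line level for that cycle. It never
    // blocks or waits; all state lives in member variables between calls.
    bool step() {
        ++cycles_;
        if (bits_left_ == 0) return true;   // idle line is high
        bool level = (shift_reg_ & 1u) != 0;
        shift_reg_ >>= 1;
        --bits_left_;
        return level;
    }

    bool busy() const { return bits_left_ != 0; }
    unsigned long cycles() const { return cycles_; }

private:
    unsigned shift_reg_ = 0;     // remaining frame bits, next bit in bit 0
    unsigned bits_left_ = 0;     // frame bits still to be sent
    unsigned long cycles_ = 0;   // total clock cycles executed so far
};

// Helper for experiments: clock the model until the frame has been sent
// and collect the line level seen on each cycle.
std::vector<bool> send_byte(SerialTxModel &tx, std::uint8_t byte) {
    std::vector<bool> line;
    if (!tx.load(byte)) return line;        // transmitter was busy
    while (tx.busy()) line.push_back(tx.step());
    assert(!tx.busy());                     // frame fully shifted out
    return line;
}
```

A caller, such as a simulation kernel or a test bench, owns the clock and invokes step() once per cycle. A blocking model would instead keep its own thread of control and suspend itself between cycles.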