# SoC D/M Exercises 10/11

These exercises are allocated marks at Tripos examination level, with 20 marks making a full exam question. Example answers are available to supervisors. There is some repetition of material between the exercises, so a suitable target is to solve approximately half of them. Those marked (**N!R**) are difficult to answer and attempting complete answers is not recommended. Those marked with a heart ( $\heartsuit$ ) are recommended.

## 1 LG12: SoC Bus and NoC Structures.

#### LG12.1. $\heartsuit$ : SoC Structure

a) Sketch the block diagram for a SoC with one processor, one SRAM, one ROM, one Counter/Timer block and one PIO section, all connected to a single bus without any bus bridges. [5 Marks]

b) List the (main) net-level interface signals needed for a bus port that enables multiple bus operations to be in flight at once (such as the BVCI port lectured, or an IP block interface of similar functionality) and explain the protocol. [6 Marks]

c) Is it appropriate for DMA to be supported or used in the SoC of part a)? [3 Marks]

d) How are interrupt signals routed in the SoC of part a)? [3 Marks]

e) What modifications are needed if a second processor core is added? Is a second bus a good idea? [3 Marks]

#### LG12.2. Multiple Busses With Bridges

a) In SoC terms, what is a bus and how does it compare with the 1980s concept of a motherboard bus (such as the ISA or PCI bus) ? [2 Marks]

b) How might the destination port for a transaction over such a bus be decided ? [2 Marks]

c) What is a bus bridge, what transactions might it support and what internal operations might it implement ? [4 Marks]

d) If a SoC is designed with a number of bridged busses, what are the main aspects that determine the allocation of initiators and targets to the busses ? [3 Marks]

e) Is there no real difference between a Network On Chip and a set of bus bridges ? [3 Marks]

f) What form of bus protocol is needed for good performance on a SoC that uses a number of bridged busses or clock domains ? [3 Marks]

g) How is contention for destinations handled in a SoC that uses a number of bridged busses compared with a NoC (network on chip) ? [3 Marks]

#### LG12.3. Network-On-Chip (NoC)

Material to answer the last two parts of this exercise was not lectured/examinable in 10/11.

a) What is meant by the term Network-on-Chip and what are the main differences between using a number of bus bridges and a network fabric? [5 Marks]

b) Describe two buffering techniques that might be used in a NoC. [2 Marks]

c) Describe a flow control technique used in a NoC. [2 Marks]

d) **N!R** What can be done to avoid NoC deadlock ? How can it be detected ? What should be done when it is detected ? [6 Marks]

e) **N!R** What is the flattened-butterfly NoC topology and why is it considered ? [5 Marks]

#### LG12.4. DRAM and Cache

a) What are the main features of DRAM and why is it not commonly integrated as part of a SoC? [5 Marks]

b) Why should out-of-order read responses ideally be supported by a SoC Bus or NoC ? [5 Marks]

c) Using a system clock of 400 MHz, a 32-bit MIPS/ARM-like CPU is served without a cache by a 16-bit DRAM system with the following parameters:

| Operation     | Clock cycles | Function                               |
|---------------|--------------|----------------------------------------|
| RAS           | 3            | Sending row address                    |
| CAS           | 1            | Read or write 16 bits in current row   |
| RAS precharge | 2            | Write-back time when finished with row |

Making some assumptions about the pattern of access that the processor will make of the memory, estimate its performance in terms of instructions per second. [5 Marks]

d) If all instructions for inner loops are copied to a 32-bit wide on-chip SRAM (that provides true random access at 400 MHz) at code start, what is the performance now ? [5 Marks]

e) If a cache structure with a 98 percent instruction hit rate and an 80 percent data hit rate is applied, what processor performance is now achieved ? You may consider in-order and out-of-order processors, but full credit is awarded for either. [5 Marks]
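One possible way to set up the estimate for part c) is sketched below. It is not the model answer. The assumptions are ours and are labelled in the code: every 32-bit word needs its own row open and close (no row locality) unless stated otherwise, the processor is entirely memory-bound, and the number of data accesses per instruction is a free parameter that the reader must choose. The function and constant names are hypothetical.

```python
# Timing parameters from the table in part c), in clock cycles.
RAS_CYCLES = 3        # send row address
CAS_CYCLES = 1        # read or write 16 bits in the current row
PRECHARGE_CYCLES = 2  # write-back time when finished with the row

CLOCK_HZ = 400e6
DRAM_WIDTH_BITS = 16
WORD_BITS = 32


def cycles_per_word(row_already_open: bool = False) -> int:
    """Clock cycles to move one 32-bit word over the 16-bit DRAM interface."""
    cas_ops = WORD_BITS // DRAM_WIDTH_BITS  # two CAS operations per word
    if row_already_open:
        # ASSUMPTION: the row stays open, so only the CAS cost is paid.
        return cas_ops * CAS_CYCLES
    # ASSUMPTION: worst case, each word pays RAS and precharge in full.
    return RAS_CYCLES + cas_ops * CAS_CYCLES + PRECHARGE_CYCLES


def instructions_per_second(data_accesses_per_instr: float,
                            row_already_open: bool = False) -> float:
    """ASSUMPTION: memory-bound CPU, one fetch plus some data words per instruction."""
    words_per_instr = 1.0 + data_accesses_per_instr
    return CLOCK_HZ / (words_per_instr * cycles_per_word(row_already_open))
```

Calling `instructions_per_second(0.3)`, for instance, models a hypothetical mix in which 30 percent of instructions are loads or stores. Parts d) and e) can be explored by replacing the cost of the instruction fetch, or by weighting hit and miss costs with the stated hit rates.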

#### LG12.5. JTAG Port and Test Modes

(In 10/11 JTAG was only mentioned in relation to debugging and the GDB remote serial protocol (RSP).)

a) Why do ASICs commonly support special test modes? [4 Marks]

b) Define boundary scan and compare it with a full scan test path. [4 Marks]

c) Briefly describe the structure and operation of the JTAG test port used on many chips. [4 Marks]

d) How can JTAG ports be combined and is this a good idea within a single SoC ? [4 Marks]

e) What other uses can the JTAG port frequently be put to ? [4 Marks]

#### LG12.6. Cell Library

a) Give a short list of logic cells to be found in a standard cell library. [5 Marks]

b) List five types of information that should be stored about each cell. [5 Marks]

c) N!R How can an algorithm that chooses an assembler instruction from an instruction set in the back end of a compiler be used for choosing a cell from a cell library in the back end of a logic synthesiser ? *Non-examinable.* [5 Marks]

d) Name several illustrative, specialist VLSI structures or components that cannot readily be made out of standard logic cells and explain why custom design is needed. [5 Marks]

## 2 LG13: SoC Tools

#### LG13.1. $\heartsuit$ : Static Timing Analysers

a) Draw a gate-level circuit for a divide-by-eight synchronous counter. Annotate the timing delays relative to the master clock of each net for a technology that has the following properties: [8 Marks]

| Gate   | Parameter         | Value    |
|--------|-------------------|----------|
| AND    | propagation delay | 0.1 ns   |
| OR     | propagation delay | 0.1 ns   |
| INV    | propagation delay | 0.05  ns |
| XOR    | propagation delay | 0.15  ns |
| D-type | clock-to-q time   | 0.2 ns   |
| D-type | set up time       | 0.05  ns |

b) Describe the algorithm for a static timing analyser and show its operation on your circuit, giving the maximum clock frequency. [7 Marks]

c) Draw a circuit for which a static timing analyser will give an overly pessimistic answer. [3 Marks]
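The longest-path calculation that part b) asks about can be sketched in a few lines. The sketch below is illustrative only: the netlist is one assumed implementation of a divide-by-eight counter (three D-types with next-state functions built from INV, AND and XOR), and other gate-level designs are equally valid and would give different figures. Net names such as `c1` are hypothetical. Delays are held in integer picoseconds to avoid floating-point rounding.

```python
from functools import lru_cache

# Delays from the table in part a), converted to picoseconds.
DELAY_PS = {"AND": 100, "OR": 100, "INV": 50, "XOR": 150}
CLOCK_TO_Q_PS = 200
SETUP_PS = 50

# ASSUMED netlist: net -> (gate type, input nets). q0..q2 are D-type outputs.
NETLIST = {
    "d0": ("INV", ["q0"]),        # bit 0 toggles every cycle
    "c1": ("AND", ["q0", "q1"]),  # carry into bit 2
    "d1": ("XOR", ["q1", "q0"]),
    "d2": ("XOR", ["q2", "c1"]),
}
FLOP_OUTPUTS = {"q0", "q1", "q2"}
FLOP_INPUTS = ["d0", "d1", "d2"]


@lru_cache(maxsize=None)
def arrival_ps(net: str) -> int:
    """Latest time after the clock edge at which a net settles."""
    if net in FLOP_OUTPUTS:
        return CLOCK_TO_Q_PS
    gate, inputs = NETLIST[net]
    # Static analysis: take the worst input, ignoring logic values.
    return max(arrival_ps(i) for i in inputs) + DELAY_PS[gate]


def min_period_ps() -> int:
    """Worst D-input arrival time plus the set-up time."""
    return max(arrival_ps(d) for d in FLOP_INPUTS) + SETUP_PS


def max_frequency_ghz() -> float:
    return 1000.0 / min_period_ps()
```

Because the analysis takes the maximum over all inputs regardless of the logic values involved, it also reports paths that can never be exercised, which is the source of the pessimism that part c) refers to.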

#### LG13.2. $\heartsuit$ : Memory Macrocell Generator (RAM Compiler).

a) What input parameters might we expect to give to a generator program that creates multi-ported SRAM memories for use in a System on Chip ? [5 Marks]

b) What output files might we expect from the memory generator program ? [5 Marks]

c) Sketch either a TLM-style or RTL-style simulation model in RTL or SystemC code for an SRAM memory with two read ports and one write port. [5 Marks]

d) What differences in terms of timing and contention might we see if a model of a memory subsystem is populated with TLM-style models of the RAMs compared with RTL-style models ? [5 Marks]

Bonus: What problems might there be if the simulation model from part c) were fed into a logic synthesiser for use on an actual ASIC or FPGA ?

## 3 LG14: Architectural Exploration and Design Partition

#### LG14.1. $\heartsuit$ : Design Partition

a) What are the major costs and risks in SoC development? [5 Marks]

b) What factors commonly influence the choice between using standard parts and an ASIC or SoC? [5 Marks]

c) What factors tend to make a hardware implementation preferable to a software implementation? Give an example of each approach. [5 Marks]

d) When is a standard processor preferable to a custom processor ? [5 Marks]

#### LG14.2. FPGA:

a) What are the principal differences between an FPGA and a masked ASIC for implementation of a SoC ? [5 Marks]

b) How can a SoC design team use FPGAs to prototype their product before SoC fabrication? [5 Marks]

c) When would it be sensible to ship an FPGA instead of a masked ASIC in production runs? [5 Marks]

#### LG14.3. Cost and Power

a) Summarise the historical trends that affect the relative merits of FPGA and custom silicon in consumer, professional and military mains-powered applications. [5 Marks]

b) How does the argument differ for battery-powered devices ? [5 Marks]

c) What are the main power consuming components in FPGA, embedded processors, custom silicon and programmable core silicon ? [5 Marks]

d) Discuss whether multi-core processor chips can/should take over from FPGA and custom silicon in various applications. Consider Picochip, XMOS and ARC if you are familiar with them. [5 Marks]

## 4 LG15: Power Estimation and Control

#### LG15.1. Dynamic Clock Gating.

a) What is dynamic clock gating and why is it used ? [4 Marks]

b) Compare coarse-grained manual and fine-grained automatic clock gating. [4 Marks]

c) Describe some common clock-gate insertion transformations. [6 Marks]

d) Compare dynamic clock gating with power isolation islands in terms of automation, scale and functionality. [6 Marks]

#### LG15.2. $\heartsuit$ : VLSI Energy Use

For this question, use the following figures and assume, or look up, values for any other information that you feel you need. Credit is awarded for the method and not for the numerical results.

| Parameter               | Value      | Unit         |
|-------------------------|------------|--------------|
| Drawn Gate Length       | 0.08       | $\mu$m       |
| Metal Layers            | 6 to 9     | layers       |
| Gate Density            | 400K       | gates/mm$^2$ |
| Track Width             | 0.25       | $\mu$m       |
| Track Spacing           | 0.25       | $\mu$m       |
| Gate Output Capacitance | 0.06       | fF           |
| Gate Input Capacitance  | 0.03       | fF           |
| Tracking Capacitance    | 1          | fF/mm        |
| Core Supply Voltage     | 0.9 to 1.4 | V            |
| FO4 Delay               | 51         | ps           |
| Leakage Current         | 21         | nA/gate      |

A processor core in the above technology uses 200k gates, excluding cache memories. It has two operating conditions: 100 MHz at 0.9 volts or 400 MHz at 1.4 volts. The average net activity ratio is negligible during halt and 0.3 when running.

Give all working and intermediate results. State any additional assumptions you need or use.

a) Estimate the area of the processor. [2 Marks]

b) Compute the power consumed per gate at each operating condition when driving tracks of 0 mm and 1 mm. [2 Marks]

c) Estimate the power consumption of the processor core when halted and running for each operating condition. [3 Marks]

d) Compared with having the processor running at full performance all the time, how much power is saved just by halting the processor when it is idle ? [2 Marks]

e) How much power is saved by dynamic frequency scaling? [2 Marks]

f) How does dynamic frequency scaling compare with halting ? [2 Marks]

g) How much power is saved by combined dynamic voltage and frequency scaling ? [2 Marks]

h) How much power might be saved by power gating (i.e. power isolation)? [2 Marks]

i) Estimate the relative costs of performing a 32-bit addition and sending the 32-bit result 1 mm over the chip. [3 Marks]
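Since credit is given for method, it may help to see how the figures in the table could be wired into a small calculator. The sketch below is one possible set-up, not the model answer, and every modelling choice in it is an assumption: dynamic power per gate is taken as activity × f × C × V² (some texts include a factor of one half), each gate is assumed to drive a fan-out of one unless told otherwise, and the leakage current is treated as independent of supply voltage. All function names are hypothetical.

```python
# Figures taken from the table and the paragraph that follows it.
GATES = 200_000
GATE_DENSITY_PER_MM2 = 400_000
C_OUT_F = 0.06e-15           # gate output capacitance, farads
C_IN_F = 0.03e-15            # gate input capacitance, farads
C_TRACK_F_PER_MM = 1e-15     # tracking capacitance, farads per mm
LEAKAGE_A_PER_GATE = 21e-9   # leakage current, amps per gate


def core_area_mm2() -> float:
    """Part a): gate count divided by gate density."""
    return GATES / GATE_DENSITY_PER_MM2


def net_capacitance_f(track_mm: float, fanout: int = 1) -> float:
    """ASSUMPTION: one driver output, `fanout` gate inputs, plus wiring."""
    return C_OUT_F + fanout * C_IN_F + track_mm * C_TRACK_F_PER_MM


def dynamic_power_per_gate_w(freq_hz: float, vdd: float, activity: float,
                             track_mm: float, fanout: int = 1) -> float:
    """ASSUMED convention: P = activity * f * C * V^2."""
    return activity * freq_hz * net_capacitance_f(track_mm, fanout) * vdd ** 2


def static_power_per_gate_w(vdd: float) -> float:
    """ASSUMPTION: leakage current does not vary with supply voltage."""
    return LEAKAGE_A_PER_GATE * vdd


def core_power_w(freq_hz: float, vdd: float, activity: float,
                 track_mm: float) -> float:
    """Dynamic plus static power summed over all gates."""
    per_gate = (dynamic_power_per_gate_w(freq_hz, vdd, activity, track_mm)
                + static_power_per_gate_w(vdd))
    return GATES * per_gate
```

Halting corresponds to an activity of zero, frequency scaling to changing `freq_hz` alone, and combined voltage and frequency scaling to changing both `freq_hz` and `vdd`. Comparing energy over a fixed workload, rather than instantaneous power, is left to the reader.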

#### LG15.3. Dynamic Voltage and Frequency Scaling

a) Give a formula for the power dissipation associated with a net on a silicon chip. [3 Marks]

b) What is meant by dynamic clock gating, and how does it compare with a technique where software writes to a control register to turn off a clock generator ? [3 Marks]

c) For a fixed supply voltage, quantify the power benefits of frequency scaling. In other words, compare computing quickly and halting with computing more-slowly and finishing just in time. [3 Marks]

d) Give two ways in which the supply voltage to a region may be varied. [3 Marks]

e) Using variable supply voltages, quantify the power benefits of frequency scaling. [3 Marks]

f) N!R Sketch the architecture of an ASIC (or part of) that uses all of these techniques. [5 Marks]

#### LG15.4. N!R : Power Consumption

a) What are the main components of power consumption in a laptop computer? [5 Marks]

b) How does clock frequency affect power consumption ? [5 Marks]

c) How might clock frequency be controlled in a laptop and for what reasons? [5 Marks]

d) When viewing a DVD (including moving video and audio) on a laptop, what is the best clock frequency policy? [5 Marks]

## Additional Material: Engineering and Physical Considerations

#### EP1. N!R: Delay and Power

a) It is necessary to send a one-bit value a distance of 11 mm over the surface of a silicon chip where the clock available is 300 MHz. Determine how many D-types should be used in the path of the signal, based on the maximum spacing in millimetres that they should have. State any assumptions made. [5 Marks]

b) Consider sending a 32-bit value the same distance over the same chip. Compare serial and parallel transmission of the data in terms of latency, throughput and power consumption. [15 Marks]

#### EP2. N!R: Logical Effort

NB: Detailed material to answer this question is unlikely to be lectured this year.

a) When sending a signal a long distance over a chip, compare using powerful drivers with a repeater arrangement that uses a larger number of less-powerful drivers. [5 Marks]

b) When building a multi-stage logic circuit, what arrangement gives least area? [5 Marks]

c) When building a multi-stage logic circuit, what arrangement gives least power? [5 Marks]

d) When building a multi-stage logic circuit, what arrangement gives lowest delay? [5 Marks]

#### EP3. N!R: Information Flux

Detailed material to answer this question may not have been lectured.

a) How many signal nets per square micron can be routed in a vertical plane in modern VLSI? [5 Marks]

b) How does the power required to drive a signal net vary with its planar density and length ? [5 Marks]

c) What is the maximum information flux feasible in a modern silicon chip? [5 Marks]

d) How might we use replicated computation to ameliorate this situation ? [5 Marks]

#### EP4. N!R: Technology/Scaling

a) What is meant by the term *feature size* in VLSI? Give typical values. [5 Marks]

b) What are the main consequences of moving to a smaller feature size in VLSI fabrication ? [5 Marks]

c) What happens to the relative costs of computation and communication as features get smaller ? [5 Marks]

d) Why has parallel computation become more important than ever before ? [5 Marks]

(C) 2008-11 DJ GREAVES. END OF DOCUMENT.