LG8 Notes: Engineering

90 Nanometer Gate Length.
Power Consumption
Dynamic Power Gating
Dynamic Frequency Scaling
Dynamic Voltage Scaling
Information Flux

90 Nanometer Gate Length.

The slide shows typical parameters from a 90 nanometer standard cell library. This figure refers to the length of the gate in the field effect transistors. The smaller this dimension, the faster the transistor can operate, but the more power it consumes as static leakage current. The 90 nm figure was the mainstream VLSI technology in the period 2004-2008, but the industry is now shifting to 45 or 35 nanometer technology.

Typical processor core: 200k gates + 4 RAMs: one square millimeter.

A typical SoC chip area is 50-100 mm^2 with 20-40 million gates. The actual gate and transistor count would be higher owing to custom blocks (mainly RAMs), which achieve a better density than standard cells.

Moore's Law has been followed for the last two decades, but have we now reached the Silicon End Point? That is, can we no longer make things smaller (at the same cost)? Modern workstation processors have certainly demonstrated a departure from the previous trend of ever rising clock frequencies: instead they have several cores.

Power Consumption

Power in Watts is voltage times current, or energy times frequency.

The current consumed by a chip is the sum of its static current and dynamic current. Static current is generated by the leakage through off transistors. In the past, for CMOS, static current was of no consequence, but with today's small transistors it can account for one third of a SoC's power consumption.

Dynamic current use is proportional to chip activity. We can get an accurate model of dynamic power by considering:

All energy in a net/gate is wasted each time it toggles.
The energy in a capacitor is E = CV^2/2.
Dominant capacitance is proportional to net length.
Gate input and output capacitance also contribute to C.

Some additional dynamic current is consumed as 'short-circuit current', which is the current consumed when both the P and N transistors are on at once during switching, but we ignore that here.

Activity ratio, a: the fraction of clock cycles (often quoted as a percentage) that see a transition. The net toggle rate = f x a, where f is the operating frequency of the chip.
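
The model above can be turned into a rough calculation. The following is a minimal sketch, assuming the activity ratio is given as a fraction rather than a percentage; the function names and the example figures (net count, capacitance, voltage, clock) are hypothetical and not taken from the slide.

```python
def net_dynamic_power(capacitance_f, vdd, clock_hz, activity):
    """Dynamic power (W) of one net: toggle rate times energy lost per toggle."""
    energy_per_toggle = 0.5 * capacitance_f * vdd ** 2  # E = C V^2 / 2
    toggle_rate = clock_hz * activity                    # f x a, toggles per second
    return toggle_rate * energy_per_toggle


def chip_dynamic_power(nets, vdd, clock_hz):
    """Sum the per-net power over (capacitance, activity) pairs."""
    return sum(net_dynamic_power(c, vdd, clock_hz, a) for c, a in nets)


# Hypothetical example: one million nets of 10 fF each, activity 0.10,
# 1.2 V supply, 200 MHz clock. Short-circuit current is ignored.
nets = [(10e-15, 0.10)] * 1_000_000
power_w = chip_dynamic_power(nets, vdd=1.2, clock_hz=200e6)
print(f"{power_w:.3f} W")  # prints "0.144 W"
```

Static leakage would be added on top of this figure as a separate term.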

Useful article: POWER MANAGEMENT IN CPU DESIGN.

The slide shows example power consumption for a circuit when clocked at different frequencies and voltages. The important thing to note is that the supply voltage must be sufficient for the clock frequency in use: too low a voltage means that signals do not arrive at D-type inputs in time to meet set-up times.

Compare 1.35V to 1.8V: twice the power and twice the clock frequency.

In the past, chips were often core-bound or pad-bound. Pad-bound meant that the chip had too many I/O signals for its core logic area: the number of I/Os puts a lower bound on the perimeter of the chip.

Today's VLSI technology allows I/O pads in the middle of the chip and designs are commonly power-bound.

1 W/cm^2 can be dissipated from a plastic package.
2-4 W/cm^2 requires a heat sink.
More than 8 W/cm^2 requires forced cooling.
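
As a rough illustration of how these figures might be applied, the sketch below converts a chip's power and area into a power density and picks a cooling category. The function names are hypothetical, and the way the gaps between the quoted ranges (1-2 and 4-8 W/cm^2) are assigned is an assumption, not something the figures above specify.

```python
def power_density(power_w, area_mm2):
    """Power density in W/cm^2 (100 mm^2 = 1 cm^2)."""
    return power_w / (area_mm2 / 100.0)


def cooling_needed(density_w_per_cm2):
    """Map a power density onto the three cooling categories quoted above."""
    # Assumption: anything above 1 and up to 8 W/cm^2 is treated as needing
    # a heat sink, which fills the gaps left by the quoted ranges.
    if density_w_per_cm2 <= 1.0:
        return "plastic package"
    if density_w_per_cm2 <= 8.0:
        return "heat sink"
    return "forced cooling"


# Hypothetical example: a 2 W SoC on a 50 mm^2 die is 4 W/cm^2.
print(cooling_needed(power_density(2.0, 50.0)))  # prints "heat sink"
```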

Workstation microprocessors dissipate tens of Watts: hence cooling fans.

Dynamic Power Gating

Previously we looked at dynamic clock gating, but we can also turn off the power supply to regions of a chip, albeit with coarser granularity. We use power gating cells in series with the supply rails.

Use signal isolation and retention cells (t-latches) on nets that cross in and out of the region. There is no register or RAM data retention in a block while the power is off. This technique is most suitable for complete sub-systems of a chip that are not in use on a particular product, or that are unused for quite a long time, such as a Bluetooth transceiver or audio input ADC.

Generally, power off/on is controlled by software or by top-level input pads to the SoC. It requires some sequencing to activate the enables to the retention cells in the correct order, and hence several clock cycles or more are needed to power a region up or down.
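
The sequencing requirement can be pictured with a toy model. This is only a sketch: the class, the signal names, the particular ordering (isolate, then retain, then switch off, and the mirror image on power-up) and the one-enable-per-clock-cycle cost are all assumptions made for illustration. A real design takes its ordering and settling delays from the cell library and power controller specification.

```python
class PowerRegion:
    """Toy model of one power-gated region (all signal names hypothetical)."""

    def __init__(self):
        self.powered = True      # power gating cells conducting
        self.isolated = False    # isolation cells clamping boundary nets
        self.retaining = False   # retention cells holding boundary values
        self.cycles = 0          # clock cycles spent on sequencing so far

    def _step(self, name, value):
        setattr(self, name, value)
        self.cycles += 1         # assume one enable changes per clock cycle

    def power_down(self):
        self._step("isolated", True)    # clamp nets leaving the region
        self._step("retaining", True)   # latch the boundary values
        self._step("powered", False)    # finally open the power switches

    def power_up(self):
        self._step("powered", True)     # restore the supply first
        self._step("retaining", False)  # release the retention cells
        self._step("isolated", False)   # reconnect the boundary nets last


region = PowerRegion()
region.power_down()
region.power_up()
print(region.cycles)  # prints 6
```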

A common practice is to power off a whole chip except for one or two RAMs and register files. This was particularly common before FLASH memory was invented, when a small battery was used to retain contents using a lower supply (the CMOS RAM data holding voltage). Today, most mobile phones and PC motherboards have a second, tiny battery that maintains a small amount of logic when the main power is off or the battery is removed. This can run the real-time clock (RTC) as well.

Dynamic Frequency Scaling

Consider adjusting the clock frequency (while keeping VCC constant for now). What does this achieve? A fixed task will take longer to complete. If the processor is to halt at the end of the task, it will spend less time halted. If the main clock tree keeps going while halted, yet most of the chip uses local clock gating, then we do save some power, in that fewer useless clock cycles are executed by the main clock tree.
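
To make the saving concrete, the sketch below counts the clock-tree cycles that tick while the processor is halted, for a fixed task repeated once per period. The function name and the example numbers are hypothetical. At constant VCC the task itself costs the same number of cycles at either frequency, so only the idle cycles differ.

```python
def idle_clock_cycles(task_cycles, clock_hz, period_s):
    """Main clock-tree cycles that elapse while the processor is halted."""
    idle = clock_hz * period_s - task_cycles
    if idle < 0:
        raise ValueError("clock too slow to finish the task within the period")
    return idle


# Hypothetical example: a task of one million cycles, repeated every 10 ms.
fast = idle_clock_cycles(1_000_000, 200e6, 0.01)  # about 1,000,000 idle cycles
slow = idle_clock_cycles(1_000_000, 100e6, 0.01)  # about 0 idle cycles
print(fast, slow)
```

Halving the clock in this example removes essentially all of the useless clock-tree cycles, at the cost of the task finishing only just in time.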

This sort of frequency scaling can be software controlled: software updates the PLL division ratio. The PLL has inertia (e.g. 1 millisecond), but this is similar to the rate at which an operating system services interrupts, and hence the clock frequency to a system can be ramped up as load arrives. This is how most laptops now work.

Let's compare with dynamic clock gating: the table in the slide shows the main differences, but the most important difference is that we can reduce the supply voltage if we have reduced the clock frequency.

Dynamic Voltage Scaling

Looking at the derating graph for the standard cell libraries, we see that in the operating region, the frequency/voltage curve is roughly linear.

Logic with higher-speed capabilities is smaller, which means it has a greater leakage current, and that current is wasted even while we are halted. CMOS delay is inversely proportional to supply voltage.

If we vary the voltage to a region dynamically, a higher supply voltage uses more power (square law) but allows a higher f. Overall: power will then have cubic dependence on f.
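
The cubic dependence follows from combining the square law with the roughly linear frequency/voltage relationship. The sketch below is an idealisation: it assumes VCC scales in direct proportion to f (the real curve is only roughly linear and does not pass through zero) and it ignores static leakage. The function names are hypothetical.

```python
def relative_power(freq_ratio):
    """Dynamic power relative to nominal when f is scaled by freq_ratio."""
    voltage_ratio = freq_ratio               # assumed: VCC tracks f proportionally
    return freq_ratio * voltage_ratio ** 2   # P is proportional to f * V^2


def relative_energy_per_task(freq_ratio):
    """Energy for a fixed task, which takes 1/freq_ratio times as long."""
    return relative_power(freq_ratio) / freq_ratio


print(relative_power(0.5))            # prints 0.125: half the clock, one eighth the power
print(relative_energy_per_task(0.5))  # prints 0.25: the task itself costs a quarter
```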

So the main method of operation is:

Adjust f for just-in-time completion (e.g. in time to decode the next frame of a real-time video),
Only raise VCC when we ramp up f.

Hence we still obtain peak performance under heavy loads, but avoid the cubic overhead when idle. We adjust VCC so that, at all times, the logic just works. However, we need to keep close track of whether we are meeting real-time deadlines.
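
A governor following this method might look like the sketch below. The table of operating points is hypothetical (real pairs come from the cell library's derating data), as are the function names. The ordering in transition() reflects the earlier requirement that the supply voltage must always be sufficient for the clock frequency in use.

```python
# Hypothetical operating points: (clock in Hz, minimum safe VCC in volts),
# sorted by increasing frequency.
OPERATING_POINTS = [
    (50e6, 0.9),
    (100e6, 1.0),
    (200e6, 1.2),
    (400e6, 1.5),
]


def choose_operating_point(work_cycles, deadline_s, points=OPERATING_POINTS):
    """Return the slowest (f, VCC) pair that finishes the work by the deadline."""
    for freq_hz, vcc in points:
        if work_cycles / freq_hz <= deadline_s:
            return freq_hz, vcc
    return points[-1]  # deadline cannot be met: run flat out


def transition(current, target):
    """Order the changes so VCC is always sufficient for the clock in use."""
    (f_now, _), (f_new, v_new) = current, target
    if f_new > f_now:
        return [("set_vcc", v_new), ("set_freq", f_new)]  # raise VCC first
    return [("set_freq", f_new), ("set_vcc", v_new)]      # lower f first


# Hypothetical example: 3 million cycles of decoding due within a 40 ms frame.
target = choose_operating_point(3e6, 0.04)
print(target)                           # prints (100000000.0, 1.0)
print(transition((50e6, 0.9), target))  # VCC is raised before the clock
```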

So a typical SoC uses not only dynamic clock gating, but also manual and automatic frequency and voltage variation. Power isolation is used on a longer time scale.

Combinational logic (e.g. a PAL or PLA) cannot be clock gated, since it has no clock. For large combinational blocks, one approach is to dip the power supply to reduce static current when the block is completely idle (detected with XORs); the register state must still be retained.
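
The XOR-based idle detection can be modelled in software as below. This is a behavioural sketch only: the class name, the representation of the block's inputs as an integer word, and the threshold of quiet cycles before the supply is dipped are all hypothetical.

```python
class IdleDetector:
    """Flags a combinational block as idle when its inputs stop changing."""

    def __init__(self, idle_threshold):
        self.idle_threshold = idle_threshold  # quiet cycles required (hypothetical)
        self.prev_inputs = None
        self.quiet_cycles = 0

    def clock(self, inputs):
        """Call once per clock with the input word; returns True when idle."""
        # XOR of the current and previous input words is non-zero if any bit changed.
        changed = self.prev_inputs is None or (inputs ^ self.prev_inputs) != 0
        self.prev_inputs = inputs
        self.quiet_cycles = 0 if changed else self.quiet_cycles + 1
        return self.quiet_cycles >= self.idle_threshold


detector = IdleDetector(idle_threshold=3)
print([detector.clock(x) for x in [5, 5, 5, 5, 7, 7]])
# prints [False, False, False, True, False, False]
```

In hardware the same comparison would be a bank of XOR gates between the inputs and a registered copy of them, feeding a small counter.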

Information Flux

Not lectured in 2008/9.