The slide shows typical parameters from a 90 nanometer standard cell library. This figure refers to the width of the gate in the field effect transistors. The smaller this width, the faster the transistor can operate, but it will also consume more power as static leakage current. The 90 nm figure was the mainstream VLSI technology in the period 2004-2008, but the industry is now shifting to 45 or 35 nanometer technology.
Typical processor core: 200k gates + 4 RAMs: one square millimeter.
A typical SoC chip area is 50-100 mm^2 with 20-40 million gates. The actual gate and transistor counts would be higher owing to custom blocks (mainly RAMs) that achieve a better density than standard cells.
Moore's Law has been followed
for the last two decades, but have we now reached the Silicon End Point? That is,
can we no longer make things smaller (at the same cost)? Modern workstation processors
have certainly demonstrated a departure from the previous trend of ever rising clock
frequencies: instead they have several cores.
Power in Watts is voltage times current, or energy times frequency.
The current consumed by a chip is the sum of its static current
and dynamic current. Static current is generated by the leakage
through off transistors. In the past, for CMOS, static current
was of no consequence, but with today's small transistors it
can account for one third of a SoC's power consumption.
Dynamic current use is proportional to chip activity. We can get an accurate model of
dynamic power by considering:
Some additional dynamic current is consumed as 'short-circuit current', which is the current
consumed when both the P and N transistors are on at once during switching, but we ignore
that here.
Activity ratio, a, is the percentage of clock cycles that see a transition.
The net toggle rate is the operating frequency of the chip, f, multiplied by a.
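As a rough illustration of the toggle-rate idea, the sketch below estimates the dynamic power of a net from its capacitance, the supply voltage, f and a. It relies on the standard result that each transition dissipates half C V squared; the function name and all numerical values are assumptions chosen for illustration, and a is expressed as a fraction between 0 and 1 rather than as a percentage.

```python
def dynamic_power_watts(capacitance_farads, supply_volts, clock_hz, activity_ratio):
    """Average dynamic power of a net (short-circuit current ignored).

    Each transition dissipates 0.5 * C * V^2 joules, and the net toggles
    clock_hz * activity_ratio times per second.
    """
    toggle_rate = clock_hz * activity_ratio
    energy_per_transition = 0.5 * capacitance_farads * supply_volts ** 2
    return toggle_rate * energy_per_transition


# Assumed example figures: 1 nF of total switched capacitance, a 1.2 V
# supply, a 200 MHz clock and an activity ratio of 0.25.
power = dynamic_power_watts(1e-9, 1.2, 200e6, 0.25)
print(f"{power * 1e3:.1f} mW")
```

Summing this over all nets, each with its own capacitance and activity ratio, gives an estimate for a whole block.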
Useful article: POWER MANAGEMENT IN CPU DESIGN.
The slide shows example power consumption for a circuit when
clocked at different frequencies and voltages. The important
thing to note is that the supply voltage must be sufficient
for the clock frequency in use: too low a voltage means
that signals do not arrive at D-type inputs in time to meet
set-up times.
Compare 1.35V to 1.8V: twice the power and twice the clock frequency.
In the past, chips were often core-bound or pad-bound. Pad-bound meant
that the chip had too many I/O signals for its core logic area: the number
of I/Os puts a lower bound on the perimeter of the chip.
Today's VLSI technology allows I/O pads in the middle of the chip and
designs are commonly power-bound.
Workstation microprocessors dissipate tens of Watts: hence cooling fans.
Previously we looked at dynamic clock gating, but we can also turn off the power supply to regions of a chip, albeit at a
coarser grain. We use power gating cells in series with the supply rails.
Use signal isolation and retention cells (t-latches) on nets that cross in and out of the region.
There is no register and RAM data retention in a block while the power is off. This technique is
most suitable for complete sub-systems of a chip, that are not in use on a particular
product or for quite a long time, such as a Bluetooth transceiver or audio input ADC.
Generally, power off/on is controlled by software or by top-level input pads to the SoC.
It requires some sequencing to activate the enables to the retention cells in the correct order
and hence several clock cycles or more are needed to power up/down a region.
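The sequencing can be sketched as an ordered list of control steps, one per clock cycle. The signal names and the particular ordering below are assumptions for illustration only; a real design would follow the requirements of its own power gating, isolation and retention cells.

```python
# Hypothetical control signals for one power-gated region.
POWER_DOWN_SEQUENCE = [
    ("clock_enable", 0),   # stop the region's clock
    ("isolate", 1),        # clamp nets that cross out of the region
    ("retain", 1),         # retention cells hold their values
    ("power_switch", 0),   # open the power gating cells
]


def power_up_sequence(down_sequence):
    """Undo the power-down steps in the reverse order."""
    return [(signal, 1 - value) for signal, value in reversed(down_sequence)]


def run(sequence, state):
    """Apply one step per clock cycle and return the number of cycles used."""
    cycles = 0
    for signal, value in sequence:
        state[signal] = value
        cycles += 1
    return cycles


state = {"clock_enable": 1, "isolate": 0, "retain": 0, "power_switch": 1}
down_cycles = run(POWER_DOWN_SEQUENCE, state)
up_cycles = run(power_up_sequence(POWER_DOWN_SEQUENCE), state)
print(down_cycles, up_cycles, state)
```

In practice the power-up direction would also wait for the supply rail to settle before releasing isolation, which adds further cycles.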
A common practice is to power off a whole chip except for one or two RAMs and register files.
This was particularly common before FLASH memory was invented, when
a small battery was used to retain contents using a lower supply (the CMOS RAM data holding voltage).
Today, most mobile phones and PC motherboards have a second, tiny battery that maintains
a small amount of logic when the main power is off or the battery is removed. This can run
the real-time clock (RTC) as well.
Consider adjusting the clock frequency (while keeping VCC constant for now).
What does this achieve? For a fixed task, it will take longer to complete.
If the processor is to halt at the end of the task, it will spend less
time halted. If the main clock tree keeps going while halted, yet most
of the chip uses local clock gating, then we do save some power in that
fewer useless clock cycles are executed by the main clock tree.
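A minimal model of this saving, assuming the gated logic consumes energy only on the cycles the task needs, while the main clock tree consumes a fixed energy on every clock cycle whether halted or not. The function name and the per-cycle energies are illustrative assumptions, not library figures.

```python
def energy_per_period(clock_hz, task_cycles, period_s,
                      logic_j_per_cycle, tree_j_per_cycle):
    """Energy used in one period in which a fixed task runs and then halts."""
    tree_cycles = clock_hz * period_s
    if task_cycles > tree_cycles:
        raise ValueError("clock too slow: task does not finish within the period")
    # Gated logic only burns energy while the task runs; the main clock
    # tree burns energy on every cycle of the period.
    return task_cycles * logic_j_per_cycle + tree_cycles * tree_j_per_cycle


# Assumed figures: a 1M-cycle task every 10 ms, 1 nJ per cycle in the
# logic and 0.2 nJ per cycle in the main clock tree.
fast = energy_per_period(400e6, 1e6, 0.01, 1e-9, 0.2e-9)
slow = energy_per_period(100e6, 1e6, 0.01, 1e-9, 0.2e-9)
print(f"400 MHz: {fast * 1e3:.2f} mJ, 100 MHz: {slow * 1e3:.2f} mJ")
```

The energy spent in the logic is the same in both cases; only the clock tree term shrinks with the lower frequency.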
This sort of frequency scaling can be software controlled by updating the PLL division ratio.
The PLL has inertia: e.g. 1 millisecond, but this is similar to the rate at which
an operating system services interrupts, and hence the clock frequency to a system can
be ramped up as load arrives. This is how most laptops now work.
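A software policy of this kind might look like the following sketch, which would run on each timer interrupt. The PLL frequency, the available division ratios and the utilisation target are all assumed values, and choose_divider is a hypothetical helper rather than part of any real operating system interface.

```python
PLL_HZ = 800e6              # assumed PLL output frequency
DIVIDERS = (8, 4, 2, 1)     # assumed division ratios, slowest clock first


def choose_divider(busy_fraction, current_divider, target=0.7):
    """Pick the slowest clock that keeps utilisation at or below target.

    busy_fraction is the fraction of the last interval during which the
    processor was not halted, measured at current_divider.
    """
    demand_hz = busy_fraction * PLL_HZ / current_divider
    for divider in DIVIDERS:
        if demand_hz <= target * PLL_HZ / divider:
            return divider
    return DIVIDERS[-1]      # saturated: run at full speed


print(choose_divider(0.05, 1))   # light load measured at full speed
print(choose_divider(0.95, 8))   # nearly saturated at the slowest clock
```

The target below 1.0 leaves headroom so that a burst of load arriving between interrupts can still be served while the PLL settles.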
Let's compare with dynamic clock gating: the table in the slide shows the main differences, but
the most important difference is that we can reduce the supply voltage if we have reduced the
clock frequency.
Looking at the derating graph for the standard cell libraries, we
see that in the operating region, the frequency/voltage curve is roughly linear.
Logic with higher-speed capabilities is smaller which means it
consumes greater leakage current which is wasted even while we are
halted. CMOS delay is inversely proportional to supply voltage.
If we vary the voltage to a region dynamically, a higher
supply voltage uses more power (square law) but allows a higher f.
Overall: power will then have cubic dependence on f.
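The cubic dependence can be checked numerically with a small sketch. It assumes dynamic power is proportional to V squared times f, that the supply voltage is scaled in direct proportion to f (the roughly linear region of the derating graph), and it ignores leakage; the function names are illustrative.

```python
def relative_power(f_ratio):
    """Dynamic power relative to full speed, with V scaled in proportion to f."""
    v_ratio = f_ratio                 # assumed linear voltage/frequency relation
    return v_ratio ** 2 * f_ratio     # P is proportional to V^2 * f


def relative_task_energy(f_ratio):
    """Energy for a fixed number of cycles: power times time, time scales as 1/f."""
    return relative_power(f_ratio) / f_ratio


for ratio in (1.0, 0.5, 0.25):
    print(ratio, relative_power(ratio), relative_task_energy(ratio))
```

Under these assumptions, halving the clock frequency cuts power to one eighth and the energy for a fixed task to one quarter.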
So the main principle of operation is:
Method:
So a typical SoC uses not only dynamic clock gating, but also manual and automatic
frequency and voltage variation. Power isolation is used over longer timescales.
Combinational logic (e.g. PAL and PLA) cannot be clock gated since it has no clock.
For large combinational blocks, one approach is to dip the power supply to reduce static current
when the block is completely idle (detected with XORs); the dipped supply must still be sufficient to retain register state.
Not lectured in 2008/9.
Power Consumption
Dynamic Power Gating
Dynamic Frequency Scaling
Dynamic Voltage Scaling
Hence, we still obtain peak performance under heavy loads: avoid cubic overhead when idle.
But we adjust VCC so that, at all times, the logic just works. However, we need to keep close
track of whether we are meeting real-time deadlines.
Information Flux
END (C) 2009 DJ GREAVES.