
FPGA

An FPGA (field-programmable gate array) consists of an array of configurable logic blocks (CLBs), as shown in the FPGA floorplan figure. Not shown is the substantial hidden logic used only for programming the device; some pins are also dedicated to programming. FPGAs of this kind have been popular since about 1990.

Each CLB, or slice, typically contains two or four flip-flops and has a few general-purpose inputs (five are shown), some special-purpose inputs (only a clock is shown) and two outputs. The illustrated CLB is of the look-up table (LUT) type: the logic inputs form an address into a small, pre-configured RAM that holds the truth table of the desired logic function. For five inputs and one output, a 32 by 1 SRAM is needed, since 2^5 = 32. Some FPGA families now give the designer write access to this SRAM, greatly increasing the amount of storage available to the design. However, this is an expensive way to buy memory.
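The look-up table principle can be sketched in a few lines of Python. This is a behavioural model only, assuming that bit i of the SRAM address comes from logic input i; the names make_lut and lut_eval are hypothetical and are not part of any vendor tool.

```python
def make_lut(func, n_inputs=5):
    """Compute the configuration bits (truth table) for an n-input function.
    Entry `addr` holds func evaluated with input i = bit i of addr."""
    return [func(*[(addr >> i) & 1 for i in range(n_inputs)]) & 1
            for addr in range(2 ** n_inputs)]


def lut_eval(config, inputs):
    """Evaluate the LUT: the logic inputs simply index the configuration memory."""
    addr = 0
    for i, bit in enumerate(inputs):
        addr |= (bit & 1) << i
    return config[addr]


# Configure the LUT as a 5-input majority gate.
majority = make_lut(lambda a, b, c, d, e: int(a + b + c + d + e >= 3))
print(len(majority))                        # 32 configuration bits (32 x 1 SRAM)
print(lut_eval(majority, [1, 1, 0, 1, 0]))  # 1
print(lut_eval(majority, [1, 0, 0, 1, 0]))  # 0
```

Reconfiguring the LUT for a different function changes only the 32 stored bits; the evaluation path is identical.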

FPGAs soon also came to contain RAM blocks (called block RAM, or BRAM) and multiplier blocks, known as DSP (digital signal processing) blocks. The design tools deploy BRAM and DSP blocks automatically when they recognise specific coding patterns in the user's RTL, so the RTL must be written appropriately for this to happen.
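As an illustration of why coding style matters, the Python sketch below models the behaviour that BRAM provides: a registered (synchronous) read, so that read data appears one clock edge after the address is presented. It assumes read-first behaviour on a write cycle, and the class SyncRAM is hypothetical. A memory described in RTL with a combinational (asynchronous) read does not match this behaviour and so, as a rule, cannot be mapped onto BRAM; the vendor's synthesis guide gives the exact templates the tools recognise.

```python
class SyncRAM:
    """Cycle-level sketch of a single-port RAM with a registered read port."""

    def __init__(self, depth, width):
        self.mem = [0] * depth
        self.mask = (1 << width) - 1
        self.dout = 0  # output register: changes only on a clock edge

    def clock_edge(self, addr, we=False, din=0):
        """Apply one rising clock edge and return the new registered output."""
        self.dout = self.mem[addr]            # read-first: old contents are captured
        if we:
            self.mem[addr] = din & self.mask  # write takes effect at the same edge
        return self.dout


ram = SyncRAM(depth=16, width=8)
print(ram.clock_edge(3, we=True, din=0x55))  # 0: read-first returns the old value
print(ram.clock_edge(3))                     # 85 (0x55), one edge after the write
```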

Today's FPGAs also contain many other 'hard-IP' blocks, such as PCIe, Ethernet and USB controllers, which must be instantiated manually as structural components in the RTL.

All CLBs within an FPGA generally have the same structure, but FPGA families are available with lower- and higher-functionality CLBs. The best CLB size is not yet clear. Some FPGA designs have a hierarchy of CLB interconnection patterns, giving clusters of CLBs within clusters. Most designs provide dedicated fast paths for adder carry chains and for multiplier structures.
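The fast carry path can be sketched as follows. The Python model assumes a carry structure of the kind used in Xilinx families: each bit's LUT forms the propagate signal p = a XOR b, a dedicated multiplexer passes the incoming carry when p is 1 and otherwise passes operand bit a (the two operand bits are then equal, so a is the generate or kill value), and a dedicated XOR gate forms the sum bit. The function name carry_chain_add is hypothetical.

```python
def carry_chain_add(a, b, width, cin=0):
    """Add two unsigned width-bit numbers bit by bit, as a carry chain does.
    Returns (sum modulo 2**width, carry out)."""
    carry = cin
    total = 0
    for i in range(width):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        p = ai ^ bi                  # propagate, computed in the LUT
        total |= (p ^ carry) << i    # sum bit from the dedicated XOR
        carry = carry if p else ai   # dedicated carry mux: propagate, or generate/kill
    return total, carry


print(carry_chain_add(200, 100, 8))  # (44, 1) because 300 = 256 + 44
```

Because the carry passes through only one dedicated multiplexer per bit, rather than through general-purpose routing, the ripple delay stays small even for wide adders.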

To the user, an FPGA is very like a mask-programmed gate array: the design flow and CAD tools are virtually identical. However, the expenditure before the designer has the first device in her hands might be 10,000 times lower. Each further device used to cost at least 10 times more than a mask-programmed part, owing to the programming cost and the die area wasted on programming circuitry. However, modern mask costs make the ASIC and the mask-programmed gate array unattractive except at very high volumes.
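The trade-off can be made concrete with a simple break-even calculation. All figures below are hypothetical, chosen only to match the ratios quoted above (an up-front cost 10,000 times lower for the FPGA and a unit cost 10 times higher); real costs vary widely with process node and device.

```python
def total_cost(nre, unit_cost, volume):
    """Up-front (non-recurring) cost plus per-device cost times volume."""
    return nre + unit_cost * volume


def break_even_volume(nre_asic, unit_asic, nre_fpga, unit_fpga):
    """Volume at which the ASIC and FPGA routes cost the same.
    Solves nre_asic + n*unit_asic == nre_fpga + n*unit_fpga for n."""
    return (nre_asic - nre_fpga) / (unit_fpga - unit_asic)


# Hypothetical figures: ASIC masks and tooling 2,000,000; FPGA start-up 200.
NRE_ASIC, NRE_FPGA = 2_000_000, 200
UNIT_ASIC, UNIT_FPGA = 5, 50

n = break_even_volume(NRE_ASIC, UNIT_ASIC, NRE_FPGA, UNIT_FPGA)
print(n)  # 44440.0
print(total_cost(NRE_FPGA, UNIT_FPGA, 10_000) < total_cost(NRE_ASIC, UNIT_ASIC, 10_000))  # True
```

Under these assumptions the FPGA is cheaper below roughly 44,000 units and the ASIC above; a higher mask cost pushes the break-even volume up.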

FPGAs tend to be slow, achieving perhaps one third of the clock frequency of a mask-programmed ASIC, owing to their larger die area and because signals pass through hidden logic used only for configuration.

Figure: So-called DSP block in Xilinx Virtex-7 ((C) Xilinx Inc.).

The Xilinx DSP block consists mainly of a multiplier followed by a 48-bit adder for accumulating results, so the block delivers a 48-bit result. The output of one block has a dedicated high-performance cascade connection to its neighbour.

The multiplier operands are in two's complement form and are 25 and 18 bits wide, so the full product needs at most 43 bits.
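A behavioural sketch of the multiply-accumulate path in Python follows. It models only the arithmetic (a 25 x 18 two's complement multiply whose product is sign-extended and added into a 48-bit accumulator that wraps on overflow) and ignores the pipeline registers, pre-adder and other operating modes of the real block; the function names are hypothetical.

```python
A_BITS, B_BITS, P_BITS = 25, 18, 48


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def fits_signed(value, bits):
    """True if value is representable in `bits`-bit two's complement."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def dsp_mac(a, b, acc):
    """One multiply-accumulate step: returns (acc + a*b) wrapped to 48 bits."""
    if not (fits_signed(a, A_BITS) and fits_signed(b, B_BITS)):
        raise ValueError("operands must fit 25-bit and 18-bit two's complement")
    product = a * b  # always fits in 25 + 18 = 43 bits
    return to_signed(acc + product, P_BITS)


# Dot product of two short vectors, one MAC step per element.
xs = [3, -1000, 2**24 - 1]
ys = [-7, 250, 2]
acc = 0
for x, y in zip(xs, ys):
    acc = dsp_mac(x, y, acc)
print(acc)  # 33304409
```

The largest-magnitude product, (-2^24) x (-2^17) = 2^41, still fits in 43 signed bits, which leaves 5 bits of headroom in the 48-bit accumulator before wrap-around.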

Exercise: How many DSP blocks are needed for a 32x32 multiplier? What is its latency? What difference does it make if only 32 bits of the result are needed?