
Functional Unit (FU) Chaining

Naively instantiating standard FUs can be wasteful of performance, precision and silicon area.

Generally, if the output of one FU is fed directly to another, some optimisation is possible. Many sensible optimisations involve changes of state encoding or of algorithm that are beyond the scope of the back-end logic synthesiser.
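As a minimal illustration of a chaining change that alters the numerics, and so cannot be left to a logic synthesiser (which preserves the behaviour it is given), the following bit-accurate Python model compares a multiplier and adder used as separate standard FUs against the same pair chained together. The Q1.15 format and all function names are hypothetical choices made for this sketch.

```python
FRAC_BITS = 15  # hypothetical Q1.15 fixed-point format, as held in 16-bit registers

def mul_q15(a: int, b: int) -> int:
    """Stand-alone multiplier FU: its Q2.30 product is truncated back to
    Q1.15 so that it fits a standard 16-bit output port."""
    return (a * b) >> FRAC_BITS

def dot_separate(xs, ys):
    """Multiplier and adder as separate standard FUs: every product is
    truncated before it reaches the adder."""
    acc = 0
    for x, y in zip(xs, ys):
        acc += mul_q15(x, y)
    return acc

def dot_chained(xs, ys):
    """Multiplier chained straight into a wider adder: products are summed
    at full width and truncated once, when the result leaves the chain."""
    acc = 0
    for x, y in zip(xs, ys):
        acc += x * y
    return acc >> FRAC_BITS

xs = [3] * 4          # raw Q1.15 values (3 LSBs each)
ys = [0x4000] * 4     # 0x4000 represents 0.5 in Q1.15
print(dot_separate(xs, ys))  # 4: half an LSB is lost on each product
print(dot_chained(xs, ys))   # 6: the exact answer, 4 * 3 * 0.5
```

A hardware chain would also need an accumulator with a few guard bits (about log2 of the vector length) above the full product width to avoid overflow; Python's unbounded integers hide that detail.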

A common example is an associative reduction operator, such as the floating-point additions in a scalar product. There, we do not wish to denormalise the operands and then round and renormalise the result at every addition.

For example, in `When FPGAs are better at floating-point than microprocessors' (Dinechin et al. 2007), it is shown that a fixed-point adder wider than the normal mantissa precision can reduce or eliminate underflow errors while using less energy and fewer clock cycles.

Fixed-Point Accumulator Proposal.

Their approach is to denormalise the mantissa of each incoming term on every iteration and to renormalise only once, at the end, when the result is needed.
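The scheme can be sketched in Python as follows. This is only a model of the idea under stated assumptions, not the accumulator design from the paper: the accumulator width (128 bits) and LSB weight (2**-80) are hypothetical parameters, and inputs with bits below the LSB are simply truncated. Each floating-point term is shifted into a wide integer (denormalised), added exactly, and the total is converted back to floating point (renormalised) once.

```python
import math

LSB_EXP = -80     # hypothetical: the accumulator LSB weighs 2**-80
ACC_WIDTH = 128   # hypothetical two's-complement accumulator width in bits

def to_fixed(x: float) -> int:
    """Denormalise on input: align the float's mantissa to the accumulator LSB.
    Any bits weighing less than 2**LSB_EXP are truncated toward zero."""
    return int(math.ldexp(x, -LSB_EXP))

def fixed_point_sum(terms) -> float:
    limit = 1 << (ACC_WIDTH - 1)
    acc = 0
    for t in terms:
        acc += to_fixed(t)  # exact integer addition: no rounding per iteration
        if not -limit <= acc < limit:
            raise OverflowError("accumulator range exceeded")
    # Renormalise once: a single correctly rounded conversion back to float.
    return acc / (1 << -LSB_EXP)

def float_sum(terms) -> float:
    """Reference: an ordinary floating-point adder that rounds every addition."""
    s = 0.0
    for t in terms:
        s += t
    return s

terms = [1.0] + [2.0 ** -53] * 1024
print(float_sum(terms) == 1.0)                      # True: every tiny term is rounded away
print(fixed_point_sum(terms) == 1.0 + 2.0 ** -43)   # True: the exact sum survives
```

Each 2**-53 term is exactly half a unit in the last place of 1.0, so the round-to-nearest-even floating-point adder discards it every time, whereas the wide fixed-point accumulator keeps all of them and rounds only at the end.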

Even a `running-average' computation is generally used in decimated form (i.e. only every 10th or so result is examined), so renormalisation is needed only for the results that are actually used.
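A minimal Python sketch of a decimated sliding-window average in the same style is given below. The window length, decimation factor and LSB weight are hypothetical parameters chosen for illustration. The window sum is kept in a fixed-point integer, so adding the newest sample and subtracting the oldest is exact, and conversion back to floating point happens only for the outputs that are read.

```python
import math
from collections import deque

def decimated_running_average(samples, window=8, decimate=10, lsb_exp=-64):
    """Yield (index, mean) for every `decimate`-th sample once the window is full.

    The window sum is held in a fixed-point integer whose LSB weighs
    2**lsb_exp; renormalisation back to float is done only for the
    outputs that are actually yielded."""
    acc = 0
    recent = deque()
    for i, x in enumerate(samples):
        fx = int(math.ldexp(x, -lsb_exp))  # denormalise on input (truncates toward zero)
        acc += fx
        recent.append(fx)
        if len(recent) > window:
            acc -= recent.popleft()        # exact integer subtract, so the sum cannot drift
        if len(recent) == window and i % decimate == decimate - 1:
            # renormalise: one correctly rounded division back to float
            yield i, acc / (window << -lsb_exp)

samples = [float(i) for i in range(30)]
print(list(decimated_running_average(samples, window=4, decimate=10)))
# [(9, 7.5), (19, 17.5), (29, 27.5)]
```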


(C) 2012-18, DJ Greaves, University of Cambridge, Computer Laboratory.