Behavioural algorithm:
while (1) { wait (Start); RA=A; RB=B; RC=0; while(RA>0) { if odd(RA) RC=RC+RB; RA = RA >> 1; RB = RB << 1; } Ready = 1; wait(!Start); Ready = 0; } (Either HLS or hand coding can give the illustrateddatapath and sequencer structure:) | ![]() |
It is certainly not fully-pipelined, it goes busy for many cycles, depening on the log of the A input. The illustration shows a common design pattern consisting of a datapath and a sequencer. Booth's algorithm (see additional material) is faster, still using one adder but needing half the clock ticks.
Exercise: Write out the complete design, including sequencer, for the above multiplier, or that of Booth, or a long division unit, in Verilog or SystemC.
You may wish to dynamically swap the inputs over according to which has the lowest most-significant bit set.
For today's VLSI, for 32 bits, a fixed latency multiplier is typically used. This will have 2 or 3 clock cycles delay and be fully pipelined.
Fully-pipelined logic with fixed latency is easy to incorporate into static schedules, either by hand or with automated HLS tools. Recent email from Dr Mullins: At clock freq 500MHz, I see something like 13.5pJ for a 32-bit multiply (2-stage pipeline) in a 40nm low power (LP) silicon process. Its area is 8750um.
43: (C) 2008-17, DJ Greaves, University of Cambridge, Computer Laboratory. | Flash Player Upgrade Needed ![]() ![]() | ![]() |