One way to overcome a structural hazard is to deploy more resources. These will suffer correspondingly less contention. For instance, we might have 3 multipliers instead of 1. This is the spatial solution. For RAMs and register files we need to add more ports to them or mirror them (i.e. ensure the same data is written to each copy).
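The mirroring idea can be sketched as below. This is a minimal illustration, not a definitive implementation: the module name `foo_mirrored`, the parameters `AW` and `DW`, and the write-port signals are hypothetical, and each copy is assumed to support one write and one asynchronous read per clock cycle.

```verilog
// Hypothetical sketch: two mirrored copies of Foo give two reads per cycle.
module foo_mirrored #(parameter AW = 4, DW = 8) (
  input  wire          clk,
  input  wire          we,       // write enable (assumed interface)
  input  wire [AW-1:0] waddr,
  input  wire [DW-1:0] wdata,
  input  wire [AW-1:0] e0, e1,   // the two read addresses
  output reg  [DW-1:0] ans
);
  reg [DW-1:0] foo_a [0:(1<<AW)-1];  // copy that is only ever read at e0
  reg [DW-1:0] foo_b [0:(1<<AW)-1];  // copy that is only ever read at e1

  always @(posedge clk) begin
    if (we) begin                    // every write goes to both copies
      foo_a[waddr] <= wdata;
      foo_b[waddr] <= wdata;
    end
    ans <= foo_a[e0] + foo_b[e1];    // one read per copy per cycle
  end
endmodule
```

The cost is double the storage; the benefit is that neither copy ever sees two reads in the same cycle.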
In the temporal solution, a holding register is commonly inserted to overcome a structural hazard, either by hand or by a high-level synthesis (HLS) tool.
Sometimes the value that is needed is already available elsewhere in the design and only needs forwarding; otherwise an extra sequencer step is needed.
If we know nothing about e0 and e1, both reads of Foo fall in the same clock cycle:

    always @(posedge clk) begin
       ...
       ans = Foo[e0] + Foo[e1];
       ...
    end

then load a holding register in an additional cycle:

    always @(posedge clk) begin
       pc <= !pc;
       ...
       if (!pc) holding <= Foo[e0];
       if (pc)  ans <= holding + Foo[e1];
       ...
    end
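For readers who want something they can simulate, the holding-register fragment might be fleshed out as follows. This is only a sketch under stated assumptions: the module name `foo_two_cycle`, the synchronous reset `rst`, the `ans_valid` flag and the parameters are all hypothetical additions, the contents of Foo are assumed to be loaded elsewhere, and e0 and e1 are assumed to be held stable across both steps.

```verilog
// Hypothetical sketch: one shared read port on Foo, used on alternate cycles.
module foo_two_cycle #(parameter AW = 4, DW = 8) (
  input  wire          clk,
  input  wire          rst,        // synchronous reset (assumed)
  input  wire [AW-1:0] e0, e1,     // assumed stable over both steps
  output reg  [DW-1:0] ans,
  output reg           ans_valid   // high in the cycle after ans is updated
);
  reg [DW-1:0] Foo [0:(1<<AW)-1];  // contents assumed initialised elsewhere
  reg          pc;                 // one-bit program counter
  reg [DW-1:0] holding;

  // The single read port: its address is multiplexed on pc.
  wire [AW-1:0] raddr = pc ? e1 : e0;
  wire [DW-1:0] rdata = Foo[raddr];

  always @(posedge clk) begin
    if (rst) begin
      pc        <= 1'b0;
      ans_valid <= 1'b0;
    end else begin
      pc        <= !pc;
      ans_valid <= pc;
      if (!pc) holding <= rdata;            // step 0: fetch Foo[e0]
      else     ans     <= holding + rdata;  // step 1: add Foo[e1]
    end
  end
endmodule
```

The address multiplexer on `raddr` is the extra hardware that resource sharing introduces; throughput is halved to one result every two cycles.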
If we can analyse the pattern of e0 and e1:

    always @(posedge clk) begin
       ...
       ee = ee + 1;
       ...
       ans = Foo[ee] + Foo[ee-1];
       ...
    end

then, apart from the first cycle, use a holding register to forward the value from the previous iteration (loop forwarding):

    always @(posedge clk) begin
       ...
       ee <= ee + 1;
       holding <= Foo[ee];
       ans <= holding + Foo[ee];
       ...
    end
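A self-contained version of the loop-forwarding fragment might look like the sketch below. The module name `foo_forward`, the reset, the `primed` flag and `ans_valid` are hypothetical additions that handle the first cycle, when the holding register does not yet contain a value; the contents of Foo are again assumed to be loaded elsewhere.

```verilog
// Hypothetical sketch: one read of Foo per cycle, previous value forwarded.
module foo_forward #(parameter AW = 4, DW = 8) (
  input  wire          clk,
  input  wire          rst,        // synchronous reset (assumed)
  output reg  [DW-1:0] ans,
  output reg           ans_valid   // low until holding has been loaded once
);
  reg [DW-1:0] Foo [0:(1<<AW)-1];  // contents assumed initialised elsewhere
  reg [AW-1:0] ee;
  reg [DW-1:0] holding;
  reg          primed;             // set once holding contains Foo[ee-1]

  wire [DW-1:0] rdata = Foo[ee];   // the only read of Foo in each cycle

  always @(posedge clk) begin
    if (rst) begin
      ee        <= 0;
      primed    <= 1'b0;
      ans_valid <= 1'b0;
    end else begin
      ee        <= ee + 1'b1;
      holding   <= rdata;            // keep Foo[ee] for the next iteration
      primed    <= 1'b1;
      ans       <= holding + rdata;  // Foo[ee-1] + Foo[ee]
      ans_valid <= primed;           // the first sum is not meaningful
    end
  end
endmodule
```

Unlike the two-step version, this keeps full throughput: after the first cycle a new sum is produced on every clock edge.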
We can implement the program counter and holding registers as source-to-source transformations that eliminate hazards, as just illustrated. One algorithm is first to emit behavioural RTL and then to alternate the conversion to pure form with the hazard-avoidance rewriting until closure.
For example, the first example can be converted to old-style behavioural RTL that has an implicit program counter (state machine) as follows:
    always @(posedge clk) begin
       holding <= Foo[e0];
       @(posedge clk) ;
       ans <= holding + Foo[e1];
    end
The transformations illustrated above are not performed by mainstream RTL compilers today; instead they are incorporated in HLS tools such as Kiwi (see the KiwiC Structural Hazard Example).
Sharing structural resources may require additional multiplexers and wiring, so it is not always worthwhile.
A good design balances not only structural resource use between clock cycles but also critical-path timing delays.
These example fragments handled one hazard and used two clock cycles. They were localised transformations. When there are a large number of clock cycles, memories and ALUs involved, a global search and optimise procedure is needed to find a good balance of load on structural components.
(C) 2012-17, DJ Greaves, University of Cambridge, Computer Laboratory.