
Overcoming Structural Hazards using Holding Registers

A holding register is commonly inserted to overcome a structural hazard, either by hand or by a high-level synthesis (HLS) tool.

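Sometimes the value that is needed is always available elsewhere in the design and only needs forwarding; otherwise an extra sequencer step is needed.
If we know nothing about e0 and e1, the two reads of Foo (here assumed to be a memory with a single read port) collide in the same clock cycle: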
   always @(posedge clk) begin
      ...
      ans = Foo[e0] + Foo[e1];
      ...
      end
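then load the holding register in an additional cycle:
   always @(posedge clk) begin
      pc <= !pc;                          // one-bit sequencer: alternates between the two steps
      ...
      if (!pc) holding <= Foo[e0];        // step 0: first read of Foo
      if (pc)  ans <= holding + Foo[e1];  // step 1: second read of Foo, then add
      ...
      end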
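
The fragment above can be fleshed out into a complete module. What follows is only a sketch under stated assumptions: the module name holding_sum, the port names, the widths AW and DW, the reset and the ans_valid flag are all hypothetical, and Foo is assumed to have one asynchronous (combinational) read port whose contents are loaded elsewhere. It makes the cost of sharing visible: the single port needs an address multiplexer steered by pc.

```verilog
// Sketch only: names, widths, reset and ans_valid are assumptions,
// not part of the original fragment.
module holding_sum #(parameter AW = 4, parameter DW = 8) (
   input               clk,
   input               reset,
   input  [AW-1:0]     e0,         // must be held stable for both steps
   input  [AW-1:0]     e1,
   output reg [DW-1:0] ans,
   output reg          ans_valid   // high in the cycle after step 1
);
   reg [DW-1:0] Foo [0:(1<<AW)-1]; // shared memory, contents loaded elsewhere
   reg          pc;                // one-bit program counter (sequencer)
   reg [DW-1:0] holding;           // holding register for the first operand

   // The single read port: one address per cycle, selected by pc.
   wire [AW-1:0] addr  = pc ? e1 : e0;
   wire [DW-1:0] rdata = Foo[addr];

   always @(posedge clk) begin
      if (reset) begin
         pc        <= 1'b0;
         ans_valid <= 1'b0;
      end else begin
         pc <= !pc;
         if (!pc) holding <= rdata;            // step 0: fetch Foo[e0]
         if (pc)  ans     <= holding + rdata;  // step 1: fetch Foo[e1] and add
         ans_valid <= pc;
      end
   end
endmodule
```

A new result appears every second clock cycle. If Foo were a synchronous-read block RAM instead, the read data would arrive one cycle later and the sequencer would need a further step.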
If we can analyse the pattern of e0 and e1:
   always @(posedge clk) begin
      ...
      ee = ee + 1;
      ...
      ans = Foo[ee] + Foo[ee-1];
      ...
      end
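then, apart from the first cycle, use the holding register to forward the value from the previous iteration:
   always @(posedge clk) begin
      ...
      ee <= ee + 1;
      holding <= Foo[ee];          // save this cycle's read for the next iteration
      ans <= holding + Foo[ee];    // holding contains Foo[ee-1], so only one read of Foo is needed
      ...
      end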
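
A self-contained version of the forwarding fragment might look as follows. Again this is a sketch: the module name forward_sum, the parameters, the reset and the primed/ans_valid flags are hypothetical additions, used here to mark the first cycle, in which holding does not yet contain a value read from Foo. Foo is assumed to have a single asynchronous read port and to be loaded elsewhere.

```verilog
// Sketch only: names, widths, reset and the primed/ans_valid flags are
// assumptions added to make the first-cycle exception explicit.
module forward_sum #(parameter AW = 4, parameter DW = 8) (
   input               clk,
   input               reset,
   output reg [DW-1:0] ans,
   output reg          ans_valid   // low until holding has been loaded once
);
   reg [DW-1:0] Foo [0:(1<<AW)-1]; // shared memory, contents loaded elsewhere
   reg [AW-1:0] ee;
   reg [DW-1:0] holding;
   reg          primed;            // set once holding contains a real Foo value

   wire [DW-1:0] rdata = Foo[ee];  // the only read of Foo in each cycle

   always @(posedge clk) begin
      if (reset) begin
         ee        <= 0;
         primed    <= 1'b0;
         ans_valid <= 1'b0;
      end else begin
         ee        <= ee + 1'b1;
         holding   <= rdata;            // keep Foo[ee] for the next iteration
         ans       <= holding + rdata;  // Foo[ee-1] + Foo[ee]
         primed    <= 1'b1;
         ans_valid <= primed;           // the first sum uses an unloaded holding
      end
   end
endmodule
```

Unlike the two-step sequencer, this form still delivers one result per clock cycle, because the second operand is forwarded rather than re-read.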

We can introduce the program counter and holding registers by source-to-source transformations that eliminate hazards, as just illustrated. One algorithm is first to emit behavioural RTL and then to alternate the conversion-to-pure-form and hazard-avoidance rewriting processes until closure.

For instance, the first example can be converted to behavioural RTL that has an implicit program counter (state machine) as follows:

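   always @(posedge clk) begin
      holding <= Foo[e0];         // first read of Foo
      @(posedge clk) ;            // wait for the next clock edge: an implicit state boundary
      ans <= holding + Foo[e1];   // second read of Foo, one cycle later
      end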

The transformations illustrated above are NOT performed by mainstream RTL compilers today: instead they are incorporated in HLS tools such as Kiwi (see the KiwiC Structural Hazard Example).

Sharing structural resources may require additional multiplexers and wiring, so it is not always worthwhile.

A good design balances not only structural resource use between clock cycles but also critical-path timing delays.

These example fragments handled one hazard and used two clock cycles. They were localised transformations. When there are a large number of clock cycles, memories and ALUs involved, a global search and optimise procedure is needed to find a good balance of load on structural components.


(C) 2012-14, DJ Greaves, University of Cambridge, Computer Laboratory.