The transform in the additional material shows a behavioural transform from sequential to parallel composition.
To overcome structural hazards, folding in the reverse direction is needed. This trades space for time and introduces additional registers, typically called (according to usage) holding registers, write-back registers or pipeline registers.
In these examples, sequential composition (in successive clock cycles) is denoted with the double semicolon.
There is a distributive law of lockstep composition:
(c1 || c2) ;; (c3 || c4) === (c1 ;; c3) || (c2 ;; c4)
A lockstep parallel composition is well formed if its arguments contain the same number of sequential steps.
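The distributive law and the well-formedness condition can be checked with a small model. The following is only a sketch under stated assumptions: a command is modelled as a list of clock-cycle steps, each step being a set of labels, and the helper names atom, seq and par are hypothetical names introduced for this illustration.

```python
# A command is modelled as a list of clock-cycle steps; each step is a
# frozenset of the atomic register-transfer labels that fire in that cycle.
# atom, seq and par are hypothetical helpers, not part of any RTL tool.

def atom(label):
    """A single-cycle command performing one register transfer."""
    return [frozenset([label])]

def seq(a, b):
    """Sequential composition (a ;; b): b's cycles follow a's cycles."""
    return a + b

def par(a, b):
    """Lockstep parallel composition (a || b), merged cycle by cycle."""
    if len(a) != len(b):
        raise ValueError("not well formed: %d vs %d steps" % (len(a), len(b)))
    return [step_a | step_b for step_a, step_b in zip(a, b)]

c1, c2, c3, c4 = (atom(name) for name in ("c1", "c2", "c3", "c4"))

lhs = seq(par(c1, c2), par(c3, c4))   # (c1 || c2) ;; (c3 || c4)
rhs = par(seq(c1, c3), seq(c2, c4))   # (c1 ;; c3) || (c2 ;; c4)
print(lhs == rhs)
```

In this model, composing a one-step command in parallel with a two-step command, such as par(c1, seq(c2, c3)), raises an error, which corresponds to the composition not being well formed.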
A holding register stores the supporting inputs to a time-folded expression. For example, the space-using form
v1 <= A[e1] + e2 || v2 <= A[e3] + e4 || other_work
may have a structural hazard if array 'A' only has one read port, but can be rewritten using time, assuming v1 occurs in e3, e4 and other_work, using the holding register for v1 called h_v1:
(v1 <= A[e1] + e2 || h_v1 <= v1) ;; (v2 <= A[e3'] + e4' || other_work')
where e3' and e4' and other_work' are the rewritten forms of those expressions to refer to the holding register instead of v1 directly.
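The rewrite can be checked by simulating both forms. This is a minimal sketch: the step helper, the array contents and the concrete choices e1 = 1, e2 = 5, e3 = v1 and e4 = v1 + 1 are all assumptions made for illustration, and other_work is omitted.

```python
def step(state, assigns):
    """Apply one clock cycle of non-blocking assignments.

    `assigns` maps a register name to a function of the OLD state, so every
    right-hand side is evaluated before any register is updated.
    """
    updates = {reg: rhs(state) for reg, rhs in assigns.items()}
    new_state = dict(state)
    new_state.update(updates)
    return new_state

A = [10, 20, 30, 40, 50, 60, 70, 80]      # assumed contents, only read here
init = {"v1": 3, "v2": 0, "h_v1": 0}

# Space-using form: one cycle, two reads of A.
#   v1 <= A[e1] + e2  ||  v2 <= A[e3] + e4
one_cycle = step(init, {
    "v1": lambda s: A[1] + 5,
    "v2": lambda s: A[s["v1"]] + (s["v1"] + 1),
})

# Time-folded form: two cycles, one read of A in each.
cycle1 = step(init, {
    "v1": lambda s: A[1] + 5,
    "h_v1": lambda s: s["v1"],                       # hold the old v1
})
cycle2 = step(cycle1, {
    "v2": lambda s: A[s["h_v1"]] + (s["h_v1"] + 1),  # e3', e4' use h_v1
})

print(one_cycle["v1"] == cycle2["v1"], one_cycle["v2"] == cycle2["v2"])
```

Without h_v1, the second cycle would read the updated v1 and compute a different v2.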
Similar holding registers are needed for other left-hand side variables assigned in the first clause of the sequence. Where such a left-hand side is an array, a pair (h_s, h_v) of holding registers is needed for the subscript and the old value at that location and a functional array form is needed for the substitution (i.e. reads of the form A[e] are replaced with (e=h_s) ? h_v:A[e]). However, the read out of the old value typically causes a new structural hazard, so it is best to instead leave assignments to arrays to the second clause of the sequence.
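The functional array form can be sketched as follows. The function name read_old, the array contents and the written value are hypothetical, and the two clauses are modelled as ordinary sequential Python rather than clocked hardware.

```python
def read_old(A, h_s, h_v, e):
    """Functional array read: (e == h_s) ? h_v : A[e]."""
    return h_v if e == h_s else A[e]

A = [10, 20, 30, 40]       # assumed array contents
s, new_value = 2, 99       # assumed subscript and value written

# First clause: write the array and capture the subscript and the old value.
h_s, h_v = s, A[s]         # h_s <= s || h_v <= A[s]
A[s] = new_value           # A[s] <= new_value

# Second clause: reads that were meant to see the pre-write contents.
print(read_old(A, h_s, h_v, 2))   # overwritten location: old value comes from h_v
print(read_old(A, h_s, h_v, 1))   # untouched location: read from A as usual
```

Capturing h_v needs its own read of A[s] in the first clause, which is the extra read that typically causes the new structural hazard.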
The delay padding operation must be applied when one or more of the RTL expressions executing in lockstep are time-folded, which extends their execution time. For instance, if
v1 <= v2+1 || v2 <= A[v1 * 3]
is naively time-folded to
v1 <= v2+1 || (t1 <= v1 * 3 ;; v2 <= A[t1])
the execution times of the left-hand and right-hand sides no longer match and the result is not well formed. Instead, the left-hand side of the parallel composition must be delay padded with an input holding register, as follows:
(t2 <= v2 ;; v1 <= t2+1) || (t1 <= v1 * 3 ;; v2 <= A[t1])
or it can be padded with an output write-back register, as follows:
(v1_wb <= v2+1 ;; v1 <= v1_wb) || (t1 <= v1 * 3 ;; v2 <= A[t1])
For an assignment with many supporting inputs, a single write-back register for its output is generally better than a separate holding register for each input, but optimum load balancing of expensive structural resources can be the deciding factor.
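Both padded forms can be checked against the original single-cycle assignment by simulation. This is a sketch only: the step helper, the initial register values and the array contents are assumptions made for illustration.

```python
def step(state, assigns):
    """One clock cycle of non-blocking assignments (all RHS use old state)."""
    updates = {reg: rhs(state) for reg, rhs in assigns.items()}
    new_state = dict(state)
    new_state.update(updates)
    return new_state

A = list(range(100, 110))            # assumed array contents
init = {"v1": 2, "v2": 7}            # assumed initial register values

# Original: v1 <= v2+1 || v2 <= A[v1 * 3]
original = step(init, {
    "v1": lambda s: s["v2"] + 1,
    "v2": lambda s: A[s["v1"] * 3],
})

# Input holding register: (t2 <= v2 ;; v1 <= t2+1) || (t1 <= v1*3 ;; v2 <= A[t1])
held = step(init, {"t2": lambda s: s["v2"], "t1": lambda s: s["v1"] * 3})
held = step(held, {"v1": lambda s: s["t2"] + 1, "v2": lambda s: A[s["t1"]]})

# Write-back register: (v1_wb <= v2+1 ;; v1 <= v1_wb) || (t1 <= v1*3 ;; v2 <= A[t1])
wb = step(init, {"v1_wb": lambda s: s["v2"] + 1, "t1": lambda s: s["v1"] * 3})
wb = step(wb, {"v1": lambda s: s["v1_wb"], "v2": lambda s: A[s["t1"]]})

for name, result in (("held", held), ("wb", wb)):
    print(name, result["v1"] == original["v1"], result["v2"] == original["v2"])
```

Both variants take two cycles and leave v1 and v2 with the values the single-cycle form would produce. They differ only in whether the extra register sits on the input or the output of the increment.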
Rather than delaying the input or output to a function, the function can be divided at any intermediate point with the introduction of so-called pipeline registers.
As well as overcoming structural hazards, this can greatly help with timing closure.
For example, the assignment
v1 <= A[e1 * e2] + A[e3 * e4]
should be rewritten to use only one read port on 'A' as
t1 <= A[e1 * e2] ;; v1 <= t1 + A[e3 * e4]
This also has the benefit that one multiplier can be re-used for both operations using multiplexors.
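The effect on read-port usage can be illustrated with a small model that counts reads per clock cycle. This is a sketch under assumptions: the CountingArray class, the array contents and the operand values are hypothetical, and e3, e4 and A are assumed not to change during the first cycle.

```python
class CountingArray:
    """Array model that records the peak number of reads in any one cycle."""

    def __init__(self, data):
        self.data = list(data)
        self.reads = 0    # reads so far in the current cycle
        self.peak = 0     # worst case seen over all cycles

    def read(self, index):
        self.reads += 1
        self.peak = max(self.peak, self.reads)
        return self.data[index]

    def tick(self):
        """Advance to the next clock cycle."""
        self.reads = 0

e1, e2, e3, e4 = 1, 2, 1, 3          # assumed operand values

# Single-cycle form: v1 <= A[e1 * e2] + A[e3 * e4]
flat = CountingArray([5, 6, 7, 8])
v1_flat = flat.read(e1 * e2) + flat.read(e3 * e4)

# Pipelined form: t1 <= A[e1 * e2] ;; v1 <= t1 + A[e3 * e4]
piped = CountingArray([5, 6, 7, 8])
t1 = piped.read(e1 * e2)             # cycle 1
piped.tick()
v1_piped = t1 + piped.read(e3 * e4)  # cycle 2

print(v1_flat == v1_piped, flat.peak, piped.peak)
```

The peak count is the number of read ports the array would need. The two multiplications likewise fall in different cycles in the pipelined form, which is what allows a single multiplier to be shared.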