The time/space fold and unfold operations trade execution time for silicon area. A given function can be computed in fewer clock cycles by 'unfolding' it in the time domain, typically by loop unwinding (and predication), at the cost of more hardware.
LOOPED (time) option:

    for (i = 0; i < 3 && i < limit; i++)
        sum += data[i] * coef[i+j];

UNWOUND (space) option:

    if (0 < limit) sum += data[0] * coef[j];
    if (1 < limit) sum += data[1] * coef[1+j];
    if (2 < limit) sum += data[2] * coef[2+j];
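A minimal C sketch, assuming `data` and `coef` are int arrays and that `sum` starts at zero: wrapping the two options as functions makes it easy to check that unwinding changes the schedule, not the result. The names `sum_looped` and `sum_unwound` are hypothetical.

```c
#include <assert.h>

/* LOOPED (time) option: one multiply-accumulate per iteration, so a
   hardware loop needs up to three clocks but only one multiplier. */
int sum_looped(const int *data, const int *coef, int j, int limit)
{
    assert(data && coef);
    int sum = 0;
    for (int i = 0; i < 3 && i < limit; i++)
        sum += data[i] * coef[i + j];
    return sum;
}

/* UNWOUND (space) option: three predicated multiply-accumulates with no
   loop counter, so hardware could evaluate them in one clock using three
   multipliers and an adder tree. */
int sum_unwound(const int *data, const int *coef, int j, int limit)
{
    assert(data && coef);
    int sum = 0;
    if (0 < limit) sum += data[0] * coef[j];
    if (1 < limit) sum += data[1] * coef[1 + j];
    if (2 < limit) sum += data[2] * coef[2 + j];
    return sum;
}
```

With data {1, 2, 3}, coef {4, 5, 6, 7} and j = 1, both functions return 38 for any limit of 3 or more, 17 for limit 2, and 0 for limit 0 or below.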
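Folding goes the other way: it shares a structural resource (such as a multiplier) across several clock cycles. That sharing may require additional multiplexers and wiring, so it is not always worth it.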
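To make the multiplexer cost concrete, here is a hypothetical cycle-level model in C of the folded (looped) option: a single shared `multiplier` is fed through two 3:1 multiplexers (`mux3`) whose select input is the step counter. All names are illustrative, and the sketch assumes `coef` has already been offset by `j`.

```c
#include <assert.h>

/* The one shared structural resource. */
static int multiplier(int a, int b) { return a * b; }

/* A 3:1 multiplexer: the extra hardware that sharing requires. */
static int mux3(int sel, int in0, int in1, int in2)
{
    assert(sel >= 0 && sel <= 2);
    return sel == 0 ? in0 : (sel == 1 ? in1 : in2);
}

/* Folded datapath: each loop iteration models one clock in which the
   step counter steers one operand pair into the shared multiplier. */
int folded_mac(const int data[3], const int coef[3], int limit)
{
    int acc = 0;                              /* accumulator register */
    for (int step = 0; step < 3; step++) {
        int a = mux3(step, data[0], data[1], data[2]);
        int b = mux3(step, coef[0], coef[1], coef[2]);
        if (step < limit)                     /* predicate gates the update */
            acc += multiplier(a, b);
    }
    return acc;
}
```

The unwound option would instead instantiate three multipliers and need no operand multiplexers; whether the saved multipliers outweigh the added muxes and wiring depends on operand width and technology.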
A good design balances not only the use of structural resources across clock cycles but also the timing delays, so that no single cycle's logic path forces a much longer clock period than the others.
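Balancing delays can be illustrated with a hypothetical chain of combinational blocks with made-up delays: placing one register at the cut that best equalizes the two cycles minimizes the clock period. `struct split` and `best_split` are names invented for this sketch, which assumes the period is set purely by the slower cycle (ignoring register setup and clock-to-output overheads).

```c
#include <assert.h>

/* Result of splitting a chain of combinational blocks into two clock
   cycles with one register placed after block `cut`. */
struct split {
    int cut;        /* index of the last block in cycle 1 */
    double period;  /* clock period = delay of the slower cycle */
};

/* Try every register position and keep the one whose slower cycle is
   fastest, i.e. the most balanced split. */
struct split best_split(const double *delays, int n)
{
    assert(delays && n >= 2);
    double total = 0.0;
    for (int i = 0; i < n; i++)
        total += delays[i];

    struct split best = { 0, total };   /* no split can be slower */
    double first = 0.0;                  /* delay of blocks 0..cut */
    for (int cut = 0; cut < n - 1; cut++) {
        first += delays[cut];
        double second = total - first;
        double period = first > second ? first : second;
        if (period < best.period) {
            best.cut = cut;
            best.period = period;
        }
    }
    return best;
}
```

For delays {2, 3, 4, 1}, cutting after block 1 yields two 5-unit cycles, whereas cutting after block 2 would leave a 9-unit cycle that sets the clock.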
A design can be retimed either with or without changing its state encoding. For example, adding a pipeline stage increases the amount of state without recoding the existing state.
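The pipeline-stage case can be sketched as a hypothetical cycle-level model of a multiply-accumulate. The second version inserts a new `prod` register (plus a `prod_valid` bit) between the multiplier and the adder; the existing accumulator `acc` keeps its meaning and encoding, and the only visible change is one extra clock of latency. Function and register names are assumptions for illustration.

```c
#include <assert.h>

/* Before: one register (acc); multiply and add share a single clock,
   so the clock period must cover both delays. */
int mac_plain(const int *a, const int *b, int n)
{
    assert(n >= 0);
    int acc = 0;
    for (int t = 0; t < n; t++)          /* one iteration = one clock */
        acc += a[t] * b[t];
    return acc;
}

/* After: pipeline register `prod` splits multiply and add into separate
   stages. New state is added; `acc` is not re-encoded. Takes n + 1 clocks. */
int mac_pipelined(const int *a, const int *b, int n)
{
    assert(n >= 0);
    int acc = 0, prod = 0, prod_valid = 0;
    for (int t = 0; t < n + 1; t++) {    /* extra clock drains the pipe */
        /* Compute every next-state value from the current state first,
           since all registers load on the same clock edge. */
        int next_acc   = prod_valid ? acc + prod : acc;
        int next_valid = t < n;
        int next_prod  = next_valid ? a[t] * b[t] : 0;
        acc = next_acc;
        prod = next_prod;
        prod_valid = next_valid;
    }
    return acc;
}
```

Both versions return the same final sum; the pipelined one simply takes one more clock, in exchange for a shorter critical path per clock.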