Q. If you are going to do something lots of times, is it always more efficient to invest heavily in planning to get a rapid execution?
A1. Following that approach led Intel to the Itanium VLIW processor. But has that now sunk?
A2. Classical HLS, based on static schedules, has saved a lot of energy. (Have you tried compressing MPEG on your laptop and then wondered how your mobile manages it so easily?)
Even without data-dependent control flow, variable-latency operations are incompatible with a completely static schedule. Keeping a large system in global synchronisation is bound to miss opportunities that are available locally, but overly fine-grained dynamic scheduling has a lot of management overhead.
The Kiwi CSharp flow uses classical HLS on each thread in turn to generate a static schedule for that thread, but the threads interact dynamically, e.g. using FIFO queues between components.
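The idea of static schedules that interact dynamically can be sketched with a toy cycle-level model. This is only an illustration, not Kiwi output: the function `simulate`, its parameters and the squaring operation are all hypothetical. Each stage follows a fixed number of steps per item, and the only dynamic behaviour is stalling on a full or empty FIFO.

```python
from collections import deque

def simulate(n_items, prod_latency=2, cons_latency=3, fifo_depth=2):
    """Toy model (assumed, not Kiwi): two fixed-schedule stages joined by a bounded FIFO."""
    fifo = deque()
    produced = 0        # items pushed into the FIFO so far
    results = []        # values emitted by the consumer, in order
    prod_busy = 0       # steps left in the producer's static schedule
    cons_busy = 0       # steps left in the consumer's static schedule
    cons_item = None
    prod_stalls = 0     # cycles the producer waited on a full FIFO
    cycle = 0
    while len(results) < n_items:
        cycle += 1
        # Consumer: start its fixed schedule only when an item is available.
        if cons_busy == 0 and fifo:
            cons_item = fifo.popleft()
            cons_busy = cons_latency
        if cons_busy > 0:
            cons_busy -= 1
            if cons_busy == 0:
                results.append(cons_item * cons_item)
        # Producer: the last step of its schedule is the FIFO write,
        # which is the only step that can stall.
        if prod_busy == 0 and produced < n_items:
            prod_busy = prod_latency
        if prod_busy > 0:
            if prod_busy == 1:
                if len(fifo) < fifo_depth:
                    fifo.append(produced)
                    produced += 1
                    prod_busy = 0
                else:
                    prod_stalls += 1
            else:
                prod_busy -= 1
    return results, cycle, prod_stalls
```

With a fast producer and a slow consumer the producer stalls on the FIFO; with the speeds reversed it never does. In both cases neither stage's internal schedule changes.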
Combined with a server farm we get localised static schedules and global dynamics.
A hardware server can be shared (contended for) by multiple clients. For example, Bluespec's rich library contains a Completion Buffer and other flexible structures for easy creation of pools of servers for dynamic load sharing.
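As a rough software analogy for a pool of servers behind a completion buffer, the sketch below dispatches requests to whichever server is free and uses reserved tokens to return results in request order. It is an assumed toy model, not Bluespec's library interface: the class, the `run_farm` helper and the `value + 1` server function are hypothetical.

```python
from collections import deque
import heapq

class CompletionBuffer:
    """Toy model (assumed): reserve a slot in order, complete out of order, drain in order."""
    def __init__(self, size):
        self.size = size
        self.slots = {}          # token -> completed result
        self.next_reserve = 0    # next token to hand out
        self.next_drain = 0      # oldest token not yet drained

    def can_reserve(self):
        return self.next_reserve - self.next_drain < self.size

    def reserve(self):
        assert self.can_reserve()
        token = self.next_reserve
        self.next_reserve += 1
        return token

    def complete(self, token, value):
        self.slots[token] = value

    def drain(self):
        out = []
        while self.next_drain in self.slots:
            out.append(self.slots.pop(self.next_drain))
            self.next_drain += 1
        return out

def run_farm(requests, n_servers=2, buf_size=4):
    """requests: iterable of (value, latency). Returns (in-order results, finish order)."""
    buf = CompletionBuffer(buf_size)
    pending = deque(requests)
    in_flight = []               # min-heap of (finish_cycle, token, result)
    ordered, finish_order = [], []
    cycle = 0
    while pending or in_flight:
        # Retire every server whose variable-latency job has finished.
        while in_flight and in_flight[0][0] <= cycle:
            _, token, result = heapq.heappop(in_flight)
            buf.complete(token, result)
            finish_order.append(token)
        ordered.extend(buf.drain())
        # Dispatch while a server is free and the buffer has a slot.
        while pending and len(in_flight) < n_servers and buf.can_reserve():
            value, latency = pending.popleft()
            token = buf.reserve()
            heapq.heappush(in_flight, (cycle + latency, token, value + 1))
        cycle += 1
    return ordered, finish_order
```

For example, `run_farm([(10, 5), (20, 1), (30, 1)])` lets the two short jobs finish before the long one, yet the results still come out in request order.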
Alternatively, current research (Ali Zaidi) discards the input control flow and compiles locally to dataflow hardware: the VSFG-S approach. KiwiC should have a plugin for this soon...
(C) 2008-17, DJ Greaves, University of Cambridge, Computer Laboratory.