The register file is a critical component in a modern superscalar processor. It must be large enough to accommodate the results of all in-flight instructions. It must also have enough ports to allow simultaneous issue and writeback of many values each cycle. However, this makes it one of the most energy-consuming structures within the processor with a high access latency. As technology scales, there comes a point where register accesses are the bottleneck to performance and so must be pipelined over several cycles. This increases the pipeline depth, lowering performance.
To overcome these challenges, we propose a novel use of compiler analysis to aid register caching. Adding a register cache allows us to preserve single-cycle register accesses, maintaining performance and reducing energy consumption. We do this by passing information to the processor using free bits in a real ISA, allowing us to cache only the most important registers. Evaluating the register cache over a variety of sizes and associativities and varying the read ports into the cache, our best scheme achieves an energy-delay-squared (EDD) product of 0.81, with a performance increase of 11%. Another configuration saves 13% of register system energy. Using four register cache read ports brings both performance gains and energy savings, consistently outperforming two state-of-the-art hardware approaches.