The instruction cache is a critical component in any microprocessor. It must deliver high performance so that instructions can be fetched on every cycle. However, current designs waste a large amount of energy on each access, because the tag and data banks of all cache ways are consulted in parallel to fetch the correct instructions as quickly as possible. Existing approaches reduce this overhead by removing unnecessary accesses to the data banks or to the ways that are unlikely to hit, but the tag banks still need to be checked.
This paper considers a new hybrid hardware and linker-assisted approach to tagless instruction caching. Our novel cache architecture, supported by the compilation toolchain, removes the need for tag checks entirely for the majority of cache accesses. The linker places frequently executed instructions in specific program regions that are then mapped into the cache without the need for tag checks. The scheme requires only minor hardware modifications and no ISA changes, and it works across cache configurations. Our approach keeps the software and hardware independent, resulting in both backward and forward compatibility.
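To make the idea concrete, the following is a minimal toy model, not the architecture evaluated in the paper. It assumes a 4-way cache in which one way is reserved for a linker-placed hot region, and it uses a plain address-range check to stand in for whatever mechanism the hardware uses to recognise that region. The names TaglessICache, tagless_base and tagless_size, and all sizes, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

LINE_BYTES = 64   # assumed cache line size
NUM_SETS = 64     # assumed number of sets
NUM_WAYS = 4      # assumed associativity


@dataclass
class Line:
    valid: bool = False
    tag: int = 0


@dataclass
class TaglessICache:
    # Hypothetical parameters: where the linker placed the hot code.
    tagless_base: int
    tagless_size: int
    sets: list = field(default_factory=lambda: [
        [Line() for _ in range(NUM_WAYS)] for _ in range(NUM_SETS)
    ])
    tag_checks: int = 0  # tag comparisons performed so far

    def __post_init__(self):
        # Assumption: the hot region must fit in the one reserved way.
        assert self.tagless_size <= NUM_SETS * LINE_BYTES

    def fetch(self, addr: int) -> str:
        index = (addr // LINE_BYTES) % NUM_SETS
        if self.tagless_base <= addr < self.tagless_base + self.tagless_size:
            # Tagless path: way 0 of this set is assumed to hold the line
            # by construction, so no tag is read or compared.
            return "tagless"
        # Conventional path: compare tags in the remaining ways.
        tag = addr // (LINE_BYTES * NUM_SETS)
        ways = self.sets[index][1:]
        self.tag_checks += len(ways)
        for line in ways:
            if line.valid and line.tag == tag:
                return "hit"
        # Miss: fill the first invalid way, else evict the first tagged way.
        victim = next((l for l in ways if not l.valid), ways[0])
        victim.valid, victim.tag = True, tag
        return "miss"


# Nine fetches from the hot region and one from elsewhere.
cache = TaglessICache(tagless_base=0x1000, tagless_size=4096)
for _ in range(9):
    cache.fetch(0x1040)
cache.fetch(0x9000)
print(cache.tag_checks)  # 3, versus 40 if all 4 ways were checked on every fetch
```

The model counts only tag comparisons and ignores data-bank accesses, fills of the hot region, and address translation, so it illustrates where the saving comes from rather than how large it is.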
Evaluation on a superscalar processor, with and without SMT support, shows power savings of 66% within the instruction cache with no loss of performance. This translates to a 49% saving in the combined power of the instruction cache and the translation lookaside buffer, which is involved in managing our tagless scheme.