Cache Design

Implementing a 4-way set-associative cache is relatively straightforward.

There is no need for an associative RAM macrocell: just synthesise four sets of XOR gates from RTL using the `==' operator!

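  // Four ways, each with 32K entries: a 32-bit data word and its 15-bit tag.
  reg [31:0] data0 [0:32767], data1 [0:32767], data2 [0:32767], data3 [0:32767];
  reg [14:0] tag0 [0:32767], tag1 [0:32767], tag2 [0:32767], tag3 [0:32767];

  // addr[16:2] selects the entry in every way; addr[31:17] is the tag to match.
  // Non-blocking assignments are used throughout, so miss and dout update together.
  always @(posedge clk) begin
        miss <= 0;
        if (tag0[addr[16:2]]==addr[31:17]) dout <= data0[addr[16:2]];
        else if (tag1[addr[16:2]]==addr[31:17]) dout <= data1[addr[16:2]];
        else if (tag2[addr[16:2]]==addr[31:17]) dout <= data2[addr[16:2]];
        else if (tag3[addr[16:2]]==addr[31:17]) dout <= data3[addr[16:2]];
        else miss <= 1;   // no way matched: this later assignment overrides miss <= 0
        end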
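
To see why the `==' comparisons above need nothing more exotic than ordinary gates, here is a minimal sketch of what one 15-bit tag comparator amounts to once written out by hand. The module and port names (tag_eq, a, b, eq) are invented for illustration, and a synthesis tool is free to choose a different but equivalent gate structure.

  module tag_eq(input [14:0] a, input [14:0] b, output eq);
        // a ^ b has a 1 in every bit position where the two tags differ.
        // The reduction NOR (~|) is 1 only when no bit differs.
        assign eq = ~|(a ^ b);   // same result as (a == b)
  endmodule

Four such comparators operate in parallel, one per way, which is the "four sets of XOR gates" referred to above.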

Of course we also need a write and evict mechanism... (not shown).
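
As a rough idea of what the missing refill path might look like, the fragment below overwrites one way with a newly fetched word. It is a sketch under assumptions, not the mechanism used in any particular design: the signals fill, fill_data and victim are hypothetical names, the fragment is meant to sit in the same module as the tag and data arrays above, and addr is assumed to be held stable until the refill completes.

  wire        fill;        // hypothetical: high for one cycle when refill data arrives
  wire [31:0] fill_data;   // hypothetical: the word fetched from main memory
  wire [1:0]  victim;      // hypothetical: way chosen by the replacement policy
  // All three are driven by miss-handling logic that is not shown here.

  always @(posedge clk) if (fill) begin
        case (victim)      // replace both the tag and the data of the chosen way
          2'd0: begin tag0[addr[16:2]] <= addr[31:17]; data0[addr[16:2]] <= fill_data; end
          2'd1: begin tag1[addr[16:2]] <= addr[31:17]; data1[addr[16:2]] <= fill_data; end
          2'd2: begin tag2[addr[16:2]] <= addr[31:17]; data2[addr[16:2]] <= fill_data; end
          2'd3: begin tag3[addr[16:2]] <= addr[31:17]; data3[addr[16:2]] <= fill_data; end
        endcase
        end

A usable cache would also need a valid bit per entry, since the tag arrays hold arbitrary values after power-up, and dirty bits if stores are written back lazily rather than written through.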

Rather than implementing least-recently-used (LRU) replacement, one tends to use `random' replacement, which can be as simple as keeping a two-bit counter to say which `way' to evict next.
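
The two-bit counter could be sketched as follows. The module name victim_counter and the ports reset and fill are assumptions made for this example: fill is taken to pulse once per refill, so the counter steps to the next way each time an entry is replaced.

  module victim_counter(input clk, input reset, input fill, output reg [1:0] victim);
        always @(posedge clk)
          if (reset) victim <= 2'd0;
          else if (fill) victim <= victim + 2'd1;   // two-bit wrap: 3 goes back to 0
  endmodule

Stepping only on refills gives round-robin eviction. Letting the counter run freely on every clock edge instead makes the choice less correlated with the access pattern, which is closer to the spirit of `random'.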

Typically an IP company like ARM will provide a high-quality, carefully-tuned implementation, ready to go.

Comp-arch exercise: add a `way prediction cache' that avoids the double lookup latency. A way cache records which way was last accessed and optimistically forwards the result from that way, giving access times closer to those of a direct-mapped cache, without the aliasing overheads.


(C) 2008-16, DJ Greaves, University of Cambridge, Computer Laboratory.