Whether computing on standard CPUs or FPGA, memory bandwidth is often the main performance bottleneck.
Given that the data transfer rate per bit of read or write port is fixed, two ways to increase memory bandwidth are to use multiple banks or wider memories.
Multiple banks can be accessed simultaneously, whereas a memory with a wider word is accessed at just one location at a time (per port) but yields more data for each access.
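The wide-word approach can be sketched as follows. This is a minimal software model, not a hardware description: the struct, names and 256-word depth are illustrative assumptions, with a 128-bit word packing four 32-bit items.

```c
#include <stdint.h>

/* Sketch: a 128-bit-wide memory word packs four 32-bit items, so one
   access at a single location (per port) yields four data values.
   The type names and 256-word depth are illustrative. */
typedef struct { uint32_t lane[4]; } wide_word_t;

static wide_word_t wide_mem[256];

void write_scalar(uint32_t addr, uint32_t v)
{
  wide_mem[addr >> 2].lane[addr & 3] = v;
}

/* One wide read services four consecutive scalar addresses. */
uint32_t read_scalar(uint32_t addr)
{
  wide_word_t w = wide_mem[addr >> 2];  /* a single wide access */
  return w.lane[addr & 3];
}
```

A burst of four consecutive scalar reads can thus be served by one physical access, trading address granularity for bandwidth.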
With multiple static RAM banks, data can be distributed between them randomly or systematically. To achieve `random' data placement, some subset of the address bus bits is normally used to select between the banks. Indeed, when multiple chips are used to provide a single bank, this arrangement is already deployed. The question is which bits to use.
Using low bits gives a fine-grained interleave, which may either destroy or exploit spatial locality in the access patterns, depending on the details of those patterns.
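The two choices of select bits can be sketched as below. This is an illustrative model only: the four-bank arrangement and the 10-bit address width are assumptions, not part of any particular design.

```c
#include <stdint.h>

/* Sketch of bank selection from address bits, assuming 4 banks and a
   10-bit address space (both figures are illustrative). */
#define BANK_BITS 2
#define NBANKS    (1u << BANK_BITS)   /* 4 banks */

/* Low-bit select: fine-grained interleave; consecutive addresses
   rotate through the banks. */
unsigned bank_from_low_bits(uint32_t addr)
{
  return addr & (NBANKS - 1);
}

/* High-bit select: coarse placement; each contiguous quarter of the
   address space sits wholly within one bank. */
unsigned bank_from_high_bits(uint32_t addr)
{
  return (addr >> (10 - BANK_BITS)) & (NBANKS - 1);
}
```

With low-bit select, a unit-stride sweep touches all banks in turn, whereas with high-bit select it stays within one bank until it crosses a quarter boundary; which is preferable depends on which accesses must proceed concurrently.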
Ideally, concurrent accesses hit different banks, thereby providing parallelism. Where data access patterns are known in advance, which is typically the case for HLS, this can be maximised or even guaranteed by careful bank mapping. Interconnection complexity is also reduced when it is manifest that certain data paths of a full cross-bar will never be used. In extremis, we need no interconnect switch at all.
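As a concrete sketch of such a bank mapping, suppose an HLS tool knows a pipelined loop reads a[i] and a[i+1] in the same cycle (a hypothetical loop, used here only for illustration). Cyclically partitioning the array over two banks then guarantees the two operands always come from different banks, so each read port can be hard-wired to one bank and no cross-bar is required.

```c
/* Sketch: cyclic two-way bank mapping for an access pattern that
   reads a[i] and a[i+1] in the same cycle (hypothetical HLS loop).
   Element i lives in bank (i mod 2) at offset (i / 2). */
#define N 16
static int bank[2][N / 2];

void store_elem(unsigned i, int v) { bank[i & 1][i >> 1] = v; }
int  load_elem(unsigned i)         { return bank[i & 1][i >> 1]; }

/* Because i and i+1 always differ in their low bit, the two loads
   below target different banks in every iteration, so each port can
   be wired directly to its own bank: no interconnect switch needed. */
int sum_pairs(void)
{
  int sum = 0;
  for (unsigned i = 0; i + 1 < N; i++)
    sum += load_elem(i) + load_elem(i + 1);
  return sum;
}
```

The mod-2 mapping here is the software analogue of what HLS tools call cyclic array partitioning; a block (high-bit) partition would instead place i and i+1 in the same bank almost everywhere, forcing the two reads to be serialised.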
To be completed ...
(C) 2012-17, DJ Greaves, University of Cambridge, Computer Laboratory.