Project: CommCentric - Communications Centric Microelectronic Design

Start date: January 2006 End date: March 2009

Research as originally proposed

We are undertaking research into communication centric microelectronic design methods, in contrast to today's computation centric (or gate-level) techniques. We believe that this research is timely since electronics is at the cusp of change. For the last 60 years, digital gates have been costly to produce and have limited performance. We are now entering an era where the wires, which were once almost free, are becoming the cost and performance limiter. This trend is well documented in the International Technology Roadmap for Semiconductors (ITRS), which clearly identifies the step change needed in circuits and associated design techniques. It also identifies spiralling design complexity, escalating power densities and associated thermal problems.

Challenges

Wire delay - Wires, rather than gates, are increasingly the speed limiter in IC designs targeting deep submicron CMOS. Over the years, chip area has remained almost constant for many devices (microprocessors, FPGAs, etc.), and yet wire delay per unit length continues to increase. Communication across chip requires repeater buffers at the very least. Replacing buffers with flip-flops allows the wires to be pipelined, thereby improving utilisation. This observation leads one to think about other architectural possibilities.
Clocking - Clocks are typically used to sequence operations on chips. Global clock distribution with low skew to all nodes in the circuit will be untenable in the future. Our research into distributed clock generation and distribution Fairbanks2005 indicates that time can be uniformly distributed across future deep submicron chips. Nonetheless, even with advanced clocking techniques, long distance data traversal must still take into account the limitations of the clock distribution technique used.
Signal integrity - As wires have become narrower, they have also become taller in order to reduce resistance per unit length. Unfortunately, tall wires placed close together couple rather well, which at best introduces data dependent timing characteristics and at worst corrupts the data. Consequently, signal integrity is an increasing issue for longer wires, and more engineering effort is going into their layout and analysis.
Cross sectional bandwidth - Rent's rule indicates that the cross sectional bandwidth of a design inexorably increases, which has resulted in an ever increasing demand for layers of metallisation to provide sufficient static routing capacity. This in turn has an impact on mask costs for chip manufacture, which are growing exponentially (currently around $1m for a mask set for a 90nm process).
Design for reuse - Ever increasing design complexity is being tackled using the well trodden path of abstraction and modularisation. Well defined modules, often referred to as intellectual property blocks (IP blocks) in the EDA business, can then be designed for reuse. Clean interfaces are required for quick and reliable integration. Commercial examples of communication interfaces include ARM Ltd's AMBA bus for ASICs and Altera's Avalon bus for FPGAs.
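
The wire delay challenge can be illustrated with a small calculation. The sketch below uses the standard distributed RC approximation (delay of roughly 0.38 x R x C for a wire of total resistance R and capacitance C), which grows with the square of the wire length, and compares it with the same wire broken into repeated segments. All function names and all numerical values (resistance and capacitance per millimetre, repeater delay, clock period) are illustrative assumptions, not figures from the project.

```python
import math


def unbuffered_delay_ps(length_mm, r_ohm_per_mm, c_ff_per_mm):
    """Distributed RC delay of a bare wire; grows with length squared."""
    # ohm * fF = 1e-15 s, i.e. 1e-3 ps
    return 0.38 * r_ohm_per_mm * c_ff_per_mm * length_mm ** 2 * 1e-3


def repeated_delay_ps(length_mm, segments, r_ohm_per_mm, c_ff_per_mm, repeater_ps):
    """Delay of the same wire split into equal segments, each with a repeater.

    Total delay now grows roughly linearly with length.
    """
    segment_mm = length_mm / segments
    per_segment = unbuffered_delay_ps(segment_mm, r_ohm_per_mm, c_ff_per_mm)
    return segments * (per_segment + repeater_ps)


def pipeline_stages(total_delay_ps, clock_period_ps):
    """Clock cycles needed if repeaters are replaced by flip-flops."""
    return math.ceil(total_delay_ps / clock_period_ps)


if __name__ == "__main__":
    # Hypothetical wire parameters, chosen only to show the trend.
    R, C = 1000.0, 200.0  # ohm/mm, fF/mm (assumed)
    bare = unbuffered_delay_ps(10, R, C)
    buffered = repeated_delay_ps(10, 10, R, C, repeater_ps=20.0)
    print(f"bare wire: {bare:.0f} ps, with repeaters: {buffered:.0f} ps")
    print("pipeline stages at a 500 ps clock:", pipeline_stages(buffered, 500.0))
```

Under these assumed values the repeated wire is several times faster than the bare one, yet still takes more than one clock period to cross, which is why pipelining the wire with flip-flops becomes attractive.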

The communication centric design approach

From the five challenges it is apparent that movement of data across chip is an increasingly important issue. Based on our research Mullins2004 Moore2002 and that of others (e.g. Dally's at Stanford and the RAW work at MIT), we believe that the design of effective on-chip networks will be key to achieving both power and performance goals and alleviating the problems of integration and verification in future ASICs and FPGAs. Such an interconnect can be designed for reuse. Top level wiring can be well designed to ensure high performance and good signal integrity. Routing data through switches allows wire reuse. Wiring between switches can be routed with geometric regularity which simplifies verification of signal integrity. But what architectural form should the networks take for different applications and how can designs be technology mapped onto FPGAs and ASICs?

Design metaphors

Viewing a switched interconnect as a bus replacement is just one design metaphor. One could also provide an abstraction which looks like independent FIFOs or communication channels. This abstraction maps well to constructs appearing in hardware description languages (HDLs) like SystemC and SystemVerilog. For existing design systems, generators like Altera's SOPC builder could be used to instantiate FIFOs, communication channels and bus equivalents that would map directly onto the switched interconnect.
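
As an illustration of the FIFO or channel metaphor, the following is a minimal behavioural sketch of a point-to-point channel with bounded buffering and a fixed transit latency. The class name Channel, its methods and its parameters are hypothetical; they do not correspond to any SystemC, SystemVerilog or SOPC Builder interface.

```python
from collections import deque


class Channel:
    """Behavioural model of a FIFO channel over a switched interconnect."""

    def __init__(self, capacity, latency_cycles):
        self.capacity = capacity        # maximum words buffered or in flight
        self.latency = latency_cycles   # cycles from send to earliest receive
        self._in_flight = deque()       # entries are (arrival_cycle, word)

    def try_send(self, word, now):
        """Accept a word at cycle `now`; return False when the channel is full."""
        if len(self._in_flight) >= self.capacity:
            return False                # backpressure: sender must retry later
        self._in_flight.append((now + self.latency, word))
        return True

    def try_recv(self, now):
        """Return the oldest word that has arrived by cycle `now`, else None."""
        if self._in_flight and self._in_flight[0][0] <= now:
            return self._in_flight.popleft()[1]
        return None
```

A producer and consumer written against try_send and try_recv see only the FIFO ordering and the possibility of waiting, so the same description could in principle sit on top of a bus, a dedicated link or a switched network.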

Dynamic scheduling of the switches may well be appropriate for many applications, but others may require hard real-time requirements to be met. One option is to statically schedule the switches, thereby time division multiplexing the wires.
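
Static scheduling can be sketched as a fixed slot table that repeats every frame. The example below is a simplified illustration for a single shared wire; the function names, the flow names and the contiguous slot assignment policy are assumptions made for the sketch rather than the scheme used in the project.

```python
def build_static_schedule(slot_demand, slots_per_frame):
    """Assign each flow a fixed set of time slots on one shared wire.

    slot_demand maps a flow name to the slots it needs per frame.
    Returns a list with one entry per slot: a flow name, or None if idle.
    """
    total = sum(slot_demand.values())
    if total > slots_per_frame:
        raise ValueError("flows need more slots than the frame provides")
    schedule = []
    for flow, count in slot_demand.items():
        schedule.extend([flow] * count)   # simple contiguous assignment
    schedule.extend([None] * (slots_per_frame - total))
    return schedule


def owner_at(schedule, cycle):
    """Flow that owns the wire at a given cycle; the table repeats each frame."""
    return schedule[cycle % len(schedule)]


def worst_case_wait(schedule, flow):
    """Largest gap, in slots, between successive slots owned by `flow`."""
    owned = [i for i, f in enumerate(schedule) if f == flow]
    if not owned:
        raise ValueError(f"flow {flow!r} has no slot in the schedule")
    frame = len(schedule)
    gaps = []
    for k, slot in enumerate(owned):
        nxt = owned[(k + 1) % len(owned)]
        gaps.append((nxt - slot) % frame or frame)  # wrap around the frame
    return max(gaps)
```

Because the table is fixed at design time, the worst case wait for each flow is known in advance, which is the property a hard real-time application needs. A dynamically arbitrated switch gives no such bound without further analysis.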

The latency of chip wide communications will require architectural changes, some of which can be addressed through high-level language support which enables latency tolerant communication to be described. Once a design is represented in this abstract form, constraint-driven communication synthesis can then be performed.

Technology mapping

Higher level HDLs and reuse of abstract IP blocks allow a design team to construct complex systems more rapidly. This design approach typically allows designs to be targeted at a range of implementation technologies, from soft programmed devices like FPGAs, to gate array type structures (e.g. Altera's Hardcopy devices) and hard wired ASICs. A switched interconnect could present the same interface for all levels of technology mapping, though the implementation may be quite different:

FPGA - On an FPGA the interconnect will augment some of the longer statically configured wires. Some full custom components may be required, for example custom drivers to allow high speed time division multiplexing on wires. Such multiplexing could then run at least 10X faster than the surrounding programmable FPGA fabric. For the rest of the architecture it is currently unclear which other components (e.g. a cross bar, arbiters, etc.) are best implemented in programmable logic and which will benefit from full custom implementation. It may also be beneficial to co-locate full custom components with specific reconfigurable components. So, for example, one or more embedded memories might be co-located with the NOC core, thereby allowing the amount of buffering to be configurable, or to provide routing tables for more complex architectures.
Hardcopy - Some critical components of the switched interconnect could be hard coded into modern gate array technologies (e.g. Altera's Hardcopy devices).
ASIC - The switched interconnect could be laid down as a full-custom or semi-custom generated structure as part of a block based design flow.

Related work

Commercial bus systems are already evolving into networks-on-chip in order to address today's implementation technology demands. Examples include ARM's multilayer AHB, IBM's CoreConnect, ST Microelectronics' STBus, Hitachi's SuperHyway, Sonics' SMART interconnect, siliconbackplane and Manchester's CHAIN interconnect. These all replace the bus with some form of cross bar with central arbitration, neither of which scales well to a large number of ports.

Research into networks-on-chip has produced several theoretical architectures, but few have been implemented. Notable exceptions are the static and dynamic networks of MIT's RAW tiled processor architecture, Philips' SOC interconnect and of course our own work.

Outcomes

Daniel Greenfield and Simon Moore, Implications of Electronics Technology Trends to Algorithm Design, To appear in The Computer Journal, 2009.

Daniel Greenfield and Simon Moore, Implications of Electronics Technology Trends to Algorithm Design, In Proceedings of the BCS International Academic Conference on Visions of Computer Science, pages 331-342, 22-24 September 2008.

Simon Moore and Daniel Greenfield, The Next Resource War: Computation vs. Communication, In the 10th International Workshop on System-Level Interconnect Prediction, April 2008.

Arnab Banerjee, Robert Mullins and Simon Moore, A Power and Energy Exploration of Network-on-Chip Architectures, In Proceedings of the First Intl. Symp. on Networks-on-Chips, May 2007.

Daniel Greenfield, Arnab Banerjee, Jeong-Gun Lee and Simon Moore, Implications of Rent's rule for NoC design and its fault-tolerance, In Proceedings of the First Intl. Symp. on Networks-on-Chips, May 2007.

Arnab Banerjee, Pascal Wolkotte, Robert Mullins, Simon Moore and Gerard Smit, An Energy and Performance Exploration of Network-on-Chip Architectures, In The IEEE Transactions on VLSI Systems Special Section on Networks-On-Chip, To appear in 2009.

Rosemary Francis, Simon Moore and Robert Mullins, A Network of Time-Division Multiplexed Wiring for FPGAs, In Proceedings of the 2nd ACM/IEEE Intl. Symp. on Networks-on-Chips, April 2008. DOI=10.1109/NOCS.2008.4492723

Robert Mullins and Simon Moore, Demystifying Data-Driven and Pausible Clocking Schemes, In Proceedings of 13th IEEE Intl. Symp. on Asynchronous Circuits and Systems (ASYNC), March, 2007.

Computer Laboratory