Department of Computer Science and Technology

Technical reports

Complexity-effective superscalar embedded processors using instruction-level distributed processing

Ian Caulfield

December 2007, 130 pages

This technical report is based on a dissertation submitted May 2007 by the author for the degree of Doctor of Philosophy to the University of Cambridge, Queens’ College.

DOI: 10.48456/tr-707

Abstract

Modern trends in mobile and embedded devices require ever increasing levels of performance, while maintaining low power consumption and silicon area usage. This thesis presents a new architecture for a high-performance embedded processor, based upon the instruction-level distributed processing (ILDP) methodology. A qualitative analysis of the complexity of an ILDP implementation as compared to both a typical scalar RISC CPU and a superscalar design is provided, which shows that the ILDP architecture eliminates or greatly reduces the size of a number of structures present in a superscalar architecture, allowing its complexity and power consumption to compare favourably with a simple scalar design.

The performance of an implementation of the ILDP architecture is compared to some typical processors used in high-performance embedded systems. The effect on performance of a number of the architectural parameters is analysed, showing that many of the parallel structures used within the processor can be scaled to provide less parallelism with little cost to the overall performance. In particular, the size of the register file can be greatly reduced with little average effect on performance – a size of 32 registers, with 16 visible in the instruction set, is shown to provide a good trade-off between area/power and performance.

Several novel developments to the ILDP architecture are then described and analysed. Firstly, a scheme to halve the number of processing elements and thus greatly reduce silicon area and power consumption is outlined but proves to result in a 12–14% drop in performance. Secondly, a method to reduce the area and power requirements of the memory logic in the architecture is presented which can achieve similar performance to the original architecture with a large reduction in area and power requirements or, at an increased area/power cost, can improve performance by approximately 24%. Finally, a new organisation for the register file is proposed, which reduces the silicon area used by the register file by approximately three-quarters and allows even greater power savings, especially in the case where processing elements are power gated.

Overall, it is shown that the ILDP methodology is a viable approach for future embedded system design, and several new variants on the architecture are contributed. Several areas of useful future research are highlighted, especially with respect to compiler design for the ILDP paradigm.

Full text

PDF (0.7 MB)

BibTeX record

@TechReport{UCAM-CL-TR-707,
  author =	 {Caulfield, Ian},
  title = 	 {{Complexity-effective superscalar embedded processors using
         	   instruction-level distributed processing}},
  year = 	 2007,
  month = 	 dec,
  url = 	 {https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-707.pdf},
  institution =  {University of Cambridge, Computer Laboratory},
  doi = 	 {10.48456/tr-707},
  number = 	 {UCAM-CL-TR-707}
}