Computer Laboratory - Computer Science Syllabus


Comparative Architectures

Lecturer: Dr D.J. Greaves

No. of lectures: 16

Prerequisite course: Computer Design

Aims

This course examines the evolution of high-performance computers and processors and discusses the difficulties of making objective performance comparisons and of maintaining code compatibility. The IBM System/360 is used as a reference architecture. Microprocessor evolution from 8 bits through to 64 bits is presented, along with important digressions into low-power, dataflow and VLIW architectures, since these techniques now underlie mainstream processor implementation. Detailed features of a number of popular Instruction Set Architectures are compared and contrasted, with particular attention to their effects on implementation and hence performance. The course addresses micro-architecture implementation issues, examining how Instruction Level Parallelism can be exploited through deep pipelining and super-scalar techniques such as out-of-order execution. Issues in memory hierarchy design are explored, along with their impact on code optimisation. Multi-processor cluster interconnect, both on-chip and off-chip, is briefly examined.

Lectures

Instruction set architectures. ISA history and compatibility, illustrated with the IBM 360 and notable 8-, 16-, 32- and 64-bit microprocessors. Review of stack/accumulator/GPR instruction sets in terms of byte sex (endianness), load-store versus register-memory, addressing modes, and support for sub-word and unaligned memory accesses. [3 lectures]
Comparing architectures. Moore's Law. System versus chip performance. Performance metrics: MIPS, MHz, FLOPS, SPEC. Power. Price. Compatibility. [2 lectures]
Advanced pipelining. The CPU performance equation. Structural hazards: long latency instructions. Data hazards: result forwarding and delayed loads. Control hazards: branch prediction, trace caches and avoiding branches. Exceptions. [3 lectures]
Super-scalar techniques. Instruction Level Parallelism (ILP). Dynamic out-of-order execution: Tomasulo, embedded dataflow, virtual registers. [2 lectures]
Beyond super-scalar. The limits of ILP. Alternative architectures: VLIW processors and custom VLIW synthesis, TriMedia, SMT, SCMP. [2 lectures]
Memory hierarchy. Cache configurations. Latency versus bandwidth. Re-ordering and coherence. Programming for caches. [2 lectures]
Multi-processor systems. Multi-core devices, multi-processor cache coherency. Interconnects for NUMA, message passing clusters and network on chip: OCP, ARM AXI. Models for weak memory ordering. [2 lectures]
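
Illustrative note (not part of the official syllabus): the CPU performance equation listed under Advanced pipelining is commonly written as CPU time = instruction count × cycles per instruction (CPI) ÷ clock rate. The minimal sketch below shows how pipeline stalls might be folded into an effective CPI. The function names and all figures are hypothetical and chosen only for illustration.

```python
def effective_cpi(base_cpi, stall_cycles_per_instruction):
    """Effective CPI = ideal CPI plus average stall cycles per instruction."""
    return base_cpi + stall_cycles_per_instruction


def cpu_time(instruction_count, cpi, clock_hz):
    """CPU time in seconds = instruction count * CPI / clock rate."""
    return instruction_count * cpi / clock_hz


# Hypothetical workload: 1e9 instructions on a 2 GHz clock.
INSTRUCTIONS = 1_000_000_000
CLOCK_HZ = 2e9

# Ideal pipeline (CPI of 1) versus one that loses an assumed
# 0.5 cycles per instruction to hazards.
ideal = cpu_time(INSTRUCTIONS, effective_cpi(1.0, 0.0), CLOCK_HZ)
stalled = cpu_time(INSTRUCTIONS, effective_cpi(1.0, 0.5), CLOCK_HZ)

print(f"ideal:    {ideal:.2f} s")
print(f"stalled:  {stalled:.2f} s")
print(f"slowdown: {stalled / ideal:.2f}x")
```

With these assumed figures the clock rate is unchanged, yet the stalled pipeline takes 1.5 times as long, which is one reason clock frequency alone is a poor basis for comparing architectures.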

Objectives

At the end of the course students should

appreciate the balance between implementation and architecture in determining performance
understand how quantitative analysis led to the convergence towards RISC-like designs
comprehend the issues associated with deeply-pipelined designs
understand the operation of processors supporting out-of-order execution
be able to describe the difficulties associated with building wide-issue machines, and have a basic understanding of the alternatives to Instruction Level Parallelism
appreciate the tradeoffs made by architects in the design of memory hierarchies, and be able to optimise algorithms for memory hierarchy performance
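
Illustrative note (not part of the official syllabus): the final objective, optimising algorithms for memory hierarchy performance, can be made concrete with a toy model. The sketch below counts misses in a hypothetical direct-mapped cache (64 lines of 64 bytes, an assumed configuration) while a 256 × 256 matrix of 8-byte elements, stored in row-major order, is traversed by rows and then by columns. It is a simplified model, not a description of any real processor.

```python
def count_misses(addresses, num_lines=64, line_bytes=64):
    """Count misses in a direct-mapped cache for a sequence of byte addresses."""
    resident = [None] * num_lines      # block number currently held by each line
    misses = 0
    for addr in addresses:
        block = addr // line_bytes     # which memory block the address falls in
        index = block % num_lines      # the single cache line that block maps to
        if resident[index] != block:   # miss: fetch the block, evicting the old one
            resident[index] = block
            misses += 1
    return misses


# Assumed layout: 256 x 256 matrix of 8-byte elements stored row by row.
ROWS, COLS, ELEM_BYTES = 256, 256, 8

# Same set of elements, visited in two different orders.
row_order = [(r * COLS + c) * ELEM_BYTES for r in range(ROWS) for c in range(COLS)]
col_order = [(r * COLS + c) * ELEM_BYTES for c in range(COLS) for r in range(ROWS)]

row_misses = count_misses(row_order)
col_misses = count_misses(col_order)

print("row-order misses:   ", row_misses)   # 8192
print("column-order misses:", col_misses)   # 65536
```

In this model the row-order walk misses once per 64-byte line (8,192 misses), whereas the column-order walk strides 2,048 bytes between accesses, maps onto only two cache lines per column, and misses on every one of its 65,536 accesses. Both traversals touch the same data; only the access order differs.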

Recommended reading

Hennessy, J. & Patterson, D. (2002). Computer architecture: a quantitative approach. Morgan Kaufmann (3rd ed.) ISBN 1-55860-724-2. (2nd edition, 1996, is also good.)



© 2006 University of Cambridge Computer Laboratory Please send any comments to pagemaster@cl.cam.ac.uk Page last updated on 12-Sep-2006 at 13:55 by Christine Northeast