Computer Laboratory

Computer Architecture Group

DOME - Delaying and Overcoming Microprocessor Errors

The ever-increasing number of transistors in modern microprocessors, fabricated at ever-smaller technology nodes, poses significant reliability challenges. Reliable components are becoming harder to design with each new generation of processors. The problem is reminiscent of the challenges faced by early computer scientists, as is evident in writings of that time (e.g. von Neumann, 1956).

Component reliability faces further challenges. One is the power wall, which arises because operating voltages have scaled more slowly than transistor sizes. Another is the decreasing lifetime of transistors, which is caused by wearout.

The DOME project seeks to address these upcoming challenges by developing reliability schemes within a Managed Runtime Environment, aiming to improve processor lifetimes and slow down the ageing process. Wearout-prevention mechanisms will improve the lifetime reliability of processors. Although these first steps in preventing wearout will slow down ageing, hard faults will still occur during a processor's lifetime. Techniques to handle errors once they have occurred within the processor will therefore be developed, so that execution can be maintained on faulty hardware.

This project is a collaboration between researchers at the University of Cambridge and the APT Group at the University of Manchester.

Posters

A poster presented at the Computer Lab's 75th anniversary.