Computer Architecture Group
ACS Projects (2012-2013)
Please contact the proposer(s) by email if you are interested in any of the projects below.
Please check here again over the coming weeks for more projects. We would also be happy to consider any project ideas of your own.
- Application Scheduling for Heterogeneous Systems
Contact: Robert Mullins and Timothy Jones
As energy efficiency becomes the main driver for processor development, heterogeneous systems become attractive, since they allow applications to be scheduled on the cores that best suit their current requirements. Emerging heterogeneous systems include those with close CPU-GPU integration and ARM's big.LITTLE processors.
The goal of this project is to perform an evaluation of a heterogeneous multicore system using the gem5 simulation environment. It will consider a range of cores to determine the optimal system for a group of multi-threaded and multi-programmed workloads. There should not need to be a significant amount of infrastructure development, since gem5 already includes support for multiple, configurable cores. The results will be an analysis of the types of workloads that benefit from heterogeneity in the processor and how they can be successfully scheduled together.
- Speculative Guided Parallelisation of Application Binaries
Contact: Robert Mullins and Timothy Jones
With multicore systems now the norm across the computing landscape, and many-core systems on the horizon, it is important for applications to gain performance through parallel execution. However, a significant fraction of existing software is in single-threaded form, and rewriting it to be parallel would be a significant undertaking.
This project seeks to parallelise applications without needing to alter the program source code. Using dynamic binary instrumentation and rewriting, such as within DynamoRIO, it will alter program loops as they execute to allow them to run in parallel. To avoid complicated analysis of each loop, it will employ a form of speculation to catch situations where the code must be executed sequentially. The loops to parallelise will be determined in advance.
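As a rough illustration of the speculation idea, the sketch below models it in plain Python rather than at the binary level. It is not how DynamoRIO works and not the project's design: the names `TrackedMemory` and `run_loop_speculatively`, the dictionary used as "memory", and the conflict rule are all assumptions made for the example. Each iteration runs against a snapshot while its reads and writes are logged; if a later iteration read an address that an earlier one wrote, the speculation is discarded and the loop is re-run sequentially.

```python
class TrackedMemory:
    """View of a memory snapshot that logs one iteration's reads and writes (hypothetical helper)."""

    def __init__(self, snapshot):
        self._snapshot = snapshot
        self.reads = set()
        self.writes = {}

    def __getitem__(self, addr):
        if addr in self.writes:
            return self.writes[addr]  # an iteration always sees its own writes
        self.reads.add(addr)
        return self._snapshot[addr]

    def __setitem__(self, addr, value):
        self.writes[addr] = value


def run_loop_speculatively(body, iterations, memory):
    """Return (memory_after_loop, speculation_succeeded).

    `body(i, mem)` is one loop iteration; `iterations` must be re-iterable
    (for example a range). The iterations are executed one after another
    here, but each sees only the pre-loop snapshot, as it would if they
    ran concurrently.
    """
    logs = []
    for i in iterations:
        view = TrackedMemory(memory)
        body(i, view)
        logs.append(view)

    # Writes are committed in iteration order, so only a read of an address
    # written by an earlier iteration (read-after-write) invalidates the run.
    written_so_far = set()
    conflict = False
    for view in logs:
        if view.reads & written_so_far:
            conflict = True
            break
        written_so_far |= set(view.writes)

    result = dict(memory)
    if conflict:
        for i in iterations:  # fall back to ordinary sequential execution
            body(i, result)
        return result, False
    for view in logs:
        result.update(view.writes)
    return result, True
```

A loop such as `mem[10 + i] = mem[i] * 2` passes the check and commits, whereas a loop with a carried dependence such as `mem[i + 1] = mem[i] + 1` is caught and re-executed in order.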
- Acceleration of the Floyd-Steinberg dithering algorithm
Contact: Robert Mullins and Timothy Jones
Applications such as high-speed ink-jet printing need to perform image dithering at Gpixel/s rates. Highly optimised sequential implementations can today reach only ~200 Mpixel/s. This project will explore parallel implementations of the Floyd-Steinberg algorithm, either hand-coded or produced with the aid of an automatic loop parallelisation technique (called HELIX).
There is scope to extend the project to explore source-to-source transformations that could improve the performance of the HELIX technique.
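For orientation, a minimal sequential sketch of Floyd-Steinberg error diffusion is given below. It assumes a greyscale image stored as rows of floats in [0, 1], a binary output, a fixed threshold of 0.5 and a plain left-to-right raster scan; the function name `floyd_steinberg` is chosen for the example only. It is a reference point for the data dependencies, not an optimised implementation.

```python
def floyd_steinberg(image):
    """Dither a greyscale image (rows of floats in [0, 1]) to 0/1 pixels."""
    height = len(image)
    width = len(image[0]) if height else 0
    work = [list(row) for row in image]  # working copy that accumulates diffused error
    out = [[0] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            old = work[y][x]
            new = 1 if old >= 0.5 else 0  # quantise to black or white
            out[y][x] = new
            err = old - new
            # Distribute the quantisation error to unvisited neighbours
            # with the standard 7/16, 3/16, 5/16, 1/16 weights.
            if x + 1 < width:
                work[y][x + 1] += err * 7 / 16
            if y + 1 < height:
                if x > 0:
                    work[y + 1][x - 1] += err * 3 / 16
                work[y + 1][x] += err * 5 / 16
                if x + 1 < width:
                    work[y + 1][x + 1] += err * 1 / 16
    return out
```

Each pixel depends on its left neighbour and on three pixels in the row above, so rows cannot simply be processed independently. One option a parallel version might exploit is a skewed wavefront in which each row trails the row above it by at least two pixels.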
[1] PT Metaxas, "Optimal parallel error diffusion dithering"
[2] Y Zhang, "Line diffusion: a parallel error diffusion algorithm for digital halftoning"
- Scalable Graphics Shader Engine
Contact: Simon Moore
Full-system research is becoming practical using FPGAs, and Cambridge is at the forefront with a full CPU and OS stack with a number of peripherals. However, modern systems are not complete without an autonomous graphics processing unit, which has implications for system-on-chip data flow and prioritisation, memory allocation and scheduling, and especially security.
This project would explore the implications of an autonomous graphics processing unit in a system-on-chip architecture. It would design and build a compact, scalable fragment shader engine in Bluespec SystemVerilog that is able, at least, to apply textures to triangles in a framebuffer. We would recommend an internal 16-bit floating-point pixel format similar to that of the ARM Mali GPU to save area and improve timing.
Evaluation could include efficiency and performance as well as novel memory protection or sharing ideas when combined with the Cambridge CHERI 64-bit MIPS processor optionally running FreeBSD.
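To make the minimum functionality concrete, the following software sketch applies a texture to one triangle in a framebuffer. It is written in Python purely as an illustration and says nothing about how the Bluespec engine should be structured: the function names `edge` and `shade_triangle`, the list-of-lists framebuffer and texture, nearest-neighbour sampling and the absence of perspective correction are all assumptions of the example.

```python
def edge(ax, ay, bx, by, px, py):
    """Signed area term: cross product of (b - a) and (p - a)."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def shade_triangle(framebuffer, verts, uvs, texture):
    """Fill the pixels covered by a 2D triangle with nearest-neighbour texture samples.

    verts: three (x, y) positions in pixel coordinates.
    uvs:   three (u, v) texture coordinates in [0, 1], one per vertex.
    """
    (x0, y0), (x1, y1), (x2, y2) = verts
    area = edge(x0, y0, x1, y1, x2, y2)
    if area == 0:
        return  # degenerate triangle covers no pixels
    tex_h, tex_w = len(texture), len(texture[0])

    for py in range(len(framebuffer)):
        for px in range(len(framebuffer[0])):
            cx, cy = px + 0.5, py + 0.5  # sample at the pixel centre
            # Barycentric weights; dividing by the signed area makes the
            # inside test independent of the vertex winding order.
            w0 = edge(x1, y1, x2, y2, cx, cy) / area
            w1 = edge(x2, y2, x0, y0, cx, cy) / area
            w2 = edge(x0, y0, x1, y1, cx, cy) / area
            if w0 < 0 or w1 < 0 or w2 < 0:
                continue  # pixel centre lies outside the triangle
            u = w0 * uvs[0][0] + w1 * uvs[1][0] + w2 * uvs[2][0]
            v = w0 * uvs[0][1] + w1 * uvs[1][1] + w2 * uvs[2][1]
            tx = min(tex_w - 1, max(0, int(u * tex_w)))
            ty = min(tex_h - 1, max(0, int(v * tex_h)))
            framebuffer[py][px] = texture[ty][tx]
```

Every pixel is computed independently of the others, which is what makes this stage a natural candidate for a scalable hardware design.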
[1] ARM, MALI Shader Arithmetic
