Computer Laboratory

Computer Architecture Group

Computer Architecture ACS Project Suggestions (2014-2015)

Please contact the proposer(s) by email if you are interested in any of the projects below. In addition, some of the projects from previous years may still be suitable and interesting. Please remember that these are just starting points that suggest possible directions for the research. You can continue to check here over the coming weeks for more projects. We would also be happy to consider any project ideas of your own.


  1. Energy-Efficient Caching
    Contact: Timothy Jones


    Processor caches exploit both spatial and temporal locality to reduce the latency of accessing memory. In an ideal cache, an item of data being brought in would be free to occupy any position within the cache, replacing the least recently used value wherever it may be. In reality, such fully-associative caches are too costly to implement in hardware. At the other extreme, direct-mapped caches restrict each data item to only one position within the cache, but these suffer from a significant number of conflict misses, which occur when two data items in use at the same time map to the same position. Therefore a compromise is usually found in some form of set-associativity, which restricts each data item to a small number of positions.
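    To make this trade-off concrete, the following is a minimal Python sketch of a set-associative cache with LRU replacement, in which `ways=1` gives a direct-mapped cache and `ways` equal to the number of lines gives a fully-associative one. The cache size, line size and address trace are illustrative assumptions. The model counts misses only; it says nothing about energy or about the new technique the project would investigate.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy LRU cache: ways=1 is direct-mapped, ways=num_lines is fully-associative."""

    def __init__(self, num_lines, ways, line_size=64):
        assert num_lines % ways == 0, "num_lines must be a multiple of ways"
        self.num_sets = num_lines // ways
        self.ways = ways
        self.line_size = line_size
        # One OrderedDict per set; key order tracks recency (least recent first).
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.misses = 0

    def access(self, address):
        line = address // self.line_size   # cache line containing this byte
        index = line % self.num_sets       # set chosen by the low-order line bits
        tag = line // self.num_sets        # remaining bits identify the line in its set
        s = self.sets[index]
        if tag in s:
            s.move_to_end(tag)             # hit: mark as most recently used
            return True
        if len(s) == self.ways:
            s.popitem(last=False)          # set full: evict its least recently used line
        s[tag] = None
        self.misses += 1
        return False


def count_misses(trace, num_lines, ways, line_size=64):
    cache = SetAssociativeCache(num_lines, ways, line_size)
    for addr in trace:
        cache.access(addr)
    return cache.misses


# 8 lines of 64 bytes = 512-byte cache; addresses 0 and 512 map to the same set
# in the direct-mapped configuration and are accessed alternately.
trace = [0, 512] * 10
print("direct-mapped:", count_misses(trace, num_lines=8, ways=1))
print("2-way:", count_misses(trace, num_lines=8, ways=2))
print("fully-associative:", count_misses(trace, num_lines=8, ways=8))
```

    With two addresses exactly one cache size apart, the direct-mapped configuration misses on every access because both lines compete for one position, while the 2-way and fully-associative configurations take only the two compulsory misses.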

    Wouldn't it be great to be able to combine the best of both worlds by having fully-associative caches at the cost of direct-mapped hardware? Recent research has pointed to a potential method for achieving this, but as yet nobody has applied the concept to caches. This project aims to be the first to do this. It will perform research into this type of cache hardware, evaluating the trade-offs involved and determining how best to make use of this recent research. The aim is to create a cache that has the low miss rate of a fully-associative cache, but with an energy consumption close to a direct-mapped cache.

  2. Flexible I/O for the lowRISC SoC
    Contact: Robert Mullins


    The lowRISC project (www.lowrisc.org) is aiming to produce a competitive open-source SoC. One important part of this project is the design of an array of simple I/O coprocessors called "Minions". Soft peripheral interfaces can be created by programming these cores. They can also be used to off-load work from the SoC's main processors, e.g. by filtering or preprocessing I/O.

    This project will explore the architecture of the Minions and the thin layer of custom logic that will be placed between the I/O pins and the Minion cores themselves (the I/O shim). The aim will be to develop an implementation that is able to support the widest range of interface types at the lowest cost. This will involve carefully dividing work between the cores and the I/O shim, investigating ISA extensions and devising a suitable interface between the Minions and the I/O shim.


  3. Exploring Architectural Trade-offs in RISC-V Processors
    Contact: Robert Mullins


    This project will explore complexity, area, power and performance trade-offs for a number of different processor implementations (targeting the RISC-V ISA). Comparisons will be made to public implementations and those created by the student.

    Detailed comparisons will be made using a standard ASIC toolflow.

  4. A Flexible Multipurpose Tagged Memory System
    Contact: Robert Mullins


    The lowRISC project (www.lowrisc.org) is aiming to produce a competitive open-source SoC. We aim to support a simple tagged memory system to provide protection against control-flow hijack attacks. This project would explore the implementation of the tagged memory system and other possible uses for it, such as infinite memory watchpoints, garbage collection, accelerating existing debug tools, locks on every word and simple control-flow integrity checks.



Computer Architecture ACS Project Suggestions (2013-2014)

The following is a list of old projects from previous years that may provide inspiration for your own ideas.

  1. Optimal Heterogeneous CMP Core Selection
    Contact: Timothy Jones


    Looking to the future, continued increases in transistor counts, coupled with tight processor power constraints, will lead to increased specialisation of cores within a chip multiprocessor (CMP). However, it is still an open question what such a heterogeneous CMP should look like.

    This project will seek to answer this question by exploring the design space of heterogeneous CMPs. It will use the gem5 simulation infrastructure to run applications on a variety of cores and develop an algorithm to pick the best ones, given constraints such as power or area.
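    One simple way to frame the selection problem is as an exhaustive search over combinations of core types. The sketch below assumes, purely for illustration, hypothetical per-core areas and per-application efficiency scores (in the project these would come from gem5 runs). It scores a mix by letting each application run on whichever selected core suits it best, ignoring contention and scheduling, which a real algorithm would need to model.

```python
from itertools import combinations

# Hypothetical numbers for illustration only; in the project these would be
# measured by running each application on each core type in gem5.
AREA = {"little": 1.0, "medium": 2.5, "big": 6.0}   # relative area of each core type
EFFICIENCY = {                                       # higher is better (e.g. perf per watt)
    "appA": {"little": 1.0, "medium": 1.4, "big": 0.9},
    "appB": {"little": 1.0, "medium": 1.1, "big": 1.6},
    "appC": {"little": 1.3, "medium": 1.0, "big": 0.8},
}


def best_core_mix(area_budget, max_types):
    """Return (score, core types) maximising total efficiency within an area budget."""
    best_score, best_mix = 0.0, ()
    for k in range(1, max_types + 1):
        for mix in combinations(sorted(AREA), k):
            if sum(AREA[t] for t in mix) > area_budget:
                continue  # violates the area constraint
            # Assume each application runs on whichever selected core suits it best.
            score = sum(max(EFFICIENCY[app][t] for t in mix) for app in EFFICIENCY)
            if score > best_score:
                best_score, best_mix = score, mix
    return best_score, best_mix


print(best_core_mix(area_budget=4.0, max_types=2))
print(best_core_mix(area_budget=8.5, max_types=2))
```

    With these made-up numbers, the tight budget picks the little and medium cores and the larger budget picks the big and medium cores. Exhaustive search is only practical for a handful of core types; a larger design space would call for a heuristic or an integer-programming formulation.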



  2. Vectorisation in General Purpose Applications
    Contact: Timothy Jones


    Modern application processors now contain specialised instructions for operating on a vector of data. This is often called single-instruction, multiple-data (SIMD) processing; common examples are the SSE and AVX instructions in x86 processors and the NEON instructions in ARM processors. Making use of these instructions can provide significant speedups.

    This project will study the opportunities for vectorisation within general purpose applications, which are traditionally not suited to this kind of processing. It will analyse the loops within each application to determine the inherent vector operations and those that can be exposed through additional compiler transformations. The goal is to expose as many opportunities for vectorisation as possible and, if time allows, implement a vectorisation pass within a compiler to take advantage of these.
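    Part of this loop analysis is deciding whether a loop-carried dependence prevents vectorisation. The sketch below is a simplified illustration, not a compiler pass. It uses the example loop `a[i] = a[i - d] + b[i]`: executing it in chunks of `width` iterations, where each chunk performs all its loads before any store as SIMD code would, is guaranteed to match scalar execution when the dependence distance `d` is at least the vector width, and in general does not match when `d` is smaller.

```python
def scalar_loop(a, b, d):
    """Reference semantics: a[i] = a[i - d] + b[i] for i = d .. n-1, one iteration at a time."""
    a = list(a)
    for i in range(d, len(a)):
        a[i] = a[i - d] + b[i]
    return a


def vector_loop(a, b, d, width):
    """Same loop executed `width` iterations at a time, mimicking a SIMD
    load / add / store sequence: all loads in a chunk happen before any store."""
    a = list(a)
    for start in range(d, len(a), width):
        idx = range(start, min(start + width, len(a)))
        loaded = [a[i - d] for i in idx]                   # vector load
        summed = [x + b[i] for x, i in zip(loaded, idx)]   # vector add
        for i, v in zip(idx, summed):                      # vector store
            a[i] = v
    return a


def safe_to_vectorise(d, width):
    """For a dependence distance d >= 1, each chunk only reads values
    produced by earlier chunks when d >= width."""
    return d >= width


a0 = [1] * 16
b = list(range(16))
for d in (1, 2, 4, 8):
    same = scalar_loop(a0, b, d) == vector_loop(a0, b, d, width=4)
    print(f"d={d}: matches scalar: {same}, predicted safe: {safe_to_vectorise(d, 4)}")
```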



  3. RISC-V Implementation in Bluespec
    Contact: Simon Moore and David Chisnall


    The University of California, Berkeley is developing the RISC-V open instruction set architecture to promote open source research into computer architecture, but their current implementations are simple, unproven, user-mode designs. At the Cambridge Computer Laboratory, we have been developing the BERI 64-bit MIPS processor, which now has a mature design with register forwarding, branch prediction, an MMU, floating point and a dependable cache hierarchy, as well as a mature system on chip.

    This project would implement the RISC-V ISA instead of the 64-bit MIPS ISA using the BERI infrastructure. The base project would include user-mode, 32-bit instructions. Optional extensions, of which at least one should be attempted, include floating-point instructions, 16-bit instructions, and full system support (which is preliminary in the specification). The resulting processor should be able to run code compiled with riscv-gcc from Berkeley. The student may also attempt to develop, or collaborate on developing, an LLVM backend for the RISC-V ISA. This project will explore the implementation implications of the experimental RISC-V instruction set and provide insight into the efficiency of the ISA when running compiled code.



  4. A Fast Cache Hierarchy for BERI
    Contact: Simon Moore and Jonathan Woodruff


    The BERI processor, developed at the University of Cambridge, is a 64-bit MIPS processor that is reasonably mature and reliable but has not so far been extensively optimised for performance. One of the greatest shortcomings of the current design is its cache performance: the caches allow only a single outstanding transaction.

    This project would implement a cache hierarchy for the BERI project that can saturate the bandwidth to DRAM on the Terasic DE4. The student would implement instruction and data L1 caches with a shared L2 cache, as well as a traffic generator to test the hierarchy. The caches should be pipelined, allow at least 16 outstanding transactions and run at a high clock speed on the Terasic DE4. The caches should be parameterisable for size and possibly for line size and associativity. The traffic generator should be capable of both speed tests and complex access patterns to test consistency in the caches. An optional extension would be to extend the caches to support coherency when more than one set of L1 caches is present. The final report should present cache performance for a range of parameters that trade off area, clock speed and performance.
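    The need for many outstanding transactions follows from Little's law: to sustain a given bandwidth, the number of requests in flight must be at least the request rate multiplied by the memory latency. The sketch below uses hypothetical figures (4 GB/s of DRAM bandwidth, 100 ns latency and 32-byte lines), not measured Terasic DE4 parameters.

```python
def required_outstanding(bandwidth_bytes_per_ns, latency_ns, line_bytes):
    """Little's law: requests in flight = request rate * latency, rounded up."""
    return -(-(bandwidth_bytes_per_ns * latency_ns) // line_bytes)  # integer ceiling division


def achievable_bandwidth(outstanding, latency_ns, line_bytes, peak_bytes_per_ns):
    """Bandwidth in bytes/ns (i.e. GB/s) with a fixed number of transfers in flight,
    capped by the peak DRAM bandwidth."""
    return min(peak_bytes_per_ns, outstanding * line_bytes / latency_ns)


# Hypothetical figures, not Terasic DE4 measurements.
BANDWIDTH = 4   # bytes per ns (= 4 GB/s)
LATENCY = 100   # ns
LINE = 32       # bytes per cache line

print(required_outstanding(BANDWIDTH, LATENCY, LINE))             # 13
print(achievable_bandwidth(1, LATENCY, LINE, BANDWIDTH))          # 0.32
print(achievable_bandwidth(16, LATENCY, LINE, BANDWIDTH))         # 4 (DRAM-limited)
```

    With these assumed numbers, a cache that allows a single outstanding transaction reaches only 0.32 GB/s, and around 13 transactions must be in flight to sustain the full 4 GB/s.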





Computer Architecture ACS Project Suggestions (2012-2013)


    1. Application Scheduling for Heterogeneous Systems
      Contact: Robert Mullins and Timothy Jones


      As energy efficiency becomes the main driver for processor development, heterogeneous systems become attractive, since they allow applications to be scheduled on the cores that best suit their current requirements. Emerging heterogeneous systems include those with close CPU-GPU integration and ARM's big.LITTLE processors.

      The goal of this project is to evaluate a heterogeneous multicore system using the gem5 simulation environment. It will consider a range of cores to determine the optimal system for a group of multi-threaded and multi-programmed workloads. The project should not require significant infrastructure development, since gem5 already supports multiple, configurable cores. The result will be an analysis of the types of workloads that benefit from processor heterogeneity and how they can be successfully scheduled together.



    2. Speculative Guided Parallelisation of Application Binaries
      Contact: Robert Mullins and Timothy Jones


      With multicore systems now the norm across the computing landscape, and many-core systems on the horizon, it is important for applications to gain performance through parallel execution. However, a significant fraction of existing software is in single-threaded form, and rewriting it to be parallel would be a significant undertaking.

      This project seeks to parallelise applications without needing to alter the program source code. Using dynamic binary instrumentation and rewriting, such as that provided by DynamoRIO, it will alter program loops as they execute to allow them to run in parallel. To avoid complicated analysis of each loop, it will employ a form of speculation to catch situations where the code must be executed sequentially. The loops to parallelise will be determined in advance.



    3. Acceleration of the Floyd-Steinberg Dithering Algorithm
      Contact: Robert Mullins and Timothy Jones


      Applications such as high-speed ink-jet printing need to perform image dithering at Gpixel/s rates, yet highly optimised sequential implementations can currently reach only around 200 Mpixel/s. This project will explore parallel implementations of the Floyd-Steinberg algorithm, either hand-coded or produced with the aid of an automatic loop parallelisation technique called HELIX.

      There is scope to extend the project to explore source-to-source transformations that could improve the performance of the HELIX technique.

      [1] P. T. Metaxas, "Optimal parallel error diffusion dithering"
      [2] Y. Zhang, "Line diffusion: a parallel error diffusion algorithm for digital halftoning"
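      For reference, the sketch below is a straightforward sequential Floyd-Steinberg implementation in Python. It thresholds greyscale values in [0, 1] at 0.5 and diffuses the quantisation error to unprocessed neighbours with the standard 7/16, 3/16, 5/16 and 1/16 weights. It also checks the dependence that limits parallelism: each pixel needs its left neighbour and three pixels from the row above, so rows can be processed as a wavefront in which row y trails row y - 1 by two pixels. The helper names are illustrative, not taken from HELIX or the cited papers.

```python
def floyd_steinberg(image):
    """Dither a greyscale image (rows of floats in [0, 1]) to 0/1 values."""
    h, w = len(image), len(image[0])
    img = [[float(v) for v in row] for row in image]  # working copy accumulates diffused error
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            old = img[y][x]
            new = 1 if old >= 0.5 else 0
            out[y][x] = new
            err = old - new
            # Push the error onto neighbours that have not been processed yet.
            if x + 1 < w:
                img[y][x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    img[y + 1][x - 1] += err * 3 / 16
                img[y + 1][x] += err * 5 / 16
                if x + 1 < w:
                    img[y + 1][x + 1] += err * 1 / 16
    return out


def wavefront_step(y, x):
    """Earliest time step for pixel (y, x) if each row lags the one above by two pixels."""
    return x + 2 * y


def dependencies(y, x, w):
    """Pixels whose diffused error must reach pixel (y, x) before it is quantised."""
    deps = [(y, x - 1)] if x > 0 else []
    if y > 0:
        deps += [(y - 1, xx) for xx in (x - 1, x, x + 1) if 0 <= xx < w]
    return deps


# Every pixel depends only on pixels scheduled at strictly earlier wavefront steps.
H, W = 4, 6
assert all(wavefront_step(*dep) < wavefront_step(y, x)
           for y in range(H) for x in range(W) for dep in dependencies(y, x, W))
print(floyd_steinberg([[0.5] * 4]))  # prints [[1, 0, 1, 0]]
```

      All pixels sharing a wavefront step are independent, so an image of H rows and W columns needs W + 2(H - 1) steps rather than H x W sequential pixel updates.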

    4. Scalable Graphics Shader Engine
      Contact: Simon Moore


      Full-system research is becoming practical using FPGAs, and Cambridge is at the forefront with a full CPU and OS stack and a number of peripherals. However, modern systems are not complete without an autonomous graphics processing unit, which has implications for system-on-chip data flow and prioritisation, memory allocation and scheduling, and especially security.

      This project would explore the implications of an autonomous graphics processing unit in a system-on-chip architecture. It would design and build a compact, scalable fragment shader engine in Bluespec SystemVerilog that is able, at a minimum, to apply textures to triangles in a framebuffer. We would recommend an internal 16-bit floating-point pixel format, similar to that of the ARM Mali GPU, to save area and improve timing.

      Evaluation could include efficiency and performance as well as novel memory protection or sharing ideas when combined with the Cambridge CHERI 64-bit MIPS processor optionally running FreeBSD.

      [1] ARM, "Mali Shader Arithmetic"