Computer Laboratory

Computer Architecture Group

Computer Architecture Part II Project Suggestions

Please contact the proposer(s) by email if you are interested in any of the projects below. Some of the projects from previous years may also still be suitable and interesting. Please remember that these are just starting points that suggest possible directions for the research. Check back here over the coming weeks for more projects. We would also be happy to consider any project ideas of your own.

  1. A Multicore Cache Simulator
    Contact: Timothy Jones

    Optimising a program's memory characteristics can often lead to huge speedups, since the latencies involved in communicating across a chip, or fetching data from main memory, are so high. However, it is often difficult to get more than just coarse-grained information about the types of access that a program makes and how data is shared between its different threads. This project would create a cache simulator based on the DynamoRIO dynamic binary translator, perhaps taking inspiration from its own cache simulator, but addressing some of its limitations (such as simulating arbitrary hierarchies and cache coherence).
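    A minimal trace-driven sketch of the core data structures, written in Python for brevity: private set-associative caches with LRU replacement, kept coherent by a simple snooping MSI protocol. It does not use DynamoRIO; in the real project the addresses would come from DynamoRIO instrumentation rather than hand-written calls. The class names (`Cache`, `System`) are hypothetical, S-to-M upgrades are counted as misses, and writebacks and timing are ignored.

```python
from collections import OrderedDict


class Cache:
    """A private set-associative cache with LRU replacement."""

    def __init__(self, size, assoc, line_size):
        self.line_size = line_size
        self.assoc = assoc
        self.num_sets = size // (assoc * line_size)
        # Each set maps tag -> MSI state ('M' or 'S'); a missing tag means 'I'.
        # OrderedDict order runs from least to most recently used.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def locate(self, addr):
        """Return the set that holds addr and the tag addr has in it."""
        line = addr // self.line_size
        return self.sets[line % self.num_sets], line // self.num_sets

    def fill(self, addr, state):
        s, tag = self.locate(addr)
        if tag not in s and len(s) >= self.assoc:
            s.popitem(last=False)  # evict the least recently used line
        s[tag] = state
        s.move_to_end(tag)


class System:
    """Identical private caches kept coherent by snooping MSI (hypothetical model)."""

    def __init__(self, cores, **cache_cfg):
        self.caches = [Cache(**cache_cfg) for _ in range(cores)]
        self.invalidations = 0

    def access(self, core, addr, write=False):
        me = self.caches[core]
        s, tag = me.locate(addr)
        state = s.get(tag)
        if state == 'M' or (state == 'S' and not write):
            me.hits += 1
            s.move_to_end(tag)
            return
        me.misses += 1  # a true miss, or an S -> M upgrade
        for other in self.caches:
            if other is me:
                continue
            rset, rtag = other.locate(addr)
            if rtag in rset:
                if write:
                    del rset[rtag]  # invalidate the remote copy
                    self.invalidations += 1
                else:
                    rset[rtag] = 'S'  # a remote M copy is downgraded to S
        me.fill(addr, 'M' if write else 'S')


system = System(2, size=1024, assoc=2, line_size=64)
system.access(0, 0x100)              # core 0 read miss, line held in S
system.access(1, 0x100, write=True)  # core 1 write miss invalidates core 0's copy
system.access(0, 0x100)              # coherence miss on core 0
print(system.caches[0].misses, system.invalidations)  # 2 1
```

    Keeping each cache as a separate object makes arbitrary hierarchies a matter of chaining `Cache` instances, and per-line sharing information can be logged at the point where invalidations are counted.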

  2. C or ML to JavaScript Compiler
    Contact: Timothy Jones

    Wouldn't it be great to be able to write web applications in your favourite language instead of having to write them in JavaScript? The idea of this project is to take some source language (for example, C or ML, but it could be anything) and write a compiler that translates it to JavaScript. You'll probably want to consider just a subset of the language, and there may be optimisations to perform along the way so that the resulting code is faster or easier to read.
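    As a sketch of the translation step, assuming a tiny ML-like expression language with integers, variables, arithmetic, `let`, single-argument functions, application and `if` (all node names here are hypothetical), each construct maps directly onto a JavaScript expression. `let` becomes an immediately applied arrow function; recursive bindings (`let rec`) are not handled.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Num:
    value: int

@dataclass
class Var:
    name: str

@dataclass
class BinOp:
    op: str          # one of + - * < =
    left: "Expr"
    right: "Expr"

@dataclass
class Let:           # let name = bound in body
    name: str
    bound: "Expr"
    body: "Expr"

@dataclass
class Fun:           # fun param -> body
    param: str
    body: "Expr"

@dataclass
class App:           # func arg
    func: "Expr"
    arg: "Expr"

@dataclass
class If:
    cond: "Expr"
    then: "Expr"
    orelse: "Expr"

Expr = Union[Num, Var, BinOp, Let, Fun, App, If]

# ML's = on integers becomes JavaScript's strict equality.
JS_OPS = {"+": "+", "-": "-", "*": "*", "<": "<", "=": "==="}


def compile_js(e: Expr) -> str:
    """Translate an expression tree into a JavaScript expression string."""
    if isinstance(e, Num):
        return str(e.value)
    if isinstance(e, Var):
        return e.name
    if isinstance(e, BinOp):
        return f"({compile_js(e.left)} {JS_OPS[e.op]} {compile_js(e.right)})"
    if isinstance(e, Let):
        # let x = v in b  ==>  ((x) => b)(v)
        return f"(({e.name}) => {compile_js(e.body)})({compile_js(e.bound)})"
    if isinstance(e, Fun):
        return f"(({e.param}) => {compile_js(e.body)})"
    if isinstance(e, App):
        return f"{compile_js(e.func)}({compile_js(e.arg)})"
    if isinstance(e, If):
        return f"({compile_js(e.cond)} ? {compile_js(e.then)} : {compile_js(e.orelse)})"
    raise TypeError(f"unknown node {e!r}")


# let x = 2 in x * 3
print(compile_js(Let("x", Num(2), BinOp("*", Var("x"), Num(3)))))  # ((x) => (x * 3))(2)
```

    Emitting fully parenthesised expressions keeps the translation correct without reasoning about JavaScript operator precedence; removing redundant parentheses, or turning immediately applied arrow functions into `const` declarations, are natural readability optimisations.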

  3. JavaScript Dependence Profiler
    Contact: Timothy Jones

    JavaScript has become the de facto language for web applications, historically just on the client but now extending to server-side applications too, with the development of environments such as Node.js. In addition, subsets of JavaScript, such as asm.js, allow programs written in other languages to be run as web applications. However, it is currently difficult to understand the dependences between different parts of a JavaScript application (e.g. between different loop iterations or different functions). The aim of this project, therefore, is to build a dependence profiler for JavaScript. This may be a standalone tool, or integrated into the browser or JavaScript engine, and should present information about the dependences within an application to the user in a useful and intuitive format.
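    One common way to build a dynamic dependence profiler is shadow memory: record the last store to every location and, on each load, check whether that store happened in an earlier loop iteration. The sketch below assumes the instrumentation (source rewriting or engine hooks) would call hypothetical `load`, `store` and `next_iteration` hooks; here the calls are inserted by hand into a Python transliteration of a JavaScript loop, and only read-after-write dependences are tracked.

```python
from collections import Counter


class DependenceProfiler:
    """Shadow-memory profiler for loop-carried read-after-write dependences."""

    def __init__(self):
        self.iteration = 0
        self.last_store = {}      # address -> (iteration, source site)
        self.carried = Counter()  # (store site, load site, distance) -> count

    def next_iteration(self):
        self.iteration += 1

    def store(self, addr, site):
        self.last_store[addr] = (self.iteration, site)

    def load(self, addr, site):
        if addr in self.last_store:
            it, store_site = self.last_store[addr]
            distance = self.iteration - it
            if distance > 0:  # the value was produced by an earlier iteration
                self.carried[(store_site, site, distance)] += 1


# Hand-instrumented transliteration of the JavaScript loop
#   for (let i = 1; i < a.length; i++) a[i] = a[i - 1] + b[i];
a, b = [0] * 8, list(range(8))
prof = DependenceProfiler()
for i in range(1, len(a)):
    prof.next_iteration()
    prof.load(("a", i - 1), "load a[i-1]")
    prof.load(("b", i), "load b[i]")
    a[i] = a[i - 1] + b[i]
    prof.store(("a", i), "store a[i]")
print(prof.carried)  # Counter({('store a[i]', 'load a[i-1]', 1): 6})
```

    Keying the results by source site and distance is what lets a tool report "this statement depends on the previous iteration", rather than a raw list of addresses.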

  4. Modelling Interconnect Bottlenecks
    Contact: Noa Zilberman

    New computer system architectures seek to build large, hyper-converged systems in which many compute nodes are connected through a dedicated fabric. Current computer architecture simulators provide only a simplistic model of the interconnect and so fail to reflect interconnect bottlenecks. This project aims to extend the gem5 simulator with accurate modelling of the PCIe interconnect, and then to use it to show how interconnect bottlenecks affect overall system performance.
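    Before modelling PCIe in gem5, a back-of-the-envelope model can show where bottlenecks should appear. The per-lane rates and line codes below are the standard figures for PCIe generations 1 to 4; the payload size and per-packet overhead are assumptions, and flow control, ACK/NAK traffic and latency are ignored, so the results are optimistic. The function names are hypothetical and not part of gem5.

```python
# Per-lane signalling rate (GT/s) and line-code efficiency, PCIe gens 1-4.
PCIE_GENS = {
    1: (2.5, 8 / 10),     # 8b/10b encoding
    2: (5.0, 8 / 10),
    3: (8.0, 128 / 130),  # 128b/130b encoding
    4: (16.0, 128 / 130),
}


def link_bandwidth(gen, lanes, payload=256, tlp_overhead=24):
    """Approximate usable bytes/s in one direction of a PCIe link.

    payload and tlp_overhead (bytes per transaction layer packet) are
    assumed values; flow control and ACK/NAK traffic are ignored.
    """
    rate, code_eff = PCIE_GENS[gen]
    line_rate = rate * 1e9 * code_eff / 8 * lanes  # bytes/s after line coding
    return line_rate * payload / (payload + tlp_overhead)


def transfer_time(nbytes, devices, gen, device_lanes, upstream_lanes):
    """Seconds for each of `devices` endpoints behind one switch to move nbytes.

    Each device is limited by its own link or by its fair share of the
    shared upstream link, whichever is smaller.
    """
    own = link_bandwidth(gen, device_lanes)
    share = link_bandwidth(gen, upstream_lanes) / devices
    return nbytes / min(own, share)


alone = transfer_time(1e9, 1, 3, 8, 16)   # one x8 Gen3 device
shared = transfer_time(1e9, 4, 3, 8, 16)  # four x8 devices behind a x16 uplink
print(round(shared / alone, 2))  # 2.0
```

    Four x8 devices sharing a x16 upstream link each get only about a x4 share of it, so they take roughly twice as long as their own links suggest; a detailed interconnect model in gem5 should reproduce this kind of effect, plus the latency and protocol overheads this sketch leaves out.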

  5. Colossus-like code breaking machines
    Contact: Simon Moore

    The Colossus code breaking machine was so pivotal to the World War II code breaking efforts at Bletchley Park that its very existence was kept secret for 50 years. The machine was reconstructed in the 1990s by Tony Sale at The National Museum of Computing from illegally kept photographs and a few other documents. This project is to explore software and/or hardware implementations of Colossus-like machines. For example, Joachim Schueth produced an award-winning program to beat Colossus using a PC (see his Ada code and example input). There are a number of possible subprojects that could be combined to form an interesting research project (e.g. 1+2 with 3 as an option would make one good project, as would 4+5 with 6 as an option, or some other combination):

    1. Produce an efficient Java/C/? version of the code breaking code based on Joachim's code taking cyphertext (not the radio broadcast Morse code) as input.
    2. Write a parallel version of the code breaking code.
    3. Write a GPU version of the code breaking code.
    4. Research the principal algorithmic approach taken by Colossus and write a simulator which is functionally faithful.
    5. Produce an FPGA implementation of a Colossus which is functionally similar to the original valve version. There is a partial version (when I looked in June 2016) -
    6. Explore the use of more modern computer architecture techniques and the use of a large FPGA to produce a high performance modern Colossus.
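    As a possible starting point for subprojects 1 and 4, the sketch below shows the double-delta counting that Colossus is usually described as performing to set the first two chi wheels: for every pair of start positions of chi1 and chi2 (41 x 31 settings), count how often ΔZ1 ⊕ ΔZ2 ⊕ Δχ1 ⊕ Δχ2 is a dot (0); the correct setting should give a count noticeably above half. This is not Joachim Schueth's algorithm, the function names are hypothetical, and the wheel patterns and cyphertext are synthetic, with an artificial noise bias standing in for the real psi-wheel and plaintext statistics.

```python
import random

CHI_LENGTHS = (41, 31, 29, 26, 23)  # cam counts of the five Lorenz chi wheels


def delta(bits):
    """Delta a bit stream: XOR each bit with its successor."""
    return [a ^ b for a, b in zip(bits, bits[1:])]


def cyclic_delta(wheel):
    """Delta of a wheel pattern, wrapping round at the end of the wheel."""
    return [wheel[i] ^ wheel[(i + 1) % len(wheel)] for i in range(len(wheel))]


def count_dots(dzz, dchi1, dchi2, s1, s2):
    """Count positions where dZ1+dZ2+dChi1+dChi2 is a dot (0) for one setting."""
    n1, n2 = len(dchi1), len(dchi2)
    return sum(1 for i, x in enumerate(dzz)
               if (x ^ dchi1[(s1 + i) % n1] ^ dchi2[(s2 + i) % n2]) == 0)


def one_plus_two_break(z1, z2, chi1, chi2):
    """Score every chi1/chi2 start pair; return the setting with most dots."""
    dzz = [a ^ b for a, b in zip(delta(z1), delta(z2))]
    dchi1, dchi2 = cyclic_delta(chi1), cyclic_delta(chi2)
    settings = [(s1, s2) for s1 in range(len(chi1)) for s2 in range(len(chi2))]
    return max(settings, key=lambda s: count_dots(dzz, dchi1, dchi2, *s))


# Synthetic data: random wheel patterns, planted start positions (17, 5), and
# first/second impulse noise that mostly agree, imitating the dot bias that
# the real psi-wheel and plaintext contributions provide.
rng = random.Random(1)
chi1 = [rng.randint(0, 1) for _ in range(CHI_LENGTHS[0])]
chi2 = [rng.randint(0, 1) for _ in range(CHI_LENGTHS[1])]
length = 1000
noise1 = [rng.randint(0, 1) for _ in range(length)]
noise2 = [bit ^ (rng.random() < 0.1) for bit in noise1]
z1 = [chi1[(17 + i) % 41] ^ noise1[i] for i in range(length)]
z2 = [chi2[(5 + i) % 31] ^ noise2[i] for i in range(length)]
print(one_plus_two_break(z1, z2, chi1, chi2))  # expected to recover (17, 5)
```

    Each setting is scored independently of every other, which is what makes the parallel, GPU and FPGA subprojects (2, 3 and 6) natural extensions of the same kernel.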

  6. Hardware Transactional Memory
    Contact: Timothy Jones

    Since multicores became commonplace, transactional memory systems have been gaining popularity because they are easy to use: the programmer can avoid reasoning about locks while atomic access to data is supported transparently. There are a number of software transactional memory libraries available, and hardware transactional memory has now become mainstream with Intel's release of the TSX extensions in Haswell processors (bugs notwithstanding). However, a significant downside is that no ordering can be enforced on transaction commit, which makes TSX a poor fit for techniques such as thread-level speculation. This project would start by implementing a simple version of hardware transactional memory within a simulator such as gem5. It would then evaluate the performance improvement available, either through source code alterations or automatically in hardware through speculative thread creation.
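    A minimal sketch of the missing ingredient, ordered commit, assuming lazy versioning (speculative stores are buffered until commit) and read-set tracking. `Transaction` and `run_speculatively` are hypothetical names, and the tasks run one after another before any commit to model concurrent execution, rather than on real threads. A task that read a location written by an earlier, already committed task is squashed and re-executed, which is the ordering that thread-level speculation needs.

```python
class Transaction:
    """Lazy versioning: reads are tracked, stores are buffered until commit."""

    def __init__(self, memory):
        self.memory = memory
        self.read_set = set()
        self.write_buffer = {}

    def read(self, addr):
        if addr not in self.write_buffer:
            self.read_set.add(addr)  # only reads of shared state can conflict
        return self.write_buffer.get(addr, self.memory.get(addr, 0))

    def write(self, addr, value):
        self.write_buffer[addr] = value


def run_speculatively(tasks, memory):
    """Run tasks as if concurrently, then commit them in program order.

    Every task first executes against memory before any commit, modelling
    parallel speculative threads. At commit, a task that read a location
    written by an earlier committed task used a stale value, so it is
    squashed and re-executed. Returns the number of squashes.
    """
    txs = []
    for task in tasks:
        tx = Transaction(memory)
        task(tx)
        txs.append(tx)
    squashes = 0
    committed_writes = set()
    for task, tx in zip(tasks, txs):
        if tx.read_set & committed_writes:
            squashes += 1
            tx = Transaction(memory)  # re-execute against committed state
            task(tx)
        memory.update(tx.write_buffer)
        committed_writes |= set(tx.write_buffer)
    return squashes


memory = {"x": 1, "y": 0}
tasks = [
    lambda t: t.write("y", t.read("x") + 1),   # y = x + 1
    lambda t: t.write("z", t.read("y") * 10),  # z = y * 10 (depends on task 0)
    lambda t: t.write("w", t.read("x")),       # w = x (independent)
]
print(run_speculatively(tasks, memory), memory)
# 1 {'x': 1, 'y': 2, 'z': 20, 'w': 1}
```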

  7. Real FPGA Virtual I/O
    Contact: Simon Moore

    This is a fairly advanced project. VirtIO is used to virtualize devices for virtual machines. It provides an abstraction layer between the guest OS and the virtual machine. Now that we have System-on-Chip (SoC) FPGAs it may be possible to treat the ARM core (running Linux or FreeBSD) as the guest OS and a NIOS core mimicking the virtual machine with shared memory (or a FIFO) between the two. The NIOS could then be replaced with some custom FPGA hardware to consume (or produce) VirtIO. For example, it would be good to have a VirtIO stream to Avalon Stream adaptor, or a VirtIO block device that could scan/search through blocks of data. Such an approach would allow FPGA acceleration while not having to deal with low-level configuration details of a particular device.
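    The structure shared between the ARM core and the NIOS core (or custom hardware) would be a virtqueue. Below is a simplified Python model of a split virtqueue: a descriptor table, an available ring written by the driver and a used ring written by the device, with free-running 16-bit indices. It follows the shape of the structures in the VirtIO specification but ignores byte layout, memory barriers, notifications and event suppression; the class, method and device names are hypothetical, and the queue size must be a power of two for the index arithmetic to wrap correctly.

```python
from dataclasses import dataclass

VIRTQ_DESC_F_NEXT = 1   # descriptor chain continues via 'next'
VIRTQ_DESC_F_WRITE = 2  # buffer is written by the device (otherwise read)


@dataclass
class Desc:
    addr: int = 0
    length: int = 0  # 'len' in the specification
    flags: int = 0
    next: int = 0


class SplitVirtqueue:
    def __init__(self, size):
        if size & (size - 1):
            raise ValueError("queue size must be a power of two")
        self.size = size
        self.desc = [Desc() for _ in range(size)]
        self.avail_idx, self.avail_ring = 0, [0] * size      # written by driver
        self.used_idx, self.used_ring = 0, [(0, 0)] * size   # written by device
        self.free = list(range(size))  # driver: unused descriptor indices
        self.last_avail = 0            # device: next avail entry to consume
        self.last_used = 0             # driver: next used entry to reclaim

    def _chain(self, head):
        """Yield the descriptor indices of the chain starting at head."""
        i = head
        while True:
            yield i
            if not self.desc[i].flags & VIRTQ_DESC_F_NEXT:
                return
            i = self.desc[i].next

    # Driver side (guest OS on the ARM core).
    def add_buffer(self, bufs):
        """bufs: list of (addr, length, device_writable). Returns the head index."""
        idxs = [self.free.pop() for _ in bufs]
        for n, (addr, length, writable) in enumerate(bufs):
            last = n == len(bufs) - 1
            flags = (VIRTQ_DESC_F_WRITE if writable else 0) | (0 if last else VIRTQ_DESC_F_NEXT)
            self.desc[idxs[n]] = Desc(addr, length, flags, 0 if last else idxs[n + 1])
        self.avail_ring[self.avail_idx % self.size] = idxs[0]
        self.avail_idx = (self.avail_idx + 1) & 0xFFFF  # free-running 16-bit index
        return idxs[0]

    def get_used(self):
        """Reclaim finished chains; returns (head, bytes_written) pairs."""
        done = []
        while self.last_used != self.used_idx:
            head, written = self.used_ring[self.last_used % self.size]
            self.free.extend(self._chain(head))
            done.append((head, written))
            self.last_used = (self.last_used + 1) & 0xFFFF
        return done

    # Device side (NIOS core or custom FPGA logic).
    def device_poll(self, handle):
        """Process new chains; handle(list of Desc) returns bytes written."""
        while self.last_avail != self.avail_idx:
            head = self.avail_ring[self.last_avail % self.size]
            written = handle([self.desc[i] for i in self._chain(head)])
            self.used_ring[self.used_idx % self.size] = (head, written)
            self.used_idx = (self.used_idx + 1) & 0xFFFF
            self.last_avail = (self.last_avail + 1) & 0xFFFF


memory = {0x1000: b"hello", 0x2000: bytearray(16)}  # stand-in for shared RAM


def upcase_device(chain):
    """Hypothetical device: read the first buffer, write it upper-cased to the second."""
    src, dst = chain
    data = memory[src.addr][:src.length]
    memory[dst.addr][:len(data)] = data.upper()
    return len(data)


vq = SplitVirtqueue(8)
vq.add_buffer([(0x1000, 5, False), (0x2000, 16, True)])
vq.device_poll(upcase_device)
print(vq.get_used(), bytes(memory[0x2000][:5]))  # [(7, 5)] b'HELLO'
```

    A hardware consumer such as a VirtIO-to-Avalon-Stream adaptor would implement the `device_poll` side: follow descriptor chains, stream the readable buffers out, and publish completions in the used ring.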

  8. Image and Video Processing
    External industrial contact: Robert Walczyk (robertwalczyk at gmail dot com)
    Internal contact: Simon Moore

    An opportunity to contribute to the algorithmic development of image and video processing for VLSI architectures. The aim is to develop an open platform and verification framework for implementing real-time video processing algorithms, e.g. mathematical morphology, JPEG encoders or H.264 decoders. The platform design shall be parameterized, supporting popular FPGA development boards and expansion cards. The design shall include I/O interfaces for video acquisition from a digital camera module, a VGA/HDMI controller for video display and visual debugging, and a generic PC interface for further debugging and flow control. Moreover, the PC interface shall be accompanied by a software console that allows test vectors to be loaded directly into memory and results to be offloaded for post-processing analysis.
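    Mathematical morphology is a convenient first algorithm for such a platform because it needs only comparisons over a small window. A minimal software reference model is sketched below, assuming binary images stored as lists of rows and a square structuring element (the function names are hypothetical); output captured from the FPGA could be compared against it during verification. Pixels outside the image are treated as background.

```python
def dilate(img, radius=1):
    """Binary dilation with a (2r+1)x(2r+1) square structuring element."""
    h, w = len(img), len(img[0])
    return [[int(any(img[y + dy][x + dx]
                     for dy in range(-radius, radius + 1)
                     for dx in range(-radius, radius + 1)
                     if 0 <= y + dy < h and 0 <= x + dx < w))
             for x in range(w)] for y in range(h)]


def erode(img, radius=1):
    """Binary erosion: a pixel survives only if its whole window is set."""
    h, w = len(img), len(img[0])
    return [[int(all(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                     for dy in range(-radius, radius + 1)
                     for dx in range(-radius, radius + 1)))
             for x in range(w)] for y in range(h)]


def opening(img):
    """Erosion then dilation: removes specks smaller than the structuring element."""
    return dilate(erode(img))


image = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 1],  # isolated noise pixel
    [0, 0, 0, 0, 0, 0],
]
for row in opening(image):
    print(row)  # the 3x3 square survives; the isolated pixel is removed
```

    In hardware the same window is typically formed from line buffers as pixels stream in from the camera, so only the last few rows need to be stored on chip.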

  9. Convolutional neural networks on FPGA
    Contact: Jonathan Woodruff (jdw57@cl)

    Develop a convolutional neural network engine on FPGA. Convolutional neural networks are finding applications in big data learning but mostly run on standard CPUs or GPUs. This project would design a hardware/software architecture for efficient processing of convolutional neural networks on FPGA, using either the BlueVec vector processor or NiosII CPU cores with custom accelerators. BlueVec is an open-source vector processor written in Bluespec SystemVerilog for synthesis on FPGA; it has been shown to be very efficient for low-precision arithmetic such as that used in convolutional neural networks, and may be a useful starting point for this project.
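    A bit-exact software reference model is useful when verifying a hardware CNN engine. The sketch assumes int8 activations and weights, a wide accumulator (Python integers are unbounded; hardware would typically use 32 bits), and requantisation by an arithmetic right shift followed by saturation to int8. This shift-based scaling is an assumption for illustration, not BlueVec's actual arithmetic, and `conv2d_int8` is a hypothetical name.

```python
def conv2d_int8(image, kernel, shift):
    """'Valid' 2D convolution of int8 data with int8 weights.

    Like most CNN frameworks this is really cross-correlation (the kernel is
    not flipped). Products accumulate in a wide integer, are scaled by
    2**-shift with an arithmetic right shift, then saturated to int8.
    """
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for y in range(len(image) - kh + 1):
        row = []
        for x in range(len(image[0]) - kw + 1):
            acc = sum(image[y + i][x + j] * kernel[i][j]
                      for i in range(kh) for j in range(kw))
            q = acc >> shift                    # floor division by 2**shift
            row.append(max(-128, min(127, q)))  # saturate to the int8 range
        out.append(row)
    return out


image = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
kernel = [[1, 0], [0, -1]]
print(conv2d_int8(image, kernel, shift=0))  # [[-40, -40], [-40, -40]]
```

    Every output is an independent dot product over a small window, which is the data parallelism a vector unit or an array of multiply-accumulate units would exploit.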

  10. Accelerators for SoC-FPGAs
    Contact: Simon Moore

    System-on-Chip Field Programmable Gate Arrays (SoC-FPGAs), like the Cyclone V used in the ECAD+Arch labs, combine ARM cores with an FPGA fabric. The FPGA fabric could be used to provide an accelerator, e.g. a vector unit like our BlueVec. What accelerator might you like to build?

  11. GPU hardware/software codesign
    Contact: Theo Markettos

    SoC FPGAs allow flexible interconnection between a mainstream processor (e.g. an ARM Cortex-A9) and hardware of your own design. This enables easy transition of functionality between hardware and software.

    This project will explore that space in the context of GPUs. For example, a simple GPU could consist of a number of Yarvi cores, each running a line or triangle drawing program and being fed coordinates by a driver on the ARM core, which might be running Linux or Android. A significantly more advanced GPU could involve a vector processor along the lines of BlueVec implementing (part of) the Android graphics API to render Android applications. Along the way you can explore the performance trade-offs involved in moving computation between hardware and software.
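    As an example of the kind of program each Yarvi core might run, here is Bresenham's integer-only line algorithm, shown in Python for readability; on a core it would be written in C or assembly, with the endpoints supplied by the driver on the ARM core. This version handles all octants and returns the list of pixels instead of writing to a framebuffer.

```python
def bresenham(x0, y0, x1, y1):
    """Rasterise a line using integer arithmetic only; returns pixel coordinates."""
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy  # combined error term for both axes
    pixels = []
    while True:
        pixels.append((x0, y0))
        if x0 == x1 and y0 == y1:
            return pixels
        e2 = 2 * err
        if e2 >= dy:  # step in x
            err += dy
            x0 += sx
        if e2 <= dx:  # step in y
            err += dx
            y0 += sy


print(bresenham(0, 0, 3, 0))  # [(0, 0), (1, 0), (2, 0), (3, 0)]
print(bresenham(0, 0, 2, 2))  # [(0, 0), (1, 1), (2, 2)]
```

    The loop needs only additions, comparisons and a shift, which suits a small core without a multiplier or floating point; splitting a framebuffer into regions, one per core, is one way to parallelise it.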

Older Project Suggestions

The following is a list of older projects from previous years that may have been attempted already, but could be built upon or provide inspiration for your own ideas.

  1. Garbage Collector For C
    Contact: Timothy Jones

    C is a language where the programmer, for better or worse, has complete control over memory allocation and subsequent deallocation. This causes obvious problems when reasoning about memory usage is complex: memory leaks occur when the programmer forgets to deallocate memory, and double-frees occur when a block is deallocated more than once. Wouldn't it be great to be able to manage memory in C automatically to avoid these issues? It turns out that Hans Boehm has already written a mark-and-sweep garbage collector. The aim of this project is to write a different one, perhaps using generations or reference counting.
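    A sketch of the reference-counting option on a simulated heap rather than real C memory: objects are integer handles, every stored pointer increments the target's count, and an object is freed as soon as its count drops to zero, cascading to the objects it points to. The class and method names are hypothetical; in a real C collector the counting would have to be inserted at every pointer assignment (for example by macros or a compiler pass). The example also shows the well-known weakness of pure reference counting: cycles are never freed.

```python
class RefCountHeap:
    """Toy heap of integer handles, freed when their reference count hits zero."""

    def __init__(self):
        self.refcount = {}
        self.children = {}  # handle -> handles it holds pointers to
        self.next_handle = 0
        self.freed = []

    def alloc(self):
        h = self.next_handle
        self.next_handle += 1
        self.refcount[h] = 1  # the caller's reference
        self.children[h] = []
        return h

    def incref(self, h):
        self.refcount[h] += 1

    def decref(self, h):
        """Drop one reference; free objects that reach zero, cascading to children."""
        stack = [h]
        while stack:
            obj = stack.pop()
            self.refcount[obj] -= 1
            if self.refcount[obj] == 0:
                stack.extend(self.children.pop(obj))
                del self.refcount[obj]
                self.freed.append(obj)

    def store_pointer(self, parent, child):
        """parent->field = child: the parent now holds a reference to child."""
        self.incref(child)
        self.children[parent].append(child)


heap = RefCountHeap()
a, b = heap.alloc(), heap.alloc()
heap.store_pointer(a, b)  # a -> b
heap.decref(b)            # drop the local reference to b
heap.decref(a)            # frees a, then b
c, d = heap.alloc(), heap.alloc()
heap.store_pointer(c, d)
heap.store_pointer(d, c)  # c <-> d cycle
heap.decref(c)
heap.decref(d)            # both counts stay at 1: the cycle leaks
print(heap.freed)         # [0, 1]
```

    An explicit work stack is used instead of recursion so that freeing a long linked list cannot overflow the call stack; a practical collector would pair counting with a backup tracing pass to reclaim cycles.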

  2. Function Call Parallelism
    Contact: Timothy Jones

    When parallelising an application, function calls provide obvious points where a new thread can be spawned to do the work of the procedure whilst the main thread carries on. However, in the general case this isn't safe, because the called function might write to memory that is later read by the other thread. This project will provide a safety net for this type of parallelism by implementing thread-level speculation at function calls, using a transactional memory library to catch these dependences. The technique could be applied by hand to several benchmarks or, ideally, implemented within a compiler such as LLVM.
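    A sequential sketch of the safety net, assuming the callee and the continuation (the code after the call) each see a private, logged view of memory, which is roughly what a transactional memory library would provide. Because the callee comes first in program order, the continuation must be re-executed if it read a location the callee wrote. `LoggedMemory` and `speculate_call` are hypothetical names, and no threads are actually spawned.

```python
class LoggedMemory:
    """A private view of memory that records reads and buffers writes."""

    def __init__(self, base):
        self.base, self.writes, self.reads = base, {}, set()

    def __getitem__(self, addr):
        if addr not in self.writes:
            self.reads.add(addr)  # only reads of pre-call state can be stale
        return self.writes.get(addr, self.base.get(addr, 0))

    def __setitem__(self, addr, value):
        self.writes[addr] = value


def speculate_call(memory, callee, continuation):
    """Run callee and continuation 'in parallel'; returns True if squashed."""
    callee_view, cont_view = LoggedMemory(memory), LoggedMemory(memory)
    callee(callee_view)         # would run on a newly spawned thread
    continuation(cont_view)     # the main thread carries on meanwhile
    memory.update(callee_view.writes)  # the callee commits first
    squashed = bool(cont_view.reads & set(callee_view.writes))
    if squashed:                # read-after-write violation: re-run after callee
        cont_view = LoggedMemory(memory)
        continuation(cont_view)
    memory.update(cont_view.writes)
    return squashed


memory = {"a": 1}


def callee(m):        # f(): a = a + 1
    m["a"] = m["a"] + 1


def continuation(m):  # code after the call: b = a * 10
    m["b"] = m["a"] * 10


print(speculate_call(memory, callee, continuation), memory)
# True {'a': 2, 'b': 20}
```

    The final memory matches what sequential execution would produce whether or not a squash occurs; the parallel speedup comes from the common case where the continuation does not touch the callee's writes.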

  3. Dynamic Data Dependence Analysis
    Contact: Timothy Jones

    Data dependence profiling is time-consuming and, if it relies on program traces, uses a lot of space. To combat this, this project implemented SD3, an online data dependence analyser. A library implementing SD3 was written and then connected to programs automatically by an LLVM pass that inserts calls into the library.
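    One of the main ideas in SD3 is that many memory accesses are strided, so a stream of addresses can be stored as (start, stride, count) runs instead of as a full trace. A minimal sketch of that compression, plus a deliberately naive overlap check between write and read runs, is shown below; the function names are hypothetical, this is not the published SD3 algorithm, and a real profiler would compare the stride descriptors arithmetically (for example with a GCD-style test) rather than expanding them.

```python
def compress_strides(addresses):
    """Greedily compress an address stream into (start, stride, count) runs."""
    runs = []
    for addr in addresses:
        if runs:
            start, stride, count = runs[-1]
            if count == 1:  # a second address fixes the stride
                runs[-1] = (start, addr - start, 2)
                continue
            if addr == start + stride * count:  # the run continues
                runs[-1] = (start, stride, count + 1)
                continue
        runs.append((addr, 0, 1))  # start a new run
    return runs


def expand(run):
    """All addresses covered by one run."""
    start, stride, count = run
    return {start + stride * k for k in range(count)}


def may_depend(write_runs, read_runs):
    """Naive check: does any read run touch an address a write run touched?"""
    written = set().union(*(expand(r) for r in write_runs))
    return any(expand(r) & written for r in read_runs)


trace = [0, 4, 8, 12, 100, 200, 300]
print(compress_strides(trace))                # [(0, 4, 4), (100, 100, 3)]
print(may_depend([(0, 4, 4)], [(2, 4, 3)]))   # False
print(may_depend([(0, 4, 4)], [(8, 2, 3)]))   # True
```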