Computer Laboratory

Computer Architecture Group

Computer Architecture Part II Project Suggestions

Please contact the proposer(s) by email if you are interested in any of the projects below. In addition, some of the projects from previous years may still be suitable and interesting. Please remember that these are just starting points suggesting possible directions for the research. Check back here over the coming weeks for more projects. We would also be happy to consider any project ideas of your own.


  1. JavaScript Dependence Profiler
    Contact: Timothy Jones


    JavaScript has become the de facto language for web applications, historically just on the client but now extending to server-side applications with the development of environments such as Node.js. In addition, subsets of JavaScript, such as asm.js, allow programs written in other languages to be run as web applications. However, it is currently difficult to understand the dependences between different parts of a JavaScript application (e.g. between different loop iterations or different functions). The aim of this project, therefore, is to build a dependence profiler for JavaScript. This may be a standalone tool, or integrated into the browser or JavaScript engine, and should provide information to the user about the dependences within an application in a useful and intuitive format.
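
    As a rough illustration of the core bookkeeping such a profiler needs, recording the last writer of each memory location is enough to detect read-after-write dependences between code regions. The C++ sketch below is minimal and not tied to any particular JavaScript engine; the hook names and the notion of a "region" (loop iteration or function invocation) are assumptions for illustration:

        #include <cstdint>
        #include <iostream>
        #include <unordered_map>

        // Shadow state: which code region (e.g. loop iteration or function
        // invocation) last wrote each address.  In a real profiler this would
        // be driven by instrumentation hooks inside the JavaScript engine.
        static std::unordered_map<std::uintptr_t, int> last_writer;

        void on_write(std::uintptr_t addr, int region) {
            last_writer[addr] = region;
        }

        void on_read(std::uintptr_t addr, int region) {
            auto it = last_writer.find(addr);
            if (it != last_writer.end() && it->second != region)
                std::cout << "RAW dependence: region " << it->second
                          << " -> region " << region << "\n";
        }

        int main() {
            // Two "loop iterations" communicating through one location.
            on_write(0x1000, /*region=*/0);
            on_read(0x1000, /*region=*/1);  // reported as a cross-region dependence
        }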



  2. Modelling Interconnect Bottlenecks
    Contact: Noa Zilberman


    New computer system architectures seek to build large, hyper-converged systems in which many compute nodes are connected through a dedicated fabric. Current computer architecture simulators provide only a simplistic model of the computing interconnect, failing to reflect interconnect bottlenecks. This project aims to extend the gem5 simulator with accurate modelling of the PCIe interconnect, which will then be used to show how interconnect bottlenecks affect overall system performance.
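
    To give a feel for the effect being modelled, a first-order analytic sketch of PCIe transfer time is shown below. The latency, bandwidth and per-packet overhead figures are illustrative assumptions, not measurements, and a real gem5 device model would be considerably more detailed:

        #include <cstdio>

        // First-order PCIe transfer-time model: each transaction-layer packet
        // (TLP) carries up to max_payload bytes plus fixed header/framing
        // overhead, and the whole transfer pays one propagation latency.
        double transfer_time_us(double bytes,
                                double gbytes_per_s = 8.0,   // assumed usable bandwidth
                                double latency_us   = 0.5,   // assumed end-to-end latency
                                double max_payload  = 256,   // bytes per TLP
                                double header_bytes = 24) {  // assumed per-TLP overhead
            double tlps = (bytes + max_payload - 1) / max_payload;
            double wire_bytes = bytes + tlps * header_bytes;
            return latency_us + wire_bytes / (gbytes_per_s * 1e3);  // bytes per microsecond
        }

        int main() {
            for (double sz : {64.0, 4096.0, 1048576.0})
                std::printf("%8.0f bytes -> %8.2f us\n", sz, transfer_time_us(sz));
        }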



  3. Colossus-like code breaking machines
    Contact: Simon Moore


    The Colossus code breaking machine was so pivotal to the World War II code breaking efforts at Bletchley Park that its very existence was kept secret for 50 years. The machine was reconstructed in the 1990s by Tony Sale at The National Museum of Computing from illegally kept photographs and a few other documents. This project is to explore software and/or hardware implementations of Colossus-like machines. For example, Joachim Schueth produced an award-winning program to beat Colossus using a PC (see his Ada code and example input). There are a number of possible subprojects that could be combined to form an interesting research project (e.g. 1+2 with 3 as an option would make one good project, or 4+5 with 6 as an option, or some other combination):

    1. Produce an efficient Java/C/? version of the code breaking code, based on Joachim's code, taking cyphertext (not the radio-broadcast Morse code) as input.
    2. Write a parallel version of the code breaking code (a minimal threading sketch follows this list).
    3. Write a GPU version of the code breaking code.
    4. Research the principal algorithmic approach taken by Colossus and write a simulator which is functionally faithful.
    5. Produce an FPGA implementation of a Colossus which is functionally similar to the original valve version. A partial version existed as of June 2016: http://bennorth.github.io/fpga-colossus/doc/Content/notes.html.
    6. Explore the use of more modern computer architecture techniques and the use of a large FPGA to produce a high performance modern Colossus.
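
    As a hint of the structure behind subproject 2, the usual shape is an embarrassingly parallel search: candidate wheel settings are partitioned across threads, each thread scores its share against the cyphertext, and the best score wins. The C++ sketch below shows only that skeleton; the score function is a placeholder, not the real Colossus statistic, and the search-space size is illustrative:

        #include <cstdio>
        #include <thread>
        #include <utility>
        #include <vector>

        // Placeholder scoring function: a real attack would XOR the candidate
        // wheel stream with the cyphertext and count a statistical bias.
        int score(int setting) { return (setting * 2654435761u) % 1000; }

        int main() {
            const int total_settings = 41 * 31;   // illustrative search space
            const unsigned nthreads =
                std::max(1u, std::thread::hardware_concurrency());
            std::vector<std::pair<int,int>> best(nthreads, {-1, -1});  // (score, setting)
            std::vector<std::thread> workers;

            for (unsigned t = 0; t < nthreads; ++t)
                workers.emplace_back([&, t] {
                    for (int s = t; s < total_settings; s += nthreads) {  // interleaved partition
                        std::pair<int,int> cand{score(s), s};
                        if (cand > best[t]) best[t] = cand;
                    }
                });
            for (auto& w : workers) w.join();

            std::pair<int,int> winner{-1, -1};
            for (const auto& b : best) if (b > winner) winner = b;
            std::printf("best setting %d (toy score %d)\n", winner.second, winner.first);
        }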


  4. Hardware Transactional Memory
    Contact: Timothy Jones


    Transactional memory systems have been gaining popularity since multicores became commonplace because of their ease of use: they allow the programmer to avoid reasoning about locks while transparently supporting atomic access to data. There are a number of software transactional memory libraries available, and hardware transactional memory has now become mainstream with Intel's release of the TSX extensions in Haswell processors (bugs notwithstanding). However, a significant downside is that no ordering can be enforced on transaction commit, meaning that TSX is a poor fit for techniques such as thread-level speculation. This project would start by implementing a simple version of hardware transactional memory within a simulator, such as gem5. It would then evaluate the performance improvement available either through source code alterations, or automatically in hardware through speculative thread creation.
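
    For context, Intel exposes TSX's RTM interface through compiler intrinsics. Below is a minimal usage sketch (compile with -mrtm and run on a TSX-capable part); it illustrates the basic API and the mandatory non-transactional fallback path, but says nothing about the commit-ordering limitation discussed above:

        #include <immintrin.h>   // RTM intrinsics (_xbegin/_xend/_xabort)
        #include <atomic>

        static std::atomic<bool> fallback_lock{false};
        static long shared_counter = 0;

        void atomic_increment() {
            unsigned status = _xbegin();
            if (status == _XBEGIN_STARTED) {
                // Subscribe to the fallback lock: if another thread holds it,
                // abort so we never run alongside the non-transactional path.
                if (fallback_lock.load(std::memory_order_relaxed)) _xabort(0xff);
                ++shared_counter;   // executed transactionally
                _xend();            // commit; an abort rolls all of this back
            } else {
                // Aborted (conflict, capacity, lock held, ...): take the fallback lock.
                while (fallback_lock.exchange(true, std::memory_order_acquire)) { /* spin */ }
                ++shared_counter;
                fallback_lock.store(false, std::memory_order_release);
            }
        }

        int main() { atomic_increment(); }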



  5. Real FPGA Virtual I/O
    Contact: Simon Moore


    This is a fairly advanced project. VirtIO is used to virtualize devices for virtual machines. It provides an abstraction layer between the guest OS and the virtual machine. Now that we have System-on-Chip (SoC) FPGAs, it may be possible to treat the ARM core (running Linux or FreeBSD) as the guest OS, with a NIOS core mimicking the virtual machine and shared memory (or a FIFO) between the two. The NIOS could then be replaced with custom FPGA hardware that consumes (or produces) VirtIO. For example, it would be good to have a VirtIO stream to Avalon Stream adaptor, or a VirtIO block device that could scan/search through blocks of data. Such an approach would allow FPGA acceleration without having to deal with the low-level configuration details of a particular device.
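
    For orientation, the heart of a VirtIO device is the virtqueue. The C++ sketch below follows the split-ring descriptor layout from the VirtIO specification in simplified form; real drivers also manage the available and used rings and the associated memory barriers, and the 16-byte header length in the example assumes a virtio-blk style request:

        #include <cstdint>

        // One entry in a VirtIO split-ring descriptor table: a guest-physical
        // buffer address, its length, and flags that chain descriptors or mark
        // the buffer as device-writable.
        struct virtq_desc {
            uint64_t addr;    // guest-physical address of the buffer
            uint32_t len;     // buffer length in bytes
            uint16_t flags;   // see below
            uint16_t next;    // index of the next descriptor in a chain
        };

        static constexpr uint16_t VIRTQ_DESC_F_NEXT  = 1;  // buffer continues in 'next'
        static constexpr uint16_t VIRTQ_DESC_F_WRITE = 2;  // device writes, driver reads

        // Chain two descriptors: a read-only request header followed by a
        // device-writable data buffer (e.g. for a block-device read).
        void build_chain(virtq_desc* table, uint64_t hdr, uint64_t buf, uint32_t buf_len) {
            table[0] = { hdr, 16,      VIRTQ_DESC_F_NEXT,  1 };
            table[1] = { buf, buf_len, VIRTQ_DESC_F_WRITE, 0 };
        }

        int main() {
            virtq_desc table[2];
            build_chain(table, 0x1000, 0x2000, 4096);
        }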



  6. Optimising Multithreaded Stalls
    Contact: Timothy Jones


    Within a multithreaded application, threads often have to wait for others to finish computation. During this waiting time, no useful work is performed by the stalled threads, so they are not making the best use of the underlying multicore hardware. This project aims to quantify the amount of stalling that each thread experiences in a multithreaded workload. It will then develop schemes to optimise this time away, by allowing the waiting thread to perform useful work (e.g. prefetching data it will use after the stall).
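
    One concrete shape this could take is sketched below in C++, using GCC/Clang's __builtin_prefetch: instead of spinning idly on a synchronisation flag, the waiting thread walks ahead through the data it will consume after the stall. The flag-based barrier and the choice of what to prefetch are assumptions about the workload:

        #include <atomic>
        #include <cstddef>

        // While waiting for a peer thread to raise 'ready', pull the data this
        // thread will consume after the stall into the cache.
        void wait_and_prefetch(const std::atomic<bool>& ready,
                               const double* next_input, std::size_t n) {
            std::size_t i = 0;
            while (!ready.load(std::memory_order_acquire)) {
                if (i < n) {
                    __builtin_prefetch(&next_input[i], /*rw=*/0, /*locality=*/3);
                    i += 8;   // one prefetch per 64-byte cache line of doubles
                }
            }
            // ready is set: process next_input, now (partially) cache-resident.
        }

        int main() {
            std::atomic<bool> ready{true};   // already set, so this returns immediately
            double buf[1024] = {};
            wait_and_prefetch(ready, buf, 1024);
        }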



  7. A Fast Lock-Free Software Queue
    Contact: Timothy Jones


    Lock-free queues provide scalable communication mechanisms for multi-threaded applications. We have implemented a fast lock-free software queue for fine-grained communication between a single producer and single consumer. This project will generalise the queue to a multiple consumer scenario and evaluate it within different contexts.
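
    For reference, the single-producer/single-consumer case can be captured in a few lines; the C++ sketch below uses a power-of-two ring buffer and C++11 atomics. The existing in-house queue will differ in its details, and generalising to multiple consumers is the interesting part of the project:

        #include <atomic>
        #include <cstddef>

        // Single-producer/single-consumer ring buffer: the producer only writes
        // 'tail', the consumer only writes 'head', so no locks or
        // read-modify-write operations are needed.
        template <typename T, std::size_t N>   // N must be a power of two
        class SpscQueue {
            T buf[N];
            std::atomic<std::size_t> head{0}, tail{0};
        public:
            bool push(const T& v) {            // called by the producer only
                auto t = tail.load(std::memory_order_relaxed);
                if (t - head.load(std::memory_order_acquire) == N) return false;  // full
                buf[t & (N - 1)] = v;
                tail.store(t + 1, std::memory_order_release);
                return true;
            }
            bool pop(T& out) {                 // called by the consumer only
                auto h = head.load(std::memory_order_relaxed);
                if (h == tail.load(std::memory_order_acquire)) return false;      // empty
                out = buf[h & (N - 1)];
                head.store(h + 1, std::memory_order_release);
                return true;
            }
        };

        int main() {
            SpscQueue<int, 1024> q;
            q.push(42);
            int v;
            q.pop(v);
        }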



  8. Image and Video Processing
    External industrial contact: Robert Walczyk (robertwalczyk at gmail dot com)
    Internal contact: Simon Moore

    An opportunity to contribute to the algorithmic development of image and video processing for VLSI architectures. The aim is to develop an open platform and verification framework for implementing real-time video processing algorithms, e.g. mathematical morphology, JPEG encoders or H.264 decoders. The platform design shall be parameterized, supporting popular FPGA development boards and expansion cards. The design shall include I/O interfaces for video acquisition from a digital camera module, a VGA/HDMI controller for video display and visual debug, and a generic PC interface for further debug and flow control. Moreover, the PC interface shall be accompanied by a software console to allow loading test vectors directly into memory and offloading results for post-processing analysis.
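
    As a reference point on the algorithm side, a 3x3 grayscale erosion (one of the mathematical morphology operators mentioned above) is only a few lines in software; a hardware version would pipeline the same window over a streamed image. A minimal C++ sketch, assuming 8-bit pixels and a simple border policy:

        #include <algorithm>
        #include <cstdint>
        #include <vector>

        // 3x3 grayscale erosion: each output pixel is the minimum of its 3x3
        // neighbourhood.  Border pixels are copied unchanged for simplicity.
        std::vector<uint8_t> erode3x3(const std::vector<uint8_t>& in, int w, int h) {
            std::vector<uint8_t> out(in);
            for (int y = 1; y < h - 1; ++y)
                for (int x = 1; x < w - 1; ++x) {
                    uint8_t m = 255;
                    for (int dy = -1; dy <= 1; ++dy)
                        for (int dx = -1; dx <= 1; ++dx)
                            m = std::min(m, in[(y + dy) * w + (x + dx)]);
                    out[y * w + x] = m;
                }
            return out;
        }

        int main() {
            std::vector<uint8_t> img(64 * 64, 255);
            auto eroded = erode3x3(img, 64, 64);
        }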



  9. Convolutional neural networks on FPGA
    Contact: Jonathan Woodruff (jdw57@cl)


    Develop a convolutional neural network engine on FPGA. Convolutional neural networks are finding applications in big-data learning, but they mostly run on standard CPUs or GPUs. This project would design a hardware/software architecture for efficient processing of convolutional neural networks on FPGA, using either the BlueVec vector processor or NiosII CPU cores with custom accelerators. BlueVec is an open-source vector processor written in Bluespec SystemVerilog for synthesis on FPGA; it has been shown to be very efficient for low-precision arithmetic such as that used in convolutional neural networks, and may be a useful starting point for this project.
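
    For scale, the computational core the accelerator must handle is simple but highly repetitive. Below is a minimal C++ sketch of one output position of a 2D convolution with 8-bit weights and activations (the kind of low-precision arithmetic BlueVec targets); the dimensions and data layout are assumptions for illustration:

        #include <cstdint>

        // One output element of a convolution layer: an int8 x int8
        // multiply-accumulate over a KxK window and all input channels,
        // accumulated in 32 bits.
        int32_t conv_point(const int8_t* input,   // layout [C][H][W]
                           const int8_t* weights, // layout [C][K][K]
                           int C, int H, int W, int K, int y, int x) {
            int32_t acc = 0;
            for (int c = 0; c < C; ++c)
                for (int ky = 0; ky < K; ++ky)
                    for (int kx = 0; kx < K; ++kx)
                        acc += int32_t(input[(c * H + y + ky) * W + (x + kx)]) *
                               int32_t(weights[(c * K + ky) * K + kx]);
            return acc;   // bias, scaling and the activation function are omitted
        }

        int main() {
            int8_t in[3 * 8 * 8] = {};
            int8_t w[3 * 3 * 3] = {};
            conv_point(in, w, 3, 8, 8, 3, 0, 0);
        }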



  10. Accelerators for SoC-FPGAs
    Contact: Simon Moore


    System-on-Chip Field Programmable Gate Arrays (SoC-FPGAs) like the Cyclone V used in the ECAD+Arch labs have some ARM cores and an FPGA fabric. The FPGA fabric could be used to provide an accelerator, e.g. like our BlueVec vector unit. What accelerator might you like to build?



  11. GPU hardware/software codesign
    Contact: Theo Markettos


    SoC FPGAs allow flexible interconnection between a mainstream processor (e.g. an ARM Cortex-A9) and hardware of your own design. This enables easy transition of functionality between hardware and software.

    This project will explore that space in the context of GPUs. For example, a simple GPU could consist of a number of Yarvi cores each running a line or triangle drawing program, being fed coordinates by a driver on the ARM core, which might be running Linux or Android. A significantly more advanced GPU could involve a vector processor along the lines of BlueVec implementing (part of) the Android graphics API to render Android applications. Along the way you can explore the performance tradeoffs involved in moving compute between hardware and software.
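
    To ground the simple end of that spectrum, the per-core program in the basic GPU might be little more than a line rasteriser fed with endpoint coordinates. A minimal C++ sketch of Bresenham's algorithm follows; the framebuffer layout and coordinate handling are assumptions for illustration:

        #include <cstdint>
        #include <cstdlib>
        #include <vector>

        // Bresenham line rasteriser: the sort of inner loop each drawing core
        // might run, with endpoints supplied by a driver on the ARM side.
        void draw_line(std::vector<uint8_t>& fb, int w, int x0, int y0, int x1, int y1) {
            int dx = std::abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
            int dy = -std::abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
            int err = dx + dy;
            for (;;) {
                fb[y0 * w + x0] = 0xff;               // set pixel
                if (x0 == x1 && y0 == y1) break;
                int e2 = 2 * err;
                if (e2 >= dy) { err += dy; x0 += sx; }
                if (e2 <= dx) { err += dx; y0 += sy; }
            }
        }

        int main() {
            std::vector<uint8_t> fb(320 * 240, 0);
            draw_line(fb, 320, 10, 10, 300, 200);
        }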

Older Project Suggestions

The following is a list of older projects from previous years that may have been attempted already, but could be built upon or provide inspiration for your own ideas.

  1. Function Call Parallelism
    Contact: Timothy Jones


    When parallelising an application, function calls provide obvious points where a new thread can be spawned to do the work of the procedure whilst the main thread carries on. However, in the general case this isn't safe, because the called function might write to memory that is later read by the other thread. This project will provide a safety net for this type of parallelism by implementing thread-level speculation at function calls, using a transactional memory library to catch these dependences. The work could be done by hand on several benchmarks or, ideally, implemented within a compiler such as LLVM.
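
    The parallelism pattern itself is straightforward; what the project adds is the safety net. Below is a minimal C++ sketch of the transformation using std::async, with no speculation support, so it is only legal when the call and the continuation are known to be independent; the function names are illustrative:

        #include <future>

        int expensive_work(int x) { return x * x; }   // the called function
        int continuation(int y)   { return y + 1; }   // work the caller does meanwhile

        int main() {
            // Sequential form: int a = expensive_work(20); int b = continuation(1);
            // Parallelised at the call site: run the callee on another thread while
            // the caller continues.  With thread-level speculation, a transactional
            // memory library would detect and recover from any memory dependence
            // between the two.
            auto a = std::async(std::launch::async, expensive_work, 20);
            int  b = continuation(1);
            int  result = a.get() + b;
            return result == 401 ? 0 : 1;
        }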