Parts II, III & ACS Projects
These are some Part II, Part III and ACS project ideas that I've put together. The groupings are only loose and many of the projects could be adapted to suit any course stage. Please contact me for more information. If you have your own ideas for a project that you'd like to do with me, then it would be great to hear them so please also get in touch. I'm interested in advising anybody wanting to do a project in computer architecture or compilers.
Part II Projects
Hardware Transactional Memory
Transactional memory systems have been gaining increasing popularity since multicores became commonplace due to their ease of use, allowing the programmer to avoid reasoning about locks while transparently supporting atomic access to data. There are a number of software transactional memory libraries available and hardware transactional memory has now become mainstream with Intel's release of the TSX extensions in Haswell processors (bugs notwithstanding). However, a significant downside is that no ordering can be enforced on transaction commit, meaning that TSX is a poor fit for techniques such as thread-level speculation. This project would start by implementing a simple version of hardware transactional memory within a simulator, such as gem5. It would then evaluate the amount of performance improvement available either through source code alterations, or automatically in hardware through speculative thread creation.
Optimising Multithreaded Stalls
Within a multithreaded application, threads often have to wait for others to finish computation. During this waiting time, no useful work is performed by the stalled threads, so they are not making best use of the underlying multicore hardware. This project aims to quantify the amount of stalling that each thread experiences in a multithreaded workload. It will then develop schemes to optimise this time away, by allowing the waiting thread to perform useful work (e.g. prefetching in data it will use after the stall).
Function Call Parallelism
When parallelising an application, function calls provide obvious points where a new thread can be spawned to do the work of the procedure whilst the main thread carries on. However, in the general case, this isn't safe because the called function might write to memory that is later read by the other thread. This project will provide a safety net for this type of parallelism by implementing thread-level speculation at function calls, using a transactional memory library to catch these dependences. The work could be done by hand to several benchmarks or, ideally, implemented within a compiler such as LLVM.
A Fast Lock-Free Software Queue
Lock-free queues provide scalable communication mechanisms for multi-threaded applications. We have implemented a fast lock-free software queue for fine-grained communication between a single producer and single consumer. This project will generalise the queue to a multiple consumer scenario and evaluate it within different contexts.
Part III / ACS Projects
Binary Alias Analysis
One of the (many!) problems with transforming application binaries is that a significant amount of information about the code has been lost during the compilation stages. Analysing and transforming binaries is an important task because it allows us to optimise programs even when we don't have access to the source code. A number of transformations rely on knowing whether two instructions access the same locations in memory, so that we can ensure we maintain application correctness when we perform our optimisations. Recently, a number of analyses have been proposed to identify the memory objects that a program uses, specifically for Intel x86 binaries. The aim of this project is to implement a subset of these analyses for ARM binaries, and develop strategies for coping with facets of the ISA that are not present in x86, like predicated execution. If time permits, this analysis could then be used to implement a code motion transformation to safely move instructions around in the binary.
Automatic program parallelisation is the process of taking a sequential application and splitting it into threads that can be run independently. One of the challenges of automatic parallelisation is defining the amount of code that each thread should execute. This project seeks to take this to the extreme, by splitting up the application into very small units of work, with threads executing many of these tasks each. This could be performed by hand or, ideally, implemented as a pass within an optimising compiler, such as LLVM.