Parts II, III & ACS Projects
These are some Part II, Part III and ACS project ideas that I've put together. The groupings are only loose and many of the projects could be adapted to suit any course stage. Please contact me for more information. If you have your own ideas for a project that you'd like to do with me, it would be great to hear them, so please get in touch. I'm interested in advising anybody wanting to do a project in computer architecture or compilers.
Part II Projects
Also, check out the Computer Architecture Group projects page for other ideas. Some of these are replicated there.
Software Prefetching in a Binary Translator
Software prefetch instructions are often not required, since hardware prefetchers do a good job with simple access patterns. Even when software prefetches would be beneficial, they are difficult to use correctly and many programmers simply leave them out of their codes. However, for a certain class of workload they can provide significant speedups when placed in the correct locations in the program. The aim of this project is to automatically add software prefetches into applications that would benefit from them. Instead of doing this in the compiler, it would be good to perform the transformations inside a dynamic binary translator, such as DynamoRIO, which avoids the need for recompilation and allows the prefetches to be inserted only on systems where they are beneficial. A published paper from our group describes the kinds of codes that benefit and an algorithm for automatic insertion in the compiler. This project could implement something similar in a DBT, where the challenges differ from those of modifying a compiler's IR.
Hardware Transactional Memory
Transactional memory systems have been gaining popularity since multicores became commonplace, due to their ease of use: they allow the programmer to avoid reasoning about locks while transparently supporting atomic access to data. A number of software transactional memory libraries are available, and hardware transactional memory has now become mainstream with Intel's release of the TSX extensions in Haswell processors (bugs notwithstanding). However, a significant downside is that no ordering can be enforced on transaction commit, meaning that TSX is a poor fit for techniques such as thread-level speculation. This project would start by implementing a simple version of hardware transactional memory within a simulator, such as gem5. It would then evaluate the performance improvement available either through source-code alterations, or automatically in hardware through speculative thread creation.
Optimising Multithreaded Stalls
Within a multithreaded application, threads often have to wait for others to finish computation. During this waiting time, no useful work is performed by the stalled threads, so they are not making best use of the underlying multicore hardware. This project aims to quantify the amount of stalling that each thread experiences in a multithreaded workload. It will then develop schemes to optimise this time away, by allowing the waiting thread to perform useful work (e.g. prefetching in data it will use after the stall).
Part III / ACS Projects
Binary Alias Analysis
One of the (many!) problems with transforming application binaries is that a significant amount of information about the code has been lost during the compilation stages. Analysing and transforming binaries is an important task because it allows us to optimise programs even when we don't have access to the source code. A number of transformations rely on knowing whether two instructions access the same locations in memory, so that we can ensure we maintain application correctness when we perform our optimisations. Recently, a number of analyses have been proposed to identify the memory objects that a program uses, specifically for Intel x86 binaries. The aim of this project is to implement a subset of these analyses for ARM binaries, and develop strategies for coping with facets of the ISA that are not present in x86, like predicated execution. If time permits, this analysis could then be used to implement a code motion transformation to safely move instructions around in the binary.
Fine-Grained Automatic Parallelisation
Automatic program parallelisation is the process of taking a sequential application and splitting it into threads that can be run independently. One of the challenges of automatic parallelisation is deciding how much code each thread should execute. This project seeks to take this to the extreme, by splitting the application into very small units of work, with each thread executing many of these tasks. This could be performed by hand or, ideally, implemented as a pass within an optimising compiler, such as LLVM.