Parts II, III & ACS Projects

These are some Part II, Part III and ACS project ideas that I've put together. The groupings are only loose and many of the projects could be adapted to suit any course stage. Please contact me for more information. At the end there are a list of successful projects that have been done in the recent past, to help seed ideas.

If you have your own ideas for a project that you'd like to do with me, then it would be great to hear them so please also get in touch. I'm interested in advising anybody wanting to do a project in computer architecture or compilers.

Many people implement compilers, but think a bit carefully before you do since just writing a compiler isn't necessarily all that difficult. Might you do some optimisations on the way, some sophisticated analysis of some sort, or provide a compiler that for a language / target combination that doesn't yet exist?

Part II Projects

Also, check out the Computer Architecture Group Part II projects page for other ideas. Some of these are replicated there.

Address Sanitizer in Cinnamon

Address Sanitizer is a tool to detect buffer overflows in program code through compiler support. The compiler inserts extra instructions that manage a shadow memory indicating areas that contain application data and those that do not, then other instructions to check addresses that are loaded from or stored to. It would be great to provide this functionality to all applications, even if we don't have compiler support. Cinnamon is a domain-specific language for binary profiling and monitoring, which should be ideal for implementing this kind of tool. However, Cinnamon lacks support for pointers and other data structures, so the aim of this project would be to implement this functionality and write Address Sanitizer in it. Extensions could be to optimise the performance of the implementation through addtional analysis, and to implement other types of sanitizer, such as those mentioned on the Wikipedia page. This project touches on programming language design, compilers and binary modification.

Vector runahead

This project was performed in 2022/23 but there is scope for it to be attempted again in a different way.

To address the widening performance gap between CPU cores and main memories, designers have implemented prefetchers into various levels of the cache hierarchy, so as to bring data close to the processor before it is needed, meaning it is available in fast storage at the point of use. There are a wide variety of data prefetchers available, but few that can accurately identify data that is accessed through complex data structures.

An alternative scheme is Runahead execution. Here, when the processor is stalled, it continues speculatively fetching an executing instructions from the future so as to perform their memory accesses, then discards them and re-executes them correctly once the pipeline starts up again. This provides an accurate form of prefetching within the core, rather than as separate logic beside the cache. Until recently though, Runahead techniques couldn't deal with complex access logic either. However, a new scheme, called Vector Runahead, can effectively prefetch these access patterns, providing significant performance increases for certain workloads. The aim of this project is to implement Vector Runahead in the gem5 simulator to reproduce the impressive results obtained, with more advanced extensions possible too.

Dynamic binary translator

This is a fairly generic project whereby you could implement a dynamic binary translator from one ISA to another. The idea is that it will read in a binary from one ISA (e.g. x86) and then either interpret each instruction, or actually emit instructions from another ISA (e.g. Arm) for each x86 instruction that it then directly executes. It would probably be best to restrict the project to a subset of x86, or a different source ISA if preferred (e.g. AArch32 to AArch64).

Interpreter or JIT compiler

In a similar vein, why not create an optimised interpreter or JIT compiler from your favourite source language. This could be an established language, such as C or OCaml, or a minimalist language, such as P'' (or a descendent), or an existing bytecode (Java being the obvious choice, but it could be the CPython bytecode or something else). Or feel free to suggest your own.

Part III / ACS Projects

Also, check out the Computer Architecture Group ACS projects page for other ideas. Some of these are replicated there.

Old Projects

Runahead - Part II Project

Runahead is a technique to continue speculatively fetching instructions when a core is stalled, so as to prefetch the data for load instructions into the cache hierarchy. When the stall is resolved, the speculative instructions are discarded and re-executed correctly, with high likelihood of loads hitting in the cache. One project implemented several forms or runahead within the gem5 simulator. A subsequent project implemented Vector Runahead, which is a runahead technique to prefetch complex access patterns, within gem5. Both projects aimed to improve the performance of certain kinds of workload.

Use-After-Free Security Mitigations - Part III Project

This project created an extension to a memory allocator (jemalloc, but the implementation was largely independent of the specific allocator) to keep freed objects in quarantine, preventing them from being reallocated, until a sweep through memory could confirm that no dangling pointers to the objects remained. At that point they could be given back to the allocator to be recycled, thus removing many use-after-free vulnerabilities from applications.

Profile-Driven Data-Layout Optimisations - ACS Project

This project profiled memory accesses made by an application, analysed the objects that were accessed together and rewrote the binary to identify different clusters of objects. A custom memory allocator then dynamically allocated these objects so that those accessed together were placed near each other in memory, increasing spatial locality. This work got published as a paper at CGO 2020.

Language A to B Compiler - Part II Project

Various projects in the past have written compilers from one language to another. For example, Java to JavaScript, Crystal to JavaScript, Haskell to JVM, ML to JVM, Java to WebAssembly, Haskell to WebAssembly.

An Optimising Compiler - Part II Project

Various projects in the past have created optimising compilers. For example, an optimiser for OCaml's bytecode generator.

Binary Alias Analysis - ACS Project

This project wrote an alias analysis pass for Janus, our binary parallelisation tool.

Vectorisation in a Dynamic Binary Translator - Part III Project

This project wrote a vectorisation pass for Janus, our binary parallelisation tool, so as to take advantage of SIMD operation. It also got incorporated into a paper we wrote for an international workshop.

A Multicore Cache Simulator - Part II Project

This project created a cache simulator based on the DynamoRIO dynamic binary translator, providing an alternative to its own cache simulator, but addressing some of its limitations (such as simulating arbitrary hierarchies and cache coherence).

Garbage Collector For C - Part II Project

This project wrote a garbage collector for C programs, to replace malloc() and free() calls, then compared it to the Boehm garbage collector.

DoAll Parallelisation - Part II Project

The idea of this project was to identify DoAll parallelism within program loops using the LLVM intermediate representation and then split up the loops into multiple threads to execute each iteration concurrently. The project was extended to extract a form of DoAcross parallelism too.

Data Dependence Profiling - Part II Project

This project implemented SD3, a technique for dynamic data-dependence profiling.

Software Error Detection - Part II Project

This project implemented SWIFT, a technique to detect transient errors by adding extra software instructions.

Dynamic Data-Dependence Analysis - Part II Project

Data dependence profiling is time consuming and uses a lot of space if taking program traces. To combat this, this project implemented SD3, an online data dependence analyser. A library implementing SD3 was written and then interfaced with a program automatically by writing a pass within LLVM to insert calls into the library.

Function Call Parallelisation - Part II Project

This project attempted to run functions in their own threads. It used software transactional memory to deal with data dependences and performed the transformation automatically in LLVM.

Ultra-Fine-Grained Parallelisation - ACS Project

Automatic program parallelisation is the process of taking a sequential application and splitting it into threads that can be run independently. One of the challenges of automatic parallelisation is defining the amount of code that each thread should execute. This project took this towards an extreme, by splitting up the application into small units of work, with threads executing many of these tasks each.

JavaScript Parallelisation - ACS Project

The main idea of this project was to transform the code generated by a JavaScript engine (we used Google's V8) to take advantage of thread-level parallelism. Essentially DoAll loops were identified by the programmer (those where all iterations can run in parallel; there are no dependences) and the V8 engine split it out into different threads, running all at the same time.

Department of Computer Science and Technology