Computer Laboratory: Alan Mycroft's Masters' project suggestions
Here are some possible compiler-like project sugestions students taking the MPhil in ACS or Part III (MEng) in the Computer Science Tripos. They are all designed to have possibilities of resulting publications.
Contents
- Unifying LOLCAT with separation logic
- Liveness-based Garbage Collection
- Liveness-based Pointer Analysis
- SSA-based Register Allocation in LLVM
- A Decompiler
Unifying LOLCAT with separation logic
Castegren and Wrigstad's paper in ECOOP'2017 explores a type-like capability system for ensuring programs using lock-free data structure and CAS are safe. Separation logic provides a richer language for reasoning about such programs, but is not so easily mechanisable as a type system. Can we establish LOLCAT as a syntactically restricted separation logic?
Liveness-based Garbage Collection
A paper in CC'2014 on this topic shows how a simple language can be augmented with tables, similar to debugging information, about liveness which can be used to improved garbage collection by following only live data rather than reachable data. There are many ways this initial work could be extended, e.g. mutable data, avoiding multiple heap-node revisits etc.
Liveness-based Pointer Analysis
A paper on this topic shows points-to analysis can be significantly improved (in performance and in accuracy) in GCC by storing only live points-to information. Exploring this idea for LLVM would give an additional data point.
SSA-based Register Allocation in LLVM
Traditionally compilers eliminate the phi-functions of SSA form, and then do register allocation. Optimal register allocation is therefore NP-complete. A few years ago Sebastian Hack noted that register allocation directly on SSA form is linear, and NP-completeness only arises on SSA-elimination (at which time different heuristics are available).
The project is to implement Hack's work as part of LLVM.
In a previous project Fernando M Q Pereira developed an implementation of a similar idea in A Practical Approach for SSA Based Register Allocation. (Thanks to Pavel Yudin for pointing this out).
A Decompiler
The aim of a decompiler is to produce as readable as possible high-level
output (e.g. C or higher) from low-level source (e.g. assembler or
binary). A simple system can be got working quite quickly
e.g. produce C statements like "AX=AX+BX;" for assembly code like
addl %eax,%ebx
Doing it better has almost limitless possibilities.
Cifuentes'
dcc
was one of the first decompilers.
The Decompilation Page of the program-transformation wiki
gives good background (but may be not up-to-date).
There's some some work of mine:
Type-based decompilation to recover struct/union/pointer type
information from assembler, and Wei-Ming Khoo's
Rendezvous
project and up-coming PhD thesis on "decompilation as search".
An excellent commercial disassembler and decompiler comes from
Hex-Rays.