Computer Laboratory

Computer Laboratory: Alan Mycroft's Masters' project suggestions

Here are some possible compiler-like project sugestions students taking the MPhil in ACS or Part III (MEng) in the Computer Science Tripos. They are all designed to have possibilities of resulting publications.

Contents


Unifying LOLCAT with separation logic

Castegren and Wrigstad's paper in ECOOP'2017 explores a type-like capability system for ensuring programs using lock-free data structure and CAS are safe. Separation logic provides a richer language for reasoning about such programs, but is not so easily mechanisable as a type system. Can we establish LOLCAT as a syntactically restricted separation logic?


Liveness-based Garbage Collection

A paper in CC'2014 on this topic shows how a simple language can be augmented with tables, similar to debugging information, about liveness which can be used to improved garbage collection by following only live data rather than reachable data. There are many ways this initial work could be extended, e.g. mutable data, avoiding multiple heap-node revisits etc.


Liveness-based Pointer Analysis

A paper on this topic shows points-to analysis can be significantly improved (in performance and in accuracy) in GCC by storing only live points-to information. Exploring this idea for LLVM would give an additional data point.


SSA-based Register Allocation in LLVM

Traditionally compilers eliminate the phi-functions of SSA form, and then do register allocation. Optimal register allocation is therefore NP-complete. A few years ago Sebastian Hack noted that register allocation directly on SSA form is linear, and NP-completeness only arises on SSA-elimination (at which time different heuristics are available).

The project is to implement Hack's work as part of LLVM.

In a previous project Fernando M Q Pereira developed an implementation of a similar idea in A Practical Approach for SSA Based Register Allocation. (Thanks to Pavel Yudin for pointing this out).


A Decompiler

The aim of a decompiler is to produce as readable as possible high-level output (e.g. C or higher) from low-level source (e.g. assembler or binary). A simple system can be got working quite quickly e.g. produce C statements like "AX=AX+BX;" for assembly code like
        addl %eax,%ebx
Doing it better has almost limitless possibilities. Cifuentes' dcc was one of the first decompilers. The Decompilation Page of the program-transformation wiki gives good background (but may be not up-to-date). There's some some work of mine: Type-based decompilation to recover struct/union/pointer type information from assembler, and Wei-Ming Khoo's Rendezvous project and up-coming PhD thesis on "decompilation as search". An excellent commercial disassembler and decompiler comes from Hex-Rays.