Computer Laboratory

Student projects (Part II and ACS)

Previously completed student projects

I have been fortunate enough to work with a number of very talented students on a variety of systems projects in the past. I summarize the projects I supervised below, and include links to the final dissertations where available.

Academic year 2014/2015

Hapi: Fast and scalable cluster scheduling using flow networks

by Adam Gleave, St John's College

» Dissertation

Hephaestus: a Rust runtime for a distributed operating system

by Andrew Scull, St John's College

» Dissertation

Academic year 2013/2014

Hydra: Automatic Parallelisation using LLVM

by James Chicken, Homerton College

This project received a score of 91/100 (ranked second in the year)

This project was concerned with the design and implementation of a compiler extension which can automatically add parallelism to serial programs. This is done by offloading function calls to a parallel thread when beneficial. Static analysis is used to determine where offloading is possible, and to assess its profitability. To do this in a language- and architecture-independent fashion, a modern compiler framework (LLVM) is extended to perform the analysis and transformation on its intermediate representation. Finally, a portable parallel runtime for running the parallelised programs was implemented.

The prototype system, Hydra, is able to statically identify functions which are fit for offloading. It also assesses, through a variety of heuristics, the gains associated with offloading a call. It can then realise the offload intent using two supported runtimes: kernel threads and a portable thread pool implementation. The correctness of Hydra has been verified through carefully designed tests. Hydra achieves a 6.33x speedup over a serial implementation of an n-body simulation when using a 48-core machine, without any user assistance.
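The core decision Hydra makes, whether a call is worth offloading, can be illustrated with a short sketch. Everything below is illustrative: the threshold, the hand-supplied cost estimates and the maybe_offload helper are invented for this example, standing in for the static heuristics and runtime that Hydra implements inside LLVM.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical profitability threshold: Hydra derives cost estimates by
# static analysis; here they are simply passed in by hand.
OFFLOAD_THRESHOLD = 1000

pool = ThreadPoolExecutor(max_workers=4)

def maybe_offload(fn, estimated_cost, *args):
    """Offload fn to the pool if it looks expensive enough, else run inline.
    Either way, return a zero-argument 'join' callable yielding the result."""
    if estimated_cost >= OFFLOAD_THRESHOLD:
        future = pool.submit(fn, *args)
        return future.result           # blocks on first call, like a join
    result = fn(*args)                 # too cheap: offloading would not pay
    return lambda: result

def body_force(n):
    # Stand-in for an expensive per-body computation in an n-body step.
    return sum(i * i for i in range(n))

join_a = maybe_offload(body_force, 10**6, 1000)   # offloaded to the pool
join_b = maybe_offload(body_force, 10, 10)        # run inline
total = join_a() + join_b()
pool.shutdown()
```

The key design point mirrors Hydra's: the offload decision is made per call site from a cost estimate, and both paths return a join handle, so the caller is oblivious to where the work actually ran.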

» Dissertation

Unbuckle: a high-performance in-kernel key-value store

by Matthew Huxtable, St John's College

This project received a score of 92/100 (ranked first in the year)

This project involved the implementation and performance analysis of a platform-independent key-value store within an operating system kernel. It investigated the overhead imposed by user-space privilege transitions on applications running in modern data centres, and showed how much faster a software key-value store can be made by removing layers of abstraction and multi-process safeguards.

The result was Unbuckle, a kernel-based, memcached-compliant key-value store, together with an evaluation of its performance in three contexts: in the kernel, in user space, and in comparison with existing commercial and research systems.

Unbuckle can be compiled as a Linux kernel module or a user-space application. On a 10 Gbps network, the in-kernel version achieves a 25% reduction in latency at the same time as a 40% throughput increase compared to the user-space version. Unbuckle scales to 22.5 Gbps throughput using eight cores of a modern server. It outperforms memcached, an optimised key-value store used by many web companies, in both throughput (3.5x improvement) and latency (50% reduction).

» Dissertation

Benson: Extending Raft with structured voting

by Leonhard Markert, Emmanuel College

» Dissertation

The BDFS distributed file system

by Bogdan-Cristian Tataroiu, Churchill College

» Dissertation

Network packet trace visualisation at scale

by Christopher Wheelhouse, St John's College

» Dissertation

Academic year 2012/2013

Cluster-in-a-box: task parallel computing on many-core machines

by Antanas Uršulis, Queens' College

» Dissertation

Wrench: A Distributed Key-Value Store with Strong Consistency

by Forbes Lindesay, Robinson College

» Dissertation

Academic year 2011/2012

No projects supervised as I was away at Google for part of the academic year.

Academic year 2010/2011

Osiris: Secure Social Backup

by James Bown, St John's College

This project received a score of 82/100 (ranked fourth in the year)

Many people would like to have secure off-site backups of their personal data. Recently, so-called "cloud storage" services (such as Amazon S3) have started providing infrastructure for this. However, there are many inherent issues to do with trust and control: users may not trust a cloud provider with their personal data, or they might feel uncomfortable about it being stored in a country whose jurisdiction might not protect their privacy. Furthermore, cloud storage is still prohibitively expensive.
A better concept might be for a group of friends to arrange a mutually beneficial backup system whereby everyone offers some storage space to the others (this could be on servers, desktop machines, laptops or mobile devices). The challenge is to do this in a secure way that maintains privacy and offers compelling performance.
Different pairings of people might have different degrees of trust, and this could be reflected in the strength of the pairwise encryption (though one would have to consider the case of a friend accidentally giving parties with weaker trust access to the data!). Alternatively, data could be distributed across multiple different friends' devices, such that only the combination would be sufficient to reconstruct the original data.
This project has lots of scope for the design and implementation of clever synchronisation algorithms and cryptographic constructs.
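The "only the combination suffices" idea above corresponds to secret splitting. A minimal n-of-n scheme XORs the backup with random shares, so each friend holds data that is individually indistinguishable from noise; the sketch below is a plain illustration of that construction, not the actual Osiris design.

```python
import os

# n-of-n secret splitting by XOR: only XORing *all* shares together
# reconstructs the backup. Illustrative only; not the Osiris scheme.

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def split(secret, n):
    shares = [os.urandom(len(secret)) for _ in range(n - 1)]
    last = secret
    for share in shares:               # fold each random share into the last
        last = xor_bytes(last, share)
    return shares + [last]

def combine(shares):
    out = bytes(len(shares[0]))        # all-zero accumulator
    for share in shares:
        out = xor_bytes(out, share)
    return out

backup = b"holiday photos"
assert combine(split(backup, 4)) == backup
```

A real system would use a threshold scheme (e.g. Shamir's secret sharing) instead, so that the backup survives some friends' devices being offline or lost.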

» Dissertation

Cirrus: Distributing Application Threads on the Cloud

by Sebastian Hollington, St John's College

This project received a score of 84/100 (ranked third in the year)

This project made use of the SRG's Ciel distributed execution engine to provide a pthread-like API for writing distributed programs that can run on an elastic compute cluster or "the cloud". It allows programs to be written according to the familiar threading paradigm, using fork() and join() primitives. They can then run on top of a Ciel cluster, taking advantage of elastic cluster membership, transparent fault tolerance and result memoization. Thread migration was implemented using the BLCR checkpoint-restart framework; furthermore, basic emulated shared-memory functionality is supported. The goal of this project was to explore intuitive programming models for parallel computation. In particular, it investigated whether a threading-like approach can compete with traditional, special-purpose cloud programming paradigms like MapReduce.
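The fork()/join() programming model described above can be sketched with ordinary local threads. The Task and fork names below are illustrative stand-ins for Cirrus's API, and plain threading replaces the Ciel cluster: there is no migration, fault tolerance or memoization in this sketch.

```python
import threading

# A pthread-like fork()/join() interface backed by local threads.
class Task:
    def __init__(self, fn, *args):
        self._result = None
        def run():
            self._result = fn(*args)
        self._thread = threading.Thread(target=run)
        self._thread.start()           # "fork": begin running immediately

    def join(self):
        self._thread.join()            # wait for completion
        return self._result

def fork(fn, *args):
    return Task(fn, *args)

# Two forked tasks computing partial sums, joined for the total.
t1 = fork(sum, range(0, 500))
t2 = fork(sum, range(500, 1000))
total = t1.join() + t2.join()
```

The point of the programming model is exactly this familiarity: the same fork/join structure, when run over a distributed engine, transparently gains elasticity and fault tolerance instead of sharing one machine's memory.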

» Dissertation

Implementing Homomorphic Encryption

by Valentin Dalibard, St John's College

This project received a score of 85/100 (ranked second in the year)

Homomorphic encryption is the idea of performing arbitrary computations on an encrypted piece of data ("ciphertext") without decrypting it first. In other words, the operations manipulate the ciphertext to generate another, different, ciphertext, which when decrypted produces a plaintext that is equivalent to the original plaintext with the operations applied. For example, operation "add 37" on plaintext "5" gives "42". Under homomorphic encryption, we apply "add 37" to E(5) (where E is an encryption function) and get back E(42), which when decrypted gives us "42" (the correct answer). The utility of this is that using homomorphic encryption, secure computation could be performed in a potentially hostile environment (e.g. untrusted servers on the "cloud").
An important observation is that a fully homomorphic scheme only needs to support two operations: addition and multiplication. Over bits, addition acts as XOR and multiplication as AND, which together form a universal gate set, so these two operations are sufficient to bootstrap any computation that we can imagine (even though it would be very, very, very slow!).
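One well-known scheme that is additively (though not fully) homomorphic is Paillier's cryptosystem, and it realises exactly the E(5), E(37) → E(42) example above: multiplying two ciphertexts modulo n² adds the underlying plaintexts. The sketch below uses toy primes purely for illustration; a real deployment would use primes of a thousand bits or more.

```python
import random
from math import gcd

# Toy Paillier cryptosystem with tiny primes, for illustration only.
# Requires Python 3.8+ for the modular inverse via pow(x, -1, n).
p, q = 17, 19
n = p * q
n2 = n * n
g = n + 1
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)    # lcm(p-1, q-1)

def L(x):
    return (x - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)

def encrypt(m):
    r = random.randrange(2, n)
    while gcd(r, n) != 1:                        # r must be coprime to n
        r = random.randrange(2, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

# The example from the text: *multiplying* ciphertexts *adds* plaintexts.
c = (encrypt(5) * encrypt(37)) % n2
assert decrypt(c) == 42
```

Note that Paillier only gives addition of plaintexts; a fully homomorphic scheme of the kind this project implemented must additionally support multiplication of plaintexts, which is what makes the published constructions so much more involved.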
Homomorphic encryption has long (since the 1980s) been considered an unsolved "holy grail" of cryptography. Recently, however, several working schemes have been published (the last is easiest to understand and implement, as it does not require an understanding of lattices). Most of the work in this space has been theoretical, though, and there is no implementation of any of these systems available.
Your task in this project would be to implement one of the published schemes and show that it works with small examples (due to the exponential increase in computing power required, you are likely to only be able to try it with small inputs and keys, but that is enough for a proof-of-concept).
As an extension, the project investigated how much of the process can be parallelised to speed it up (a question that had received very little attention), and found various opportunities for doing so.

» Dissertation