Part II / III & MPhil Project Suggestions

These are some suggestions for Part II / III and MPhil projects within the Digital Technology Group. If you have an idea related to our group's research interests that isn't mentioned here, get in touch.



Part II

Make a cheap EKG using an Android smartphone

Many Android smartphones (for example, the Nexus One) contain a serial interface that could be used to communicate with a small piece of external hardware. This custom hardware would take the required readings from the patient, and the phone would analyse and display them.

Dr R. Sohan, Mattias Linnap, James Snee

Part III / MPhil

Content Based Steering for Efficient Data Delivery in 10G NICs

The coarse-grained coupling between the NIC and the end-host interface in current high-speed NIC designs can lower application performance. In particular, the mapping of NIC receive queues to CPUs can only be done at a very coarse (per-connection) granularity. This makes it very hard to arrange the system so that data is always received on the same CPU on which it is consumed, resulting in higher application data processing latency (due to CPU cache misses) and lower throughput (due to CPU stalls). Furthermore, mapping a single connection to one or more CPUs can overload those CPUs with network processing overhead. While recent technologies such as Receive Side Scaling and FlowDirector try to mitigate this issue by spreading load over CPUs and trying to automatically deliver data to the "right" CPUs, they are very simple mechanisms and therefore not very effective.

In this project you would explore the idea of "content based data steering". The core hypothesis of this work is that if the NIC possesses some simple application logic, it can efficiently steer packets to the correct CPUs, reducing latency and increasing performance. You would examine the feasibility of the approach, then design and implement it on the NetFPGA platform. Ideally the system would be based on a BPF-like language that allows applications to specify steering policies to the NIC, coupled with the minor application changes required to leverage the architecture.
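As a rough illustration of what a content-based steering policy could look like, the sketch below models the decision in software. Everything in it is an assumption made for illustration: the `SteeringRule` fields, the `steer` function and the CRC-based fallback are hypothetical names, not part of NetFPGA or of any existing NIC interface. Each rule matches a byte pattern at a fixed payload offset and names a target CPU; packets that match no rule fall back to a hash of the flow identifier, loosely in the spirit of Receive Side Scaling.

```python
import zlib
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SteeringRule:
    """Hypothetical rule: if `pattern` appears at `offset`, deliver to `cpu`."""
    offset: int      # byte offset into the packet payload
    pattern: bytes   # bytes that must appear at that offset
    cpu: int         # CPU that should receive matching packets


def steer(payload: bytes, flow_id: bytes,
          rules: Sequence[SteeringRule], num_cpus: int) -> int:
    """Return the CPU index a packet should be delivered to."""
    for rule in rules:
        end = rule.offset + len(rule.pattern)
        if payload[rule.offset:end] == rule.pattern:
            return rule.cpu
    # No rule matched: fall back to hashing the flow identifier,
    # which keeps packets of one flow on one CPU.
    return zlib.crc32(flow_id) % num_cpus


# Example policy: reads are consumed on CPU 1, writes on CPU 2.
rules = [
    SteeringRule(offset=0, pattern=b"GET ", cpu=1),
    SteeringRule(offset=0, pattern=b"SET ", cpu=2),
]
```

With this policy, `steer(b"GET key", b"10.0.0.1:1234", rules, 4)` selects CPU 1 regardless of which connection carried the request. A real design would have to express such rules in a form the NIC can evaluate at line rate.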

Dr R. Sohan, James Snee

Part III / MPhil

Framework For High-Speed Endhost NIC Evaluation

Benchmarking high-speed (10G) end-host NICs can be a complicated, error-prone and tedious task. There can be significant differences in performance depending on the hardware configuration, kernel, driver and software versions involved. Moreover, there are OS-, application- and device-specific runes that often get overlooked or misconfigured, impacting performance. Finally, most device benchmarks are irreproducible because the full configuration is unknown.

In this project you will attempt to create an extensible open-source high-speed end-host NIC evaluation framework that makes it simple and fast to obtain reproducible benchmarks over a wide range of 10G NICs and application types. You will then test it by benchmarking a number of production 10G NICs for the purposes of quantitative comparison.

Dr R. Sohan, James Snee

Part III / MPhil

Memory Mapped Socket Buffers

For data-intensive network communication, the memory copy between kernel and user space can amount to up to 40% of network processing overheads [1]. Various solutions have been proposed to mitigate this issue over the years. In particular, various implementations of hardware- and software-based zero-copy networking [2, 3] have been investigated, as has user-level networking [4, 5]. Recent work has also tried exporting the NIC DMA queue into user space for the purposes of performance and flexibility [6]. However, all these solutions require extensive application changes and/or specialised hardware.

In this project you will attempt to mitigate the kernel-user memory copy issue by extending the sockets interface to allow applications to memory map the socket send and receive buffers into their process address space for the purposes of zero-copy communication between the application and the kernel. Our hypothesis is that, in conjunction with a lightweight application-kernel notification mechanism, this architecture will reduce the memory copy overhead for a number of data-intensive network applications.
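To make the idea more concrete, the sketch below shows the kind of shared ring buffer layout such an interface might expose. It is only an illustration under stated assumptions: an anonymous `mmap` region stands in for a socket buffer shared with the kernel, the `SharedRing` class and its header layout are hypothetical, and no real sockets API is involved. The producer and consumer exchange data through the mapped region and coordinate only via two counters stored in its header.

```python
import mmap
import struct

# Header holds two monotonically increasing byte counters:
# head (next byte to read) and tail (next byte to write).
HEADER = struct.Struct("<QQ")


class SharedRing:
    """Hypothetical single-producer, single-consumer ring in a mapped region."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        # Anonymous mapping; a real design would map a kernel socket buffer.
        self.buf = mmap.mmap(-1, HEADER.size + capacity)
        HEADER.pack_into(self.buf, 0, 0, 0)

    def produce(self, data: bytes) -> bool:
        """Append data if it fits; return False when the ring is too full."""
        head, tail = HEADER.unpack_from(self.buf, 0)
        if len(data) > self.capacity - (tail - head):
            return False
        base = HEADER.size
        start = tail % self.capacity
        first = min(len(data), self.capacity - start)
        self.buf[base + start:base + start + first] = data[:first]
        if first < len(data):  # wrap around to the start of the data area
            self.buf[base:base + len(data) - first] = data[first:]
        HEADER.pack_into(self.buf, 0, head, tail + len(data))
        return True

    def consume(self, max_bytes: int) -> bytes:
        """Remove and return up to max_bytes of pending data."""
        head, tail = HEADER.unpack_from(self.buf, 0)
        n = min(max_bytes, tail - head)
        base = HEADER.size
        start = head % self.capacity
        first = min(n, self.capacity - start)
        out = self.buf[base + start:base + start + first]
        out += self.buf[base:base + n - first]
        HEADER.pack_into(self.buf, 0, head + n, tail)
        return out
```

The sketch ignores memory ordering and notification entirely; in the proposed architecture those would be handled by the lightweight application-kernel notification mechanism.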

  • [1] Wimpy Nodes with 10GbE: Leveraging One-Sided Operations in Soft RDMA to Boost Memcached
  • [2] Zero Copy Sockets Direct Protocol over InfiniBand - Preliminary Implementation and Performance Analysis - http://www.mellanox.com/pdf/whitepapers/SDP_Whitepaper.pdf
  • [3] PF_RING - http://www.ntop.org/products/pf_ring/
  • [4] OpenOnload - http://www.openonload.org
  • [5] ATM and Fast Ethernet Network Interfaces for User-level Communication
  • [6] Netmap - http://info.iet.unipi.it/~luigi/papers/20120503-netmap-atc12.pdf

Dr R. Sohan, James Snee

Part III / MPhil

Demincer: Making Meat from Mince

The concept of scientific reproducibility is becoming an increasingly important issue both in academia and industry [1]. One of the pivotal aspects of reproducibility is the ability to reuse existing results for the purposes of extension and/or comparison. However, given that the standard method of disseminating results in today's scientific environment is (PDF) paper publication, it would be very useful to have a tool that can extract information from existing papers so that it can be augmented and/or extended.

In essence we are advocating the creation of an extensible tool that is able to extract text, figures, tables and other domain specific information from published papers in a manner that facilitates their reuse. Such a tool would be very useful in lowering the entry barrier to reproducibility and is thus likely to encourage reproducible research.

Dr A. Rice, Dr R. Harle, Dr R. Sohan

Part III / MPhil

Elephant: Long memory for reproducible runs

The concept of scientific reproducibility is becoming an increasingly important issue both in academia and industry [1]. One of the pivotal aspects of reproducibility is the ability to reproduce the environment in which an experiment is run. However, the ad-hoc nature of today's scientific reporting process means most researchers only record the parameters they deem interesting. Those seeking to reproduce their results are therefore reduced to either trying to replicate them from published results or contacting the authors directly.

In this project we are advocating the creation of "Elephant", an extensible reproducibility assurance tool. Elephant will capture the environment of a program (including hardware and software) so that the details of the run are recorded in a precise and complete manner. Further to this, Elephant can then be used to validate the environment before a run is reproduced. Elephant is intended to be extensible so that it can be augmented with domain-specific reproducibility information.
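The capture-then-validate cycle described above could be sketched as follows. This is a minimal illustration, not a design for Elephant itself: the function names and the choice of recorded fields are assumptions, and a real tool would record far more (library versions, kernel parameters, hardware details and so on). It uses only Python's standard `platform` and `sys` modules.

```python
import platform
import sys
from typing import Dict, List


def capture_environment() -> Dict[str, str]:
    """Record a small, illustrative subset of the run environment."""
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "python_version": platform.python_version(),
        "executable": sys.executable,
    }


def validate_environment(recorded: Dict[str, str],
                         current: Dict[str, str]) -> List[str]:
    """Return a description of every recorded field that no longer matches."""
    mismatches = []
    for key in sorted(recorded):
        if current.get(key) != recorded[key]:
            mismatches.append(
                f"{key}: recorded {recorded[key]!r}, found {current.get(key)!r}"
            )
    return mismatches
```

A run would store the result of `capture_environment()` alongside its output. Before reproducing the run, `validate_environment(stored, capture_environment())` returns an empty list only if every recorded field still matches.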

Dr A. Rice, Dr R. Harle, Dr R. Sohan

Part III / MPhil

MRI-Based Diagnostic Prediction of Parkinson's Disease via Pattern Classification

This project is in conjunction with the MRC-CBU (Cognition and Brain Sciences Unit), who have diffusion MRI data for Parkinson's disease patients and age-matched controls and would like to investigate the use of machine learning to spot Parkinson's-related abnormalities. Diffusion MRI (dMRI) is an imaging technique that produces quantitative maps of the microscopic displacements of water molecules that occur in brain tissue as a result of natural diffusion. In this technique, water molecules are used as a probe that can reveal microscopic details about the architecture of normal and diseased tissue. The project involves extracting features from images, implementing and comparing different machine learning algorithms, and conducting statistical analyses of the results. There is significant opportunity to publish the work if successful.

Dr R Harle (DTG), Dr M. Correia (MRC-CBU)

Part III / MPhil

Static Binary Rewriting In a Dynamic Manner

This project involves statically rewriting binaries at load time and during execution to optimise their execution over time and space. The basic premise is to write an in-vitro, extensible binary rewriting engine that provides these capabilities. For various reasons I don't want to put more details of this project online, but please contact me if you have an interest in programming languages, binary rewriting and program optimisation.

Dr R. Sohan

Part III / MPhil

Classifying DNS use

Given about 240M DNS message question/response pairs per day from two of Nominet's main nameservers that are responsible for the UK namespace, build a system that is able to classify the requestors in near real time (that is, with as little data or time as possible). Some of the heavy users are the resolvers of large ISPs, such as Google's DNS. Others continuously scan the namespace for changes for various reasons, ranging from speculation on the domain name market to botnets, spambots and regular uptime tests. Additionally, given the dataset and possible NoSQL solutions like Hadoop/MapReduce, what other interesting aspects can be derived?

Dr R. Sohan

Part III / MPhil

Spotting distributed whois abuse

We occasionally see attempts to mine large amounts of data from our whois service. This may be seen as, say, a large number of requests from a single client. However, we have anti-abuse limits in place to stop this, so an obvious tactic would be to use a distributed system in which each client performs a small number of queries, so that no individual limit is reached. It might be possible to detect this happening and impose a single limit on "cooperating clients" (assuming it could be done in real time).
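The "single limit on cooperating clients" part can be sketched independently of how cooperation is detected. In the illustration below, the `group_of` callback is a placeholder for that detection step, which is the actual research problem and is not solved here. The `GroupRateLimiter` class, its parameters and the sliding-window policy are all assumptions for the sake of the example.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class GroupRateLimiter:
    """Sliding-window limit shared by all clients mapped to the same group."""

    def __init__(self, limit: int, window_seconds: float,
                 group_of: Callable[[str], str]) -> None:
        self.limit = limit
        self.window = window_seconds
        # Hypothetical classifier: maps a client to its suspected group.
        self.group_of = group_of
        self._recent: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client: str, now: float) -> bool:
        """Return True if this query fits within the group's shared limit."""
        recent = self._recent[self.group_of(client)]
        # Drop timestamps that have fallen out of the window.
        while recent and recent[0] <= now - self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False
        recent.append(now)
        return True


# Toy classifier: treat every client named "bot..." as one cooperating group.
limiter = GroupRateLimiter(
    limit=3,
    window_seconds=60.0,
    group_of=lambda client: "suspected-group" if client.startswith("bot") else client,
)
```

With this toy classifier, three different "bot" clients exhaust the shared allowance between them, while an unrelated client keeps its own separate allowance.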

Dr R. Sohan

Part III / MPhil

Design a dedicated DNSSEC hardware security module (HSM)

Building a security module is hard. Outside interference may lead to compromised keys, or might trick the device into performing undesired operations. Lack of interference implies lack of an entropy source as well, hence the device needs a good internal source of entropy to generate keys. Design a device that is able to generate a key pair, guard the private key, generate signatures given the right credentials, and perform well (at least 20 signatures a second).

Dr R. Sohan

Part III / MPhil

Next Generation Rate Limiting

DNS servers are often abused for amplified reflection attacks, in which a third party receives large responses from a large nameserver in reply to incoming queries with a spoofed source address. Some basic rate limiting is currently possible, but it can easily be circumvented. We're interested in a taxonomy of possible solutions or novel ideas to defend networks against large-scale amplified reflection attacks.
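For reference, the kind of basic rate limiting mentioned above can be sketched as a token bucket keyed on the source network prefix. This is an illustrative baseline under stated assumptions, not a description of any particular nameserver's implementation: the `PrefixRateLimiter` class, the /24 default and the parameter names are all hypothetical. Because the source address in a reflection attack is spoofed to be the victim's, keying on it limits what is reflected toward that victim.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict


@dataclass
class Bucket:
    tokens: float    # responses currently available
    updated: float   # time of the last refill


class PrefixRateLimiter:
    """Token bucket per source prefix (illustrative baseline only)."""

    def __init__(self, rate: float, burst: float, prefix_len: int = 24) -> None:
        self.rate = rate              # tokens added per second
        self.burst = burst            # maximum tokens a bucket can hold
        self.prefix_len = prefix_len  # addresses in one prefix share a bucket
        self._buckets: Dict[str, Bucket] = {}

    def _key(self, source_ip: str) -> str:
        network = ipaddress.ip_network(
            f"{source_ip}/{self.prefix_len}", strict=False
        )
        return str(network)

    def allow(self, source_ip: str, now: float) -> bool:
        """Return True if a response to this source may be sent at time now."""
        bucket = self._buckets.setdefault(
            self._key(source_ip), Bucket(self.burst, now)
        )
        elapsed = max(0.0, now - bucket.updated)
        bucket.tokens = min(self.burst, bucket.tokens + elapsed * self.rate)
        bucket.updated = now
        if bucket.tokens >= 1.0:
            bucket.tokens -= 1.0
            return True
        return False
```

An attacker can sidestep a scheme like this, for example by spreading queries across many nameservers so that each stays under its own limit, which is part of why a broader taxonomy of defences is of interest.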

Dr R. Sohan