These are some suggestions for Part III and MPhil projects within the Digital Technology Group. If you have an idea related to our group's research interests that isn't mentioned here, get in touch.
Content Based Steering for Efficient Data Delivery in 10G NICs
The coarse-grained coupling between the NIC and the end-host interface in current high-speed NIC designs can lower application performance. In particular, the mapping of NIC receive queues to CPUs can only be done at a very coarse (per-connection) level. This makes it very hard to arrange the system so that data is always received on the same CPU on which it is consumed, resulting in higher application data-processing latency (due to CPU cache misses) and lower throughput (due to CPU stalls). Furthermore, mapping a single connection to one or more CPUs can overload those CPUs with network-processing work. Recent technologies such as Receive Side Scaling and Flow Director try to mitigate this by spreading load over CPUs and automatically delivering data to the "right" CPU, but they are very simple mechanisms and therefore not very effective.
In this project you would explore the idea of "content-based data steering". The core hypothesis is that if the NIC possesses some simple application logic, it can steer packets to the correct CPUs efficiently, reducing latency and increasing performance. You would examine the feasibility of the approach, then design and implement it on the NetFPGA platform. Ideally the system would be based on a BPF-like language that lets applications specify steering policies to the NIC, coupled with the minor application changes required to leverage the architecture.
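The contrast between today's per-connection steering and the proposed content-based steering can be sketched in a few lines of Python. Everything below is illustrative: the policy format, the shard layout and the CPU numbers are invented, and a real implementation would run the policy in NIC hardware, not in the host.

```python
import zlib

NUM_CPUS = 4

def rss_steer(src_ip, src_port, dst_ip, dst_port):
    """Per-connection steering (RSS-like): hash the 4-tuple, so every
    packet of a connection lands on the same CPU regardless of content."""
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % NUM_CPUS

def content_steer(payload, policy):
    """Content-based steering: a hypothetical application-supplied policy
    inspects the payload and names the consuming CPU directly."""
    for predicate, cpu in policy:
        if predicate(payload):
            return cpu
    return 0  # no rule matched: fall back to CPU 0

# Hypothetical policy: a key-value store shards keys across worker CPUs,
# so requests for a key should land on the CPU running that shard's worker.
policy = [
    (lambda p: p.startswith(b"GET a"), 1),
    (lambda p: p.startswith(b"GET n"), 2),
]

print(rss_steer("10.0.0.1", 1234, "10.0.0.2", 80))  # same CPU for the whole flow
print(content_steer(b"GET apple", policy))          # the shard worker's CPU
```

The point of the sketch is that the hash-based scheme cannot see the `GET apple` inside the payload, so it cannot route the request to the CPU where that key's consumer runs.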
Framework For High-Speed Endhost NIC Evaluation
Benchmarking high-speed (10G) end-host NICs can be a complicated, error-prone and tedious task. There can be significant differences in performance based on the hardware configuration and the kernel, driver and software versions involved. Moreover, there are OS-, application- and device-specific runes which often get overlooked or misconfigured, hurting performance. Finally, most device benchmarks are irreproducible because the entirety of the configuration is unknown.
In this project you will attempt to create an extensible, open-source, high-speed end-host NIC evaluation framework that makes it simple and fast to obtain reproducible benchmarks over a wide range of 10G NICs and application types. You will then test it by benchmarking a number of production 10G NICs for the purposes of quantitative comparison.
Memory Mapped Socket Buffers
For data-intensive network communication, the memory copy between kernel and user space can account for up to 40% of network-processing overhead. Various solutions have been proposed to mitigate this issue over the years. In particular, hardware- and software-based zero-copy networking [2, 3] has been investigated, as has user-level networking [4, 5]. More recent work has exported the NIC DMA queues into user space for performance and flexibility. However, all these solutions require extensive application changes and/or specialised hardware.
In this project you will attempt to mitigate the kernel-user memory copy by extending the sockets interface to allow applications to memory-map the socket send and receive buffers into their process address space, enabling zero-copy communication between the application and the kernel. Our hypothesis is that, in conjunction with a lightweight application-kernel notification mechanism, this architecture will reduce the memory-copy overhead of a number of data-intensive network applications.
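For context, the standard sockets API already lets an application avoid an extra copy on its own side of the boundary via `recv_into()`, which fills a caller-owned buffer in place; the proposed memory-mapped socket buffers would go further and eliminate the kernel-to-user copy as well. A minimal sketch of the caller-owned-buffer pattern, over a local socket pair:

```python
import socket

# A caller-owned, reusable receive buffer: recv_into() writes incoming
# bytes straight into it, avoiding the extra user-space copy that
# recv() plus concatenation would add. The project's memory-mapped
# socket buffers would remove the kernel-to-user copy on top of this.
a, b = socket.socketpair()
buf = bytearray(4096)
view = memoryview(buf)

a.sendall(b"payload")
n = b.recv_into(view)       # kernel copies directly into our buffer
print(buf[:n].decode())     # -> payload

a.close()
b.close()
```

The same buffer can be reused for every receive, which is also friendlier to the CPU cache than allocating a fresh bytes object per call.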
[1] Wimpy Nodes with 10GbE: Leveraging One-Sided Operations in Soft RDMA to Boost Memcached
[2] Zero Copy Direct Protocol over InfiniBand: Preliminary Implementation and Performance Analysis
[3] PF_RING: http://www.ntop.org/products/pf_ring/
[4] ATM and Fast Ethernet Network Interfaces for User-Level Communication
[5] Netmap: http://info.iet.unipi.it/~luigi/papers/20120503-netmap-atc12.pdf
Demincer: Making Meat from Mince
Scientific reproducibility is becoming an increasingly important issue in both academia and industry. One of the pivotal aspects of reproducibility is the ability to reuse existing results for extension and/or comparison. However, given that the standard method of result dissemination in today's scientific environment is (PDF) paper publication, it would be very useful to have a tool able to extract information from existing papers so that it can be augmented and/or extended.
In essence we are advocating the creation of an extensible tool that is able to extract text, figures, tables and other domain specific information from published papers in a manner that facilitates their reuse. Such a tool would be very useful in lowering the entry barrier to reproducibility and is thus likely to encourage reproducible research.
Elephant: Long memory for reproducible runs
Scientific reproducibility is becoming an increasingly important issue in both academia and industry. One of the pivotal aspects of reproducibility is the ability to reproduce the environment in which an experiment is run. However, the ad-hoc nature of today's scientific reporting process means most researchers record only the parameters they deem interesting, so those seeking to reproduce their results are reduced to either trying to replicate them from the published description or contacting the authors directly.
In this project we are advocating the creation of "Elephant", an extensible reproducibility-assurance tool. Elephant will capture the environment of a program (including hardware and software) such that the details of the run are recorded in a precise and complete manner. Further to this, Elephant can then be used to validate the environment before a run is reproduced. Elephant is intended to be extensible so that it can be augmented with domain-specific reproducibility information.
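The capture/validate cycle can be sketched in a few lines. The fields captured here are a tiny, illustrative subset chosen for this sketch; a real Elephant would record hardware details, package versions and domain-specific parameters, presumably via plugins.

```python
import hashlib
import json
import platform
import sys

def capture_environment():
    """Record a deliberately minimal snapshot of the run environment."""
    return {
        "python": sys.version.split()[0],
        "platform": platform.system(),
        "machine": platform.machine(),
    }

def fingerprint(env):
    # A stable digest of the snapshot, convenient for quick comparison.
    blob = json.dumps(env, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()

def validate(recorded, current):
    """Return the keys on which the current environment diverges from
    the recorded one; an empty set means the run may be reproduced as-is."""
    return {k for k in recorded if recorded[k] != current.get(k)}

env = capture_environment()
print(fingerprint(env)[:12])
print(validate(env, capture_environment()))  # -> set()
```

Validation before a re-run then reduces to checking that `validate()` returns the empty set, or reporting exactly which recorded facts no longer hold.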
Static Binary Rewriting In a Dynamic Manner
This project involves statically rewriting binaries at load time and during execution to optimise their execution over time and space. The basic premise is to write an in-vivo, extensible binary rewriting engine that provides these capabilities. If you are interested in programming languages, binary rewriting and program optimisation, please contact me for more information.
Haystacks from Needles: Want to write the google maps of provenance?
Data provenance (ascertaining where data has come from, how it was produced and under what conditions) is becoming ever more popular in both research and industry, and provenance systems are becoming more functional and performant.
We have built a system for provenance collection that is on all the time and has very low overhead. It works by observing processes as they execute, recording their interactions with the C library and the kernel, and correlating this I/O with changes in filesystem state. It works well; the problem is that it produces a lot of data (on the order of tens of megabytes a second) in the form of a directed graph.
While this provenance data is useful to us as computer scientists, it is completely overwhelming to end users looking to answer legitimate questions about the state of their data (e.g. Where did this file come from? Which process wrote bytes 7-12 of this file? Why is this file in an unexpected state? Who modified this file behind my back?).
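A question like "where did this file come from?" is, underneath, a reverse-reachability query on the provenance graph. A toy sketch (every node name below is invented; the real graph is vastly larger and richer):

```python
from collections import deque

# Toy provenance graph: an edge u -> v means "u was derived from v",
# i.e. a process read v while producing u. All node names are made up.
derived_from = {
    "report.pdf": ["latex(pid 412)"],
    "latex(pid 412)": ["report.tex", "results.csv"],
    "results.csv": ["analyse.py(pid 398)"],
    "analyse.py(pid 398)": ["raw.log"],
}

def ancestry(node):
    """Answer 'where did this file come from?' by walking the graph
    backwards from the node, breadth-first."""
    seen, queue = set(), deque([node])
    while queue:
        n = queue.popleft()
        for parent in derived_from.get(n, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

print(sorted(ancestry("report.pdf")))
```

The hard part of the project is not this traversal but presenting its result: at tens of megabytes a second, the answer set for a real file is itself a graph that needs summarising.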
This problem is further complicated by the fact that non-computer-scientists usually have only a vague notion that things have gone wrong, but don't really know how to fix them or even where to start looking. Clearly a natural-language tool or graphical user interface to this data may be useful.
In this project, you will write a tool to make sense of this graph data. We propose exploring ways to view this data where you do and do not know what you're looking for [An example of an existing approach to this problem is this paper].
Possible approaches you could take are:
- Continuing the idea of summarised graphs (Google maps for provenance)
- Expert style systems that filter based on continuous questioning
- Natural language approaches to data querying
- FPS style immersion in the data
- Oculus-style VR-based provenance "worlds"
- Google Glass-based provenance overlays (subject to hardware availability)
In this work you have the opportunity to create a real-world tool useful to many people and to contribute to a new and topical research area. There may also be further research opportunities in the FRESCO team for students who do particularly well.
Students interested in this work should have strong practical skills and an interest in efficient and pragmatic data analysis and retrieval. Students interested in a graphical approach to the problem should have a decent computer-graphics background.
DNS Analysis, Optimisation And Security
Classifying DNS use
Given about 240M DNS question/response pairs per day from two of Nominet's main nameservers, which are responsible for the UK namespace, build a system that can classify the requestors in near real time (that is, with minimal data and in limited time). Some of the heavy users are large ISPs' resolvers, or public resolvers such as Google's DNS. Others continuously scan the namespace for changes for various reasons, ranging from speculation on the domain-name market to botnets, spambots and regular uptime tests. Additionally, given the dataset and distributed-processing frameworks like Hadoop/MapReduce, what other interesting aspects can be derived?
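Two cheap streaming signals that already separate these populations are query rate and namespace spread (how rarely a requestor repeats a name). A sketch with invented thresholds and synthetic traffic, purely to show the shape of such a classifier; nothing here is derived from Nominet data:

```python
from collections import defaultdict

# Per-requestor counters, updated as query/response pairs stream in.
queries = defaultdict(int)   # total queries seen per source
names = defaultdict(set)     # distinct names each source asked for

def observe(src, qname):
    queries[src] += 1
    names[src].add(qname)

def classify(src, window_seconds):
    """Very rough classification from rate and namespace spread.
    The thresholds are illustrative only."""
    rate = queries[src] / window_seconds
    spread = len(names[src]) / max(queries[src], 1)
    if rate > 100 and spread > 0.9:
        return "scanner"     # high rate, almost never repeats a name
    if rate > 100:
        return "resolver"    # high rate, heavy repetition (cache misses)
    return "ordinary"

for i in range(1200):
    observe("203.0.113.9", f"name{i}.example.uk")        # never repeats
for i in range(1200):
    observe("198.51.100.1", f"name{i % 20}.example.uk")  # 20 popular names

print(classify("203.0.113.9", 10), classify("198.51.100.1", 10))
```

A real system would add sliding windows, query-type mixes and timing features, and would need to bound the per-source state (e.g. with sketches) to survive 240M pairs a day.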
Spotting distributed whois abuse
We occasionally see attempts to mine large amounts of data from our whois service. This may manifest as, say, a large number of requests from a single client. However, we have anti-abuse limits in place to stop this, so an obvious tactic is to use a distributed system in which each client performs a small number of queries, so that no individual limit is reached. It might be possible to detect this happening and impose a single limit on "cooperating clients" (assuming it could be done in real time).
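One way to frame the "cooperating clients" idea is a shared budget keyed on a coarser grouping than the individual client. The sketch below groups by /24 prefix purely for illustration; the interesting research question is what the grouping key should really be (shared query patterns, timing correlation, and so on). The limits are invented.

```python
from collections import defaultdict
from ipaddress import ip_network

PER_CLIENT_LIMIT = 100
PER_GROUP_LIMIT = 500   # shared budget for "cooperating clients"

client_counts = defaultdict(int)
group_counts = defaultdict(int)

def group_of(addr):
    # Group clients by /24 prefix -- a deliberately crude stand-in for a
    # smarter notion of cooperation (query similarity, timing, ASN...).
    return ip_network(f"{addr}/24", strict=False)

def allow(addr):
    g = group_of(addr)
    if client_counts[addr] >= PER_CLIENT_LIMIT or group_counts[g] >= PER_GROUP_LIMIT:
        return False
    client_counts[addr] += 1
    group_counts[g] += 1
    return True

# Ten clients in one /24, each politely staying under the individual
# limit (90 < 100)... yet the shared group budget still cuts them off.
blocked = sum(not allow(f"192.0.2.{i}") for i in range(1, 11) for _ in range(90))
print(blocked)
```

No single client ever trips its own limit, but the swarm as a whole does, which is exactly the behaviour the project is after.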
Design a dedicated DNSSEC hardware security module (HSM)
Building a security module is hard. Outside interference may lead to compromised keys, or might trick the device into performing undesired operations. A lack of interference implies a lack of an entropy source as well, hence the device needs a good internal source of entropy with which to generate keys. Design a device that can generate a key pair, guard the private key, generate signatures given the right credentials, and perform well (at least 20 signatures a second).
Next Generation Rate Limiting
DNS servers are often abused for amplified reflection attacks, in which a third party receives large responses from a nameserver as a result of incoming queries with a spoofed source address. Currently some basic rate limiting is possible, but it can easily be circumvented. We are interested in a taxonomy of possible solutions, or in novel ideas for defending networks against large-scale amplified reflection attacks.
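As a starting point for the taxonomy, the response-rate-limiting approach deployed in production nameservers (e.g. BIND's RRL) can be modelled as a token bucket keyed on the client prefix and the answer, since a reflection flood repeats the same key while legitimate resolvers rarely do. A sketch with invented rate parameters:

```python
from collections import defaultdict

RATE = 5.0     # responses per second per (client prefix, answer) key
BURST = 10.0

buckets = defaultdict(lambda: {"tokens": BURST, "last": 0.0})

def respond(prefix, qname, now):
    """Token-bucket response rate limiting. Real RRL additionally
    'slips' an occasional truncated reply so that a genuine client
    behind a spoofed prefix can retry over TCP."""
    b = buckets[(prefix, qname)]
    b["tokens"] = min(BURST, b["tokens"] + (now - b["last"]) * RATE)
    b["last"] = now
    if b["tokens"] >= 1.0:
        b["tokens"] -= 1.0
        return True    # send the full response
    return False       # drop (or slip a truncated response)

# A spoofed-source flood: 100 identical queries arriving at the same instant.
sent = sum(respond("203.0.113.0/24", "large.example.uk", now=0.0) for _ in range(100))
print(sent)
```

The easy circumvention mentioned above is visible in the model too: an attacker who rotates query names or reflects off many nameservers spreads the load across keys, which is why a mere per-key bucket is not a complete defence.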
Indoor Smartphone and Person Tracking
Bluetooth Low Energy Phone Tracking
This project will look at positioning mobile phones using BLE beacons distributed around the environment. This is a hot topic in industry at the moment, but the beacons are expensive and the phone platforms have only just begun to support BLE properly. We will create beacons using Raspberry Pis and cheap Bluetooth dongles. Unknowns include what power to beacon at, what update rate can be expected, what continuous scanning costs on the phone, whether we can infer good distance estimates, and how easy it will be to spoof beacons and thereby cause havoc! The project has a high chance of international publication and further PhD work. Programming for Android and/or iOS is needed, along with good Linux skills. Previous knowledge of Bluetooth (in any form) is valuable but not essential.
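On the distance-estimate question, the usual starting point is the log-distance path-loss model. A sketch, where the calibration constants are typical textbook values rather than measured ones (each beacon would need per-device calibration in practice):

```python
def rssi_to_distance(rssi_dbm, tx_power_dbm=-59, n=2.0):
    """Log-distance path-loss estimate: tx_power_dbm is the calibrated
    RSSI at 1 m and n the path-loss exponent (~2 in free space,
    typically higher indoors)."""
    return 10 ** ((tx_power_dbm - rssi_dbm) / (10 * n))

print(round(rssi_to_distance(-59), 2))   # at the calibration point: 1.0 m
print(round(rssi_to_distance(-79), 2))   # 20 dB weaker, n=2: 10.0 m
```

Evaluating how badly multipath and body attenuation break this model in real buildings is exactly the kind of unknown the project would quantify.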
Bluetooth Low Energy Sensor Network
This project will invert the typical Bluetooth tracking scenario by putting BLE beacons on the person rather than in the environment, building a sensor network of Raspberry Pi BLE receivers. Beacons are small and last for years, so each person could conceivably carry several to aid positioning and to minimise body attenuation. The research aim will be to establish the capabilities of such a system in terms of range, accuracy, power consumption, update rate and maximum beacon numbers. The project has a high chance of international publication and further PhD work. Experience working with Raspberry Pis or Linux environments is essential. Previous knowledge of Bluetooth (in any form) is valuable but not essential.
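A natural accuracy baseline for such an evaluation is weighted-centroid positioning over the fixed receivers. A sketch, with receiver positions and RSSI readings invented for illustration:

```python
def weighted_centroid(sightings):
    """Estimate a worn beacon's position from fixed receivers' RSSI:
    each receiver's known (x, y) is weighted by its received signal
    strength, converted from dBm to linear mW so stronger sightings
    dominate. A crude but serviceable baseline before trying
    trilateration or fingerprinting."""
    wx = wy = total = 0.0
    for (x, y), rssi_dbm in sightings:
        w = 10 ** (rssi_dbm / 10)   # dBm -> mW
        wx += w * x
        wy += w * y
        total += w
    return wx / total, wy / total

# Three Raspberry Pi receivers at known positions; the much stronger
# reading pulls the estimate towards the receiver at (0, 0).
est = weighted_centroid([((0, 0), -50), ((10, 0), -70), ((0, 10), -70)])
print(tuple(round(c, 2) for c in est))
```

Comparing this baseline against fancier estimators, across beacon counts and body placements, would give the accuracy axis of the evaluation.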
Smartphone Camera-based Movement Classification
A key problem in pedestrian dead reckoning is determining the direction of motion: it is hard to distinguish between back steps, side steps and forward steps. This project will look at repurposing smartphone cameras to estimate relative movement direction based on feature tracking applied to the ceiling or floor. Many optical-flow-like algorithms exist that can be trialled. The important result is not just that the direction is correct, but also that the drain on the smartphone battery is minimised. This project will be carried out on the Android platform; some experience of programming for it will be necessary. The optical algorithms may be imported from e.g. OpenCV (which has an Android port), or written from scratch if preferable.
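The simplest member of that algorithm family is brute-force block matching, which already illustrates the principle. A toy sketch on synthetic 8x8 "frames" (real frames would come from the camera, and a production version would use a proper optical-flow implementation, e.g. via OpenCV):

```python
def estimate_shift(prev, curr, max_shift=2):
    """Brute-force block matching: try every (dx, dy) up to max_shift
    and keep the one minimising the sum of squared differences over
    the overlapping region."""
    h, w = len(prev), len(prev[0])
    best, best_err = (0, 0), float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            err = 0
            for y in range(max(0, -dy), min(h, h - dy)):
                for x in range(max(0, -dx), min(w, w - dx)):
                    d = prev[y][x] - curr[y + dy][x + dx]
                    err += d * d
            if err < best_err:
                best, best_err = (dx, dy), err
    return best

# A bright blob on the ceiling, then the same frame shifted one pixel
# right: the estimated shift tells us how the phone moved relative to it.
prev = [[0] * 8 for _ in range(8)]
prev[3][3] = prev[3][4] = prev[4][3] = prev[4][4] = 255
curr = [row[-1:] + row[:-1] for row in prev]   # shift right by 1 pixel
print(estimate_shift(prev, curr))
```

Relating that per-frame pixel shift to a walking direction, and doing it cheaply enough to spare the battery, is where the actual project work lies.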
A Modern-Day Active Badge
The dominant indoor positioning approach is the use of WiFi fingerprints. However, these are typically unable to unambiguously locate to a room, since WiFi penetrates walls. We have in the past developed an infrared-based location system (the Active Badge) that had people wear IR emitters and used IR receivers in each room. IR was very good at room localisation (it does not penetrate walls), but we could not realistically install receivers everywhere. This project will look at exploiting the growing number of IR transmitters on recent smartphones (e.g. the Galaxy S4) and simple networked IR receivers (built from e.g. Raspberry Pis or similar) to create a modern-day Active Badge that is fused with WiFi positioning data, creating a more robust and ubiquitous tracking system. Experience of Android programming is essential.
Programming language research to support physical science researchers
The Computer Lab is currently running a research project to apply programming language research to support programming in the sciences, via tools and languages (a slightly longer synopsis can be found here). As part of this project, we are investigating augmenting code with specifications to aid verification, program comprehension and construction, and improve bug analysis.
Stencil access specifications for verifying numerical code
Stencil computations are array-based transformations in which each element of an output array, at position i, is computed from a finite set of elements neighbouring position i in some input array(s) (e.g. convolution, the Game of Life). Some stencils are complicated, detailed and dense (see, for example, this stencil computation in a Navier-Stokes fluid simulator), and errors can easily be introduced by accidentally permuting indices, offsets or arrays, or even omitting particular indices.
The goal of this project is to design and implement a language of abstract stencil specifications, which can be attached to an existing general-purpose language, e.g. Fortran. These specifications will provide a guide to the programmer and a verification technique for the compiler.
For more examples of why this might be useful and how it might work, see here.
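To make the idea concrete, here is a dynamic caricature of such a specification in Python: the programmer declares which relative offsets a kernel may read, and a harness checks a sample execution against the declaration. In the project itself the check would be static, performed by the compiler, and the host language would be e.g. Fortran; this sketch (including the spec format) is invented for illustration.

```python
def check_stencil(kernel, spec, n=8):
    """Run the kernel over a traced array and verify that every
    relative index it reads is covered by the declared spec."""
    accessed = set()

    class Tracing(list):
        # Record each read as an offset from the current centre index.
        def __getitem__(self, j):
            accessed.add(j - Tracing.centre)
            return list.__getitem__(self, j)

    xs = Tracing(range(n))
    for i in range(1, n - 1):        # interior points only
        Tracing.centre = i
        kernel(xs, i)
    return accessed <= set(spec), accessed

# A three-point stencil, declared as reading offsets {-1, 0, +1}.
def laplace_1d(a, i):
    return a[i - 1] - 2 * a[i] + a[i + 1]

ok, seen = check_stencil(laplace_1d, spec=(-1, 0, 1))
print(ok, sorted(seen))
```

A permuted or omitted index changes the observed offset set and the check fails, which is precisely the class of error the abstract specifications are meant to catch at compile time.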
Smart phone usage and energy consumption
Energy consumption of web-service APIs
It is common for smartphone apps to make requests to server-side APIs, either to download information or to post notifications. Commonly this is done using XML-RPC over HTTP. However, it could well be expected that this carries a considerable energy overhead due to the use of a TCP connection, the addition of HTTP headers and the text-based encoding of information. This project seeks to measure the potential energy savings of different options, such as more efficient encodings (e.g. Google's Protobuf) and the use of UDP.
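The size gap between a text encoding and a packed binary one, and hence the reduction in bytes the radio must carry, is easy to illustrate. The payload fields below are invented, and a fixed struct is only an approximation of what Protobuf does (it uses varints and field tags rather than a rigid layout):

```python
import json
import struct

# A toy sensor report: timestamp, latitude, longitude, battery percent.
report = (1700000000, 52.2053, 0.1218, 87)

# Text encoding, as a JSON/XML-RPC-style API would send it...
as_json = json.dumps(
    {"ts": report[0], "lat": report[1], "lon": report[2], "bat": report[3]}
).encode()

# ...versus a fixed binary layout: 4-byte uint, two 4-byte floats,
# one signed byte = 13 bytes on the wire before any headers.
as_binary = struct.pack("!Iffb", *report)

print(len(as_json), len(as_binary))
```

Fewer payload bytes is only part of the story, of course: the project's measurements would have to weigh this against TCP/HTTP overheads and how long each option keeps the radio in its high-power state.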
Energy measurement hardware is available as are android phones for testing and equipment for building a controlled wifi testbed.
Interested students will need to demonstrate good programming ability and application development along with a good understanding of TCP, UDP, IP, Wifi and Cellular networking.
Better Smartphone Benchmarks
There are a variety of benchmarking tools available for Android which can produce a performance score for a particular handset. However, these tests do not really reflect the needs of actual phone users. The idea of this project is to use the Device Analyzer dataset to come up with better benchmarks.
The project will need to survey the various properties that current benchmarks are attempting to measure. Data analysis from Device Analyzer can then be used to work out how many users would be interested in these measurements and to see if there are more important properties to measure. We have a variety of phone handsets which can then be used for testing to see how they perform with the new designs.
Interested students will need to demonstrate good programming ability and have an interest in systems measurement. It is expected that data analysis will require some sort of distributed processing system such as Hadoop running on the DTG cluster.