Part III & MPhil Project Suggestions

These are some suggestions for Part III and MPhil projects within the Digital Technology Group. If you have an idea related to our group's research interests that isn't mentioned here, get in touch.

Systems projects

  1. Characterising asynchronous behaviour in the Linux kernel

    Modern operating system kernels have moved towards a model in which the preferred way of executing operations is non-blocking/asynchronous. Many applications use these asynchronous features (for example, from the storage and network stacks of the Linux kernel) to achieve better scalability.

    However, this also means that, irrespective of application constraints, the exact timing of I/O operations now depends on asynchronous operation schedulers in the kernel. These schedulers (for example, the I/O scheduler) are responsible for managing the system-wide multiplexing of resources. Because of this, it becomes difficult to predict and understand how different applications will interact with each other and whether the workloads they service are synergistic or antagonistic.

    In this project, you will quantify the side effects that applications executing asynchronous operations have on other applications running on the same physical machine. The purpose is to understand the variance that interacting workloads introduce in latency/response times and to discover optimisation opportunities (either in the applications themselves or in the scheduling of operations).

    In order to achieve your goal, you will:

    1. Use a measurement infrastructure called Resourceful and extend it to record asynchronous behaviour.
    2. Instrument a number of applications to expose their use of asynchronous operations.
    3. Run experiments to understand the interactions/variations in response times of those applications when they run concurrently on the same machine.
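
    The measurement idea in the steps above can be sketched in userspace: run the same asynchronous operation alone and then under contention, and compare the latency distributions. The sketch below uses Python's asyncio purely as a stand-in for kernel-level asynchronous I/O; in the real project, Resourceful would record these latencies inside the kernel.

```python
import asyncio
import statistics
import time

async def io_op(delay):
    # Stand-in for one asynchronous I/O operation; returns its latency.
    start = time.perf_counter()
    await asyncio.sleep(delay)
    return time.perf_counter() - start

async def measure(n_ops, delay):
    # Issue n_ops operations concurrently and collect per-op latencies.
    return await asyncio.gather(*(io_op(delay) for _ in range(n_ops)))

solo = asyncio.run(measure(1, 0.01))
contended = asyncio.run(measure(50, 0.01))

# The latency spread under contention is the kind of variance the project
# would attribute to the kernel's asynchronous operation schedulers.
spread = max(contended) - min(contended)
print(statistics.mean(contended), spread)
```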

    If you complete this first stage of the project, we will look at extending the experiments to Linux containers.

    Interested students should have basic Operating Systems knowledge. Experience with the Linux kernel is advantageous, as is programming experience in C.

    Well executed, this project will result in a top-tier publication.

    Contact: Dr R. Sohan

  2. Soroban

    We have developed an internal machine-learning technique for understanding performance interference. Currently we use it only to infer the performance overhead of virtualisation on a highly contended machine. This project would pick another area where performance interference occurs and apply our machine-learning approach to comprehend it.

    Well executed, this project will result in a top-tier publication.

    Contact: Dr R. Sohan

  3. Resourceful For the HPCS Clan

    High-performance computing workloads multiplex jobs across a cluster of machines. When they do so, there is often performance interference between badly interacting workloads. This project, a collaboration with the university's high-performance computing service, would investigate how to combine existing Linux mechanisms (e.g. cgroups) with an internal tool for measuring fine-grained resource consumption in order to limit bad interactions.

    Well executed, this project will result in a top-tier publication.

    Contact: Dr R. Sohan

  4. Evaluating and improving low-level kernel probing mechanisms

    System measurement tools such as DTrace or SystemTap make use of low-level probing mechanisms for inserting small pieces of code into the normal kernel execution flow at runtime. These mechanisms, together with the infrastructure for inserting, activating, deactivating and running the inserted code, have a significant impact on the efficiency of the higher-level measurement tools and on the side effects they introduce.

    In this project, you will compare the mechanisms employed by DTrace, SystemTap (kprobes) and a new kind of probe developed internally in our group (kamprobes). The purpose is to further optimise the probes and to understand their execution and overheads on different computing architectures. Even simple improvements in these low-level mechanisms will generate opportunities for significantly advancing in-production system monitoring and performance diagnosis.

    Interested students should have basic Operating Systems knowledge. Experience with the Linux kernel is advantageous, as is programming experience in C and assembly.

    Well executed, this project will result in a top-tier publication.

    Contact: Dr R. Sohan

  5. Improving DTrace reliability

    While DTrace is one of the leading tools for system-level measurement and introspection, it has been built with a focus on 'use as the exception': you turn probes on when something goes wrong or cannot be explained, solve the issue and then disable the probes. However, a number of use cases (in security, provenance recording and automatic root-cause analysis) require probing to be extensive and "always on". In such a scenario, DTrace prioritises kernel liveness over correctness, by dropping events from the recorded traces.

    In this project, you will explore the reverse trade-off: maintaining correctness even at the cost of slowing down applications or other kernel tasks. You will first identify the causes of DTrace bottlenecks (limited buffer size, the process that reads from the trace buffers not being scheduled, too many probes being fired) and then change DTrace in order to eliminate the dropping of events.

    You will explore the following strategies:

    1. Moving DTrace to per-application/per-CPU buffers instead of shared per-CPU buffers
    2. Keeping buffer high-water marks in order to detect high event rates produced by certain applications
    3. Throttling applications that produce too many events by scheduling them less often (scheduler changes required)
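
    As a toy illustration of strategy 2, the sketch below keeps a high-water mark on a per-application buffer and raises a throttle flag (for a scheduler to act on, per strategy 3) before the buffer fills and events would have to be dropped. The class and its thresholds are hypothetical; DTrace's real buffers are in-kernel, per-CPU structures.

```python
from collections import deque

class TraceBuffer:
    # Toy per-application trace buffer (hypothetical; not DTrace's actual
    # implementation).
    def __init__(self, capacity, highmark):
        self.buf = deque()
        self.capacity = capacity
        self.highmark = highmark
        self.throttle = False   # hint to the scheduler: run producer less
        self.dropped = 0

    def record(self, event):
        if len(self.buf) >= self.capacity:
            self.dropped += 1   # what stock DTrace does today: drop
            return False
        self.buf.append(event)
        # Crossing the high-water mark requests throttling *before* any
        # event has to be dropped.
        self.throttle = len(self.buf) >= self.highmark
        return True

    def drain(self, n):
        # The consumer reading the trace clears the throttle hint.
        for _ in range(min(n, len(self.buf))):
            self.buf.popleft()
        self.throttle = len(self.buf) >= self.highmark

buf = TraceBuffer(capacity=4, highmark=3)
for ev in range(5):
    buf.record(ev)
# 4 events stored, 1 dropped; the throttle flag was raised at event 3,
# when there was still room to avoid the drop.
```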

    If the primary goal of the project is achieved, opportunities exist for extending it with more subtle strategies for avoiding dropped events. For example, one could explore artificially prolonging the duration of system calls for offending processes.

    Interested students should have basic Operating Systems knowledge. Experience with the Linux/FreeBSD kernel is advantageous as is programming experience in C.

    Well executed, this project will result in a top-tier publication.

    Contact: Dr R. Sohan

  6. Fast I/O Data Paths using eBPF

    Today, complex processing of network packets or general I/O streams requires traversing the full network or I/O stack (we exclude kernel-bypass mechanisms from the discussion). However, with the implementation of eBPF, an opportunity exists for applications to insert pieces of code into the kernel, at runtime and in a safe manner. This means that at least part of the complex logic for filtering, forwarding, load balancing or caching can be pushed towards the lowest points in the software stack.

    This project will implement a programmable, high-performance I/O data path using eBPF programs. Depending on the interests of students, the focus can be placed on:

    • Programmable network data paths for mitigating DDoS attacks
    • Fine-grained application hints for I/O caching and resource usage
    • Custom, application-controlled I/O coalescing or persistence
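
    As a flavour of the first option, the sketch below implements per-source token-bucket rate limiting, the kind of DDoS-mitigation logic an eBPF/XDP program could run at the lowest point of the network stack. It is Python for illustration only; a real XDP program would be restricted C keyed on a BPF map, returning XDP_DROP or XDP_PASS. All names and thresholds here are hypothetical.

```python
class RateLimiter:
    # Token bucket per source address; burst and refill are illustrative.
    def __init__(self, burst, refill_per_sec):
        self.burst = burst
        self.refill = refill_per_sec
        self.state = {}   # src ip -> (tokens, time of last packet)

    def allow(self, src_ip, now):
        tokens, last = self.state.get(src_ip, (self.burst, now))
        # Refill tokens for the time elapsed since the last packet.
        tokens = min(self.burst, tokens + (now - last) * self.refill)
        if tokens < 1.0:
            self.state[src_ip] = (tokens, now)
            return False          # the XDP program would return XDP_DROP
        self.state[src_ip] = (tokens - 1.0, now)
        return True               # ...and XDP_PASS here

rl = RateLimiter(burst=2, refill_per_sec=1.0)
verdicts = [rl.allow("198.51.100.7", t) for t in (0.0, 0.0, 0.0, 2.0)]
# verdicts == [True, True, False, True]: the burst is spent, the third
# packet is dropped, and two seconds of refill readmit the source.
```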

    Interested students should have basic Operating Systems knowledge. Experience with the Linux kernel is advantageous as is programming experience in C.

    Well executed, this project will result in a top-tier publication.

    Contact: Dr R. Sohan

  7. Characterising the execution of interactive workloads on wimpy-core (Calxeda) and Tile (Tilera) architectures

    The current server/data-centre space is dominated by traditional x86-64 architectures. However, the drive for improved efficiency and low energy consumption has created space for alternative architectures. Calxeda (ARM) and Tilera are two such architectures, with hardware publicly available (and accessible within the DTG).

    Because of their novelty, the performance of running existing applications and the opportunities for optimising them for those architectures are not fully known.

    You will have the opportunity to choose one of the architectures and characterise the execution of server applications on top of it, in direct comparison to x86-64. We will try to understand things like:

    • The performance of the network and I/O stacks in comparison to x86-64.
    • The deployment of kernel-level measurement tools (perf / kprobes) or our custom probes implementation (kamprobes) to understand practical architectural differences.
    • Optimal scheduling/placement of interrupts, I/O and CPU tasks.
    • The scope for deploying virtualisation (Xen, containers) on top of these architectures.

    Well executed, this project will result in a top-tier publication.

    Contact: Dr R. Sohan

  8. Linux Kernel specialisation for advanced, virtualisation-assisted sandboxing

    We propose a new way of enforcing the sandboxing of Linux applications based on a primitive we have developed, called shadow kernels. In order to deny access to particular kernel functionalities for a given application, one can present that application with a kernel image in which the memory pages containing the restricted features are zeroed-out.

    We have already implemented the basic mechanisms for creating different kernel text sections and switching between them, under the control of the hypervisor (Xen).

    You will need to use this primitive to implement sandboxing and show that, even given exploitable code (e.g. a NULL-pointer dereference), the application still cannot access restricted features.

    Well executed, this project will result in a top-tier publication.

    Contact: Dr R. Sohan

  9. Compartmentalising applications using SGX and deprivileged OS services

    Applications can become quite complex, consisting of numerous components in the form of libraries and modules. The OS treats each application as a single executing entity and grants all components in the application the same privileges and access to the same set of resources. This leads to the problem that a security vulnerability in any one component can affect all other components, compromising the entire application. It is therefore desirable to isolate and compartmentalise individual components from each other and grant each component only the privileges it requires to operate.

    Recent work (SOAAP) has shown that compartmentalisation can be implemented with the help of source code annotations combined with a custom compiler toolset, restricting the privileges (system calls) granted to individual components within applications and carefully controlling communication across component boundaries. This approach, however, does not enforce strict memory isolation, as the entire application's address space is still addressable from any code in the process.

    With modern hardware such as Intel SGX (ref) it is possible to isolate specific parts of an application's memory address space from the rest of the process, including the OS, by using the concept of memory enclaves. However, a major drawback of SGX enclaves is that all system calls are prohibited within an enclave region. Thus, if a component of an application is provisioned within an enclave, it is effectively cut off from the system. SGX also prevents ring-0 code from executing within an enclave, so it is not possible in the conventional model to execute the OS within an enclave.

    In this project we endeavour to implement application compartmentalisation using SGX by de-privileging various OS services and running them in ring-3 mode within their own SGX enclaves, enabling user-space applications (themselves in SGX enclaves) to invoke and link with the required/allowed services directly. This can guarantee both memory isolation and restricted access to system resources for components within applications.

    Contact: Dr R. Sohan

  10. Fine-grained lineage for Apache Spark

    In-memory processing frameworks such as Apache Spark are increasingly being adopted in industry due to their good performance for many applications. We plan to add fine-grained provenance support to Spark. Spark already uses coarse-grained lineage (instead of data replication) to achieve fault tolerance, by recomputing lost data partitions. However, Spark is not able to capture precise relationships between input and output, as (1) its lineage is coarse-grained and (2) stateful data flow is not tracked. This project will augment Spark to capture fine-grained lineage that can be leveraged effectively for data-audit and debugging use cases.
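
    To make the granularity difference concrete, here is a toy sketch (not Spark's API) in which every record carries the set of input-record identifiers it derives from, so each output can name exactly the inputs that produced it:

```python
def lineage_map(f, records):
    # Apply f per record; each record carries the ids of the inputs it
    # derives from, and map preserves that set.
    return [(f(value), ids) for value, ids in records]

def lineage_filter(pred, records):
    # Surviving records keep their lineage sets unchanged.
    return [(value, ids) for value, ids in records if pred(value)]

# Tag each input record with its own id, then run a tiny two-stage "job".
inputs = [(x, {i}) for i, x in enumerate([3, 7, 2])]
out = lineage_filter(lambda v: v > 5, lineage_map(lambda v: v * 2, inputs))
# out == [(6, {0}), (14, {1})]: each output names its exact input records,
# the per-record precision that Spark's per-partition lineage cannot give.
```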

    Well executed, this project will result in a top-tier publication.

    Contact: Dr R. Sohan

  11. Deep packet-level inspection using Hadoop

    Inspecting network packets is important in many applications, such as root-cause analysis and intrusion detection. For these applications to scale to current data volumes, packet-level inspection has to leverage distributed "big data" processing frameworks. In this project, we plan to investigate how to build deep packet inspection tools on top of Hadoop. These tools will enable "realtime" analysis of network traffic without requiring costly, specialised hardware.

    Well executed, this project will result in a top-tier publication.

    Contact: Dr R. Sohan

Indoor Smartphone and Person Tracking

  1. Bluetooth Low Energy Phone Tracking

    This project will look at positioning mobile phones using BLE beacons distributed around the environment. This is an active topic in industry at the moment, but the beacons are expensive and the phone platforms have only just begun to support BLE properly. We will create beacons using Raspberry Pis and cheap Bluetooth dongles. Unknowns include what power to beacon at, what update rate can be expected, what a continuous scan costs on the phone, whether we can infer good distance estimates, and how easy it will be to spoof beacons and thereby cause havoc! The project has a high chance of international publication and further PhD work. Programming experience with Android and/or iOS is needed, along with good Linux skills. Previous knowledge of Bluetooth (in any form) is valuable but not essential.
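
    On the question of distance estimates, a common starting point is the log-distance path-loss model, sketched below. The reference power and path-loss exponent are hypothetical values that would have to be calibrated per beacon and per environment:

```python
def rssi_to_distance(rssi_dbm, tx_power_dbm=-59.0, path_loss_n=2.0):
    # Log-distance path-loss model: tx_power_dbm is the expected RSSI at
    # 1 m and path_loss_n the path-loss exponent (2.0 = free space).
    # Both are illustrative defaults, not calibrated values.
    return 10.0 ** ((tx_power_dbm - rssi_dbm) / (10.0 * path_loss_n))

print(rssi_to_distance(-59.0))   # ~1 m: reading equals the reference power
print(rssi_to_distance(-79.0))   # ~10 m under the free-space exponent
```

    In practice, multipath and body attenuation make indoor RSSI noisy, which is exactly why the achievable quality of these estimates is one of the project's unknowns.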

    Contact: R. Harle

  2. Bluetooth Low Energy Sensor Network

    This project will invert the typical Bluetooth tracking scenario by putting BLE beacons on the person rather than in the environment, building a sensor network of Raspberry Pi BLE sensors. Beacons are small and last for years, so each person could conceivably carry several to aid positioning and to minimise body attenuation. The research aim will be to establish the capabilities of such a system in terms of range, accuracy, power consumption, update rate and maximum beacon numbers. The project has a high chance of international publication and further PhD work. Experience working with Raspberry Pis or Linux environments is essential. Previous knowledge of Bluetooth (in any form) is valuable but not essential.

    Contact: R. Harle

  3. Smartphone Camera-based Movement Classification

    A key problem in Pedestrian Dead Reckoning is determining the direction of motion: it is hard to distinguish between back steps, side steps and forward steps. This project will look at repurposing smartphone cameras to estimate the relative movement direction, based on feature tracking applied to the ceiling or floor. Many optical-flow-like algorithms exist that can be trialled. The important result is not just that the direction is correct, but also that the drain on the smartphone battery is minimised. This project will be carried out on the Android platform: some experience of programming for it will be necessary. The optical algorithms may be imported from e.g. OpenCV (which has an Android port), or written from scratch if preferred.
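
    As a minimal flavour of the feature-tracking idea, the sketch below estimates an integer displacement between two 1-D intensity profiles by maximising their correlation; a real implementation would track features across 2-D camera frames (e.g. via OpenCV) and would have to weigh accuracy against battery cost:

```python
def estimate_shift(frame_a, frame_b):
    # Brute-force integer displacement maximising the correlation between
    # two intensity profiles (a toy 1-D stand-in for optical flow).
    n = len(frame_a)
    best_shift, best_score = 0, float("-inf")
    for s in range(-(n // 2), n // 2 + 1):
        score = sum(frame_a[i] * frame_b[i + s]
                    for i in range(n) if 0 <= i + s < n)
        if score > best_score:
            best_shift, best_score = s, score
    return best_shift

reference = [0, 0, 1, 0, 0]   # a bright ceiling feature in the middle
moved = [0, 0, 0, 1, 0]       # the same feature one pixel to the right
shift = estimate_shift(reference, moved)
# shift == 1: the estimated motion is one pixel to the right
```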

    Contact: R. Harle

  4. WiFi-IR Positioning

    The dominant indoor positioning approach is the use of WiFi fingerprints. However, these are typically unable to unambiguously locate a phone to a room, since WiFi penetrates walls. We have in the past developed an infrared-based location system (the Active Badge) that had people wear IR emitters and used IR receivers in each room. IR was very good at room localisation (it does not penetrate walls), but we could not realistically install receivers everywhere. This project will look at exploiting the growing number of IR transmitters on recent smartphones (e.g. the Galaxy S4) and simple networked IR receivers (built from e.g. Raspberry Pis or similar) to create a modern-day Active Badge that is fused with WiFi positioning data, creating a more robust and ubiquitous tracking system. Experience of Android programming is essential.

    Contact: R. Harle

Programming language research to support physical science researchers

The Computer Lab is currently running a research project applying programming language research to support programming in the sciences, via tools and languages (a slightly longer synopsis can be found here). As part of this project, we are investigating augmenting code with specifications to aid verification, program comprehension and construction, and to improve bug analysis.

  1. Language-agnostic analysis of programming patterns

    Kythe is a project originating from Google that tries to unify software support for programming across many programming languages and environments. This project involves using Kythe to build language-agnostic analyses of programming patterns. We'd like to be able to analyse code in a variety of different languages to see what similarities we can find.

    For more details on the proposal see here

    Contact: Andrew Rice

  2. Stencil access specifications for verifying numerical code

    Stencil computations are array-based transformations where each element of an output array, at position i, is computed from a finite set of elements neighbouring position i in some input array(s) (e.g. convolution, the Game of Life). Some stencils are complicated, detailed and dense (see, for example, this stencil computation in a Navier-Stokes fluid simulator), and errors can easily be introduced by accidentally permuting indices, offsets or arrays, or even omitting particular indices.

    The goal of this project is to design and implement a language of abstract stencil specifications, which can be attached to an existing general-purpose language, e.g. Fortran. These specifications will provide a guide to the programmer and a verification technique for the compiler.
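
    A minimal sketch of the idea, in Python rather than Fortran: the stencil declares the offsets it may read, every actual access is recorded, and a checker compares the two. The specification form here is purely illustrative, not the proposed syntax:

```python
# Hypothetical specification: the offsets, relative to output index i,
# that a three-point averaging stencil is allowed to read.
SPEC_OFFSETS = {-1, 0, 1}

def read(a, i, offset, used):
    # Instrumented array access: records every offset actually used.
    used.add(offset)
    return a[i + offset]

def three_point_average(a):
    out = list(a)
    used = set()
    for i in range(1, len(a) - 1):
        out[i] = (read(a, i, -1, used) + read(a, i, 0, used)
                  + read(a, i, 1, used)) / 3.0
    return out, used

result, used = three_point_average([0.0, 3.0, 0.0, 3.0])
# A checker would reject the stencil if `used` strayed outside, or failed
# to cover, SPEC_OFFSETS -- e.g. after permuting or omitting an index.
```

    A static version of this check, performed by the compiler against annotations on the Fortran source, is what the project would aim for.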

    For more examples of why this might be useful and how it might work see here

    Contact: Dominic Orchard

Smart phone usage and energy consumption

  1. Energy consumption of web-service APIs

    It is common for smartphone apps to make requests to server-side APIs, either to download information or to post notifications. Commonly this is done using XML-RPC over HTTP. However, it could well be expected that this carries a considerable energy overhead due to the use of a TCP connection, the addition of HTTP headers and the text-based encoding of information. This project seeks to measure the potential energy savings of different options, such as more efficient encodings (e.g. Google's Protobuf) and the use of UDP.
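
    To see why the text encoding matters, the sketch below compares the payload size of a JSON-style text body against a fixed binary layout for the same toy reading (Protobuf is similar in spirit, with field tags and varint encoding). Payload size is only one contributor to energy; TCP and HTTP overheads are measured separately:

```python
import json
import struct

# A toy sensor reading posted to a server-side API.
reading = {"sensor": 7, "value": 21.5, "ts": 1700000000}

# Text body, as a JSON/XML-RPC-style API would carry it.
text_body = json.dumps(reading).encode()

# Fixed binary layout: u16 id, f32 value, i32 timestamp, no padding.
binary_body = struct.pack("<Hfi", 7, 21.5, 1700000000)

ratio = len(text_body) / len(binary_body)
print(len(text_body), len(binary_body), ratio)   # binary body is 10 bytes
```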

    Energy measurement hardware is available, as are Android phones for testing and equipment for building a controlled WiFi testbed.

    Interested students will need to demonstrate good ability in programming and application development, along with a good understanding of TCP, UDP, IP, WiFi and cellular networking.

    Contact: Andrew Rice

  2. Reality-based benchmarks

    There are a variety of benchmarking tools available for Android which can produce a performance score for a particular handset. However, these tests do not really reflect the needs of actual phone users. The idea of this project is to use the Device Analyzer dataset to come up with better benchmarks.

    The project will need to survey the various properties that current benchmarks are attempting to measure. Data analysis from Device Analyzer can then be used to work out how many users would be interested in these measurements and to see if there are more important properties to measure. We have a variety of phone handsets which can then be used for testing to see how they perform with the new designs.

    Interested students will need to demonstrate good programming ability and have an interest in systems measurement. It is expected that data analysis will require some sort of distributed processing system such as Hadoop running on the DTG cluster.

    Contact: Andrew Rice