Department of Computer Science and Technology

Systems Research Group – NetOS

Student Projects (2017–2018)

NetOS

This page collects together various Part II project suggestions from the Network and Operating Systems part of the Systems Research Group. In all cases there is a contact e-mail address given; please get in touch if you want more information about the project.

Under construction: Please keep checking back, as more ideas will hopefully be added to this page during the coming weeks.

Note: there are a number of stumbling blocks in Part II project selection, proposal generation and execution. Some useful guidance from a CST alumnus (and current post-doc) is here.

Current project suggestions

1. TripIt! for Papers

Contact: Richard Mortier (email)

Recently introduced rules from the UK Research Councils require that all published academic outputs are made available as open access within 3 months of publication, typically via institutional repositories. Unfortunately, most institutional repository submission processes are rather cumbersome, involve extensive human interaction, and are prone to being forgotten or delayed, potentially rendering an academic output inadmissible for the next Research Excellence Framework (REF) exercise.

TripIt is a rather useful service that allows a traveller to manage travel plans by forwarding email confirmations, tickets, receipts, etc. to an email address. Upon receipt, the TripIt service aggregates and parses the information concerned into a sequence of "trips", which can then be exported as (e.g.) a Google Calendar. This makes it quite straightforward to have trip details appear in one's calendar without needing to go through the tedious and error-prone process of re-entering all details manually.

This project will design and build a service that provides TripIt-like interaction for tracking academic outputs. Elaborating the details of the workflow forms part of the project, with particular care paid to the requirements for REF eligibility as well as common academic working practices for conference and journal publication. The service can be implemented using any appropriate tools, though implementation as a MirageOS unikernel using the Irmin storage backend would be particularly welcome.


2. MirageOS Protocol Servers

Contact: Richard Mortier

MirageOS is a unikernel framework using the OCaml language. Among its primary targets is the building of small, lightweight, cloud-hosted network services. This project will design and build such a service, evaluating its performance on a number of axes. Suitable services include implementations of the XMPP messaging standard and the BGP v4 routing protocol.

Code for MirageOS must be written in OCaml, so familiarity with OCaml would be useful. (But projects of this type have been successfully undertaken without prior knowledge of OCaml.)


3. OCaml meets WebKit

Contact: Richard Mortier (email)

WebKit2 is an incompatible API layer for WebKit, the web content processing library from Apple, supporting OS X, iOS and Linux. It implements a split process model where web content lives in a separate process from the application UI. PhantomJS is a headless wrapper for WebKit, scriptable via JavaScript, that is used for screen capture, page automation, site testing, etc., and supports CasperJS as a higher-level API wrapper.

Web automation scripts in JavaScript for CasperJS can rapidly become rather complex; it would be nice to have a more modern, feature-rich language to do this, such as OCaml. One way forward would be to use the Ctypes library, which enables binding C libraries using pure OCaml.

This project is complex, and you will benefit from having experience with at least one of OCaml or WebKit, as well as familiarity with C programming.


4. Niche Social Networks

Contact: Richard Mortier (email)

Social networks such as Facebook support a wide range of interactions and purposes. However, there are times when it is not appropriate to push smaller-scale, more niche social groups onto such generic platforms, even though it would still be nice to take advantage of some of their features.

This project will build a simple social-network-as-a-library that can provide the standard features of a social network (pseudo-identity, tracking followed/following relations) while interfacing with a range of other services such as email. A particular demonstrator application will also be built that allows members of a group (e.g., a sports society, a College fellowship) to express interest in a subset of other members' behaviours, and be notified when those members perform a certain action (e.g., sign up to attend a regular social event). A number of different notification channels could be integrated (e.g., email, SMS, telephone).
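As a rough illustration of the library's core data structure, the sketch below tracks following relations and notifies interested members when a followed member performs an action. It is a minimal sketch only: the `SocialGraph` class, its method names, and the idea of representing a notification channel as a plain callable are all assumptions made for this example, and it is written in Python rather than as a MirageOS service.

```python
from collections import defaultdict


class SocialGraph:
    """Hypothetical in-memory follow graph with per-action notifications."""

    def __init__(self):
        self.following = defaultdict(set)  # follower -> members they follow
        self.interests = defaultdict(set)  # (member, action) -> interested followers
        self.channels = {}                 # member -> callable taking a message

    def register(self, member, channel):
        # A channel stands in for email, SMS, etc.; here it is any callable.
        self.channels[member] = channel

    def follow(self, follower, followed, actions):
        # Record the relation and the subset of actions the follower cares about.
        self.following[follower].add(followed)
        for action in actions:
            self.interests[(followed, action)].add(follower)

    def followers_of(self, member):
        return {f for f, followed in self.following.items() if member in followed}

    def perform(self, member, action, detail=""):
        # Notify every follower who expressed interest in this action.
        notified = []
        for follower in sorted(self.interests.get((member, action), ())):
            channel = self.channels.get(follower)
            if channel is not None:
                channel(f"{member} {action}: {detail}")
                notified.append(follower)
        return notified


inbox = []
graph = SocialGraph()
graph.register("alice", inbox.append)
graph.follow("alice", "bob", actions={"signup"})
graph.perform("bob", "signup", "Friday social")
```

Keeping channels as interchangeable callables is one way to let email, SMS or telephone back ends be added later without touching the graph logic.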

Ideally this will be built as one or more microservices using the MirageOS unikernel framework, but other implementation platforms can be considered.


5. Databox

Contact: Richard Mortier (email)

Databox is an active research project developing means to enable individuals and organisations to mediate access to data they consider personal, or private. The platform supports access to local external resources such as IoT devices as well as processing of data locally, potentially in a coordinated fashion with processing of data on other Databoxes.

This proposal is really a placeholder for your idea: what personal data processing application are you interested in building? What device are you interested in interfacing to the Databox? Let's talk...

Databox allows code to be produced in any language, but familiarity with Docker and containerisation may be helpful.


6. Computation Engine for Distributed Data Analytics

Contact: Liang Wang (email)

Owl is an emerging numerical platform developed in the functional programming language OCaml. The aim is to build a flexible and full-featured numerical library for modern data analytical applications. With Owl, analytical applications can be written as concisely as in NumPy and Julia yet run as fast as C, with additional features such as static type-checking.

Owl has its own parallel and distributed computation engine which is able to transform both low-level data structures and high-level neural network models into distributed computation objects. This project will focus on further developing and evaluating Owl's current computation engine. If you are interested in building modern distributed data analytical frameworks, please contact me.

References:
[1] Owl's repository: https://github.com/ryanrhymes/owl
[2] Owl - An OCaml Numerical Library, OCaml Workshop, Sept 2017, Oxford (TBA)

Pre-requisites: This project requires some basic knowledge of OCaml, synchronisation control, distributed systems, and data processing frameworks.


7. Data Processing and Visualisation

Contact: Liang Wang (email)

The daily work of most data analysts and scientists involves data processing and visualisation. Efficient (pre-)processing algorithms and effective visualisation techniques together lay a solid foundation for all modern data analytical platforms.

This project uses Owl as its underlying numerical platform and focusses on developing practical algorithms to handle various data sets. The goal is to provide an efficient and elegant data abstraction layer to other components in Owl.

Another focus is to further develop the data visualisation component in Owl. The algorithms of interest range from the basic plots used in classic statistical analysis, such as qqplot, to state-of-the-art visualisation techniques, such as t-SNE, for visualising high-dimensional data. If you are interested in data processing and visualisation, please contact me.

Pre-requisites: This project requires some basic knowledge of OCaml and statistical analysis.


8. Semantic Search and Recommendation Engine Design

Contact: Liang Wang (email)

The Internet is overloading its users with excessive information flows, so effective content-based filtering has become crucial to improving user experience and work efficiency. Latent semantic analysis has long been demonstrated as a promising information retrieval technique for finding relevant articles in large text corpora.

We built Kvasir, a semantic recommendation system, on top of latent semantic analysis and other state-of-the-art technologies to seamlessly integrate an automated and proactive content provision service into web browsing. This project will let you build and try out different language models and learn to integrate them into a practical recommendation system.

Alternatively, if you are interested in frontend technologies, the Kvasir project also allows you to build different frontends atop its core engine, for example browser plugins and service bots. If you are interested, please visit the project website to find out more.

References:
[1] Project website: Kvasir - Semantic Search
[2] Wang, Liang, et al. "Kvasir: Scalable provision of semantically relevant web content on big data framework." IEEE Transactions on Big Data 2.3 (2016): 219-233.
[3] Hyvönen, Ville, et al. "Fast nearest neighbor search through sparse random projections and voting." Big Data (Big Data), 2016 IEEE International Conference on. IEEE, 2016.

Pre-requisites: The project requires some basic knowledge of information retrieval, natural language processing, and random projection.


9. Parallel Algorithmic Differentiation (on GPU)

Contact: Liang Wang (email)

This project focusses on algorithmic differentiation (also known as automatic differentiation). It generalises the well-known backpropagation algorithm and is an indispensable component of all state-of-the-art data analytical platforms.

The neural network module in Owl is built entirely atop algorithmic differentiation. In this project, we would like you to investigate how to parallelise the computation for a given computation graph in order to optimise the training efficiency for deep neural networks.
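To make the idea of differentiating over a computation graph concrete, here is a minimal reverse-mode sketch. It is written in Python purely for illustration and does not reflect Owl's actual API or implementation; the `Node` class and the `sin` and `backward` helpers are hypothetical names introduced for this example.

```python
import math


class Node:
    """One vertex of a computation graph: a value plus links to its inputs."""

    def __init__(self, value, parents=()):
        self.value = value
        # Each entry is (parent_node, local derivative of this node w.r.t. parent).
        self.parents = parents
        self.grad = 0.0

    def __add__(self, other):
        return Node(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Node(self.value * other.value,
                    ((self, other.value), (other, self.value)))


def sin(x):
    return Node(math.sin(x.value), ((x, math.cos(x.value)),))


def backward(output):
    """Reverse sweep: accumulate d(output)/d(node) from the output to the inputs."""
    order, seen = [], set()

    def visit(node):
        # Post-order traversal places every node after all of its inputs.
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(output)
    output.grad = 1.0
    # Walking the order backwards handles each node after all of its consumers.
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += local * node.grad


x, y = Node(2.0), Node(3.0)
z = x * y + sin(x)
backward(z)
# Analytically, dz/dx = y + cos(x) and dz/dy = x.
```

In this graph the product `x * y` and the call `sin(x)` do not depend on each other, so they could in principle be evaluated concurrently; finding and scheduling such independent subgraphs is the kind of parallelism the project would investigate.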

Alternatively, you can choose to work on optimising Owl's underlying numerical functions to speed up the calculation. In this direction, we would like you to optimise the vectorised mathematical functions using specific instruction sets on various architectures, e.g., Streaming SIMD Extensions (SSE) on Intel. You can also choose to work on writing GPU kernels for various functions to enhance the overall performance of training and inference.

References:
[1] Owl's repository: https://github.com/ryanrhymes/owl
[2] Owl - An OCaml Numerical Library, OCaml Workshop, Sept 2017, Oxford (TBA)

Pre-requisites: The project requires some basic knowledge of machine learning, neural networks, and OCaml.


10. Power Analysis of Rack-Scale Systems

Contact: Andrew Moore, Noa Zilberman (email)

Rack-scale computing is an emerging technology in networked systems, replacing the server as the basic building block in enterprise and data centre ICT infrastructure. Rack-scale computing provides superior performance compared with rack enclosures fitted with stand-alone servers, providing scalable high-performance networked systems. Power efficiency is of utmost importance to rack-scale computing: the power budget of the system is bounded by the rack enclosure (typically 10kW-20kW), so any increase in performance must be achieved within the same system power consumption. We are building an apparatus for the evaluation of rack-scale system implementations at scale. This project will focus on the instrumentation of the system for power consumption measurement and analysis, mainly through the instrumentation existing on the NetFPGA SUME platform.

References:
[1] NetFPGA
[2] Noa Zilberman, Yury Audzevich, Adam Covington, Andrew W. Moore. NetFPGA SUME: Toward Research Commodity 100Gb/s, IEEE Micro, vol. 34, no. 5, pp. 32-41, Sept.-Oct. 2014
[3] Rack-scale Computing (Dagstuhl Seminar 15421)

Pre-requisites: This project requires basic knowledge of Verilog.


11. Rapid Prototyping of Network Services in C#

Contact: Andrew Moore, Noa Zilberman (email)

Due to their performance and flexibility, FPGAs are an attractive platform for the execution of network functions. Making FPGA programming accessible to a large audience of developers, however, has been a challenge for a long time. The Emu framework describes a new standard library for an FPGA hardware compiler that enables developers to rapidly create and deploy network functionality. Emu allows for high-performance designs without being bound to particular packet processing paradigms. Furthermore, it supports running the same programs on CPUs, in Mininet, and on FPGAs.
The goal of this project will be to build a large library of functions for Emu and to provide an easy-to-use software evaluation environment for users.

References:
[1] Nik Sultana, Salvator Galea, David Greaves, Marcin Wojcik, Jonny Shipton, Richard Clegg, Luo Mai, Pietro Bressana, Robert Soule, Richard Mortier, Paolo Costa, Peter Pietzuch, Jon Crowcroft, Andrew W. Moore, Noa Zilberman. Emu: Rapid Prototyping of Networking Services, USENIX Annual Technical Conference (ATC), July 2017.

Pre-requisites: This project requires basic knowledge of computer networks and C#.


12. Visualising the Network Profiles of Datacentre Applications

Contact: Andrew Moore, Noa Zilberman (email)

When running an application in a datacentre, one goal matters most: performance. While improving the performance of datacentre applications is the focus of many works, understanding of dynamic networking effects on application performance is still in its infancy. Network Profiles is a methodology for characterising an application's performance from a network perspective, using a network appliance developed by our team.
The goal of this project is to provide a control and management user interface to the orchestration system running the applications and network appliance, and to visualise the network profiles generated by the appliance.

Pre-requisites: This project requires basic knowledge of computer networks.


13. The Networking of Data Science

Contact: Andrew Moore, Noa Zilberman (email)

Data science has become part of our everyday life, even if we are not aware of it. Networked applications that process huge amounts of data run in the cloud and affect not only social networks and online shopping, but also finance, security and science. Due to the nature of data science applications, they usually run within data centres, and knowledge of these applications' behaviour is limited to data centre operators. As data centre operators keep their data confidential, very little information has been published (e.g. [1],[2]). In the absence of such ground truth, academic research is limited in its ability to develop novel system and networking solutions that fit the age of data science. This project aims to create network profiles for different data science applications, from common key-value stores to academic big-data projects (e.g. SKA, the Square Kilometre Array), gathering ground truth data from applications running in a local data centre. The outputs of this project will be used to model and assess new data centre architectures, directing future designs.

References:
[1] Theophilus Benson, Aditya Akella and David Maltz. Network Traffic Characteristics of Data Centers in the Wild. Proceedings of the Internet Measurement Conference (IMC), 2010.
[2] Berk Atikoglu, Yuehai Xu, Eitan Frachtenberg, Song Jiang, and Mike Paleczny. Workload analysis of a large-scale key-value store. SIGMETRICS 2012.


Other proposers of systems-related projects:

14. Rusty Chain: mobile distributed ledger

Contact: Jon Crowcroft (email) (Marco Caballero Gutierrez, supporting PhD)

The idea for the project is to develop, in Rust [1], a proof-of-stake-based distributed ledger [2] framework targeting power-constrained, ARM-based [4] mobile devices. Interesting things to measure would be:


1 System-load footprint (how expensive is the app to run?).
2 Power consumption (does it kill the battery life?).
3 Throughput (how many transactions per minute can be processed?). This is actually a well-measured metric on Bitcoin, Ethereum, etc., so we have something to compare against.

These 3 questions would be sufficient for a Part III/MPhil project, I believe. However, further interesting things to explore would be:
If the distributed ledger is run in an ad-hoc network [3], the ledger would be partitioned very often due to the asynchrony of the transport and nodes entering and leaving. Given this *very partitioned* ledger, what effect does the partitioning have on system load, power consumption and throughput?
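The throughput measurement listed above could be prototyped along the following lines. This is only a sketch under strong assumptions: it is single-node Python rather than Rust, it omits proof-of-stake consensus and networking entirely, and the `Ledger` class, `block_hash` and `measure_throughput` are hypothetical names invented for the example. It times how quickly transactions can be appended to a simple hash-chained ledger.

```python
import hashlib
import json
import time

GENESIS = "0" * 64  # placeholder previous-hash for the first block


def block_hash(index, prev_hash, transactions):
    # Hash a canonical JSON encoding of the block contents.
    payload = json.dumps([index, prev_hash, transactions])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class Ledger:
    """Hypothetical append-only chain of blocks linked by SHA-256 hashes."""

    def __init__(self):
        self.blocks = []

    def append(self, transactions):
        txs = list(transactions)
        prev_hash = self.blocks[-1]["hash"] if self.blocks else GENESIS
        index = len(self.blocks)
        self.blocks.append({"index": index, "prev": prev_hash, "txs": txs,
                            "hash": block_hash(index, prev_hash, txs)})

    def is_valid(self):
        # Recompute every hash and check each block links to its predecessor.
        prev_hash = GENESIS
        for block in self.blocks:
            expected = block_hash(block["index"], block["prev"], block["txs"])
            if block["prev"] != prev_hash or block["hash"] != expected:
                return False
            prev_hash = block["hash"]
        return True


def measure_throughput(ledger, n_blocks=200, txs_per_block=50):
    """Return transactions per minute for appending synthetic transactions."""
    start = time.perf_counter()
    for b in range(n_blocks):
        ledger.append([f"tx-{b}-{i}" for i in range(txs_per_block)])
    elapsed = max(time.perf_counter() - start, 1e-9)
    return n_blocks * txs_per_block / elapsed * 60.0


ledger = Ledger()
rate = measure_throughput(ledger)
print(f"{rate:.0f} transactions per minute")
```

A real evaluation would also need to record system load and power draw over the same interval, and to repeat the measurement under network partitions.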

A quick Google search shows many results about mobile applications that use distributed ledgers, but only as clients, not running the distributed ledger protocol itself. Relevant prior work is actually a bit hard to dig out, since such a search returns millions of recent results.

References:
[1] Rust
[2] Blockchain reference material
[3] Paper on ad hoc nets & tokens
[4] Pi