Computer Laboratory

Student Projects (2010)

NetOS

This page collects together various Part II project suggestions from the Network and Operating Systems part of the Systems Research Group. In all cases there is a contact e-mail address given; please get in touch if you want more information about the project.

Under construction: please keep checking back, as more ideas will hopefully be added to this page during the coming weeks.

Note: there are a number of stumbling blocks in Part II project selection, proposal generation and execution. Some useful guidance from a CST alumnus (and current NetOS PhD student) is here.

  • Visualising Temporal Graphs
    Contact: John Tang

    Graph theory learnt from the textbook is restricted to static, 2D representation of nodes and edges, however with real traces of human contacts and massive OSN's such as facebook and twitter, it is clear that graphs change over time. However, adding another dimension to this analysis clearly adds complexity in visualising and understanding such rich information.

    This project will develop novel visualisations to convey and highlight important features of temporal graphs to aid researchers in their analysis of such data. The tool will be evaluated on real, human mobility traces and massive online social networks i.e. facebook, twitter etc. with millions of nodes with timestamped edges. The ideal candidate will be comfortable with graph theory, 2D and 3D programming and optimising code in the programming language of their choice. Familiarity with complex network analysis or social network analysis would be helpful but not strictly necessary. This project will suit students interested in a career in online social networks and related businesses.

    KEYWORDS: online social networks, temporal graph theory, visualisation, infographics

  • Office Occupancy Detection using Multimodal Sensing
    Contact: Christos Efstratiou

    In the context of the FRESNEL project a number of sensors will be deployed around the computer laboratory. The deployment will consist of sensors with varying sensing capabilities, including temperature, humidity, ambient light, Bluetooth scanning, audio sensors and cameras.

    The purpose of this project is to build an application that will utilise the varying sensing capabilities of the network to determine the occupancy of office spaces. The general idea is to collect a range of sensor readings and to correlate these readings with the ground truth in order to identify an algorithm for detecting office occupancy. Data fusion techniques may need to be employed (i.e. comparing the ambient temperature in a room with the reading of a particular sensor to identify if an interesting event is recorded by that sensor).

  • Social Backup [taken]
    Contact: Malte Schwarzkopf

    Many people would like to have secure off-site backups of their personal data. Recently, so-called "cloud storage" services (such as e.g. Amazon S3) have started providing infrastructure for this. However, there is are many inherent issues to do with trust and control: users may not trust a cloud provider with their personal data, or they might feel uncomfortable about it being stored in a country with a jurisdiction that might not protect their privacy. Furthermore, cloud storage is still prohibitively expensive.
    A better concept might be for a group of friends to arrange a mututally beneficial backup system whereby everyone offers some storage space to the others (this could be on servers, desktop machines, laptops or mobile devices). The challenge is to do this in a secure way that maintains privacy and offers compelling performance.
    Different pairings of people might have different degrees of trust and this could be reflected in the strenght of pairwise encryption (though one would have to consider the case of a friend accidentally giving parties with weaker trust getting accessing the data!). Alternatively, data could be distributed across multiple different friends' devices, and only the combination would be sufficient to reconstruct the old data.
    This project has lots of scope for the design and implementation of clever synchronization algorithms and cryptographical constructs.
    Difficulty: ★★ to ★★★

  • Crowd Computing on mobile phones
    Contact: Malte Schwarzkopf

    Some people (including myself) in the SRG are currently working on Skywriting, a Turing-powerful cluster computing framework for cloud computing. Skywriting is written in Python and has been ported to the Android operating system, allowing us to run mobile phones as Skywriting workers.
    This project would extend Skywriting to enable delay-tolerant computation, i.e. computation that is scheduled on a mobile phone and which returns a result at some (potentially undefined, potentially weakly bounded) point the future. For example, the phone could compute only while it is being charged and thus connected to mains power. Furthermore, the bandwidth on 3G networks is limited, so the phone could wait until it has a fast network connection before transferring input and output data (an ambitious student could even look into ad-hoc networks between multiple Skywriting-enabled phones to epidemically spread input and result data).
    A student attempting this project should be familiar with Python and not afraid of working with research code that is in active development. The evaluation of this project would probably entail some simulations and a small practical evaluation with some phones in the lab.
    Difficulty: ★★

  • Encrypted Facebook
    Contact: Malte Schwarzkopf

    Many people have recently become increasingly worried about Facebook taking a rather relaxed attitude towards the privacy of personal data. However, attempts at building more secure social networks with technical solutions that ensure data privacy, such as encryption, have not enjoyed much success because Facebook captialises on the "network effect" of everyone else using it.
    It would be very useful if we could still use Facebook, but encrypt all the data stored there, enabling only friends who also use the same browser extension to see the cleartext (or the unscrambled photos, etc.). There are various directions one could take with this project: you could make it so that only messages exchanged with friends who also use the extension are encrypted, or so that everything is encrypted. To implement the plugin, you could use a toolkit such as Greasemonkey, or implement it entirely yourself.
    Difficulty: ★ to ★★

  • Implementation of Homomorphic Encryption [taken]
    Contact: Malte Schwarzkopf

    Homomorphic encryption is the idea of performing arbitrary computations on an encrypted piece of data ("ciphertext") without decrypting it first. In other words, the operations manipulate the ciphertext to generate another, different, ciphertext, which when decrypted produces a plaintext that is equivalent to the original plaintext with the operations applied. For example, operation "add 37" on plaintext "5" gives "42". Under homomorphic encryption, we apply "add 37" to E(5) (where E is an encryption function) and get back E(42), which when decrypted gives us "42" (the correct answer). The utility of this is that using homomorphic encryption, secure computation could be performed in a potentially hostile environment (e.g. untrusted servers on the "cloud").
    An important observation is that a fully homomorphic scheme only needs to support two operations: addition and multiplication. These are sufficient to bootstrap any computation that we can imagine (even though it would be very, very, very slow!).
    Homomorphic encryption has long (since the 1980s) been considered to be an unsolved "holy grail" of cryptography. However, recently, several working schemes have been published (the last is easiest to understand and implement as it does not require understanding of lattices). Most of the work in this space has been in theory though, and there is implementation of any of these systems available.
    Your task in this project would be to implement one of the published schemes and show that it works with small examples (due to the exponential increase in computing power required, you are likely to only be able to try it with small inputs and keys, but that is enough for a proof-of-concept).
    As a possible extension, you could investigate how much of the process can possibly be parallelised (a question that has received very little attention so far) to speed it up. Alternatively, you could look into data structures and algorithms that could be implemented using a weaker form of homomorphic encryption (e.g. partial schemes) that can be implemented at better performance.

    N.B.: This is a very challenging project, full of formal maths and theory despite being an implementation project, and is rooted in cutting-edge research (the publications in this area are not even a year old). If you would like to attack this, it will require lots of dedication and hard work! (However, if you manage, some glory will be yours! :-)).
    Difficulty: ★★★★

    N.B. (ii): A variation of this project could also be done as an MPhil project – in this case, the "extension" would form a mandatory part of the project.

  • Transparent NIC Failure Support
    Contact: Ripduman Sohan

    Recent studies have shown a large number of OS failures are due to faulty hardware drivers and/or transient errors in the interaction between the OS and devices. In this project, you will provide functionality so that the OS (and applications) is able to recover from transient and permanent errors in network devices, implemented as either kernel support to transfer network state between physical NICs or a user-level library that intercepts and reroutes network system calls between different physical NICs (your choice).

    To tackle this project you need to have strong practical skills, and some working knowledge of Linux. Difficulty: Medium

  • Pseudo-File Filesystems
    Contact: Ripduman Sohan

    Files are the basic unit for naming and controlling access to data in modern OSes. Files may be distinct for various reasons (security, ease of management, etc). While file-level naming and access control works most of the times, it is sometimes useful to be able to construct a "pseudo-file" that is composed of parts of other files (e.g. consider the case where I want to expose only the dates and message identifiers of received and sent emails in my private INBOX so that my boss can verify that I am replying to emails in a timely fashion). In this project you will construct a system (and associated tools) to allow users to create pseudo-files in the filesystem.

    To tackle this project you need to have strong practical skills, and some working knowledge of Linux. Difficulty: Medium

  • Crowd-sourced Traffic Routing For Developing Countries
    Contact: Ripduman Sohan

    Oftentimes, developing countries have quite bad traffic jams at rush hours due to underdeveloped infrastructure being overutilized coupled with inadequate traffic reporting and control mechanisms. In this project you will develop a smartphone (Android) based application that will utilize the sensors in the phone (GPS and accelerometer) to automatically provide traffic information and rerouting information to users. Ideally, the application will determine and its location and speed automatically, reporting this to a central server. Other instances of the application running on different devices can then use this information to suggest alternative routes to avoid traffic jams thus helping to ease traffic congestion in a distributed manner. Difficulty: Medium

  • Realtime Scam/Contact Alerting
    Contact: Ripduman Sohan

    People often get unsolicited emails, SMSes or IMs. Some of these are legitimate and some are scams. When this happens to me I hit the web to find out exactly what is going on before taking further action (e.g. Did I really win 3750 pounds in that accident I had?, Is it important for me to find out where this 0800 number originated? etc). On the web it quickly becomes obvious what the true situation is (e.g. Ah,a spammer just waiting for me to reply to his premium rate number). In this project, you will attempt to build an application that quickly and transparently tries to verify the origin and/or provide information about unsolicited contacts by using the web (and other useful data sources) to better inform the user about who is contacting him or her (and what it's likely to be about). Difficulty: Medium

  • Reintroducing forgetting in Twitter
    Contact: Daniele Quercia

    This project is about creating technological solutions that make forgetting a tiny bit easier than remembering in the digital world. The idea is to build a Twitter client based on human-controlled forgetting (which have been recently proposed for emails). You will build a Twitter client that allows users to post tweets along with corresponding expiration dates (technically, a tweet will be tagged with an expiration date using the Twitter API). The client automatically deletes tweets that have reached or exceeded their expiry dates. Users should also be able to change expiration dates in case they feel that a tweet has lost its informational value over time or, on the contrary, remains important beyond the initially set lifespan. The effectiveness of this approach should be evaluated with a small user study.
    * “Best If Used By”: Expiration Dates for Email. CHI '09 Difficulty: Medium

  • Sentiment Analysis for Multilingual Tweets
    Contact: Daniele Quercia

    Approaches for automatically classifying the sentiment (either positive or negative) of Twitter messages have been recently introduced. They are used by online shoppers to check the sentiment associated with products, or by companies to monitor the popularity of their brands. One limitation of those approaches is that they work for Twitter messages in English. This project considers the three most promising existing approaches (Naive Bayes, Maximum Entropy, and SVM) and evaluates them for Twitter messages in any continental European language. The three approaches will be made available through an API (similar to http://twittersentiment.appspot.com/).
    Difficulty: Medium

  • Topic Detection for Twitter profiles
    Contact: Daniele Quercia

    In this project, you will implement standard topic models (e.g. LDA) and use them to map the content of a Twitter profile into topics/categories. You will then explore the extent to which the restricted length of Twitter messages affects these schemes by running a comparative study on a Twitter dataset.
    Difficulty: Medium

  • City Sampling 2.0
    Contact: Daniele Quercia

    The legibility of a city is “the ease with which [a city’s] parts can be recognised and can be organised into a coherent pattern” and is associated with the emotional and physical well-being of city dwellers. In 1972, Stanley Milgram designed a "scene sampling" procedure to measure the legibility of a city. This project proposes to build the first web-based tool that automates Milgram's "scene sampling" procedure. Users of this tools will be presented with scenes of their city (retrieved from geo-tagged pictures on Flickr) and will asked whether they could recognise each scene. Based on their answers, legibility measures of different parts of the city will be computed and will be shown on a geographical map.
    Difficulty: Medium

  • Online Betting Site related proposals
    Contact: Jon Crowcroft

    See: Betfair Suggestions for more details!
    Difficulty: A range...

  • Energy in Data Centers
    Contact: Jon Crowcroft

    See: Citrix Suggestions for more details!
    Difficulty: A range...

  • Zeus Alert!
    Contact: Steven Hand or Contact: Jon Crowcroft

    See: Zeus Suggestions for more details!
    Difficulty: A range...

  • There are several projects to work under the umbrella of ErdOS. There are opportunities both for MPhil and Part II students depending on the research required by them. There are other possibilities in addition to the ones described below. Please, do not hesitate to contact us for further details (Jon Crowcroft and Narseo Vallina-Rodriguez).

    Energy Profiler
    Contact: Jon Crowcroft and Narseo Vallina-Rodriguez

    The resources available in a mobile handset have different power states and energy demands that might depend on the handset context (e.g. location impacts on the energy consumption of cellular nets). We would like to know how much of the energy consumption accounts for each individual resource and applications. We want to develop an energy profiler that uses exclusively information available in Android handsets (e.g. Kernel and Android API), however, the framework obtained could be also used for other embedded devices such as sensors. We have indications about the right variables that we should look at but more research is required. Once the model has been built, it is necessary to validate the results with the ones obtained with a high-resolution external measurement tool. As a result, we could obtain an energy profiler that estimates the relative energy consumption of each application individually, customised for each handset and without using inaccurate offline information and external tools. That project might require some research and working at the Kernel level for Android OS. Good knowledge of Java and C++ is recommendable, Previous experience on compiling kernels might be useful.
    Keywords: Mobile computing, resources management, energy allocation, energy profiling
    Difficulty: Medium - High

  • Activities Profiler
    Contact: Jon Crowcroft and Narseo Vallina-Rodriguez

    ErdOS profiles energy at an high-level abstraction called users activities (e.g. working, going to the pub with friends, ...). The goal behind this design is being able to predict future energy demands at those activities by monitoring what kind of applications are executed by the users at those scenarios and also the way users interact with their handsets (when and where they charge the handset, when they set the handset in airplane mode, etc). It will help ErdOS to efficiently allocate resources both in local and remote handsets proactively. It requires understanding how applications demand OS resources (intercepting system calls) and mapping them to the right activity inferred from contextual information (e.g. location and time). That project can be split in two sides, the implementation and the research one (context-awareness). It might require working at the Kernel level for Android OS. Good knowledge of Java and C++ is recommendable. Previous experience on compiling kernels might be useful.
    Keywords: Mobile computing, resources management, context-awareness
    Difficulty: Medium

  • Resources discovery protocols
    Contact: Jon Crowcroft and Narseo Vallina-Rodriguez

    ErdOS devices must be able to discover neighbouring resources quickly and in an energy efficient way. However, current wireless interfaces such as Bluetooth have a time consuming protocol that can require up to 60 seconds to perform an enquiry. A potential way of performing fast discovery is by adding information of the available resources and the resources demands of the handset on the radio beacons of wireless interfaces (e.g. Bluetooth and WiFi) and also by audio beacons taking advantage of the low-energy required by the microphone and the speakers available in the handset (see http://anil.recoil.org/papers/audio-networking.pdf). An implementation of those mechanisms accompanied by an evaluation of the energy - Hide quoted text - consumption required by them and the execution time can be easily done. An extension can be done as identifying the triggers to perform those advertisements efficiently. It might require Linux knowledge and C/C++ programming skills to modify the drivers. Previous experience on compiling kernels might be useful.
    Keywords: Mobile computing, resources management, discovery protocols, energy consumption, pervasive computing
    Difficulty: Medium

  • Name Manager and Access Control
    Contact: Jon Crowcroft and Narseo Vallina-Rodriguez

    Naming resources and users in ErdOS is another potential line of work. An implementation of ErdOS name manager following Haggle-enfant name graphs structure. It should comprise the handset available resources in the local handset, potential user names and social contacts from the users. Efficient algorithms for accessing that information are also required. An extension can be implementing an interface to define the access policies to the resources based on the different users' groups and affiliations (i.e. taking advantage of UNIX permissions for user, groups and others). It can be implemented on Java at the user space.
    Keywords: Naming systems, Access Control, OS
    Difficulty: Low