home search a-z help
University of Cambridge Computer Laboratory
Student Projects (2009)
Computer Laboratory > Research > Systems Research Group > NetOS > Student Projects > Student Projects (2009)


This page collects together various Part II project suggestions from the Network and Operating Systems part of the Systems Research Group. In all cases there is a contact e-mail address given; please get in touch if you want more information about the project.

Under construction: please keep checking back, as more ideas will hopefully be added to this page during September.

Note: there are a number of stumbling blocks in Part II project selection, proposal generation and execution. Some useful guidance from a CST alumnus (and current NetOS PhD student) is here.

  • Some Android Phone distributed applications n.b. there were some very succesful summer projects in the DTG which show that java development of sophisticated applications for these devices is pretty painless. We are interested in the possibility of kernel development projects but until we know if this can be done reliably, its not yet on the cards (an interesting kernel add on would be to make TCP work over multiple paths simulataneously so one could use both WiFi and HSPD/3G simultaneously...there's working code out there for this on a MAEMO/Linux stack which ought to port...).
  • P2P User Contributed Live Video

    Everyone has heard of youtube, but it takes a lot of resources to upload something to it. Most smart phones have a decent camera (e.g. android HTC G2 phones have 5Mpel video capabile). What if we could stream live video, in an ad hoc mobile net (wifi, bluetooth) rather than via the cellular net - people could subscribe by tags or by human source (e.g. friend on facebook or latitude).

    Essentially, this would be an application-layer multicast overlay, that would use intermediate phones (one in contact via whatever wireless local hop was avaialbe and that are part of the ssytem) to cache copies until the next "hop" towards a subscriber is availabile. Subscriptions would percolate through the net (indeed, subscriptions might be distributed using the cellular net since they will be small items) whereas video chunks or files will percolate "towards" subscribers, from "providers" in a dynamically changing mesh. The idea of doing it this way is to avoid the large cost of uploading (Indeed, the 3G nets might not be capable of dealing with the load), whereas much of the interest may in fact be relatively local (think sports arena, rock festival, traffic jam etc).

    Could be useful for giving live view of traffic ahead on a road or of unfamiliar scene, or for giving multiple crowd views of a sports event.

    The Haggle Project has some nice infrastructure software for this sort of application, including coping with disconnections (cache the content on a device for later pick up, or send it into the cloud).

    Contact: Jon Crowcroft

    Keywords: Mobile, Distributed, Media, Java, Smart Phone.

  • Home Tagging/Barcoding and Energy Use monitoring of stuff in your house - Neverlost

    You know when you lose something in your house how either backtracking through your day, or else asking someone else is the way to find it.

    So how about we tag things in the house using a camera on our smart phone. We also (see the MSR SenseCam project) log scene changes as we walk around. We can then find lost objects or replay our day til we trigger a memory of where they were. WHen you buy a new gadget, you take a pic, and classify it n the database and tag it. As you walk around, the phone takes snaps (e.g. whenver there's a big enough scene change triggers by enough image diff, or by acceleromter, or even by "arriving" or "departing" according to geo-loc (e.g. Google map/GPS or wifi detection of your Home Access Point).

    A nice extension is to take home appliances (TV, washing machine, PVR, dishwasher) that have LEDs that show what they are doing, and take a few snaps that classify them as "on", "off", "sleeping", etc. Then the people walking around the house taking snap shots, provide a log of devise usage during the day. THis could be later compared with the overall bill for the house (the meter might be readable with a camera phone app, or else meter readings might just be web accessible) and used to determine what devices in the house cost most and see if its worth changing peoples' usage patterns.

    We could do this collaboratively (with our facebook friends) for wider searches. THe results of pictures of gadgets (remote controls, keys, etc) would be put in a shared "tuple-space" that one can then search - this ould use a DHT on the phones, or could just be on a server (home server or in the "cloud").

    There was a project at Intel a while back that used smart camera phones and 2D barcodes, but this is a bit different - we use simple image mapping.

    Of course, if you lose your own phone, you are totally lost:) (Of course, if this design is any good, it will let you find your phone by asking friends' help!)

    Contact: Jon Crowcroft

    Keywords: Mobile, Distributed, Finding Lost Stuff with a Smart Phone.

  • Car Pooling extension to miniBus

    The miniBus application developed by David Tatersall this summer, allows users of public transport to see live arrival info.

    Essentially, the cloud (a server in the infrastructure) looks after aggregating information from multiple public transport providers, and then users can subscribe to this information, from their smart phone, via 3G or WiFI, and get local (e.g. if they have google map location service with GPS or via triangulating their position from cell ower or WiFi APs - all this is already done). They then get info appropriate (e.g. the CitiBus 1 is about to arrive at a stop just up the road) and can route plan.

    This project is an extension to allow private car drivers to offer their journey info, securely, as if they are a public transport operator into the same system. i.. Make the system symmetric, so users of a phone can also be (semi-)public transport operators.

    A key requirement is to understand how to integrate an acceptable privacy architecture into the system. This might be based aroudn extensions to Facebook groups and security mechanisms, or might be completely new. Clearly, location privacy is of much concern to many people, so this needs to be thought out carefully (a server should not hold suvh information in the plain, and key distribution and even revokation would need to be part of the system design architecture).

    An extension might see how to get several people in a google latitude group to meet at a point (e.g. a pub in cambridge) using suitable transport, shared or not! an extension might cope with politely leaving some people out.

    Contact: Jon Crowcroft

    Keywords: Mobile, Distributed, Ride Sharing, Security, Smart Phone.

  • Bio-feedback Juggling Tutor for Smart Phone w/ Acceleromerter

    A ot of new phones have prtty reactive accelerometers - one could imagine, then, juggling several of these phones (above a cusion, presumably to minimise cost when you drop them!). A suitably clever programme could tell when the optimal time to throw each phone in the air would be - it could use one of the well-known juggling pattern description languages (for which there are parsers and even simulators).

    Contact: Jon Crowcroft

    Keywords: Accelerometers, Geometry, Audio/learning, Smart Phone.

  • Everyone has heard of youtube, but it takes a lot of resources to

  • Goose. Distributed Social Network Services

    80% of the world population didn't have Internet access yet in 2008. This ratio is even higher if we consider exclusively developing regions. However, the mobile penetration in these areas is quite high and the people still rely on traditional social mechanisms to communicate. All these facts point out that using mobile phones and the human mobility as the network infrastructure can be the way for deploying social network services and connectivity in developing areas. However, developing areas with limited network are not the only scenarios where that kind of service can be used.

    Goose is a distributed Social Network Service (SNS) framework initially designed for providing SNSs for developing countries and also for social experimentation. Its main features are its modularity and scalability in order to support new services that can be adapted to the needs of the local communities. In fact, it allows the users to have full control of their information and it does not require any network infrastructure since the messages can be sent both using local connectivities on a store and forward fashion or through the existing infrastructure (e.g. GSM network).

    The first prototype was built on J2ME in order to target as many mobile phones as possible. However, the success of Android handsets makes interesting to explore the possibilities that this platform can offer us to deploy that kind of distributed systems (from GUI to connectivity features and security).

    We are looking for a student interested in porting and optimising the system for the Android platform and also willing to explore the possibility of including new features to the system (e.g. support to new applications, connectivities and a resource management optimisation). Resources management, specially memory, processing power and battery, are key when developing mobile and ubiquitous applications to do not affect the user experience.

    Contact: Narseo Vallina

    Keywords: mobile application, distributed system, social network services, Delay Tolerant Networks

  • Lost in Translation

    Ever find yourself in a foreign country eager (or clueless) to what is written on a shop sign or restaurant menu? With your trusty phone, its camera and some existing web-services that could all be a thing of the past.

    Image analysis skills are a bonus, but also an interest in making (the ever increasing) global travel experience easier, safer and possibly more fun.

    Contact: Kharsim Yousef

    Keywords: Mobile, Translation, Camera, Image, Analysis.

  • A rapid prototyping interpreter for native libraries

    Contact: Stephen Kell

    Dynamic languages like Python are commonly praised for shortening the programmer's edit--run--debug cycle. One useful feature is the read--eval--print loop, where snippets of code can be tried out “live” in an interactive fashion, simply by doing something like

    >>> import foo

    and then typing Python statements line-by-line which exercise the foo library. It would be nice if we could do this not only for Python libraries, but for regular shared libraries libfoo.so. This project would produce an implementation of a subset of Python, designed to satisfy this requirement.

    Thanks to modern dynamic linking and loading infrastructure, dynamically loading code in response to interpreter commands (like the above) is easily achieved (see dlopen(3)). However, as documented by a recent LtU post, interpreters of dynamic languages are not written to make use of this infrastructure, nor of the standard descriptive metadata (such as DWARF) output by compilers (and consumed by debuggers). Instead, they roll their own interpreter-specific data structures and metadata formats. The result is that there is an artificial “syntactic gap” between interpreted and compiled languages. The usual “solution” is a wrapper generator like Swig but this is hardly adequate: owing to the the syntactic gap, it still doesn't allow direct run-time sharing of data (i.e. objects in one world may not directly reference objects in the other), adds unnecessary runtime overhead (thanks to the proxying and/or copying which indirect access entails), necessitates boilerplate glue code whenever compiled and interpreted worlds communicate, and is very difficult to debug (since machine-level debuggers can't meaningfully inspect interpreter-level data structures, and vice-versa). By contrast, this project would avoid the problem by arranging that Python and “native” objects are essentially indistinguishable at run time. Supporting Python-style dynamism is a major challenge, but can be done. There are some interesting philosophical questions concerning raised by this project (such as the effect on language semantics, particularly run-time safety properties, when values in a running program no longer “belong” to one language or another) which would need to be covered in the dissertation. Note that supporting the Python C API would not be necessary at this stage.

    Warning: this is a research-oriented project! It's also at the difficult end of the spectrum. If you're interested, it's very important that you chat to me so that we can hammer out a proposal which makes a good project from the examiners' perspective (as well as being an interesting piece of work). It would also be nice for any prospective student to be familiar with Python (or be keen to learn) and to have a strong interest in both programming languages and practical system-building. Evaluation rests on our ability to evaluate an implementation of a known language, by one or more of completeness (how many language features are implemented?), conformance (of the features implemented, how accurately do they adhere to the specification?) and performance (what are the CPU and memory costs of running the implementation for various classes of program?). Essentially we can do all of these by running existing Python programs on the implementation.

  • A self-organising file system

    Contact: Stephen Kell

    If you're anything like me, your file system tends to build up collections of assorted downloads which ideally you'd like to sort through and file somewhere appropriate in a directory structure. You probably also have some files which you misfiled for various reasons. It'd be neat if a tool could learn the rules underlying our individual filing system, and then automatically file stuff as it comes in, and point out exceptions to the rules caused by misfiling.

    There are several potentially interesting aspects to this: learning rules based on a manually-classified training set; expressing rules in some sort of logic; implementing interesting predicates in that logic by analysing files in a semantically aware way. Although I picked the file system as the target domain (since it's what I find hardest to keep organised!), you could possibly do something similar with mail, bookmarks, and so on. However, the idea is to go beyond simple syntactic rules on available metadata: you should aim to support reasonably complex rules and sophisticated semantic discrimination, and this probably makes file systems the most suitable domain. For example, in my file system I make several distinctions that aren't captured simply by file name or type: distinctions among written articles based on what they're about, distinctions among music based on whether it's from a record I own (i.e. has a corresponding entry in my cddb cache), and so on. You might support these and other kinds of semantic awareness using NLP techniques (using named entity recognition, perhaps) or image recognition techniques, or music recognition techniques, and so on.

    To take on this project, you should really be someone who has a need for such a system! In other words you should have your own large file system to work with, and most importantly, some non-trivial filing policies to implement. Given these, a thorough evaluation of the system will involve measuring the accuracy of classification achieved by your system, and should be pretty easy. Overall difficulty: moderate.

  • Visualisation of software

    Contact: Stephen Kell

    Visualisation of software is useful for a number of reasons: comprehension, prototyping, debugging, profiling and so forth. Various projects are possible which would implement and evaluate a tool for visualising software in some particular way, aimed towards one or other of these goals.

    Specific example problems include: visualising dynamic structure of programs at the object level (most likely for debugging); visualising static structure at the module level (for comprehension or prototyping); visualising communication patterns within execution traces (for debugging or profiling). The hardest part in all cases is making the visuals comprehensible to the viewer: usually there will be too much data, and/or too dense an interconnection structure, for these to be useful without further processing. For example, in the case of static graph-based abstractions, such as call graphs or linkage graphs, usually the number of edges is a problem, since it makes the layout completely incomprehensible; there are many interesting techniques for clustering nodes, grouping related edges and eliminating uninformative ones. Three-dimensional visualisations are also a very interesting avenue, which has been less well-explored and can possibly allow more of the raw complexity to be usefully retained.

    Evaluation of the effectiveness of a visualisation is slightly tricky, but there are a number of reasonable proxy metrics such as reduction in size (number of nodes/edges in the graph), reduction in edge crossings (in a two-dimensional graph layout), accuracy of approximation (i.e. how well your edge grouping/clustering has preserved the information in the original graph). Difficulty: depends on what specific project you choose, but probably moderate to hard.

  • Optimising object placement in a client-server distributed system

    Contact: Stephen Kell

    If you've ever tried using a thin-client protocol (say an X11 application, or even a web application) over a high-latency link, you will have noticed that latency kills the user experience. Web applications fare better than X11, however, because the client side is “thicker”---the user interface elements are implemented in the browser, so simple UI operations do not require a round-trip.

    The aim of this project is to develop a simple distributed runtime which generalises these observations into a system which is continually optimising itself with respect to the placement of its working storage and/or computation. It must do this across variations in latency and throughput of the link between client and server, and amid disparity of computational resources available at each. Essentially we are exploiting a dynamic partitioning of computation and storage between the client and server locations, rather than the usual static one. This is neat because users generally don't care where computation happens, nor where working storage happens, but it's these which largely determine performance of the system. (Users usually do care a little about where persistent storage happens, by contrast.)

    To take on this project, you will need a very strong set of practical skills, and should already have ideas about how you would go about implementing such a system. For example, you might take an X11 client and server, and build a runtime monitoring and migration service which optimises the system by moving state (objects) or computations (threads; harder!) between client and server. One option would then be to reengineer the communication protocol to use an object middleware like Java RMI, and extend the JVM to support migration of objects (i.e. atomically replacing a local object with a stub, which hands off to a migrated copy of the original object on a remote system). However, you should expect to bring your own ideas to the problem. Experimental evaluation is fairly straightforward with this project. There are also (if you do it well!) lots of interesting algorithms involved. Difficulty: hard!

  • Boot Optimizer

    Contact: Ripduman Sohan

    The idea behind this project would be to build a tool that analysis and optimizes the boot for a typical Linux PC. There is much scope for integrating (and quantitatively evaluating) a number of techniques that have been implemented previously (parallel startup, background startup, reorg of disk blocks, etc) and scope for some new ideas (delayed loading, load balancing, etc).

    Ideally, at the end of the project you will have created an open-source tool that implements a number of boot optimization techniques and have numbers evaluating the effectiveness of each of these techniques.

    To take on this project you should have a good idea of Linux, esp what the boot process entails, a strong set of practical skills and the ability to design and write a moderately large program properly. Difficulty: moderate.

  • Calorie Counting For Nerds

    Contact: Ripduman Sohan

    As a tubby exercise avoider my only weaponry in the battle of the bulge is to pay close attention to the calorific content of the food I consume. With the recent (government) push towards healthy eating, food labelling had been standardized using the Traffic light standard which provides instant access on the major composition elements of most processed and pre-packaged foods.

    The point of this project would be to create a tool that will allow me to track my consumption using a smart phone (iPhone, Android based, etc). I should be able to take a picture of the label and have the application identify and parse the label, log it and account it towards my daily intake.

    While the baseline project is sufficient, there is a lot of scope for novelty. For example:

    • Is it possible to automatically parse and extract the more detailed food nutrional information?
    • Can you work out the embodied energy contained in the consumed foods?
    • How would you handle non-packaged foods? (You can do it simply using freely available food lists or more complicatedly through image recognition (e.g. using IMENSE's engine).
    • Is it possible to work out my energy consumption/expenditure differential using an Inertial Measurement Unit (some modern smartphones have these)?

    A successful execution of this project would not only gain my external gratitude, but also has the potential for earning you some money and notoriety. You should have a basic idea of computer vision and strong practical skills. Difficulty: moderate to hard.

  • Fine-Grained Energy Accounting

    Contact: Ripduman Sohan

    The purpose of this project is to instrument the Linux Kernel so what we can easily and accurately account energy consumption to user-level processes. It would be immensely useful to know, for example, that the Apache web server I'm using is costing me a certain amount in pence for the energy consumed in CPU usage, RAM usage, disk and network.

    While this project seems pretty abstract it's quite simple and easy to implement and is composed of two distinct elements. First you need to measure the runtime consumption of common peripherals and secondly you need to write the accounting framework. Fortunately, I have numbers for the first goal so you only need accomplish the second -- and I have ideas on how to do this part.

    To tackle this project you need to have a strong practical skills, a strong working knowledge of Linux and an overview understanding of how the Linux kernel works. Difficulty: moderate to hard.

  • TPM Based SSH Keys

    Contact: Ripduman Sohan

    Recently, Piete Brooks was forced to restrict remote SSH access to the lab from known locations because a login was compromised and the attacker harvested a bunch of SSH keys to try and break into the lab.

    One mechanism for preventing this sort of attack would be to make use of the Trusted Platform Module present in the BIOS of modern computers to store elements of the SSH key. This way, there is no copy of the private key available to harvest.

    A successful execution of this project would create a set of tools for storing, accessing and modifying the SSH key within the BIOS and modify the SSH server/client to use it. Extensions to the project would involve creating a generic framework to store arbitrary passwords (e.g. for website logins) so that users do not have to write them down or store them on disk.

    Tackling this project would require you to have strong practical skills and a cursory interest in computer security. Difficulty: moderate to hard.

  • Integrating Page Dirty Bit Sampling In The Linux Kernel

    Contact: Ripduman Sohan

    Modern operating systems use heuristics to determine when pages have been accessed and do not make use of hardware features for the purposes of portability and simplicity. However, using this information may lead to more accurate and useful page access information thus increasing the effectiveness of the page cache.

    The idea in this project is to create framework to enable to the Linux kernel to make use of the page dirty/access bits updated when the processor writes or reads from a page. The basic idea is to frequently scan the page table to obtain this information and integrate it into the kernel where useful things can be done with it (like updating the LRU cache).

    Tackling this project requires you have strong practical skills, an interest in OS/kernel development and the ability to program in C. Difficulty: moderate to hard.

  • A Toolbox For Common Image Operations

    Contact: Ripduman Sohan

    I have a very dear friend who frequently has to do simple transformation operations (border crop, affine transformations, lightening, darkening, etc) on a large number of files at once. She is not about to learn to use OpenCV but would love to do these operations automatically regardless. She is not unique in this requirement, there are quite a few people who commonly need to carry out image operations on large sets of files.

    The idea in this project is to create a toolbox for common image manipulations that non-technical people can use. Ideally there would be a simple english-like language they could use to describe the operations.

    Tackling this project requires you have strong practical skills, an interest computer vision, possibly compilers and natural language processing. Difficulty: moderate to hard.

  • Dude, where's my car?

    Contact: Fernando Ramos

    Almost everyone that owns a car has certainly experienced this: going to a shopping centre or to a car park, and forgetting where you parked it!... And then spending half an hour trying to realise if it was stolen or not...

    A simple application that records the GPS location of your car and then guides you to it could solve the problem nicely. This would mainly involve Java programming for an Android phone.

    Besides implementing the basic application, it would also be interesting to address some issues related to the accuracy of GPS readings and the need of GPS indoor for such system.

    Keywords: Mobile, GPS, Java, Smart Phone, Android.

  • Notice: your call may be recorded to the cloud!

    Contact: Fernando Ramos

    It is really nagging when you have a discussion with someone that goes like this: "look, I told you thatâ" "sorry, but you didn;t tell me that" "yes, I am sure I did!". Wouldn't it be nice to prove that, in fact, you really did say that?

    A Java application built on an Android phone could help you disentangle the knot. The phone would record all your calls and send them to a web server (the "cloud"). Later you could access all your calls (plus voice mail and SMS, if desired) from anywhere.

    Besides programming the application for the mobile phone, a web server would need to be setup and a webpage with decent security features put in place.

    Keywords: Mobile, Java, Smart Phone, Android, web server, voice mail, security.

  • Log me in to my mobile

    Contact: Fernando Ramos

    We are so dependent on our mobile phones that when we forget them at home it seems we've lost a limb. When this happens, it would be nice to access your mobile from, say, any web browser. Something like logmein (for example), but instead of accessing a PC remotely, you would access your mobile phone instead.

    An application that establishes a TCP connection with the mobile phone and retrieves all your missed calls and messages could do the trick. However, cell phone companies sometimes block "incoming" TCP connections towards the phone over 3G. To overcome this problem, the application could send an SMS with the application's IP address and port, and then it would be the mobile phone establishing the TCP connection. Then, it could easily send the data using the GSM or 3G network.

    Keywords: Mobile, GSM, 3G, Java, Smart Phone, Android.

  • Badger Social Mobility Map

    Wild life monitoring provides a huge quantity of data about animal behaviour, but it is often difficult to understand what insights we can extract from the collected information.

    This project aims to provide an interactive visualisation of animal movement data and their resulting social contacts. The data has been recorded over the last year through the use of RFID tags that are attached to each badger. The presence of an animal at a particular place is recorded by fixed RFID readers distributed throughout a forest environment. The amount of collected data spans different months and contains over 30 animals.

    The ultimate goal of this project is to develop an interactive application which allows the visualization and the analysis of badger movements and their social contacts over time.

    The application should provide:

    [1] Visualization of badger presence at different places (e.g. on Google maps) and extraction of mobility paths between them.

    [2] Visualization of the social network created by the encounters between badgers in different locations and ability to sub-select time intervals in order to investigate how badger behaviours evolve over time.

    [3] Ability to correlate external (e.g. micro-climate) data against badger movements and computation of some basic network and mobility statistics.

    The most challenging point of this project is to provide a user interface which is both appealing and usable, in order to grasp through interaction/visual inspection of hidden features in badger behaviour. As a consequence, this application must provide the ability to interact with the dataset and select different views of it.

    Finally, it must be a Web based application, ideally developed with a Flash, Java or Ajax framework that integrates with online mapping solutions as Google maps.

    Keywords: Badger, Mobility, Social Networks, Web, Flash, Ajax .

    Contact: Salvatore Scellato