Student Projects (2003)

External

This year, Intel Research Cambridge, who share the Gates Building with the Computer Laboratory and the LCE, have suggested some projects - see below for details. Other projects from them and from the LCE (Laboratory for Communication Engineering) also appear in this document.

Miscellaneous

  • Route Planning on the London Underground
    A couple of years ago there was a very successful diploma project that developed a simple system for planning the fastest route between pairs of London Underground stations, outputting which line to take and which stations to change at.

    The Underground map was treated as a graph in which the edges were labelled with the travel time between adjacent stations. No account was taken of the cost of changing between lines, so a simple application of Dijkstra's algorithm could find the shortest path.
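
    As a minimal illustration of the core algorithm (this is a fresh Python sketch, not the previous project's code, and the stations and travel times are invented), Dijkstra's algorithm over such a graph might look like:

        import heapq

        def dijkstra(graph, start, goal):
            """Return (total_time, path) for the quickest route in a weighted graph.

            graph maps a station to a list of (neighbour, minutes) edges.
            """
            queue = [(0, start, [start])]   # (elapsed minutes, station, path so far)
            best = {}                       # cheapest known time to each station
            while queue:
                time, station, path = heapq.heappop(queue)
                if station == goal:
                    return time, path
                if best.get(station, float("inf")) <= time:
                    continue
                best[station] = time
                for neighbour, minutes in graph.get(station, []):
                    heapq.heappush(queue, (time + minutes, neighbour, path + [neighbour]))
            return None

        # Toy data: travel times (minutes) between adjacent stations, invented for illustration.
        tube = {
            "Kings Cross": [("Euston", 2), ("Moorgate", 6)],
            "Euston": [("Kings Cross", 2), ("Oxford Circus", 4)],
            "Moorgate": [("Kings Cross", 6), ("Bank", 2)],
            "Oxford Circus": [("Euston", 4), ("Bank", 7)],
            "Bank": [("Moorgate", 2), ("Oxford Circus", 7)],
        }

        print(dijkstra(tube, "Kings Cross", "Bank"))   # -> (8, ['Kings Cross', 'Moorgate', 'Bank'])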

    One of the 'outputs' of the previous project was the Underground travel-time information in machine-readable form, and this could be used as an input to this year's project, saving time and allowing other extensions to be explored.

    One nice extension would be to develop a parallel version of the route planning algorithm to enable the decision to be made more rapidly, and hence be prepared for any massive expansion of the Tube network ;-)

    A further extension might take train frequency into account, and hence calculate 'expected', 'best case' and 'worst case' times for the path.

    Alternatively, one could consider a different and much larger network such as the UK rail network. The UK rail timetable is available as a PDF document, which could be parsed into an internal representation, providing a more complex database in which route lookups could be performed.

    This project will provide ample opportunity for you to show off your computer science skills!

    Potential Supervisor:
    Jagdish Modi at89@dial.pipex.com / Ian Pratt ian.pratt@cl.cam.ac.uk

    Special resources:
    For some of the extensions, access to a large multiprocessor machine will be provided by the Systems Research Group.

    References:

    • Wilson R., Introduction to Graph Theory, Academic Press, 1972
    • Hollow S., Route Planning on the London Underground, Diploma project, Computer Laboratory 2001
    • http://www.nationalrail.co.uk/enquiries/nrt_upd.asp

Programming language projects

  • Easy lock-free programming
    Some of our recent research on lock-free programming has been looking at how to provide concurrency control primitives that are easy to use (i.e. hopefully more natural constructs for mainstream programmers than things like the mutexes, condition variables and semaphores covered in Part 1B CS&A).

    One promising approach is to automate the difficult aspects of the programming and to allow the programmer to use a higher level syntax. For example, to transfer a value between two hash tables they could write:

            atomic {
              Object val = h1.remove(key);
              h2.insert(key, val);
            }
    

    We have an initial implementation of this built over a Software Transactional Memory (STM). Essentially, the STM provides read-word and write-word operations and a way of grouping them together into atomic transactions. Our paper at OOPSLA 2003 has more details. In many cases this system performs better than a simple implementation using the synchronized keyword.
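
    The snippet below is not the OOPSLA implementation; it is a minimal Python sketch, with invented names, of the shape of code a compiler might emit for an atomic block: buffer reads and writes in a transaction descriptor, then validate and commit atomically, retrying on conflict.

        import threading

        class Cell:
            """A transactional memory word: a value plus a version number."""
            def __init__(self, value):
                self.value, self.version = value, 0

        _commit_lock = threading.Lock()   # a real STM would use a fine-grained or lock-free commit

        class Transaction:
            def __init__(self):
                self.reads = {}    # cell -> version observed
                self.writes = {}   # cell -> tentative new value

            def read(self, cell):
                if cell in self.writes:
                    return self.writes[cell]
                self.reads[cell] = cell.version
                return cell.value

            def write(self, cell, value):
                self.writes[cell] = value

            def commit(self):
                with _commit_lock:
                    # Validate: every cell we read must still be at the version we saw.
                    if any(cell.version != v for cell, v in self.reads.items()):
                        return False
                    for cell, value in self.writes.items():
                        cell.value, cell.version = value, cell.version + 1
                    return True

        def atomic(body):
            """Run body(tx) repeatedly until its reads validate and its writes commit."""
            while True:
                tx = Transaction()
                result = body(tx)
                if tx.commit():
                    return result

        # Transferring a value between two cells, analogous to the hash-table example above.
        a, b = Cell(10), Cell(0)
        atomic(lambda tx: (tx.write(a, tx.read(a) - 1), tx.write(b, tx.read(b) + 1)))
        print(a.value, b.value)   # -> 9 1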

    There are a number of possible student projects that could build on this work:

    • Our current implementation has focused on the STM; the rest is a mess of unreleasable code. In particular, the compiler support for mapping atomic onto a series of STM operations would benefit from being thought out more clearly and implemented more carefully.

    • As part of his PhD work, Keir Fraser has developed an alternative STM with a different object based interface. It would be interesting to explore whether this can also be used to implement a construct such as atomic.

    Both of these projects would involve design work followed by modification to the javac source-to-bytecode compiler.

    Contact: Tim Harris

  • Log-file query language
    Numerous research projects involve generating and analysing log files - for instance traces of activity recorded by syslog, traces of operating systems or library services, traces of network packets, or of object allocations and de-allocations. Many people assemble assortments of ad-hoc tools and scripts for analysing these files.

    Existing data query languages (e.g. SQL) are rarely suited to these problems. For example, consider how to express "calculate the total number of bytes read from the file xyz" given a list of the system calls that a process has invoked. It is necessary to find each "read" system call, then to find the most recent "open" call which returned the file descriptor that the read specifies, and then to form the total of the number of bytes returned by those reads.
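
    For concreteness, a hand-rolled Python version of that query over a toy trace (the record format is invented for illustration) might look like the sketch below; the point of the project is that such logic should be expressible declaratively rather than re-coded for every new question.

        # Each trace record is (call, args, result); the format is invented for illustration.
        trace = [
            ("open",  {"path": "xyz"},         3),     # returns file descriptor 3
            ("read",  {"fd": 3, "count": 100}, 100),
            ("open",  {"path": "other"},       4),
            ("read",  {"fd": 4, "count": 50},  50),
            ("read",  {"fd": 3, "count": 100}, 60),
            ("close", {"fd": 3},               0),
        ]

        def bytes_read_from(trace, path):
            """Total bytes returned by reads whose fd was most recently opened on `path`."""
            fd_to_path = {}      # current meaning of each file descriptor
            total = 0
            for call, args, result in trace:
                if call == "open" and result >= 0:
                    fd_to_path[result] = args["path"]      # most recent open wins
                elif call == "read" and fd_to_path.get(args["fd"]) == path:
                    total += result                        # bytes actually returned
                elif call == "close":
                    fd_to_path.pop(args["fd"], None)
            return total

        print(bytes_read_from(trace, "xyz"))   # -> 160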

    Many examples seem to involve joins involving notions of "most recent" or "closest subsequent". For example, finding the most recent "write" that preceded a given "read" and making sure that the data written matches the data read.

    This project is to design an effective system for analysing log files. One option would be to extend the language supported by an existing tool such as the mysql database, although you would need to agree the exact strategy with your supervisor.

    Contact: Tim Harris

System modelling projects

  • Running a combinatorial auction
    Combinatorial auction designs have been proposed in a variety of situations, notably within the telecommunications industry, where they have been considered for the allocation of radio spectrum and for the distribution of subsidies for universal service. This project will design and implement such an auction using a recently proposed auction design with improved (in fact, polynomial-time) computational performance relative to standard designs (for which determining the winners is NP-hard). The project will implement the auctioneer's process for accepting and selecting winning bids from remote bidders. Since the auction design includes multiple rounds, a format will need to be designed to publish interim results to all bidders.

    For more details see: Frank Kelly and Richard Steinberg, A Combinatorial Auction with Multiple Winners for Universal Service, Management Science 46 (2000), 586-596.

    Contact: Richard Gibbens

  • Performance modelling tool for enterprise IP networks
    Performance modelling techniques originally applied to circuit-switched telephony networks (with great success) have recently been modified and proposed for IP networks. This project will involve the design and implementation of a modelling tool using performance modelling techniques based on fixed-point approaches. The project will include both the construction of the computational methods as well as the necessary interfaces to suitable input and output formats in order to complete a modelling tool suitable for network designers.
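
    The fixed-point idea can be illustrated on a toy circuit-switched example (the paper below develops it properly for IP networks): iterate per-link blocking probabilities, thinning each route's offered load by the blocking on its other links, until the values settle. The sketch uses the classical Erlang B formula; the capacities and loads are invented.

        from math import prod

        def erlang_b(load, capacity):
            """Erlang B blocking probability via the standard recurrence."""
            b = 1.0
            for n in range(1, capacity + 1):
                b = (load * b) / (n + load * b)
            return b

        # Toy network, invented for illustration: link capacities (circuits) and
        # routes described as (links used, offered load in Erlangs).
        capacity = {"link1": 10, "link2": 12}
        routes = {"r1": (["link1"], 6.0),
                  "r2": (["link1", "link2"], 4.0),
                  "r3": (["link2"], 5.0)}

        blocking = {link: 0.0 for link in capacity}
        for _ in range(100):                    # repeated substitution until a fixed point
            new = {}
            for link, cap in capacity.items():
                # Load offered to this link: each route's load thinned by blocking on its other links.
                offered = sum(load * prod(1 - blocking[l] for l in links if l != link)
                              for links, load in routes.values() if link in links)
                new[link] = erlang_b(offered, cap)
            if all(abs(new[l] - blocking[l]) < 1e-9 for l in capacity):
                blocking = new
                break
            blocking = new

        print({link: round(b, 4) for link, b in blocking.items()})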

    For more details see: R. J. Gibbens, S. K. Sargood, C. Van Eijl, F. P. Kelly, H. Azmoodeh, R. N. Macfadyen, N. W. Macfadyen Fixed-Point Models for the End-to-End Performance Analysis of IP Networks In Proc 13th ITC Specialist Seminar: IP Traffic Measurement, Modeling and Management, Sep 2000, Monterey, CA.

    Contact: Richard Gibbens

Networking projects

  • Receiver Layered Multicast
    Network audio and video require reasonably high bandwidth and relatively low jitter (variation in delay). These requirements make it difficult to deliver high-fidelity data directly over the network. However, these media types may be encoded in a layered format, i.e. as a number of layers 1, 2, 3, ... n.

    The receiver may choose to decode any prefix of the sequence, obtaining increased fidelity in each case. It is possible to distribute audio and video in a layered format where each of n layers is given a different multicast "channel". Receivers may then decide to receive the first k of these, and to decode (and render) the first j <= k of these in order to maintain real-time rates and/or to match the fidelity to the characteristics of the receiving terminal.
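
    On the receiver side, joining the first k of n layer groups is straightforward with standard sockets; the sketch below is a minimal Python illustration in which the group addresses and port are invented, and a real receiver would adjust k in response to observed loss.

        import socket
        import struct

        # Invented layout for illustration: layer i is carried on its own multicast group.
        LAYER_GROUPS = ["239.1.1.%d" % i for i in range(1, 9)]   # up to n = 8 layers
        PORT = 5004

        def join_layers(k):
            """Return a UDP socket subscribed to the first k layer groups."""
            sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            sock.bind(("", PORT))
            for group in LAYER_GROUPS[:k]:
                # IP_ADD_MEMBERSHIP takes the group address and the local interface (any).
                mreq = struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton("0.0.0.0"))
                sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
            return sock

        sock = join_layers(3)               # receive layers 1..3; decode the first j <= 3 of them
        data, sender = sock.recvfrom(2048)  # blocks until a layer packet arrives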

    This project would choose or construct a layered encoding scheme for either video or audio (or both), and develop transmitting and receiving software. As a first step, broadcast on a local Ethernet could be used in place of multicast, although it would be nicer (and more valid) to use multiple networks and a "proper" multicast protocol.

    Contact: Steven Hand

  • Predicate Routing
    In current IP networks, the routing state of the system is primarily represented as a set of routing tables (local to each router) and a set of filtering rules (also local to each router or firewall). In contrast, Predicate Routing represents the state of the network as a set of boolean expressions associated with links which assert which kinds of packet can appear where. From these expressions, routing tables and filter rules can be derived automatically. Conversely, the consequences of a change in network state can be calculated for any point in the network (link, router, or end system), and predicates derived from known configuration state of routers and links. This subsumes notions of both routing and firewalling. More details can be found in the following paper.

    This project would involve designing and implementing a network control plane for predicate routing, probably in a centralised manner. An extension would be to develop an inference tool to allow one to derive properties of the network. A more ambitious project is to attempt to build a distributed version, either in simulation (for example using SSF) or by packaging up a real implementation (e.g. Zebra).

    Contact: Steven Hand

  • Audio Networking
    Networking devices together using audible sounds is an area that has been much overshadowed by more exotic wireless technologies. However, it has a number of attractive characteristics, such as requiring very low power, providing tangible feedback to the user that something is being transmitted, and depending on existing ubiquitous hardware (normal microphones and speakers).

    The challenge is to write the software necessary to allow two devices to communicate with each other using only audible sounds. The coding scheme between the two devices is entirely up to you, and you can choose either to aim for high bandwidth between the two machines, or for other characteristics (for instance, masking the sound in a melodic way to make it sound nice). You must not require any equipment beyond standard consumer-grade microphones and speakers.
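
    One possible starting point (by no means the required coding scheme) is binary frequency-shift keying: each bit becomes a short burst at one of two audible tones. The sketch below uses numpy, and the tone frequencies, bit duration and sample rate are arbitrary choices for illustration; playing and capturing the waveform through the sound card is left to a library of your choice.

        import numpy as np

        SAMPLE_RATE = 44100          # samples per second
        BIT_DURATION = 0.05          # 50 ms per bit -> 20 bit/s, deliberately conservative
        FREQ_0, FREQ_1 = 1500, 2500  # audible tones for a 0 bit and a 1 bit

        def modulate(bits):
            """Return a float32 waveform encoding the bit sequence as two-tone FSK."""
            t = np.arange(int(SAMPLE_RATE * BIT_DURATION)) / SAMPLE_RATE
            chunks = [np.sin(2 * np.pi * (FREQ_1 if b else FREQ_0) * t) for b in bits]
            return np.concatenate(chunks).astype(np.float32)

        def demodulate(wave):
            """Recover bits by comparing per-symbol correlation against the two tones."""
            n = int(SAMPLE_RATE * BIT_DURATION)
            t = np.arange(n) / SAMPLE_RATE
            ref0 = np.exp(2j * np.pi * FREQ_0 * t)
            ref1 = np.exp(2j * np.pi * FREQ_1 * t)
            bits = []
            for i in range(0, len(wave) - n + 1, n):
                symbol = wave[i:i + n]
                bits.append(1 if abs(np.dot(symbol, ref1)) > abs(np.dot(symbol, ref0)) else 0)
            return bits

        payload = [1, 0, 1, 1, 0, 0, 1, 0]
        assert demodulate(modulate(payload)) == payload   # round-trip check, no sound card needed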

    For more advanced work, you can investigate coding schemes such as CDMA, which offer very robust transmission rates at the expense of greater complexity.

    This project will require a good working knowledge of Digital Signal Processing (DSP) and coding and compression techniques.

    Contact: Anil Madhavapeddy

  • Phoneme Encoded Communications
    Human speech can be decomposed into sequences of phonemes in a manner analogous to the construction of words from an alphabet. This project involves exploring the use of techniques for encoding speech as phoneme sequences as a way of reducing the bandwidth required for telecommunication and decreasing the storage requirements for archiving. This could entail exploring suitable encoding and decoding schemes, design of an accessible audio archive storage system, and implementation of data transport protocols.

    A GPRS experimental testbed is available in the lab and the project could explore use with a mobile handset either pushing the raw audio stream to a dedicated server or encoding and decoding directly on the device.

    Contact: Chris Clark or Rajiv Chakravorty

  • TCP Ideal
    The Internet design makes for a clean separation of routing concerns into the IP packet layer (level 3), and the end-to-end reliability and flow control concerns within the TCP transport layer (level 4). Resource management crosses both of these layers, as do questions of flow multiplex identification and so forth. XCP and SCTP represent protocol departures from this split, with some elements of functionality operating across both layers. This project would look at implementing such a TCP/IP stack, but with optimal packet design (all packets exchange all state with all concerned nodes), and a TCP optimiser which then removes redundant state dynamically (there is related work at Berkeley on this).

    One way to start might be with SCTP (see RFC 2960 at www.ietf.org for details), or alternatively XCP.

    Contact: Jon Crowcroft

  • Visual Wireless Comms
    Looking at my old laptop, I noticed that it was "lookin at me" and I thought: why not write a comms protocol that uses the camera/screen as a wireless communications link? In principle, we should be able to get a couple of megabits (provided the screensaver doesn't kick in :-)

    A back of the envelope calculation from first principles says that we can get 10 frames a second of raw VGA resolution, so that's 640 x 480 x 10 x 24 bits per second raw (roughly 74Mbps :-)

    So this project would build an OSI model and implementation of a protocol stack for the "pixelated internet".

    1. Physical layer - photonic:-)
    2. Link layer - we need a modulation technique, plus a framing and error-correcting code. We would devise a training session in which we put "pink" noise on the screen so that the decoder behind the camera can find: the registration/relative angle of the camera CCD plane and the screen; colours/pixels per bit; and the feasible rate (which may vary across the screen too). We could possibly use min/max entropy here.

      Over this we might use a simple framing protocol such as PPP.

    3. Network=IP
    4. transport=TCP
    5. session layer=reboot/shutdown:-)
    6. presentation layer=lightenupmyday
    7. application layer=whatever, e.g. rsynch, http, etc

    A nice exercise, but also potentially useful.

    Contact: Jon Crowcroft

  • Investigate the feasibility of applying sentient computing techniques in experimental science labs
    Sentient computing [1] is the proposition that applications can be made more responsive and useful by observing and reacting to the physical world. It involves the deployment of location tags and other physical sensors and the use of sensor data to generate logs and data models of the physical world.

    In experimental laboratories sensors can be used to automate the recording of experiments, log the conditions under which they were carried out and the apparatus used. Errors in experimental procedure could be more easily and quickly detected and the monitoring and recording of the experimental procedures might be simplified. Some relevant work has been done at the University of Washington [2].

    Sentient computing is an important branch of pervasive/ubiquitous computing and there is a substantial literature on these topics. [3] and [4] are good starting points for reading.

    Project goal

    To identify situations in a science lab where sentient computing techniques will add to the effectiveness or the accuracy of the experimental procedure and to produce a practical demonstration of feasibility.

    Project activities

    1. Two illustrative application scenarios are given below. But further work on the identification of requirements forms a part of the project. Additional scenarios and detailed requirements will be constructed after a careful look at the activities in a science laboratory. This study will necessarily be quick, but it should be in sufficient depth to provide a good understanding of the working methods of the experimenters.
    2. Design and develop a prototype system using at least one type of sensor (e.g. location tagging, RFID tags, weight, light, sound sensing) that provides useful information to the experimenters.

    Illustrative examples

    • 1. Laboratory inventory management: Chemistry and biology labs are complex places containing a large amount of apparatus and many compounds. Experimenters may find themselves spending too much time tracking down or recording the location and status of apparatus and materials. If items of apparatus and materials were electronically tagged, a basic inventory audit system could be deployed to record the location and status of each item. An initial approach might tag only those items that are safety-critical or whose location is critical to the success of experimental procedures.
    • 2. Secure experiment log: When a chemist undertakes a practical experiment, the apparatus could produce a digitally-signed log file. Not only would such a log constitute a (tamper-proof) record of the experiment, it would also allow the experiment to be replayed by another scientist. If the work was done by a student, the log file could be assessed by a robotic examiner!

    It would be necessary to uniquely identify each sensor and the raw materials used. The products could then be traced in their subsequent use by the same or other scientists in further experiments, much as an automotive factory can track every nut and bolt back to the batch of steel from which they were forged. The audit trail of chemical experiments could then be traced from one experiment to the next: for example, if a lab technician prepares some chemical X to be used the following week by a colleague, all the details of the preparation would be recorded. X might be stored on a shelf in an RFID-labelled beaker whose tag connects that batch of X to the history of its creation, and the creation of its ingredients, and so forth back to some trusted entry point. If, when used in a subsequent experiment, one beaker of X gives a low yield, or an impurity of some kind, the logs could be used to find the problem.

    The goal of a student project might be to write a program using suitable database technology to log data from experiments, store it in a database, link experiments together (e.g. using RFID-labelled beakers), and produce digitally-signed log files analogous to a lab book write-up. An extension to the project might be to query the database in other ways: e.g. to trace potential chemical contamination by answering questions such as, "for what else has this pipette been used recently?" The student could extend the basic ideas in many ways. For example, they might want to use heartbeats to ensure that no-one disconnects the sensors during an experiment.
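
    As a rough illustration of the digitally-signed log idea, the Python sketch below chains each entry to its predecessor by hash and authenticates it with an HMAC standing in for a real digital signature; the key, field names and RFID identifiers are all invented for the example.

        import hashlib, hmac, json, time

        APPARATUS_KEY = b"secret key held by the instrument"   # stand-in for a real signing key

        def append_entry(log, event):
            """Append a tamper-evident entry: each entry chains the previous hash and is MACed."""
            prev_hash = log[-1]["hash"] if log else "0" * 64
            body = {"time": time.time(), "event": event, "prev": prev_hash}
            payload = json.dumps(body, sort_keys=True).encode()
            body["hash"] = hashlib.sha256(payload).hexdigest()
            body["mac"] = hmac.new(APPARATUS_KEY, payload, hashlib.sha256).hexdigest()
            log.append(body)

        def verify(log):
            """Recompute the chain; an edited, removed or reordered entry breaks verification."""
            prev_hash = "0" * 64
            for entry in log:
                body = {k: entry[k] for k in ("time", "event", "prev")}
                payload = json.dumps(body, sort_keys=True).encode()
                expected_mac = hmac.new(APPARATUS_KEY, payload, hashlib.sha256).hexdigest()
                if (entry["prev"] != prev_hash
                        or entry["hash"] != hashlib.sha256(payload).hexdigest()
                        or not hmac.compare_digest(entry["mac"], expected_mac)):
                    return False
                prev_hash = entry["hash"]
            return True

        log = []
        append_entry(log, {"beaker": "RFID:00042", "action": "prepared", "compound": "X"})
        append_entry(log, {"beaker": "RFID:00042", "action": "weighed", "grams": 12.5})
        print(verify(log))                     # -> True
        log[1]["event"]["grams"] = 99          # tampering with a recorded value...
        print(verify(log))                     # -> ...is detected: False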

    Key computer science skills and knowledge would be needed to address the following:

    • i) use of appropriate database to store data;
    • ii) use of appropriate encryption algorithms to provide security, authenticity and tamper-resilience, and even privacy if experimenters want to keep their top-secret chemical research from others!
    • iii) interaction with hardware to gather sensor data: the LCE would have to provide sensors and (say) a means of "pinging" the sensor to check that it is alive. The student could then write a protocol that "pings" regularly to make sure the hardware is OK.

    Resources

    Access to the Active Bat sensor network and SPIRIT location management system in the Laboratory for Communication Engineering. Support in the development of hardware and software for the prototype application.

    References

    • 1. Andy Hopper, The Royal Society Clifford Paterson Lecture, 1999 - Sentient Computing. Phil. Trans. R. Soc. Lond. A (2000), Volume 358, Pages 2349-2358, Royal Society, August 2000. http://www-lce.eng.cam.ac.uk/publications/files/tr.1999.12.pdf
    • 2. The Labscape Project - Ubiquitous Computing in the Cell Biology Laboratory, http://labscape.cs.washington.edu/
    • 3. M. Weiser, "Some computer science issues in ubiquitous computing." Communications of the ACM , July 1993. Pages 75-84.
    • 4. M. Satyanarayanan, "Pervasive Computing: Vision and Challenges." IEEE Personal Communications , August 2001. http://citeseer.nj.nec.com/satyanarayanan01pervasive.html

    Contact: Jon Crowcroft

  • Network Protocol Extensions for Linux
    The networking support of modern operating systems caters very much to TCP/IP, the protocol suite used on the Internet. There are, however, situations in which it would be desirable to use alternate protocols without modifying applications written to use IP. Two examples of such protocols are as follows:

    1. BLT - high-speed ethernet streaming. IP presents an unnecessary overhead for communicating between hosts on a local high-speed LAN. A simple protocol that provides only flow control may be used to achieve much higher throughput.
    2. NAT traversal - It is impossible for two hosts, each behind a network address translator (NAT), to address each other, as neither has an address in the Internet's address space. One solution to this problem is to use a rendezvous point to which both hosts can connect and which forwards messages between them.

    To complete this project, you will construct a shim layer in Linux. This layer will allow alternate protocols to be installed, and mapped onto applications using the existing network API. Here is an example:

    The new BLT protocol is installed, and a user-level tool allows me to map a new identifier, "fileserver-FAST", to indicate that I want to use BLT instead of IP to connect to the host fileserver. A call to gethostbyname for "fileserver-FAST" returns an unroutable IP address (e.g. 0.0.0.5) and sets up a mapping between that IP and the new protocol stack in the kernel. When an application tries to connect to 0.0.0.5, the socket it receives is remapped to use the new protocol instead of IP.

    Successful completion of this project will implement the shim layer in Linux, and the BLT protocol (1). The NAT traversal is bonus effort.

    Contact: Andy Warfield

  • 400Mb/s WLAN emulation
    Groups around the world are putting forward specifications for the next-generation Wireless LAN standard, which will operate at up to 400Mb/s. However, at these high speeds there are serious problems with the current 802.11 Media Access Control standard, which needs to be modified to allow multiple IP packets to be sent in the same MAC frame, and to allow IP data to be 'piggy-backed' with MAC acknowledgements. The trouble is, no one really knows what effect this batching will have on the performance of applications (the Web, RealPlayer, etc.). People have done simulations, but real-world TCP/IP implementations are very different from the simple ones used in simulation.

    The aim of this project is to build a high-performance WLAN emulator. It will be a PC with 2 or more Gigabit Ethernet cards representing the wireless base station and one or more wireless clients. The emulator will forward packets across the interfaces, but insert delays (and possibly errors) as dictated by an internal simulation of the MAC protocol under investigation. Using the emulator, we can use machines running a range of operating systems and applications to find out what the performance of the system is likely to be like in real life. We can then use this information to develop better MAC protocols.

    This project is best done by someone with experience in 'C', and an interest in WLAN and Ethernet protocols. You might even end up with your name in the new IEEE WLAN standard document ;-)

    Contact: Ian Pratt

  • Triggers for Wireless Networks
    Link_Up and Link_Down triggers have been used in Mobile IP and ad hoc networks. These triggers provide inputs to the routing protocol and have a significant effect on the performance of a wireless network. In general, they notify the routing protocol when the signal strength rises above or falls below predefined threshold levels. This method is not able to cope with fluctuations in signal strength under dynamic mobility conditions.

    In this project, statistical information about the link signal-to-noise ratio, queue length, CPU utilisation, number of retries and energy is to be exchanged between adjacent nodes. You are expected to explore proactive triggers built from this raw information. A successful trigger should predict near-future link events with high accuracy.
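
    As a trivial baseline (invented thresholds, and far short of the predictive triggers the project is after), one can smooth the raw signal strength with an exponentially weighted moving average and add hysteresis so that brief fades do not flap the trigger:

        class LinkTrigger:
            """Smooth raw signal samples (EWMA) and fire Link_Up/Link_Down with hysteresis."""

            def __init__(self, up_threshold=-70.0, down_threshold=-80.0, alpha=0.5):
                # Two thresholds (dBm, invented values) so the trigger does not flap
                # when the signal hovers around a single level.
                self.up, self.down, self.alpha = up_threshold, down_threshold, alpha
                self.smoothed = None
                self.link_up = False

            def sample(self, rssi_dbm):
                """Feed one raw measurement; return 'Link_Up', 'Link_Down' or None."""
                if self.smoothed is None:
                    self.smoothed = rssi_dbm
                else:
                    self.smoothed = self.alpha * rssi_dbm + (1 - self.alpha) * self.smoothed
                if not self.link_up and self.smoothed >= self.up:
                    self.link_up = True
                    return "Link_Up"
                if self.link_up and self.smoothed <= self.down:
                    self.link_up = False
                    return "Link_Down"
                return None

        trigger = LinkTrigger()
        trace = [-90, -75, -66, -64, -65, -66, -85, -90, -92]   # invented RSSI trace (dBm)
        print([trigger.sample(x) for x in trace])   # Link_Up on the 4th sample, Link_Down on the 8th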

    Contact: Meng How Lim

  • Fun with wireless networking equipment and PDAs
    This isn't so much a project description as a note to say that the SRG has a number of GSM/GPRS-capable phones and datacards, Wireless LAN cards, and iPAQ PDAs that could be used in the execution of Part II projects, provided you can come up with a cool application or service to use them for. We can provide help with configuring the kit, and with hosting the server-side component of any applications.

    Contact: Ian Pratt

Distributed Systems projects

  • Co-operative Web Servers
    Having a reliable and fast WWW presence is important for many organisations, including the University. This is pretty tough for small or underfunded organisations, particularly in the presence of machine or network failures, or unexpected spikes in demand (the 'slashdot effect'). However, the situation might be improved if a number of sites could agree to co-operate and transparently mirror, cache, and serve content on behalf of each other.

    Suppose we had a group of e.g. 50 organisations across the world, each running their own server on a single PC. For each web site, a couple of nodes in the group would be selected as secondary replicas, and content mirrored to them. The machines would monitor each other, and use dynamic DNS to update the list of available hosts capable of serving the web site. If a demand surge is experienced by a web site, we would like nodes in addition to the replicas to start caching and serving content on behalf of the replicas. We can do this by having the replica nodes re-write links in the web pages returned to clients such that they point at other servers, which would then fetch and cache the content from the replicas. Ideally, we'd point clients at the server closest to them, to get best performance.

    The system would be, as far as possible, 'self-organising', without requiring sysadmin intervention. It would generate reports showing what work had been done on behalf of other sites, so the operators could identify 'free riders' and ask them to leave the federation. There's loads of good computer science in this project, though it would ideally suit someone with experience using Apache and Squid. It could also have serious practical impact if done well.

    Contact: Ian Pratt

  • Distributed Execution Tracing
    PlanetLab is a set of over 150 machines distributed amongst research institutions across the world. It's a testbed to enable experimental deployment of new distributed services and algorithms, e.g. peer-to-peer networks. We have arranged for the kernels on these machines to run the Linux Trace Toolkit (LTT), to gather execution information about the prototype applications which are running on them. We have a research project which is attempting to harvest this tracing information from multiple machines and combine the traces to gain insight into the patterns of communication and computation of these applications, and perhaps to be able to make predictions as to how they would perform when scaled to larger numbers of machines, as might happen in a real deployment.

    The goal of this project would be to take the same tracing information on each node, but to selectively make it available to the actual applications running on the node, enabling them to adapt and tune their own behaviour based on the information observed. One part of this project would be implementing something akin to the "Berkeley Packet Filter" (normally used by tcpdump/libpcap) for trace events, enabling an application to selectively filter the set of events it wants to be informed about, and for the OS to determine which events an application should be allowed to receive (e.g. not events generated by applications run by other users). The second part of the project is to modify an application such that it optimises its behaviour based on information gleaned from the trace messages.

    Contact: Ian Pratt

Trust and reputation management

  • Incentives for honest participation in trust management (Pinocchio)
    Peer-to-peer systems, on-line auction sites and public computing platforms often employ trust management systems to allow users to share their experiences about the performance of other users in such settings (for example, the Amazon Marketplace and eBay ratings schemes). However, the success of these trust management systems depends heavily on the willingness of users to provide feedback, and the systems have no mechanisms to encourage users to participate by submitting honest information. Providing rewards is an effective way to improve feedback, according to the widely recognised principle in economics that people respond to incentives.

    Our view is that under these conditions the users who participate in the trust management scheme by submitting information about their interactions with others are, in fact, pursuing ``hidden'' rewards, often with unwanted effects. For instance, in the eBay case, there is strong empirical evidence to suggest that buyers and sellers advertise positive feedback regarding each other, seeking to increase their reputation via mutual compliments. In this case, the reward implicitly offered by the system is the possibility of getting a positive review about oneself.

    Also, people who have had particularly bad experiences will be normally more inclined to advertise their experiences as a form of revenge against the user that did not provide the desired service. Such hidden rewards bias the feedback system; users who have had average experiences with other users and are not aiming at increasing their reputation or seeking revenge against a bad service provider will have little reason to provide feedback. An explicit reward system has the advantage of attracting those users across the board.

    This project will build Pinocchio, a framework for providing explicit incentives for honest participation in global-scale distributed trust management infrastructures. The system will improve the quality of information supplied by these systems by reducing free-riding and encouraging honesty. Our approach will be twofold: (1) we will provide rewards for participants that advertise their experiences to others, and (2) impose the credible threat of halting the rewards, for a substantial amount of time, for participants who consistently provide suspicious feedback. For this purpose we have developed an honesty metric which can indicate the accuracy of feedback.

    More information about this project can be found in our paper in the 2nd International Conference on Trust Management.

    Contact: Evangelos Kotsovinos

Operating System projects

Most of these are based around Xen, a Virtual Machine Monitor for x86, which lets you run multiple operating system images at the same time on the same PC hardware, with unprecedented levels of performance and resource isolation. Even under the most demanding workloads the performance overhead is just a few percent: considerably less than alternatives such as VMware Workstation and User Mode Linux. This makes Xen ideal for use in providing secure virtual hosting, or even just for running multiple OSes on a desktop machine.

The systems research group is continuing to develop and do research using Xen, but there are a couple of areas which we feel would make good self-contained student projects:

  • Resource Scheduling and Accounting
    Xen contains resource schedulers to share out the physical resources of the hardware among the different virtual machines running on Xen. Currently, our schedulers for the CPU, network interface and disk are fairly rudimentary, and geared toward providing a "fair share" rather than providing controls to guarantee (or restrict) a guest OS to particular pre-agreed quota.

    There are plenty of algorithms in the literature for schedulers which do this. The goal of this project will be to select appropriate algorithms, implement them, and perhaps enhance them for the virtual machine environment. Previous experience with C would be useful for this project. You'll have plenty of good data structures and algorithms stuff to write up in the dissertation.
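
    One candidate (named here only as an example of the sort of algorithm in the literature, not as the planned design) is stride scheduling, which gives each domain CPU time in proportion to pre-agreed tickets; the Python sketch below uses invented domain names and quotas.

        import heapq

        def stride_schedule(tickets, slices):
            """Allocate `slices` time slices among domains in proportion to their tickets.

            Stride scheduling: a domain advances by stride = K / tickets each time it
            runs, and the domain with the smallest pass value runs next.
            """
            K = 10_000
            queue = [(0.0, name, K / t) for name, t in tickets.items()]   # (pass, domain, stride)
            heapq.heapify(queue)
            allocation = {name: 0 for name in tickets}
            for _ in range(slices):
                pass_value, name, stride = heapq.heappop(queue)
                allocation[name] += 1
                heapq.heappush(queue, (pass_value + stride, name, stride))
            return allocation

        # Invented quotas: dom1 should receive half the CPU, dom2 a third, dom3 a sixth.
        print(stride_schedule({"dom1": 3, "dom2": 2, "dom3": 1}, 600))
        # -> roughly {'dom1': 300, 'dom2': 200, 'dom3': 100}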

    Contact: Ian Pratt or Keir Fraser

  • Logging, Auditing and Denial of Service prevention
    Xen is a key component of the XenoServers project, which has the grand vision of creating an "Open Infrastructure for Global Distributed Computing". We envisage a world in which Xenoserver execution platforms will be scattered across the globe and available for any member of the public to submit code for execution. The sponsor of the code will be billed for all the resources used or reserved during the course of execution. This will serve to encourage load balancing, limit congestion, and hopefully even make the platform self-financing.

    One big problem we're going to have is that some (many?) users of the XenoServer infrastructure may use the system for nefarious purposes, for launching denial of service attacks, spamming, or as a platform for hacking other systems. We need a way of efficiently (in space and time) logging all the network activity occurring on Xen along with the 'sponsor' that caused it to occur, keeping it safe in case forensic analysis is required. We also need to be able to detect and filter or rate limit operations which are likely to be dubious. For example, "limit outgoing connections to SMTP ports to be one per user per 10 seconds".

    We can also quench a large class of denial of service attack by looking at the symmetry of the number of packets received on a connection to the number sent: for a TCP connection you should receive an ACK every other packet. If not, chances are you're hosing the remote host, and hence Xen should throttle the connection until the symmetry improves. A similar technique can be used to throttle attempts to do port scanning from XenoServers.
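
    A user-space Python sketch of that symmetry heuristic is shown below; the ratio, threshold and connection key are invented, and a real implementation would sit on Xen's network path rather than count packets in Python.

        from collections import defaultdict

        class SymmetryThrottle:
            """Flag connections whose outgoing/incoming packet ratio looks like flooding."""

            def __init__(self, max_ratio=4.0, min_sent=20):
                # Invented parameters: a healthy TCP sender should see an ACK roughly
                # every other packet, so a large imbalance suggests trouble.
                self.max_ratio, self.min_sent = max_ratio, min_sent
                self.sent = defaultdict(int)
                self.received = defaultdict(int)

            def on_packet_out(self, conn):
                self.sent[conn] += 1

            def on_packet_in(self, conn):
                self.received[conn] += 1

            def should_throttle(self, conn):
                sent, received = self.sent[conn], self.received[conn]
                if sent < self.min_sent:
                    return False                      # too little traffic to judge
                return sent / max(received, 1) > self.max_ratio

        throttle = SymmetryThrottle()
        conn = ("10.0.0.1", 4321, "192.0.2.7", 80)    # invented connection key
        for _ in range(100):
            throttle.on_packet_out(conn)              # 100 packets out, no ACKs back
        print(throttle.should_throttle(conn))         # -> True: throttle until symmetry improves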

    The goal of this project is to implement some of the detection and throttling mechanisms described above, and maybe think of a few more.

    Contact: Ian Pratt

  • Xen suspend/resume for migration
    One cool feature we'd like to add to Xen would be the ability to "suspend to file" an operating system image running over Xen, and then "resume from file" on another machine running Xen, hence moving an operating system along with all the applications running over it! Because of the virtualization Xen provides, this sort of thing isn't impossible, but it will require someone with a decent knowledge of how the x86 handles page tables and such like.

    The code that does the suspend/resume will run in another operating system image, inspecting the memory footprint of the operating system to be suspended, and then writing it to disk, along with a load of meta-information. The hard part is that when the OS is 'resumed', the physical memory pages that it was using are unlikely to be the same ones it's been allocated this time round, so it's going to be necessary to re-write the page tables before resuming the OS, based on the meta-information that's been stored.
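
    The core of that fix-up can be shown with a toy model: given the map from the machine frames the OS occupied before suspension to the frames it has been allocated on resume, every page-table entry's frame number is rewritten while its flag bits are preserved. The Python sketch below assumes 4 KB x86 pages and uses invented frame numbers; it glosses over higher-level page directories and the many other details a real implementation must handle.

        PAGE_SHIFT = 12                       # x86 4 KB pages: the frame number lives above bit 12
        FLAGS_MASK = (1 << PAGE_SHIFT) - 1    # present/writable/accessed/... bits stay untouched

        def rewrite_page_table(entries, frame_map):
            """Rewrite each PTE's frame number using the old->new frame_map, keeping flag bits."""
            rewritten = []
            for pte in entries:
                if not pte & 0x1:                       # present bit clear: nothing to translate
                    rewritten.append(pte)
                    continue
                old_frame = pte >> PAGE_SHIFT
                new_frame = frame_map[old_frame]        # meta-information saved at suspend time
                rewritten.append((new_frame << PAGE_SHIFT) | (pte & FLAGS_MASK))
            return rewritten

        # Invented example: two present PTEs and one not-present entry.
        page_table = [(0x1a2 << PAGE_SHIFT) | 0x63, (0x1a3 << PAGE_SHIFT) | 0x61, 0x0]
        frame_map = {0x1a2: 0x2f0, 0x1a3: 0x2f1}
        print([hex(pte) for pte in rewrite_page_table(page_table, frame_map)])
        # -> ['0x2f0063', '0x2f1061', '0x0']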

    As an extension, you could modify the guest operating system to make suspend/resume more efficient, perhaps by evicting the entire buffer cache, hence minimising the suspended state.

    Contact: Ian Pratt, Keir Fraser or Steven Hand

  • Restartable Device Drivers
    Like most operating systems, the Xen Virtual Machine Monitor runs its device drivers in the most privileged execution level (ring 0 in x86 terminology), giving them unrestricted access to memory, I/O addresses, privileged instructions etc. Unfortunately, many device drivers are not written by programmers who actually understand the subtleties of writing kernel code. Forgetting to turn interrupts on, or accessing an un-mapped page are common mistakes. Unfortunately, these simple errors can result in the OS/VMM crashing or just hanging.

    System reliability could be significantly improved if device drivers were moved out of 'ring 0', and executed in a somewhat more restricted environment where bugs can do less harm. For example, only giving device drivers I/O access to the particular piece of hardware they're supposed to be controlling, limiting what memory they can write to, ensuring that interrupts aren't turned off for long etc. If a device driver then causes a page fault, or violates any of the assertions, it can then just be stopped, and potentially restarted, hopefully clearing the fault without any users even noticing.

    Of course, some faults (like DMA'ing a received network packet to a bad address) can't be protected against, but hopefully that kind of error isn't common. To do this project you should have a decent knowledge of device drivers, and a bit about x86 memory protection.

    Contact: Ian Pratt or Keir Fraser

  • Copy-on-write Filesystem
    The Xen Virtual Machine Monitor shares the storage capacity of underlying disks between guest operating systems (guestOSes). In many cases, however, several guestOSes may be running a similar system image (e.g. several virtual machines each hosting Apache on Linux) and it would be nice to minimize the storage cost where possible.

    One way of achieving this is to provide a copy-on-write file-system where each guestOS has a private copy only of those files it creates or modifies and retains a shared copy of all files it accesses read-only. Work within the SRG is already close to achieving this in the context of NFS, but it would also be interesting to build such a system without requiring interaction between (potentially competing) virtual machines.

    This project would implement a copy-on-write filesystem for Linux / XenoLinux which allowed the use of an ordered set of partitions / devices to hold a single logical file-system image, somewhat like "nested mounts" in BSD. The overall system would be evaluated in terms of time overhead and space saving for a set of common application scenarios. If time permits (and code is ready) a comparison may also be made against the NFS-based scheme.
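
    The lookup and copy-up semantics can be sketched in a few lines of user-space Python (paths and method names invented); the real project would implement the same idea as a Linux / XenoLinux filesystem with proper attention to directories, deletion and concurrency.

        import shutil
        from pathlib import Path

        class LayeredFS:
            """Toy copy-on-write view over an ordered set of directories.

            The first layer is private and writable; the rest are shared and read-only.
            Reads search the layers in order; modifications copy the file up into the
            private layer first, so the shared layers are never touched.
            """
            def __init__(self, private_dir, *readonly_dirs):
                self.layers = [Path(private_dir)] + [Path(d) for d in readonly_dirs]

            def _find(self, name):
                for layer in self.layers:
                    if (layer / name).exists():
                        return layer / name
                raise FileNotFoundError(name)

            def read(self, name):
                return self._find(name).read_bytes()

            def append(self, name, data):
                target = self.layers[0] / name
                if not target.exists():
                    try:
                        shutil.copy(self._find(name), target)   # copy-up before modifying
                    except FileNotFoundError:
                        target.write_bytes(b"")                 # brand new private file
                with open(target, "ab") as f:
                    f.write(data)

        # fs = LayeredFS("/tmp/guest42", "/srv/base-image")   # invented paths
        # fs.append("motd", b"hello")   # copies the shared file up, then modifies the private copy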

    Contact: Steven Hand

  • Querying File System
    Contemporary filing systems map names to files by using a hierarchical lookup scheme built from nested directories. This allows a relatively efficient name resolution algorithm along with considerable flexibility for the user. Search paths or naming conventions may be used to prefer a given version of a file. In many cases, however, users would prefer a scheme which allows them to retrieve or specify files based on queries: boolean expressions relating to meta-data such as modification date, owner, type, keywords, etc. Such operations may return 0, 1, or many files.

    This project will investigate suitable schemes for both on-disk and in-core data structures to support a query filing system (QFS). A prototype implementation will be developed at user-space level (e.g. under Linux using the UserFS library). As a possible extension, information about queries could be used to provide "hints" to the buffer cache and prefetching scheme and so improve performance.

    Note: this project has been done successfully a couple of times before; interested students may like to look at the Part II project dissertations in the library for accelerated background reading.

    Contact: Steven Hand

Hardware projects

  • Fun with solid-state accelerometers
    It's now possible to buy miniature two-axis accelerometers that output very accurate acceleration data in digital form. This can be integrated to determine relative velocity or even position. This project would involve building some simple hardware (perhaps interfacing to a serial line on a PDA), and writing software to use the acceleration data. For example, you could build a gesture-based UI, an auto-updating map, a magic pen, a tracking system for a radio-controlled car, etc.
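
    The integration step itself is simple, as the Python sketch below shows (sample values invented); the interesting part of the project is that sensor bias and noise make the velocity and position estimates drift, so the software has to calibrate and correct as well as integrate.

        def integrate(samples, dt):
            """Integrate (ax, ay) samples (m/s^2) at interval dt into velocity and position."""
            vx = vy = x = y = 0.0
            prev_ax, prev_ay = samples[0]
            for ax, ay in samples[1:]:
                # Trapezoidal rule for velocity, then a simple step for position.
                vx += 0.5 * (prev_ax + ax) * dt
                vy += 0.5 * (prev_ay + ay) * dt
                x += vx * dt
                y += vy * dt
                prev_ax, prev_ay = ax, ay
            return (vx, vy), (x, y)

        # Invented burst: accelerate along x at 2 m/s^2 for 0.5 s, then coast for 0.5 s.
        samples = [(2.0, 0.0)] * 50 + [(0.0, 0.0)] * 50
        print(integrate(samples, dt=0.01))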

    Contact: Ian Pratt

  • Software AM radio / Location using AM radio signals
    High-speed analog-to-digital converters now make it possible to digitize the entire AM radio band and then perform the 'tuning' in software on a PC. The first phase of this project is to assemble the hardware and software to build such a "software" radio (we have a high-speed ADC PCI card that could be used). One direction the project could take would be to use AM radio signals for location: AM radio stations emit a very stable carrier frequency from a fixed location (the transmitter). By using a digital radio to monitor phase changes in the carrier frequency, it should be possible to tell whether a mobile device has moved towards, or away from, the transmitter. By tuning into several AM stations simultaneously it should be possible to recover accurate position information.
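
    The 'tuning in software' step is essentially mixing followed by low-pass filtering. The numpy sketch below recovers one station from a synthetic two-station band; the sample rate, carrier frequencies and crude moving-average filter are all invented and far simpler than what a real capture would need.

        import numpy as np

        FS = 200_000                                  # invented sample rate of the captured band
        t = np.arange(FS) / FS                        # one second of samples

        # Synthetic band: two AM stations on different carriers, each carrying a test tone.
        audio1 = np.sin(2 * np.pi * 440 * t)
        audio2 = np.sin(2 * np.pi * 1000 * t)
        band = ((1 + 0.5 * audio1) * np.cos(2 * np.pi * 60_000 * t)
                + (1 + 0.5 * audio2) * np.cos(2 * np.pi * 75_000 * t))

        def tune(band, carrier_hz, fs=FS, audio_cutoff=5_000):
            """Software tuning: mix the chosen carrier down to 0 Hz, then low-pass to get audio."""
            t = np.arange(len(band)) / fs
            mixed = band * np.exp(-2j * np.pi * carrier_hz * t)       # shift the carrier to baseband
            taps = int(fs / audio_cutoff)
            kernel = np.ones(taps) / taps                             # crude moving-average low-pass
            return np.abs(np.convolve(mixed, kernel, mode="same"))    # envelope = AM audio + DC offset

        recovered = tune(band, 60_000)   # oscillates at 440 Hz: station 1's audio, not station 2's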

    This project could take two forms: a student interested in hardware could build the digital radio, or perhaps use a standard PCI card with a high-speed ADC. Alternatively, the work could be done entirely in simulation. Either way, there's loads of cool DSP stuff to be done.

    Contact: Ian Pratt

Security projects

  • Secure packets
    The goal of this project is to design and build a proof-of-concept implementation of a networking protocol that would enable "opportunistic encryption" to be used whenever two Internet hosts wished to communicate. The aim is to achieve this in as efficient a way as possible. For example, the payload data contained in the first few packets of a new flow would be encrypted using a public key cipher, while a key exchange takes place piggy-backed on top of the data packets to enable a symmetric cipher to be used for the remainder of the flow. Multiple TCP and UDP connections between the same hosts could make use of the same tunnel, which would be timed out after being idle for some period.

    A key extension to the project would be to prevent a class of denial of service attack whereby a host or hosts bombard a receiver with flow-setup packets that it needs to decipher using an expensive PK algorithm. One way to try and avoid this problem would be to use "crypto puzzles", whereby the sender needs to do a significant amount of computation in order to generate a ticket that is attached to the packet. The puzzle is asymmetric in that the receiver is able to easily check whether the ticket is valid for this particular packet. Hence, it's much harder to create a flood of packets that overloads the receiver.
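
    A hashcash-style puzzle illustrates the asymmetry (the difficulty value below is an invented tuning knob): the sender must search for a nonce whose hash has enough leading zero bits, while the receiver verifies it with a single hash.

        import hashlib
        from itertools import count

        DIFFICULTY = 18          # leading zero bits required; tunes the sender's work factor

        def leading_zero_bits(digest):
            bits = 0
            for byte in digest:
                if byte == 0:
                    bits += 8
                    continue
                return bits + (8 - byte.bit_length())
            return bits

        def make_ticket(packet):
            """Sender: search for a nonce making SHA-256(packet || nonce) start with zero bits."""
            for nonce in count():
                digest = hashlib.sha256(packet + nonce.to_bytes(8, "big")).digest()
                if leading_zero_bits(digest) >= DIFFICULTY:
                    return nonce

        def check_ticket(packet, nonce):
            """Receiver: one hash verifies the sender did about 2^DIFFICULTY work on average."""
            digest = hashlib.sha256(packet + nonce.to_bytes(8, "big")).digest()
            return leading_zero_bits(digest) >= DIFFICULTY

        packet = b"flow-setup: 10.0.0.1 -> 192.0.2.7, key-exchange payload..."
        nonce = make_ticket(packet)          # costly for the sender
        print(check_ticket(packet, nonce))   # cheap for the receiver -> True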

    Contact: Ian Pratt

Suggestions from Intel Research Cambridge

  • Schedulers for Virtual Machines
    Xen is a virtual machine monitor (primarily developed within the NetOS group at the Computer Lab) capable of running multiple guest operating systems concurrently. The primary guest OS is Linux, with others in the process of being ported.

    Xen currently implements one particular scheduling algorithm. The aim of this project is to implement a number of different scheduling algorithms and compare their characteristics against each other. As part of this project it is anticipated that a general interface for scheduling algorithms will be designed and implemented. The project could be extended to address issues of process migration in multi-processor and multi-threaded systems.

    Contact: Rolf Neugebauer

  • Control Plane Virtualization
    In addition to forwarding data packets, IP routers run a number of control plane protocols. Normally such protocols run on a single processor, which can lead to resource contention issues that can impact reliability for the entire network. For example, within a single routing stack, different protocols - such as BGP and OSPF - may contend for memory and CPU resources, and starvation can cause the loss of routing adjacencies and network-wide disruptions. Or, in the area of Virtual Private Networks (VPNs), a single router might be supporting multiple disjoint VPNs. It is difficult to implement control-plane protocols so that instability in one VPN cannot impact other VPNs due to shared resource contention. Yet without such guarantees such VPNs are not strictly isolated from one another.

    In general we are interested in exploring the application of more advanced operating system mechanisms, in particular virtual machine technology, to ameliorating such control plane problems. We have a number of ideas about possible Part II projects in this area involving Network Processor (IXP) programming, operating system kernel development (in particular related to the Xen virtual machine monitor), and general control plane software architecture.

    Contact: Tim Griffin and Rolf Neugebauer

  • Ad-hoc Name Space Service
    Very often we want to communicate or locate other people using our laptops or PDAs while in indoor environments, for instance, in a meeting room or inside a large building. It seems an easy task when network infrastructure and services to support this are locally available. However, sometimes this is not the case either because no infrastructure can be found or simply because we don't want to rely on external services such as centralised messenger servers.

    With 802.11 wireless networks in peer-to-peer mode, we can set up an ad-hoc network so that machines can communicate directly with each other without using an Access Point. Although this gives us the necessary communication platform, basic services still need to be made available.

    This project aims at developing an ad-hoc name service for ad-hoc networks that can be used by applications such as instant messaging. This service should be designed and implemented on top of a distributed tuple space, similar to Linda (see D. Gelernter, "Generative communication in Linda", ACM Transactions on Programming Languages and Systems, 7(1):80-112, Jan. 1985), where each participating node holds part of the entire tuple space. A tuple is a data structure consisting of one or more typed fields. ("foo", 75, false) is an example of a tuple with three typed fields: a string ("foo"), an integer (75) and a boolean (false). The Linda framework offers operations to insert/delete tuples to/from a tuple space. To validate the prototype implementation of this service, a simple instant messaging application may be developed.
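
    A single-process Python sketch of the three classic tuple-space operations is shown below (the name-registration tuples are invented); the actual project would partition the space across the nodes of the ad-hoc network and add blocking and remote variants of these calls.

        class TupleSpace:
            """Minimal Linda-style tuple space: out inserts, rd reads, in_ reads and removes."""

            def __init__(self):
                self.tuples = []

            def out(self, tup):
                """Insert a tuple, e.g. ("name", "alice-laptop", "10.0.0.7")."""
                self.tuples.append(tup)

            def _match(self, template, tup):
                # None in a template is a wildcard; a type matches any value of that type.
                if len(template) != len(tup):
                    return False
                return all(want is None
                           or (isinstance(want, type) and isinstance(have, want))
                           or want == have
                           for want, have in zip(template, tup))

            def rd(self, template):
                """Read (without removing) the first tuple matching the template, or None."""
                return next((t for t in self.tuples if self._match(template, t)), None)

            def in_(self, template):
                """Read and remove a matching tuple ('in' is a Python keyword, hence in_)."""
                t = self.rd(template)
                if t is not None:
                    self.tuples.remove(t)
                return t

        space = TupleSpace()
        space.out(("name", "alice-laptop", "10.0.0.7"))    # invented name registrations
        space.out(("name", "bob-ipaq", "10.0.0.9"))
        print(space.rd(("name", "bob-ipaq", str)))         # -> ('name', 'bob-ipaq', '10.0.0.9')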

    If time allows, as an optional task, this name service may be implemented in a sensor network that we have available at Intel Research Cambridge.

    Contact: Marcelo Pias, Christophe Diot or Derek McAuley

  • Hash Functions in Networking - Performance Evaluation Kit
    Hashing is used extensively in contemporary networks: to improve the performance of various lookup and memory access schemes, to provide load balancing among multiple targets, and for various other tasks. Although selected publications exist, when designing a particular application it is often not easy to determine which hash function best suits the particular task and the particular input data distribution.

    In fact, it is generally impossible to evaluate a hash function on a theoretical basis alone, since for any deterministic hash function input data patterns can be constructed that cause it to perform poorly. Thus, a hash function can only be evaluated against data exhibiting similar properties to the data that the application is destined for.

    The authors are interested in hash functions as they apply to network traffic, and propose that this project develop a framework for quickly testing and evaluating any hash function against a representative set of networking traces, being rapidly able to compare the following criteria (a toy harness along these lines is sketched after the list):

    • computation overhead
    • number of collisions in various table sizes
    • entropy of input and output data
    • comparison to the best-known hash functions
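
    The sort of harness intended is sketched below in Python; the candidate hash functions and the stand-in keys are invented, and the real kit would replay flow identifiers taken from the supplied network traces.

        import hashlib, math, time
        from collections import Counter

        def evaluate(hash_fn, keys, table_size=1024):
            """Measure computation time, collisions and output entropy for one hash function."""
            start = time.perf_counter()
            buckets = Counter(hash_fn(k) % table_size for k in keys)
            elapsed = time.perf_counter() - start
            collisions = sum(c - 1 for c in buckets.values() if c > 1)
            probabilities = [c / len(keys) for c in buckets.values()]
            entropy = -sum(p * math.log2(p) for p in probabilities)   # maximum is log2(table_size)
            return {"seconds": round(elapsed, 4), "collisions": collisions,
                    "entropy_bits": round(entropy, 2)}

        # Candidate hash functions over byte-string keys.
        candidates = {
            "python_hash": lambda k: hash(k),
            "sha256_trunc": lambda k: int.from_bytes(hashlib.sha256(k).digest()[:8], "big"),
            "byte_sum": lambda k: sum(k),            # deliberately poor baseline
        }

        # Stand-in keys; the real framework would use 5-tuples from representative traces.
        keys = [f"10.0.{i % 256}.{i % 199}:{1024 + i}->192.0.2.1:80".encode() for i in range(20000)]

        for name, fn in candidates.items():
            print(name, evaluate(fn, keys))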

    Pre-requisites: basic programming, basic information theory

    Additionally, knowledge of MATLAB would be helpful, although this is not a prerequisite; access to MATLAB will be provided if necessary. Representative network trace data will also be supplied.

    Contact: L. Kencl (Intel) or A. Moore (Cambridge University)

  • Sensors for the Evaluation of Football Rules - Demonstrator and Visualizer
    The goal of this project is to develop a demonstrator of the usefulness of deploying sensors to monitor the movement of participants and of the ball in ball games. The work would consist of two parts:

    • simulate the collection of the input data from an existing match (processed video footage or available collected robot match data);
    • develop and implement scenarios for analysing the data and providing real-time feedback to the referee, coach, spectators, about various aspects of the game (rules confirmation, players activity, etc.).

    It may be possible to obtain additional data from the LCE regarding their robot football tournaments. There is also an option to process still images from existing games using the Virtual Referee Tool developed at the TECGRAPH Laboratory at PUC Rio de Janeiro: the tool would be applied to stills of football video footage to obtain the sample data for the project.

    Pre-requisites: basic programming, ball games knowledge helpful

    Contact: L. Kencl (Intel)

  • Query Optimization for Record Streams
    Many network measurement applications (route monitoring of BGP, OSPF or IS-IS, TCP trace analysis, HTTP trace analysis) operate on streams of data objects that can be thought of as records. When processing streams at high data rates, or large streams stored in trace archives, it is important to make data parsing and filtering as efficient as possible. Most applications are not interested in all available data, but only in a subset of records that can be identified by some Boolean predicate over the attribute values of each record. Stream processing can be greatly improved by using such predicates as input to a "query optimizer" that parses the minimal number of fields required to determine the result of the predicate.
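
    As a toy illustration of the idea (the record format, attribute costs and predicates below are all invented), attributes can be parsed lazily and the cheapest atomic predicates evaluated first, so that a record is rejected after touching as few fields as possible:

        import re

        # Invented record format: a whitespace-separated line, parsed lazily so a query
        # touches only the attributes it actually needs.
        FIELDS = {"proto": (0, str), "src": (1, str), "dst": (2, str), "bytes": (3, int)}

        class LazyRecord:
            def __init__(self, line):
                self.raw = line.split()
                self.parsed = {}

            def __getitem__(self, name):
                if name not in self.parsed:                  # parse on first use only
                    index, convert = FIELDS[name]
                    self.parsed[name] = convert(self.raw[index])
                return self.parsed[name]

        # Atomic predicates, each annotated with an (invented) evaluation cost.
        atoms = [
            ("proto_is_tcp", lambda r: r["proto"] == "tcp", 1),
            ("big_transfer", lambda r: r["bytes"] > 10_000, 2),
            ("dst_is_web",   lambda r: re.match(r".*:(80|443)$", r["dst"]) is not None, 5),
        ]

        def optimized_and(predicates):
            """AND-query 'optimizer': evaluate the cheapest predicates first and short-circuit."""
            ordered = sorted(predicates, key=lambda a: a[2])
            return lambda record: all(pred(record) for _, pred, _ in ordered)

        query = optimized_and(atoms)
        lines = ["udp 10.0.0.1:53 192.0.2.1:53 120",
                 "tcp 10.0.0.2:4411 192.0.2.7:443 250000"]
        print([query(LazyRecord(line)) for line in lines])    # -> [False, True]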

    This project will involve:

    1. Design of a simple specification language for record formats.
    2. Design of a simple specification language for the "atomic predicates" required for each attribute of a given record format specification. (Example: "use regular expression matching for attribute FOO.")
    3. Design and implementation of a "query optimizer" that optimizes the stream filtering of formatted records with respect to an arbitrary Boolean query constructed from atomic predicates.
    4. Testing the system on a collection of network trace data.

    Contact: Tim Griffin (Intel)