| External
This year, Intel Research Cambridge, who are in the Gates Building with
the Computer Lab and LCE,
have suggested some projects - see below for
some details. Other projects from them and from the LCE (Lab for Communicaitons
Engineering) are also within this doc.
 Miscellaneous
Route Planning on the London Underground
A couple of years ago there was a very successful diploma project
that developed a simple system for planning the fastest route
between pairs of London Underground stations, outputting which
line to take and which stations to change at.
 
The Underground map was treated as a graph in which the edges
were labelled with the travel time between adjacent stations. No
account was taken of the cost to change between lines, thus
simple application of Dijkstra can find the shortest path. 
One of the 'outputs' of the previous project was the Underground
travel time information in a machine readable form, and this
could be used as an input into this year's project and hence save
time allowing other extensions to be explored. 
One nice extension would be to develop a parallel version of the
route planning algorithm to enable the decision to be made more
rapidly, and hence be prepared for any massive expansion of the
Tube network ;-) 
A further extension might take train frequency into account, and
hence calculate 'expected', 'best case' and 'worst case' times
for the path. 
Alternatively, one could consider a different and much larger
network such as the UK rail network. The UK rail time table is
available as a PDF document, and this could be parsed into an
internal representation hence providing a more complex database
in which route lookups could be performed. 
This project will provide ample opportunity for you to show off
your computer science skills! 
Potential Supervisor:Jagdish Modi at89@dial.pipex.com / Ian Pratt ian.pratt@cl.cam.ac.uk
 
Special resources:For some of the extensions, access to a large multiprocessor
  machine will be provided by the the Systems Research Group.
 
References:  
 Wilson R., Introduction to Graph Theory, Academic Press, 1972
 Hollow S., Route Planning on the London Underground,
  Diploma project, Computer Laboratory 2001
 http://www.nationalrail.co.uk/enquiries/nrt_upd.asp
 
|  |  Programming language projects
Easy lock-free programming
Some of our recent research on lock-free programming has been
looking at how to provide concurrency control primitives that are easy
to use (i.e. hopefully more natural constructs for mainstream
programmers than things like the mutexes, condition variables and
semaphores covered in Part 1B CS&A).
 
One promising approach is to automate the difficult aspects of
the programming and to allow the programmer to use a higher level
syntax.  For example, to transfer a value between two hash tables they
could write:
 
        atomic {
          Object val = h1.remove(key);
          h2.insert (key, val);
        }
We have an initial implementation of this built of a Software
Transactional Memory (STM).  Essentially, the STM provides
read-word and write-word operations and a way of
grouping them together into atomic transactions.  Our paper at OOPSLA 2003 has more
details.  In many cases this system performs better than a
simple implementation using the synchronized keyword.
 
There are a number of possible student projects that could build on
this work:
 
 Our current implementation has focused on the STM; the rest is a
mess of unreleasable code.  In particular, the compiler support for
mapping atomic onto a series of STM operations would benefit
from being thought out more clearly and implemented more carefully.
 As part of his PhD work, Keir Fraser has
developed an alternative STM with a different object based
interface.  It would be interesting to explore whether this can also
be used to implement a construct such as atomic.
 
Both of these projects would involve design work followed by
modification to the javac source-to-bytecode compiler.
 Contact: Tim Harris
 
 
Log-file query language
Numerous research projects involve generating and analysing log
files - for instance traces of activity recorded by syslog, traces
of operating systems or library services, traces of network
packets, or of object allocations and de-allocations.  Many people
assemble assortments of ad-hoc tools and scripts for analysing
these files.
 
Existing data query languages (e.g. SQL) are rarely suited to these
problems.  For example, consider how to express "calculate the
total number of bytes read from the file xyz" given a list of the
system calls that a process has invoked.  It is necessary to find
each "read" system call, then to find the most recent "open" call
which returned the file descriptor that the read specifies, and
then to form the total of the number of bytes returned by those
reads.
 
Many examples seem to involve joins involving notions of "most
recent" or "closest subsequent".  For example, finding the most
recent "write" that preceded a given "read" and making sure that
the data written matches the data read.
 
This project is to design an effective system for analysing log
files.  One option would be to extend the language supported by an
existing tool such as the mysql database, although you would need
to agree the exact strategy with your supervisor.
 Contact: Tim Harris
 
 
|  |  System modelling projects
Running a combinatorial auction
Combinatorial auction designs have been proposed in a variety of
situations, notably within the telecommunications industry where they
have been considered for the allocation of radio spectrum and for the
distribution of subsidies for universal service.  This project will
design and implement such an auction using a recently proposed auction
design with improved (in fact, Polynomial) computational performance
relative to standard designs (which are NP-complete).  The project
will implement the Auctioneer's process for accepting and selecting
winning bids from remote bidders.  Since the auction design includes
multiple rounds a format will need to be designed to publish interim
results to all bidders.
 
For more details see: Frank Kelly and Richard Steinberg A Combinatorial
Auction with Multiple Winners for Universal Service  Management
Science 46 (2000) 586-596.  
 Contact: Richard 
Gibbens 
Performance modelling tool for enterprise IP networks
Performance modelling techniques originally applied to
circuit-switched telephony networks (with great success) have recently
been modified and proposed for IP networks.  This project will involve
the design and implementation of a modelling tool using performance
modelling techniques based on fixed-point approaches.  The project
will include both the construction of the computational methods as
well as the necessary interfaces to suitable input and output formats
in order to complete a modelling tool suitable for network designers.
 
For more details see: R. J. Gibbens, S. K. Sargood, C. Van Eijl, F. P. Kelly,
H. Azmoodeh, R. N. Macfadyen, N. W. Macfadyen Fixed-Point
Models for the End-to-End Performance Analysis of IP Networks In
Proc 13th ITC Specialist Seminar: IP Traffic Measurement, Modeling
and Management, Sep 2000, Monterey, CA.
 Contact: Richard 
Gibbens 
 
|  |  Networking projects
Receiver Layered Multicast
Network audio and video require reasonably high bandwidth and
relatively low jitter (variation in delay). These 
requirements make it difficult to deliver high fidelity data directly
from the network. However these media types may be encoded in a 
layered format; i.e. as a number of layers 1, 2, 3, ... n.
 
The receiver may choose to decode any prefix of the sequence,
obtaining increased fidelity in each case.
It is possible to distribute audio and video in a layered format 
where each of n layers is given a 
different multicast "channel". Receivers 
may then decide to receive the first k of these, and to decode
(and render) the first j <= k of these in 
order to maintain real-time 
rates and/or to match the fidelity to the characteristics of the 
receiving terminal.
 
This project would choose or construct a layered encoding scheme for
either video or audio (or both), and develop transmitting and
receiving software. 
As a first step, broadcast on a local ethernet could be used 
in the place of multicast, although it would be nicer (and more valid)
to use multiple networks and a "proper" multicast protocol.
 Contact: Steven
Hand
 
Predicate Routing
In current IP networks, the routing state of the system is primarily
represented as a set of routing tables (local to each router) and a
set of filtering rules (also local to each router or firewall).  In
contrast, Predicate Routing represents the state of the network as a
set of boolean expressions associated with links which assert
which kinds of packet can appear where.  From these
expressions, routing tables and filter rules can be derived
automatically.  Conversely, the consequences of a change in network
state can be calculated for any point in the network (link,
router, or end system), and predicates derived from known
configuration state of routers and links. This subsumes notions of
both routing and firewalling. More details can be found in the 
following  
paper.
 
This project would involve designing and implementing a network 
control plane for predicate routing, probably in centralised manner. 
An extension would be to develop an inference tool to alllow one 
to derive properties of the network. 
A more ambitious project is to attempt to build a distributed version 
either in simulation (for example using SSF), or by packaging up a 
real implementation (e.g. Zebra). 
 Contact: Steven Hand
 
Audio Networking
Networking devices together using audible sounds is an area 
that has been much overshadowed by more exotic wireless technologies.
However, it has a number of attractive characteristics, such as requiring
very low power, providing tangible feedback to the user that something
is being transmitted, and depending on existing ubiquitous hardware
(normal microphones and speakers).
 
The challenge is to write the software necessary to allow two devices
to communicate with each other using only audible sounds.  The coding
scheme between the two devices is entirely up to your choice, and you
can choose either to aim for a high bandwidth between the two machines,
or other characteristics (for instance, masking the sound in a melodic
way to make it sound nice).  You must not require any equipment beyond
standard consumer-grade microphones and speakers.
 
For more advanced work, you can investigate coding schemes such as 
CDMA, which offer very robust transmission rates at the expense of
greater complexity.
 
This project will require a good working knowledge of Digital Signal
Processing (DSP) and coding and compression techniques.
 Contact: Anil 
Madhavapeddy 
Phoneme Encoded Communications
Human speech can be decomposed into sequences of phonemes in a manner
analogous to the construction of words from an alphabet.
This project involves exploring the use of techniques for encoding speech as
phoneme sequences as a way of reducing the bandwidth required for
telecommunication and decreasing the storage requirements for archiving.
This could entail exploring suitable encoding and decoding schemes,
design of an accessible audio archive storage system, and implementation
of data transport protocols.
 
A GPRS experimental testbed is available in the lab and the project
could explore use with a mobile handset either pushing the raw audio
stream to a dedicated server or encoding and decoding directly on the
device.
 Contact: Chris Clark 
 or Rajiv Chakravorty 
TCP Ideal
The Internet design makes for a clean seperation of routing concerns
into the IP packet layer (level 3), and the end-to-end reliability and
flow control concerns within the TCP transport layer (level 4).
Resource management crosses both of these layers, as do questions of
flow multiplex identification and so forth.
XCP and SCTP represent protocol departures from this split, with some
elements of functionality operating across both layers. 
This project would look at implementing such a
TCP/IP stack, but with optimal packet design 
(all packets exchange all state with
all concerned nodes), and a TCP optimiser which then removes redundent
state dynamically (there's work at berkeley on this)
 
One way to start this might be with SCTP (see RFC2960 (www.ietf.org)
for details) - or also XCP
 Contact: Jon 
Crowcroft 
Visual Wireless Comms
Looking at my old laptop, I noticed that it was "lookin at me" and I thought: 
Why not write a comms protocol that uses the camera/screen as a
wireless communications link: In principle, we
should be able to get a couple of megabits (provided screensaver doesnt
kick in:-)
 
A back of the envelope calculation from first principles says that
we can get 10 frames a second of raw vga resolution, so thats
640x480*10*24bps raw (75Mbps:-) approximately.
 
So this project would build an
OSI model and implementation of a protocol stack for
for the "pixelated internet".
 
Physical layer - photonic:-)
Link later - we need to have a modulation technique,
but need a framing and error correcting code. We would devise
a training session where we put "pink" noise on the screen so
that the decoder behind the camera can find:
registration/relative angle of camera ccd plane and screen;
color/pixel's per bit; feasible rate (may vary across screen too);
We could use min/max entropy here possibly.
Over this we might use a simple framing protocol such as PPP.
Network=IP
transport=TCP
session layer=reboot/shutdown:-)
presentation layer=lightenupmyday
application layer=whatever, e.g. rsynch, http, etc
 
A nice excercise, but also potentially useful.
 Contact: Jon 
Crowcroft 
Investigate the feasibility of applying sentient computing techniques in experimental science labs
Sentient computing [1] is the proposition that applications can be made more responsive and
useful by observing and reacting to the physical world. It involves the deployment of location tags and other physical sensors and the use of sensor data to generate logs and data models of the physical world.
 
In experimental laboratories sensors can be used to automate the recording of experiments, log the conditions under which they were carried out and the apparatus used. Errors in experimental procedure could be more easily and quickly detected and the monitoring and recording of the experimental procedures might be simplified. Some relevant work has been done at the University of Washington [2]. 
 
Sentient computing is an important branch of pervasive/ubiquitous computing and there is a substantial literature on these topics. [3] and [4] are good starting points for reading.
 
Project goal
 
To identify situations in a science lab where sentient computing techniques will add to the effectiveness or the accuracy of the experimental procedure and to produce a practical demonstration of feasibility.
 
Project activities
 
1. Two illustrative application scenarios are given below. But further work on the identification of requirements forms a part of the project. Additional scenarios and detailed requirements will be constructed after a careful look at the activities in a science laboratory. This study will necessarily be quick, but it should be in sufficient depth to provide a good understanding of the working methods of the experimenters.
2. Design and develop a prototype system using at least one type of sensor (e.g. location tagging, RFID tags, weight, light, sound sensing) that provides useful information to the experimenters.
 
Illustrative examples
1. Laboratory inventory management
Chemistry and biology labs are complex places containing a large amount of apparatus and many compounds. Experimenters may find themselves spending too much time tracking down or recording the location and status of apparatus and materials.
If items of apparatus and materials were electronically tagged a basic inventory audit system could be deployed to record the location and status of each item. An initial approach might tag only those items that are safety-critical or whose location is critical to the success of experimental procedures. 
2. Secure experiment log
When a chemist undertakes a practical experiment, the apparatus could produce a digitally-signed log file. Not only would such a log constitute a (tamper-proof) record of the experiment, it would also allow the experiment to be replayed by another scientist. If the work was done by a student, the log file could be assessed by a robotic examiner!
It would be necessary to uniquely identify each sensor and the raw materials used. The products could then be traced in their subsequent use by the same or other scientists in further experiments, much as an automotive factory can track every nut and bolt back to the batch of steel from which they were forged. The audit trail of chemical experiments could then be traced from one experiment to the next: for example, if a lab technician prepares some of chemical X to be used the following week by a colleague, all the details of the preparation would be recorded. X might be stored on a shelf in an RFID-labelled beaker whose tag connects that batch of X to the history of its creation, and the creation of its ingredients, and so forth back to some trusted entry point. If, when used in a subsequent experiment one beaker of X gives low yield, or an impurity of some kind, etc. the logs could be used to find the problem.
The goal of a student project might be to write a program using suitable database technology to log data from experiments, store it in a database, link experiments together (e.g. using RFID-labelled beakers), and produce digitally-signed log files analogous to a lab book write-up.
An extension to the project might be to query the database in other ways: e.g. to trace potential chemical contamination by answering questions such as, "for what else has this pipette been used recently?" 
The student could extend the basic ideas in many ways. For example, they might want to use heartbeats to ensure that no-one disconnects the sensors during an experiment.
 
Key computer science skills and knowledge would be needed to address the following:
 
i) use of appropriate database to store data;
ii) use of appropriate encryption algorithms to provide security / authenticity / tamper-resilience / and even for privacy if experimenters want to keep their top-secret chemical research from others!
iii) interaction with hardware to gather sensor data: the LCE would have to provide sensors and (say) a means of "pinging" the sensor to check that it is alive. The student could then write a protocol that "pings" regularly to make sure the hardware is OK.
 
Resources
 
Access to the Active Bat sensor network and SPIRIT location management system in the Laboratory for Communication Engineering. Support in the development of hardware and software for the prototype application.
 
References
 
1. Andy Hopper, The Royal Society Clifford Paterson Lecture, 1999 - Sentient Computing .Phil. Trans. R. Soc. Lond. A (2000) , Volume 358, Pages 2349-2358, Royal Society, August 2000. http://www-lce.eng.cam.ac.uk/publications/files/tr.1999.12.pdf
2. The Labscape Project - Ubiquitous Computing in the Cell Biology Laboratory, http://labscape.cs.washington.edu/
3. M. Weiser, "Some computer science issues in ubiquitous computing." Communications of the ACM , July 1993. Pages 75-84.
4. M. Satyanarayanan, "Pervasive Computing: Vision and Challenges." IEEE Personal Communications , August 2001.  http://citeseer.nj.nec.com/satyanarayanan01pervasive.html
 Contact: Jon
Crowcroft 
Network Protocol Extensions for Linux
The networking aspect of modern operating systems cater very much to
TCP/IP, the protocol used on the Internet.  There are, however,
situations in which it would be desirable to use alternate protocols,
without modifying applications written to use IP.  Two examples of such
protocols are as follows:
  
  
 BLT - high-speed ethernet streaming.  IP presents an unnecessary
overhead for communicating between hosts on a local high-speed LAN.  A
simple protocol that provides only flow control may be used to achieve
much higher throughput.
 NAT traversal - It is impossible for two hosts, each behind a
network address translator (NAT) to address eachother, as neither has an
address in the Internet's address space.  One solution to this problem
is to use a rendezvous point that both hosts can connect to and forward
messages through the midpoint.
 
To complete this project, you will construct a shim layer in Linux.
This layer will allow alternate protocols to be installed, and mapped
onto applications using the existing network API.  Here is an example:
The new BLT protocol is installed, and a user-level tool allows me to
map a new identifier, "fileserver-FAST" to indicate that I want to use
BLT instead of IP to connect to the host fileserver.  A call to
gethostbyname for "fileserver-FAST" returns an unroutable IP address
(i.e. 0.0.0.5) and sets up a mapping between that IP and the new
protocol stack in the kernel.  When an application tries to connect to
0.0.0.5, the socket they receive is remapped to use the new protocol
instead of IP. 
Successful completion of this project will implement the shim layer in
Linux, and the BLT protocol (1).  The NAT traversal is bonus effort.
 Contact: Andy 
Warfield 
400Mb/s WLAN emulation
Groups around the world are putting forward specifications for the
next generation Wireless LAN standard, which will operate at up to
400Mb/s. However, at these high speeds there are serious problems with
the current 802.11 Media Access Control standard, which needs to be
modified to allow multiple IP packets to be sent in the same MAC
frame, and to allow IP data to be 'piggy backed' with MAC
acknowledgements. The trouble is, no one really knows what effect this
batching will have on the performance of applications (Web, real
player etc). People have done simulations, but real-world TCP/IP
implementations are very different from the simple ones used in
simulation.
 
The aim of this project is to build a high-performance WLAN
emulator. It will be a PC with 2 or more Gigabit Ethernet cards
representing the wireless base station and one or more wireless
clients. The emulator will forward packets across the interfaces, but
inserting delays (and possibly errors) as dictated by an internel
simulation of the MAC protocol under investigation. Using the
emulator, we can use machines running a range of operating systems and
applications to find out what the performance of the system is likely
to be like in real life. We can then use this information to develop
better MAC protocols.
 
This project is best done by someone with experience in 'C', and an
interest in WLAN and Ethernet protocols. You might even end up with
your name in the new IEEE WLAN standard document ;-)
 Contact: Ian Pratt 
Triggers for Wireless Networks
Link_Up and Link_Down triggers has been used in MobileIP and Ad Hoc
Network. The se triggers provide inputs to routing protocol and have
significant effects on p erformance of wireless network. In general,
it notify routing protocol when sign al strength is above or below
predefined theshold levels. Apparently, this metho d is not able to
cope with fluctuation of signal strength under dynamic mobility
condition.
 
In this project, statistical information of link signal/noise ratio,
queue, CPU utilisation, number of retries and energy are to be
exchanged between adjacent n odes. You are expected to explore into
proactive triggers by using these raw inf ormation. Successful
triggers should able to exhibit high accuracy of near future event.
 Contact: Meng How Lim 
Fun with wireless networking equipment and PDAs
This isn't so much a project description, as note to say that the SRG
has a number of GSM/GPRS-capable phones and datacards, Wireless LAN
cards, and iPAQ PDAs that could be used in the execution of part II
projects, providing you can come up with a cool application or
service to use them for. We can provide help with configuring the kit,
and hosting the server-side component of any applications.
 Contact: Ian Pratt
 
 
|  |  Distributed Systems projects
Co-operative Web Servers
Having a reliable and fast WWW presence is important for many
organisations, including the University.  This is pretty tough for
small or under funded organisations, particularly in the presence of
machine or network failures, or unexpected spikes in demand (the
'slashdot effect').  However, the situation might be improved if a
number of sites could agree to co-operate and transparently mirror,
cache, and serve content on behalf of each other.
 
Suppose we had a group of e.g. 50 organisations across the world, each
running their own server on a single PC.  For each web site, a couple
of nodes in the group would be selected as secondary replicas, and
content mirrored to them. The machines would monitor each other, and
use dynamic DNS to update the list of available hosts capable of
serving the web site. If a demand surge is experienced by a web site,
we would like nodes in addition to the replicas to start caching and
serving content on behalf of the replicas. We can do this by having
the replica nodes re-write links in the web pages returned to clients
such that they point at other servers, which would then fetch and
cache the content from the replicas.  Ideally, we'd point clients at
the server closest to them, to get best performance.
 
The system would be as much as possible `self-organising' without
requiring sysadmin intervention. It would generate reports showing
what work had been done on behalf of other sites, so the operators
could identify 'free riders' and ask them to leave the federation.
There's loads of good computer science in this project, though it
would ideally require someone with experience using Apache and Squid.
It could also have serious practical impact if done well.
 Contact: Ian Pratt 
Distributed Execution Tracing
PlanetLab is a set of over 150 machines distributed amongst research
institutions across the world. It's a testbed to enable experimental
deployment of new distributed services and algorithms e.g. peer to
peer networks.  We have arranged for the kernels on these machines to
run Linux Trace Toolkit (LTT), to gather execution information of the
prototype applications which are running on them. We have a research
project which is attempting to harvest this tracing information from
multiple machines, and combine the traces to gain insight into the
patterns of communication and computation of these applications, and
perhaps be able to make predictions as to how they would perform when
scaled to larger numbers of machines as might happen in a real
deployment.
 
The goal of this project would be to take the same tracing information
on each node, but to selectively make it available to the actual
applications running on the node, enabling them to adapt and tune
their own behaviour based on the information observed. One part of
this project would be implementing something akin to the "Berkeley
Packet Filter" (normally used by tcpdump/libpcap) for trace events,
enabling an application to selectively filter the set of events it
wants to be informed about, and for the OS to determine which events
an application should be allowed to receive (e.g. not events generated
by applications run by other users).  The second part of the project
is to modify an application such that it optimises its behaviour based
on information gleaned from the trace messages.
 Contact: Ian Pratt 
 Trust and reputation management
Incentives for honest participation in trust management (Pinocchio)
Peer-to-peer systems, on-line auction sites and
public computing platforms often employ trust management systems to allow users
to share their experiences about the performance of other users in such
settings (for example, the amazon marketplace and eBay ratings schemes).
However, the success of these trust management systems depends heavily on the
willingness of users to provide feedback.  These systems have no mechanisms to
encourage users to participate by submitting honest information.  Providing
rewards is an effective way to improve feedback, according to the widely
recognised principle in economics which states that people respond to
incentives.
 Our view is that under these conditions the users who participate in the
trust management scheme by submitting information about their interactions with
others are, in fact, pursuing ``hidden'' rewards, often with unwanted effects.
For instance, in the eBay case, there is strong empirical evidence to suggest
that buyers and sellers advertise positive feedback regarding each other,
seeking to increase in their reputation via mutual compliments. In this case,
the reward implicitly offered by the system is the possibility of getting a
positive review about oneself. Also, people who have had particularly bad experiences will be normally more
inclined to advertise their experiences as a form of revenge against the user
that did not provide the desired service. Such hidden rewards bias the feedback
system; users who have had average experiences with other users and are not
aiming at increasing their reputation or seeking revenge against a bad service
provider will have little reason to provide feedback. An explicit reward system
has the advantage of attracting those users across the board. This project will build Pinocchio, a framework for providing explicit
incentives for honest participation in global-scale distributed trust
management infrastructures.  The system will improve the quality of information
supplied by these systems by reducing free-riding and encouraging honesty.  Our
approach will be twofold: (1) we will provide rewards for participants that
advertise their experiences to others, and (2) impose the credible threat of
halting the rewards, for a substantial amount of time, for participants who
consistently provide suspicious feedback. For this purpose we have developed an
honesty metric which can indicate the accuracy of feedback.  More information about this project can be found in our paper in the  2nd International 
Conference on Trust Management. Contact: Evangelos Kotsovinos
 
|  |  Operating System projectsMost of these are based around 
Xen, a Virtual
Machine Monitor for x86, which lets you run multiple operating system
images at the same time on the same PC hardware, with unprecedented
levels of performance and resource isolation. Even under the most
demanding workloads the performance overhead is just a few percent:
considerably less than alternatives such as VMware Workstation and
User Mode Linux. This makes Xen ideal for use in providing secure
virtual hosting, or even just for running multiple OSes on a desktop
machine.
The systems research group is continuing to develop and do research
using Xen, but there are a couple of areas which we feel would make
good self-contained student projects: 
 
Resource Scheduling and Accounting
Xen contains resource schedulers to share out the physical resources
of the hardware among the different virtual machines running on Xen.
Currently, our schedulers for the CPU, network interface and disk are
fairly rudimentary, and geared toward providing a "fair share" rather
than providing controls to guarantee (or restrict) a guest OS to
particular pre-agreed quota.
 
There are plenty of algorithms in the literature for schedulers which
do this. The goal of this project will be to select appropriate
algorithms, implement them, and perhaps enhance them for the virtual
machine environment.  Previous experience with C would be useful for
this project. You'll have plenty of good data structures and
algorithms stuff to write up in the dissertation.
 Contact: Ian Pratt
or Keir Fraser
 
Logging, Auditing and Denial of Service prevention
Xen is a key component of the XenoServers project,
which has the grand vision of creating an "Open Infrastructure for
Global Distributed Computing".  We envisage a world in which
Xenoserver execution platforms will be scattered across the globe and
available for any member of the public to submit code for
execution. The sponsor of the code will be billed for all the
resources used or reserved during the course of execution. This will
serve to encourage load balancing, limit congestion, and hopefully
even make the platform self-financing.
 
One big problem we're going to have is that some (many?) users of the
XenoServer infrastructure may use the system for nefarious purposes,
for launching denial of service attacks, spamming, or as a platform
for hacking other systems.  We need a way of efficiently (in space and
time) logging all the network activity occurring on Xen along with the
'sponsor' that caused it to occur, keeping it safe in case forensic analysis is required. We also need to be able to detect
and filter or rate limit operations which are likely to be
dubious. For example, "limit outgoing connections to SMTP ports to be
one per user per 10 seconds".
 
We can also quench a large class of denial of service attack by
looking at the symmetry of the number of packets received on a
connection to the number sent: for a TCP connection you should receive
an ACK every other packet. If not, chances are you're hosing the
remote host, and hence Xen should throttle the connection until the
symmetry improves. A similar technique can be used to throttle
attempts to do port scanning from XenoServers.
 
The goal of this project is to implement some of the detection and
throttling mechanisms described above, and maybe think of a few more.
 Contact: Ian Pratt 
Xen suspend/resume for migration
One cool feature we'd like to add to Xen would be the ability to
"suspend to file" an operating system image running over Xen, and then
"resume from file" on another machine running Xen, hence moving an
operating system and along with all the applications running over it!
Because of the virtualization Xen provides, this sort of thing isn't
impossible, but it will require someone with a decent knowledge of how
the x86 handles page tables and such like.
 
The code that does the spend/resume will run in another operating
system image, inspecting the memory footprint of the operating system
to be suspended, and then writing it to disk, along with a load of
meta information.  The hard part is that when the OS is 'resumed', the
physical memory pages that it was using are unlikely to be the same
ones it's been allocated this time round, so its going to be necessary
to re-write the page tables before resuming the OS, based on the meta
information that's been stored.
 
As an extension, you could modify the guest operating system to make
suspend resume more efficient, perhaps by evicting the entire buffer
cache, hence minimising the suspended state.
 Contact: Ian Pratt,
Keir Fraser
or Steven Hand
  
Restartable Device Drivers
Like most operating systems, the Xen Virtual Machine Monitor runs its
device drivers in the most privileged execution level (ring 0 in x86
terminology), giving them unrestricted access to memory, I/O addresses,
privileged instructions etc. Unfortunately, many device drivers are
not written by programmers who actually understand the subtleties of
writing kernel code. Forgetting to turn interrupts on, or accessing an
un-mapped page are common mistakes. Unfortunately, these simple errors
can result in the OS/VMM crashing or just hanging.
 
System reliability could be significantly improved if device drivers
were moved out of 'ring 0', and executed in a somewhat more restricted
environment where bugs can do less harm.  For example, only giving
device drivers I/O access to the particular piece of hardware they're
supposed to be controlling, limiting what memory they can write to,
ensuring that interrupts aren't turned off for long etc.  If a device
driver then causes a page fault, or violates any of the assertions, it
can then just be stopped, and potentially restarted, hopefully
clearing the fault without any users even noticing.
 
Of course, some faults (like DMA'ing a received network packet to a
bad address) can't be protected against, but hopefully that kind of
error isn't common. To do this project you should have a decent
knowledge of device drivers, and a bit about x86 memory protection.
 Contact: Ian Pratt
or Keir Fraser
  
Copy-on-write Filesystem
The Xen Virtual Machine Monitor shares the storage capacity of 
underlying disks between guest operating systems (guestOSes). In many cases, 
however, several guestOSes may be running a similar system image (e.g. a
several virtual machines each hosting apache on linux) and it would be 
nice to minimize the storage cost where possible.
 
One way of achieving this is to provide a copy-on-write file-system 
where each guestOS has a private copy only of those files it creates or 
modifies and retains a shared copy of all files it accesses read-only. 
Work within the SRG is already close to achieving this in the context
of NFS, but it would also be interesting to build such a system without
requiring interaction between (potentially competing) virtual machines. 
  
This project would implement a copy-on-write filesystem for Linux / XenoLinux 
which allowed the use of an ordered set of partitions / devices to hold a
single logical file-system image, somewhat like "nested mounts" in BSD. 
The overall system would be evaluated in terms of time overhead and 
space saving for a set of common application scenarios. If time permits 
(and code is ready) a comparison may also be made against the NFS-based
scheme. 
 Contact: Steven Hand  
Querying File System
Contemporary filing systems map names to files by using a
hierarchical lookup scheme built from nested directories. This allows
a relatively efficient name resolution algorithm along with
considerable flexibility for the user. Search paths or naming
conventions may be used to prefer a given version of a file.
In many cases, however, users would prefer a scheme which allows
them to retrieve or specify files based on queries, boolean
expressions relating to meta-data such modification date, owner, type,
keywords, etc. Such operations may return 0, 1, or many files.
 
This project will investigate suitable schemes for both on-disk and
in-core data structures to support a query filing system (QFS). A
prototype implementation will be developed at user-space level (
e.g. under Linux using the UserFS library). As a possible extension,
information about queries could be used to provide "hints" to the
buffer cache and prefetching scheme and so improve performance.
 
Note: this project has been done successfully a couple of times before; 
interested students may like to look at the Part II project dissertations 
in the library for accelerated background reading. 
 Contact: Steven Hand
 
 
|  |  Hardware projects
 
Fun with solid-state accelerometers 
It's now possible to buy miniature 2 axis accelerometers that output
very accurate acceleration data in digital form. This can be
integrated to determine relative velocity or even position. This
project would involve building some simple hardware (perhaps
interfacing to a serial line on a PDA), and writing software to use
the acceleration data. For example, you could do a gesture based UI,
auto-updating map, magic pen, tracking system for a radio controlled
car etc...
 Contact: Ian Pratt  
Software AM radio / Location using AM radio signals
High-speed analog to digital converters now mean it is possible to
digitize the entire AM radio band, then performing the 'tuning' in
software on a PC.  The first phase of this project is to assemble the
hardware and software to build such a "software" radio (we have a
high-speed ADC PCI card that could be used).
One direction the project could take would be to use AM radio signals
for location: AM radio stations emmit a very stable carrier frequency
from a fixed location (the transmitter).  By using a digital radio to
monitor phase changes in the carrier frequency, it should be possible
to tell whether a mobile device has moved towards, or away from, the
transmitter. By tuning into several AM stations simultaneously it
should be possible to recover accurate position information.
  This project could take two forms: A student interested in
hardware could build the digital radio, or perhaps use a standard PCI
card with high-speed ADC. Alternatively, the work could be done
entirely in simultaion. Either way, there's loads of cool DSP
stuff to be done. 
 Contact: Ian Pratt
 
 
|  |  Security projects
Secure packets
The goal of this project is to design and build a proof of concept
implementation of a networking protocol that would enable
"opportunistic encryption" to be utilized whenever two Internet hosts
wished to communicate. The aim is to achieve this in as efficient a
way as possible. For example, the payload data contained first few
packets of a new flow would be encrypted using a public key cipher
while a key exchange takes place piggy-backed on top of the data
packets to enable a symmetric cipher to be used for the remainder of
the flow.  Multiple TCP and UDP connections between the same hosts
could make use of the same tunnel, which would be timed-out after
being idle for some period.
 A key extension to the project would be to prevent a class of denial
of service attack whereby a host or hosts bombard a receiver with
flow-setup packets that it needs to decipher using an expensive PK
algorithm. One way to try and avoid this problem would be to use
"crypto puzzles", whereby the sender needs to do a significant amount
of computation in order to generate a ticket that is attached to the
packet. The puzzle is asymmetric in that the receiver is able to
easily check whether the ticket is valid for this particular
packet. Hence, it's much harder to create a flood of packets that
overloads the receiver.  
 Contact: Ian Pratt
 Suggestions from 
 
Schedulers for Virtual Machines 
Xen is a virtual machine monitor (primarily developed within the NetOS
group at the Computer Lab) capable of running multiple guest operating
systems concurrently. The primary guest OS is Linux, with others being
in the progress of being ported.
 
Xen currently implements one particular scheduling algorithm. The aim of
this project is to implement a number of different scheduling algorithms
and compare their characteristics against each other. As part of this
project it is anticipated to design and implement a general interface
for scheduling algorithm. The project could be extended to address
issues of process migration in multi-processor and multi-threaded
systems.
 
Contact: Rolf Neugebauer 
 
Control Plane Virtualization 
In addition to forwarding data packets, IP routers run a number of
control plane protocols.  Normally such protocols run on a single
processor, which can lead to resource contention issues that can
impact reliability for the entire network.  For example, within a
single routing stack, different protocols - such as BGP and OSPF -
may contend for memory and CPU resources, and starvation can cause the
loss of routing adjacencies and network-wide disruptions.  Or, in the
area of Virtual Private Networks (VPNs), a single router might be
supporting multiple disjoint VPNs. It is difficult to implement
control-plane protocols so that instability in one VPN cannot impact
other VPNs due to shared resource contention. Yet without such
guarantees such VPNs are not strictly isolated form one another.
 
In general we are interested in exploring the application of more
advanced operating system mechanisms, in particular, virtual machine
technology, to ameliorating such control plane problems. We have a
number of ideas about possible Part II projects in this area involving
Network Processor (IXP) programming; operating system kernel
development, in particular related to 
the Xen 
virtual machine monitor, and general
control plane software architecture.
 
Contact: Tim
Griffin and Rolf Neugebauer
 
 
Ad-hoc Name Space Service 
Very often we want to communicate or locate other people using our
laptops or PDAs while in indoor environments, for instance, in a meeting
room or inside a large building. It seems an easy task when network
infrastructure and services to support this are locally available.
However, sometimes this is not the case either because no infrastructure
can be found or simply because we don't want to rely on external
services such as centralised messenger servers.
  With 802.11 wireless networks in peer-to-peer mode, we can set up
an ad-hoc network so that machines can communicate directly with each
other without using an Access Point. Although this gives us the
necessary communication platform, basic services still need to be made
available.
  This project aims at developing an ad-hoc name service
for ad-hoc networks that can be used by applications such instant
messaging.  This service should be designed and implemented on top of
a distributed tuple space, similarly to Linda (see D. Gelernter,
"Generative communication in Linda", ACM Computing Surveys,
7(1):80-112, Jan. 1985), where each participating node holds parts of
the entire tuple space. A tuple is a data structure consisting of one
or more typed fields. ("foo", 75, false) is an example of a tuple with
three typed fields: a string ("foo"), an integer
(75) and a boolean (false). The Linda framework offers
operations to insert/delete tuples to/from a tuple space. To validate
the prototype implementation of this service, a simple instant
messaging application may be developed.
 
If time allows, as an optional task, this name service may be
implemented in a sensor network that we have available at Intel Research
Cambridge.
 
Contact: Marcelo Pias, Christophe Diot 
or Derek McAuley
 
Hash Functions in Networking - Performance Evaluation Kit
Hashing is used extensively in contemporary networks, to improve
performance of various lookup and memory access schemes, to provide for
load balancing among multiple targets, and for various other tasks.
Although selected publications exist, when designing a particular
application, it is often not easy to determine which hash function suits
best the particular task and the particular input data distribution.
 
In fact, it is generally impossible to evaluate a hash function on a
theoretical basis only, as for any deterministic hash function, such
input data patterns can be constructed that would cause the hash
function
to perform poorly. Thus, a hash function can only be evaluated against
data exhibiting similar properties to the data that the application is
destined for.
 
The authors have interest in hash functions as apply to network traffic
and propose that this project develop a framework for quick testing and
evaluating any hash function against a representative set of networking
traces, being rapidly able to compare the following criteria:
 
 computation overhead
 number of collisions in various table sizes
 entropy of input and output data
 comparison to the best-known hash functions
 
Pre-requisites: basic programming, basic information theory
 
Additionally a knowledge of MATLAB helpful although this is not a
prerequsite; access to MATLAB will be provided if necessary. 
Representative network trace data will also be supplied.
 Contact: L.
 Kencl (Intel) or A. Moore
(Cambridge University)  
Sensors for the Evaluation of Football Rules - Demonstrator and
Visualizer
The goal of this project is to develop a demonstrator of the usefulness
of deploying sensors to monitor the movement of participants and of the
ball in ball games. The work would consist of two parts:
 
 
 simulate the collection of the input data from an existing match
(processed video footage or available collected robot match data);
 develop and implement scenarios for analysing the data and providing
real-time feedback to the referee, coach, spectators, about various
aspects of the game (rules confirmation, players activity, etc.).
 
It may be possible to obtain additional data from LCE regarding their
robot football tournaments. There is also an option to process still
images from exisiting games using the 
Virtual Referee 
Tool developed at the TECGRAPH Laboratory at PUC Rio de Janeiro:
The tool would be applied to stills of a football video footage to
obtain the sample data for the project.  
  Pre-requisites: basic programming, ball games knowledge helpful
 Contact: L. Kencl (Intel) 
Query Optimization for Record Streams
Many network measurement applications (route monitoring of BGP, OSPF,
IS-IS, TCP trace analysis, HTTP trace analysis) operate on streams of
data objects that can be thought of a records.  When processing
streams at high data rates or large streams stored in trace archives,
it is important to make data parsing and filtering as efficient as
possible.  Most applications are not interested in all available data,
but only in a subset of records that can be identified by some Boolean
predicate over the attribute values of each record.  Stream processing
can be greatly improved by using such predicates as input to a "query
optimizer" that parses the minimal number of required to determine the
result of the predicate.
 
This project will involve:
  
 Design of a simple specification language for record formats.
 Design of a simple specification language for the "atomic
predicates" required each attribute of a give record format
specification.  (Example: "use regular expression matching for
attribute FOO.")
 Design and implementation of a "query optimizer" that optimizes
the stream filtering of formatted records with respect to an arbitrary
Boolean queryconstructed from atomic predicates.
 Testing the system on a collection of network trace data. 
 Contact: Tim Griffin (Intel)
 
 |