External
This year, Intel Research Cambridge, who are in the Gates Building with
the Computer Lab and the LCE, have suggested some projects - see below
for details. Other projects from them and from the LCE (Lab for
Communications Engineering) are also included in this document.
Miscellaneous
-
Bittorrent Lifetimes
Bittorrent is a peer-to-peer file sharing tool that allows a file,
seeded by a single source, to be downloaded concurrently by a set of
users. It is interesting in that downloading users must also upload
portions of the file to each other, lessening the load on the initial
seed. In fact, the original source of a file may discontinue its
participation in the session once it has sent a single complete copy
of the file. In this manner, a file may be made available over a long
period of time without being completely available at a single location
(as users who finish downloading files will typically quit the
bittorrent session).
This project is concerned with understanding how files shared in such
a manner behave over time. It consists of two parts:
Part A: You will read the bittorrent source and provide a
description of how the protocol works. The intention of this model is
to identify specific factors (e.g. the arrival/departure rates of
users, access bandwidth of clients, file size, etc.) that affect how
long an object will remain available. For instance, a large increase
in access bandwidth might potentially break such a scheme as downloads
would complete too quickly for a community of active clients to
persist.
Part B: You will validate your model by collecting real data on
existing bittorrent sessions. This part will involve writing
monitoring tools to watch bittorrent 'in the wild'. It will be
important to attack this part of the project very early in the year,
in order to have an opportunity to collect a reasonable amount of
data. As this is likely to involve active monitoring (your monitor
will connect to live torrents), careful design will be required to
address concerns such as the maximum bandwidth consumed.
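As a starting point for such a monitor, many trackers support the conventional 'scrape' extension, which reports how many seeds (complete copies) and downloaders a torrent currently has. A minimal polling sketch in Python; the tracker URL and info-hash are placeholders, and the crude pattern match assumes a well-formed bencoded reply:

import re, time, urllib.parse, urllib.request

TRACKER_SCRAPE = "http://tracker.example.com/scrape"   # placeholder URL
INFO_HASH = bytes.fromhex("00" * 20)                   # placeholder 20-byte info-hash

def scrape_counts():
    # Pull the integer 'complete' (seeds) and 'incomplete' (downloaders)
    # fields out of the bencoded scrape reply with a crude pattern match.
    url = TRACKER_SCRAPE + "?info_hash=" + urllib.parse.quote_from_bytes(INFO_HASH)
    reply = urllib.request.urlopen(url, timeout=30).read()
    seeds = int(re.search(rb"8:completei(\d+)e", reply).group(1))
    downloaders = int(re.search(rb"10:incompletei(\d+)e", reply).group(1))
    return seeds, downloaders

while True:
    try:
        print(time.time(), *scrape_counts(), flush=True)   # append to a long-lived log
    except Exception as exc:
        print(time.time(), "error", exc, flush=True)
    time.sleep(600)   # poll gently: once every ten minutes

Months of such samples, across many torrents, are exactly the data against which the Part A model can be validated.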
Contact: Andrew Warfield
-
Route Planning on the London Underground
Recently there have been two successful diploma projects that have
investigated fastest-route determination for pairs of London
Underground stations. The first project implemented a sequential
algorithm based on Dijkstra's algorithm for finding the shortest path
between two nodes; the second found that Dijkstra's algorithm is not
highly parallel, and instead developed a parallel algorithm based on a
closely related one for matrix multiplication.
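To make the matrix-multiplication connection concrete: if W[i][j] holds the travel time from station i to station j (infinity where there is no direct connection), then "multiplying" W by itself in the (min, +) semiring gives the best two-leg times, and repeated squaring yields all-pairs shortest times. A sequential numpy sketch; a parallel version would distribute the row loop across processors:

import numpy as np

def min_plus(a, b):
    # (min, +) matrix "product": c[i][j] = min over k of a[i][k] + b[k][j]
    c = np.empty_like(a)
    for i in range(a.shape[0]):
        c[i, :] = np.min(a[i, :, None] + b, axis=0)
    return c

def all_pairs_shortest(w):
    # Repeated squaring: about log2(n) min-plus products suffice.
    d, steps = w.copy(), 1
    while steps < w.shape[0] - 1:
        d = min_plus(d, d)
        steps *= 2
    return d

inf = np.inf
w = np.array([[0, 4, inf], [4, 0, 2], [inf, 2, 0]])
print(all_pairs_shortest(w))   # e.g. time from station 0 to station 2 is 6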
The aim of this project is to take the already available travel
information, for either the London Underground or the UK Rail network,
and to further develop the parallel algorithm. A parallel computer has
at its core a set of interconnected processors. The optimal choice of
interconnection for a particular problem may not always be available;
in practice the interconnection usually takes the form of a planar
two-dimensional lattice.
In the implementation of the parallel algorithm, it will be
interesting to explore the features of the two-dimensional graph
arising from the connections of stations (nodes) and the travel times
between stations (edges), and to compare this with the lattice of
processor connectivity. One could take, for example, parallel
techniques such as recursive doubling or domain decomposition, and
examine their embedding onto the lattice connectivity. If the student is
mathematically inclined, it may be interesting to investigate certain
properties possessed by these techniques, such as degree of symmetry
and homogeneity, as well as the extent to which they can be embedded
onto a lattice.
In addition it may be useful to implement the algorithm on a random
topology of interconnected processors in order to evaluate the cost
component, and then to compare this with the cost on a lattice
connectivity, so as to determine the extent to which interconnection
between processors plays an important role. In this case it would be
instructive to find out what other attributes of the parallel machine
have bearing on the overall cost.
The project will certainly provide ample opportunity to demonstrate
your skills in computing and thinking in parallel!
Potential Supervisor: Jagdish Modi at89@dial.pipex.com / Ian Pratt ian.pratt@cl.cam.ac.uk
Special resources: For some of the extensions, access to a multiprocessor
machine will be provided by the Systems Research Group.
References:
- Wilson R., Introduction to Graph Theory, Academic Press, 1972
- Hollow S., Route Planning on the London Underground,
Diploma project, Computer Laboratory 2001
- http://www.nationalrail.co.uk/enquiries/nrt_upd.asp
-
Inexplicable behaviour of the heart
Originator: Jagdish Modi, email: at89@dial.pipex.com
Lab Contact: Ian Pratt, email: ian.pratt@cl.cam.ac.uk
Much work has been undertaken using time-series analysis to model applications such as turbulence and financial markets. More recently, scientists have started looking at heartbeats as a dynamical system. Some of the preliminary results suggest that heart rate changes have a complicated clustering pattern in healthy people, but that this pattern tends to break down - sometimes becoming too regular.
The work was carried out using data from the National Institutes of Health's Research Resource for Complex Physiological Signals database.
This project will take the available data and model the behaviour of heartbeats using time-series analysis. As with other applications, it would initially be interesting to analyse the data by comparing it with a normal distribution, to investigate any 'scaling' properties, and to identify the existence of any 'power laws'. Furthermore, it would be interesting to verify that healthy heartbeats manifest several superimposed fractal patterns with different scaling behaviour.
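One standard tool from this literature for quantifying scaling is detrended fluctuation analysis (DFA): integrate the mean-subtracted interbeat-interval series, detrend it in boxes of varying size, and measure how the residual fluctuation grows with box size; the log-log slope is the scaling exponent (reported as roughly 1 for healthy subjects). A minimal sketch, assuming the intervals are already loaded as an array:

import numpy as np

def dfa_exponent(intervals, box_sizes=(4, 8, 16, 32, 64)):
    # Integrate the mean-subtracted series (the "profile").
    y = np.cumsum(intervals - np.mean(intervals))
    fluctuations = []
    for n in box_sizes:
        rms = []
        for b in range(len(y) // n):
            seg = y[b * n:(b + 1) * n]
            t = np.arange(n)
            # Remove the best linear trend within the box.
            coeffs = np.polyfit(t, seg, 1)
            rms.append(np.sqrt(np.mean((seg - np.polyval(coeffs, t)) ** 2)))
        fluctuations.append(np.mean(rms))
    # Slope of log F(n) against log n is the DFA scaling exponent.
    alpha, _ = np.polyfit(np.log(box_sizes), np.log(fluctuations), 1)
    return alpha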
Initially the candidate would be expected to examine recent literature in the field of cardiology and computers.
The project will certainly provide ample opportunity to demonstrate your skills in computing, and the candidate would be encouraged to use a large multiprocessor machine.
Potential Supervisor: Jagdish Modi at89@dial.pipex.com / Ian Pratt ian.pratt@cl.cam.ac.uk
Refs:
- Randy Atkins: "Mysterious Ways of the Heart: New understanding May Lead to Earlier Diagnoses". Physics Central, Feb. 21, 2001
(http://www.physicscentral.com/news/news-01-4.html)
- Multifractality in human heartbeat dynamics, Math in the Media, Media of the American Mathematical Society, July 1999.
(http://www.ams.org/new-in-math/note-archive.html)
- Y.Ashkenazy, P.Ch. Ivanov, S.Havlin, C.-K.Peng, Y.Yamamoto, A.L. Goldberger, H.E. Stanley. "Decomposition of heartbeat time series: scaling analysis of the sign sequence". Computers in Cardiology, 2000;27:139-142.
-
A New Data Mining approach
Data Mining can involve training classification schemes to
recognise desirable data and then operating these classifiers upon
databases in order to locate the data of interest.
A useful data-mining framework called Weka already exists. However, Weka
does not yet implement QDA (Quadratic Discriminant Analysis). This
project would rectify the situation.
Weka is written in Java and has copious documentation describing its
interfaces; what is required is that the student recognise the
difficulties that the QDA algorithm poses and design an appropriate
strategy to cope with them.
An implementation of QDA is available for cross-validation.
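For orientation, the quadratic discriminant rule itself is compact: model each class k as a Gaussian with its own mean and covariance, and assign a point x to the class maximising -0.5 log|Sigma_k| - 0.5 (x - mu_k)' Sigma_k^{-1} (x - mu_k) + log(prior_k). The difficulties alluded to above are largely numerical (near-singular covariances, log-determinants). A minimal sketch of the mathematics, in Python/numpy rather than against Weka's interfaces:

import numpy as np

def qda_fit(xs, ys):
    # Estimate per-class priors, means and covariances.
    model = {}
    for k in np.unique(ys):
        cls = xs[ys == k]
        model[k] = (len(cls) / len(xs), cls.mean(axis=0), np.cov(cls, rowvar=False))
    return model

def qda_predict(model, x):
    best, best_score = None, -np.inf
    for k, (prior, mu, sigma) in model.items():
        diff = x - mu
        # slogdet and solve are preferred over det/inv for numerical stability.
        sign, logdet = np.linalg.slogdet(sigma)
        score = -0.5 * logdet - 0.5 * diff @ np.linalg.solve(sigma, diff) + np.log(prior)
        if score > best_score:
            best, best_score = k, score
    return best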
The link for Weka.
Note: this project requires a student who is not scared by the Java
programming language and is comfortable with mathematics.
QDA is described in
Ripley, B. D. (1996) Pattern Recognition and Neural Networks. Cambridge
University Press.
Excellent materials covering both the (validation) implementation of QDA and the QDA technique itself are available in the University
Stats Lab.
Contact: Andrew Moore
-
Real-Time Data Mining
Data Mining can involve training classification schemes to
recognise desirable data and then operating these classifiers upon
databases in order to locate the data of interest.
A useful data-mining framework called Weka already exists. However, Weka
is not designed for real-time operation; this project would attempt to
rectify that situation.
Weka is written in Java and has copious documentation describing its
interfaces; what is required is that the student recognise the
difficulties with real-time code implementations, particularly in Java,
and design an appropriate strategy to permit use of this system.
The objective would be to provide a library of operations that allows
a subset of Weka operations to be incorporated into other code. Students
will also need to validate their approach: to be real-time, the
approach must possess time-bounded properties; the program cannot
make a library call that might never return.
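One way to approach the time-bounded requirement: once a model is trained (offline, with no time bound), prediction for many classifiers reduces to a fixed number of arithmetic operations, and only those calls are exposed to real-time callers. A sketch of the idea, in Python for brevity; the names are illustrative and not Weka's:

import numpy as np

class BoundedLinearClassifier:
    """Prediction costs a fixed number of operations (one dot product per
    class), so a caller can derive a worst-case execution time. Training,
    which has no such bound, happens offline and is not exposed here."""

    def __init__(self, weights, biases):
        self.weights = np.asarray(weights)   # shape: (classes, features)
        self.biases = np.asarray(biases)     # shape: (classes,)

    def predict(self, x):
        # No allocation and no data-dependent loops: O(classes * features).
        return int(np.argmax(self.weights @ x + self.biases))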
The link for Weka.
Note: this project requires a student who is not scared by the Java
programming language. A familiarity with C will assist in creating an
appropriate library.
Contact: Andrew Moore
 |
System modelling projects
-
Running a combinatorial auction
Combinatorial auction designs have been proposed in a variety of
situations, notably within the telecommunications industry where they
have been considered for the allocation of radio spectrum and for the
distribution of subsidies for universal service. This project will
design and implement such an auction using a recently proposed auction
design with improved (in fact, polynomial) computational performance
relative to standard designs (for which winner determination is
NP-complete). The project will implement the auctioneer's process for
accepting and selecting winning bids from remote bidders. Since the
auction design includes multiple rounds, a format will need to be
designed for publishing interim results to all bidders.
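Independent of the particular allocation rule (which the paper below supplies), the auctioneer's skeleton is a round loop: collect sealed bids, close the round, apply the winner-selection rule, and publish interim results before the next round. A structural sketch only; the names are illustrative and the selection rule is left abstract:

import json, threading

class Auctioneer:
    def __init__(self, select_winners):
        self.select_winners = select_winners   # the paper's allocation rule plugs in here
        self.round_no, self.bids = 0, []
        self.lock = threading.Lock()

    def submit_bid(self, bidder, bundle, price):
        with self.lock:                        # bids arrive concurrently from remote bidders
            self.bids.append({"bidder": bidder, "bundle": bundle, "price": price})

    def close_round(self):
        with self.lock:
            interim = self.select_winners(self.bids)
            self.round_no += 1
            self.bids = []
        # Published to all bidders between rounds, e.g. over HTTP.
        return json.dumps({"round": self.round_no, "interim_winners": interim})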
For more details see: Frank Kelly and Richard Steinberg, "A Combinatorial
Auction with Multiple Winners for Universal Service", Management
Science 46 (2000) 586-596.
Contact: Richard Gibbens
-
Performance modelling tool for enterprise IP networks
Performance modelling techniques originally applied to
circuit-switched telephony networks (with great success) have recently
been modified and proposed for IP networks. This project will involve
the design and implementation of a modelling tool using performance
modelling techniques based on fixed-point approaches. The project
will include both the construction of the computational methods as
well as the necessary interfaces to suitable input and output formats
in order to complete a modelling tool suitable for network designers.
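The fixed-point idea in miniature: each link's blocking probability is given by Erlang's formula applied to its offered load, but that load is itself thinned by blocking on the other links of each route, so the two are solved together by repeated substitution. A sketch for the circuit-style case; the IP models in the paper below generalise this:

from math import prod

def erlang_b(load, circuits):
    # Erlang's blocking formula via the standard stable recurrence.
    b = 1.0
    for n in range(1, circuits + 1):
        b = load * b / (n + load * b)
    return b

def fixed_point(routes, capacity, iterations=100):
    """routes: list of (offered_traffic, [link, ...]) pairs;
    capacity: circuits per link. Returns per-link blocking probabilities."""
    blocking = {link: 0.0 for link in capacity}
    for _ in range(iterations):
        for link in capacity:
            # Traffic offered to this link, thinned by blocking elsewhere on each route.
            load = sum(t * prod(1 - blocking[m] for m in links if m != link)
                       for t, links in routes if link in links)
            blocking[link] = erlang_b(load, capacity[link])
    return blocking

print(fixed_point([(5.0, ["a", "b"]), (3.0, ["b"])], {"a": 10, "b": 10}))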
For more details see: R. J. Gibbens, S. K. Sargood, C. Van Eijl, F. P. Kelly,
H. Azmoodeh, R. N. Macfadyen and N. W. Macfadyen, "Fixed-Point
Models for the End-to-End Performance Analysis of IP Networks", in
Proc. 13th ITC Specialist Seminar: IP Traffic Measurement, Modeling
and Management, Sep 2000, Monterey, CA.
Contact: Richard Gibbens
 |
Networking projects
-
Receiver Layered Multicast
Network audio and video require reasonably high bandwidth and
relatively low jitter (variation in delay). These
requirements make it difficult to deliver high-fidelity data directly
over the network. However, these media types may be encoded in a
layered format; i.e. as a number of layers 1, 2, 3, ... n.
The receiver may choose to decode any prefix of the sequence,
obtaining increased fidelity in each case.
It is possible to distribute audio and video in a layered format
where each of n layers is given a
different multicast "channel". Receivers
may then decide to receive the first k of these, and to decode
(and render) the first j <= k of these in
order to maintain real-time
rates and/or to match the fidelity to the characteristics of the
receiving terminal.
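The receiver-side control loop is simple in outline: each layer is a separate multicast group; the receiver joins the first k groups, adds a layer after a sustained loss-free period, and drops the top layer on sustained loss. A sketch of the join/leave mechanics using IP multicast; the group addresses are placeholders and the loss measurement is elided:

import socket, struct

LAYER_GROUPS = [("239.1.1.%d" % i, 5000 + i) for i in range(1, 5)]   # placeholders

def join_layer(sock, group):
    mreq = struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

def leave_layer(sock, group):
    mreq = struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_DROP_MEMBERSHIP, mreq)

class LayerController:
    def __init__(self, sock):
        self.sock, self.subscribed = sock, 0

    def on_timer(self, recent_loss_rate):
        # Add a layer after a loss-free period; drop the top layer on loss.
        if recent_loss_rate == 0 and self.subscribed < len(LAYER_GROUPS):
            join_layer(self.sock, LAYER_GROUPS[self.subscribed][0])
            self.subscribed += 1
        elif recent_loss_rate > 0.05 and self.subscribed > 1:
            self.subscribed -= 1
            leave_layer(self.sock, LAYER_GROUPS[self.subscribed][0])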
This project would choose or construct a layered encoding scheme for
either video or audio (or both), and develop transmitting and
receiving software.
As a first step, broadcast on a local Ethernet could be used
in place of multicast, although it would be nicer (and more valid)
to use multiple networks and a "proper" multicast protocol.
Contact: Steven Hand
-
Predicate Routing
In current IP networks, the routing state of the system is primarily
represented as a set of routing tables (local to each router) and a
set of filtering rules (also local to each router or firewall). In
contrast, Predicate Routing represents the state of the network as a
set of boolean expressions associated with links which assert
which kinds of packet can appear where. From these
expressions, routing tables and filter rules can be derived
automatically. Conversely, the consequences of a change in network
state can be calculated for any point in the network (link,
router, or end system), and predicates derived from known
configuration state of routers and links. This subsumes notions of
both routing and firewalling. More details can be found in the
following
paper.
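To give a feel for the derivation described above, a toy model: treat each link's predicate as the set of packet classes allowed to appear on it; the filter a router must apply is then whatever its ingress link admits that the egress link's predicate forbids. (Real predicates are boolean expressions over header fields; sets of classes stand in for them here.)

# Toy predicate routing: a link's predicate = set of packet classes it may carry.
links = {
    ("r1", "r2"): {("lan", "dmz"), ("lan", "internet")},
    ("r2", "r3"): {("lan", "internet")},
}

def derive_filter(ingress, egress):
    # Classes admitted upstream but forbidden downstream must be dropped
    # by the router joining the two links.
    return links[ingress] - links[egress]

print(derive_filter(("r1", "r2"), ("r2", "r3")))
# -> {('lan', 'dmz')}: the router joining these links filters lan->dmz traffic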
This project would involve designing and implementing a network
control plane for predicate routing, probably in a centralised manner.
An extension would be to develop an inference tool to allow one
to derive properties of the network.
A more ambitious project is to attempt to build a distributed version
either in simulation (for example using SSF), or by packaging up a
real implementation (e.g. Zebra).
Contact: Steven Hand
-
Project Proposal on Network Coding
Many users download many similar files which are stored on multiple
sites. Once, we would use a client to download a file from a given
server; or we might find one of a set of servers holding a file (e.g.
using google or rpmfind or some other tool) and retrieve it; or we might
use a peer-to-peer network to locate and return the file.
Bittorrent took things a bit further. Noting that many sites may be
bandwidth-limited in the rate at which they can upload (often more so
than the client is limited in download bandwidth, especially in these
days when ADSL and cable modems offer asymmetric capacity), a tracker
service tells you about multiple servers from which you can retrieve a
file, and the protocol then allows you to fetch different parts of the
file from each server in parallel. A certain amount of resource
allocation is also supported.
Forward error protection is a scheme to protect against packet loss by
adding redundancy to each packet or pool of packets transmitted. Some
researchers have devised clever schemes (called source, or joint source
and channel, coding schemes) that allow one to reconstruct the object
transmitted from a subset of the packets received. This can be used to send
a file over lossy Internet paths, or even to send a file in several forms
(e.g. several resolution or quality levels for audio or video) and have
the receiver reconstruct up to the quality desired in the available
download time.
Such codes work well with a known number of sources, and some companies
(e.g. Digital Fountain - see below) have products that implement them.
Reed-Solomon codes are the best-known codes, but are inefficient in
terms of CPU load when coding large blocks and very slow to decode too,
although they are optimal in terms of coding overhead. Tornado codes
(and other schemes) are closer to linear in decode cost, at some
overhead cost (extra redundancy).
Network coding is a technique where intermediate systems carry out the
redundancy coding, but with an extra twist: the packets from multiple
different sources are linearly combined to produce packets to be sent
along the download paths towards multiple receivers, and repeated
re-coding is used at each hop (router, cache or whatever - this also
works in an ad hoc multihop radio net, e.g. community wifi nets, where
each node "helps" all the other nodes reach each other). Receivers then
decode a set of packets to extract the data they wish to receive. Even
a modest number of simultaneous flows will see a potential improvement
in the available capacity from this new virtually shared channel.
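The core operation is easiest to see over GF(2), where a coded packet is the XOR of a random subset of source blocks and carries its coefficient vector; a receiver decodes by Gaussian elimination once it holds enough independent combinations. A toy sketch (real systems use larger fields such as GF(2^8) so that random combinations are independent with higher probability):

import random

def encode(blocks):
    # Coded packet = (coefficient vector, XOR of the selected blocks).
    coeffs = [random.randint(0, 1) for _ in blocks]
    if not any(coeffs):
        coeffs[random.randrange(len(blocks))] = 1
    payload = bytes(len(blocks[0]))
    for c, b in zip(coeffs, blocks):
        if c:
            payload = bytes(x ^ y for x, y in zip(payload, b))
    return coeffs, payload

def decode(packets, n):
    # Gauss-Jordan elimination over GF(2) on (coefficients | payload) rows;
    # assumes the received packets span all n source blocks.
    rows = [(list(c), bytearray(p)) for c, p in packets]
    for col in range(n):
        piv = next(i for i in range(col, len(rows)) if rows[i][0][col])
        rows[col], rows[piv] = rows[piv], rows[col]
        for i, (coeffs, payload) in enumerate(rows):
            if i != col and coeffs[col]:
                coeffs[:] = [a ^ b for a, b in zip(coeffs, rows[col][0])]
                payload[:] = bytes(a ^ b for a, b in zip(payload, rows[col][1]))
    return [bytes(p) for _, p in rows[:n]]

blocks = [b"AAAA", b"BBBB", b"CCCC"]
packets = [encode(blocks) for _ in range(8)]   # over-receive to ensure full rank
print(decode(packets, 3))                      # -> [b'AAAA', b'BBBB', b'CCCC']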
Building a replacement for bittorrent that implements such a scheme would
be an interesting project - practical problems to tackle include:
- protocol design details (UDP based?)
- helper/tracker design
- choice and implementation of coder/recoder
- support for heterogeneous rates
Several interesting rather more researchy problems also remain
in this area, ranging from pragmatic to open ended.
- Performance tradeoffs (code/decode and forwarding delay v. throughput)
- NAT traversal and other obstacle removal!
- Design of Novel Codes for recombination
- Security, esp. privacy and prevention of DoS attacks
Selected References
-
C. Gkantsidis, P. Rodriguez, "Network Coding for Large Scale Content
Distribution", Microsoft Research Technical Report (MSR-TR-2004-80).
http://www.research.microsoft.com/~pablo/papers/nc_contentdist.pdf
Bit Torrent
http://bitconjurer.org/BitTorrent/
Network Coding
http://research.microsoft.com/~pachou/pubs/ChouWJ04.ppt
Tornado Codes
http://www.icsi.berkeley.edu/~luby/
Homomorphic Hash Functions
http://www.scs.cs.nyu.edu/~mfreed/research/authcodes-ieee04.html
Receiver driven layered multicast
http://www.cl.cam.ac.uk/~jac22/otalks/infocom98.pdf
Contact: Jon Crowcroft
-
srg@home: The cambridge computer lab large scale research network
(aka the cambridge computer lab screen saver :-)
(alternative names computerlab@home, cambridge@home, planetlab@home :-)
Goal: Establish a research infrastructure to evaluate distributed
algorithms on real large scale networks
Description: Running distributed algorithms on real large scale
networks is important to evaluate the properties of those algorithms.
Today testing an algorithm on 10 000 machines is challenging.
Building a large scale network for this purpose is difficult and
costly [1]. Additionally, it is difficult to reproduce the properties
of deployed networks such as the Internet in a tightly controlled
environment. On the other hand, there are millions of
Internet-connected machines that are unused for long periods of time. Several
projects, for instance seti@home, have made clear that people are
willing to contribute freely their unused computing resources to noble
causes. Advancing the state of the art in computer science is one such
noble cause. Therefore, if srg built a simple container to run
distributed experiments, for instance a screen saver, it is
conceivable that many people would run it. One simple model to build
and operate such a network, would be to use it exclusively to run srg
experiments (as opposed to having an open platform where everyone
could run experiments). The base service would allow srg to load
experiment code into the hosts on the network and would require srg to
sign all the code shipped in this way. The service should run the code
in a sandboxed environment (although this is not strictly required,
since we are assuming that people trust srg to install the service
anyway). The service would export a simple API such as:
public interface RemoteServiceInterface
{
// Load signed experiment code into the named isolation domain.
void InstallCode(byte[] assemblyCode, byte[] codeSignature, string appDomain);
// Instantiate a class from installed code and run it with the given arguments.
// (The parameter is named 'args' because 'params' is a reserved word in C#.)
void CreateAndRunObject(string appDomain, string className, string[] args);
void InstallNewService(byte[] serviceCode, byte[] designated);
string Version();
void RestartService();
void CreateAppDomain(string friendlyName);
void TerminateAppDomain(string appDomain);
void TerminateAllAppDomains();
}
(An appdomain is an isolation environment inside the service, analogous
to a process in an operating system; this is not strictly required.)
Other models where code from other origins (i.e. not from srg) could
be loaded into the service could also be considered, but since more
complicated models would probably hinder the deployment of the system,
the primary goal should be to support code only from srg. The service
should include an administration interface to allow monitoring of the
state of the network and to perform simple operations such as starting
and stopping instances. The service should also allow scripting of common
operations. The service and the management tools should be scalable
enough to support timely operation even when the number of deployed
nodes is very large (e.g. 100 000 machines); peer-to-peer mechanisms
[2,3,4] may be used to achieve this. Finally a set of support
services should be provided as building blocks for most applications:
for instance a logging service should allow any application instance
to report data back to srg.
References
-
[1] http://www.planet-lab.org/
[2] A. Rowstron and P. Druschel, "Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems", Middleware'2001, Germany, November 2001
[3] Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan, "Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications", ACM SIGCOMM 2001, San Diego, CA, August 2001
[4] Ben Y. Zhao, Ling Huang, Jeremy Stribling, Sean C. Rhea, Anthony D. Joseph, and John Kubiatowicz, "Tapestry: A Resilient Global-scale Overlay for Service Deployment", IEEE Journal on Selected Areas in Communications, January 2004
Contact: Jon Crowcroft
-
400Mb/s WLAN emulation
Groups around the world are putting forward specifications for the
next generation Wireless LAN standard, which will operate at up to
400Mb/s. However, at these high speeds there are serious problems with
the current 802.11 Media Access Control standard, which needs to be
modified to allow multiple IP packets to be sent in the same MAC
frame, and to allow IP data to be 'piggy backed' with MAC
acknowledgements. The trouble is, no one really knows what effect this
batching will have on the performance of applications (Web, RealPlayer,
etc.). People have done simulations, but real-world TCP/IP
implementations are very different from the simple ones used in
simulation.
The aim of this project is to build a high-performance WLAN
emulator. It will be a PC with 2 or more Gigabit Ethernet cards
representing the wireless base station and one or more wireless
clients. The emulator will forward packets across the interfaces,
inserting delays (and possibly errors) as dictated by an internal
simulation of the MAC protocol under investigation. Using the
emulator, we can use machines running a range of operating systems and
applications to find out what the performance of the system is likely
to be like in real life. We can then use this information to develop
better MAC protocols.
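At its core the emulator is a forwarding loop with a time-ordered release queue: each packet read from one interface is stamped with a departure time computed by the MAC model (which is where batching, piggy-backed acknowledgements and errors come in) and injected on the other interface when that time arrives. A sketch of that core, in Python for brevity, with capture/injection and the MAC model abstracted away:

import heapq, time

class Emulator:
    def __init__(self, mac_model):
        self.mac_model = mac_model   # decides per-packet delay (and loss)
        self.queue = []              # (release_time, out_iface, packet)

    def on_packet(self, in_iface, out_iface, packet):
        delay = self.mac_model.delay(in_iface, packet)   # e.g. queueing + airtime
        if delay is not None:                            # None = dropped/errored
            heapq.heappush(self.queue, (time.monotonic() + delay, out_iface, packet))

    def pump(self, inject):
        # Release every packet whose scheduled departure time has passed.
        now = time.monotonic()
        while self.queue and self.queue[0][0] <= now:
            _, iface, packet = heapq.heappop(self.queue)
            inject(iface, packet)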
This project is best done by someone with experience in 'C', and an
interest in WLAN and Ethernet protocols. You might even end up with
your name in the new IEEE WLAN standard document ;-)
Contact: Ian Pratt
-
Triggers for Wireless Networks
Link_Up and Link_Down triggers have been used in Mobile IP and ad hoc
networks. These triggers provide inputs to the routing protocol and
have significant effects on the performance of a wireless network. In
general, they notify the routing protocol when signal strength rises
above or falls below predefined threshold levels. Evidently, this
method is not able to cope with fluctuations in signal strength under
dynamic mobility conditions.
In this project, statistical information on link signal/noise ratio,
queue state, CPU utilisation, number of retries and energy is to be
exchanged between adjacent nodes. You are expected to explore
proactive triggers using this raw information. Successful triggers
should be able to predict near-future events with high accuracy.
Contact: Meng How Lim
-
Fun with wireless networking equipment and PDAs
This isn't so much a project description as a note to say that the SRG
has a number of GSM/GPRS-capable phones and datacards, Wireless LAN
cards, and iPAQ PDAs that could be used in the execution of Part II
projects, providing you can come up with a cool application or
service to use them for. We can provide help with configuring the kit,
and hosting the server-side component of any applications.
Contact: Ian Pratt
-
DummyNet plus Xen = planetlab in a box
The virtualization of operating systems provided by Xen allows us to
intercept the interactions between an OS and the hardware devices that
it uses. An example of this is to intercept packets to or from the
network devices and overlay properties onto the packets as they
enter/exit the network. Additionally, Xen can allow the network traffic
of one virtual machine to be exchanged with another virtual machine
through a virtual network interface. The properties of this virtual
interface (such as the delay between machines) can also be changed. Such
an ability --- to emulate change in the physical properties of the
underlying network --- opens up an interesting application of Xen for
testing/simulating wide-area deployments.
To this end, an existing system for simulating physical
networks, DummyNet, is a perfect match to be incorporated with Xen. It
then becomes feasible to construct a micro-planetlab: a network of hosts
with Internet-wide properties (latency, loss, etc.).
It is anticipated that a DummyNet system would itself be a VM with
virtualised network interfaces offered to other domains on the system;
the DummyNet virtual machine can then define (and change) the parameters
of each interface: bandwidth, latency to given destinations, etc.
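For reference, DummyNet's native configuration interface (via ipfw on FreeBSD) already expresses the kind of per-path parameters such a virtual machine would need to offer; the numbers below are purely illustrative:

ipfw pipe 1 config bw 2Mbit/s delay 30ms plr 0.01
ipfw add pipe 1 ip from 10.0.0.0/24 to any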
For the emulation of a network, a characterization of the properties
of that network is essential. To this end, in addition to implementing
DummyNet-type functionality within Xen, the student will need to
understand its limitations and perhaps, time permitting, explore
mechanisms to overcome those limitations.
The link for Xen is above and this
will find DummyNet.
Note: this project requires a student that is comfortable with the C
programming language.
Contact: Andrew Moore
 |
XenoServers platform projects
-
Selecting XenoServers to maximise service availability
In the XenoServers platform, a user may choose to deploy multiple
instances of a service across a set of machines so that, for example,
requests from clients can be directed to a nearby server. XenoSearch
allows you to issue queries to select such sets of servers
simultaneously, with the location of the servers being constrained
inter-dependently, for example `far(near(S1, S2), near(S3, S4), C1)'
to request server S1 near to server S2, server S3 near to S4 and both
pairs of chosen servers far from one another and far from client C1.
It also allows you to specify resource availability at the machines:
e.g. CPU, free memory, cost, etc.
Users might also like to choose sets of nodes based on their failure
rates, and with low rates of correlated failure, so that the overall
availability of their service is maximised. Such failure may be caused
by network outage or subsequent delays in reconvergence, or by power or
machine failure at the XenoServer in question. Obviously, choosing two
XenoServers that share a network link makes a service more vulnerable.
The aim of this project is to develop a monitoring system that watches
XenoServers and tracks their periods of inaccessibility, correlating
these with those of other nodes. A small part of the work will involve
modifying the XenoSearch implementation to choose sets of servers with
acceptable (or the best) failure rates, at the stage when it considers
whether the resources of the candidate sets of servers match
requirements.
Network partitions are a common occurrence, but mean that a XenoServer
that is inaccessible from the client may still be accessible from a
single monitoring point (and vice versa). As such, monitoring might be
performed from across the network using a distributed system, perhaps
based around a DHT (e.g. Bamboo). If the service operator is concerned
with being near client C1, then perceptions of failure are subjective
with respect to C1 (consider also the extension to a number of clients
in the query), so this involves getting failure estimates from a
monitoring node near C1.
Conversely, if we distribute the load of tracking XenoServers between
a number of monitoring nodes, we need to make sure that we don't mark
a server as failed if the cause was a network partition near, or a
failure of, the monitoring node.
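A simple way to turn such monitoring logs into a correlation measure: record each server's downtime intervals and, for each pair of servers, score the overlap of their downtime against their total downtime; candidate server sets can then be ranked by their worst pairwise score. A sketch, assuming downtime has already been extracted as (start, end) interval lists:

from itertools import combinations

def overlap(a, b):
    # Total time during which both interval lists are "down".
    return sum(max(0.0, min(e1, e2) - max(s1, s2))
               for s1, e1 in a for s2, e2 in b)

def correlation_score(down):
    """down: {server: [(start, end), ...]}. Returns the worst pairwise
    overlap fraction for the set: lower means less correlated failure."""
    worst = 0.0
    for x, y in combinations(down, 2):
        total = sum(e - s for s, e in down[x]) + sum(e - s for s, e in down[y])
        if total:
            worst = max(worst, 2 * overlap(down[x], down[y]) / total)
    return worst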
Take a look at Weatherspoon
et al "Introspective Failure Analysis: Avoiding Correlated Failures in
Peer-to-Peer Systems" for one approach for inferring failure
correlation.
Bamboo is written in Java, so
it might make sense to use this as the basis for the monitoring
system. We have a mature implementation of XenoSearch written in
Standard ML.
Contact: Tim Moreton
-
For other XenoServers control-plane projects, see Vangelis' suggestions.
 |
Distributed Systems projects
-
Co-operative Web Servers
Having a reliable and fast WWW presence is important for many
organisations, including the University. This is pretty tough for
small or underfunded organisations, particularly in the presence of
machine or network failures, or unexpected spikes in demand (the
'slashdot effect'). However, the situation might be improved if a
number of sites could agree to co-operate and transparently mirror,
cache, and serve content on behalf of each other.
Suppose we had a group of e.g. 50 organisations across the world, each
running their own server on a single PC. For each web site, a couple
of nodes in the group would be selected as secondary replicas, and
content mirrored to them. The machines would monitor each other, and
use dynamic DNS to update the list of available hosts capable of
serving the web site. If a demand surge is experienced by a web site,
we would like nodes in addition to the replicas to start caching and
serving content on behalf of the replicas. We can do this by having
the replica nodes re-write links in the web pages returned to clients
such that they point at other servers, which would then fetch and
cache the content from the replicas. Ideally, we'd point clients at
the server closest to them, to get best performance.
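The rewriting step a replica performs during a surge might look like the following: relative links in an outgoing page are pointed at a chosen helper cache, which then fetches and caches the objects from the replica on demand. Helper selection and HTML corner cases are the real work; this regular expression is deliberately naive:

import re

def rewrite_links(page_html, helper_host):
    # Point relative hrefs/srcs at the helper, which will fetch and
    # cache the objects from this replica on demand.
    return re.sub(r'(href|src)="/',
                  r'\1="http://%s/' % helper_host,
                  page_html)

print(rewrite_links('<img src="/logo.png">', 'cache7.example.org'))
# -> <img src="http://cache7.example.org/logo.png">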
The system would be as far as possible 'self-organising', without
requiring sysadmin intervention. It would generate reports showing
what work had been done on behalf of other sites, so that operators
could identify 'free riders' and ask them to leave the federation.
There's loads of good computer science in this project, though it
would ideally suit someone with experience of Apache and Squid.
It could also have serious practical impact if done well.
Contact: Ian Pratt
-
Distributed Execution Tracing
PlanetLab is a set of over 150 machines distributed amongst research
institutions across the world. It's a testbed to enable experimental
deployment of new distributed services and algorithms e.g. peer to
peer networks. We have arranged for the kernels on these machines to
run Linux Trace Toolkit (LTT), to gather execution information of the
prototype applications which are running on them. We have a research
project which is attempting to harvest this tracing information from
multiple machines, and combine the traces to gain insight into the
patterns of communication and computation of these applications, and
perhaps be able to make predictions as to how they would perform when
scaled to larger numbers of machines as might happen in a real
deployment.
The goal of this project would be to take the same tracing information
on each node, but to selectively make it available to the actual
applications running on the node, enabling them to adapt and tune
their own behaviour based on the information observed. One part of
this project would be implementing something akin to the "Berkeley
Packet Filter" (normally used by tcpdump/libpcap) for trace events,
enabling an application to selectively filter the set of events it
wants to be informed about, and for the OS to determine which events
an application should be allowed to receive (e.g. not events generated
by applications run by other users). The second part of the project
is to modify an application such that it optimises its behaviour based
on information gleaned from the trace messages.
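The filtering half amounts to a small predicate evaluated before events are delivered, composed with a mandatory check that the subscriber may see the event at all; BPF compiles such predicates rather than interpreting them, but the semantics are similar. A toy model:

def visible(event, subscriber_uid):
    # Mandatory policy: never deliver another user's events.
    return event["uid"] == subscriber_uid

def make_filter(event_type=None, min_duration=0):
    # Application-supplied filter: which of the visible events it wants.
    def pred(event):
        return ((event_type is None or event["type"] == event_type)
                and event.get("duration", 0) >= min_duration)
    return pred

def deliver(events, subscriber_uid, pred):
    return [e for e in events if visible(e, subscriber_uid) and pred(e)]

events = [{"uid": 1000, "type": "syscall", "duration": 5},
          {"uid": 0,    "type": "sched",   "duration": 2}]
print(deliver(events, 1000, make_filter("syscall")))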
Contact: Ian Pratt
 |
Operating System projects
Most of these are based around
Xen, a Virtual
Machine Monitor for x86, which lets you run multiple operating system
images at the same time on the same PC hardware, with unprecedented
levels of performance and resource isolation. Even under the most
demanding workloads the performance overhead is just a few percent:
considerably less than alternatives such as VMware Workstation and
User Mode Linux. This makes Xen ideal for use in providing secure
virtual hosting, or even just for running multiple OSes on a desktop
machine.
The systems research group is continuing to develop and do research
using Xen, but there are a couple of areas which we feel would make
good self-contained student projects:
-
Virtual Discs
The virtualization of operating systems provided by Xen allows us to
intercept the interactions between an OS and the hardware devices that
it uses. One example of this is block device interposition: messages
to and from a physical disc can be redirected to a separate VM, where
a virtual disc may be implemented.
For this project, you will take advantage of existing code that allows
interception of disc messages to build some examples of virtual
discs. You will start with an (easy) initial example of building
either an encrypted or compressed file store as a virtual disc. If
this is successful, there will be plenty of opportunity to extend the
functionality to include copy-on-write and "time-travel" discs.
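The copy-on-write part reduces to a block map consulted on every request: reads fall through to the base disc unless the block has been privately written, and writes always go to the private store (retaining old versions of that map is what makes "time-travel" possible). A sketch of the logic, with actual device I/O abstracted as arrays:

class CowDisc:
    def __init__(self, base_blocks):
        self.base = base_blocks   # read-only base image
        self.private = {}         # block number -> privately written data

    def read(self, n):
        # Fall through to the base image for never-written blocks.
        return self.private.get(n, self.base[n])

    def write(self, n, data):
        # All writes go to the private store; the base is never modified.
        self.private[n] = data

base = [b"\x00" * 512 for _ in range(1024)]
disc = CowDisc(base)
disc.write(7, b"\x01" * 512)
assert disc.read(7) != disc.read(8)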
Contact: Andrew Warfield
-
Resource Scheduling and Accounting
Xen contains resource schedulers to share out the physical resources
of the hardware among the different virtual machines running on Xen.
Currently, our schedulers for the CPU, network interface and disk are
fairly rudimentary, and geared toward providing a "fair share" rather
than providing controls to guarantee (or restrict) a guest OS to a
particular pre-agreed quota.
There are plenty of algorithms in the literature for schedulers which
do this. The goal of this project will be to select appropriate
algorithms, implement them, and perhaps enhance them for the virtual
machine environment. Previous experience with C would be useful for
this project. You'll have plenty of good data structures and
algorithms stuff to write up in the dissertation.
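One candidate from the literature is stride scheduling (Waldspurger), a proportional-share algorithm that maps naturally onto pre-agreed quotas: each VM gets a "stride" inversely proportional to its share, and the scheduler always runs the VM with the smallest accumulated "pass". A sketch:

import heapq

def stride_schedule(shares, slices):
    """shares: {vm: tickets}; returns the dispatch order for `slices` quanta.
    Each VM runs in proportion to its tickets."""
    big = 1 << 20
    queue = [(big // t, big // t, vm) for vm, t in shares.items()]  # (pass, stride, vm)
    heapq.heapify(queue)
    order = []
    for _ in range(slices):
        pas, stride, vm = heapq.heappop(queue)
        order.append(vm)
        heapq.heappush(queue, (pas + stride, stride, vm))
    return order

print(stride_schedule({"a": 3, "b": 1}, 8))
# 'a' receives about three quarters of the quanta in the long run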
Contact: Ian Pratt
or Keir Fraser
-
Logging, Auditing and Denial of Service prevention
Xen is a key component of the XenoServers project,
which has the grand vision of creating an "Open Infrastructure for
Global Distributed Computing". We envisage a world in which
Xenoserver execution platforms will be scattered across the globe and
available for any member of the public to submit code for
execution. The sponsor of the code will be billed for all the
resources used or reserved during the course of execution. This will
serve to encourage load balancing, limit congestion, and hopefully
even make the platform self-financing.
One big problem we're going to have is that some (many?) users of the
XenoServer infrastructure may use the system for nefarious purposes,
for launching denial of service attacks, spamming, or as a platform
for hacking other systems. We need a way of efficiently (in space and
time) logging all the network activity occurring on Xen along with the
'sponsor' that caused it to occur, keeping it safe in case forensic analysis is required. We also need to be able to detect
and filter or rate limit operations which are likely to be
dubious. For example, "limit outgoing connections to SMTP ports to be
one per user per 10 seconds".
We can also quench a large class of denial of service attack by
looking at the symmetry of the number of packets received on a
connection to the number sent: for a TCP connection you should receive
an ACK every other packet. If not, chances are you're hosing the
remote host, and hence Xen should throttle the connection until the
symmetry improves. A similar technique can be used to throttle
attempts to do port scanning from XenoServers.
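A sketch of the ACK-symmetry heuristic: count packets sent and received per connection and throttle when the outgoing count far exceeds what the return ACK stream justifies. The thresholds here are illustrative only:

from collections import defaultdict

class SymmetryMonitor:
    def __init__(self, ratio_limit=4.0, min_packets=100):
        self.sent = defaultdict(int)
        self.rcvd = defaultdict(int)
        self.ratio_limit, self.min_packets = ratio_limit, min_packets

    def on_sent(self, conn):
        self.sent[conn] += 1

    def on_rcvd(self, conn):
        self.rcvd[conn] += 1

    def should_throttle(self, conn):
        # Expect roughly one ACK per two data packets on a healthy bulk TCP
        # connection; a much higher ratio suggests the far end is being hosed.
        if self.sent[conn] < self.min_packets:
            return False
        return self.sent[conn] / max(1, self.rcvd[conn]) > self.ratio_limit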
The goal of this project is to implement some of the detection and
throttling mechanisms described above, and maybe think of a few more.
Contact: Ian Pratt
-
Copy-on-write Filesystem
The Xen Virtual Machine Monitor shares the storage capacity of
underlying disks between guest operating systems (guestOSes). In many cases,
however, several guestOSes may be running a similar system image (e.g.
several virtual machines each hosting Apache on Linux) and it would be
nice to minimize the storage cost where possible.
One way of achieving this is to provide a copy-on-write file-system
where each guestOS has a private copy only of those files it creates or
modifies and retains a shared copy of all files it accesses read-only.
Work within the SRG is already close to achieving this in the context
of NFS, but it would also be interesting to build such a system without
requiring interaction between (potentially competing) virtual machines.
This project would implement a copy-on-write filesystem for Linux / XenoLinux
which allowed the use of an ordered set of partitions / devices to hold a
single logical file-system image, somewhat like "nested mounts" in BSD.
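The lookup rule for such an ordered set of layers is short: resolve each path in the topmost layer that contains it, with the writable layer first and "whiteouts" recording deletions of files that exist further down. A sketch of the resolution logic over ordinary directories:

import os

def resolve(path, layers, whiteouts=frozenset()):
    """layers: writable layer first, then read-only layers in order.
    Returns the real file backing `path`, or None."""
    if path in whiteouts:          # deleted in the writable layer
        return None
    for layer in layers:
        candidate = os.path.join(layer, path.lstrip("/"))
        if os.path.exists(candidate):
            return candidate       # topmost layer wins
    return None

print(resolve("/etc/motd", ["/cow/vm7", "/images/debian-base"]))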
The overall system would be evaluated in terms of time overhead and
space saving for a set of common application scenarios. If time permits
(and code is ready) a comparison may also be made against the NFS-based
scheme.
Contact: Steven Hand
-
Querying File System
Contemporary filing systems map names to files by using a
hierarchical lookup scheme built from nested directories. This allows
a relatively efficient name resolution algorithm along with
considerable flexibility for the user. Search paths or naming
conventions may be used to prefer a given version of a file.
In many cases, however, users would prefer a scheme which allows
them to retrieve or specify files based on queries: boolean
expressions relating to meta-data such as modification date, owner, type,
keywords, etc. Such operations may return 0, 1, or many files.
This project will investigate suitable schemes for both on-disk and
in-core data structures to support a query filing system (QFS). A
prototype implementation will be developed at user-space level
(e.g. under Linux using the UserFS library). As a possible extension,
information about queries could be used to provide "hints" to the
buffer cache and prefetching scheme and so improve performance.
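The query interface might evaluate boolean expressions over per-file metadata records, returning however many files match; a toy in-core version of the semantics (the substance of the project is the on-disk structures that make this fast):

import time

files = [
    {"name": "report.tex", "owner": "ian", "mtime": time.time() - 3600, "keywords": {"draft"}},
    {"name": "notes.txt",  "owner": "smh", "mtime": time.time() - 86400 * 30, "keywords": set()},
]

def query(pred):
    # A query may return zero, one, or many files.
    return [f["name"] for f in files if pred(f)]

recent_by_ian = query(lambda f: f["owner"] == "ian" and f["mtime"] > time.time() - 86400)
print(recent_by_ian)   # -> ['report.tex']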
Note: this project has been done successfully a couple of times before;
interested students may like to look at the Part II project dissertations
in the library for accelerated background reading.
Contact: Steven Hand
 |
Hardware projects
-
Fun with solid-state accelerometers
It's now possible to buy miniature two-axis accelerometers that output
very accurate acceleration data in digital form. This can be
integrated to determine relative velocity or even position. This
project would involve building some simple hardware (perhaps
interfacing to a serial line on a PDA), and writing software to use
the acceleration data. For example, you could do a gesture based UI,
auto-updating map, magic pen, tracking system for a radio controlled
car etc...
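The signal-processing core is two numerical integrations, acceleration to velocity to position, typically via the trapezoidal rule; note that integration compounds sensor bias into drift quickly, which is why short-window uses such as gesture recognition are more forgiving than absolute position tracking. A one-axis sketch:

def integrate(samples, dt):
    """samples: acceleration readings at interval dt (one axis).
    Returns (velocity, position) series via the trapezoidal rule."""
    v, x = [0.0], [0.0]
    for i in range(1, len(samples)):
        v.append(v[-1] + 0.5 * (samples[i - 1] + samples[i]) * dt)
        x.append(x[-1] + 0.5 * (v[-2] + v[-1]) * dt)
    return v, x

# A constant 1 m/s^2 for one second yields v = 1 m/s and x = 0.5 m.
v, x = integrate([1.0] * 101, 0.01)
print(round(v[-1], 2), round(x[-1], 2))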
Contact: Ian Pratt
-
Software AM radio / Location using AM radio signals
High-speed analog-to-digital converters now make it possible to
digitize the entire AM radio band and then perform the 'tuning' in
software on a PC. The first phase of this project is to assemble the
hardware and software to build such a "software radio" (we have a
high-speed ADC PCI card that could be used).
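Once the band is digitized, 'tuning' to one station is a mix, filter and envelope-detect pipeline. A minimal numpy sketch of AM demodulation from the sampled band; the filter is deliberately crude and the numbers are illustrative:

import numpy as np

def tune_am(samples, sample_rate, carrier_hz, decimate=100):
    t = np.arange(len(samples)) / sample_rate
    # Mix the chosen carrier down to baseband with a complex oscillator.
    baseband = samples * np.exp(-2j * np.pi * carrier_hz * t)
    # Crude low-pass filter: moving average, then decimate.
    kernel = np.ones(decimate) / decimate
    filtered = np.convolve(baseband, kernel, mode="same")[::decimate]
    # AM: the audio is the envelope (magnitude) of the baseband signal.
    audio = np.abs(filtered)
    return audio - audio.mean()   # remove the DC carrier component

# e.g. sampling the AM band at 2 MHz, with a station at 909 kHz:
# audio = tune_am(adc_samples, 2_000_000, 909_000)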
One direction the project could take would be to use AM radio signals
for location: AM radio stations emit a very stable carrier frequency
from a fixed location (the transmitter). By using a digital radio to
monitor phase changes in the carrier frequency, it should be possible
to tell whether a mobile device has moved towards, or away from, the
transmitter. By tuning into several AM stations simultaneously, it
should be possible to recover accurate position information.
This project could take two forms: a student interested in
hardware could build the digital radio, or perhaps use a standard PCI
card with a high-speed ADC. Alternatively, the work could be done
entirely in simulation. Either way, there's loads of cool DSP
stuff to be done.
Contact: Ian Pratt
 |
Security projects
-
Secure packets
The goal of this project is to design and build a proof of concept
implementation of a networking protocol that would enable
"opportunistic encryption" to be utilized whenever two Internet hosts
wished to communicate. The aim is to achieve this in as efficient a
way as possible. For example, the payload data contained in the first few
packets of a new flow would be encrypted using a public-key cipher
while a key exchange takes place piggy-backed on top of the data
packets to enable a symmetric cipher to be used for the remainder of
the flow. Multiple TCP and UDP connections between the same hosts
could make use of the same tunnel, which would be timed out after
being idle for some period.
A key extension to the project would be to prevent a class of denial
of service attack whereby a host or hosts bombard a receiver with
flow-setup packets that it needs to decipher using an expensive PK
algorithm. One way to try and avoid this problem would be to use
"crypto puzzles", whereby the sender needs to do a significant amount
of computation in order to generate a ticket that is attached to the
packet. The puzzle is asymmetric in that the receiver is able to
easily check whether the ticket is valid for this particular
packet. Hence, it's much harder to create a flood of packets that
overloads the receiver.
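A sketch of the crypto-puzzle idea in its simplest hash-based form: the sender searches for a nonce that drives a hash over the packet below a difficulty threshold (many hashes, on average), while the receiver verifies with a single hash:

import hashlib, os

DIFFICULTY_BITS = 20   # sender does about 2^20 hashes per ticket, on average

def make_ticket(packet):
    while True:
        nonce = os.urandom(8)
        digest = hashlib.sha256(packet + nonce).digest()
        if int.from_bytes(digest[:4], "big") >> (32 - DIFFICULTY_BITS) == 0:
            return nonce

def check_ticket(packet, nonce):
    # One hash: verification is cheap, so a flood of bogus flow-setup
    # packets costs the sender far more than it costs the receiver.
    digest = hashlib.sha256(packet + nonce).digest()
    return int.from_bytes(digest[:4], "big") >> (32 - DIFFICULTY_BITS) == 0

packet = b"flow-setup: A->B"
nonce = make_ticket(packet)   # slow (sender pays)
assert check_ticket(packet, nonce)   # fast (receiver checks)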
Contact: Ian Pratt
Suggestions from 
-
Ad-hoc Name Space Service
Very often we want to communicate with or locate other people using our
laptops or PDAs while in indoor environments, for instance in a meeting
room or inside a large building. It seems an easy task when network
infrastructure and services to support this are locally available.
However, sometimes this is not the case either because no infrastructure
can be found or simply because we don't want to rely on external
services such as centralised messenger servers.
With 802.11 wireless networks in peer-to-peer mode, we can set up
an ad-hoc network so that machines can communicate directly with each
other without using an Access Point. Although this gives us the
necessary communication platform, basic services still need to be made
available.
This project aims at developing an ad-hoc name service
for ad-hoc networks that can be used by applications such as instant
messaging. This service should be designed and implemented on top of
a distributed tuple space, similar to Linda (see D. Gelernter,
"Generative communication in Linda", ACM Transactions on Programming
Languages and Systems, 7(1):80-112, Jan. 1985), where each participating node holds parts of
the entire tuple space. A tuple is a data structure consisting of one
or more typed fields. ("foo", 75, false) is an example of a tuple with
three typed fields: a string ("foo"), an integer
(75) and a boolean (false). The Linda framework offers
operations to insert/delete tuples to/from a tuple space. To validate
the prototype implementation of this service, a simple instant
messaging application may be developed.
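The Linda flavour of the service in miniature: tuples are inserted with out() and retrieved by matching against templates in which a wildcard stands for "any value" (rd() copies, in() removes). A local, single-node toy in Python; distributing the space across the ad-hoc network is the project:

class TupleSpace:
    def __init__(self):
        self.tuples = []

    def out(self, tup):
        self.tuples.append(tup)

    def _match(self, template, tup):
        # None is a wildcard; other fields must match by value.
        return (len(template) == len(tup) and
                all(t is None or t == v for t, v in zip(template, tup)))

    def rd(self, template):
        # Non-destructive read of the first matching tuple.
        return next((t for t in self.tuples if self._match(template, t)), None)

    def in_(self, template):   # 'in' is a Python keyword
        t = self.rd(template)
        if t is not None:
            self.tuples.remove(t)
        return t

ts = TupleSpace()
ts.out(("name", "marcelo", "10.0.0.7"))
print(ts.rd(("name", "marcelo", None)))   # -> ('name', 'marcelo', '10.0.0.7')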
If time allows, as an optional task, this name service may be
implemented in a sensor network that we have available at Intel Research
Cambridge.
Contact: Marcelo Pias, Christophe Diot or Derek McAuley
-
Hash Functions in Networking - Performance Evaluation Kit
Hashing is used extensively in contemporary networks, to improve
performance of various lookup and memory access schemes, to provide for
load balancing among multiple targets, and for various other tasks.
Although selected publications exist, when designing a particular
application it is often not easy to determine which hash function best
suits the particular task and the particular input data distribution.
In fact, it is generally impossible to evaluate a hash function on a
theoretical basis only: for any deterministic hash function, input
data patterns can be constructed that would cause the hash function
to perform poorly. Thus, a hash function can only be evaluated against
data exhibiting similar properties to the data that the application is
destined for.
The authors have an interest in hash functions as applied to network
traffic, and propose that this project develop a framework for quickly
testing and evaluating any hash function against a representative set of
networking traces, making it possible to rapidly compare the following
criteria (a sketch of such a kit follows the list):
- computation overhead
- number of collisions in various table sizes
- entropy of input and output data
- comparison to the best-known hash functions
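A skeleton of such a kit, reducing the criteria above to code: time each candidate hash over the trace keys, bucket the outputs into tables of various sizes to count collisions, and estimate the entropy of the output distribution. The candidate function and trace keys here are stand-ins:

import time
from collections import Counter
from math import log2

def evaluate(hash_fn, keys, table_sizes=(1 << 10, 1 << 16)):
    start = time.perf_counter()
    hashes = [hash_fn(k) for k in keys]
    elapsed = time.perf_counter() - start
    report = {"ns_per_key": 1e9 * elapsed / len(keys)}
    for size in table_sizes:
        buckets = Counter(h % size for h in hashes)
        # Collisions: keys beyond the first in each occupied bucket.
        report["collisions_%d" % size] = len(keys) - len(buckets)
    counts = Counter(hashes)
    report["entropy_bits"] = -sum(
        (c / len(keys)) * log2(c / len(keys)) for c in counts.values())
    return report

keys = [("10.0.%d.%d" % (i // 256, i % 256)).encode() for i in range(4096)]
print(evaluate(lambda k: hash(k) & 0xFFFFFFFF, keys))   # stand-in hash function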
Pre-requisites: basic programming, basic information theory.
Additionally, a knowledge of MATLAB is helpful although this is not a
prerequisite; access to MATLAB will be provided if necessary.
Representative network trace data will also be supplied.
Contact: L. Kencl (Intel) or A. Moore (Cambridge University)
-
Sensors for the Evaluation of Football Rules - Demonstrator and
Visualizer
The goal of this project is to develop a demonstrator of the usefulness
of deploying sensors to monitor the movement of participants and of the
ball in ball games. The work would consist of two parts:
- simulate the collection of the input data from an existing match
(processed video footage or available collected robot match data);
- develop and implement scenarios for analysing the data and providing
real-time feedback to the referee, coach, spectators, about various
aspects of the game (rules confirmation, players activity, etc.).
It may be possible to obtain additional data from the LCE regarding their
robot football tournaments. There is also an option to process still
images from existing games using the
Virtual Referee
Tool developed at the TECGRAPH Laboratory at PUC Rio de Janeiro.
The tool would be applied to stills of football video footage to
obtain the sample data for the project.
Pre-requisites: basic programming, ball games knowledge helpful
Contact: L. Kencl (Intel)
-
Query Optimization for Record Streams
Many network measurement applications (route monitoring of BGP, OSPF,
IS-IS, TCP trace analysis, HTTP trace analysis) operate on streams of
data objects that can be thought of as records. When processing
streams at high data rates, or large streams stored in trace archives,
it is important to make data parsing and filtering as efficient as
possible. Most applications are not interested in all available data,
but only in a subset of records that can be identified by some Boolean
predicate over the attribute values of each record. Stream processing
can be greatly improved by using such predicates as input to a "query
optimizer" that parses the minimal number of attributes required to
determine the result of the predicate. (A toy sketch of this idea
appears after the task list below.)
This project will involve:
- Design of a simple specification language for record formats.
- Design of a simple specification language for the "atomic
predicates" required for each attribute of a given record format
specification. (Example: "use regular expression matching for
attribute FOO.")
- Design and implementation of a "query optimizer" that optimizes
the stream filtering of formatted records with respect to an arbitrary
Boolean query constructed from atomic predicates.
- Testing the system on a collection of network trace data.
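The optimisation described above can be phrased as lazy, predicate-driven field extraction: evaluate the cheapest atomic predicates first, parse a field only when a predicate actually needs it, and short-circuit as soon as the Boolean result is decided. A toy sketch over fixed-offset records; the specification languages and cost model are the project:

FORMAT = {"proto": (0, 1), "src_port": (1, 3), "dst_port": (3, 5)}   # field -> byte range

def field(record, name, cache):
    # Parse a field only on first use; the optimizer's job is to need few.
    if name not in cache:
        lo, hi = FORMAT[name]
        cache[name] = int.from_bytes(record[lo:hi], "big")
    return cache[name]

def matches(record, conjuncts):
    """conjuncts: ordered (field, value) tests, cheapest first; all()
    short-circuits, so costlier fields are never parsed on early failure."""
    cache = {}
    return all(field(record, f, cache) == v for f, v in conjuncts)

record = bytes([6, 0x1F, 0x90, 0x00, 0x50])                 # proto 6, ports 8080 -> 80
print(matches(record, [("proto", 6), ("dst_port", 80)]))    # -> True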
Contact: Tim Griffin (Intel)
|