This project is concerned with understanding how files shared in such a manner behave over time. It consists of two parts:
Part A: You will read the BitTorrent source, describe how the protocol works, and build a model from that description. The intention of this model is to identify specific factors (e.g. the arrival/departure rates of users, access bandwidth of clients, file size, etc.) that affect how long an object will remain available. For instance, a large increase in access bandwidth might break such a scheme, as downloads would complete too quickly for a community of active clients to persist.
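To make the kind of model intended here concrete, the following is a minimal toy sketch, not a description of the real protocol. It assumes discrete time steps, at most one arrival per step, and (a strong simplification) that any non-empty swarm collectively holds the whole file. The function name and all parameters are hypothetical.

```python
import math
import random


def simulate_lifetime(arrival_prob, depart_prob, file_size, bandwidth,
                      max_steps=10_000, seed=0):
    """Return the step at which the swarm first becomes empty (toy model).

    Assumptions (not taken from the BitTorrent source): one possible arrival
    per step, every peer downloads at `bandwidth` units per step, and the
    object stays available while at least one peer is present.
    """
    arrivals = random.Random(seed)        # separate streams so that runs
    departures = random.Random(seed + 1)  # with different settings are comparable
    download_steps = max(1, math.ceil(file_size / bandwidth))
    seeds = 1          # the original publisher
    downloaders = []   # remaining download steps for each active peer
    for step in range(1, max_steps + 1):
        if arrivals.random() < arrival_prob:
            downloaders.append(download_steps)
        # Every active download makes one step of progress.
        downloaders = [remaining - 1 for remaining in downloaders]
        seeds += sum(1 for remaining in downloaders if remaining <= 0)
        downloaders = [remaining for remaining in downloaders if remaining > 0]
        # Each peer holding a complete copy departs independently.
        seeds -= sum(1 for _ in range(seeds)
                     if departures.random() < depart_prob)
        if seeds == 0 and not downloaders:
            return step
    return max_steps  # still available when the simulation was cut off
```

In this sketch a higher `bandwidth` shortens the time each client spends in the swarm, which is the effect described above; whether real sessions behave this way is exactly what the model and the measurements would need to establish.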
Part B: You will validate your model by collecting real data on existing BitTorrent sessions. This part will involve writing monitoring tools to watch BitTorrent 'in the wild'. It will be important to attack this part of the project very early in the year, in order to have an opportunity to collect a reasonable amount of data. As this is likely to involve active monitoring (your monitor will connect to live torrents), careful design will be required to address concerns such as the maximum bandwidth consumed.
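One possible way to bound the bandwidth a monitor consumes is a token bucket. The sketch below is only an illustration under that assumption; the class name and parameters are hypothetical, and the caller passes in the current time explicitly so the behaviour is easy to check.

```python
class TokenBucket:
    """Cap traffic at `rate_bytes_per_s`, allowing bursts up to `burst_bytes`."""

    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = float(rate_bytes_per_s)
        self.capacity = float(burst_bytes)
        self.tokens = float(burst_bytes)  # start with a full bucket
        self.last = 0.0                   # time of the last refill, in seconds

    def allow(self, nbytes, now):
        """Return True and consume tokens if `nbytes` may be sent at time `now`."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False  # caller should wait, or skip this request
```

A monitor would call `allow` before each send or receive and back off when it returns False, so that long-run consumption stays at or below the configured rate.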
Thanks to Andrew Warfield for thinking of this one
Data Mining can involve the training of classification schemes to recognise desirable data and then operating these classifiers upon databases in order to locate the data of interest.
A useful data-mining framework already exists called Weka. However, Weka does not yet implement QDA (Quadratic Discriminant Analysis). This project would rectify the situation.
Weka is written in Java and has copious documentation describing its interfaces; what is required is that the student recognise the difficulties that the QDA algorithm poses and design an appropriate strategy to cope.
An implementation of QDA (non-Weka) is available for cross-validation.
Note: this project requires a student who is not scared by the Java programming language and is comfortable with mathematics.
QDA is described in Ripley, B. D. (1996) Pattern Recognition and Neural Networks. Cambridge University Press.
Excellent materials covering both the (validation) implementation of QDA and the QDA technique itself are available in the University Stats Lab.
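For orientation only, the sketch below shows QDA in the simplest possible setting: one feature, with a separate Gaussian mean and variance fitted per class. It is written in Python for brevity rather than as a Weka classifier, the function names are hypothetical, and the multivariate case replaces the variance with a per-class covariance matrix. The check for zero variance hints at one likely difficulty: a class whose covariance estimate is singular.

```python
import math


def fit_qda_1d(samples):
    """Fit per-class (mean, variance, prior) from a list of (x, label) pairs."""
    by_class = {}
    for x, label in samples:
        by_class.setdefault(label, []).append(x)
    model = {}
    for label, xs in by_class.items():
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs)  # maximum-likelihood estimate
        if var <= 0.0:
            raise ValueError(f"class {label!r} has zero variance")
        model[label] = (mean, var, len(xs) / len(samples))
    return model


def discriminant(x, mean, var, prior):
    """Log of the Gaussian class density times the prior, up to a constant."""
    return -0.5 * math.log(var) - (x - mean) ** 2 / (2.0 * var) + math.log(prior)


def predict(model, x):
    """Return the label whose discriminant score is largest at x."""
    return max(model, key=lambda label: discriminant(x, *model[label]))
```

Because each class keeps its own variance, the decision boundary is quadratic in x: with a narrow and a wide class sharing the same mean, points near the mean go to the narrow class and points far away on either side go to the wide one.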
Data Mining can involve the training of classification schemes to recognise desirable data and then operating these classifiers upon databases in order to locate the data of interest.
A useful data-mining framework already exists called Weka. However, Weka is not designed for real-time operation; this project would attempt to rectify this situation.
Weka is written in Java and has copious documentation describing its interfaces; what is required is that the student recognise the difficulties with real-time code implementations, particularly in Java, and design an appropriate strategy to permit use of this system.
The objective would be to provide a library of operations that allows a subset of Weka operations to be incorporated into other code. Students will also need to validate their approach: to be real-time, it must possess time-bounded properties, so the program cannot make a library call that might never return.
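As a rough illustration of a time-bounded call, the sketch below wraps an arbitrary function so that the caller always gets an answer within a deadline. It is written in Python for brevity and is an assumption about one possible approach, not part of Weka; `call_with_deadline` and its parameters are hypothetical names.

```python
import concurrent.futures

# Shared worker pool; the size of 4 is an arbitrary choice for this sketch.
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def call_with_deadline(func, deadline_s, fallback, *args):
    """Run func(*args), returning `fallback` if no result arrives in time."""
    future = _pool.submit(func, *args)
    try:
        return future.result(timeout=deadline_s)
    except concurrent.futures.TimeoutError:
        # cancel() only stops a task that has not started yet; a running
        # task keeps occupying its worker thread until it finishes.
        future.cancel()
        return fallback
```

Note that this bounds only how long the caller waits. The underlying work is not interrupted, so a genuinely real-time library would also need operations whose own running time is bounded.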
Note: this project requires a student who is not scared by the Java programming language. A familiarity with C will assist in creating an appropriate library.
The systems research group is continuing to develop and do research using Xen, but there are a couple of areas which we feel would make good self-contained student projects:
To this end, DummyNet, an existing system for simulating physical networks, is a perfect match for incorporation into Xen. It then becomes feasible to construct a micro-PlanetLab: a network of hosts with Internet-wide properties (latency, delay, etc.).
It is anticipated that a DummyNet system would itself be a VM with virtualised network interfaces offered to other domains on the system; the DummyNet virtual machine can then define (and change) the parameters of each interface: bandwidth, latency to given destinations, etc.
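The per-interface parameters mentioned above can be pictured with a small model of a single emulated link. This is a sketch under simple assumptions (one FIFO queue, fixed bandwidth and latency, no loss), written in Python for illustration; it is not DummyNet's actual implementation, and the class and method names are hypothetical.

```python
class EmulatedLink:
    """Model a link with a fixed bandwidth and one-way latency."""

    def __init__(self, bandwidth_bps, latency_s):
        self.bandwidth_bps = float(bandwidth_bps)  # may be changed at run time
        self.latency_s = float(latency_s)          # may be changed at run time
        self.busy_until = 0.0  # time at which the link finishes its current packet

    def send(self, size_bytes, now):
        """Return the time at which a packet handed over at `now` is delivered."""
        start = max(now, self.busy_until)  # queue behind earlier packets
        self.busy_until = start + size_bytes * 8 / self.bandwidth_bps
        return self.busy_until + self.latency_s
```

For example, on a link of 8000 bits per second with 0.25 s latency, a 1000-byte packet sent at time 0 is delivered at 1.25 s, and a 500-byte packet sent at the same moment queues behind it and is delivered at 1.75 s.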
For the emulation of a network, a characterization of the properties of that network is essential. To this end, in addition to implementing DummyNet-type functionality within Xen, the student will need to understand its limitations and perhaps, time permitting, explore mechanisms to overcome those limitations.
The link for Xen is above and this will find DummyNet.
Note: this project requires a student that is comfortable with the C programming language.
For this project, you will take advantage of existing code that allows interception of disc messages to build some examples of virtual discs. You will start with an (easy) initial example of building either an encrypted or compressed file store as a virtual disc. If this is successful, there will be plenty of opportunity to extend the functionality to include copy-on-write and "time-travel" discs.
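To give a feel for the initial example, here is a minimal in-memory sketch of a compressed block store with a read/write-block interface. It is written in Python for illustration and assumes a fixed block size and zlib compression; it does not use the interception code referred to above, and the class and method names are hypothetical.

```python
import zlib


class CompressedDisc:
    """Store fixed-size blocks in compressed form, keyed by block number."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self._blocks = {}  # block number -> compressed bytes

    def write_block(self, number, data):
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block long")
        self._blocks[number] = zlib.compress(data)

    def read_block(self, number):
        compressed = self._blocks.get(number)
        if compressed is None:
            return bytes(self.block_size)  # unwritten blocks read as zeros
        return zlib.decompress(compressed)

    def stored_bytes(self):
        """Total size of the compressed data currently held."""
        return sum(len(c) for c in self._blocks.values())
```

An encrypted store would have the same shape, with the compress and decompress calls replaced by a cipher, and a copy-on-write disc could layer a second block map of this kind over a read-only base.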
Thanks to Andrew Warfield for thinking of this one