Project suggestions from the Systems Research Group
This page collects together various ACS project suggestions from the Network and Operating Systems part of the Systems Research Group. In all cases there is a contact e-mail address/website given; please get in touch if you want more information about the project.
Under construction: please keep checking back, as more ideas will hopefully be added to this page during the coming weeks.
Building an Energy Efficient, High Throughput Router from Simple, Low Power Switches
Thanks to network equipment manufacturers like Cisco and Juniper, approximately every 18 months new Internet core routers are introduced that are:
- very expensive (current Cisco CRS systems cost several million dollars each);
- very feature rich (there is an increasing focus on enterprise application support);
- very high bandwidth (standard line rates are 40 Gpbs and 100 Gbps are available); and
- very power hungry (current Cisco CRS systems consume hundreds to thousands of watts).
One of the most significant challenges facing equipment manufacturers today is heat dissipation in the custom ASICS that are at the heart of these systems. This challenge is currently being addressed through multi-chassis implementations but the specter of water-cooling looms, which will escalate costs dramatically. This motivates the question â are there alternative designs?
Of less interest to most people in the network research community, is the fact that at the other end of the performance spectrum, new networking equipment targeted for home use is routinely introduces that is:
- very inexpensive (typical Ethernet/WiFi-enabled switches cost less than $50/ea.);
- fairly feature rich (many are Linux enabled which allows OpenFlow/NOX use);
- modest bandwidth (typical devices have several 1 Gbps ports); and
- very low power (typical devices are rated around 10W).
This project proposal seeks an answer to the following question: Can simple, low power switches be configured to create a high throughput router that consumes significantly less power than today's high performance systems?
The key elements of this project include:
- Specification of the details of the design objectives for the envisioned switch (throughput, features, power consumption, etc.);
- Identifying a extensible switching network design that accommodates standard commodity switches;
- Developing and testing a NOX/Openflow configuration for standard Linux-enabled switches that meets the specified design targets.
Attack Vulnerability of Temporal Networks
While network theory has been used for decades to study computer systems and human social interactions, with the rise of massive online social networks and with the emergence of mobile wireless systems it becomes important to understand how these graphs change over time. For instance, social interactions happen with certain periodical temporal patterns, whereas in mobile systems communication links may disappear because devices go out of range. At the same time, it is difficult to understand how temporal variability affects network robustness, and many questions arise: what happens when a malicious attacker disrupts the network by impersonating genuine users or destroying real devices? what are the time instants when such a temporal network is more fragile? can we find strategies to better protect these systems?
We have developed a theoretical framework to assess how these dynamic networks can be represented and studied as time-varying graphs: as communication and interaction links appear and disappear, the graph changes its structure. Standard graph properties such as paths and distance have been generalized to take into account how the graph changes over time. Finally, we have defined a measure of temporal robustness which quantifies how much such a time-varying graph is affected by some nodes randomly failing to communicate.
We want to shift our focus from random failures to intelligent attacks. We want to investigate how an adversary with partial or complete knowledge of the temporal network properties can choose to attack a portion of the system to disrupt it: this analysis will allow us to understand how real systems can be better protected by providing the necessary redundancy and how network behaviour can be disguised or optimized in order to offer less knowledge to the attacker.
This project involves: definition and investigation of potential strategies to attack temporal networks theoretical analysis of attack vulnerability on random network models analysis of mobile networks via existing network simulators analysis of real-world mobile/social traces, with comparison with models previously studied [optional] investigation of distributed attack-detection protocols for mobile systems [optional] recommend mobility patterns that can improve the robustness of the network
Skills involved: knowledge of (mobile) wireless systems, statistical modeling, network/graph theory
Culturally-profiling Facebook users
The prevalence of social networks provides vast amounts of data about people's interests and likes that can be processed and analysed. This project is concerned with the transformation of Facebook profiles into cultural profiles. Facebook profiles contain text detailing a user's favourite books, music, films, and TV shows. Most of the entries are specific titles, artists, songs, etc. These items can be easily translated into genres/subgenres, through services such as IMDB/Amazon/etc., and so it's possible to collect a set of genres/interests for each user. Ultimately creating a cultural profile in the form of a matrix of users to genres/interests. The additional mining of a user's posts/likes/comments would provide even more detailed information and a even temporal dimension. The application of network theory to this aspect would facilitate more meaningful analysis. An investigation of the variety, similarity and covariance of people's profiles and their friends is the expected outcome.
The student working on this project will write a program that will use the Facebook API to quantify the preference information that users report in their profiles (e.g. favourite bands, films, books, etc.). Authorisation for the dataset has already been arranged through researchers in the Psychology department. The program will be run on a large sample of approximately 3 million users, leading to scaling issues and probably requiring a distributed solution. Researching and applying current techniques on the data will allow us to gain an understanding of what people like. Furthermore, how these features are affected by a user's friends and their friend's actions should prove interesting.
There are several projects to work under the umbrella of ErdOS. There are opportunities both for MPhil and Part II students depending on the research required by them. There are other possibilities in addition to the ones described below. Please, do not hesitate to contact us for further details (Jon Crowcroft and Narseo Vallina-Rodriguez).
ErdOS Energy Profiler
The resources available in a mobile handset have different power states and energy demands that might depend on the handset context (e.g. location impacts on the energy consumption of cellular nets). We would like to know how much of the energy consumption accounts for each individual resource and applications. We want to develop an energy profiler that uses exclusively information available in Android handsets (e.g. Kernel and Android API), however, the framework obtained could be also used for other embedded devices such as sensors. We have indications about the right variables that we should look at but more research is required. Once the model has been built, it is necessary to validate the results with the ones obtained with a high-resolution external measurement tool. As a result, we could obtain an energy profiler that estimates the relative energy consumption of each application individually, customised for each handset and without using inaccurate offline information and external tools. That project might require some research and working at the Kernel level for Android OS. Good knowledge of Java and C++ is recommendable, Previous experience on compiling kernels might be useful.
Keywords: Mobile computing, resources management, energy allocation, energy profiling
Difficulty: Medium - High
ErdOS Activities Profiler
ErdOS profiles energy at an high-level abstraction called users activities (e.g. working, going to the pub with friends, ...). The goal behind this design is being able to predict future energy demands at those activities by monitoring what kind of applications are executed by the users at those scenarios and also the way users interact with their handsets (when and where they charge the handset, when they set the handset in airplane mode, etc). It will help ErdOS to efficiently allocate resources both in local and remote handsets proactively. It requires understanding how applications demand OS resources (intercepting system calls) and mapping them to the right activity inferred from contextual information (e.g. location and time). That project can be split in two sides, the implementation and the research one (context-awareness). It might require working at the Kernel level for Android OS. Good knowledge of Java and C++ is recommendable. Previous experience on compiling kernels might be useful.
Keywords: Mobile computing, resources management, context-awareness
ErdOS Resources discovery protocols
ErdOS devices must be able to discover neighbouring resources quickly and in an energy efficient way. However, current wireless interfaces such as Bluetooth have a time consuming protocol that can require up to 60 seconds to perform an enquiry. A potential way of performing fast discovery is by adding information of the available resources and the resources demands of the handset on the radio beacons of wireless interfaces (e.g. Bluetooth and WiFi) and also by audio beacons taking advantage of the low-energy required by the microphone and the speakers available in the handset (see http://anil.recoil.org/papers/audio-networking.pdf). An implementation of those mechanisms accompanied by an evaluation of the energy consumption required by them and the execution time can be easily done. An extension can be done as identifying the triggers to perform those advertisements efficiently. It might require Linux knowledge and C/C++ programming skills to modify the drivers. Previous experience on compiling kernels might be useful.
Keywords: Mobile computing, resources management, discovery protocols, energy consumption, pervasive computing
Topic Models for Location-Based Social Network Feeds
We have access to a dataset where users' communication content is being described, together with the geographical position of the locations where the content generation took place. In addition, there is information about a user's social acquaintances.
The focus of the project will be processing and analysing the text describing the user communication, making use of the location-based information as part of the analysis. For example, one possibility would be to build a topic model analysing the semantic topics contained in an individual's communication history, based on standard topic models from the language processing literature, but also include variables in the model which represent the location of the user. The hypothesis would be that the location of a user affects the topic of the user's discussions. In addition to the analysis of individual users, another aspect of the project could consider identifying the topics discussed at given places or areas of a city. That would require the analysis of text generated by multiple users at proximate locations.
Remarks: This is a good opportunity for a student to work with an exciting, cutting-edge social networks dataset, and to combine natural language processing techniques with social network analysis. This is a collaborative project between the Natural Language and Networks research groups.
Differentially Private Open Society
Contacts: Daniele Quercia and Jon Crowcroft
Nowadays the public expects to have access to government data and social media data. Governmental organisations and web 2.0 companies enthusiastically support the public, saying that the point of open information is not merely to expose the world but to change it. World-changing technologies often require people to ponder the ethics and safety of the technologies, and the open society is no exception: open-society-enthusiasts have conceded that looming privacy issues need to be resolved first. Researchers have been recently working on a promising privacy-preserving technology called "differential privacy". To see what "differential privacy" means, consider the case of a dataset containing location data. The dataset is stored in a database that everyone can query. The response to a query is "differentially private" if the response does not "change much" depending on whether an individual is or isn't in the dataset; as such, the presence or absence of individual records is concealed. The good news is that differential privacy gives us resilience to linkage attacks. The bad news is it tends to hide outliers in the dataset, might be infeasible in certain situations, and needs to be adapted to the specific measurements one needs. The goals of this project are: (1) construct differentially private versions of the measurements that are typically performed on different types of data (location data, rating data, and social network data); and (2) empirically test the new versions of the measurements on real sets of data that are available in our lab. The project is difficult but its potential impact on the open society might be huge.
Requirements: willingness to work on a challenging project is essential; strong background in Math is required; prior experience in programming is a plus.
Private Sentiment Analysis of Groups
Contacts: Daniele Quercia and Jon Crowcroft
The purpose of sentiment analysis algorithms in Twitter is to classify the sentiment (either positive or negative) of Twitter messages. Existing algorithms classify Twitter messages of a single individual and are not able to classify messages of a group of individuals. The goal of this project is to study how sentiment analysis of a group's messages might be used for personal and group reflection and awareness. More specifically, this project proposes to build new algorithms that: (1) make the messages that express sentiment/mood/well-being fully private; (2) aggregate the "private" messages to support group awareness; (3) exploit the temporal similarities among users to support personal awareness. Requirements: prior experience in Machine Learning is required; prior experience in programming is a plus.
Encryption and authentication for component-based middleware
Supervisor: Alastair Beresford
Originator & Day-to-day supervisor: David Evans
Component-based distributed systems consist of many independent pieces of software interacting via communications networks; many components are then typically connected to accomplish a useful task. Data that flow between the components may be commercially valuable or contain private information, meaning that it should be encrypted to prevent unauthorised access. Furthermore, components should authenticate their peers, ensuring that they are talking to the correct component; man-in-the-middle attacks should be impossible; and it should be difficult to interfere with the communication between any two components by inserting, removing, or re-ordering messages.
The TIME project has supported the development of a middleware where components communicate using long-lived TCP connections. The middleware currently lacks authentication and encryption and this project seeks to address this problem by implementing them.
The choice of appropriate encryption and authentication schemes will require some careful thought. For example, the middleware supports third parties diverting an existing connection between two components, perhaps because one or more components needs to be replicated, hardware replaced, and so on. What does this mean for authentication and encryption? How should the business logic of components see all of this, including expressing their authentication needs?
Work on this project should begin by examining the literature to see what existing solutions might exist (X.509 certificates and Kerberos are two good places to start). While a solution doesn't have to be based on an existing scheme, they should inform the design. The resulting scheme should be implemented as part of the SBUS / PIRATES middleware. (You might also like to look at the following paper.)
SBUS/PIRATES consists of about 25,000 lines of C++ and has been used on various flavours of Linux and MacOS, so you must be adept with C++ and Unix-style environments.
Implementation of an information-centric Linux-based protocol stack
Supervisor: Jon Crowcroft
Originator: Dirk Trossen
The FP7 projects PSIRP (www.psirp.org) and PURSUIT (www.fp7-pursuit.eu) have been designing and developing an information-centric internetworking architecture that is based on information labelling and a pub/sub service model on network layer. The projects' aim is a long-term effort to replace the current IP protocol stack with a novel approach that is centred on information dissemination rather than endpoint connectivity. The PSIRP project has been successfully finished in September 2010 with the PURSUIT project continuing its work until the end of 2012. Currently, a prototype implementation under FreeBSD has been developed during the project efforts, currently being deployed in a small-scale test bed.
The aim of this project is to develop a design and implement a Linux-based version of the node architecture. This implementation is to be tested within the established test bed both alongside but also separate from the existing FreeBSD implementation. The project efforts provide sufficient architecture design documentation for this task.
In addition to systems programming experience (C, UNIX and large projects), the candidate would be helped by having Linux-specific experience. You will work alongside senior project members, including the technical lead of the project, so you will be able to receive day-to-day advice and guidance.
Supervisor: Jon Crowcroft
Originator: Dirk Trossen
The NORS platform (www.sourceforge.net/projects/nors) provides a platform for wireless sensing researchers to implement mobile-centric sensing scenarios that exploit the wealth of information that directly resides on mobile devices as well as the information being able to collect from connected devices. NORS was developed under the Java MIDP framework, i.e., it is able to run on any mobile device that supports this Java framework. A notable exception of a major phone platform that does not support Java MIDP is Android (the same is true for iOS) although the platform allows for Java-based programming.
This project aims at porting the current NORS platform onto Android as a native application. This enables the direct exploitation of this increasingly available software platform. In addition to porting the main MIDP interfaces, certain supported sensor handlers (which implement the sensor-specific functionality) are to be investigated, e.g., for supporting accelerometer and other built-in sensor information. The port will be utilised in a TSB-funded project called PAL (www.palproject.org.uk) which utilises the wealth of gathered information from mobile devices (and attached sensors like ECG and heartrate monitors) for improving lifestyle management, e.g., for stress reduction.
In addition to systems programming experience (Java, C, and large projects), the candidate would be helped by having Android-specific experience. You will work alongside senior project members, including the technical lead of the project, so you will be able to receive day-to-day advice and guidance.
Decentralised Community Detection and Deployment of Forwarding Algorithms over Haggle
Originator: Eiko Yoneki
BUBBLE RAP introduced in  is a forwarding algorithm for Delay tolerant networks, which uses information of community and graph centrality. It consists of RANK and LABEL strategies. RANK delegates messages to high centrality nodes, while LABEL selects a next hop from any member of the destination community as a relay. The BUBBLE algorithm showed a significant improvement in forwarding efficiency over oblivious forwarding, and the other algorithm, which uses patterns of movement rather than the longer term social relationships that our scheme infers. BUBBLE works in a decentralised world without access to infrastructure. Therefore the distributed detection and dissemination of node popularity and node communities, and the use of these for forwarding decisions are crucial. This project aims at implementing distributed community detection in Haggle framework  and deploying BUBBLE forwarding algorithm. Integration work will be carried out using Xen based Haggle simulator including generation of a few different types of networks (e.g. small world, and emulation of real world trace). Performance testing in terms of packet delivery compared to prior schemes should be demonstrated, which is expected to show the strength of social based forwarding algorithms in certain types of networks.
A brief description of the work items is listed below.
- Implement distributed community detection (i) Using "SIMPLE" algorithm described in ; (ii) Using algorithm described in  with an extension of decentralised approach
- Implement BUBBLE forwarding algorithm  in Haggle
- Implement network generator (e.g. small world, inter-contact power-law etc.) within Xen Haggle simulator
- Integrate all above
- Performance testing in different network setting
 EU FP6 Haggle Project http://code.google.com/p/haggle/
 P. Hui, E.Yoneki, S. Chan, and J. Crowcroft. Distributed community detection in delay tolerant networks. Proc. MobiArch, 2007.
 P. Hui, J. Crowcroft, and E. Yoneki. BUBBLE Rap: Social Based Forwarding in Delay Tolerant Networks. MobiHoc, 2008
 I. Leung, P. Hui, P. Lio, and J. Crowcroft. Towards real time community detection in large networks. Phys. Rev. E, 79(6), 2009.
Modelling Spread of Rumours between Communities
Originator Eiko Yoneki
Our previous work  reported a preliminary investigation on interactions between networked social communities using the Ising model to analyse the spread of rumours. The inner opinion of a given community is forced to change through the introduction of a unique external source, and we analyse how the other communities react to this change. We model two conceptual external sources: namely, Strong-belief and Propaganda, by an infinitely strong inhomogeneous external field and a finite uniform external field, respectively. In the former case, the community changes independently from other communities, while in the latter case also according to interactions with other communities. We applied our model to synthetic networks as well as various real world data ranging from human physical contact networks to online social networks. The experimental results of applying the Ising model using real world data clearly demonstrate two distinct scenarios of phase transitions.
This project aims at validation of the above model to examine if such Strong-belief and Propaganda will be exhibited in real situations. Thus, the project work requires deploying a series of the experiments, e.g. an online and mobile phone based software to share opinions on specific topics within the given time, and monitor the change of individual opinion in time series by controlling information available to the individual. This will capture the dynamics of opinion change and investigate validation of our Ising model. The experiments can start off with the simplest condition and along our development of extending work such as coexisting two opinions (i.e. different opinions coexisting in the society), the software should be also extend for appropriate validation. The Ising model for understanding interactions of communities and subsequently modeling rumour propagation is a unique approach that will give significant input for modelling information propagation in social networks. The result of validation can be compared with voting model.
 M. Ostillia, E. Yoneki, I. Leung, J. Mendes, P. Lio, and J. Crowcroft: "Statistical mechanics of rumour spreading in network communities", International Conference on Computational Science, ICCS, 2010.
 M. Ostilli, J. F. F. Mendes, "Communication and correlation among communities". Phys. Rev. E 80 (011142); and M. Ostilli, J. F. F. Mendes, "Small-world of communities: communication and correlation of the meta-network". J. Stat. Mech. L08004 (2009).
 M. Ostilli, E. Yoneki, I. X. Y. Leung, J. F. F. Mendes, P. Lio, J. Crowcroft, "Ising model of rumour spreading in interacting communities", Tech.Rep. UCAM-CL-TR-767, University of Cambridge (2010).
Integration of Parallel Computation through Task Farming in Skywriting Haggle
Originator Eiko Yoneki
MapReduce  and Dryad/LINQ  have facilitated the exploitation of multiple machines in a computer cloud. In order to provide more expressive coordination language for different classes of computation, including iterative and recursive algorithms, a project called Skywriting  is in progress in our systems research group, applying a purely-functional script language for describing distributed computations.
Mobile networks formed by smart phones are time dependent dynamic networks with high node mobility, and Skywriting programming can be exploited in such dynamic network environment, which is powerful for cooperative task farming. For example, the map function could be exploited based on time-space graph of a social network to achieve an efficient distributed computing. The logical network topology could be obtained from various sources such as online social networks (e.g. Facebook, Orkut), and such a social graph could be transformed to a task graph, which indicates the order of operation. Efficient task graph generation based on given distributed computation is an interesting topic to explore. In our previous work , we have introduced and motivated crowd computing in disconnected mobile networks , which combines mobile devices and social interactions to achieve large-scale distributed computation. Mobile networks offer substantial aggregate bandwidth and processing power. The analysis shows that an upper bound on the amount of computation is possible in such networks.
This project investigates a practical task-farming algorithm in Skywriting programming that exploits social structure. The prototype of Skywriting runtime to the Android operating system over Haggle platform  has been implemented and the project can be built on top of this prototype and extended its runtime. Testing with the Android devices is essential, and using Xen-based Haggle simulator can be used for a large scale testing.
The intersection between networking and programming languages is becoming a popular research topic. The proposed project aims at building a new paradigm, where networking and programming are well integrated, with special emphasis on mobile networks.
 D. Murray and S. Hand, Scripting the cloud with Skywriting, USENIX HotCloud, 2010.
 D. Murray, E. Yoneki, J. Crowcroft and S. Hand, The Case for Crowd Computing. ACM MOBIHELD, 2010.
 EU FP6 Haggle platform http://code.google.com/p/haggle/
 J. Crowcroft, E. Yoneki, et al. Promoting Tolerance for Delay Tolerant Network Research. ACM CCR, 38(5):63-68, 2008.
 J. Dean and S. Ghemawat, MapReduce: simplified data processing on large clusters. USENIX OSDI, 2004.
 Y. Yu, M. Isard, D. Fetterly, M. Budiu, U. Erlingsson, P. K.Gunda, and J. Currey. DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language. OSDI, 2008.
Opinion and Behaviour Dynamics in Social Networks
Originator: Eiko Yoneki (with Derek Murray)
Large-scale social network communities, where millions of users form a dynamic infrastructure to share multimedia content, have brought new entertainment experiences. Such Network-shared User Generated Content (UGC) is becoming increasingly popular due to the emergence of Web 2.0. YouTube is one of many UGC websites, where the popularity of content is driven by viewers and varies dynamically. This project investigates understanding information flow, decision making process, and behavioural dynamics based on rumour, social topology, and given information in such environments and modelling how people react with limited/unlimited information surroundings. Thus, the investigation covers behaviour modelling and analysis from an opinion dynamics perspective and can provide an answer to questions such as "do people change their opinion based on similar to page-ranking algorithm?" or "what factors have the largest influence?".
The project can be tackled in the following ways:
- Crawling Twitter.
By crawling followers in Twitter, you can build a directional graph as social networks. You can also extract information about the interaction graph by looking at individual tweets that contain "@<username>". There are a few interesting, but disjoint, semantics for this: it can be part of a conversation between users, it can be a way of sending an open message to another user (e.g. people write messages with @stephenfry and he receives (and gets upset about!) them even if he doesn't follow the user), or it can indicate that there is a non-virtual connection between the users (e.g. mrry: just went for burritos with @eikoy). It would be interesting to see what kind of social graph we can infer from this.
Then you have the spread of things like retweets and hashtags. For example, you could analyse the data to see whether certain users (or users with a certain characteristic in the social graph) are successful at spreading hashtags through the social graph. You could also see what about the graph causes a user's tweets to be retweeted (my speculation: people with a large number of followers and who follow a small number of people will have a high probability of retweeting, but there may be other interesting cases). Correlation between hashtags or degree of people or position of people in the graph could be interesting to explore. You could also look at the spread of links to YouTube videos.
You can also write a Facebook application and investigate similar ideas described above.
- Writing a role-playing game
Millions of people worldwide play interactive online role-playing games, forming complex and rich networks among their virtual characters. Such games could project the real world and predict what might occur. For example, an unexpected outbreak of an infective communicable disease (unplanned by the game creators) occurred when a role-playing game was played in the virtual world. This outbreak holds surprising similarities to real world epidemics .
You can write role-playing game application over mobile devices and web and obtain the information of behaviour of the players under the different conditions (e.g. provide other players activities or characteristics of the other players). The game could be "what's the most popular music?" or "what's the best mobile phone to buy?" or similar to any popular game in the market.
Reading Vinayak Dalmia's masters thesis on 'Investigating Herding Behaviour in an Experimental Setup' (Judge Business School) will be a good starting point; and papers  and  will be good general background reading.
References and Recommended Reading
 Boyd, Golder, and Lotan, Tweet, Tweet, Retweet: Conversational Aspects of Retweeting on Twitter, HICSS-43, 2010.
 Christo Wilson, Bryce Boe, Alessandra Sala, Krishna P. N. Puttaswamy and Ben Y. Zhao, User Interactions in Social Networks and their Implications, ACM EuroSys 2009.
 Balicer, R.:Modeling Infectious Diseases Dissemination Through Online Role-Playing Games. Epidemiology, 18(2), 2007.
 Sastry, Yoneki and Crowcroft " Buzztraq: Predicting geographical access patterns of social cascades using social networks". EuroSys workshop on Social Network Systems, 2009.
Sampling Strategies for Measuring Epidemic Spread
Originator: Eiko Yoneki
The epidemic spread of infectious disease requires quantifying and modelling disease development in time and space and quantifying patterns of disease and sampling for disease in populations is challenging. The recent emergence of wireless technology provides a unique opportunity to collect precise human connectivity data. For example, people can carry tiny wireless sensors (< than 1 cm^2) that record dynamic information about other devices nearby. A post-facto analysis of this data will yield valuable insight into complex human interactions, which in turn will support meaningful modelling of understanding networks.
Interestingly infectious disease epidemiology and proximity-based communication in wireless networks share common research issues .
However, it is challenging to deploy an experiment to collect such human connectivity data considering not everybody can be equipped the device, which is capable to collect the information. Conducting such experiments is expensive, as each participant must be given a device (typically a mobile phone, or Bluetooth sensor), and it may also be difficult to find willing participants. How many devices (e.g. mobile phones) are necessary to operate an epidemic spread simulation in the city of Cambridge, where the population is 130,000. The problem can be interpreted as how to find giant component (i.e. connected sub-graph in human contact networks) from a given population. A challenge is that a human contact network is not a random network
There is a series of human contact data including CRAWDAD  and project Haggle , which is available for this project. If a specific data collection can provide significant information, such experiment can be carried out within the project. The following two strategies can be investigated:
- Investigate different methods of sampling a population so that the resulting sub-graph exhibits the same macroscopic properties. Methods may include randomly discarding nodes from a graph to discover the critical point at which these properties change.
- Investigate whether static social networks (such as the friendship graph from an online social network) can serve as a useful predictor of which subset of nodes would be representative.
Note that the general notion of "sampling strategy" is similar to that discussed in papers Milan Vojnovic (MSR Cambridge) .
References and Recommended Reading
 CRAWDAD: A Community Resource for Archiving Wireless Data
 Eubank, S., Kumar, V., and Marathe, M.: Epidemiology and wireless Communication: Tight Analogy or Loose Metaphor? LNCS 5151, 91-104, 2007.
 Kleinberg, J.: The wireless epidemic, Nature 449, 287-288, 2007.
 Dinkar Vasudevan and Milan Vojnovic, Ranking through Random Sampling, MSR-TR-2009-2015, January 2009.
 Varun Gupta and Milan Vojnovic, Supplement to "Sampling Strategies for Epidemic-Style Information Dissemination", MSR-TR-2009-70, 2009.
Originator: Eiko Yoneki
Building function/mechanism for migrating data objects over the Haggle platform . Haggle is built over a "layerless" networking architecture that incorporates event-driven and asynchronous operations, which reside in the device as a kernel component. Functional components implement the functional logic and interact only directly with the kernel (see  for further details about the Haggle implementation). A data object is an object defined in Haggle as an entity which can move between Haggle equipped devices. Each data object contains metadata, consisting of name and attribute pairs. Metadata operations like search or forwarding depend on the size and complexity of the network.
Individual nodes in Haggle (e.g. smart phones) are powerful enough to perform substantial amounts of computing. An example of computation would be to pass around a "Sync-Ball" from node to node as they meet, and into which any node may stick the result of some computation. The main task of the "Sync Ball" project is to design a data object which is able to operate at a Haggle node, collect/append data, and then migrate to other devices. Haggle already has an Android platform implementation, and Sync-Ball will be developed over it. A challenge is that the data object should be designed for multiple purposes and some level of simple programming language for such data object is desired or design the function as part of database like operation. For example, you can design a migrating (distributed) Excel, where computations are a set of formulae each computed over a column, and each node may append zero or more rows when it has the ball
If time allows, you can add parallel operation of data objects and writing also runtime to support it. It might also be interesting to consider issues of data integrity and availability (through redundancy). Potential applications for Sync-Ball:
- Synchronisation among devices: information to be synchronised is in the data object, devices receives/updates the data then sent out to the other devices.
- Twitter entries, which are accumulated through the migration within Haggle name space until the data object is sent out to the Internet based twitter.
- Environmental monitoring object: dispatch sync-ball gathering Co_2 values from various sensor devices in the area.
References and Recommended Reading
 Erik Nordstrom, Per Gunningberg, and Christian Rohner. A Search-based Network Architecture for Mobile Devices. Tech Report Uppsala University, 2009-003.
 E. Yoneki, I. Baltopoulos and J. Crowcroft " D^3N: Programming Distributed Computation in Pocket Switched Networks", ACM MOBIHELD at SIGCOMM, 2009.
 Xerox PARC's Bayou Project
Practial Homomorphic Encryption
Homomorphic encryption is the idea of performing arbitrary computations on an encrypted piece of data ("ciphertext") without decrypting it first. In other words, the operations manipulate the ciphertext to generate another, different, ciphertext, which when decrypted produces a plaintext that is equivalent to the original plaintext with the operations applied. For example, operation "add 37" on plaintext "5" gives "42". Under homomorphic encryption, we apply "add 37" to E(5) (where E is an encryption function) and get back E(42), which when decrypted gives us "42" (the correct answer). Obviuously, using homomorphic encryption secure computation could be performed in a potentially hostile environment (e.g. untrusted servers on the "cloud"). To do this, a fully homomorphic scheme only needs to support two operations: addition and multiplication. These are sufficient to bootstrap any computation that we can imagine.
Homomorphic encryption has long (since the 1980s) been considered to be an unsolved "holy grail" of cryptography. However, recently, several working schemes [1, 2, 3] have been published, and an implementation report  as well as some code  exist. Most of the work in this space has been in theoretical Computer Science though.
Your task in this project would be to:
- investigate how much of the process can possibly be parallelised (a question that has received very little attention so far) to speed it up. Good starting points may be the KeyGen phase and some of the work in the ReCrypt phase.
- look into the possibility of alternating phases of weak, but more performant encryption (by e.g. using smaller keys) and fully secure "phases", such that breaking the key is not possible in time before the next "strong security" phase given reasonable assumptions about the adversary's resources.
- potentially integrate homomorphic encryption with the NetOS groups parallel computation framework, Ciel cloud computing framework, thus enabling secure cloud computing.
References and Recommended Reading
 Craig Gentry. Computing Arbitrary Functions of Encrypted Data. In Communications of the ACM, Vol. 53 No. 3, Pages 97-105. PDF
 Marten van Dijk, Craig Gentry, Shai Halevi, Vinod Vaikuntanathan. Fully Homomorphic Encryption over the Integers. Cryptology ePrint Archive Report 2009/616. PDF
 Nigel P. Smart, F. Vercauteren. Fully Homomorphic Encryption with Relatively Small Key and Ciphertext Sizes. PDF
 Craig Gentry, Shai Halevi. Implementing Gentry's Fully-Homomorphic Encryption Scheme. Website
 Craig Gentry, Shai Halevi. Public Challenges for Fully-Homomorphic Encryption. Website
Social Network Analytics (SNA)
Originator: Jon Crowcroft
Online social networks have attracted a lot of analysis because they are accessible, and because they are big enough to portrait interesting characteristics, as evolving graphs and dynamic systems. Some of the graph theory has been common across other areas, such as the study of the evolution of the Internet, the hyperlinks in the Web and the graph of authors citing others in online bibliographies.
Sociologists and Anthropologists have studied human group behaviour for over 100 years , but have had to carry out extremely expensive surveys often involving weekly interviews and diary keeping by mere 10s of subjects.
Looking at the graphs of nodes in these systems we see that edges are made from
contacts (in context) communication (email, calls, IM, sms etc) references (url, citation, quote) declarations (friend/unfriending in FB)We can think of the graph then of being a hypergraph with typed edges...
What is less well understood is the relationships between these types of edges and their formation and deletion over time - are the correlated (does meeting someone preface friending them on facebook, or the other way around, mostly?) and are they causal.
There are increasing numbers of datasets available that cover the various edge types, and this project is to carry out an empirical experiment plus build a toolset to provide analytics based on data, that could be a backend for OSN services, to build next generation add-ons to what they do (a bit like google analytics)- e.g.
- Constructing tools that help people automate managing their social net especially with a view to deciding to add friends, and what access privileges to grant them.
- Designing a "forgetting" system that allows you to gracefully "un friend" People simply by not re-enforcing the links with them in an appropriate dimension/edge-type.
- Advert relevance filtering
Now anonymized all the above so that we can share it with other researchers without (much) loss of privacy (could use eigenvector/spectral graph trick, for some sort of differential privacy level).
Privacy Architectures for Social Online Networks (PASONs)
Originator: Jon Crowcroft
Online Social Networks (OSN) have notoriously poor privacy. Not only are users frequently mistakenly giving away information to other users which may be embarrassing, or worse, lead to identity theft or fraud, but also, users give almost complete trust away to the OSN provider.
This has to stop!
There's a lot of work in this space including a sigcomm paper, Persona from which proposes a complex key distribution system with fairly baroque group keying.
More structurally interesting is work on P2P provisioning of OSNs, cutting out the privileged role of the provider - e.g.
Sonja Buchegger, Doris Schiöberg, Le Hung Vu, Anwitaman Datta "PeerSoN: P2P Social Networking - Early Experiences and Insights." In Proceedings of SocialNets 2009, The 2nd Workshop on Social Network Systems, Nuernberg, Germany, March 31, 2009.
Finally, realizing that we need some way to pay for at least some of the infrastructure, some have proposed privacy preserving systems that still allow interest matching by advertisers...see
NOYB by Paul Francis et al
This project is to build a decentralised system that has
a) good group keying mechanisms b) good privacy preserving pub/sub mechanisms c) potential for decentralised provision of the service.including migration on and off the Web
A Sustainable Interweb Framework (ASIF)
Originator Jon Crowcroft
The Internet and the World Wide Web consume 2-4% of the energy of the developed world, and are not getting any more sustainable. This must change.
There are various component level things being done to improve the energy efficiency of computers, whether clients, servers, switches or routers. However, the system as a whole can be designed to make use of energy in a far more intelligent way.
What makes life interesting and challenging is that this is not simply yet another constrained joint optimisation problem, because of the nature of sustainable energy production.
Conventional powerstations take a while to bring online and take a while to shut down. Newer sustainable sources of energy (wind, wave, tidal, solar) are dependant on time-varying, and sometimes unpredictable natural sources. This means that the supply/demand matching problem is quite hard.
Energy can be saved by rate adaptation, and by turning off unused components. However, in general there is almost always residual traffic on the net. On the other hand, most traffic is accessing content, and this content is heavily replicated, so it is possible to take low demand sites offline completely, and then potentially, to reduce the traffic t them to zero. At the same time, one can migrate services to sites where energy production is currently surplus (e.g. power-station is online and not worth turning off, or sustainable resource is active (high winds).
Similarly, one can consider cooling large data centers, and move services (c.f. Remus/Xen migration) to sites which actually need heat energy (this is already done on a long time scale, where server farms are located in places like Iceland). One needs to consider service and its associated data in some coherent way too here.
We have access to some data sets of demand patterns from large CDNs and to some network traces, and this project is about building a simulation model to test out various optimization schemes that include energy, and then extending this model to include non-monotonically varying energy resources.
For References, see this background doc
NAT enabled Multipath TCP or RTP/UDP
The Trilogy project has been trying to promote resource pooling in the Internet. One neat result from recent years is that using a small number of randomly chosen paths instead of just one path, results in a traffic matrix that is much more robust to variations in link availability and can also simplify traffic engineering. However, this needs multipath-aware flows.
Most flows are TCP based today. There are a few user space TCP implementations which are really quite good and tractable.
For a flow to use multiple paths it must be able to mark packets as belonging to the same flow, but also as using different sub-paths. This is easy for multi-homed machines, but harder for single-homed machines - it might be possible to do this by cooperating with NAT traversal schemes.
This project is to multipath enable one of these TCP flows and to investigate the extent to which NAT traversal can enable this empirically. Part of the work would entail extensive measurement of multiple paths available n the Internet.
Communication of Geographic Data
Various applications need to deal with geographic data including maps, positions of objects, and properties of areas (a car park is not a lake). These data are used as inputs to algorithms that may do routing, estimate journey times, find locations nearby having certain properties, and so on.
Our middleware supports a data type representing a point on the earth but this is just a building block. What higher-level data structures would be desirable so as to support a wide variety of spatially-aware algorithms? There has been a huge amount of work in the Geographical Information Systems (GIS) world about this; how much are these data structures optimised for computation but poor for communication?
A distributed system is not useful if it is slow or too demanding of hardware resources. For this reason, monitoring of performance and resource consumption is critical. Doing this properly is hard, particularly if one wants to insulate component business logic from the infrastructure.
We want components to be able to monitor their own performance, such as response times for RPC messages; transit time for messages that are received, processed, and sent on; CPU and memory use by the business logic; disk used in message storage; and so on. Given that the middleware supports diversion of component connections, we have some of the mechanisms needed for components to be migrated or replicated if performance becomes unsatisfactory.
A Decentralized Delay Tolerant Dropbox
Contact: Jon Crowcroft
In the Haggle project and in Horizon we consider fully decentralised ad hoc systems design to be complimentary to the Cloud. In the cloud, one of the more useful facilties recently emerging is a free open backup/file exchange service called (in one implementation) dropbox. In some scenarios, we don't have access to the internet, or we want some level of independance and privacy from provider-based systmes. Then you can envisage a system that operates across a set of user-provided storage (like P2P) but also using ad hoc wireless networks. One might mirror data, suitably encrypted, to a cloud service (or even replicate that encrypted data) for persitence. A decentralised system might bootstrap security services through social net tools using facebook to do group management and key distribution. Keys might be created from cryptographic functions of identities. Then a suitable, delay/disruption tolerant mechanism for distributing and replicating blocks of files and dirdctorties amongst a set of participating systems could be designed (e.g. resting partly on the Haggle approach). Evaluation of such a system is tricky - firstly, does it work at all of course, and what are the replication costs. Usability and attack-proofness mighr be beyond the scope of this work. A prototype on laptops and (e.g. android) smart phones would be nice.
Some Sustainability ideas from Citrix
Contact: Jon Crowcroft and Julian Chesterfield
Here are a few ideas, which are open for discussion, from a visiting engineer/researcher from Citrix, including Novel storage location, power-aware filesystems; Disk/IO throughput performance; Network based storage motion; Xen-based inter-VM disk IO; File-based virtual device drivers; and Power efficient storage location algorithms. Some more specifics:
- Based on the principles of File-based object storage for Virtual machines, design and develop a system to monitor file access and sort nodes into order of popularity in order to build a fast storage motion manager for large datasets. The final deliverable should focus on one particular workload such as VM booting, or kernel build completion time whilst migrating the underlying storage across a high latency data link. Some measurement results to demonstrate performance and correctness should also be produced.
- Develop and measure the performance of a Fast fileIO driver for interdomain communication over the Xen V4V interface. This will require implementation of a frontend FS interface handler, and a backend FS interface handler with the definition of a simple message passing protocol in between the two such as XMLRPC or custom solution. Implementation likely to be on Linux as a kernel driver (could be implemented in userspace initially). Measurement would compare performance/memory utilisation against inter-domain network = based solution such as NFS.
- Energy measurement of storage access : may involve a number of project definitions, however themes would focus on quantifying storage energy consumption in real systems proportionally to the IO workload being produced. Characterising the differences in energy consumption for different RAID solutions, disk types, data location options on physical drives, SSD storage vs traditional hard drive storage.
- Design of an efficient storage caching harddrive: SSD storage is becoming increasingly more affordable but still remains likely to be significantly more costly than traditional hard drive media. SSDs however provide excellent IO access speeds, effectively providing persistent RAM-based storage performance (e.g. battery backed NVRAM). This project would design and implement a 2 tiered storage architecture providing a fast temporary data cache on SSD storage, with an asynchronous data copying mechanism to move data ont= o larger, cheaper storage. Such a system would therefore provide an interesting trade-off between high capacity cheap large storage disks, with a smaller, faster SSD drive to handle incoming IO load. The project would involve the design and development of an algorithm to identify the best cache population approach for minimizing realtime dependency on the slower storage, and ensuring data consistency between the 2 devices. The project would produce performance comparison results of the SSD-based caching solution vs traditional disk access.
Benefits to Student:
Both project suggestions 1 and 2 will provide detailed exposure to low-level filesystem semantics and Operating system virtualisation. The student will have the opportunity to learn about inter-domain memory management and IO performance in large distributed systems. The projects have strong relevance to future virtualisation environments and especially cloud workload deployments.
Projects 3 and 4 have relevance in the understanding of power and performance of storage, particularly in data centres for IO intensive applications. As the IT industry continues to strive towards more efficient and lower power solutions, such knowledge will be highly relevant.
Running CIEL on a Single-Chip Cloud Computer
The Computer Lab. will soon take delivery of an experimental "Single-chip Cloud Computer" (SCC) from Intel. The SCC contains 48 x86 cores on a single chip interconnected by 24 on-chip routers. The system provides no hardware support for cache-coherence, but shared message-buffers are provided to permit low-latency message-passing.
The goal of this project is to explore the flexibility and performance of different communication mechanisms on the SCC. You will build on Ciel, a distributed execution platform being developed in the SRG, enabling a modified version to run on the SCC. You will then explore how communication between Ciel tasks can be optimised using the SCC's message-passing capabilities. There are a number of existing applications that run on the Ciel platform that can be used for your evaluation.
Extensions could explore thread scheduling and thread and data migration optimisations.
- An overview of the SCC can be found in this Microprocessor Report article, and further details can be found on the Intel web site. Finally, a message-passing library called RCCE is also provided.
- More information about Ciel can be found at http://www.cl.cam.ac.uk/research/srg/netos/ciel/
- Other useful references: "rMPI: Message-Passing on communication-exposed multi-core processors", J. Psota and A. Agarwal, http://www.springerlink.com/content/n4u12093t15m8913/
Cambridge Bus Fleet as a Vehicular Delay Tolerant Network
Vehicular Delay-Tolerant Network (VDTN) is a network architecture for communication between vehicles based on the concept of Delay Tolerant Networks (DTN) . Such networks experience frequent, long-duration partitioning and may never have an end-to-end path at some times or even at all times. As such a VDTN handles non-real time applications at some cost, under unreliable conditions, enabling intermittent connectivity in time-changing (temporal) network of interactions (or potential contacts), using vehicles to carry data between terminal nodes. For example, it can be applied in rural and remote areas, in emergency scenarios, in ensuring road safety and improving traffic management, in streaming media communication between vehicles, infotainment, or telematics .
In a VDTN information delivery routes vary in time depending on the vehicular network topology dynamics and routing mechanism protocols used. The latter may well make good use of the former.
The aim of the project is to study statistical properties of the contact network of the fleet of public transport vehicles which are (hypothetically) equipped with infrastructure enabling proximity-based communication. The candidate will deal with real-world historic GPS public-transport (bus) data from the city of Cambridge. One feature of these data is that they are sparse - buses record their positions with a relatively low frequency. Recently, by using interpolation and map matching techniques we have shown how to reconstruct buses' dynamics and obtain more accurate information on their behaviour . The first challenge in this project is to obtain and compare statistical properties of the proximity-based contact networks derived from sparse data and those of higher accuracy. The outcome will have important implications on the choice of data mining/reduction strategies in developing VDTN using historical data. Next step in the project will be to create a framework which would simulate VDTN routing strategies using the generic paradigm developed in  and compare their performance for the two contact networks mentioned above. Furthermore, design questions such as where in the network to optimally place static relay nodes (nodes with store-and-forward capabilities) in order to improve information delivery can be investigated using created framework.
In this project the candidate will benefit from dealing with VDTN routing problem under time-changing network connectivity patterns, working with real-world public transport data, and using the wiki of maps OpenStreetMap.
Skills required: elements of wireless communication technologies, statistical modelling, and graph theory. Programming skills: MATLAB or C++ or Java (knowledge of R is an advantage, but is not absolutely necessary). References:
-  Fall, K. (2003) A Delay-Tolerant Network Architecture for Challenged Internets. In ACM SIGCOMM'03.
-  Cooperative Vehicle-Infrastructure Systems (2010) Cooperative Urban Mobility Handbook.
-  Andrei Bejan, Richard Gibbens, David Evans, Alastair Beresford, Jean Bacon, Adrian Friday. Statistical Modelling and Analysis of Sparse Bus Probe Data in Urban Areas. Proceedings of the 13th IEEE Conference on Intelligent Transportation Systems, Madeira Island, Portugal, September 2010, pp. 1256-1263.
-  Jain, S., Fall, K., Patra, R. (2004) Routing in a Delay Tolerant Network. SIGCOMM'04, Aug. 30-Sept.3, 2004, Portland, Oregon, USA.