Computer Laboratory

Old ACS project suggestions

Project suggestions from the Systems Research Group

Applications of information flow control within Event-based systems

Supervisor: Jean Bacon

Originator: David Eyers

A growing number of applications are being built using event-driven middleware. In particular, Complex Event Processing (CEP) systems have appeared in the financial services, healthcare and retail supply-chain domains. Distributed Event Based Systems (DEBS) focus instead on the benefits that content-based message routing and the decoupling of message senders and receivers can provide in distributed application deployments.

To date, there has been comparatively little consensus on how to implement security within event-driven middleware. We have recently developed a security model that applies information flow control in a manner that is tailored to particular types of event-driven systems.

This project would involve embedding and augmenting our information flow control model within a prototype application. We have successfully applied our model within the financial trading domain, and would thus be interested in focusing on other applications for which we have collaborators: in particular, healthcare and systems that track RFID tags. The project will contrast the security techniques applied within the event-driven system with more traditional access control methodologies.
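As a toy illustration of the general flavour of label-based information flow control (not the specific SRG model referred to above), the Python sketch below assumes each event carries a set of confidentiality tags and that a consumer may receive an event only if its clearance covers every tag; all class names, tags and values are invented.

    from dataclasses import dataclass

    @dataclass
    class Event:
        topic: str
        payload: dict
        tags: frozenset = frozenset()        # e.g. {"patient-123"} -- hypothetical tags

    @dataclass
    class Consumer:
        name: str
        clearance: frozenset = frozenset()

    def may_deliver(event: Event, consumer: Consumer) -> bool:
        """Deliver only if the consumer's clearance dominates the event's label."""
        return event.tags <= consumer.clearance

    e = Event("admission", {"ward": "A3"}, frozenset({"patient-123"}))
    nurse = Consumer("ward-nurse", frozenset({"patient-123", "ward-A3"}))
    auditor = Consumer("billing", frozenset({"finance"}))
    print(may_deliver(e, nurse))    # True
    print(may_deliver(e, auditor))  # False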

Encryption and authentication for component-based middleware

Supervisor: Alastair Beresford

Originator & Day-to-day supervisor: David Evans

Component-based distributed systems consist of many independent pieces of software interacting via communications networks; many components are then typically connected to accomplish a useful task. Data flowing between the components may be commercially valuable or contain private information, so they should be encrypted to prevent unauthorised access. Furthermore, components should authenticate their peers, ensuring that they are talking to the correct component; man-in-the-middle attacks should be impossible; and it should be difficult to interfere with the communication between any two components by inserting, removing, or re-ordering messages.
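As a hedged sketch of the kind of protection required (not the scheme the project should necessarily adopt), the snippet below authenticates each message with an HMAC over a per-link sequence number, so insertion, removal or re-ordering becomes detectable; how keys are distributed and how connections are diverted is exactly the design question left open. All names and the key are placeholders.

    import hmac, hashlib, struct

    KEY = b"per-link shared secret (placeholder)"

    def seal(seq: int, payload: bytes, key: bytes = KEY) -> bytes:
        header = struct.pack("!Q", seq)                      # 64-bit sequence number
        tag = hmac.new(key, header + payload, hashlib.sha256).digest()
        return header + tag + payload

    def unseal(blob: bytes, expected_seq: int, key: bytes = KEY) -> bytes:
        header, tag, payload = blob[:8], blob[8:40], blob[40:]
        if not hmac.compare_digest(tag, hmac.new(key, header + payload, hashlib.sha256).digest()):
            raise ValueError("bad authentication tag")
        (seq,) = struct.unpack("!Q", header)
        if seq != expected_seq:
            raise ValueError("message lost, replayed or re-ordered")
        return payload

    wire = seal(0, b"hello")
    print(unseal(wire, 0))     # b'hello'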

The TIME project has supported the development of a middleware where components communicate using long-lived TCP connections. The middleware currently lacks authentication and encryption and this project seeks to address this problem by implementing them.

The choice of appropriate encryption and authentication schemes will require some careful thought. For example, the middleware supports third parties diverting an existing connection between two components, perhaps because one or more components need to be replicated, hardware needs replacing, and so on. What does this mean for authentication and encryption? How should the business logic of components see all of this, and how should components express their authentication needs?

Work on this project should begin by examining the literature to see what solutions already exist (X.509 certificates and Kerberos are two good places to start). While the solution does not have to be based on an existing scheme, existing schemes should inform the design. The resulting scheme should be implemented as part of the SBUS / PIRATES middleware. (You might also like to look at the following paper.)

SBUS/PIRATES consists of about 25,000 lines of C++ and has been used on various flavours of Linux and MacOS, so you must be adept with C++ and Unix-style environments.

Social Network Analytics (SNA)

Originator: Jon Crowcroft

Online social networks have attracted a lot of analysis because they are accessible, and because they are big enough to exhibit interesting characteristics as evolving graphs and dynamic systems. Much of the graph theory is shared with other areas, such as the study of the evolution of the Internet, the hyperlinks in the Web, and the graph of authors citing others in online bibliographies.

Sociologists and anthropologists have studied human group behaviour for over 100 years, but have had to carry out extremely expensive surveys, often involving weekly interviews and diary keeping by mere tens of subjects.

Looking at the graphs of nodes in these systems, we see that edges are made from:

	contacts (in context)
	communication (email, calls, IM, sms etc)
	references (url, citation, quote)
	declarations (friend/unfriending in FB)

We can then think of the graph as being a hypergraph with typed edges...

What is less well understood is the relationship between these types of edges and their formation and deletion over time: are they correlated (does meeting someone usually precede friending them on Facebook, or the other way around?) and are they causal?
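One way to make the question concrete: keep a typed, timestamped edge list and ask, per pair of people, which edge type appeared first. The sketch below is a minimal illustration with invented data, not a proposed toolset design.

    edges = []   # (time, type, u, v); times and pairs below are made up

    def add_edge(t, kind, u, v):
        edges.append((t, kind, u, v))

    def first_seen(u, v):
        """Earliest timestamp of each edge type between u and v (either direction)."""
        first = {}
        for t, kind, a, b in sorted(edges):
            if {a, b} == {u, v} and kind not in first:
                first[kind] = t
        return first

    add_edge(1, "contact",       "alice", "bob")   # met in person
    add_edge(5, "declaration",   "alice", "bob")   # friended on an OSN
    add_edge(7, "communication", "alice", "bob")   # first message

    print(first_seen("alice", "bob"))
    # {'contact': 1, 'declaration': 5, 'communication': 7}
    # Aggregated over many pairs, this is one way to ask whether meeting tends to
    # precede friending, or the other way around.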

There are increasing numbers of datasets available that cover the various edge types. This project is to carry out an empirical experiment and to build a toolset providing analytics over such data, which could act as a backend for OSN services and support next-generation add-ons to what they do (a bit like Google Analytics), e.g.

  • 1/ Constructing tools that help people automate the management of their social network, especially with a view to deciding whether to add friends, and what access privileges to grant them.
  • 2/ Designing a "forgetting" system that allows you to gracefully "unfriend" people simply by not reinforcing the links with them in an appropriate dimension/edge-type.
  • 3/ Advert relevance filtering

References:

Social Networks Project Website and this background

Privacy Architectures for Social Online Networks (PASONs)

Originator: Jon Crowcroft

Online Social Networks (OSNs) have notoriously poor privacy. Not only do users frequently and mistakenly give away information to other users which may be embarrassing or, worse, lead to identity theft or fraud, but users also place almost complete trust in the OSN provider.

This has to stop!

There's a lot of work in this space including a SIGCOMM paper, Persona, which proposes a complex key distribution system with fairly baroque group keying.

More structurally interesting is work on P2P provisioning of OSNs, cutting out the privileged role of the provider - e.g.

Sonja Buchegger, Doris Schiöberg, Le Hung Vu, Anwitaman Datta PeerSoN: P2P Social Networking - Early Experiences and Insights, in Proceedings of SocialNets 2009, The 2nd Workshop on Social Network Systems, Nuernberg, Germany, March 31, 2009.

Finally, realizing that we need some way to pay for at least some of the infrastructure, some have proposed privacy preserving systems that still allow interest matching by advertisers; see for example NOYB by Paul Francis et al.

This project is to build a decentralised system that has

a) good group keying mechanisms
b) good privacy preserving pub/sub mechanisms
c) potential for decentralised provision of the service.

including migration on and off the Web
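As a very rough illustration of point (a), assuming the third-party "cryptography" Python package, content could be encrypted under a group key before it ever reaches the provider, so that the (possibly untrusted) OSN or DHT stores only ciphertext. Re-keying on unfriending, revocation, and the pub/sub and P2P aspects are the real substance of the project and are not shown here.

    from cryptography.fernet import Fernet

    group_key = Fernet.generate_key()      # distributed only to group members
    group = Fernet(group_key)

    post = group.encrypt(b"holiday photos: only for close friends")
    # ...stored at the provider / in the DHT as opaque bytes...

    print(group.decrypt(post))             # any member holding group_key can read it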

A Sustainable Interweb Framework (ASIF)

Originator: Jon Crowcroft

The Internet and the World Wide Web consume 2-4% of the energy of the developed world, and are not getting any more sustainable. This must change.

There are various component level things being done to improve the energy efficiency of computers, whether clients, servers, switches or routers. However, the system as a whole can be designed to make use of energy in a far more intelligent way.

What makes life interesting and challenging is that this is not simply yet another constrained joint optimisation problem, because of the nature of sustainable energy production.

Conventional power stations take a while to bring online and a while to shut down. Newer sustainable sources of energy (wind, wave, tidal, solar) are dependent on time-varying, and sometimes unpredictable, natural sources. This means that the supply/demand matching problem is quite hard.

Energy can be saved by rate adaptation, and by turning off unused components. However, in general there is almost always residual traffic on the net. On the other hand, most traffic is accessing content, and this content is heavily replicated, so it is possible to take low-demand sites offline completely, and then potentially to reduce the traffic to them to zero. At the same time, one can migrate services to sites where energy production is currently in surplus (e.g. a power station is online and not worth turning off, or a sustainable resource is active, such as high winds).
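A minimal sketch of this matching idea, with entirely made-up numbers: replicated services are placed greedily on the sites reporting the largest surplus of (preferably renewable) energy. A real model would of course add network cost, migration cost and time-varying supply.

    def place(services, sites):
        """services: {name: demand_kW}; sites: {name: surplus_kW} -> {service: site}"""
        placement = {}
        surplus = dict(sites)
        for svc, demand in sorted(services.items(), key=lambda kv: -kv[1]):
            best = max(surplus, key=surplus.get)          # site with most spare energy
            placement[svc] = best
            surplus[best] -= demand
        return placement

    services = {"video-cdn": 40, "mail": 5, "search": 20}            # kW, illustrative
    sites = {"iceland": 80, "cambridge": 10, "madrid": 35}           # e.g. high wind in Iceland
    print(place(services, sites))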

Similarly, one can consider the cooling of large data centres, and move services (cf. Remus/Xen migration) to sites which actually need heat energy (this is already done on a long time scale, with server farms located in places like Iceland). One also needs to consider a service and its associated data in some coherent way here.

We have access to some data sets of demand patterns from large CDNs and to some network traces, and this project is about building a simulation model to test out various optimisation schemes that include energy, and then extending this model to include non-monotonically varying energy resources.

For References, see this background doc

NAT enabled Multipath TCP

Originator: Jon Crowcroft

The Trilogy project has been trying to promote resource pooling in the Internet. One neat result from recent years is that using a small number of randomly chosen paths, instead of just one, results in a traffic matrix that is much more robust to variations in link availability and can also simplify traffic engineering. However, this needs multipath-aware flows.

Most flows are TCP based today. There are a few user space TCP implementations which are really quite good and tractable.

For a flow to use multiple paths it must be able to mark packets as belonging to the same flow, but also as using different sub-paths. This is easy for multi-homed machines, but harder for single-homed machines - it might be possible to do this by cooperating with NAT traversal schemes.
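The sketch below illustrates, at the endpoints only and with invented framing, what tagging segments with a flow identifier, sub-path identifier and sequence number could look like; real sub-path discovery via NAT traversal, congestion control and integration with a user-space TCP are the actual project.

    import itertools

    def stripe(flow_id, data, n_subflows, mtu=4):
        """Split a byte stream into segments tagged (flow, subflow, seq)."""
        counter = itertools.count()
        segments = []
        for i in range(0, len(data), mtu):
            seq = next(counter)
            segments.append((flow_id, seq % n_subflows, seq, data[i:i + mtu]))
        return segments

    def reassemble(segments):
        """Receiver re-orders by flow-level sequence number, whatever path was used."""
        return b"".join(chunk for _, _, _, chunk in sorted(segments, key=lambda s: s[2]))

    segs = stripe(flow_id=7, data=b"multipath tcp over nat", n_subflows=3)
    print(reassemble(list(reversed(segs))))   # b'multipath tcp over nat'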

This project is to multipath-enable one of these TCP implementations and to investigate empirically the extent to which NAT traversal can enable this. Part of the work would entail extensive measurement of the multiple paths available in the Internet.

There is a wealth of background literature available for this work from the Trilogy project

Mafia Scheduling

Originator: Steven Hand

Traditionally, multiprocessor scheduling in general purpose systems has been a minor delta on uniprocessor scheduling: the focus typically has been on dealing with the trade-off between processor and/or cache affinity on the one hand, and load-balancing on the other.

High-performance computing, on the other hand, has often used specialized scheduling which treats an entire computation as a single unit.

This project will investigate a hybrid approach. In essence this assumes a scheduling workload of many processes (as in a general purpose system), each of which is multi-threaded to some small degree. Treating each set of threads as a unit is traditionally called gang scheduling. This project will additionally consider interactions between gangs (both cooperative interactions, and competition for resources), something I've tentatively called mafia scheduling (mostly because it's a good name ;-)
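A toy sketch of the idea (not a proposed design): whole gangs are chosen to share a scheduling quantum, preferring sets of gangs with positive mutual affinity and avoiding those known to compete for a shared resource. The gangs, thread counts and affinities below are invented.

    N_CORES = 4

    gangs = {"browser": 2, "build": 2, "indexer": 2}        # gang -> thread count
    affinity = {                                            # +1 cooperate, -1 compete
        ("browser", "build"): +1,
        ("browser", "indexer"): -1,
    }

    def score(chosen):
        s = 0
        for a in chosen:
            for b in chosen:
                if a < b:
                    s += affinity.get((a, b), affinity.get((b, a), 0))
        return s

    def pick_quantum():
        """Choose whole gangs that fit on the cores and maximise mutual affinity."""
        best, best_score = [], float("-inf")
        names = list(gangs)
        for mask in range(1, 2 ** len(names)):
            chosen = [g for i, g in enumerate(names) if mask & (1 << i)]
            if sum(gangs[g] for g in chosen) <= N_CORES and score(chosen) > best_score:
                best, best_score = chosen, score(chosen)
        return best

    print(pick_quantum())   # ['browser', 'build']: cooperating gangs share the quantum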

In practical terms, this will likely build on previous work in the space of dependent scheduling (e.g. human-interface stuff), but with a more principled approach (e.g. Quincy for multi-core). The project will involve designing and implementing a novel scheduler (probably for either Xen or Linux, but possibly in simulation), and performing a comparative evaluation with previous systems. A good student should expect to be able to publish the results of this work in a decent international conference.

Cultural Spreading on Social Networks

Originator: Cecilia Mascolo (with Salvatore Scellato and Liam McNamara)

The people you know and spend time with are very likely to influence what music you listen to, what books you read, what movies you watch, what gadgets you buy and, eventually, what you like and dislike. Social influence is so strong that it largely determines what cultural traits we adopt. Moreover, how you choose your social acquaintances is also determined by how much your interests overlap with somebody else's. This interplay between social selection and social cultural influence has been under investigation for many years. How do cultural traits spread over a social network? How is the evolution of friendships related to personal tastes? Do we acquire our friends' traits, or do we just become friends with somebody we are similar to?

Only recently have we gained the capability to gather enough data to fully understand these phenomena, thanks to the massive number of people who share large amounts of personal information on online social networking websites such as Facebook, Twitter, MySpace and the like. This opens new vistas on the analysis of such socio-cultural processes, since we can analyze fine-grained, timestamped interactions of people while simultaneously investigating their personal tastes: we need new models to gain a better understanding of what drives people's daily choices, in order to improve existing systems and design completely new applications that exploit and enhance these social interactions.
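A minimal sketch, in the spirit of the coevolution models referenced below (e.g. [9]): at each step a node either adopts a neighbour's trait (social influence) or rewires an edge towards a like-minded node (social selection). All parameters are arbitrary and the graph is random.

    import random

    random.seed(0)
    N, TRAITS, STEPS, PHI = 50, 5, 2000, 0.5      # PHI: probability of rewiring vs adopting

    trait = [random.randrange(TRAITS) for _ in range(N)]
    neigh = [set(random.sample([j for j in range(N) if j != i], 4)) for i in range(N)]

    for _ in range(STEPS):
        i = random.randrange(N)
        if not neigh[i]:
            continue
        j = random.choice(list(neigh[i]))
        if random.random() < PHI:                  # selection: rewire to a like-minded node
            like = [k for k in range(N) if trait[k] == trait[i] and k != i]
            if like:
                neigh[i].discard(j); neigh[j].discard(i)
                k = random.choice(like)
                neigh[i].add(k); neigh[k].add(i)
        else:                                      # influence: adopt the neighbour's trait
            trait[i] = trait[j]

    print("distinct traits left:", len(set(trait)))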

This project will involve:

  • collection of large scale socio-cultural network datasets (skills: Web coding, data mining)
  • analysis of real datasets and design of new statistical models of individual social behavior (skills: mathematical modeling, complex networks theory)
  • experiments with incentive policies and design of potential applications (such as a Facebook game/app) that implements these approaches to test them in the real world (skills: creativity)
  • [optionally] implementation of an application for a mobile device (i.e. Symbian/Android/iPhone) in order to run a small-scale experiment in situ (skills: mobile programming, lots of friends)

References

[1] Jon Kleinberg. The convergence of social and technological networks. Communications of ACM, 2008

[2] David Crandall, Dan Cosley, Daniel Huttenlocher, Jon Kleinberg, Siddharth Suri. Feedback effects between similarity and social influence in online communities. SIGKDD'08

[3] Kevin Lewis, Jason Kaufman, Marco Gonzalez, Andreas Wimmer, Nicholas Christakis. Tastes, ties, and time: A new social network dataset using Facebook.com. Social Networks, 30

[4] Miller Mcpherson, Lynn S. Lovin, James M. Cook. Birds of a Feather: Homophily in Social Networks. Annual Review of Sociology, 27

[5] Eytan Bakshy, Brian Karrer, Lada Adamic. Social Influence and the Diffusion of User-Created Content. 10th ACM Conference On Electronic Commerce.

[6] Aris Anagnostopoulos, Ravi Kumar, Mohammad Mahdian. Influence and correlation in social networks. SIGKDD'08

[7] Jean-Philippe Cointet, Camille Roth. Socio-semantic dynamics in a blog network. ICCSE'09

[8] Jie Tang, Jimeng Sun, Chi Wang, Zi Yang. Social influence analysis in large-scale networks. SIGKDD'09

[9] Petter Holme, M. E. J. Newman. Nonequilibrium phase transition in the coevolution of networks and opinions. Physical Review E, 2006

The Synchronised Network

Supervisor: Andrew W. Moore

Originator: Stewart Bryant (Cisco)

The aim of this project will be to conduct an objective comparison of two standards used to synchronise time among networks.

This work will involve the design and implementation of an interface for the NetFPGA board to permit the importing of a reference clock. Then the candidate will design and conduct a number of local-area and wide-area network experiments as part of this comparison.

Synchronised Clocks are critical to modern life: from security systems and international bank trading to reliable electrical systems and functioning mobile telephony.

Until recently, the global telecommunications system has provided a high-precision clock; however, evolving technological standards (a shift from SONET/SDH to Carrier-Grade Ethernet) mean that such a clock will no longer be universally available.

Several competing standards have arisen for keeping time synchronised between computer systems, such as NTP and IEEE 1588, one arising from the Internet community and the other from the time-standards bodies.
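Both families of protocol ultimately rest on the same four-timestamp exchange; the sketch below, with made-up timestamps, shows the standard offset and round-trip-delay estimate under the usual symmetric-path assumption.

    def offset_and_delay(t1, t2, t3, t4):
        """t1: client send, t2: server receive, t3: server send, t4: client receive."""
        offset = ((t2 - t1) + (t3 - t4)) / 2
        delay = (t4 - t1) - (t3 - t2)
        return offset, delay

    # client clock is 0.200 s behind the server, one-way delay 0.010 s
    t1, t2, t3, t4 = 10.000, 10.210, 10.211, 10.021
    print(offset_and_delay(t1, t2, t3, t4))   # approximately (0.200, 0.020)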

It is not clear that either standard is suitable in its current form, so there is ample opportunity to inform the evolution of this important work.

In addition to systems programming experience (C, UNIX and large projects), the candidate would ideally have some Verilog/VHDL skills. This project would suit someone who has done the "P33: Building an Internet Router" module.

This project has been formulated in conjunction with colleagues at Cisco and there will be opportunity to work alongside these advisors.

Network Emulation Elements

Supervisor: Andrew W. Moore

Originator: Andrew W. Moore

Accurate emulation of network components, such as link emulation, recreated traffic sources, and intermediate networks, proves to be a problem at high speed. NetFPGA offers programmable hardware that operates at high speed; an obvious marriage, implementing network emulation elements on the NetFPGA, would produce an outcome hugely valued in the wider community.

One particular challenge is that emulating even the shortest network links can require significant memory resources. One solution is to use several NetFPGA boards in tandem, which in combination provide the necessary properties.
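A back-of-the-envelope illustration of why memory matters: to emulate a link delay at line rate, roughly bandwidth times delay bytes must be buffered per emulated direction. The figures below are illustrative only.

    def buffer_bytes(rate_bps, delay_s):
        """In-flight data that must be held to emulate a delay at a given line rate."""
        return rate_bps * delay_s / 8

    # 10 Gb/s with 100 ms of emulated one-way delay:
    print(buffer_bytes(10e9, 0.1) / 2**20, "MiB")   # roughly 119 MiB per direction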

In addition to systems programming experience (C, UNIX and large projects), the candidate would ideally have some Verilog/VHDL skills. This project would suit someone who has done the "P33: Building an Internet Router" module.

This project is part of a large international effort including colleagues at Stanford and Xilinx, and the candidate would be invited to work within that collaboration.

Interconnect simulation

Supervisor: Andrew W. Moore

Originator: Andrew W. Moore / David Miller

The aim of this project is to design, implement and evaluate a simulator for small-distance interconnects (in particular PCI-Express). While there are a number of simulators in the community each has drawbacks and none is specifically suited to the combination of modularity and accuracy required.

This work is part of a research activity in the SRG that is examining new technologies, such as photonics, in the PCI-Express domain. To understand the impact of these new technologies, the simulator will permit various "what-if" experiments for such new approaches; in particular, it will allow comparison of rudimentary PCI-Express transactions. First experiments would contrast different data codings, comparing their latency.

There is a research activity in the SRG examining new technologies for small-distance interconnects (such as PCI-Express). This work is limited by the lack of a suitable simulator. The aim of this project is to provide a modular simulator that can accurately recreate common circumstances of the PCI-Express interconnect. There has been work on interconnect simulators, but it has remained focused on systems in the high-performance computing domain.

Implementation of a suitable simulator would involve calibration against an implementation: Verilog sample implementations of PCI Express will be available for this.

In addition to systems programming experience (C, UNIX and large projects), the candidate would be helped by having Verilog/VHDL experience. You will work alongside existing research fellows and PhD students, so you will be able to receive day-to-day advice and guidance.

Content Centric Networking

Supervisor: Andrew W. Moore

Originator: Jon Crowcroft (with Andrew W. Moore / Van Jacobson)

The aim of this project is to design and implement on the NetFPGA a hardware instantiation of the Content Centric Networking architecture proposed by Van Jacobson and his colleagues at Xerox PARC. Part of this work would be to evaluate the performance of the implementation.

Content Centric Networking is an interesting project planned for the long term as a "next-generation" Internet. One of the most significant changes is that it provides, as first-class objects, a number of elements that existing content-distribution networks (such as Akamai) provide as an Internet add-on.

In order to do this, significant changes are made to the mechanisms for packet addressing, with smaller changes elsewhere.

Working (software-centric) prototypes have been implemented; however, this project would aim to identify, and push to hardware, the specific components needed for high-performance implementations.
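For orientation, the sketch below mimics the Interest-handling logic described in the CCN literature (Content Store, Pending Interest Table, FIB) using plain Python dictionaries; it is a functional caricature for understanding the data path, not the hardware design the project would produce, and the names and faces are invented.

    content_store = {"/bbc/news/today": b"...cached data..."}
    pit = {}                                   # name -> set of faces waiting for it
    fib = {"/bbc": 2, "/cam/cl": 3}            # name prefix -> outgoing face

    def longest_prefix(name):
        parts = name.split("/")
        for i in range(len(parts), 1, -1):
            prefix = "/".join(parts[:i])
            if prefix in fib:
                return fib[prefix]
        return None

    def on_interest(name, face):
        if name in content_store:
            return ("data", name, face)                    # satisfy from the cache
        if name in pit:
            pit[name].add(face)                            # aggregate duplicate Interests
            return ("aggregated", name, None)
        out = longest_prefix(name)
        pit[name] = {face}
        return ("forwarded", name, out)

    print(on_interest("/bbc/news/today", face=1))   # served from the Content Store
    print(on_interest("/bbc/sport/score", face=1))  # forwarded on face 2
    print(on_interest("/bbc/sport/score", face=4))  # aggregated in the PIT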

In addition to systems programming experience (C, UNIX and large projects), the candidate would ideally have some Verilog/VHDL skills. This project would suit someone who has done the "P33: Building an Internet Router" module.

References

There are various Content Centric Networking Resources; and you might also like to read Networking Named Content, to appear in CoNEXT'09, December 2009.

Power efficient coding for communications systems

Supervisor: Andrew W. Moore

Originator: Philip Watts / Andrew W. Moore

The aim of this project is to take several (physical-line) coding schemes along with several samples of real network data and model the energy consumption for given coding schemes when used to communicate data over simple photonic networks.

This project is in the context of work at Cambridge and elsewhere to address the ongoing growth in energy use. In the communications-systems space there has long been an emphasis on minimising the complexity of coding systems, without regard to the energy wasted by such approaches. This project provides an important part of that work, permitting a direct comparison of (coding) complexity against overall energy consumption.
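A hedged sketch of the kind of comparison intended: apply two simple line codings (NRZ and Manchester) to sample bytes and count the quantities that dominate energy in a crude model (transmitted symbols and signal transitions). The energy-per-transition figure is a placeholder, and real captured traffic and device parameters would replace the stand-ins below.

    def bits(data):
        return [(byte >> i) & 1 for byte in data for i in range(7, -1, -1)]

    def nrz(bitstream):                      # one symbol per bit, level = bit value
        return bitstream

    def manchester(bitstream):               # two symbols per bit, guaranteed transition
        return [s for b in bitstream for s in ((0, 1) if b else (1, 0))]

    def transitions(symbols):
        return sum(a != b for a, b in zip(symbols, symbols[1:]))

    E_PER_TRANSITION_PJ = 1.0                # placeholder, not a measured figure

    payload = bytes(range(64))               # stand-in for a captured packet
    for name, code in (("NRZ", nrz), ("Manchester", manchester)):
        syms = code(bits(payload))
        print(name, len(syms), "symbols,", transitions(syms) * E_PER_TRANSITION_PJ, "pJ")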

This project could naturally be extended to a wider range of coding schemes, including schemes that use multiple orthogonal coding approaches, as well as combinations of more sophisticated optical pathways. Work would initially begin with laser/optical-channel/detector models.

This work would build on device models in Matlab and VPI, a survey of complexity and energy-usage figures from existing Verilog core implementations, and project-specific implementations of the coding schemes.

You will work alongside existing research fellows and PhD students, so you will be able to receive day-to-day advice and guidance.

Designing optimal data-coding for wavelength-parallel systems

Supervisor: Andrew W. Moore

Originator: Philip Watts / Andrew W. Moore

Approaching an optical pathway, a naive computer scientist might see the ability to carry many parallel paths (encoded onto separate lambdas) as the equivalent of a parallel wire between two endpoints: providing lower latency and possibly higher data-transmission rates than the serially encoded alternative.

However, the reality of the physical photonic system is that the transmission medium has inherently analogue properties, more so when active components (SOAs) are in use, and consequently any data encoding needs to account for the nonlinear channel characteristics.

The aim of this project would be to demonstrate effective coding schemes for a photonic communications system that utilizes multiple wavelengths.

This work would model the signal-to-noise ratio for a number of wavelengths within a simple photonic communications system and propose coding schemes that function within this environment.
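As a rough illustration (not a proposed scheme), one could start from per-wavelength SNR figures and a Shannon-style bound to decide how many bits per symbol each lambda can realistically carry, so that low-SNR channels get lighter modulation. The SNR values and margin below are invented.

    from math import log2, floor

    snr_per_lambda_db = {"lambda0": 24.0, "lambda1": 18.0, "lambda2": 9.0}   # illustrative

    def bits_per_symbol(snr_db, margin_db=3.0):
        """Shannon-style upper bound with a crude implementation margin."""
        snr = 10 ** ((snr_db - margin_db) / 10)
        return max(1, floor(log2(1 + snr)))

    for lam, s in snr_per_lambda_db.items():
        print(lam, bits_per_symbol(s), "bits/symbol")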

This work would build on device models in Matlab and VPI, along with bespoke programming.

You will work alongside existing research fellows, so you will be able to receive day-to-day advice and guidance.

Photonic Crossbar Switch Optimisation

Supervisor: Andrew W. Moore

Originator: Philip Watts

Photonic crossbar and Clos switches have recently been demonstrated with high port count. Cascading of these devices could provide high bandwidth switched interconnects for high performance computing and data centres.

The aim of this project will be to develop constrained optimisation algorithms for maximising performance and reducing power consumption in networks based on these devices.

This work would draw on the extensive literature on domain-constrained routing, applying the constraint domains to the small scale of the photonic crossbar. It would build on device models in Matlab and VPI, and would suit someone with a strong background in control theory or physics/photonics.

You will work alongside existing research fellows so you will be able to receive day-to-day advice and guidance.

Chip-to-chip interconnects using polymer and silicon waveguides

Supervisor: Andrew W. Moore

Originator: Philip Watts

There has been great interest recently in silicon photonic devices for compact integration within computer chips. Independently, polymer waveguides have been developed for PCB and backplane connections. However, the means for connecting these two technologies to create high bandwidth multi-wavelength interconnects directly to the chip, avoiding electrical bottlenecks, is not obvious: silicon devices typically use high confinement single mode waveguides whereas polymer waveguides are typically multimode (although could be made single mode).

The project will investigate the state of the art in silicon and polymer photonic technologies and propose potential solutions, examining factors such as maximum achievable bandwidth, optical losses and alignment accuracy. Depending on the direction taken by the student, MATLAB/VPI models or electromagnetic modelling software would be used.

You will work alongside existing research fellows so you will be able to receive day-to-day advice and guidance.

Coding to overcome temperature sensitivity in integrated photonic components

Supervisor: Andrew W. Moore

Originator: Philip Watts

Silicon ring resonators are attractive for integration within computer chips to provide huge bandwidth for on-chip and off-chip interconnect with low power consumption. They can provide modulation, switching and wavelength selection with compact size, on the order of a few microns in diameter. However, a major drawback is their high temperature sensitivity. Temperature control of individual elements may negate the power-consumption benefits and complicates fabrication.

This project will investigate the possibility of applying a multiple-wavelength coding algorithm to reduce or eliminate temperature sensitivity in ring resonator wavelength selectors. Based on simple device models in VPI or MATLAB, the trade-off between temperature tolerance, coding overhead, and the latency and power consumption introduced by the processing will be investigated.

This project would suit someone with a strong physics/photonics or control-theory background. You will work alongside existing research fellows so you will be able to receive day-to-day advice and guidance.

Low-cost, high-quality, traffic capture

Supervisor: Andrew W. Moore

Originator: Andrew W. Moore

Network capture is fundamental to many aspects of network management: from IDS to network measurement. Yet the fundamental technologies for fast, accurate, high-quality capture are prohibitively expensive for many researchers and users.

The aim of this project is to implement a capture appliance capable of high-granularity timestamps and no (uncontrolled) packet loss, but to do so based upon the low-cost NetFPGA platform.

This work will involve the design and implementation of an interface for the NetFPGA board to permit the importing of a reference clock and a redesign of the device driver to permit high-speed bulk transfer of data from the NetFPGA to the host.

In addition to systems programming experience (C, UNIX and large projects), the candidate would ideally have some Verilog/VHDL skills. This project would suit someone who has done the "P33: Building an Internet Router" module.

This project is part of a large international effort including colleagues at Stanford and Xilinx, and the candidate would be invited to work within that collaboration.

Sync-Ball

Originator: Eiko Yoneki

This project builds a function/mechanism for migrating data objects over the Haggle platform [1]. Haggle is built over a "layerless" networking architecture that incorporates event-driven and asynchronous operations, and resides in the device as a kernel component. Functional components implement the functional logic and interact directly only with the kernel (see [2] for further details of the Haggle implementation). A data object is defined in Haggle as an entity which can move between Haggle-equipped devices. Each data object contains metadata, consisting of name and attribute pairs. Metadata operations like search or forwarding depend on the size and complexity of the network.

Individual nodes in Haggle (e.g. smart phones) are powerful enough to perform substantial amounts of computing. An example computation would be to pass a "Sync-Ball" from node to node as they meet, into which any node may stick the result of some computation. The main task of the Sync-Ball project is to design a data object which is able to operate at a Haggle node, collect/append data, and then migrate to other devices. Haggle already has an Android platform implementation, and Sync-Ball will be developed over it. A challenge is that the data object should be designed for multiple purposes: some level of simple programming language for such data objects is desirable, or the function could be designed as part of a database-like operation. For example, you could design a migrating (distributed) Excel, where the computations are a set of formulae, each computed over a column, and each node may append zero or more rows when it has the ball.
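A minimal sketch of that migrating-spreadsheet example, with invented class and field names: the ball carries metadata as name/attribute pairs, per-column formulae, and rows appended by each node that holds it. Serialisation is shown only as JSON, and the Haggle integration is deliberately omitted.

    import json

    class SyncBall:
        def __init__(self, columns, formulae):
            self.metadata = {"type": "sync-ball", "columns": columns}
            self.formulae = formulae           # column -> reduction, e.g. "sum", "max"
            self.rows = []

        def append(self, node_id, row):
            self.rows.append({"node": node_id, **row})

        def results(self):
            out = {}
            for col, op in self.formulae.items():
                values = [r[col] for r in self.rows if col in r]
                out[col] = {"sum": sum, "max": max, "min": min}[op](values) if values else None
            return out

        def serialise(self):                    # what would migrate between devices
            return json.dumps({"metadata": self.metadata, "rows": self.rows})

    ball = SyncBall(columns=["co2_ppm"], formulae={"co2_ppm": "max"})
    ball.append("sensor-a", {"co2_ppm": 412})   # appended as the ball migrates
    ball.append("sensor-b", {"co2_ppm": 430})
    print(ball.results())                       # {'co2_ppm': 430}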

If time allows, you can add parallel operation of data objects and also write a runtime to support it. It might also be interesting to consider issues of data integrity and availability (through redundancy). Potential applications for Sync-Ball include:

  • Synchronisation among devices: the information to be synchronised is in the data object; devices receive/update the data, which is then sent on to other devices.
  • Twitter entries, which are accumulated as the object migrates within the Haggle name space until the data object is sent out to the Internet-based Twitter service.
  • Environmental monitoring object: dispatch a sync-ball that gathers CO2 values from various sensor devices in the area.

References and Recommended Reading

[1] EU FP7 Haggle Project

[2] Erik Nordstrom, Per Gunningberg, and Christian Rohner. A Search-based Network Architecture for Mobile Devices. Tech Report Uppsala University, 2009-003.

[3] E. Yoneki, I. Baltopoulos and J. Crowcroft " D^3N: Programming Distributed Computation in Pocket Switched Networks", ACM MOBIHELD at SIGCOMM, 2009.

[4] Xerox PARC's Bayou Project

Opinion and behaviour dynamics in social networks

Originator: Eiko Yoneki (with Derek Murray)

Large-scale social network communities, where millions of users form a dynamic infrastructure to share multimedia content, have brought new entertainment experiences. Such network-shared User Generated Content (UGC) is becoming increasingly popular due to the emergence of Web 2.0. YouTube is one of many UGC websites, where the popularity of content is driven by viewers and varies dynamically. This project investigates information flow, decision-making processes, and behavioural dynamics based on rumour, social topology, and the information available in such environments, and models how people react in limited- and unlimited-information surroundings. The investigation thus covers behaviour modelling and analysis from an opinion-dynamics perspective, and can provide answers to questions such as "do people change their opinion based on something akin to a page-ranking algorithm?" or "what factors have the largest influence?".

The project can be tackled in the following ways:

  1. Crawling Twitter.

    By crawling followers in Twitter, you can build a directed graph as a social network. You can also extract information about the interaction graph by looking at individual tweets that contain "@<username>". There are a few interesting, but disjoint, semantics for this: it can be part of a conversation between users, it can be a way of sending an open message to another user (e.g. people write messages with @stephenfry and he receives (and gets upset about!) them even if he doesn't follow the user), or it can indicate that there is a non-virtual connection between the users (e.g. mrry: just went for burritos with @eikoy). It would be interesting to see what kind of social graph we can infer from this.

    Then you have the spread of things like retweets and hashtags. For example, you could analyse the data to see whether certain users (or users with a certain characteristic in the social graph) are successful at spreading hashtags through the social graph. You could also see what about the graph causes a user's tweets to be retweeted (my speculation: people with a large number of followers and who follow a small number of people will have a high probability of retweeting, but there may be other interesting cases). Correlations between hashtags, the degree of people, or the position of people in the graph could be interesting to explore. You could also look at the spread of links to YouTube videos. (A minimal extraction sketch follows this list.)

    You can also write a Facebook application and investigate ideas similar to those described above.

  2. Writing a role-playing game

    Millions of people worldwide play interactive online role-playing games, forming complex and rich networks among their virtual characters. Such games can mirror the real world and predict what might occur. For example, an unexpected outbreak of an infectious communicable disease (unplanned by the game creators) occurred in one virtual-world role-playing game; this outbreak holds surprising similarities to real-world epidemics [3].

    You can write a role-playing game application for mobile devices and the web, and obtain information about the behaviour of the players under different conditions (e.g. by providing other players' activities or characteristics). The game could be "what's the most popular music?" or "what's the best mobile phone to buy?", or be similar to any popular game on the market.
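As a starting point for approach 1 above, the sketch below pulls @-mentions, retweet markers and hashtags out of raw tweet text with regular expressions and accumulates a small weighted interaction graph alongside the follower graph; the tweets are invented.

    import re
    from collections import Counter

    MENTION = re.compile(r"@(\w+)")
    HASHTAG = re.compile(r"#(\w+)")

    interaction = Counter()        # (author, mentioned user) -> number of mentions
    hashtag_use = Counter()

    tweets = [
        ("mrry",  "just went for burritos with @eikoy"),
        ("alice", "RT @mrry: just went for burritos with @eikoy"),
        ("bob",   "reading about #haggle and #dtn with @eikoy"),
    ]

    for author, text in tweets:
        for user in MENTION.findall(text):
            interaction[(author, user)] += 1
        for tag in HASHTAG.findall(text):
            hashtag_use[tag] += 1

    print(interaction.most_common(3))
    print(hashtag_use)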

Reading Vinayak Dalmia's master's thesis on 'Investigating Herding Behaviour in an Experimental Setup' (Judge Business School) will be a good starting point; papers [1], [2] and [4] will be good general background reading.

References and Recommended Reading

[1] Boyd, Golder, and Lotan, Tweet, Tweet, Retweet: Conversational Aspects of Retweeting on Twitter, HICSS-43, 2010.

[2] Christo Wilson, Bryce Boe, Alessandra Sala, Krishna P. N. Puttaswamy and Ben Y. Zhao, User Interactions in Social Networks and their Implications, ACM EuroSys 2009.

[3] Balicer, R.:Modeling Infectious Diseases Dissemination Through Online Role-Playing Games. Epidemiology, 18(2), 2007.

[4] Sastry, Yoneki and Crowcroft " Buzztraq: Predicting geographical access patterns of social cascades using social networks". EuroSys workshop on Social Network Systems, 2009.

Sampling strategies for measuring epidemic spread

Originator: Eiko Yoneki

Understanding the epidemic spread of infectious disease requires quantifying and modelling disease development in time and space, and quantifying patterns of disease and sampling for disease in populations is challenging. The recent emergence of wireless technology provides a unique opportunity to collect precise human connectivity data. For example, people can carry tiny wireless sensors (less than 1 cm^2) that record dynamic information about other devices nearby. A post-facto analysis of this data will yield valuable insight into complex human interactions, which in turn will support meaningful modelling and understanding of such networks.

Interestingly, infectious-disease epidemiology and proximity-based communication in wireless networks share common research issues [3].

However, it is challenging to deploy an experiment to collect such human connectivity data, considering that not everybody can be equipped with a device capable of collecting the information. Conducting such experiments is expensive, as each participant must be given a device (typically a mobile phone or Bluetooth sensor), and it may also be difficult to find willing participants. How many devices (e.g. mobile phones) are necessary to operate an epidemic-spread simulation in the city of Cambridge, where the population is 130,000? The problem can be interpreted as how to find a giant component (i.e. a connected sub-graph of the human contact network) from a given population. A challenge is that a human contact network is not a random network.

A series of human contact datasets, including CRAWDAD [2] and the Haggle project [1], is available for this project. If a specific data collection would provide significant additional information, such an experiment can be carried out within the project. The following two strategies can be investigated:

  1. Investigate different methods of sampling a population so that the resulting sub-graph exhibits the same macroscopic properties. Methods may include randomly discarding nodes from a graph to discover the critical point at which these properties change (a minimal sketch follows this list).
  2. Investigate whether static social networks (such as the friendship graph from an online social network) can serve as a useful predictor of which subset of nodes would be representative.
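As mentioned in strategy 1, a minimal sketch of the node-discarding approach: repeatedly keep only a random fraction of nodes from a contact graph and measure the size of the largest connected component, looking for the point at which it collapses. The toy random graph below stands in for the Haggle/CRAWDAD contact traces the project would actually use.

    import random
    from collections import defaultdict

    random.seed(1)
    N, P = 500, 0.02                                 # toy graph size and edge probability
    adj = defaultdict(set)
    for i in range(N):
        for j in range(i + 1, N):
            if random.random() < P:
                adj[i].add(j); adj[j].add(i)

    def giant_component(kept):
        """Size of the largest connected component restricted to the kept nodes."""
        seen, best = set(), 0
        for start in kept:
            if start in seen:
                continue
            stack, comp = [start], 0
            seen.add(start)
            while stack:
                u = stack.pop(); comp += 1
                for v in adj[u]:
                    if v in kept and v not in seen:
                        seen.add(v); stack.append(v)
            best = max(best, comp)
        return best

    for keep_fraction in (1.0, 0.6, 0.3, 0.1):
        kept = set(random.sample(range(N), int(N * keep_fraction)))
        print(f"keep {keep_fraction:.0%}: giant component {giant_component(kept)} nodes")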

Note that the general notion of "sampling strategy" is similar to that discussed in papers by Milan Vojnovic (MSR Cambridge) [5][6].

References and Recommended Reading

[1] EU FP7 Haggle Project

[2] CRAWDAD: A Community Resource for Archiving Wireless Data

[3] Eubank, S., Kumar, V., and Marathe, M.: Epidemiology and wireless Communication: Tight Analogy or Loose Metaphor? LNCS 5151, 91-104, 2007.

[4] Kleinberg, J.: The wireless epidemic, Nature 449, 287-288, 2007.

[5] Dinkar Vasudevan and Milan Vojnovic, Ranking through Random Sampling, MSR-TR-2009-2015, January 2009.

[6] Varun Gupta and Milan Vojnovic, Supplement to "Sampling Strategies for Epidemic-Style Information Dissemination", MSR-TR-2009-70, 2009.