Systems Research Group – NetOS

ACS Projects (2011—2012)

NetOS

This page collects together various ACS project suggestions from the Network and Operating Systems part of the Systems Research Group. In all cases there is a contact e-mail address given; please get in touch if you want more information about the project.

Under construction: please keep checking back, as more ideas will hopefully be added to this page during the coming weeks.

The Language of Power and Influence on Twitter (with Diarmuid Ó Séaghdha)
Contact: Daniele Quercia

While macro-level studies of Twitter have concluded that "Twitter is
not a social network but a news media" (Kwak et al., 2010),
micro-level analyses of users and user-user interactions suggest that
the Twittersphere functions as a set of real communities according to
well-established sociological definitions of community (Cheng et al.,
2011, Quercia et al., 2011). This project would study social power
relationships on Twitter, with a focus on linguistic indicators.
Notions of influence on Twitter have been studied before (Liu et al,
2010) but there are still many potential aspects to investigate.
Recently, Bramsen et al. (2011) looked at identifying power
hierarchies as manifested in email data; here the sociolinguistic
motivation would be similar but the problem would be quite different
(and novel!). The NLP techniques involved would start with basic
lexical and topic-model analysis, but could extend to sophisticated
dialogue modelling (Ritter et al., 2010).

Philip Bramsen, Martha Escobar-Molano, Ami Patel and Rafael Alonso.
2011. Extracting Social Power Relationships from Natural Language. In
Proceedings of ACL 2011.
Justin Cheng, Daniel Romero, Brendan Meeder and Jon Kleinberg. 2011.
Predicting Reciprocity in Social Networks. In Proceedings of SocialCom
2011.
Haewoon Kwak, Changhyun Lee, Hosung Park and Sue Moon. 2010. What is
Twitter, a Social Network or a News Media?. In Proceedings of WWW
2010.
Lu Liu, Jie Tang, Jiawei Han, Meng, Jiang and Shiqiang Yang. 2010.
Mining Topic-level Influence in Heterogeneous Networks. In Proceedings
of CIKM 2010.
Daniele Quercia, Jonathan Ellis, Licia Capra and Jon Crowcroft. 2011.
In the Mood for Being Influential on Twitter. In Proceedings of
SocialCom 2011.
Alan Ritter, Colin Cherry and Bill Dolan. 2010. Unsupervised Modeling
of Twitter Conversations. In Proceedings of NAACL 2010.
Well-being in Cities
Contact: Daniele Quercia
The legibility of a city is "the ease with which [a city's] parts can be recognised and can be organised into a coherent pattern"â and it is associated with the emotional and physical well-being of city dwellers [1,2]. This project is about building a free and open-source web tool that measures the legibility of a city, similar to [3]. Users will be presented with scenes of their city (geo-tagged pictures on Flickr) and will asked whether they could recognise each scene. Based on their answers, legibility measures of different parts of the city will be computed and will be shown on a geographical map. These measures will be then compared to those obtained from already available data (from, e.g., underground trips, census well-being scores) and from to-be-crawled Foursquare data.
[1] http://tinyurl.com/42howyn
[2] http://www.somervillema.gov/sites/default/files/documents/Somerville_Well_Being_Report.pdf
[3] http://pulse.media.mit.edu/
Linguistic Analysis for Computational Social Science
Contact: Daniele Quercia
"Computational social science" [1] is a new discipline that aims at using large archives of naturalistically-created behavioural data (of, for example, emails, tweets, Facebook contacts) to answer social science questions. Collecting data on actual behaviour is seen by many as the gold standard of social science, and digital archives offer the possibility of doing so. Emails have been used to understand how corporations work [2,3], and online dating sites to understand racial preferences in dating [4]. However, in using web services, one faces few challenges, and this project is about one specific challenge: how to use Twitter to understand people beliefs about a variety of societal issues. These issues include privacy[5], serendipity [6], gender issues, and nationalism. One way of interpreting what people say on Twitter is to analyse the use of language on tweets. The approach taken in this project will be to perform automatic analysis of issue-specific Twitter text using NLP and machine learning techniques including topic modelling [7], statistical association measures [8] and supervised prediction. The results of this analysis will be compared to those obtained by previous social science studies using a grounded theory approach [9].
[1] Lazer et al. Computational Social Science. Science, February 2009.
[2] Tyler et al. Email as Spectroscopy: Automated Discovery of Community Structure within Organizations
[3] Kleinbaum et al. Communication (and Coordination?) in a Modern, Complex Organization.
[4] Feliciano et al. Gendered Racial Exclusion among White Internet Daters
[5] Nippert-Eng. Islands of Privacy. University of Chicago Press. September 2010
[6] Pariser. The Filter Bubble: What the Internet Is Hiding from You. Penguin, May 2011.
[7] Blei et al. Latent Dirichlet allocation. Journal of Machine Learning Research, 2003.
[8] Evert. The Statistics of Word Co-occurrences: Word Pairs and Collocations. PhD thesis, University of Stuttgart, 2005.
[9] Rubin et al. Facets of serendipity in everyday chance encounters: a grounded theory approach to blog analysis. September 2011
How Leaders Use Social Media
Contact: Daniele Quercia
Studies of how company leaders use social media sites have been recently proposed and have been mostly qualitative [1]. This project would extend previous work by studying how different types of leaders (e.g., company, political, and community leaders) use social media sites such as Facebook and Twitter. We will study the communication networks formed by these leaders using social networks metrics, and the content of their communication using topical analysis algorithms. This project will be carried upon Facebook data (CEOs around the world) and Twitter data (UK CEOs, leaders of nationalist groups, and London community leaders).
[1] Throwing Sheep in the Boardroom: How Online Social Networking Will Transform Your Life, Work and World
Early Adopters of Social Media Applications: An exploratory study
Contact: Daniele Quercia
Little is known about who the early adopters of social media applications are [1]. This project would characterise adopters of a real social networking application. We will consider a large number of users who adopted an application and study the relationship between their temporal adoption and their characteristics (based on, e.g., demographic data, geographic data, online behaviours/activities data). We will also test the extent to which traditional models of technology adoption can explain the adoption of this application.
[1] Distinguishing influence-based contagion from homophily-driven diffusion in dynamic networks. PNAS 2009.
Nudging Music Serendipity
Contact: Daniele Quercia
We have recently proposed a recommendation framework named Auralist that produces recommendations that are not only reasonably accurate, but are also diverse, novel and serendipitous [1]. We run a user study and found that Auralist improves overall recommendation satisfaction, but serendipitous recommendations are not always found to be 'very useful', at least at first sight. It would be thus useful to nudge people in clicking on serendipitous recommendations. This project is about designing a new music recommender system that is based on new 'nudging' ideas in behavioural economics [2].
[1] Auralist: Introducing Serendipity into Music Recommendation
[2] Nudge: Improving Decisions About Health, Wealth, and Happiness
Blackboard implementation for Blackadder
Supervisor: Jon Crowcroft
Day-to-day supervisor George Parisis
Information-centric networking has been touted by several research efforts as an alternative to the current Internet architecture [1]. We recently designed and implemented Blackadder [2], a network node that could be part of such architecture. Blackadder is based on the Click router [3] and currently runs in Linux user and kernel space.

This project should extend Blackadder’s functionality so that when information is published from somewhere in the network or locally by other applications or kernel modules, Blackadder stores the respective data in a local blackboard. In the current implementation Blackadder notifies application for such items and forwards (by copying) the data to each application separately. A blackboard approach would allow for optimizing memory management. With such an approach, applications should only be notified about the existence of data in a shared and accessible memory region and, therefore, no copying or forwarding would be required. Special attention should be paid whenever vast amounts of published data are published by not following this strategy or by swapping data to the disk.

Ideally, at the end of the project we expect the working code to be integrated in the current prototype, for both supported modes of operation (user and kernel), as well as some initial evaluation of the approach. The results would be publishable in a system’s workshop or conference.

Strong programming skills in C/C++ as well as some experience in Linux kernel are required. Some familiarity with the Click router would also be good.

[1] PURSUIT project at http://www.fp7-pursuit.eu, 2010.
[2] Blackadder, https://github.com/georgeparisis/blackadder, 2011.
[3] Kohler, E., Morris, R., Chen, B., Jannoti, J., and Kaashoek, M. F. The Click modular router. ACM Trans. Comput. Syst. 18 (August 2000).
Blackadder for Android
Supervisor: Jon Crowcroft
Day-to-day supervisor George Parisis
Information-centric networking has been touted by several research efforts as an alternative to the current Internet architecture [1]. We recently designed and implemented Blackadder [2], a network node that could be part of such architecture. Blackadder is based on the Click router [3] and currently runs in Linux user and kernel space.

This project is about porting Blackadder to Android. As a first step Click router (or part of the necessary Click Elements) must be ported to Android. Then, some operating-system specific functionality must be added so that applications can access Blackadder and Blackadder can acces the network (possibly using different radio devices). Apart from that the library for communicating with the networking subsystem (a native c library in Android) has to be re-written. Finally, A Java wrapper for that library must be implemented.

At the end of the project a sample application (e.g. a video receiver, a chat client or twitter-like client) must be implemented in a way that utilizes the provided functionality. A nice workshop could be a venue for presenting this thesis.

Strong programming skills in C/C++ and Java as well as some experience in Android are essential. Some familiarity with the Click router would also be good.

[1] PURSUIT project at http://www.fp7-pursuit.eu, 2010.
[2] Blackadder, https://github.com/georgeparisis/blackadder, 2011.
[3] Kohler, E., Morris, R., Chen, B., Jannoti, J., and Kaashoek, M. F. The Click modular router. ACM Trans. Comput. Syst. 18 (August 2000).
Forwarding in Information-centric Networking
Contact: Dirk Trossen

Supervisor: Jon Crowcroft
Day-to-day supervisor: Dirk Trossen

Information or content-centric networking has been attracting increasing attention in the network community. More efficient dissemination of information at large scale, while utilising almost ubiquitous storage and computing resources more efficiently, is the core promise of the proposals. Forwarding information items within a transient communication relation is a core function that any solution in this area needs to implement at ever-increasing speed. Supporting multicast delivery to a (possibly) large number of receivers is a key requirement for any solution.

Forwarding solutions based on Bloom filters [1] have been proposed to support native multicast with minimal forwarding processing requirement [2]. However, the size of achievable multicast trees still remains at the lower end of the spectrum, albeit sufficient for many (in particular localised) communication scenarios. The central trade-off in designing such solutions lies in the amount of state that is kept in network elements compared to the achievable size of multicast groups.

This project should extend on the basic Bloom filter based mechanism towards an extensible header solution that utilises Bloom filters at several stages. Such multi-staged approach enables a configurable tradeoff between tree size and header size, while not requiring any network state. For this, the project will design a solution (based on existing work), implement the mechanism in an information-centric node prototype under Linux and perform a tradeoff analysis within an international test bed as well as within Planetlab. We expect the results to be publishable in a decent international conference.

References:
[1] B. H. Bloom, "Space/time trade-offs in hash coding with allowable errors", Communications of the ACM 13 (7): 422–426, 1970
[2] P. Jokela, A. Zahemszky, S. Arianfar, P. Nikander, C. Esteve, “LIPSIN: Line speed Publish/Subscribe Inter-Networking”, ACM SIGCOMM, August 2009
Robust messaging in a mobile environment
Contact: Jon Crowcroft and Toby Moncaster

Smart phones such as the iPhone or Android devices allow developers the
opportunity to offer users extremely powerful new applications. Often these
apps rely extensively on network connectivity, not a problem when the user
has connected to a WiFi network, or HSDPA. However problems do start to
arise once you start creating applications designed for professional
drivers such as couriers or taxi drivers. The biggest family of apps for
such drivers are productivity apps which usually do two tasks, finding work
for the driver and passing traffic information to and from the driver. They
may also include social networking elements, allowing drivers to chat to
each other or see where their friends are within the city.

These drivers operate in an extremely hostile network environment. As they
drive they may change network every few seconds, often briefly picking up
WiFi hotspots as they pass since their phones are designed to favour those.
At other times they will lose connectivity completely, for instance if they
go through an underpass or tunnel. Even when they have good connectivity to
3G or HSDPA, the design of those networks can add enormous delays to the
RTT. Worse still, many mobile networks use transparent (or not so
transparent) caching which can wreak havoc on any API messages between the
app and the server.

What is needed is a network protocol that can handle the three very
different types of traffic generated by such apps. These are:

* Synchronous messages relating to offering new work and responding to the
offers. There may also be a synchronous payment system. * Asynchronous
calls to the API (and their responses) allowing the server to monitor the
state of the app (location, whether the driver is on break, etc). *
Asynchronous social networking messages that might even utilise PubSub.
A real-time allocation engine for taxi despatch.
Contact: Jon Crowcroft and Toby Moncaster

Typically taxis come in two forms. Licensed taxis are allowed to pick
people up from taxi ranks and (in London and some other cities) can be
hailed on the street. Minicabs are only allowed to take advance bookings
from people (so they can't solicit for work directly). These advance
bookings come in two forms. ASAP bookings are for despatch as soon as
possible. Advance bookings may be made as much as a couple of days in
advance. For small taxi firms, scheduling such work is a relatively simple
matter that can be done on a whiteboard in the office. But when you have a
fleet of several hundred taxis covering a city as large as London things
become much harder.

One approach is to flood the city with the number of taxis equal to the
greatest expected peak demand at any location in the city, however this is
hugely inefficient. A better method would be to advance schedule bookings
over the 5-10 minute timescale, trying to predict when a currently busy
resource (taxi) will become free and feeding a new job to it at that time.
But what if the taxis are also able to take fares that hail them on the
streets? How can you deal with that added uncertainty over advance
bookings?
Building Naming Scheme in NDN over V2V Communication
Contact: Eiko Yoneki
Originator/Supervisor: Eiko Yoneki/Lixia Zhang (with Karthik Nilakant)
Keywords: Named Data Networking, Content Centric Networking, Naming

This project explores an emerging new Internet architecture called "Named Data Networking" (NDN) originated by the project Content Centric Networking (CCN) by Van Jacobson in PARC. NDN provides network storage, mobility, and multi-path forwarding. In NDN, communication is driven by the data consumer who initiates sending out a packet, which contains the interest of the consumer by a hierarchically-structured name. It is similar to a URL in an HTTP GET request: the name in an Interest does not indicate the data itself, but may contain information that a receiving node interprets to dynamically generate a appropriate response. The response to an Interest is called a Data packet, which carries both the complete name and the data itself together with a digital signature created using a key of the data publisher. Interests are routed based on prefixes of their names, like routing IP packets on prefixes of their destination IP address. Data packet takes the reverse path of the Interest to be delivered to the consumers. One Interest retrieves one Data Packet, and a sequence of Interests is sent to retrieve the sequence of individually-named Data packets that make up a large piece of content such as a video.
The naming in the NDN architecture is not mature. In this project, you will explore naming schemes in NDN. The semantics of names is opaque to the network but depends on the application. The structure of names (e.g. hierarchically structured with an arbitrary number of components), discovery, and namespace navigation should be investigated and the vehicle-to-vehicle communication environment is targeted.
You can exploit location information (e.g. latitude and longitude by GPS or any geographical information that can be obtained from the map) to navigate the data flow depending on the consumers' interests. The meaning of names can be determined by local and global distribution requirements, and the content producer and consumer application conventions, as embodied in individual router's routing tables and application-level transport libraries.
You will use CCNx open source environment http://www.ccnx.org/software-download-information-request/download-releases/. If time allows, you can add a comparison study with another content centric networking paradigm Haggle from naming aspects.
References:
-Named Data Networking: http://www.named-data.org/ndn-proj.pdf
-CCNx: http://www.ccnx.org/
-NDN: http://www.named-data.org/
-Haggle: http://code.google.com/p/haggle/
Building Graph Query Function using Functional Programming (***** highly recommended)
Contact: Eiko Yoneki
Originator/Supervisor: Eiko Yoneki
Keywords: Graph, Functional programming, NONSQL database

Demand to store and search of online data with graph structure is emerging. Such data range from online social networks to web links and it requires efficient query processing in scalable manner. In this project, you will build a graph query function (layer) to achieve efficient graph data processing. The graph query function builds on a lazy graph loaded from multiple sources and performs queries at the same speed or faster than a relational database. The function should be written in Clojure or another functional language to support practical lazy loading from data source to an in-memory graph database. Efficient representations for graph queries should be also investigated. The evaluation includes comparisons with existing Graph database (e.g. Redis, Neo4j) and the query capability comparing with SQL. You can start the project by using our prototype Crackle. You can also expand the applicability of Crackle, which in turn should improve the expressivity of queries in Clojure. Crackle has already additional data source connectors to connect to Facebook and Twitter, thereby forming the substrate to support Crackle into these two social networks. Another interesting extension could be online graph construction. To this end Crackle has adapted it to explore a game tree for the tic-tac-toe game, effectively expanding Crackle’s reach to memorisation of games trees for games such as chess.
References:
-Microsoft Research, “Trinity project: Distributed graph database,” http://research.microsoft.com/en-us/projects/trinity/
-Neo Technology, “Java graph database,” http://neo4j.org/
Integration of Data-flow programming with Overlay/Crowd through Task Farming
Contact: Eiko Yoneki
Originator/Supervisor: Eiko Yoneki (with Karthik Nilakant)
Keywords: Data flow programming, Crowd, Haggle

MapReduce [5] and Dryad/LINQ [6] have facilitated the exploitation of multiple machines in a computer cloud. In order to provide more expressive coordination language for different classes of computation, including iterative and recursive algorithms, a project called Ciel [7],Skywriting [1] has been developed in our systems research group, applying a purely-functional script language for describing distributed computations.
This project investigates extending Ciel/Skywriting including an overlay or crowd that represents a series of distributed or mobile nodes. Thus, a worker invoked by Ciel could be a crowd of mobile phones that form opportunistic networks, i.e. a Haggle opportunistic network platform. This requires an extension to the current task spawning architecture in Ciel to add capability to create a new master task for the crowd. In an opportunistic network environment, tasks will be executed in a delay-tolerant manner, and this aspect should be integrated in the design of Ciel engine for crowd.
As an option if time allows, you could exploit the social structure to execute tasks in the crowd. In general, mobile networks formed by smart phones are time dependent and dynamic with high node mobility. Skywriting programming can be exploited in such dynamic network environments, which is powerful for cooperative task farming. For example, the map function could be exploited based on time-space graph of a social network to achieve an efficient distributed computing. The logical network topology could be obtained from various sources such as online social networks (e.g. Facebook, Orkut), and a social graph could be transformed to a task graph, indicating the order of operations. Efficient task graph generation based on a given distributed computation is powerful. Mobile networks offer substantial aggregate bandwidth and processing power. The intersection between networking and programming languages is becoming popular.
The proposed project aims at building a new paradigm, where networking and programming are well integrated, with special emphasis on mobile networks.The prototype of Ciel engine has been implemented in Haggle opportunistic platform [3], and you can start building your project on top of the prototype. The evaluation can be done in both using Android devices and Xen-based Haggle simulator.
References:
[1] D. Murray and S. Hand, Scripting the cloud with Skywriting, USENIX HotCloud, 2010.
[2] D. Murray, E. Yoneki, J. Crowcroft and S. Hand, The Case for Crowd Computing. ACM MOBIHELD, 2010.
[3] EU FP6 Haggle platform http://code.google.com/p/haggle/
[4] J. Crowcroft, E. Yoneki, et al. Promoting Tolerance for Delay Tolerant Network Research. ACM CCR, 38(5):63-68, 2008.
[5] J. Dean and S. Ghemawat, MapReduce: simplified data processing on large clusters. USENIX OSDI, 2004.
[6] Y. Yu, M. Isard, D. Fetterly, M. Budiu, U. Erlingsson, P. K.Gunda, and J. Currey. DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language. OSDI, 2008.
[7] D. Murray, M. Schwarzkopf, C. Smowton, S. Smith, A. Madhavapeddy and S. Hand.
Ciel: a universal execution engine for distributed data-flow computing, NSDI, 2011.
Opportunistic Dropbox
Contact: Eiko Yoneki
Originator/Supervisor: Eiko Yoneki (with Karthik Nilakant)
Keywords: Opportunistic Networks, Dropbox, Haggle, P2P

Dropbox [1] is a Web-based file hosting service that uses cloud computing to enable users to store and share files and folders with others across the Internet using file synchronization. The Dropbox client enables users to drop any file into a designated folder that is then synced with Dropbox's servers. The other user's devices that run the Dropbox client will be automatically synchronised when they connect to the Internet. Dropbox uses a typical client/server model and also assumes that an Internet connection exists. What if we want to set up Dropbox in a decentralised way? And can we also have Dropbox if we do not have an Internet connection?
This project aims at designing and implementing Dropbox in opportunistic networks that will provide the function of Dropbox in a delay tolerant manner. An opportunistic network is a type of delay tolerant network [2], where the latency of the communications is highly variable, due to the mobility of the nodes or the variation in connectivity between the nodes. Contact opportunities are unpredictable and make communication inherently asynchronous, rather than synchronous as is the case with the Internet architecture. However, if low-latency is not necessary, some applications may take advantage from opportunistic networks. Haggle [3-5] is an example of such an opportunistic network which adopts an approach similar to the publish/subscribe systems [6][7]. Haggle allows mobile devices to exchange content directly among themselves whenever they happen to come in close range. The content to exchange is based on the interests that users declare. Haggle supports both Bluetooth and WiFi connectivity.
The project involves designing Dropbox over the Haggle environment, including naming and security aspects on sharing files among participants.
References:
[1] http://www.dropbox.com/
[2] Fall, K. A delay-tolerant network architecture for challenged internets. In ACM SIGCOMM'03 (August 2003).
[3] http://www.haggleproject.org/
[4] http://code.google.com/p/haggle/
[5] E. Nordström, P. Gunningberg, C. Rohner, “A Search-based Network Architecture for Mobile Devices”, Uppsala University Technical Report 2009-003, 2009.
[6] Eugster, P. T., Felber, P. A., Guerraoui, R., and Kermarrec, A.-M. The many faces of publish/subscribe. ACM Computing Surveys 35, 2 (June 2003)
[7] V Jacobson, D.K. Smetters, J.D. Thornton, M.F. Plass, N.H. Briggs, R.L. Braynard: Networking Named Content, CoNEXT, 2009.
Sync-Ball
Contact: Eiko Yoneki
Originator/Supervisor: Eiko Yoneki
Keywords: Mobile Agent, Opportunistic networks, Haggle

In this project, you will build function/mechanism for migrating data objects over the Haggle platform [1]. Haggle is built over a "layerless" networking architecture that incorporates event-driven and asynchronous operations, which reside in the device as a kernel component. Functional components implement the functional logic and interact only directly with the kernel (see [2] for further details about the Haggle implementation). A data object is an object defined in Haggle as an entity which can move between Haggle equipped devices. Each data object contains metadata, consisting of name and attribute pairs. Metadata operations like search or forwarding depend on the size and complexity of the network.
Individual nodes in Haggle (e.g. smart phones) are powerful enough to perform substantial amounts of computing. An example of computation would be to pass around a "Sync-Ball" from node to node as they meet, and into which any node may stick the result of some computation. The main task of the "Sync Ball" project is to design a data object which is able to operate at a Haggle node, collect/append data, and then migrate to other devices. Haggle already has an Android platform implementation, and Sync-Ball can be developed over it. A challenge is that the data object should be designed for multiple purposes and simple programming languages for such data object is desired or design the function as part of database like operation. For example, you can design a migrating (distributed) Excel, where computations are a set of formulae each computed over a column, and each node may append zero or more rows when it has the ball.
If time allows, you can add parallel operation of data objects and writing also runtime to support it. It might also be interesting to consider issues of data integrity and availability (through redundancy). Potential applications for Sync-Ball:
- Synchronisation among devices: information to be synchronised is in the data object, devices receives/updates the data then sent out to the other devices.
- Twitter entries, which are accumulated through the migration within Haggle name space until the data object is sent out to the Internet based twitter.
- Environmental monitoring object: dispatch sync-ball gathering Co_2 values from various sensor devices in the area.
References:
[1] EU FP6 Haggle Project
[2] Erik Nordstrom, Per Gunningberg, and Christian Rohner. A Search-based Network Architecture for Mobile Devices. Tech Report Uppsala University, 2009-003.
[3] E. Yoneki, I. Baltopoulos and J. Crowcroft " D^3N: Programming Distributed Computation in Pocket Switched Networks", ACM MOBIHELD at SIGCOMM, 2009.
[4] Xerox PARC's Bayou Project
Decentralised Community Detection and Deployment of Forwarding Algorithms over Haggle
Contact: Eiko Yoneki
Originator/Supervisor: Eiko Yoneki
Keywords: Community Detection, Haggle

BUBBLE RAP introduced in [3] is a forwarding algorithm for Delay tolerant networks, which uses information of community and graph centrality. It consists of RANK and LABEL strategies. RANK delegates messages to high centrality nodes, while LABEL selects a next hop from any member of the destination community as a relay. The BUBBLE algorithm showed a significant improvement in forwarding efficiency over oblivious forwarding, and the other algorithm, which uses patterns of movement rather than the longer term social relationships that our scheme infers. BUBBLE works in a decentralised world without access to infrastructure. Therefore the distributed detection and dissemination of node popularity and node communities, and the use of these for forwarding decisions are crucial. This project aims at implementing distributed community detection in Haggle framework [1] and deploying BUBBLE forwarding algorithm. Integration work will be carried out using Xen based Haggle simulator including generation of a few different types of networks (e.g. small world, and emulation of real world trace). Performance testing in terms of packet delivery compared to prior schemes should be demonstrated, which is expected to show the strength of social based forwarding algorithms in certain types of networks.
A brief description of the work items is listed below.
- Implement distributed community detection (i) Using "SIMPLE" algorithm described in [2]; (ii) Using algorithm described in [4] with an extension of decentralised approach
- Implement BUBBLE forwarding algorithm [3] in Haggle
- Implement network generator (e.g. small world, inter-contact power-law etc.) within Xen Haggle simulator
- Integrate all above
- Performance testing in different network setting

References:
[1] EU FP6 Haggle Project http://code.google.com/p/haggle/
[2] P. Hui, E.Yoneki, S. Chan, and J. Crowcroft. Distributed community detection in delay tolerant networks. Proc. MobiArch, 2007.
[3] P. Hui, J. Crowcroft, and E. Yoneki. BUBBLE Rap: Social Based Forwarding in Delay Tolerant Networks. MobiHoc, 2008
[4] I. Leung, P. Hui, P. Lio, and J. Crowcroft. Towards real time community detection in large networks. Phys. Rev. E, 79(6), 2009.
Modularity Preserved Synthetic Dynamic Network Generator
Contact: Eiko Yoneki
Originator/Supervisor: Eiko Yoneki
Keywords: Clustering, Network Topology Generation

Random networks, scale-free networks, or small-world networks have been used for simulation of epidemic spread and social network analysis, among other applications. Such networks represent human contact networks or social networks, and in most cases a single network view is used in an aggregated form. Real human contact networks are dynamic, and such a single network topology may not present the dynamic feature of networks. We have used time series of network traces from the real world to simulate the dynamic contact networks in our previous research on routing in delay-tolerant networks. However, those network traces are an instance in a specific environment. Ideally creating synthetic network topology in time series will help to evaluate a large-scale and complex network structure. Although several attempts have been made to achieve building synthetic networks, it is still a challenge to generate the topology of a real dynamic network. In this project, you would focus on generating a dynamic network topology that preserves modularity. Thus, you will pursue a rewiring strategy without losing the module/cluster architecture of the network. You can start your investigation using our prototype Randomise-Matrix algorithm, which preserves the single level of modules and may be extended to a more hierarchical structure. Using online social networks such as Twitter data for validating the algorithm is desirable.

References:
- R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon, Network motifs: simple building blocks of complex networks, Science. 2002 25;298(5594):824-7.
Modelling Spread of Rumours between Communities
Contact: Eiko Yoneki
Originator/Supervisor: Eiko Yoneki
Keywords: Opinion Dynamics, Community

Our previous work [1] reported a preliminary investigation on interactions between networked social communities using the Ising model to analyse the spread of rumours. The inner opinion of a given community is forced to change through the introduction of a unique external source, and we analyse how the other communities react to this change. We model two conceptual external sources: namely, Strong-belief and Propaganda, by an infinitely strong inhomogeneous external field and a finite uniform external field, respectively. In the former case, the community changes independently from other communities, while in the latter case also according to interactions with other communities. We applied our model to synthetic networks as well as various real world data ranging from human physical contact networks to online social networks. The experimental results of applying the Ising model using real world data clearly demonstrate two distinct scenarios of phase transitions.
This project aims at validation of the above model to examine if such Strong-belief and Propaganda will be exhibited in real situations. Thus, the project work requires deploying a series of the experiments, e.g. an online and mobile phone based software to share opinions on specific topics within the given time, and monitor the change of individual opinion in time series by controlling information available to the individual. This will capture the dynamics of opinion change and investigate validation of our Ising model. The experiments can start off with the simplest condition and along our development of extending work such as coexisting two opinions (i.e. different opinions coexisting in the society), the software should be also extending for appropriate validation. The Ising model for understanding interactions of communities and subsequently modelling rumour propagation is a unique approach that will give significant input for modelling information propagation in social networks. The result of validation can be compared with voting model.
References:
[1] M. Ostillia, E. Yoneki, I. Leung, J. Mendes, P. Lio, and J. Crowcroft: "Statistical mechanics of rumour spreading in network communities", International Conference on Computational Science, ICCS, 2010.
[2] M. Ostilli, J. F. F. Mendes, "Communication and correlation among communities". Phys. Rev. E 80 (011142); and M. Ostilli, J. F. F. Mendes, "Small-world of communities: communication and correlation of the meta-network". J. Stat. Mech. L08004 (2009).
Opinion and Behaviour Dynamics in Social Networks
Contact: Eiko Yoneki
Originator/Supervisor: Eiko Yoneki (with Fehmi Ben Abdesslem)
Keywords: Information Cascade, Opinion Dynamics, Twitter

Large-scale social network communities, where millions of users form a dynamic infrastructure to share multimedia content have brought new entertainment experiences. Such Network-shared User Generated Content (UGC) is becoming increasingly popular due to the emergence of Web 2.0. YouTube is one of many UGC websites, where the popularity of content is driven by viewers and varies dynamically. This project investigates understanding information flow, decision making process, and behavioural dynamics based on rumour, social topology, and given information in such environments and modelling how people react with limited/unlimited information surroundings. Thus, the investigation covers behaviour modelling and analysis from an opinion dynamics perspective and can provide an answer to questions such as "do people change their opinion based on similar to page-ranking algorithm?" or "what factors have the largest influence?".
The project can be tackled in the following ways:
1. Crawling Twitter.
By crawling followers in Twitter, you can build a directional graph as social networks. You can also extract information about the interaction graph by looking at individual tweets that contain "@

". There are a few interesting, but disjoint, semantics for this: it can be part of a conversation between users, it can be a way of sending an open message to another user (e.g. people write messages with @stephenfry and he receives (and gets upset about!) them even if he doesn't follow the user), or it can indicate that there is a non-virtual connection between the users (e.g. mrry: just went for burritos with @eikoy). It would be interesting to see what kind of social graph we can infer from this.
Then you have the spread of things like retweets and hashtags. For example, you could analyse the data to see whether certain users (or users with a certain characteristic in the social graph) are successful at spreading hashtags through the social graph. You could also see what about the graph causes a user's tweets to be retweeted (my speculation: people with a large number of followers and who follow a small number of people will have a high probability of retweeting, but there may be other interesting cases). Correlation between hashtags or degree of people or position of people in the graph could be interesting to explore. You could also look at the spread of links to YouTube videos.
You can also write a Facebook application and investigate similar ideas described above.
2. Writing a role-playing game
Millions of people worldwide play interactive online role-playing games, forming complex and rich networks among their virtual characters. Such games could project the real world and predict what might occur. For example, an unexpected outbreak of an infective communicable disease (unplanned by the game creators) occurred when a role-playing game was played in the virtual world. This outbreak holds surprising similarities to real world epidemics [3].
You can write role-playing game application over mobile devices and web and obtain the information of behaviour of the players under the different conditions (e.g. provide other players activities or characteristics of the other players). The game could be "what's the most popular music?" or "what's the best mobile phone to buy?" or similar to any popular game in the market.
Reading Vinayak Dalmia's masters thesis on 'Investigating Herding Behaviour in an Experimental Setup' (Judge Business School) will be a good starting point; and papers [1][2] and [4] will be good general background reading.
References:
[1] Boyd, Golder, and Lotan, Tweet, Tweet, Retweet: Conversational Aspects of Retweeting on Twitter, HICSS-43, 2010.
[2] Christo Wilson, Bryce Boe, Alessandra Sala, Krishna P. N. Puttaswamy and Ben Y. Zhao, User Interactions in Social Networks and their Implications, ACM EuroSys 2009.
[3] Balicer, R.:Modeling Infectious Diseases Dissemination Through Online Role-Playing Games. Epidemiology, 18(2), 2007.
[4] Sastry, Yoneki and Crowcroft " Buzztraq: Predicting geographical access patterns of social cascades using social networks". EuroSys workshop on Social Network Systems, 2009.

Sampling Strategies for Measuring Epidemic Spread
Contact: Eiko Yoneki
Originator/Supervisor: Eiko Yoneki
Keywords: Sampling

The epidemic spread of infectious disease requires quantifying and modelling disease development in time and space and quantifying patterns of disease and sampling for disease in populations is challenging. The recent emergence of wireless technology provides a unique opportunity to collect precise human connectivity data. For example, people can carry tiny wireless sensors (less than 1 square centimeter) that record dynamic information about other devices nearby. A post-facto analysis of this data will yield valuable insight into complex human interactions, which in turn will support meaningful modelling of understanding networks.
Interestingly infectious disease epidemiology and proximity-based communication in wireless networks share common research issues [3].
However, it is challenging to deploy an experiment to collect such human connectivity data considering not everybody can be equipped the device, which is capable to collect the information. Conducting such experiments is expensive, as each participant must be given a device (typically a mobile phone, or Bluetooth sensor), and it may also be difficult to find willing participants. How many devices (e.g. mobile phones) are necessary to operate an epidemic spread simulation in the city of Cambridge, where the population is 130,000. The problem can be interpreted as how to find giant component (i.e. connected sub-graph in human contact networks) from a given population. A challenge is that a human contact network is not a random network
There is a series of human contact data including CRAWDAD [2] and project Haggle [1], which is available for this project. If a specific data collection can provide significant information, such experiment can be carried out within the project. The following two strategies can be investigated:
1. Investigate different methods of sampling a population so that the resulting sub-graph exhibits the same macroscopic properties. Methods may include randomly discarding nodes from a graph to discover the critical point at which these properties change.
2. Investigate whether static social networks (such as the friendship graph from an online social network) can serve as a useful predictor of which subset of nodes would be representative.
Note that the general notion of "sampling strategy" is similar to that discussed in papers Milan Vojnovic (MSR Cambridge) [5][6].
References:
[1] EU FP6 Haggle Project
[2] CRAWDAD: A Community Resource for Archiving Wireless Data
[3] Eubank, S., Kumar, V., and Marathe, M.: Epidemiology and wireless Communication: Tight Analogy or Loose Metaphor? LNCS 5151, 91-104, 2007.
[4] Kleinberg, J.: The wireless epidemic, Nature 449, 287-288, 2007.
[5] Dinkar Vasudevan and Milan Vojnovic, Ranking through Random Sampling, MSR-TR-2009-2015, January 2009.
[6] Varun Gupta and Milan Vojnovic, Supplement to "Sampling Strategies for Epidemic-Style Information Dissemination", MSR-TR-2009-70, 2009.

Delay-Tolerant NDN
Contact: Eiko Yoneki
Originator/Supervisor: Eiko Yoneki (with Karthik Nilakant)
Keywords: Named Data Networking (NDN), Content Centric Networking (CCN)

The CCNx project is currently under active development. It implements the content-centric networking framework presented in [1,2,3]. One of the observations of the original paper was that the underlying store-and-forward methodology lends itself particularly well to delay or disruption-prone networks. However, CCNx does not currently perform routing for such a network (relying instead on network broadcast, or static user-defined routes). In contrast, the Haggle project provides PROPHET-based probabilistic routing.
To provide this type of functionality for CCNx, a new “control” model would need to be written. In this project, desirable attributes of such a module would be the ability to switch between multiple protocols, and the ability to tune the module programmatically will be investigated. Once the module has been completed, it can be put to the test by performing performance evaluations against Haggle or other frameworks.
References:
1. Named Data Networking: http://www.named-data.org/ndn-proj.pdf
2. CCNx: http://www.ccnx.org/
3. NDN: http://www.named-data.org/
4. Haggle: http://code.google.com/p/haggle/

Hybrid Dropbox
Contact: Eiko Yoneki
Originator/Supervisor: Eiko Yoneki (with Karthik Nilakant)
Keywords: Cloud Service, Opportunistic Networks, Dropbox

A variety of personal storage/backup applications have surfaced over recent years, providing a cloud-based synchronisation capability between multiple devices. In certain settings, access to the cloud may be restricted or not possible, which in turn restricts the availability of these services. We have developed an application called “Mistify”, which allows personal files to be replicated amongst local devices (such as other people’s smartphones). Typically, such devices are temporarily networked by local wireless connections, forming a “mist”. The Haggle framework can be used to publish and search for content in a mist, in a similar manner to interacting with the cloud. When the cloud becomes unavailable, the mist provides a secondary storage tier that can be used to keep data safe until cloud connectivity is restored. While much prior research has been focused on purely centralised or purely decentralised storage, hybrid approaches represent an area of interest.
The present prototype would need to be extended in a variety of ways to allow comparison with similar research projects. Some focus areas include:
-Joining or splitting files into manageable chunks for storage in the mist.
-Erasure coding.
-Using infrastructure for community detection and key distribution.
-Adapting the replication strategy to changing climactic conditions (sic).
-Making use of version vectors, and implementing an “eventual consistency” model.
-(Most importantly,) evaluating these new and existing features, making use of a Xen-based network simulator and/or a controlled user trial.

References:
[1] http://www.dropbox.com/
[2] http://code.google.com/p/haggle/

NetFPGA 10G Xen interface
Contact: Andrew W. Moore
Originator/Supervisor: Andrew W. Moore
Keywords: NetFPGA, Virtual Machines, Xen,

The NetFPGA 10G project is an exciting upgrade for the NetFPGA platform providing an interface card that is enabled with 4 x 10Gbps interfaces. Xen is the Cambridge-grown super success story of Virtualization. This project will be to deliver a NetFPGA10G Xen device driver and while on that 'journey' explore how rudimentary decisions in design and function-placement affect the ability of the driver providing optimal performance once integrated into Xen. While a clear pathway has been considered candidates are encouraged to explore the solution space independently where plausible.
References:
1. Xen: http://xen.org/
2. NetFPGA: http: //www.netfpga.org/

Activity-based segmentation of urban environments in Location-based Social Networks
Contact: Cecilia Mascolo
Supervisor: Cecilia Mascolo

Day to day supervisor: Anastasios Noulas
Keywords: Geo-Social Networks, Mobility, Data Mining, Clustering

A major problem rising in various Location-based services, including recommender systems, is the lack of well-defined areas/neighborhoods that could be used as geographical proxies to suggest activities or provide other sources of geo-tagged information. Usually, the segmentation of a city to areas takes place in two ways: either through a definition of areas using arbitrary geometrical shapes and sizes or through information provided by governmental organizations such as city councils. Both methods however are not sufficient for services provided by mobile and online applications. In this project, we seek the division of a data-centric methodology to cluster the areas of a city in order to allow the emergence of organic neighborhood shapes, based on the activity of mobile users. As a first step a city may be split to geographical micro-areas of a few decades of square meters. Those micro-areas will then be subsequently represented by various layers of data. Afterwards a spatial clustering technique will be implemented that will allow the mapping of micro-areas to city neighborhoods. A primary idea of the project will be to allow a degree of fluidity at the emerging area shapes and clustering memberships: as different features of representation will be used, different areas are expected to emerge. The aim will be to experimentally evaluate the work on a variety of data types, including user mobility patterns, types of and activity at places and textual data generated by users.

The Mobility of Personality
Contact: Cecilia Mascolo
Supervisors: Cecilia Mascolo (Computer Lab), Jason Rentfrow (Department of Psychology)

Day to day supervisor: Anastasios Noulas
Keywords: Mobility, Personality, Geography

The continuous development of pervasive technologies and the rise of location-based social networks offer the opportunity to study at new scales and dimensions human urban movements. Besides the knowledge of the whereabouts of millions of individuals in cities however, other layers of information, such as the personality traits of habitants at urban neighbourhoods have become known in the past. The so-called study of the Geography of Personality has been brought forward by social scientists with the goal of characterizing geographical areas with dominant spectrums of local personalities. Theories suggest that the emergence of those geographical pools of personality types, is due to past migration patterns of individuals. For instance, states or neighbourhoods with predominantly extravert populations were more likely to attract similar-minded individuals. In this project we aim to empirically investigate whether there exists a relationship between the mobility patterns of individuals and personality. On the urban scale we would also like to explore whether pairs of neighbourhoods in a city that feature similar personality traits are more likely to conduct higher mobility flows of individuals between them. An end-goal would be to devise a personality-based mobility model of individuals that would allow social scientists, or even urban planners, to gain deeper insights on the currently observed personality clusters in cities, but also to make future predictions on the occurrence of new ones. The analysis described will be carried upon a mobility dataset collected from public Foursquare checkins observed in a number of cities around the globe. Moreover, census and personality data will be used to achieve the project's goals.

Department of Computer Science and Technology

ACS Projects (2011—2012)

NetOS