Computer Laboratory

MPhil, Part III, and Part II Project Suggestions (2012-2013)


Contact

Please contact Eiko Yoneki (email: eiko.yoneki@cl.cam.ac.uk) if you are interested in any project below.

1.    Raspberry-BSP: Cheap and Safe Bulk Synchronous Processing on Wimpy Nodes

Originator/Supervisor: Eiko Yoneki (with Amitabha Roy)

Keywords: Graph Processing, Data Parallel, Bulk Synchronous Processing

This project is inspired by FAWN [1][2] and aims to bring the benefits of an array of low-power processors backed by persistent storage to graph mining. The aim is to build a software stack for the Raspberry Pi that allows a set of such devices, each equipped with either an SD card or a USB flash drive, to act as nodes for Bulk Synchronous Processing of graphs (see Pregel [3] for an example of bulk synchronous processing). All the necessary data structures will reside on flash, with the small 256MB RAM on the Raspberry Pi acting as a scratchpad.

This means that a failure of a Raspberry Pi unit can be tolerated by simply swapping its flash memory over to another good unit. However, this will need careful integration of a two-phase commit protocol with bulk synchronous processing (BSP). Another goal of this project is to integrate a prefetching algorithm we have developed for graphs stored on flash drives with the software stack built for this project. The final aim will be to show that, when processing time, energy consumption and cost are taken together, this solution is competitive with running BSP on a cluster of PCs.
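Bulk synchronous processing as in Pregel [3] proceeds in supersteps separated by barriers. A minimal sketch (assuming an in-memory adjacency list stands in for the flash-resident structures) of one such vertex-centric computation, propagating the maximum value across the graph:

```python
# A minimal sketch of a Pregel-style bulk synchronous computation, assuming
# an in-memory adjacency list stands in for the Flash-resident structures.
def bsp_max(adj, values):
    """Propagate the maximum value to every vertex, one superstep at a time."""
    active = set(adj)
    while active:
        inbox = {}
        # Superstep: every active vertex sends its value to its neighbours.
        for v in active:
            for u in adj[v]:
                inbox.setdefault(u, []).append(values[v])
        # Barrier: apply messages; vertices that do not change vote to halt.
        active = set()
        for u, msgs in inbox.items():
            m = max(msgs)
            if m > values[u]:
                values[u] = m
                active.add(u)
    return values
```

In a real implementation, the barrier is the natural point to run the two-phase commit, so that the superstep state on flash survives a unit failure.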

References:

[1] D. Andersen, J. Franklin, A. Phanishayee, L. Tan and V. Vasudevan: FAWN: A Fast Array of Wimpy Nodes, SOSP, 2009. http://www.cs.cmu.edu/~fawnproj/papers/fawn-sosp2009.pdf

[2] D. Andersen, J. Franklin, A. Phanishayee, L. Tan and V. Vasudevan: FAWN: A Fast Array of Wimpy Nodes, Communications of the ACM, July 2011.

[3] G. Malewicz, M. Austern, A. Bik, J. Dehnert, I. Horn, N. Leiser, and G. Czajkowski: Pregel: A System for Large-Scale Graph Processing, SIGMOD, 2010.

 

2.    Graph Compression

Originator/Supervisor: Eiko Yoneki (with Amitabha Roy)

Keywords: Graph Compression

This project explores graph compression mechanisms as part of a project on high-performance semi-external memory graph processing (see [1] for an idea of semi-external memory approaches). The compression work will build on an in-house graph representation format we have developed that allows succinct representation of graphs exhibiting hierarchical structure. The project will explore ways to improve the representation, yielding smaller graphs on disk that are less costly to traverse. A key element of the representation scheme is a recursive graph partitioning step that minimises the number of edges between partitions; this is a rich space for exploring suitable algorithms. We are concerned primarily with experimentally evaluating I/O costs on large graphs and measuring compression performance. However, a student with a theoretical background might consider designing algorithms with provable bounds on compression performance, which would be a big plus. If time allows, you could also implement an efficient transformation tool based on the developed compression algorithm, using parallel processing frameworks (e.g. MapReduce).
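The in-house format is not described here, so as an illustrative baseline only: a standard succinct representation stores each sorted adjacency list as gaps encoded with variable-length integers, which rewards exactly the locality that good partitioning creates (neighbours in the same partition have small IDs gaps, hence short codes):

```python
# Illustrative baseline only (not the in-house format): gap + varint
# encoding of sorted adjacency lists. Small gaps -> short varints, so a
# partitioning that clusters neighbour IDs directly shrinks the encoding.
def varint(n):
    """Encode a non-negative integer, 7 bits per byte, high bit = continue."""
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)
        else:
            out.append(b)
            return bytes(out)

def encode_adj(neighbours):
    """Gap-encode a neighbour list; consecutive IDs cost one byte each."""
    out = bytearray()
    prev = 0
    for v in sorted(neighbours):
        out += varint(v - prev)
        prev = v
    return bytes(out)
```

A compression benchmark for this project would compare bytes-per-edge of such an encoding before and after the recursive partitioning step relabels vertices.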

References:

[1] R. Pearce, M. Gokhale and N. Amato: Multithreaded Asynchronous Graph Traversal for In-Memory and Semi-External Memory, ACM/IEEE High Performance Computing, Networking, Storage and Analysis (SC), 2010. http://dl.acm.org/citation.cfm?id=1884675

 

3.    Prefetcher for Graph Traversal

Originator/Supervisor: Eiko Yoneki (with Amitabha Roy)

Keywords: Graph processing, I/O optimisation, Graph structure

This project forms part of our ongoing work on high-performance semi-external memory graph processing (see [1] for an idea of semi-external memory approaches). It will build on an in-house prefetching algorithm we have developed for mining large graphs stored on solid-state drives. A solid-state drive can service many random reads in parallel, provided that the requesting application delivers requests at a rate that keeps the drive sufficiently busy. Thus, prefetching multiple pages based on the current request in an “intelligent” manner can greatly improve performance. Our current prefetcher implementations are aligned with the graph traversal approach (that is, the “graph iterator” being employed). The challenge in this project is to design a prefetcher that is algorithm-agnostic. The starting point will be the existing infrastructure and data sets, and some initial ideas from the supervisors.

Further to this, the prefetcher implementation could be extended to utilise multiple storage devices in parallel. The aim in this setting would be to outperform standard storage parallelisation techniques such as hardware or software RAID. The performance of this approach would also depend on the method of dividing graph data between the available storage devices.
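One hedged sketch of an algorithm-agnostic idea: rather than hooking the graph iterator, watch the recent stream of touched vertices and prefetch the pages holding their neighbours, ranked by demand. The page layout and names below are illustrative only:

```python
# Hedged sketch: rank candidate pages by how many recently touched vertices
# have neighbours on them, and issue those reads ahead of time.
PAGE_SIZE = 4  # vertices per page (an assumption for the sketch)

def page_of(v):
    return v // PAGE_SIZE

def prefetch_candidates(adj, recent, cached, depth=8):
    """recent: sequence of recently touched vertices, newest last.
    Returns uncached pages, most-demanded first."""
    score = {}
    for v in recent[-depth:]:
        for u in adj.get(v, ()):
            p = page_of(u)
            if p not in cached:
                score[p] = score.get(p, 0) + 1
    return sorted(score, key=score.get, reverse=True)
```

Because the heuristic only observes the request stream, it is independent of whether the traversal is BFS, DFS, or something else, which is the point of being algorithm-agnostic.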

References:

[1] R. Pearce, M. Gokhale and N. Amato: Multithreaded Asynchronous Graph Traversal for In-Memory and Semi-External Memory, ACM/IEEE High Performance Computing, Networking, Storage and Analysis (SC), 2010. http://dl.acm.org/citation.cfm?id=1884675

 

4.    Optimising I/O in Scale-Free Graph Processing

Originator/Supervisor: Eiko Yoneki (with Karthik Nilakant)

Keywords: Graph processing, I/O optimisation, Graph structure

An issue with “real-world” graphs is that they typically exhibit “scale-free” characteristics (a degree distribution following a power law). This usually means that such graphs contain a large number of very poorly connected vertices, and a small number of very well-connected (high-degree) vertices. Our work has found that processing such data is challenging, especially where access to secondary storage is required. Related work in this area has sought to employ various static optimisations, for example to deal with a high-degree vertex. Another approach may be to actively measure the graph, either in a pre-processing stage or during processing, to provide guidance on how the underlying graph storage system can be tuned to fit the graph’s structure. A system employing such features could be evaluated across a range of scenarios: the type of graphs to be analysed, the processing algorithm, the availability of main memory, and a range of storage media.
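As a small sketch of the pre-processing measurement suggested above: one simple skew statistic asks what fraction of edge endpoints the highest-degree vertices account for, which could guide the placement of “hot” vertices in RAM (the threshold and use are assumptions, not a prescribed method):

```python
# Sketch of a pre-processing measurement: fraction of all edge endpoints
# accounted for by the top fraction of vertices by degree. High values
# indicate scale-free skew worth special-casing in the storage layer.
def degree_skew(edges, top_fraction=0.1):
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    ranked = sorted(deg.values(), reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    return sum(ranked[:k]) / sum(ranked)
```

On a star graph the top vertex alone accounts for half of all endpoints; on a regular graph the statistic stays near the chosen fraction.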

5.    Integration of Data-flow programming with Overlay/Crowd through Task Farming

Originator/Supervisor: Eiko Yoneki (with Karthik Nilakant)

Keywords: Data flow programming, Crowd, Haggle

MapReduce [5] and Dryad/LINQ [6] have facilitated the exploitation of multiple machines in a compute cloud. In order to provide a more expressive coordination language for different classes of computation, including iterative and recursive algorithms, a project called Ciel [7]/Skywriting [1] has been developed in our systems research group, applying a purely functional scripting language to describe distributed computations.

This project investigates extending Ciel/Skywriting to include an overlay or crowd that represents a set of distributed or mobile nodes. Thus, a worker invoked by Ciel could be a crowd of mobile phones that form an opportunistic network, i.e. a Haggle opportunistic network platform. This requires an extension to the current task-spawning architecture in Ciel to add the capability to create a new master task for the crowd. In an opportunistic network environment, tasks will be executed in a delay-tolerant manner, and this aspect should be integrated in the design of the Ciel engine for the crowd.

As an option, if time allows, you could exploit the social structure to execute tasks in the crowd. In general, mobile networks formed by smartphones are time-dependent and dynamic, with high node mobility. Skywriting programming can be exploited in such dynamic network environments, which is powerful for cooperative task farming. For example, the map function could be driven by the time-space graph of a social network to achieve efficient distributed computing. The logical network topology could be obtained from various sources such as online social networks (e.g. Facebook, Orkut), and a social graph could be transformed into a task graph indicating the order of operations. Efficient task-graph generation for a given distributed computation is powerful: mobile networks offer substantial aggregate bandwidth and processing power, and the intersection between networking and programming languages is becoming popular.

The proposed project aims at building a new paradigm in which networking and programming are well integrated, with special emphasis on mobile networks. A prototype of the Ciel engine has been implemented on the Haggle opportunistic platform [3], and you can start building your project on top of this prototype. The evaluation can be done both using Android devices and using the Xen-based Haggle simulator.
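To make the delay-tolerant execution model concrete, here is a toy sketch (all names and the contact model are hypothetical, not the Ciel API): a crowd master hands tasks to nodes only while they are in contact, and collects results whenever contact resumes:

```python
# Toy sketch of delay-tolerant task farming: tasks move to crowd nodes only
# during contact; results drift back on a later contact. Not the Ciel API.
def run_crowd(tasks, contact_schedule, work):
    """contact_schedule: list of sets of node ids in contact at each tick."""
    pending = list(tasks)
    assigned = {}   # node -> task handed out, awaiting a future contact
    results = {}
    for nodes in contact_schedule:
        # Collect finished work from nodes we can currently reach.
        for n in list(assigned):
            if n in nodes:
                t = assigned.pop(n)
                results[t] = work(t)
        # Hand out remaining tasks to idle nodes in contact.
        for n in nodes:
            if pending and n not in assigned:
                assigned[n] = pending.pop(0)
    return results
```

The sketch shows the core design issue: a task's result may only become visible several contact opportunities after it was dispatched, so the crowd master task must tolerate arbitrarily stale completions.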

References:

[1] D. Murray and S. Hand, Scripting the cloud with Skywriting, USENIX HotCloud, 2010.

[2] D. Murray, E. Yoneki, J. Crowcroft and S. Hand, The Case for Crowd Computing. ACM MOBIHELD, 2010.

[3] EU FP6 Haggle platform http://code.google.com/p/haggle/

[4] J. Crowcroft, E. Yoneki, et al. Promoting Tolerance for Delay Tolerant Network Research. ACM CCR, 38(5):63-68, 2008.

[5] J. Dean and S. Ghemawat, MapReduce: simplified data processing on large clusters. USENIX OSDI, 2004.

[6] Y. Yu, M. Isard, D. Fetterly, M. Budiu, U. Erlingsson, P. K.Gunda, and J. Currey. DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language. OSDI, 2008.

[7] D. Murray, M. Schwarzkopf, C. Smowton, S. Smith, A. Madhavapeddy and S. Hand: Ciel: A Universal Execution Engine for Distributed Data-flow Computing, NSDI, 2011.

6.    Sync-Ball

Originator/Supervisor: Eiko Yoneki (with Karthik Nilakant)

Keywords: Mobile Agent, Opportunistic networks, Haggle

In this project, you will build a function/mechanism for migrating data objects over the Haggle platform [1]. Haggle is built over a "layerless" networking architecture that incorporates event-driven and asynchronous operations, residing in the device as a kernel component. Functional components implement the functional logic and interact directly only with the kernel (see [2] for further details of the Haggle implementation). A data object is defined in Haggle as an entity which can move between Haggle-equipped devices. Each data object contains metadata, consisting of name-and-attribute pairs. Metadata operations like search or forwarding depend on the size and complexity of the network.

Individual nodes in Haggle (e.g. smartphones) are powerful enough to perform substantial amounts of computing. An example of such computation would be to pass around a "Sync-Ball" from node to node as they meet, into which any node may stick the result of some computation. The main task of this project is to design a data object which is able to operate at a Haggle node, collect/append data, and then migrate to other devices. Haggle already has an Android implementation, and Sync-Ball can be developed over it. A challenge is that the data object should be designed for multiple purposes: a simple programming language for such data objects is desirable, or the function could be designed as part of a database-like operation. For example, you could design a migrating (distributed) Excel, where the computations are a set of formulae, each computed over a column, and each node may append zero or more rows when it holds the ball.
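The "migrating Excel" idea can be sketched as follows (the class and its API are illustrative, not part of Haggle): the ball carries rows plus one formula per column, and each hosting node may append rows before the ball moves on:

```python
# Sketch of the migrating-spreadsheet data object: rows accumulate as the
# ball visits nodes; each column has a formula evaluated over all rows.
# Class name and API are illustrative, not Haggle's.
class SyncBall:
    def __init__(self, columns, formulae):
        self.columns = columns      # ordered column names
        self.formulae = formulae    # column name -> function over a value list
        self.rows = []

    def append(self, row):
        """Called by the hosting node to stick in its contribution."""
        self.rows.append(row)

    def compute(self):
        """Evaluate each column's formula over all collected rows."""
        out = {}
        for i, c in enumerate(self.columns):
            out[c] = self.formulae[c]([r[i] for r in self.rows])
        return out
```

For the environmental-monitoring application below, a column might hold CO2 readings with `max` as its formula, and another the temperature with a mean.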

If time allows, you could add parallel operation of data objects, and also write a runtime to support it. It might also be interesting to consider issues of data integrity and availability (through redundancy). Potential applications for Sync-Ball:

●  Synchronisation among devices: the information to be synchronised is in the data object; devices receive/update the data, then send it out to the other devices.

●  Twitter entries, which are accumulated through migration within the Haggle name space until the data object is sent out to the Internet-based Twitter service.

●  Environmental monitoring object: dispatch a sync-ball gathering CO2 values from various sensor devices in an area.

References:

[1] EU FP6 Haggle Project

[2] Erik Nordstrom, Per Gunningberg, and Christian Rohner. A Search-based Network Architecture for Mobile Devices. Tech Report Uppsala University, 2009-003.

[3] E. Yoneki, I. Baltopoulos and J. Crowcroft: "D^3N: Programming Distributed Computation in Pocket Switched Networks", ACM MOBIHELD at SIGCOMM, 2009.

[4] Xerox PARC's Bayou Project

7.    Human Contact / Mobility Traces in Hybrid Networks

Originator/Supervisor: Eiko Yoneki (with Karthik Nilakant)

Keywords: Mobility Trace

A number of prior studies have collected data on ad-hoc network connectivity between devices carried by human subjects. Such trace data can be used as the basis for simulations of ad-hoc or delay-tolerant network protocols. However, these mobility traces typically consider network infrastructure and ad-hoc networking in isolation. Modern smartphones feature a variety of network interfaces, each with varying characteristics such as range, bandwidth and reliability. A mobility trace that includes connectivity data over a wider range of interfaces provides an opportunity for simulated evaluation of multipath and hybrid delay-tolerant networking schemes.

8.    Building Naming Scheme in NDN over V2V Communication

Originator/Supervisor: Eiko Yoneki  

Keywords: Named Data Networking, Content Centric Networking, Naming

This project explores an emerging new Internet architecture called "Named Data Networking" (NDN), which originated from the Content Centric Networking (CCN) project led by Van Jacobson at PARC. NDN provides network storage, mobility, and multi-path forwarding. In NDN, communication is driven by the data consumer, who sends out an Interest packet that identifies the desired content by a hierarchically structured name. One Interest retrieves one Data packet, and a sequence of Interests is sent to retrieve the sequence of individually named Data packets that make up a large piece of content such as a video.

Naming in the NDN architecture is not yet mature. In this project, you will explore naming schemes in NDN. The semantics of names are opaque to the network and depend on the application. The structure of names (e.g. hierarchically structured with an arbitrary number of components), discovery, and namespace navigation should be investigated, targeting the vehicle-to-vehicle communication environment.

You can exploit location information (e.g. latitude and longitude from GPS, or any geographical information that can be obtained from a map) to navigate the data flow depending on consumers' interests. The meaning of names can be determined by local and global distribution requirements, and by content producer and consumer application conventions, as embodied in individual routers' routing tables and application-level transport libraries.
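To make the hierarchical structure concrete, here is a minimal sketch of CCN-style component-wise prefix matching, using a hypothetical V2V name that embeds location components (the name layout is an assumption for illustration):

```python
# Sketch of hierarchical NDN names: an Interest name matches a Data name by
# component-wise prefix, the usual CCN convention. The V2V name layout
# (/v2v/<area>/<lat>/<lon>/<attribute>) is illustrative only.
def components(name):
    return [c for c in name.split('/') if c]

def prefix_match(interest, data):
    """True if every component of the Interest name prefixes the Data name."""
    i, d = components(interest), components(data)
    return len(i) <= len(d) and d[:len(i)] == i
```

Under such a scheme, an Interest for `/v2v/cam` would retrieve any Data published under the Cambridge area, while a longer Interest narrows the geographic scope.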

You will use the CCNx open source environment (http://www.ccnx.org/software-download-information-request/download-releases/). If time allows, you can add a comparison study, from the naming perspective, with another content-centric networking paradigm, Haggle.

References:

●  Named Data Networking: http://www.named-data.org/ndn-proj.pdf

●  CCNx: http://www.ccnx.org/

●  NDN: http://www.named-data.org/

●  Haggle: http://code.google.com/p/haggle/

9.    Delay-Tolerant NDN

Originator/Supervisor: Eiko Yoneki (with Karthik Nilakant)

Keywords: Named Data Networking (NDN), Content Centric Networking (CCN)

The CCNx project is currently under active development. It implements the content-centric networking framework presented in [1,2,3]. One of the observations of the original paper was that the underlying store-and-forward methodology lends itself particularly well to delay or disruption-prone networks. However, CCNx does not currently perform routing for such a network (relying instead on network broadcast, or static user-defined routes). In contrast, the Haggle project provides PROPHET-based probabilistic routing.

To provide this type of functionality for CCNx, a new “control” module would need to be written. Desirable attributes of such a module, including the ability to switch between multiple protocols and the ability to tune the module programmatically, will be investigated. Once the module has been completed, it can be put to the test through performance evaluations against Haggle or other frameworks.
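As a sketch of the state such a control module might manage, PROPHET maintains a per-destination delivery predictability that is boosted on contact, aged over time, and propagated transitively (the constants are the usual defaults from the PROPHET specification; the function names are illustrative):

```python
# Sketch of PROPHET delivery-predictability bookkeeping. Constants are the
# usual defaults from the PROPHET spec; function names are illustrative.
P_INIT, BETA, GAMMA = 0.75, 0.25, 0.98

def on_contact(P, a, b):
    """Boost a's predictability for b when the two nodes meet."""
    old = P.get((a, b), 0.0)
    P[(a, b)] = old + (1 - old) * P_INIT

def age(P, k):
    """Decay all predictabilities over k elapsed time units."""
    for key in P:
        P[key] *= GAMMA ** k

def transitive(P, a, b, c):
    """a meets b, who already has a delivery estimate for c."""
    old = P.get((a, c), 0.0)
    P[(a, c)] = old + (1 - old) * P.get((a, b), 0.0) * P.get((b, c), 0.0) * BETA
```

A message would then be forwarded to a peer only if the peer's predictability for the destination exceeds the carrier's, which is the decision the pluggable control module would expose to CCNx.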


10.  Hybrid Dropbox

Originator/Supervisor: Eiko Yoneki (with Karthik Nilakant)

Keywords: Cloud Service, Opportunistic Networks, Dropbox

A variety of personal storage/backup applications have surfaced in recent years, providing cloud-based synchronisation between multiple devices. In certain settings, access to the cloud may be restricted or impossible, which in turn restricts the availability of these services. We have developed an application called “Mistify”, which allows personal files to be replicated amongst local devices (such as other people’s smartphones). Typically, such devices are temporarily networked by local wireless connections, forming a “mist”. The Haggle framework can be used to publish and search for content in a mist, in a similar manner to interacting with the cloud. When the cloud becomes unavailable, the mist provides a secondary storage tier that can keep data safe until cloud connectivity is restored. While much prior research has focused on purely centralised or purely decentralised storage, hybrid approaches represent an area of interest.

The present prototype would need to be extended in a variety of ways to allow comparison with similar research projects. Some focus areas include:

●  Joining or splitting files into manageable chunks for storage in the mist.

●  Erasure coding.

●  Using infrastructure for community detection and key distribution.

●  Adapting the replication strategy to changing “climatic” conditions, i.e. the state of the mist.

●  Making use of version vectors, and implementing an “eventual consistency” model.

●  (Most importantly,) evaluating these new and existing features, making use of a Xen-based network simulator and/or a controlled user trial.
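The first two bullet points can be made concrete with a small sketch: fixed-size chunking plus a single XOR parity chunk, the simplest erasure code, under which any one lost chunk can be rebuilt from the survivors (a real deployment would use something stronger, e.g. Reed-Solomon):

```python
# Sketch of chunking + the simplest erasure code: one XOR parity chunk lets
# the mist lose any single chunk and still recover the file.
def chunk(data, size):
    return [data[i:i + size] for i in range(0, len(data), size)]

def xor_parity(chunks, size):
    parity = bytearray(size)
    for c in chunks:
        for i, b in enumerate(c):
            parity[i] ^= b
    return bytes(parity)

def recover(chunks_with_gap, parity):
    """Rebuild the single missing chunk (marked None) by XOR of the rest."""
    missing = chunks_with_gap.index(None)
    rebuilt = bytearray(parity)
    for j, c in enumerate(chunks_with_gap):
        if j != missing:
            for i, b in enumerate(c):
                rebuilt[i] ^= b
    return bytes(rebuilt)
```

Scattering chunks and parity across different mist devices trades storage overhead against the probability that cloud reconnection finds the file intact.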

References:

[1] http://www.dropbox.com/

[2] http://code.google.com/p/haggle/

11.  Decentralised Community Detection and Deployment of Forwarding Algorithms in Opportunistic Networks

Originator/Supervisor: Eiko Yoneki (with Karthik Nilakant)

Keywords: Community Detection, Opportunistic Networks, Haggle

BUBBLE RAP, introduced in [3], is a forwarding algorithm for delay-tolerant networks which uses information about communities and graph centrality. It consists of the RANK and LABEL strategies: RANK delegates messages to high-centrality nodes, while LABEL selects as a relay any member of the destination community. BUBBLE works in a decentralised world, without access to infrastructure. Therefore the distributed detection and dissemination of node popularity and node communities, and the use of these for forwarding decisions, are crucial. This project aims at implementing distributed community detection in the Haggle framework [1] and deploying the BUBBLE forwarding algorithm. Integration work will be carried out using the Xen-based Haggle simulator, including generation of a few different types of networks (e.g. small world, and emulation of real-world traces). Performance testing, in terms of packet delivery compared to prior schemes, should be demonstrated.

A brief description of the work items is listed below.

●  Implement distributed community detection

●  Implement BUBBLE forwarding algorithm [3] in Haggle

●  Implement a network generator (e.g. small world, inter-contact power-law, etc.) within the Xen Haggle simulator

●  Integrate all of the above

●  Performance testing in different network settings
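The forwarding step can be sketched compactly from the description of [3]: bubble up via global centrality until reaching the destination's community, then climb local centrality within it (the node attributes here are illustrative placeholders for whatever the distributed detection produces):

```python
# Compact sketch of the BUBBLE RAP decision rule: LABEL hands the message
# into the destination community; RANK bubbles it up by centrality.
# Node dictionaries are placeholders for the detected state.
def bubble_forward(carrier, peer, dest):
    """Return True if the carrier should hand the message to this peer."""
    # LABEL: a member of the destination's community is always a candidate.
    if dest['community'] in peer['communities']:
        if dest['community'] in carrier['communities']:
            # Both inside: climb the local (intra-community) ranking.
            return peer['local_rank'] > carrier['local_rank']
        return True
    # RANK: outside the destination community, bubble up globally,
    # but never hand the message back out of the destination community.
    if dest['community'] in carrier['communities']:
        return False
    return peer['global_rank'] > carrier['global_rank']
```

The quality of `communities` and the rank estimates is exactly what the distributed community detection and popularity dissemination components must supply.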

References:

[1] EU FP6 Haggle Project http://code.google.com/p/haggle/

[2] P. Hui, E.Yoneki, S. Chan, and J. Crowcroft. Distributed community detection in delay tolerant networks. Proc. MobiArch, 2007.

[3] P. Hui, J. Crowcroft, and E. Yoneki. BUBBLE Rap: Social Based Forwarding in Delay Tolerant Networks. MobiHoc, 2008

[4] I. Leung, P. Hui, P. Lio, and J. Crowcroft. Towards real time community detection in large networks. Phys. Rev. E, 79(6), 2009.

12.  Modularity Preserved Synthetic Dynamic Network Generator

Originator/Supervisor: Eiko Yoneki

Keywords: Clustering, Network Topology Generation

Random networks, scale-free networks, and small-world networks have been used for simulation of epidemic spread and social network analysis, among other applications. Such networks represent human contact networks or social networks, and in most cases a single network view is used in aggregated form. Real human contact networks are dynamic, and a single network topology may not capture this dynamic feature. We have used time series of network traces from the real world to simulate dynamic contact networks in our previous research on routing in delay-tolerant networks. However, those network traces are instances from specific environments. Ideally, creating synthetic network topologies in time series will help to evaluate large-scale and complex network structures. Although several attempts have been made to build synthetic networks, it is still a challenge to generate the topology of a real dynamic network. In this project, you will focus on generating a dynamic network topology that preserves modularity; thus, you will pursue a rewiring strategy that does not destroy the module/cluster structure of the network. You can start your investigation using our prototype Randomise-Matrix algorithm, which preserves a single level of modules and may be extended to a more hierarchical structure. Using online social networks such as Twitter data to validate the algorithm is desirable.
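The Randomise-Matrix algorithm itself is in-house, so as a generic illustration of the rewiring idea only: swapping endpoints between two edges whose four endpoints share a module randomises the wiring while preserving both the degree sequence and the module structure:

```python
# Generic sketch (not Randomise-Matrix): degree-preserving double-edge swaps
# restricted to edge pairs inside the same module, so modularity survives.
# A production version would also reject duplicate edges.
import random

def rewire_within_modules(edges, module_of, swaps, seed=0):
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    for _ in range(swaps):
        i, j = rng.randrange(len(edges)), rng.randrange(len(edges))
        (a, b), (c, d) = edges[i], edges[j]
        # Swap only if all four endpoints share a module and no self-loop.
        if (module_of[a] == module_of[b] == module_of[c] == module_of[d]
                and a != d and c != b):
            edges[i], edges[j] = (a, d), (c, b)
    return edges
```

Repeating such swaps per time step, with a controlled fraction of cross-module exceptions, is one way to generate a time series of topologies whose modularity stays fixed.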

References:

● R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon: Network Motifs: Simple Building Blocks of Complex Networks, Science, 298(5594):824-827, 2002.

13.  Modelling Spread of Rumours between Communities

Originator/Supervisor: Eiko Yoneki

Keywords: Opinion Dynamics, Community

Our previous work [1] reported a preliminary investigation of interactions between networked social communities, using the Ising model to analyse the spread of rumours. The inner opinion of a given community is forced to change through the introduction of a unique external source, and we analyse how the other communities react to this change. We model two conceptual external sources, namely Strong-belief and Propaganda, by an infinitely strong inhomogeneous external field and a finite uniform external field, respectively. In the former case, the community changes independently of the other communities, while in the latter case it changes also according to its interactions with the other communities. We applied our model to synthetic networks as well as various real-world data, ranging from human physical contact networks to online social networks. The experimental results of applying the Ising model to real-world data clearly demonstrate two distinct scenarios of phase transitions.

This project aims at validating the above model, to examine whether such Strong-belief and Propaganda are exhibited in real situations. The project therefore requires deploying a series of experiments, e.g. online and mobile-phone-based software to share opinions on specific topics within a given time, and monitoring the change of individual opinions over time while controlling the information available to each individual. This will capture the dynamics of opinion change and support validation of our Ising model. The experiments can start with the simplest condition, and as the work is extended (e.g. to two opinions coexisting in the society), the software should also be extended for appropriate validation. The Ising model for understanding interactions of communities, and subsequently modelling rumour propagation, is a unique approach that will give significant input to modelling information propagation in social networks. The validation results can be compared with the voter model.
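A toy version of the set-up in [1] can be simulated directly: Ising spins on two coupled communities, with an external field applied to one community only (playing the role of Propaganda), relaxed here with simple zero-temperature greedy dynamics rather than the full statistical-mechanics treatment:

```python
# Toy zero-temperature Ising dynamics: each spin aligns with the sign of its
# local field (neighbours plus external field). A stand-in for the full
# finite-temperature model of [1]; parameters are illustrative.
def sweep(spins, adj, field, steps=100):
    """Repeatedly flip each spin toward its local field until stable."""
    for _ in range(steps):
        changed = False
        for v in range(len(spins)):
            h = field[v] + sum(spins[u] for u in adj[v])
            s = 1 if h > 0 else -1 if h < 0 else spins[v]
            if s != spins[v]:
                spins[v] = s
                changed = True
        if not changed:
            break
    return spins
```

With a field strong enough to overcome internal agreement, the targeted community flips opinion while a weakly coupled neighbouring community holds out, which is the qualitative behaviour the experiments would try to observe in human subjects.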

References:

[1] M. Ostillia, E. Yoneki, I. Leung, J. Mendes, P. Lio, and J. Crowcroft: "Statistical mechanics of rumour spreading in network communities", International Conference on Computational Science, ICCS, 2010.

[2] M. Ostilli, J. F. F. Mendes, "Communication and correlation among communities". Phys. Rev. E 80 (011142); and M. Ostilli, J. F. F. Mendes, "Small-world of communities: communication and correlation of the meta-network". J. Stat. Mech. L08004 (2009).

 

14.  Opinion and Behaviour Dynamics in Social Networks

Originator/Supervisor: Eiko Yoneki

Keywords: Information Cascade, Opinion Dynamics, Twitter

Large-scale social network communities, where millions of users form a dynamic infrastructure to share multimedia content, have brought new entertainment experiences. Such network-shared User Generated Content (UGC) is becoming increasingly popular with the emergence of Web 2.0. YouTube is one of many UGC websites, where the popularity of content is driven by viewers and varies dynamically. This project investigates information flow, decision-making processes, and behavioural dynamics based on rumour, social topology, and given information in such environments, and models how people react in limited- or unlimited-information surroundings. Thus, the investigation covers behaviour modelling and analysis from an opinion dynamics perspective, and can provide answers to questions such as "do people change their opinion in a manner similar to a page-ranking algorithm?" or "which factors have the largest influence?".

In this project you would write a role-playing game to understand the above problem. Millions of people worldwide play interactive online role-playing games, forming complex and rich networks among their virtual characters. Such games could project the real world and predict what might occur. For example, an unexpected outbreak of an infective communicable disease (unplanned by the game creators) occurred when a role-playing game was played in the virtual world. This outbreak holds surprising similarities to real world epidemics [3].

You could write a role-playing game application for mobile devices and the web, and obtain information on the behaviour of the players under different conditions (e.g. by providing other players' activities or characteristics). The game could be "what's the most popular music?" or "what's the best mobile phone to buy?", or similar to any popular game in the market. Reading Vinayak Dalmia's master's thesis on 'Investigating Herding Behaviour in an Experimental Setup' (Judge Business School) will be a good starting point; papers [1][2] and [4] will be good general background reading.

References:

[1] Boyd, Golder, and Lotan, Tweet, Tweet, Retweet: Conversational Aspects of Retweeting on Twitter, HICSS-43, 2010.

[2] Christo Wilson, Bryce Boe, Alessandra Sala, Krishna P. N. Puttaswamy and Ben Y. Zhao, User Interactions in Social Networks and their Implications, ACM EuroSys 2009.

[3] Balicer, R.: Modeling Infectious Diseases Dissemination Through Online Role-Playing Games. Epidemiology, 18(2), 2007.

[4] Sastry, Yoneki and Crowcroft: "Buzztraq: Predicting Geographical Access Patterns of Social Cascades Using Social Networks". EuroSys Workshop on Social Network Systems, 2009.

15.  Sampling Strategies for Measuring Epidemic Spread

Originator/Supervisor: Eiko Yoneki

Keywords: Sampling

Modelling the epidemic spread of infectious disease requires quantifying and modelling disease development in time and space, and quantifying patterns of disease and sampling for disease in populations is challenging. The recent emergence of wireless technology provides a unique opportunity to collect precise human connectivity data. For example, people can carry tiny wireless sensors (smaller than 1 cm^2) that record dynamic information about other devices nearby. Post-facto analysis of such data yields valuable insight into complex human interactions, which in turn supports meaningful modelling of such networks.

Interestingly, infectious disease epidemiology and proximity-based communication in wireless networks share common research issues [3].

However, it is challenging to deploy an experiment to collect such human connectivity data, given that not everybody can be equipped with a device capable of collecting the information. Conducting such experiments is expensive, as each participant must be given a device (typically a mobile phone or Bluetooth sensor), and it may also be difficult to find willing participants. How many devices (e.g. mobile phones) are necessary to operate an epidemic spread simulation in the city of Cambridge, where the population is 130,000? The problem can be interpreted as how to find a giant component (i.e. a connected sub-graph of the human contact network) from a given population. A challenge is that a human contact network is not a random network.

A series of human contact data sets, including CRAWDAD [2] and those from project Haggle [1], is available for this project. If a specific data collection would provide significant information, such an experiment can be carried out within the project. The following two strategies can be investigated:

  1. Investigate different methods of sampling a population so that the resulting sub-graph exhibits the same macroscopic properties. Methods may include randomly discarding nodes from a graph to discover the critical point at which these properties change.
  2. Investigate whether static social networks (such as the friendship graph from an online social network) can serve as a useful predictor of which subset of nodes would be representative.

Note that the general notion of "sampling strategy" is similar to that discussed in papers by Milan Vojnovic (MSR Cambridge) [5][6].
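Strategy 1 can be sketched directly: discard nodes from the contact graph and track the size of the giant (largest connected) component, looking for the point at which it collapses:

```python
# Sketch of strategy 1: measure the giant component of the contact graph
# after a set of nodes has been discarded, via iterative BFS/DFS.
def giant_component(adj, removed):
    best = 0
    seen = set(removed)
    for s in adj:
        if s in seen:
            continue
        stack, comp = [s], 0
        seen.add(s)
        while stack:
            v = stack.pop()
            comp += 1
            for u in adj[v]:
                if u not in seen:
                    seen.add(u)
                    stack.append(u)
        best = max(best, comp)
    return best
```

Sweeping the size of the removed set (random versus degree- or friendship-informed choices, as in strategy 2) and plotting the giant component size traces out the critical point the project is after.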

References:

[1] EU FP6 Haggle Project

[2] CRAWDAD: A Community Resource for Archiving Wireless Data

[3] Eubank, S., Kumar, V., and Marathe, M.: Epidemiology and wireless Communication: Tight Analogy or Loose Metaphor? LNCS 5151, 91-104, 2007.

[4] Kleinberg, J.: The wireless epidemic, Nature 449, 287-288, 2007.

[5] Dinkar Vasudevan and Milan Vojnovic, Ranking through Random Sampling, MSR-TR-2009-2015, January 2009.

[6] Varun Gupta and Milan Vojnovic, Supplement to "Sampling Strategies for Epidemic-Style Information Dissemination", MSR-TR-2009-70, 2009.

16.  The impact of social influence on information propagation in social networks

Supervisor: Eiko Yoneki (with Mehdi Hosseini)

Keywords: Information Diffusion, Twitter, Behaviour Model

Social networks like Facebook and Twitter have brought new entertainment experiences. People and companies publish information in a social network through the network’s topology, sharing it with their neighbours. The published information can either be quickly ignored or widely spread through the network. The extent to which information is propagated depends highly on the social influences that members have on each other.

This project investigates the impact of social influences on social networks. Specifically, by monitoring members' activity, the project will develop (i) data-driven models that explain various behaviours and influences occurring in a social network, (ii) metrics that quantify various forms of social influence, e.g. conformity, reactance, peer pressure, sales and marketing, (iii) methods that identify influential members and their characteristics, and (iv) gamification experiments for collecting the required data.

In Twitter, for instance, a simple type of influence is the ability to get others to retweet a message. Proposed solutions should answer questions such as: (i) what is the probability that a member will be influenced by a Twitter message and retweet it? (ii) What is the type of influence? (iii) Who are the influential members?
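As a starting point for the first question above, per-author retweet influence could be estimated as the fraction of followers who retweet each of an author's messages. The sketch below is a minimal illustration on a toy log; the data structures and names (`retweet_log`, `followers`) are assumptions for illustration, not part of any provided dataset.

```python
# Hypothetical sketch: estimate a simple per-author influence score as the
# average fraction of the author's followers who retweet each message.
# The toy log below is an illustrative assumption, not project data.

retweet_log = [
    # (author, message_id, set of users who retweeted it)
    ("alice", "m1", {"bob", "carol"}),
    ("alice", "m2", {"bob"}),
    ("alice", "m3", set()),
    ("dave",  "m4", {"erin"}),
]

followers = {"alice": {"bob", "carol", "erin"}, "dave": {"erin"}}

def retweet_probability(author):
    """Estimate P(a random follower retweets a random message by `author`)."""
    msgs = [rts for a, _, rts in retweet_log if a == author]
    if not msgs or not followers.get(author):
        return 0.0
    n_followers = len(followers[author])
    # Average, over messages, of the fraction of followers who retweeted.
    return sum(len(rts) / n_followers for rts in msgs) / len(msgs)
```

On the toy data, `retweet_probability("alice")` averages the per-message fractions 2/3, 1/3 and 0/3. A real solution would replace this frequency estimate with a model conditioned on message content, timing and tie strength.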

In this project, you will work on Twitter or Facebook data. You need to know a programming language, e.g. Java, C, C#, or Python, for collecting data, and one language for statistical computing, e.g. R, MATLAB, or Java.

References:  

[1] Meeyoung Cha, Hamed Haddadi, Fabrício Benevenuto, P. Krishna Gummadi: Measuring User Influence in Twitter: The Million Follower Fallacy. ICWSM 2010.

[2] Evan T.R. Rosenman: Retweets, but Not Just Retweets: Quantifying and Predicting Influence on Twitter. Thesis, Harvard University.

[3] Boyd, Golder, and Lotan, Tweet, Tweet, Retweet: Conversational Aspects of Retweeting on Twitter, HICSS-43, 2010.

[4] Christo Wilson, Bryce Boe, Alessandra Sala, Krishna P. N. Puttaswamy and Ben Y. Zhao, User Interactions in Social Networks and their Implications, ACM EuroSys 2009.

[5] Balicer, R.: Modeling Infectious Diseases Dissemination Through Online Role-Playing Games. Epidemiology, 18(2), 2007.

[6] Sastry, Yoneki and Crowcroft: Buzztraq: Predicting Geographical Access Patterns of Social Cascades Using Social Networks. EuroSys Workshop on Social Network Systems, 2009.

 

17.  Eye Tracking for Behavioural Study 

Supervisor: Eiko Yoneki  

Keywords: Behaviour Study and Model, Perception, Cognitive Science

In this project you would use an eye-tracking device to extract the unconscious behaviour of people as they use online social network tools. Example questions include how people decide to retweet a message on Twitter or to post a wall comment on Facebook.

We will provide an eye-tracking device that automatically records the web browser and on-screen information corresponding to the participant's eye movements. You would design the experimental setup and the goal of the experiment. Our ongoing photo-rating experiments in the EU FP7 RECOGNITION project can be used as experimental material to complement the extracted data. You could also write your own experimental application on an online social network (e.g. Facebook) to extract unconscious social behaviour in comparison to feedback information (e.g. how other people rated, the average rating) from the participant's social clique.

As an extension, you could write an economic gaming application to capture participants' decision-making behaviour in competitive tasks. The work is based on the EU FP7 RECOGNITION project http://www.recognition-project.eu/. This project is suitable for a student interested in data analytics and experimental work.

References:  

[1] Roman Bednarik, Markku Tukiainen, An eye-tracking methodology for characterizing program comprehension processes, Proceedings of the 2006 symposium on Eye tracking research & applications, March 27-29, 2006, San Diego, California.

[2] Hayes, T. R., Petrov, A. A., & Sederberg, P. B. (2011) A novel method for analyzing sequential eye movements reveals strategic influence on Raven's Advanced Progressive Matrices. Journal of Vision, 11 (10:10), 1-11, http://www.journalofvision.org/11/10/10/.

18.  Trust Relationships and Content Filtering

Contact: Eiko Yoneki (with Arman Idani)
Keywords: Social Trust, Content Filtering

Online social networks have become extremely popular in recent years. One of the challenges people now face is keeping up with the amount of content available to them on their preferred social networks (status updates, tweets, photos, news, etc.). The volume of content has grown to the point where it is nearly impossible for the average user to review all of it. In this project you would work on automatic filtering of unwanted content. One way to achieve this is through trust relationships, meaning the extent to which a user on a network trusts other users or services. By monitoring the relationships between members, you will investigate which parameters contribute to trust between different users and how they can be quantified. The references below are good starting points.

The solution should be able to answer research questions such as: (i) How do users choose which content, and from which other user, to care about? (ii) Can a quantifiable measure of trust help predict whether a piece of content is of no interest to the user? (iii) What makes a user trustworthy? In this project you will work on Facebook or Twitter data. You need to know a programming language, e.g. Java, C, C#, or Python, for collecting data, and one language for statistical computing, e.g. R, MATLAB, or Java.

References:
[1] Yarden Katz and Jennifer Golbeck. 2006. Social network-based trust in prioritized default logic. In proceedings of the 21st national conference on Artificial intelligence - Volume 2 (AAAI'06), Anthony Cohn (Ed.), Vol. 2. AAAI Press 1345-1350.
[2] Ugur Kuter and Jennifer Golbeck. 2010. Using probabilistic confidence models for trust inference in Web-based social networks. ACM Trans. Internet Technol. 10, 2, Article 8 (June 2010), 23 pages.

19.  Trust Relationships and Propagation of Trust in Online Social Communities

Contact: Eiko Yoneki (with Arman Idani)
Keywords: Social Trust, Community, Information Propagation

Communities in social networks are groups of tightly coupled nodes (users). Within a community, users normally have many mutual friends, with whom they might interact differently. The extent to which a user trusts other users, the trust relationship, is usually measured by monitoring their interactions. In this project you would investigate trust relationships in online social communities, their relationship to the properties of the community, and how trust propagates among the members of a community. The references below are good starting points.

The solution should be able to answer research questions such as: (i) How do trust relationships within a community differ from trust relationships outside it? (ii) To what extent does the trust relationship among members of a community depend on its properties? (iii) How does a user's role in the community correlate with their trustworthiness? In this project you will work on Facebook or Twitter data. You need to know a programming language, e.g. Java, C, C#, or Python, for collecting data, and one language for social network analysis and statistical computing, e.g. R, MATLAB, or Python.
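One classic way to model trust propagation, in the spirit of Guha et al. [2], is to treat direct trust as a matrix T and let powers of T capture transitive trust along longer paths, damped per hop. The sketch below is a minimal illustration; the 3-user matrix and the damping factor are assumptions for the example, not values from the project.

```python
# Hypothetical sketch of transitive trust propagation: T[i][j] is direct
# trust of user i in user j; T^k captures trust along paths of length k,
# and each extra hop is damped by alpha. The matrix below is illustrative.

def matmul(A, B):
    """Plain n x n matrix product, to keep the sketch dependency-free."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def propagate(T, steps, alpha=0.5):
    """Combine direct trust with transitive trust, damped by alpha per hop."""
    n = len(T)
    result = [row[:] for row in T]   # start from direct trust
    power = [row[:] for row in T]
    weight = alpha
    for _ in range(steps):
        power = matmul(power, T)     # trust along one-hop-longer paths
        result = [[result[i][j] + weight * power[i][j] for j in range(n)]
                  for i in range(n)]
        weight *= alpha
    return result

# Toy network: user 0 trusts 1, user 1 trusts 2; propagation should give
# user 0 some (damped) transitive trust in user 2.
T = [[0, 1, 0],
     [0, 0, 1],
     [0, 0, 0]]
```

Comparing propagated trust inside and outside detected communities would then directly address research question (i).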

References:
[1] Yarden Katz and Jennifer Golbeck. 2006. Social network-based trust in prioritized default logic. In proceedings of the 21st national conference on Artificial intelligence - Volume 2 (AAAI'06), Anthony Cohn (Ed.), Vol. 2. AAAI Press 1345-1350.
[2] R. Guha, Ravi Kumar, Prabhakar Raghavan, and Andrew Tomkins. 2004. Propagation of trust and distrust. In Proceedings of the 13th international conference on World Wide Web (WWW '04). ACM, New York, NY, USA, 403-412.
[3] Christian Borgs, Jennifer Chayes, Adam Tauman Kalai, Azarakhsh Malekian, and Moshe Tennenholtz. 2010. A novel approach to propagating distrust. In Proceedings of the 6th international conference on Internet and network economics (WINE'10), Amin Saberi (Ed.). Springer-Verlag, Berlin, Heidelberg, 87-105.

Contact Email

Please email eiko.yoneki@cl.cam.ac.uk with any questions.