MPhil, Part III, and Part II Project Suggestions (2012-2013)
Please contact Eiko Yoneki (email: eiko.yoneki@cl.cam.ac.uk) if you are interested in any project below.
1.
Raspberry-BSP: Cheap and Safe Bulk Synchronous Processing on Wimpy Nodes
Originator/Supervisor: Eiko Yoneki (with Amitabha Roy)
Keywords: Graph Processing, Data Parallel, Bulk Synchronous Processing
This project is inspired by FAWN [1][2] and aims to
replicate the benefit of an array of low power processors backed by persistent
storage to graph mining. The aim is to build a software stack for Raspberry-PI
that allows a set of such devices, each equipped with either an SD card or a USB
flash drive to act as a node for Bulk Synchronous Processing of graphs (see
Pregel [3] for an example of bulk synchronous processing). All the necessary
data structures will reside on Flash with the small 256MB RAM on the
Raspberry-PI acting as a scratchpad. This means that a failure of the Raspberry-PI unit
can be tolerated by simply swapping the flash memory over to another good unit.
However, this will need careful integration of a 2-phase commit protocol with
bulk synchronous processing (BSP). One of the other goals of this project is to
integrate a prefetching algorithm we have developed for graphs stored on flash
drives with the software stack built for this project. The final aim will be to
show that when processing time, energy consumption and cost are taken together
this solution is competitive with running BSP on a cluster of PCs.
References:
[1] D. Andersen, J. Franklin, A. Phanishayee, L. Tan and V. Vasudevan: FAWN: A Fast Array of Wimpy Nodes, SOSP, 2009. http://www.cs.cmu.edu/~fawnproj/papers/fawn-sosp2009.pdf
[2] D. Andersen, J. Franklin, A. Phanishayee, L. Tan and V. Vasudevan: FAWN: A Fast Array of Wimpy Nodes, Communications of the ACM, July 2011.
[3] G. Malewicz, M. Austern, A. Bik, J. Dehnert, I. Horn, N. Leiser, and G. Czajkowski: Pregel: A System for Large-Scale Graph Processing, SIGMOD, 2010.
2.
Originator/Supervisor: Eiko Yoneki (with Amitabha Roy)
Keywords: Graph Compression
This project explores graph compression mechanisms as
part of a project looking into high performance semi-external memory graph
processing (see [1] to get an idea of semi-external memory approaches). The
graph compression work will build on an in-house graph representation format
that we have developed that allows succinct representation of graphs that show
hierarchical structure. The project will explore ways to improve the
representation yielding smaller graphs on disk that are less costly to traverse.
A key element of the representation scheme is a recursive graph partitioning
step that minimises the number of edges between partitions. This is a rich space
for exploration of suitable algorithms. We are concerned primarily with
experimentally evaluating IO costs on large graphs and measuring the compression
performance. However a student with a theoretical background might consider
designing algorithms with provable bounds on compression performance, which
would be a big plus. If time allows, you could also implement an efficient
transformation tool based on the developed graph compression algorithm, using
parallel processing frameworks (e.g. MapReduce).
References:
[1] R. Pearce, M. Gokhale and N. Amato:
Multithreaded Asynchronous Graph Traversal for In-Memory and Semi-External
Memory, ACM/IEEE High Performance Computing, Networking, Storage and
Analysis, 2010.
http://dl.acm.org/citation.cfm?id=1884675
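To make the compression idea concrete, here is a minimal Python sketch of gap (delta) encoding with base-128 varints, one common building block for succinct adjacency lists. It is an illustration only, not the in-house representation format; the vertex IDs are invented.

```python
def varint(n):
    """Encode a non-negative integer as a little-endian base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def encode_adjacency(neighbours):
    """Gap-encode a sorted adjacency list: store deltas between successive
    neighbour IDs rather than raw IDs. A vertex numbering that keeps each
    partition's vertices in a contiguous ID range (as a recursive
    partitioning step aims to do) yields small gaps, hence short varints."""
    gaps = [neighbours[0]] + [b - a for a, b in zip(neighbours, neighbours[1:])]
    return b"".join(varint(g) for g in gaps)

# Clustered neighbour IDs (good partitioning) compress better than
# scattered ones (poor vertex ordering).
clustered = encode_adjacency([1000, 1001, 1003, 1007])
scattered = encode_adjacency([3, 90000, 2000000, 70000000])
```

This makes the link between the partitioning step and on-disk size explicit: the fewer edges that cross partitions, the more adjacency lists consist of small local gaps.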
3.
Prefetcher for Graph Traversal
Originator/Supervisor: Eiko Yoneki (with Amitabha Roy)
Keywords: Graph processing, I/O optimisation, Graph structure
This project forms part of our ongoing work on
high performance semi-external memory graph processing (see [1] to get an idea
of semi-external memory approaches). This project will build on an in-house
prefetching algorithm we have developed for mining large graphs stored on Solid
State Drives. A solid state drive provides the ability to service many random
reads in parallel, provided that the requesting application is able to deliver
requests at a rate that keeps the drive sufficiently busy. Thus pre-fetching
multiple pages based on the current request in an “intelligent” manner can
greatly improve performance. Our current prefetcher implementations are tied to
the graph traversal approach (that is, to the particular “graph iterator” being
employed). The challenge in this project is to design a prefetcher that is
algorithm-agnostic. The starting point for this project will be the existing
infrastructure and data sets and some initial ideas from the supervisors. Further to this, the prefetcher implementation could
be extended to utilise multiple storage devices in parallel. The aim in this
setting would be to out-perform standard storage parallelisation techniques such
as hardware or software RAID. The performance of this approach would also depend
on the method of dividing graph data between available storage devices.
References:
[1] R. Pearce, M. Gokhale and N. Amato:
Multithreaded Asynchronous Graph Traversal for In-Memory and Semi-External
Memory, ACM/IEEE High Performance Computing, Networking, Storage and
Analysis, 2010.
http://dl.acm.org/citation.cfm?id=1884675
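As an illustration of the problem, the following Python sketch shows a simple algorithm-agnostic prefetcher that, given only the traversal's current frontier, ranks the disk pages of nearby vertices as prefetch candidates. The `graph` and `page_of` structures are hypothetical stand-ins for the real on-flash layout.

```python
from collections import deque

def prefetch_candidates(graph, frontier, page_of, budget=8):
    """Return up to `budget` distinct page IDs likely to be needed soon.

    graph    : dict vertex -> list of neighbour vertices
    frontier : vertices the traversal just requested
    page_of  : dict vertex -> on-disk page holding that vertex's adjacency list

    An algorithm-agnostic prefetcher sees only the stream of requested
    vertices, so the "prediction" here is simply a breadth-first walk over
    the pages of vertices reachable from the frontier.
    """
    visited = set(frontier)
    seen_pages, order = set(), []
    queue = deque(frontier)
    while queue and len(order) < budget:
        v = queue.popleft()
        for w in graph.get(v, []):
            p = page_of[w]
            if p not in seen_pages:
                seen_pages.add(p)
                order.append(p)
                if len(order) == budget:
                    break
            if w not in visited:
                visited.add(w)
                queue.append(w)
    return order
```

The returned list could be handed to an asynchronous I/O layer to keep the SSD's request queue full while the traversal proper works on the current page.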
4.
Optimising I/O in Scale-Free Graph Processing
Originator/Supervisor: Eiko Yoneki (with Karthik Nilakant)
Keywords: Graph processing, I/O optimisation, Graph structure
An issue with “real-world” graphs is that these
typically exhibit “scale-free” characteristics (a degree distribution following
a power law). This usually means that such graphs contain a large number of very
poorly connected vertices, and a small number of very well-connected (high
degree) vertices. Our work has found that processing such data is challenging,
especially where access to secondary storage is required. Related work in this
area has sought to employ various static optimisations, for example to deal with
a high-degree vertex. Another approach may be to actively measure the graph,
either in a pre-processing stage or during processing, to provide some guidance
on how the underlying graph storage system can be tuned to fit the graph’s
structure. A system employing such features could be evaluated in a range of
different scenarios: the type of graphs to be analysed, the processing
algorithm, availability of main memory, and across a range of storage media.
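The skew that makes static optimisation attractive can be quantified directly from the degree distribution. The sketch below (illustrative only; the threshold and structures are assumptions) pins the top fraction of vertices by degree and reports what share of all edge endpoints they cover:

```python
def split_hubs(degrees, budget_fraction=0.01):
    """Pick the top fraction of vertices by degree as 'hubs' to keep in RAM.

    In a power-law graph a tiny fraction of vertices accounts for a large
    share of all edges, so treating hubs specially pays off
    disproportionately.  degrees: dict vertex -> degree.
    """
    k = max(1, int(len(degrees) * budget_fraction))
    hubs = sorted(degrees, key=degrees.get, reverse=True)[:k]
    covered = sum(degrees[v] for v in hubs) / sum(degrees.values())
    return set(hubs), covered
```

A pre-processing pass of this kind could feed the tuning step described above, e.g. choosing which adjacency lists to cache and which to leave on secondary storage.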
5.
Integration of Data-Flow Programming with Overlay/Crowd through Task Farming
Originator/Supervisor: Eiko Yoneki (with Karthik Nilakant)
Keywords: Data flow programming, Crowd, Haggle
MapReduce [5] and Dryad/LINQ [6] have facilitated the
exploitation of multiple machines in a computer cloud. In order to provide a
more expressive coordination language for different classes of computation,
including iterative and recursive algorithms, Ciel [7] and its Skywriting
script language [1] have been developed in our systems research group, applying
a purely functional script language to describe distributed computations.
This project investigates extending Ciel/Skywriting
to include an overlay or crowd that represents a set of distributed or mobile
nodes. Thus, a worker invoked by Ciel could be a crowd of mobile phones that
form opportunistic networks, i.e. a Haggle opportunistic network platform. This
requires an extension to the current task spawning architecture in Ciel to add
capability to create a new master task for the crowd. In an opportunistic
network environment, tasks will be executed in a delay-tolerant manner, and this
aspect should be integrated into the design of the Ciel engine for the crowd. As an option, if time allows, you could exploit the
social structure to execute tasks in the crowd. In general, mobile networks
formed by smart phones are time dependent and dynamic with high node mobility.
Skywriting programming can be exploited in such dynamic network environments,
which is powerful for cooperative task farming. For example, the map function
could be exploited based on the time-space graph of a social network to achieve
efficient distributed computing. The logical network topology could be obtained
from various sources such as online social networks (e.g. Facebook, Orkut), and
a social graph could be transformed to a task graph, indicating the order of
operations. Efficient task graph generation based on a given distributed
computation is powerful. Mobile networks offer substantial aggregate bandwidth
and processing power. The intersection between networking and programming
languages is becoming popular. The proposed project aims at building a new paradigm,
where networking and programming are well integrated, with special emphasis on
mobile networks. A prototype of the Ciel engine has been implemented on the
Haggle opportunistic platform [3], and you can start building your project on
top of this prototype. The evaluation can be done both on Android devices and
in the Xen-based Haggle simulator.
References:
[1] D. Murray and S. Hand, Scripting the cloud with
Skywriting, USENIX HotCloud, 2010.
[2] D. Murray, E. Yoneki, J. Crowcroft and S. Hand, The Case for Crowd Computing. ACM MOBIHELD, 2010.
[3] EU FP6 Haggle platform http://code.google.com/p/haggle/
[4] J. Crowcroft, E. Yoneki, et al. Promoting Tolerance for Delay Tolerant Network Research. ACM CCR, 38(5):63-68, 2008.
[5] J. Dean and S. Ghemawat, MapReduce: Simplified Data Processing on Large Clusters. USENIX OSDI, 2004.
[6] Y. Yu, M. Isard, D. Fetterly, M. Budiu, U. Erlingsson, P. K. Gunda, and J. Currey. DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language. OSDI, 2008.
[7] D. Murray, M. Schwarzkopf, C. Smowton, S. Smith, A. Madhavapeddy and S. Hand. Ciel: A Universal Execution Engine for Distributed Data-Flow Computing, NSDI, 2011.
6.
Sync-Ball
Originator/Supervisor: Eiko Yoneki (with Karthik Nilakant)
Keywords: Mobile Agent, Opportunistic Networks, Haggle
In this project, you will build a function/mechanism
for migrating data objects over the Haggle platform [1]. Haggle is built over a
"layerless" networking architecture that incorporates event-driven and
asynchronous operations, which reside in the device as a kernel component.
Functional components implement the functional logic and interact only directly
with the kernel (see [2] for further details about the Haggle implementation). A
data object is an object defined in Haggle as an entity which can move between
Haggle equipped devices. Each data object contains metadata, consisting of name
and attribute pairs. Metadata operations like search or forwarding depend on the
size and complexity of the network. Individual nodes in Haggle (e.g. smart phones) are
powerful enough to perform substantial amounts of computing. An example of
computation would be to pass around a "Sync-Ball" from node to node as they
meet, and into which any node may stick the result of some computation. The main
task of the "Sync Ball" project is to design a data object which is able to
operate at a Haggle node, collect/append data, and then migrate to other
devices. Haggle already has an Android platform implementation, and Sync-Ball
can be developed over it. A challenge is that the data object should be designed
for multiple purposes; a simple programming language for such data objects is
desirable, or the functionality could be designed as a database-like operation. For example,
you can design a migrating (distributed) Excel, where computations are a set of
formulae each computed over a column, and each node may append zero or more rows
when it has the ball. If time allows, you can add parallel operation of
data objects, and also write a runtime to support it. It might also be
interesting to consider issues of data integrity and availability (through
redundancy). Potential applications for Sync-Ball:
● Synchronisation among devices: the information to be synchronised is in the data object; devices receive/update the data, which is then sent out to the other devices.
● Twitter entries, which are accumulated through migration within the Haggle name space until the data object is sent out to Internet-based Twitter.
● Environmental monitoring object: dispatch a Sync-Ball gathering CO2 values from various sensor devices in the area.
References:
[1] EU FP6 Haggle platform http://code.google.com/p/haggle/
[2] Erik Nordstrom, Per Gunningberg, and Christian
Rohner. A Search-based Network Architecture for Mobile Devices. Tech Report, Uppsala University, 2009-003.
[3] E. Yoneki, I. Baltopoulos and J. Crowcroft, "D^3N: Programming Distributed Computation in Pocket Switched Networks", ACM MOBIHELD at SIGCOMM, 2009.
[4] Xerox PARC's Bayou Project
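A toy Python sketch of the "migrating Excel" idea above: a data object that accumulates rows as it visits nodes, with per-column formulae recomputed from the accumulated rows. This is illustrative only and does not use Haggle's actual data-object API; the class and node names are invented.

```python
class SyncBall:
    """A toy migrating data object: metadata plus appended rows.

    Each node may append rows while holding the ball; column formulae
    (reduction functions) are recomputed over the accumulated rows.
    """
    def __init__(self, formulae):
        self.rows = []
        self.formulae = formulae   # column name -> reduction function
        self.visited = []          # metadata: which nodes held the ball

    def visit(self, node_id, new_rows):
        """Called when the ball arrives at a node and the node appends rows."""
        self.visited.append(node_id)
        self.rows.extend(new_rows)

    def results(self):
        """Evaluate each column formula over all rows gathered so far."""
        return {col: f([r[col] for r in self.rows])
                for col, f in self.formulae.items()}

# Example: an environmental-monitoring ball tracking peak CO2 and mean
# temperature as it migrates between two (hypothetical) nodes.
ball = SyncBall({"co2": max, "temp": lambda xs: sum(xs) / len(xs)})
ball.visit("node-a", [{"co2": 400, "temp": 20.0}])
ball.visit("node-b", [{"co2": 430, "temp": 22.0}])
```

In a real deployment the `visit` step would be triggered by the Haggle kernel when the data object is received, and the ball's state would travel in the object's metadata.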
7.
Originator/Supervisor: Eiko Yoneki (with Karthik Nilakant)
Keywords: Mobility Trace
A number of prior studies have collected data on
ad-hoc network connectivity between devices carried by human subjects. Such
trace data can then be used as the basis for simulations of ad-hoc or
delay-tolerant network protocols. However, typically these mobility traces
consider network infrastructure and ad-hoc networking in isolation. Modern
smartphones feature a variety of network interfaces, each with varying
characteristics such as range, bandwidth and reliability. A mobility trace
that includes connectivity data over a wider range of interfaces provides
an opportunity for simulated evaluation of multipath and hybrid delay-tolerant
networking schemes.
8.
Building Naming Scheme in NDN over V2V Communication
Originator/Supervisor: Eiko Yoneki
Keywords: Named Data Networking, Content Centric Networking, Naming
This project explores an emerging new Internet
architecture called "Named Data Networking" (NDN) originated by the project
Content Centric Networking (CCN) by Van Jacobson in PARC. NDN provides network
storage, mobility, and multi-path forwarding. In NDN, communication is driven by
the data consumer, who initiates it by sending out an Interest packet, which
identifies the desired content by a hierarchically-structured name. One Interest
retrieves one data packet, and a sequence of Interests is sent to retrieve the
sequence of individually-named Data packets that make up a large piece of
content such as a video. The naming in the NDN architecture is not mature. In
this project, you will explore naming schemes in NDN. The semantics of names is
opaque to the network but depends on the application. The structure of names
(e.g. hierarchically structured with an arbitrary number of components),
discovery, and namespace navigation should be investigated; the
vehicle-to-vehicle (V2V) communication environment is the target setting. You can exploit location information (e.g. latitude
and longitude by GPS or any geographical information that can be obtained from
the map) to navigate the data flow depending on the consumers' interests. The
meaning of names can be determined by local and global distribution
requirements, and the content producer and consumer application conventions, as
embodied in individual router's routing tables and application-level transport
libraries. You will use CCNx open source environment
http://www.ccnx.org/software-download-information-request/download-releases/.
If time allows, you can add a comparison study with another content centric
networking paradigm, Haggle, from the naming perspective.
References:
● Named Data Networking: http://www.named-data.org/ndn-proj.pdf
● CCNx: http://www.ccnx.org/
● NDN: http://www.named-data.org/
● Haggle: http://code.google.com/p/haggle/
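To illustrate hierarchical naming and longest-prefix lookup, the sketch below uses a hypothetical V2V naming convention that embeds a geographic grid cell in the name. The names, table, and face identifiers are invented for illustration and are not part of CCNx:

```python
def name_components(name):
    """Split a hierarchical NDN-style name into its components."""
    return [c for c in name.split("/") if c]

def longest_prefix_match(interest, table):
    """Return the entry whose name shares the longest prefix with the
    Interest name, mimicking NDN-style name lookup.
    table: dict name -> next hop (or content)."""
    want = name_components(interest)
    best, best_len = None, -1
    for name, entry in table.items():
        have = name_components(name)
        if want[:len(have)] == have and len(have) > best_len:
            best, best_len = entry, len(have)
    return best

# Hypothetical V2V convention: a name carries region and GPS grid cell,
# so Interests can be steered geographically.
fib = {
    "/v2v/cam/grid/52.205/0.119": "face-1",
    "/v2v/cam/grid/52.205": "face-2",
}
```

A naming-scheme design for this project would decide what the components mean (region, grid cell, content type, sequence number) and how routers and applications agree on that convention.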
9.
Originator/Supervisor: Eiko Yoneki (with Karthik Nilakant)
Keywords: Named Data Networking (NDN), Content Centric Networking (CCN)
The CCNx project is currently under active
development. It implements the content-centric networking framework presented in
[1,2,3]. One of the observations of the original paper was that the underlying
store-and-forward methodology lends itself particularly well to delay or
disruption-prone networks. However, CCNx does not currently perform routing for
such a network (relying instead on network broadcast, or static user-defined
routes). In contrast, the Haggle project provides PROPHET-based probabilistic
routing. To provide this type of functionality for CCNx, a new
“control” module would need to be written. In this project, desirable attributes
of such a module, such as the ability to switch between multiple protocols and
the ability to tune the module programmatically, will be investigated. Once the
module has been completed, it can be put to the test through performance
evaluations against Haggle or other frameworks.
References:
10.
Hybrid Dropbox
Originator/Supervisor: Eiko Yoneki (with Karthik Nilakant)
Keywords: Cloud Service, Opportunistic Networks, Dropbox
A variety of personal storage/backup applications
have surfaced over recent years, providing a cloud-based synchronisation
capability between multiple devices. In certain settings, access to the
cloud may be restricted or not possible, which in turn restricts the
availability of these services. We have developed an application called
“Mistify”, which allows personal files to be replicated amongst local devices
(such as other people’s smartphones). Typically, such devices are temporarily
networked by local wireless connections, forming a “mist”. The Haggle framework
can be used to publish and search for content in a mist, in a similar manner to
interacting with the cloud. When the cloud becomes unavailable, the mist
provides a secondary storage tier that can be used to keep data safe until cloud
connectivity is restored. While much prior research has been focused on purely
centralised or purely decentralised storage, hybrid approaches represent an area
of interest. The present prototype would need to be extended in a
variety of ways to allow comparison with similar research projects. Some focus
areas include:
● Joining or splitting files into manageable chunks for storage in the mist.
● Erasure coding.
● Using infrastructure for community detection and key distribution.
● Adapting the replication strategy to changing climatic conditions.
● Making use of version vectors, and implementing an “eventual consistency” model.
● (Most importantly) evaluating these new and existing features, making use of a Xen-based network simulator and/or a controlled user trial.
References:
[2] http://code.google.com/p/haggle/
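For the version-vector item above, a minimal sketch of the standard merge and comparison operations (the basis of an "eventual consistency" model) might look like this; the dictionary representation is one common choice, not Mistify's actual format:

```python
def vv_merge(a, b):
    """Element-wise maximum of two version vectors (dict node -> counter),
    used when reconciling two replicas of the same file."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}

def vv_compare(a, b):
    """Order two version vectors: 'equal', 'before', 'after' or 'concurrent'.
    'concurrent' signals a conflict that needs application-level resolution."""
    nodes = set(a) | set(b)
    le = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    ge = all(a.get(n, 0) >= b.get(n, 0) for n in nodes)
    if le and ge:
        return "equal"
    if le:
        return "before"
    if ge:
        return "after"
    return "concurrent"
```

When two mist devices exchange replicas, `vv_compare` decides whether one copy strictly supersedes the other or whether a conflict must be surfaced, and `vv_merge` records the combined causal history afterwards.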
11.
Decentralised Community Detection and Deployment of Forwarding Algorithms in Opportunistic Networks
Originator/Supervisor: Eiko Yoneki (with Karthik Nilakant)
Keywords: Community Detection, Opportunistic Networks, Haggle
BUBBLE RAP, introduced in [3], is a forwarding
algorithm for delay-tolerant networks, which uses information about community and
graph centrality. It consists of RANK and LABEL strategies. RANK delegates
messages to high centrality nodes, while LABEL selects a next hop from any
member of the destination community as a relay. BUBBLE works in a decentralised
world without access to infrastructure. Therefore the distributed detection and
dissemination of node popularity and node communities, and the use of these for
forwarding decisions are crucial. This project aims at implementing distributed
community detection in the Haggle framework [1] and deploying the BUBBLE
forwarding algorithm. Integration work will be carried out using the Xen-based Haggle simulator,
including generation of a few different types of networks (e.g. small world, and
emulation of real world trace). Performance testing in terms of packet delivery
compared to prior schemes should be demonstrated. A brief description of the work items is listed
below.
● Implement distributed community detection
● Implement the BUBBLE forwarding algorithm [3] in Haggle
● Implement a network generator (e.g. small world, inter-contact power law, etc.) within the Xen Haggle simulator
● Integrate all of the above
● Performance testing in different network settings
References:
[1] EU FP6 Haggle Project
http://code.google.com/p/haggle/
[2] P. Hui, E. Yoneki, S. Chan, and J. Crowcroft. Distributed Community Detection in Delay Tolerant Networks. Proc. MobiArch, 2007.
[3] P. Hui, J. Crowcroft, and E. Yoneki. BUBBLE Rap: Social Based Forwarding in Delay Tolerant Networks. MobiHoc, 2008.
[4] I. Leung, P. Hui, P. Lio, and J. Crowcroft. Towards Real Time Community Detection in Large Networks. Phys. Rev. E, 79(6), 2009.
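The BUBBLE forwarding decision described above can be sketched as a single predicate. This is a simplified reading of the rule in [3], with the community and the (normally distributedly estimated) centrality rankings supplied as plain dictionaries for illustration:

```python
def bubble_forward(carrier, contact, dest_community, global_rank, local_rank):
    """Decide whether to hand a message from `carrier` to `contact`.

    Simplified BUBBLE RAP rule: bubble the message up the global centrality
    ranking until it reaches the destination community (LABEL), then bubble
    it up that community's local ranking (local RANK).
    """
    in_dest = lambda n: n in dest_community
    if in_dest(carrier):
        # Already inside the destination community: stay inside and
        # climb the local centrality ranking.
        return in_dest(contact) and local_rank[contact] > local_rank[carrier]
    # Outside the community: deliver into it, or climb global centrality.
    return in_dest(contact) or global_rank[contact] > global_rank[carrier]

# Hypothetical example: destination community {d, e}, with toy rankings.
dest = {"d", "e"}
g_rank = {"a": 1, "b": 5, "d": 2, "e": 3}
l_rank = {"d": 1, "e": 2}
```

In the actual project, `global_rank` and `local_rank` would come from the distributed community-detection and centrality-estimation components running on each node.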
12.
Modularity-Preserving Synthetic Dynamic Network Generator
Originator/Supervisor: Eiko Yoneki
Keywords: Clustering, Network Topology Generation
Random networks, scale-free networks, or small-world
networks have been used for simulation of epidemic spread and social network
analysis, among other applications. Such networks represent human contact
networks or social networks, and in most cases a single network view is used in
an aggregated form. Real human contact networks are dynamic, and such a single
network topology may not capture the dynamic features of such networks. We have used
time series of network traces from the real world to simulate the dynamic
contact networks in our previous research on routing in delay-tolerant networks.
However, those network traces are each an instance from a specific environment.
Ideally, creating a synthetic network topology as a time series would help to
evaluate large-scale and complex network structures. Although several attempts
have been made at building synthetic networks, it is still a challenge to generate
the topology of a real dynamic network. In this project, you would focus on
generating a dynamic network topology that preserves modularity. Thus, you will
pursue a rewiring strategy without losing the module/cluster architecture of the
network. You can start your investigation using our prototype
Randomise-Matrix algorithm, which preserves the single level of modules
and may be extended to a more hierarchical structure. Using online social
networks such as Twitter data for validating the algorithm is desirable.
References:
● R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon, Network Motifs: Simple Building Blocks of Complex Networks, Science, 298(5594):824-827, 2002.
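One plausible starting point for a modularity-preserving rewiring step is a degree-preserving double-edge swap restricted to edges lying within the same module. The sketch below is an illustration of that idea only, not the in-house Randomise-Matrix algorithm:

```python
import random

def rewire_within_modules(edges, module, n_swaps, seed=0):
    """Degree-preserving double-edge swaps confined to a single module.

    edges  : list of (u, v) tuples
    module : dict vertex -> module id
    Swapping endpoints only between two edges in the same module randomises
    the wiring while leaving both the degree sequence and the
    module/cluster structure intact.
    """
    rng = random.Random(seed)
    edges = list(edges)
    edge_set = set(edges)
    for _ in range(n_swaps):
        (a, b), (c, d) = rng.sample(edges, 2)
        # Only swap if all four endpoints share a module and the swap
        # creates no self-loop or duplicate edge.
        if not (module[a] == module[b] == module[c] == module[d]):
            continue
        if a == d or c == b or (a, d) in edge_set or (c, b) in edge_set:
            continue
        i, j = edges.index((a, b)), edges.index((c, d))
        edges[i], edges[j] = (a, d), (c, b)
        edge_set -= {(a, b), (c, d)}
        edge_set |= {(a, d), (c, b)}
    return edges
```

Applying such a step repeatedly over a time series of snapshots gives a dynamic synthetic network whose modules persist while individual contacts change; extending it to hierarchical (nested) modules is the harder part the project targets.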
13.
Modelling Spread of Rumours between Communities
Keywords: Opinion Dynamics, Community
Our previous work [1] reported a preliminary
investigation on interactions between networked social communities using the
Ising model to analyse the spread of rumours. The inner opinion of a given
community is forced to change through the introduction of a unique external
source, and we analyse how the other communities react to this change. We model
two conceptual external sources: namely, Strong-belief and Propaganda, by an
infinitely strong inhomogeneous external field and a finite uniform external
field, respectively. In the former case, the community changes independently
from other communities, while in the latter case also according to interactions
with other communities. We applied our model to synthetic networks as well as
various real world data ranging from human physical contact networks to online
social networks. The experimental results of applying the Ising model using real
world data clearly demonstrate two distinct scenarios of phase transitions. This project aims at validating the above model, to
examine whether such Strong-belief and Propaganda are exhibited in real
situations. Thus, the project requires deploying a series of
experiments, e.g. online and mobile-phone-based software to share opinions on
specific topics within a given time, and monitoring the change of individual
opinion over time by controlling the information available to each individual.
This will capture the dynamics of opinion change and allow validation of
our Ising model. The experiments can start with the simplest condition and,
as we extend the work (e.g. to two different opinions coexisting in
the society), the software should also be
extended for appropriate validation. The Ising model for understanding
interactions of communities and subsequently modelling rumour propagation is a
unique approach that will give significant input for modelling information
propagation in social networks. The result of the validation can be compared with
a voting model.
References:
[1] M. Ostilli, E. Yoneki, I. Leung, J. Mendes, P.
Lio, and J. Crowcroft: "Statistical mechanics of rumour spreading in network
communities", International Conference on Computational Science, ICCS, 2010. [2] M. Ostilli, J. F. F. Mendes, "Communication and
correlation among communities". Phys. Rev. E 80 (011142); and M. Ostilli, J. F.
F. Mendes, "Small-world of communities: communication and correlation of the
meta-network". J. Stat. Mech. L08004 (2009).
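For orientation, a single Glauber-dynamics sweep of an Ising model on a contact graph, with an external field standing in for a Propaganda-style source, can be sketched as follows. This is a toy illustration of the general mechanism, not the model of [1]; the graph and parameters are invented:

```python
import math
import random

def glauber_step(spins, graph, field, beta, rng):
    """One Glauber-dynamics sweep: each node re-samples its spin (opinion)
    given its neighbours' spins and any external field h_i.

    spins: dict node -> +1/-1; field: dict node -> external field strength
    (a finite uniform field on one community models Propaganda).
    """
    for i in list(spins):
        h = sum(spins[j] for j in graph[i]) + field.get(i, 0.0)
        p_up = 1.0 / (1.0 + math.exp(-2.0 * beta * h))  # P(spin -> +1)
        spins[i] = 1 if rng.random() < p_up else -1
    return spins

# A tiny community, all initially "against", exposed to a strong external
# source: at low temperature (high beta) it flips to "for".
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
spins = {i: -1 for i in graph}
field = {i: 5.0 for i in graph}
rng = random.Random(42)
for _ in range(10):
    glauber_step(spins, graph, field, beta=5.0, rng=rng)
```

In the project, the simulated phase transitions produced by this kind of dynamics would be compared against the opinion changes observed in the software-based experiments.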
14.
Opinion and Behaviour Dynamics in Social Networks
Originator/Supervisor: Eiko Yoneki
Keywords: Information Cascade, Opinion Dynamics, Twitter
Large-scale social network communities, where
millions of users form a dynamic infrastructure to share multimedia content have
brought new entertainment experiences. Such Network-shared User Generated
Content (UGC) is becoming increasingly popular due to the emergence of Web 2.0.
YouTube is one of many UGC websites, where the popularity of content is driven
by viewers and varies dynamically. This project investigates understanding
information flow, decision making process, and behavioural dynamics based on
rumour, social topology, and given information in such environments and
modelling how people react with limited/unlimited information surroundings.
Thus, the investigation covers behaviour modelling and analysis from an opinion
dynamics perspective and can provide an answer to questions such as "do people
change their opinion based on something similar to a page-ranking algorithm?" or "what
factors have the largest influence?". In this project you would write a role-playing game
to understand the above problem. Millions of people worldwide play interactive
online role-playing games, forming complex and rich networks among their virtual
characters. Such games could project the real world and predict what might
occur. For example, an unexpected outbreak of an infective communicable disease
(unplanned by the game creators) occurred when a role-playing game was played in
the virtual world. This outbreak holds surprising similarities to real world
epidemics [3]. You can write a role-playing game application for
mobile devices and the web, and obtain information about the behaviour of players
under different conditions (e.g. providing other players' activities or
characteristics of the other players). The game could be "what's the most
popular music?" or "what's the best mobile phone to buy?" or similar to any
popular game in the market. Reading Vinayak Dalmia's masters thesis on
'Investigating Herding Behaviour in an Experimental Setup' (Judge Business
School) will be a good starting point; and papers [1][2] and [4] will be good
general background reading.
References:
[1] Boyd, Golder, and Lotan, Tweet, Tweet, Retweet: Conversational Aspects of Retweeting on Twitter, HICSS-43, 2010.
[2] Christo Wilson, Bryce Boe, Alessandra Sala, Krishna P. N. Puttaswamy and Ben Y. Zhao, User Interactions in Social Networks and their Implications, ACM EuroSys, 2009.
[3] Balicer, R.: Modeling Infectious Diseases Dissemination Through Online Role-Playing Games. Epidemiology, 18(2), 2007.
[4] Sastry, Yoneki and Crowcroft, "Buzztraq: Predicting Geographical Access Patterns of Social Cascades Using Social Networks". EuroSys Workshop on Social Network Systems, 2009.
15.
Sampling Strategies for Measuring Epidemic Spread
Originator/Supervisor: Eiko Yoneki
Keywords: Sampling
Studying the epidemic spread of infectious disease requires
quantifying and modelling disease development in time and space; quantifying
patterns of disease and sampling for disease in populations is challenging. The
recent emergence of wireless technology provides a unique opportunity to collect
precise human connectivity data. For example, people can carry tiny wireless
sensors (< 1 cm^2) that record dynamic information about other devices
nearby. A post-facto analysis of this data will yield valuable insight into
complex human interactions, which in turn will support meaningful modelling and
understanding of such networks. Interestingly, infectious disease epidemiology and
proximity-based communication in wireless networks share common research issues
[3]. However, it is challenging to deploy an experiment to
collect such human connectivity data, considering that not everybody can be
equipped with a device capable of collecting the information. Conducting such
experiments is expensive, as each participant must be given a device (typically
a mobile phone, or Bluetooth sensor), and it may also be difficult to find
willing participants. How many devices (e.g. mobile phones) are necessary to
operate an epidemic spread simulation in the city of Cambridge, where the
population is 130,000? The problem can be interpreted as how to find a giant
component (i.e. a connected sub-graph of the human contact network) in a given
population. A challenge is that a human contact network is not a random network. There is a series of human contact data sets, including
CRAWDAD [2] and the Haggle project [1], which are available for this project. If a
specific data collection can provide significant information, such an experiment
can be carried out within the project. The following two strategies can be
investigated:
Note that the general notion of "sampling strategy"
is similar to that discussed in papers by Milan Vojnovic (MSR Cambridge) [5][6].
References:
[1] EU FP6 Haggle Project http://code.google.com/p/haggle/
[2]
CRAWDAD: A
Community Resource for Archiving Wireless Data [3] Eubank, S., Kumar, V., and Marathe, M.:
Epidemiology and wireless Communication: Tight Analogy or Loose Metaphor? LNCS
5151, 91-104, 2007. [4] Kleinberg, J.: The wireless epidemic, Nature 449,
287-288, 2007. [5] Dinkar Vasudevan and Milan Vojnovic, Ranking
through Random Sampling, MSR-TR-2009-2015, January 2009. [6] Varun Gupta and Milan Vojnovic, Supplement to
"Sampling Strategies for Epidemic-Style Information Dissemination",
MSR-TR-2009-70, 2009.
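The giant-component question above can be prototyped directly on a contact trace. The sketch below computes connected components with a simple union-find and reports the fraction of the population inside the largest one; the trace format (node list plus contact pairs) is an assumption for illustration:

```python
def components(nodes, edges):
    """Union-find over the contact graph; returns a list of component sets."""
    parent = {n: n for n in nodes}

    def find(x):
        # Path-halving: point each visited node at its grandparent.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())

def giant_component_fraction(nodes, edges):
    """Fraction of the population inside the largest connected sub-graph."""
    return max(len(c) for c in components(nodes, edges)) / len(nodes)
```

Running this over subsamples of a real trace (e.g. CRAWDAD or Haggle data) shows how the giant-component fraction grows with the number of instrumented participants, which is exactly the sampling question the project asks.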
16.
The Impact of Social Influence on Information Propagation in Social Networks
Supervisor: Eiko Yoneki (with Mehdi Hosseini)
Keywords: Information Diffusion, Twitter, Behaviour Model
Social networks like Facebook and Twitter have
brought new entertainment experiences. People or companies publish information
in a social network through the network’s topology, by sharing it with
their neighbours. The published information can either be quickly ignored or
widely spread in the network. The extent to which information is propagated
highly depends on social influences that the members have on each other. This project investigates the impact of social
influences on social networks. Specifically, by monitoring members' activity,
this project will develop (i) data-driven models that explain various behaviours
and influences happening in a social network, (ii) metrics that quantify various
forms of social influence, e.g. conformity, reactance, peer pressure, sales and
marketing, (iii) methods that identify influential members and their
characteristics, (iv) gamification experiments for collecting the required data. In Twitter, for instance, a simple type of influence
is the ability to get others to retweet a message. Proposed solutions should
answer questions such as: (i) what is the probability that a member will be
influenced by a Twitter message and retweet it? (ii) What is the type of
influence? (iii) Who are the influential members? In this project, you will work on Twitter or Facebook
data. You need to know a programming language, e.g. Java, C, C#, Python, for
collecting data and one programming language for statistical computing, e.g. R,
Matlab, Java. References: [1] Meeyoung Cha, Hamed Haddadi, Fabrício Benevenuto,
P. Krishna Gummadi: Measuring User Influence in Twitter: The Million Follower
Fallacy. ICWSM 2010. [2] Evan T.R. Rosenman: Retweets, but Not Just
Retweets: Quantifying and Predicting Influence on Twitter. Thesis, Harvard
University. [3] Boyd, Golder, and Lotan, Tweet, Tweet, Retweet:
Conversational Aspects of Retweeting on Twitter, HICSS-43, 2010. [4] Christo Wilson, Bryce Boe, Alessandra Sala,
Krishna P. N. Puttaswamy and Ben Y. Zhao, User Interactions in Social Networks
and their Implications, ACM EuroSys 2009. [5] Balicer, R.:Modeling Infectious Diseases
Dissemination Through Online Role-Playing Games. Epidemiology, 18(2), 2007. [6] Sastry, Yoneki and Crowcroft " Buzztraq:
Predicting geographical access patterns of social cascades using social
networks". EuroSys workshop on Social Network Systems, 2009.
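To make the retweet question concrete, the sketch below computes one possible influence metric per author: the fraction of their tweets that attract at least one retweet, together with their mean retweet count. The record layout and the choice of metric are illustrative assumptions, not part of the project brief; real records would come from the Twitter API.

```python
from collections import defaultdict

# Hypothetical tweet records: (author, tweet_id, retweet_count).
tweets = [
    ("alice", 1, 12),
    ("alice", 2, 0),
    ("bob",   3, 3),
    ("bob",   4, 5),
    ("carol", 5, 0),
]

def retweet_influence(tweets):
    """For each author, return (fraction of tweets retweeted at least
    once, mean retweet count) as one simple quantification of influence."""
    posted = defaultdict(int)
    retweeted = defaultdict(int)
    total_rts = defaultdict(int)
    for author, _tid, rts in tweets:
        posted[author] += 1
        retweeted[author] += 1 if rts > 0 else 0
        total_rts[author] += rts
    return {a: (retweeted[a] / posted[a], total_rts[a] / posted[a])
            for a in posted}

print(retweet_influence(tweets))
# alice: half her tweets were retweeted, with 6 retweets on average.
```

A serious model would go further, e.g. conditioning the retweet probability on follower counts or message content, but even this simple rate already separates influential from non-influential accounts in the sample data.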
17.
Eye Tracking for Behavioural Study

Supervisor: Eiko Yoneki

Keywords: Behaviour Study and Model, Perception, Cognitive Science

In this project you will use an eye-tracking device to extract unconscious behaviour of people as they use online social network tools. Example questions are how people decide to retweet a message on Twitter or to post a wall comment on Facebook.

We will provide an eye-tracking device that automatically records web browser and monitor information from the participant's eye movements. You will design the experimental setup and the goal of the experiment. Our ongoing photo-rating experiments in the EU FP7 RECOGNITION project can be used as experimental material to complement the extracted data. You can also write your own experimental application on online social networks (e.g. Facebook) to extract unconscious social behaviour and compare it with the feedback information (e.g. how other people rated an item, or the average rating) from the participant's social clique.

As an extension, you could write an economic gaming application to capture the participant's decision making in competitive tasks.

The work is based on the EU FP7 RECOGNITION project: http://www.recognition-project.eu/. This project is suitable for a student interested in data analytics and experimental work.

References:
[1] Roman Bednarik, Markku Tukiainen: An Eye-Tracking Methodology for Characterizing Program Comprehension Processes. Proceedings of the 2006 Symposium on Eye Tracking Research & Applications, March 27-29, 2006, San Diego, California.
[2] Hayes, T. R., Petrov, A. A., and Sederberg, P. B. (2011): A Novel Method for Analyzing Sequential Eye Movements Reveals Strategic Influence on Raven's Advanced Progressive Matrices. Journal of Vision, 11(10:10), 1-11, http://www.journalofvision.org/11/10/10/.
18.
Trust Relationships and Content Filtering

Keywords: Social Trust, Content Filtering

Online social networks have become extremely popular in recent years. One challenge people now face is keeping up with the amount of content available to them on their preferred social networks (status updates, tweets, photos, news, etc.). The volume has grown to the point where it is nearly impossible for the average user to review it all. In this project you will work on automatic filtering of unwanted content.

One way to achieve this goal is through trust relationships: the extent to which a user on a network trusts other users or services. By monitoring the relationships between members, you will investigate which parameters contribute to trust between different users and how they can be quantified. The references below are good starting points. The solution should be able to answer research questions such as: (i) How do users choose which content, and from which other user, to care about? (ii) Can a quantifiable measure of trust help to predict whether a piece of content is undesirable to the user? (iii) What makes a user trustworthy?

In this project you will work on Facebook or Twitter data. You need to know a programming language for collecting data, e.g. Java, C, C#, Python, and one language for statistical computing, e.g. R, MATLAB, Java.

References:
[1] Yarden Katz and Jennifer Golbeck: Social Network-Based Trust in Prioritized Default Logic. Proceedings of the 21st National Conference on Artificial Intelligence (AAAI'06), Vol. 2, AAAI Press, 1345-1350, 2006.
[2] Ugur Kuter and Jennifer Golbeck: Using Probabilistic Confidence Models for Trust Inference in Web-Based Social Networks. ACM Trans. Internet Technol. 10(2), Article 8, June 2010.
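The filtering idea can be sketched very simply: score each incoming item by the trust the reader places in its author, and drop items below a threshold. The trust values, feed items, and threshold below are all illustrative assumptions; in the project, the trust scores would be quantified from observed interactions rather than hand-assigned.

```python
# Hypothetical, hand-assigned trust scores in [0, 1] for the reader.
trust = {"alice": 0.9, "bob": 0.2, "spam_acct": 0.05}

# Hypothetical incoming feed items: (author, text).
feed = [
    ("alice", "Paper accepted at EuroSys!"),
    ("bob", "Check out my mixtape"),
    ("spam_acct", "You won a prize, click here"),
]

def filter_feed(feed, trust, threshold=0.3):
    """Keep only items whose author's trust score clears the threshold;
    unknown authors default to zero trust and are filtered out."""
    return [(author, text) for author, text in feed
            if trust.get(author, 0.0) >= threshold]

print(filter_feed(feed, trust))  # only alice's item survives
```

Research question (ii) then becomes an empirical one: how well does such a threshold rule, with trust learned from interaction data, predict which items the user actually ignores?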
19.
Trust Relationships and Propagation of Trust in Online Social Communities

Contact: Eiko Yoneki (with Arman Idani)

Keywords: Social Trust, Community, Information Propagation

Communities in social networks are groups of tightly coupled nodes (users). Within communities, users normally have many mutual friends, with whom they may interact differently. The extent to which a user trusts other users, the trust relationship, is usually measured by monitoring their interactions. In this project you will investigate trust relationships in online social communities, their relation to the properties of the community, and how trust propagates among the members of a community. The references below are good starting points. The solution should be able to answer research questions such as: (i) How do trust relationships within a community differ from trust relationships outside the community? (ii) To what extent does the trust relationship among members of a community depend on its properties? (iii) How does a user's role in the community correlate with their trustworthiness?

In this project you will work on Facebook or Twitter data. You need to know a programming language for collecting data, e.g. Java, C, C#, Python, and one language for social network analysis and statistical computing, e.g. R, MATLAB, Python.

References:
[1] Yarden Katz and Jennifer Golbeck: Social Network-Based Trust in Prioritized Default Logic. Proceedings of the 21st National Conference on Artificial Intelligence (AAAI'06), Vol. 2, AAAI Press, 1345-1350, 2006.
[2] R. Guha, Ravi Kumar, Prabhakar Raghavan, and Andrew Tomkins: Propagation of Trust and Distrust. Proceedings of the 13th International Conference on World Wide Web (WWW '04), ACM, 403-412, 2004.
[3] Christian Borgs, Jennifer Chayes, Adam Tauman Kalai, Azarakhsh Malekian, and Moshe Tennenholtz: A Novel Approach to Propagating Distrust. Proceedings of the 6th International Conference on Internet and Network Economics (WINE'10), Springer-Verlag, Berlin, Heidelberg, 87-105, 2010.

Contact Email: Please email eiko.yoneki@cl.cam.ac.uk with any questions.