MPhil, Part III, and Part II Project Suggestions (2013-2014)
|
Please contact Eiko Yoneki (email: eiko.yoneki@cl.cam.ac.uk) if you are interested in any project below.
1.
RasPi-Net: Building a Distributed
Raspberry Pi Network together with Satellite Communication Originator/Supervisor:
Eiko Yoneki
(with Abraham Martin) Keywords: Raspberry Pi, Delay Tolerant Networks,
Satellite Communication, Stream Processing This project aims to build a decentralised Raspberry
Pi network (Opportunistic Networks), which can be deployed in wild and remote
regions. The gateway Raspberry Pi will be integrated with a satellite
communication devices, where the light version of Delay Tolerant Network (DTN)
bundle protocol will be embedded. RasPi-Net could consist of 10-100 nodes. As an
example, a remote sensing application could be written either in RasPi or Smart
phones that can connect to RasPi. Collected data could be processed within
Ras-Pi Net to reduce data size that streams over the satellite communication to
the base location. The crowd sourcing application can run on top of RasPi-Net,
too. You could write a specific application to help in a disaster situation or
build a typical streaming platform. Example – Communication in disaster situation. It is
usual in emergency situations that other means of communication are inoperative
(GSM, DSL, etc) in the gateway node. In this project you will have to develop a
mobile application (Android and/or iOS) that is able to communicate with the
RasPi-Net via SMS using the satellite connection and a SMS web service like
Twilio [2]. The communication between the Raspberry-Pi and the mobile phone will
use Wi-Fi or a 3G femtocell which provide the best range of communication for
mobile devices. This application can be extended to include Twitter or Facebook
messages using their corresponding APIs. References: [1] Delay Tolerant Network Bundle Protocol:
http://tools.ietf.org/html/rfc6255 [2] RockBlock Technology:http://rockblock.rock7mobile.com/ [3] Twilio
http://www.twilio.com/ 2. RaspberryPi Bulk Synchronous Processing: Distributed Data (Graph) Processing Originator/Supervisor:
Eiko Yoneki
(with Karthik Nilakant/ Abraham Martin) Keywords: Graph Processing, Data Parallel, Bulk
Synchronous Processing This project aims to replicate the benefit of an
array of low power processors backed by persistent storage and graph mining. The
aim is to build a software stack for Raspberry-Pi that allows a set of such
devices, each equipped with either an SD card or a USB flash drive to act as a
node for Bulk Synchronous Processing of graphs (see Pregel [2] for an example of
bulk synchronous processing). All the necessary data structures will reside on
flash with the small 256MB RAM on the Raspberry-PI acting as a scratchpad. This means that a failure of the Raspberry-Pi unit
can be tolerated by simply swapping the flash memory over to another good unit.
However this will need careful integration of a 2-phase commit protocol with
bulk synchronous processing (BSP). One of the other goals of this project is to
integrate a prefetching algorithm we have developed for graphs stored on flash
drives with the software stack built for this project. The final aim will be to
show that when processing time, energy consumption and cost are taken together
this solution is competitive to running BSP on a cluster of PCs. Additionally
you can explore the impact of data locality in such environments. References: [1] D. Andersen, J. Franklin, A. Phanishayee, L. Tan
and V. Vasudevan: FAWN: A Fast Array of Wimpy Nodes, Communications of the ACM,
July 2011. [2] G. Malewicz, M. Austern, A. Bik, J. Dehnert, I.
Horn, N. Leiser, and G. Czajkowski: Pregel: A System for Large-Scale Graph
Processing, SIGMOD, 2010.
3. Mobility model in Amusement Parks
for Diseases Spread Study Originator/Supervisor:
Eiko Yoneki
(with Abraham Martin) Keywords: Mobility, Diseases, Android scraping Diseases are easier to spread in closed environments
where a lot of people are together. Examples of this can be a shopping centre,
large working offices, schools, or amusement park. Having a movement model of
large and crowded scenarios like an amusement park is difficult to get using
usual tracking method. One simple approach to get this information is using the
live queues times from the park [1][2][3]. This information together with the
map of the park can provide mobility information between attractions. You will
have to create different scrappers (web, android and/or iOS) to get the queue
information from different parks for several days. With this data together with
the topology of the park (extracted from the park map) you will be able to
create visualizations, diseases spread simulations, etc. References: [1]
http://www.ridetimes.co.uk/ [2]
https://itunes.apple.com/gb/app/thorpe-park-official/id374862991?mt=8 [3]
https://play.google.com/store/apps/details?id=portaventura.android
4.
Graph Compression Originator/Supervisor:
Eiko Yoneki
(with Weixiong Rao) Keywords: Graph Compression This project explores graph compression mechanisms as
part of a project looking into high performance semi-external memory graph
processing (see [1] to get an idea of semi-external memory approaches). The
graph compression work will build on an in-house graph representation format
that we have developed that allows succinct representation of graphs that show
hierarchical structures. The project will explore ways to improve the
representation yielding smaller graphs on disk that are less costly to traverse.
A key element of the representation scheme is a recursive graph partitioning
step that minimises the number of edges between partitions. This is a rich space
for exploration of suitable algorithms. We are concerned primarily with
experimentally evaluating IO costs on large graphs and measuring the compression
performance. However a student with a theoretical background might consider
designing algorithms with provable bounds on compression performance, which
would be a big plus. If time allows you could also implement an efficient
transformation tool based on the developed graph compression algorithm using
parallel processing tools (e.g. map/reduce). References: [1] R. Pearce, M. Gokhale and N. Amato:
Multithreaded Asynchronous Graph Traversal for In-Memory and Semi-External
Memory, ACM/IEEE High Performance Computing, Networking, Storage and
Analysis, 2010.
http://dl.acm.org/citation.cfm?id=1884675
5. Develop Scale-Out SSD based Graph
Traversal Platform Originator/Supervisor:
Eiko Yoneki
(with Karthik Nilakant) Keywords: Graph processing, I/O optimisation, Graph
structure This project contributes to our ongoing work on high
performance semi-external memory graph processing (see [1]). We have developed
an in-house prefetching algorithm for mining large graphs stored on Solid State
Drives. A solid state drive provides the ability to service many random reads in
parallel, provided that the requesting application is able to deliver requests
at a rate that keeps the drive sufficiently busy. Thus pre-fetching multiple
pages based on the current request in an “intelligent” manner can greatly
improve performance. Our current pre-fetcher implementations are aligned with
the graph traversal approach, (that is the “graph iterator” being employed). Furthermore, the prefetcher implementation could be
extended to utilise multiple storage devices in parallel. The aim in this
setting would be to out-perform standard storage parallelisation techniques such
as hardware or software RAID. The performance of this approach would also depend
on the method of dividing graph data between available storage devices. Your
task is developing scale-out version of our current work. You could also extend
it in distributed graph traversals to study how a large graph (e.g. over 2
billion vertices) could get benefits from scale-out version, even though the
current version can handle large graphs as a single node. You could also explore
generating a large graph for testing by combining various entities of social
network data (e.g. what happens when every "Like", "Comment" and "Photo" on
Facebook are treated as vertices inside a graph -- not just "People" and "Pages"
- such graphs would be huge). References: [1] E. Yoneki and A. Roy: Scale-up Graph Processing:
A Storage-centric View. ACM SIGMOD - GRADES, 2013. [2] GraphLab:
http://graphlab.org/
6. Building Graph Query
Function using Functional Programming Originator/Supervisor: Eiko Yoneki (with
Weixiong Rao) Keywords: Graph, Functional programming Demand to store and search of online data
with graph structure is emerging. Such data range from online social networks to
web links and it requires efficient query processing in a scalable manner. In
this project, you will build a graph query function (layer) to achieve efficient
graph data processing. The graph query function builds on a lazy graph loaded
from multiple sources and performs queries at the same speed or faster than a
relational database. The function should be written in Clojure or another
functional language to support practical lazy loading from data source to an
in-memory graph database. Efficient representations for graph queries should be
also investigated. The evaluation includes comparisons with existing Graph
Database and the query capability comparing with SQL. You can start the project
by our existing work in this area. References: [1] Microsoft Research, “Trinity project:
Distributed graph database,”
http://research.microsoft.com/en-us/projects/trinity/ [2] Neo Technology, “Java graph database,”
http://neo4j.org/ 7. Opinion and Behaviour
Dynamics with Role Game Originator/Supervisor:
Eiko Yoneki
(with Andre Ribeiro) Keywords: Information Cascade, Opinion Dynamics,
Twitter The internet made communication (and content
creation) dramatically faster, cheaper and easier. Some argue that this is an
evolution of media technology we use to solve problems. This project
investigates understanding the communicative behavioural dynamics, information
flow and decision making constrained by the media and information people have
available. Thus, the investigation covers behaviour modelling and analysis from
an opinion dynamics perspective and can provide an answer to questions such as
"do people change their opinion or share information based on something similar
to page-ranking or some game-theoretical algorithm?" or "what factors have the
largest influence?” In this project you would write a role-playing game
to understand the above problem. Millions of people worldwide play interactive
online role-playing games, forming complex and rich networks among their virtual
characters. Such games could project the real world and predict what might
occur. For example, an unexpected outbreak of an infective communicable disease
(unplanned by the game creators) occurred when a role-playing game was played in
the virtual world. This outbreak holds surprising similarities to real world
epidemics [3]. You can write role-playing game application over
mobile devices and web and get insight over the communicative behaviour of the
players under the different conditions (when they choose to communicate? with
whom? For what reason?). The game could be "what's the most popular music?" or
"what's the best mobile phone to buy?" or similar to any popular game in the
market. Reading Vinayak Dalmia's masters thesis on 'Investigating Herding
Behaviour in an Experimental Setup' (Judge Business School) will be a good
starting point; and papers [1][2] and [4] will be good general background
reading. References: [1] Boyd, Golder, and Lotan, Tweet, Tweet, Retweet:
Conversational Aspects of Retweeting on Twitter, HICSS-43, 2010. [2] Christo Wilson, Bryce Boe, Alessandra Sala,
Krishna P. N. Puttaswamy and Ben Y. Zhao, User Interactions in Social Networks
and their Implications, ACM EuroSys 2009. [3] Balicer, R.:Modeling Infectious Diseases
Dissemination Through Online Role-Playing Games. Epidemiology, 18(2), 2007. [4] Sastry, Yoneki and Crowcroft " Buzztraq:
Predicting geographical access patterns of social cascades using social
networks". EuroSys workshop on Social Network Systems, 2009. 8. The impact of social influence on Information Diffusion - Understanding Social Influence by analysing EEG and Eye Tracking from Cognitive Perspectives Supervisor:
Eiko Yoneki
(with Arman Idani) Keywords: Information Diffusion, Behaviour Model,
Perception, Cognitive Science Social networks like Facebook and Twitter have
brought new entertainment experiences. People or companies publish information
in a social network through the network’s topology and by sharing them with
their neighbours. The published information can either be quickly ignored or
widely spread in the network. The network structure takes an important role for
diffusion processes. However, the extent to which information is propagated
highly depends on social influences that the members have on each other when
each member makes a decision on further propagation of the content. This project investigates the impact of social
influences on information diffusion process from cognitive perspectives.
First, you would analyse the data of Eye tracking and EEG, which we have
collected our experiments and model decision making process from social
influence including perception, and you could extend to analyse online social
network activities for validation of the model. You could compare the different
states of EEG activities with psychological models of decision making. You could
also write your experimental application or online social networks (e.g.
Facebook) to extract unconscious social behaviour in comparison to feedback
information (e.g. how the other people rated, what is the average of rating)
from the participant’s social clique. The analysis requires an understanding of statistical
modelling/machine learning as well as digital signal processing. Prior knowledge
and experience in speech recognition is very useful. References:
[1] Evan T.R. Rosenman: Retweets, but Not Just
Retweets: Quantifying and Predicting Influence on Twitter. Thesis, Harvard
University. [2] Boyd, Golder, and Lotan, Tweet, Tweet, Retweet:
Conversational Aspects of Retweeting on Twitter, HICSS-43, 2010. [3] Balicer, R.:Modeling Infectious Diseases
Dissemination Through Online Role-Playing Games. Epidemiology, 18(2), 2007. [4] Roman Bednarik , Markku Tukiainen, An
eye-tracking methodology for characterizing program comprehension processes,
Proceedings of the 2006 symposium on Eye tracking research & applications, March
27-29, 2006, San Diego, California [5] Hayes, T. R., Petrov, A. A., & Sederberg, P. B.
(2011) A novel method for analyzing sequential eye movements reveals strategic
inuence on Raven's Advanced Progressive Matrices. Journal of Vision, 11 (10:10),
1-11,
http://www.journalofvision.org/11/10/10/. [6] Janis, I. L., & Mann, L. (1977). Decision making:
A psychological analysis of conflict, choice, and commitment. Free Press. [7] Klein, G. A., Orasanu, J. E., Calderwood, R. E.,
& Zsambok, C. E. (1993). Decision making in action: Models and methods. 9. Trust Relationships and Content Filtering Contact: Eiko Yoneki (with Arman Idani) Keywords: Social Trust, Content Filtering It is now challenging for most users to keep up with
the increased amount of information in their news feed in social networks such
as Facebook or Twitter where people have a lot of first-hop connections. In this
project you will investigate the relevance of social trust, defined as the
confidence users have in their first-hop connections in different subjects, in
their reaction towards the contents that they see in their feed. The aim is to
answer questions such as (i) How do users choose which content to care about,
and if it is relevant who is sharing the content with them? (ii) Can the
influence of users on each other's behaviour be inferred? (iii) What makes a
user trustworthy? In this project you need prior knowledge of machine
learning and statistical modelling (ideally experience with inference and
learning of graphical models or structural equation modelling), a programming
language for data collection (Java or Python) and knowledge of statistics (e.g.
R, SPSS, MATLAB). References: [1] Yarden Katz and Jennifer Golbeck. 2006. Social
network-based trust in prioritized default logic. In proceedings of the 21st
national conference on Artificial intelligence - Volume 2 (AAAI'06), Anthony
Cohn (Ed.), Vol. 2. AAAI Press 1345-1350. [2] Ugur Kuter and Jennifer Golbeck. 2010. Using
probabilistic confidence models for trust inference in Web-based social
networks. ACM Trans. Internet Technol. 10, 2, Article 8 (June 2010), 23 pages. Contact EmailPlease email to eiko.yoneki@cl.cam.ac.uk for any question. |