MPhil, Part III, and Part II Project Suggestions (2013-2014)

Please contact Eiko Yoneki (email: eiko.yoneki@cl.cam.ac.uk) if you are interested in any project below.

1. RasPi-Net: Building a Distributed Raspberry Pi Network together with Satellite Communication

Originator/Supervisor: Eiko Yoneki (with Abraham Martin)

Keywords: Raspberry Pi, Delay Tolerant Networks, Satellite Communication, Stream Processing

This project aims to build a decentralised Raspberry Pi network (Opportunistic Networks), which can be deployed in wild and remote regions. The gateway Raspberry Pi will be integrated with a satellite communication devices, where the light version of Delay Tolerant Network (DTN) bundle protocol will be embedded. RasPi-Net could consist of 10-100 nodes. As an example, a remote sensing application could be written either in RasPi or Smart phones that can connect to RasPi. Collected data could be processed within Ras-Pi Net to reduce data size that streams over the satellite communication to the base location. The crowd sourcing application can run on top of RasPi-Net, too. You could write a specific application to help in a disaster situation or build a typical streaming platform.

Example – Communication in disaster situation. It is usual in emergency situations that other means of communication are inoperative (GSM, DSL, etc) in the gateway node. In this project you will have to develop a mobile application (Android and/or iOS) that is able to communicate with the RasPi-Net via SMS using the satellite connection and a SMS web service like Twilio [2]. The communication between the Raspberry-Pi and the mobile phone will use Wi-Fi or a 3G femtocell which provide the best range of communication for mobile devices. This application can be extended to include Twitter or Facebook messages using their corresponding APIs.

References:

[1] Delay Tolerant Network Bundle Protocol: http://tools.ietf.org/html/rfc6255

[2] RockBlock Technology:http://rockblock.rock7mobile.com/

[3] Twilio http://www.twilio.com/

2. RaspberryPi Bulk Synchronous Processing: Distributed Data (Graph) Processing

Originator/Supervisor: Eiko Yoneki (with Karthik Nilakant/ Abraham Martin)

Keywords: Graph Processing, Data Parallel, Bulk Synchronous Processing

This project aims to replicate the benefit of an array of low power processors backed by persistent storage and graph mining. The aim is to build a software stack for Raspberry-Pi that allows a set of such devices, each equipped with either an SD card or a USB flash drive to act as a node for Bulk Synchronous Processing of graphs (see Pregel [2] for an example of bulk synchronous processing). All the necessary data structures will reside on flash with the small 256MB RAM on the Raspberry-PI acting as a scratchpad.

This means that a failure of the Raspberry-Pi unit can be tolerated by simply swapping the flash memory over to another good unit. However this will need careful integration of a 2-phase commit protocol with bulk synchronous processing (BSP). One of the other goals of this project is to integrate a prefetching algorithm we have developed for graphs stored on flash drives with the software stack built for this project. The final aim will be to show that when processing time, energy consumption and cost are taken together this solution is competitive to running BSP on a cluster of PCs. Additionally you can explore the impact of data locality in such environments.

References:

[1] D. Andersen, J. Franklin, A. Phanishayee, L. Tan and V. Vasudevan: FAWN: A Fast Array of Wimpy Nodes, Communications of the ACM, July 2011.

[2] G. Malewicz, M. Austern, A. Bik, J. Dehnert, I. Horn, N. Leiser, and G. Czajkowski: Pregel: A System for Large-Scale Graph Processing, SIGMOD, 2010.

3. Mobility model in Amusement Parks for Diseases Spread Study

Originator/Supervisor: Eiko Yoneki (with Abraham Martin)

Keywords: Mobility, Diseases, Android scraping

Diseases are easier to spread in closed environments where a lot of people are together. Examples of this can be a shopping centre, large working offices, schools, or amusement park. Having a movement model of large and crowded scenarios like an amusement park is difficult to get using usual tracking method. One simple approach to get this information is using the live queues times from the park [1][2][3]. This information together with the map of the park can provide mobility information between attractions. You will have to create different scrappers (web, android and/or iOS) to get the queue information from different parks for several days. With this data together with the topology of the park (extracted from the park map) you will be able to create visualizations, diseases spread simulations, etc.

References:

[1] http://www.ridetimes.co.uk/

[2] https://itunes.apple.com/gb/app/thorpe-park-official/id374862991?mt=8

[3] https://play.google.com/store/apps/details?id=portaventura.android

4. Graph Compression

Originator/Supervisor: Eiko Yoneki (with Weixiong Rao)

Keywords: Graph Compression

This project explores graph compression mechanisms as part of a project looking into high performance semi-external memory graph processing (see [1] to get an idea of semi-external memory approaches). The graph compression work will build on an in-house graph representation format that we have developed that allows succinct representation of graphs that show hierarchical structures. The project will explore ways to improve the representation yielding smaller graphs on disk that are less costly to traverse. A key element of the representation scheme is a recursive graph partitioning step that minimises the number of edges between partitions. This is a rich space for exploration of suitable algorithms. We are concerned primarily with experimentally evaluating IO costs on large graphs and measuring the compression performance. However a student with a theoretical background might consider designing algorithms with provable bounds on compression performance, which would be a big plus. If time allows you could also implement an efficient transformation tool based on the developed graph compression algorithm using parallel processing tools (e.g. map/reduce).

References:

[1] R. Pearce, M. Gokhale and N. Amato: Multithreaded Asynchronous Graph Traversal for In-Memory and Semi-External Memory, ACM/IEEE High Performance Computing, Networking, Storage and Analysis, 2010. http://dl.acm.org/citation.cfm?id=1884675

5. Develop Scale-Out SSD based Graph Traversal Platform

Originator/Supervisor: Eiko Yoneki (with Karthik Nilakant)

Keywords: Graph processing, I/O optimisation, Graph structure

This project contributes to our ongoing work on high performance semi-external memory graph processing (see [1]). We have developed an in-house prefetching algorithm for mining large graphs stored on Solid State Drives. A solid state drive provides the ability to service many random reads in parallel, provided that the requesting application is able to deliver requests at a rate that keeps the drive sufficiently busy. Thus pre-fetching multiple pages based on the current request in an “intelligent” manner can greatly improve performance. Our current pre-fetcher implementations are aligned with the graph traversal approach, (that is the “graph iterator” being employed).

Furthermore, the prefetcher implementation could be extended to utilise multiple storage devices in parallel. The aim in this setting would be to out-perform standard storage parallelisation techniques such as hardware or software RAID. The performance of this approach would also depend on the method of dividing graph data between available storage devices. Your task is developing scale-out version of our current work. You could also extend it in distributed graph traversals to study how a large graph (e.g. over 2 billion vertices) could get benefits from scale-out version, even though the current version can handle large graphs as a single node. You could also explore generating a large graph for testing by combining various entities of social network data (e.g. what happens when every "Like", "Comment" and "Photo" on Facebook are treated as vertices inside a graph -- not just "People" and "Pages" - such graphs would be huge).

References:

[1] E. Yoneki and A. Roy: Scale-up Graph Processing: A Storage-centric View. ACM SIGMOD - GRADES, 2013.

[2] GraphLab: http://graphlab.org/

6. Building Graph Query Function using Functional Programming

Originator/Supervisor: Eiko Yoneki (with Weixiong Rao)

Keywords: Graph, Functional programming

Demand to store and search of online data with graph structure is emerging. Such data range from online social networks to web links and it requires efficient query processing in a scalable manner. In this project, you will build a graph query function (layer) to achieve efficient graph data processing. The graph query function builds on a lazy graph loaded from multiple sources and performs queries at the same speed or faster than a relational database. The function should be written in Clojure or another functional language to support practical lazy loading from data source to an in-memory graph database. Efficient representations for graph queries should be also investigated. The evaluation includes comparisons with existing Graph Database and the query capability comparing with SQL. You can start the project by our existing work in this area.

References:

[1] Microsoft Research, “Trinity project: Distributed graph database,” http://research.microsoft.com/en-us/projects/trinity/

[2] Neo Technology, “Java graph database,” http://neo4j.org/

7. Opinion and Behaviour Dynamics with Role Game

Originator/Supervisor: Eiko Yoneki (with Andre Ribeiro)

Keywords: Information Cascade, Opinion Dynamics, Twitter

The internet made communication (and content creation) dramatically faster, cheaper and easier. Some argue that this is an evolution of media technology we use to solve problems. This project investigates understanding the communicative behavioural dynamics, information flow and decision making constrained by the media and information people have available. Thus, the investigation covers behaviour modelling and analysis from an opinion dynamics perspective and can provide an answer to questions such as "do people change their opinion or share information based on something similar to page-ranking or some game-theoretical algorithm?" or "what factors have the largest influence?”

In this project you would write a role-playing game to understand the above problem. Millions of people worldwide play interactive online role-playing games, forming complex and rich networks among their virtual characters. Such games could project the real world and predict what might occur. For example, an unexpected outbreak of an infective communicable disease (unplanned by the game creators) occurred when a role-playing game was played in the virtual world. This outbreak holds surprising similarities to real world epidemics [3].

You can write role-playing game application over mobile devices and web and get insight over the communicative behaviour of the players under the different conditions (when they choose to communicate? with whom? For what reason?). The game could be "what's the most popular music?" or "what's the best mobile phone to buy?" or similar to any popular game in the market. Reading Vinayak Dalmia's masters thesis on 'Investigating Herding Behaviour in an Experimental Setup' (Judge Business School) will be a good starting point; and papers [1][2] and [4] will be good general background reading.

References:

[1] Boyd, Golder, and Lotan, Tweet, Tweet, Retweet: Conversational Aspects of Retweeting on Twitter, HICSS-43, 2010.

[2] Christo Wilson, Bryce Boe, Alessandra Sala, Krishna P. N. Puttaswamy and Ben Y. Zhao, User Interactions in Social Networks and their Implications, ACM EuroSys 2009.

[3] Balicer, R.:Modeling Infectious Diseases Dissemination Through Online Role-Playing Games. Epidemiology, 18(2), 2007.

[4] Sastry, Yoneki and Crowcroft " Buzztraq: Predicting geographical access patterns of social cascades using social networks". EuroSys workshop on Social Network Systems, 2009.

8. The impact of social influence on Information Diffusion - Understanding Social Influence by analysing EEG and Eye Tracking from Cognitive Perspectives

Supervisor: Eiko Yoneki (with Arman Idani)

Keywords: Information Diffusion, Behaviour Model, Perception, Cognitive Science

Social networks like Facebook and Twitter have brought new entertainment experiences. People or companies publish information in a social network through the network’s topology and by sharing them with their neighbours. The published information can either be quickly ignored or widely spread in the network. The network structure takes an important role for diffusion processes. However, the extent to which information is propagated highly depends on social influences that the members have on each other when each member makes a decision on further propagation of the content.

This project investigates the impact of social influences on information diffusion process from cognitive perspectives. First, you would analyse the data of Eye tracking and EEG, which we have collected our experiments and model decision making process from social influence including perception, and you could extend to analyse online social network activities for validation of the model. You could compare the different states of EEG activities with psychological models of decision making. You could also write your experimental application or online social networks (e.g. Facebook) to extract unconscious social behaviour in comparison to feedback information (e.g. how the other people rated, what is the average of rating) from the participant’s social clique.

The analysis requires an understanding of statistical modelling/machine learning as well as digital signal processing. Prior knowledge and experience in speech recognition is very useful.

References:

[1] Evan T.R. Rosenman: Retweets, but Not Just Retweets: Quantifying and Predicting Influence on Twitter. Thesis, Harvard University.

[2] Boyd, Golder, and Lotan, Tweet, Tweet, Retweet: Conversational Aspects of Retweeting on Twitter, HICSS-43, 2010.

[3] Balicer, R.:Modeling Infectious Diseases Dissemination Through Online Role-Playing Games. Epidemiology, 18(2), 2007.

[4] Roman Bednarik , Markku Tukiainen, An eye-tracking methodology for characterizing program comprehension processes, Proceedings of the 2006 symposium on Eye tracking research & applications, March 27-29, 2006, San Diego, California

[5] Hayes, T. R., Petrov, A. A., & Sederberg, P. B. (2011) A novel method for analyzing sequential eye movements reveals strategic inuence on Raven's Advanced Progressive Matrices. Journal of Vision, 11 (10:10), 1-11, http://www.journalofvision.org/11/10/10/.

[6] Janis, I. L., & Mann, L. (1977). Decision making: A psychological analysis of conflict, choice, and commitment. Free Press.

[7] Klein, G. A., Orasanu, J. E., Calderwood, R. E., & Zsambok, C. E. (1993). Decision making in action: Models and methods.

9. Trust Relationships and Content Filtering

Contact: Eiko Yoneki (with Arman Idani)

Keywords: Social Trust, Content Filtering

It is now challenging for most users to keep up with the increased amount of information in their news feed in social networks such as Facebook or Twitter where people have a lot of first-hop connections. In this project you will investigate the relevance of social trust, defined as the confidence users have in their first-hop connections in different subjects, in their reaction towards the contents that they see in their feed. The aim is to answer questions such as (i) How do users choose which content to care about, and if it is relevant who is sharing the content with them? (ii) Can the influence of users on each other's behaviour be inferred? (iii) What makes a user trustworthy?

In this project you need prior knowledge of machine learning and statistical modelling (ideally experience with inference and learning of graphical models or structural equation modelling), a programming language for data collection (Java or Python) and knowledge of statistics (e.g. R, SPSS, MATLAB).

References:

[1] Yarden Katz and Jennifer Golbeck. 2006. Social network-based trust in prioritized default logic. In proceedings of the 21st national conference on Artificial intelligence - Volume 2 (AAAI'06), Anthony Cohn (Ed.), Vol. 2. AAAI Press 1345-1350.

[2] Ugur Kuter and Jennifer Golbeck. 2010. Using probabilistic confidence models for trust inference in Web-based social networks. ACM Trans. Internet Technol. 10, 2, Article 8 (June 2010), 23 pages.

Contact Email

Please email to eiko.yoneki@cl.cam.ac.uk for any question.

Computer Laboratory

MPhil, Part III, and Part II Project Suggestions (2013-2014)

Contact Email