MPhil, Part III, and Part II Project Suggestions (2016-2017)
|
Please contact Eiko Yoneki (email: eiko.yoneki@cl.cam.ac.uk) if you are interested in any project below. 1. Graph Processing Optimisation using Structured Bayesian Optimisation Originator/Supervisor:
Eiko Yoneki Keywords: Machine Learning, Graph Processing, Optimisation Graph processing can take advantage of processor
heterogeneity (i.e. CPU/GPU…) to adapt to structural data patterns. The overall
aim of graph processing can be seen as scheduling irregular tasks to optimise
data-parallel heterogeneous graph processing, and in this project, optimal
dispatching graph elements to appropriate computation units will be explored
using our recent work on the extension of Bayesian Optimisation. The
optimisation platform will be provided, and the task in this project is building
a graph specific auto-tuner on top of it, which requires a good knowledge of
graph algorithms and understanding processing patterns and computer system’s
task scheduling. [1] K. Nilakant and E. Yoneki: On the Efficacy of
APUs for Heterogeneous Graph Computation. EuroSys - SMFA, 2014 (http://www.cl.cam.ac.uk/~ey204/pubs/2014_SYSTOR.pdf). [2] Dalibard, V., Yoneki, E.: Tuning Computer Systems
with Structured Bayesian Optimization (http://www.cl.cam.ac.uk/~ey204/pubs/Abstract/2016_EUROSYS_Abstract.pdf).
2. Distributed Stochastic Gradient Descent on Naiad Originator/Supervisor:
Eiko Yoneki Keywords: Data flow programming, Machine Learning,
Parallel Computing in Distributed Systems Naiad is a distributed system framework that allows
for automatic parallelisation of programs in a distributed environment [1]. In
this project, you will use the RUST implementation of NAIAD and build a
distributed implementation of Stochastic Gradient Descent (SGD) [2], which is
common algorithms used in Machine Learning. The project will provide efficient
parallel processing for SGD in optimised fashion, where different iterations of
the algorithm for different data points can be run in parallel. An efficient
distributed implementation using NAIAD can take advantage of the incremental and
iterative computation. You would also write an application to evaluate the
project. [1] D. Murray, F. McSherry, R. Isaacs, M. Isard, P.
Barham, and M. Abadi. Naiad: a timely dataflow system. SOSP, 2013. [2] Kevin P Murphy. Machine learning: a probabilistic
perspective. MIT press, 2012.
3.
Building Graph Query Function using Functional
Programming Originator/Supervisor: Eiko Yoneki Keywords: Graph, Functional programming Demand to store and search of online data with
graph structure is emerging. Such data range from online social networks to web
links and it requires efficient query processing in a scalable manner. In this
project, you will build a graph query function (layer) to achieve efficient
graph data processing. The graph query function builds on a lazy graph loaded
from multiple sources and performs queries at the same speed or faster than a
relational database. The function should be written in Clojure or another
functional language to support practical lazy loading from data source to an
in-memory graph database. Efficient representations for graph queries should be
also investigated. The evaluation includes comparisons with existing Graph
Database and the query capability comparing with SQL. You can start the project
by our existing work in this area. References: [1] Microsoft Research, Trinity project:
Distributed graph database,
http://research.microsoft.com/en-us/projects/trinity/ [2] Neo Technology, Java graph database,
http://neo4j.org/ 4.
Dynamic Task
Scheduling on Heterogeneous CPU/GPU Environment using Machine Learning
Technique for Parallel Processing Originator/Supervisor:
Eiko Yoneki (with Anton
Lokhmotov@dividiti.com) Keywords: GPU Clusters, Heterogeneous
many/multi-core, Parallel Computing, OpenCL, Task Scheduling In this project, various aspects of parallel
processing will be explored using a new generation of CPU/GPU integrated board,
where more than one GPU clusters are placed on a chip. We use ARM based
Mali-T628 MP6, in Exynos 5422 [1] [2], which has two clusters, of four (MP4) and
two cores (MP2). Using OpenCL, tasks can be dispatched to GPU and CPU code in
parallel. This new GPUs makes it possible to cluster the GPU nodes for different
scale of parallel processing. We use a simulator on top of the
hardware to experiment various task scheduling strategies explored by the
machine learning methodologies for prediction of workload, vector instructions,
and mixture of model parallelism and data parallelism. Application running on top could be image analysis or
graph processing. Graph processing can take advantage of processor
heterogeneity to adapt to structural data patterns. Efficient
scheduling underlies the vision of a heterogeneous runtime platform for graph
computation. [1]
http://www.anandtech.com/show/8234/arms-mali-midgard-architecture-explored [2]
www.samsung.com/global/business/semiconductor/product/application/detail?productId=7978&iaId=2341
5.
Approximate
Algorithms Determining Local Clustering Coefficients Anonymously Originator/Supervisor:
Eiko Yoneki (with Amitabha Roy
in Intel Labs) Keywords: Sampling, Approximation, Privacy, Cluster
Coefficient Anonymous social networks are a new phenomenon in
an increasingly privacy conscious world. A natural question to ask in this
setting is whether we can continue to apply known principles of network science
in such settings where neighbourhood information is concealed both between nodes
and external observers. This project is to work on approximate algorithms that
determines clustering coefficient in such settings. Clustering coefficients
provide a way to relatively order nodes by importance in a network, determine
densely connected neighbourhoods and distinguishing social graphs from web
graphs. Algorithms to measure clustering coefficients have hitherto required
global knowledge of graph links. This requires individual nodes in the graph to
give up the identity of their neighbours. This project investigates an algorithm
for estimating the clustering coefficient of a vertex by exchanging only
anonymised set summaries of neighbours, from which it is difficult to reverse
engineer individual links. The bulk of the project will consist of working on
and improving sampling techniques to arrive at accurate local clustering
coefficients without exchanging explicit neighbour lists in social networks . [1] P. Flajolet, Eric Fusy, O. Gandouet, and et al.
Hyperloglog: The analysis of a near-optimal cardinality estimation algorithm. In
Proceedings of the International Conference on Analysis of Algorithms, 2007.
6.
RasPi-Net: Building Stream Data Processing Platform
over RasPiNET Originator/Supervisor:
Eiko Yoneki Keywords: Raspberry Pi, Delay Tolerant Networks,
Satellite Communication, Stream Processing We have built a decentralised Raspberry Pi network
(RasPiNET [1]), which can be deployed in wild and remote regions as a standalone
network. The gateway Raspberry Pi nodes are integrated with satellite
communication devices, where the light version of Delay Tolerant Network (DTN)
bundle protocol is embedded. RasPiNET could consist of 10-100 nodes. As an
example, a remote sensing application could be written either in RasPi or Smart
phones that can connect to RasPi. Collected data could be processed within
RasPiNET to reduce data size that streams over the satellite communication to
the base location. The crowd sourcing application can run on top of RasPiNET,
too. The goal of this project is building a stream processing platform in both
directions: from data collection from RasPiNET nodes to the data processing
nodes possibly via a satellite gateway and from bulk of data delivery to the
satellite gateway node to disseminate necessary information to RasPiNET nodes. A
good filtering function and RasPiNET in-network data aggregation could be
developed. References: [1] E. Yoneki:
RasPiNET: Decentralised Communication and
Sensing Platform with Satellite Connectivity. ACM CHANTS, 2014. [2] Delay Tolerant Network Bundle Protocol:
http://tools.ietf.org/html/rfc6255 [3] RockBlock Technology:http://rockblock.rock7mobile.com/
7.
Clustering Entities across
Multiple Documents in Massive Scale Originator/Supervisor:
Eiko Yoneki Keywords: Clustering, Graph Partitioning, Random
Walk, Distributed Algorithms Many large-scale distributed problems including the
optimal storage of large sets of graph structured data over several hosts
- a key
problem in today’s Cloud infrastructure. However, in very large-scale
distributed scenarios, state-of-the-art algorithms are not directly applicable,
because frequent global operations over the entire graph are difficult. In [1],
balanced graph partitioning is achieved by a fully distributed an algorithm,
called Ja-be-Ja, that uses local search and simulated annealing techniques for
graph partitioning annealing techniques for graph partitioning. The algorithm is
massively parallel: each node is processed independently, and only the direct
neighbours of the nodes and a small subset of random nodes in the graph need to
be known locally. Strict synchronisation is not required. These features allow
Ja-be-Ja to be easily adapted to any distributed graph processing system. This
project starts understanding Ja-be-Ja as a starting point, and investigates
further performance improvement. A case study: a graph-based approach to
coreference resolution, where a graph representation of the documents and their
context is used and applying a community detection algorithm based in [1] can
speed up the task of coreference resolution by a very large degree. [1] Fatemeh Rahimian, Amir H. Payberah, Sarunas
Girdzijauskas, Mark Jelasity and Seif Haridi: JabeJa: A Distributed
Algorithm for Balanced Graph Partitioning, IEEE International Conference
on Self-Adaptive and Self-Organizing Systems (SASO), 2013. [2] Fatemeh Rahimian, Amir H. Payberah, Sarunas
Girdzijauskas, and Seif Haridi: Distributed Vertex-Cut Partitioning,
DAIS, 2014.
8.
Graph Compression in the Semi-External Memory Environment Originator/Supervisor: Eiko Yoneki Keywords: Graph Compression This project explores graph compression mechanisms as
part of a project looking into high performance semi-external memory graph
processing (see [1] to get an idea of semi-external memory approaches). The
graph compression work will build on an in-house graph representation format
that we have developed that allows succinct representation of graphs that show
hierarchical structures. The project will explore ways to improve the
representation yielding smaller graphs on disk that are less costly to traverse.
A key element of the representation scheme is a recursive graph partitioning
step that minimises the number of edges between partitions. This is a rich space
for exploration of suitable algorithms. We are concerned primarily with
experimentally evaluating I/O costs on large graphs and measuring the
compression performance. However a student with a theoretical background might
consider designing algorithms with provable bounds on compression performance,
which would be a big plus. If time allows you could also implement an efficient
transformation tool based on the developed graph compression algorithm using
parallel processing tools (e.g. map/reduce). References: [1] R. Pearce, M. Gokhale and N. Amato: Multithreaded
Asynchronous Graph Traversal for In-Memory and Semi-External Memory, ACM/IEEE
High Performance Computing, Networking, Storage and Analysis, 2010.
http://dl.acm.org/citation.cfm?id=1884675
Contact EmailPlease email to eiko.yoneki@cl.cam.ac.uk for any question. |