MPhil, Part III, and Part II Project Suggestions (2017-2018)
|
Please contact Eiko Yoneki (email: eiko.yoneki@cl.cam.ac.uk) if you are interested in any project below. 1. Dynamic Task Scheduling on Heterogeneous CPU/GPU Environment using ML for Parallel Processing Contact:
Eiko Yoneki In this project, various aspects of parallel
processing will be explored using a new generation of CPU/GPU integrated board.
We use NVIDIA’s Jetson X2. This new GPUs makes it possible to cluster the GPU
nodes for different scale of parallel processing. Various applications running
over X2 will be investigated including graph processing and neural network
training using the Tensorflow framework. Graph processing can take advantage of
processor heterogeneity to adapt to structural data patterns. The overall aim of
graph processing can be seen as scheduling irregular tasks to optimise
data-parallel heterogeneous graph processing, by analysing the graph at runtime
and dispatching graph elements to appropriate computation. Various task
scheduling strategies over NN applications could be explored such as mixture of
model parallelism and data parallelism. In comparison to the above approach, several types of
GPU machines in cluster computing environment will be explored. Furthermore, our
recent work, BOAT [1], could be used for optimisation of complex parameter
space. [1] V. Dalibard, M. Schaarschmidt, and E. Yoneki:
BOAT: Building Auto-Tuners with Structured Bayesian Optimization, WWW, 2017.
https://github.com/VDalibard/BOAT. 2. Distributed Stochastic Gradient Descent on Naiad Contact:
Eiko Yoneki Naiad is a distributed system framework that allows
for automatic parallelisation of programs in a distributed environment [1]. In
this project, you will use the RUST implementation of NAIAD and build a
distributed implementation of Stochastic Gradient Descent (SGD) [2], which is a
common algorithm used in Machine Learning. The project will provide efficient
parallel processing for SGD in optimised fashion, where different iterations of
the algorithm for different data points can be run in parallel. An efficient
distributed implementation using NAIAD can take advantage of the incremental and
iterative computation. You would also write an application to evaluate the
project. [1] D. Murray, F. McSherry, R. Isaacs, M. Isard, P.
Barham, and M. Abadi. Naiad: a timely dataflow system. SOSP, 2013. [2] Kevin P Murphy. Machine learning: a probabilistic
perspective. MIT press, 2012.
Contact:
Eiko Yoneki (with Michael Schaarschmidt) Stream processing engines such as Apache Storm or
Heron enable real-time distributed event processing. Recently, the concept of
self-regulating stream processing has been introduced to help stream processing
engines adapt to new loads and deal with failures [1]. In prior open source
work, this has been achieved via a set of manually defined rules connecting
certain events in the cluster (e.g. higher load) with certain actions (add nodes
to cluster). The goal of this project is to investigate whether this system can
be improved by learning an end-to-end control mechanism via reinforcement
learning and foregoing manual adjustment rules. A library of robust reinforcement learning algorithms
is available as a starting point [2]. The key research question in this project
is to understand how real time performance metrics in a distributed system can
be leveraged in a reinforcement learning setting. This project requires strong
programming ability in Python/Java, and an interest to learn about emerging
developments in machine learning/reinforcement learning. [1]
https://www.microsoft.com/en-us/research/wp-content/uploads/2017/06/p1218-floratou.pdf [2]
https://github.com/reinforceio/tensorforce 4. Optimisation of JVM Garbage Collection using ML Contact:
Eiko Yoneki This project extends the Cassandra case study [1,2]
in incremental ways:
1) Cassandra database was
only executed in one setting. We could create an auto-tuner that is able to deal
with a wider variety of settings. This includes:
-
Changing the heap size of the JVM
-
Changing the underlying hardware of Cassandra
-
Changing the underlying hardware of YCSB (Yahoo! Cloud Serving Benchmark)
2) Applying it to other JVM
applications (such as other databases, anything where we have a rigorous
benchmark for throughput and latency)
3) Looking at some of the
other JVM flags (we only tuned three) as well as the other JVM garbage collector For evaluation following two points are evaluated.
1) No GC option as baseline
2) Various GC
implementations (i.e. some are good on certain workloads) [1]
http://www.cl.cam.ac.uk/~ey204/pubs/2017_WWW.pdf [2]
http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-900.html 5. Optimising Newral Network Model over Power Consumption Contact:
Eiko Yoneki In this project, you would tune neural network model
itself. Pick one typical Deep Neural Network application (e.g. image
classification), and play with various hyper-parameters, where hyper-parameters
might take substantially different values for the best accuracy or performance.
You would explore various parameters based
experiments and build up a probabilistic model expressing the parameter
relationship applying the BOAT technique (see [1] and [2]). [1]
http://www.cl.cam.ac.uk/~ey204/pubs/2017_WWW.pdf [2]
http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-900.html 6. Building Graph Query Function using Functional Programming Contact:
Eiko Yoneki Demand to store and search of online data with graph
structure is emerging. Such data range from online social networks to web links
which requires efficient query processing in a scalable manner. In this project,
you will build a graph query function (layer) to achieve efficient graph data
processing. The graph query function builds on a lazy graph loaded from multiple
sources and performs queries at the same speed or faster than a relational
database. The function should be written in Clojure or another functional
language to support practical lazy loading from data source to an in-memory
graph database. Efficient representations for graph queries should be also
investigated. The evaluation includes comparisons with existing Graph Database
and the query capability comparing with SQL. You could start the project by our
existing work in this area. References: [1] Microsoft Research, “Trinity project: Distributed
graph database,”
http://research.microsoft.com/en-us/projects/trinity/ [2] Neo Technology, “Java graph database,”
http://neo4j.org/ 7. Unikernel over Raspberry Pi: Building Data Processing Platform Contact:
Eiko Yoneki This project builds a unikernel-based distributed
data processing platform over Raspberry Pi clusters, where mapreduce type of
operation or simple neural network training and inference can be run. A
unikernel is a form of monolithic operating system, but instead of being general
purpose a unikernel is built with a single task in mind. By removing the
unneeded functionality of an operating system, unikernels bring about a variety
of advantages such as extremely small compared to conventional operating
systems. Such small footprint makes movement of code to the distributed nodes
much faster and easier to deal with limited availability of memory in Raspberry
pi. We pursue a project, RaDiCS [2], where unikernel
currently runs over Raspberry pi using QEMU emulator and it can be a starting
point of the project. One of the challenges in this project is optimising the
performance of unikernel in Raspberry pi. Potential applications running over
the built distributed platform will be a type of neural network applications,
where each feature can be trained separately without constant synchronisation
over the layer of NN model, and parallelism using 100s of Paspberry pis can be
deployed. References: [1] Jitsu: Just-In-Time Summoning of Unikernels,
NSDI, 2015. [2] RaDiCS
https://gitlab.com/RaspiUnikernel/RaDiCS. 8.
Approximate
Algorithms Determining Local Clustering Coefficients Anonymously Originator/Supervisor:
Eiko Yoneki (with Amitabha Roy
in Intel Labs) Keywords: Sampling, Approximation, Privacy, Cluster
Coefficient Anonymous social networks are a new phenomenon in
an increasingly privacy conscious world. A natural question to ask in this
setting is whether we can continue to apply known principles of network science
in such settings where neighbourhood information is concealed both between nodes
and external observers. This project is to work on approximate algorithms that
determines clustering coefficient in such settings. Clustering coefficients
provide a way to relatively order nodes by importance in a network, determine
densely connected neighbourhoods and distinguishing social graphs from web
graphs. Algorithms to measure clustering coefficients have hitherto required
global knowledge of graph links. This requires individual nodes in the graph to
give up the identity of their neighbours. This project investigates an algorithm
for estimating the clustering coefficient of a vertex by exchanging only
anonymised set summaries of neighbours, from which it is difficult to reverse
engineer individual links. The bulk of the project will consist of working on
and improving sampling techniques to arrive at accurate local clustering
coefficients without exchanging explicit neighbour lists in social networks . [1] P. Flajolet, Eric Fusy, O. Gandouet, and et al.
Hyperloglog: The analysis of a near-optimal cardinality estimation algorithm. In
Proceedings of the International Conference on Analysis of Algorithms, 2007.
9.
RasPi-Net: Building Stream Data Processing Platform
over RasPiNET Originator/Supervisor:
Eiko Yoneki Keywords: Raspberry Pi, Delay Tolerant Networks,
Satellite Communication, Stream Processing We have built a decentralised Raspberry Pi network
(RasPiNET [1]), which can be deployed in wild and remote regions as a standalone
network. The gateway Raspberry Pi nodes are integrated with satellite
communication devices, where the light version of Delay Tolerant Network (DTN)
bundle protocol is embedded. RasPiNET could consist of 10-100 nodes. As an
example, a remote sensing application could be written either in RasPi or Smart
phones that can connect to RasPi. Collected data could be processed within
RasPiNET to reduce data size that streams over the satellite communication to
the base location. The crowd sourcing application can run on top of RasPiNET,
too. The goal of this project is building a stream processing platform in both
directions: from data collection from RasPiNET nodes to the data processing
nodes possibly via a satellite gateway and from bulk of data delivery to the
satellite gateway node to disseminate necessary information to RasPiNET nodes. A
good filtering function and RasPiNET in-network data aggregation could be
developed. References: [1] E. Yoneki:
RasPiNET: Decentralised Communication and
Sensing Platform with Satellite Connectivity. ACM CHANTS, 2014. [2] Delay Tolerant Network Bundle Protocol:
http://tools.ietf.org/html/rfc6255 [3] RockBlock Technology:http://rockblock.rock7mobile.com/
10.
Clustering Entities across
Multiple Documents in Massive Scale Originator/Supervisor:
Eiko Yoneki Keywords: Clustering, Graph Partitioning, Random
Walk, Distributed Algorithms Many large-scale distributed problems including the
optimal storage of large sets of graph structured data over several hosts
- a key
problem in today’s Cloud infrastructure. However, in very large-scale
distributed scenarios, state-of-the-art algorithms are not directly applicable,
because frequent global operations over the entire graph are difficult. In [1],
balanced graph partitioning is achieved by a fully distributed an algorithm,
called Ja-be-Ja, that uses local search and simulated annealing techniques for
graph partitioning annealing techniques for graph partitioning. The algorithm is
massively parallel: each node is processed independently, and only the direct
neighbours of the nodes and a small subset of random nodes in the graph need to
be known locally. Strict synchronisation is not required. These features allow
Ja-be-Ja to be easily adapted to any distributed graph processing system. This
project starts understanding Ja-be-Ja as a starting point, and investigates
further performance improvement. A case study: a graph-based approach to
coreference resolution, where a graph representation of the documents and their
context is used and applying a community detection algorithm based in [1] can
speed up the task of coreference resolution by a very large degree. [1] Fatemeh Rahimian, Amir H. Payberah, Sarunas
Girdzijauskas, Mark Jelasity and Seif Haridi: JabeJa: A Distributed
Algorithm for Balanced Graph Partitioning, IEEE International Conference
on Self-Adaptive and Self-Organizing Systems (SASO), 2013. [2] Fatemeh Rahimian, Amir H. Payberah, Sarunas
Girdzijauskas, and Seif Haridi: Distributed Vertex-Cut Partitioning,
DAIS, 2014. Contact EmailPlease email to eiko.yoneki@cl.cam.ac.uk for any question. |