GraphCam: Fast, Flexible, and Programmable Graph Processing

News!
OverviewThis project is a joint project between the Computer Laboratory and EPFL in Switzerland. There is a shift that massive data forms networks and
graph processing has become popular for various applications on the increasingly
large power distribution networks, internet backbone, social networks, ground
transportation, and protein interaction networks and so forth. Graph algorithms
are becoming increasingly important for solving many problems in scientific
computing, data mining and other domains. As these problems grow in scale,
parallel computing resources are required to meet their computational and memory
requirements. Unfortunately, the algorithms, software, and hardware that have
worked well for developing mainstream parallel scientific applications are not
necessarily effective for largescale graph problems. This project GraphCam tackles to understand the
difficulties and bottlenecks in large scale graph processing ranging from graph
algorithm design to graphparallel computing. The project aims at demonstrating
how parallel graph algorithms in different scale such as devices level to
cluster computing and build a framework to integrate various approached under
GraphCam Domain Specific Language (DSL) designed for large scale graph
processing that can be implemented on a variety of architectures. In particular,
the following properties of graph problems present significant challenges for
efficient parallelism.
Current Focus: Storage Centric ViewThe determinant of performance in scaleup graph processing on a single system is the speed at which the graph can be fetched from storage: either from disk into memory or from memory into CPUcache. Algorithms that follow edges perform random accesses to the storage medium for the graph and this can often be the determinant of performance, regardless of the algorithmic complexity or runtime efficiency of the actual algorithm in use. A storagecentric viewpoint would suggest that the solution to this problem lies in recognizing that graphs represent a unique workload and therefore should be treated as such by adopting novel ways to access graph structured data. We approach this problem from two different aspects. One approach is specific to graphs stored on SSDs and accelerates random access using a novel prefetcher called RASP. The second approach takes a fresh look at how graphs are accessed and suggests that trading off the low cost of random access for the approach of sequentially streaming a large set of (potentially unrelated) edges can be a winning proposition under certain circumstances: leading to a system for graphs stored on any medium (mainmemory, SSD or magnetic disk) called Xstream. RASP and Xstream therefore take  diametrically opposite  storage centric viewpoints of the graph processing problem. See [1] for comparison of the above two approaches. Our plan is development of an online algorithm that selects between the two approaches, possibly providing the best of both worlds. Source Code Repository
Related Projects
Related Publications[1] E. Yoneki and A. Roy Scaleup Graph Processing: A Storagecentric View. ACM SIGMOD  GRADES, New York, USA, June, 2013 (PDF). [2] A. Roy, K. Nilakant, V. Dalibard, and Eiko Yoneki Mitigating I/O latency in SSDbased Graph Traversal. Technical Report, University of Cambridge, 2012 (UCAMCLTR823). [3] E. Yoneki and Amitabha Roy A Unified Graph Query Layer for Multiple Databases. Technical Report, University of Cambridge, 2012 (UCAMCLTR820). [4] A. Roy, I. Mihailovic, and W. Zwaenepoel XStream: Edgecentric Graph Processing using Streaming Partitions, SOSP 2013. [5] K. Nilakant and E. Yoneki: On the Efficacy of APUs for Heterogeneous Graph Computation. EuroSys  SMFA, Amsterdam, April, 2014 (PDF). [6] K. Nilakant, V. Dalibard, A. Roy, and E. Yoneki: PrefEdge: SSD Prefetcher for LargeScale Graph Traversal. ACM Internataional Systems and Storage Conference (SYSTOR), June, 2014 (PDF). [7] I. Giechaskiel: Distributed Massive Graph Triangulation, MPhil ACS Thesis, 2014 (PDF). POSTERS: [P1] K. Nilakant and E. Yoneki Active Data Management for Graph Processing. Poster EuroSys 2013. [P2] V. Dalibard and E. Yoneki
Optimizing Graph Computations: Trading Communications for
Computations.
Poster EuroSys 2013. [P3] E. Yoneki, K. Nilakant, V. Dalibard and A. Roy RASP: LargeScale Graph Traversal with SSD Prefetching. Poster MSR Big Data Analytics, 2013.
[P4] E. Yoneki, K. Nilakant, V. Dalibard and
A. Roy
SAKYOMI: SSD Prefetcher for LargeScale Graph Traversal.
Poster, SOSP, 2013. Members
Contact EmailPlease email to eiko.yoneki@cl.cam.ac.uk. 