GraphCam: Fast, Flexible, and Programmable Graph Processing
This project is a joint project between the Computer Laboratory and EPFL in Switzerland.
There is a shift that massive data forms networks and graph processing has become popular for various applications on the increasingly large power distribution networks, internet backbone, social networks, ground transportation, and protein interaction networks and so forth. Graph algorithms are becoming increasingly important for solving many problems in scientific computing, data mining and other domains. As these problems grow in scale, parallel computing resources are required to meet their computational and memory requirements. Unfortunately, the algorithms, software, and hardware that have worked well for developing mainstream parallel scientific applications are not necessarily effective for large-scale graph problems.
This project GraphCam tackles to understand the difficulties and bottlenecks in large scale graph processing ranging from graph algorithm design to graph-parallel computing. The project aims at demonstrating how parallel graph algorithms in different scale such as devices level to cluster computing and build a framework to integrate various approached under GraphCam Domain Specific Language (DSL) designed for large scale graph processing that can be implemented on a variety of architectures. In particular, the following properties of graph problems present significant challenges for efficient parallelism.
Current Focus: Storage Centric View
The determinant of performance in scale-up graph processing on a single system is the speed at which the graph can be fetched from storage: either from disk into memory or from memory into CPU-cache. Algorithms that follow edges perform random accesses to the storage medium for the graph and this can often be the determinant of performance, regardless of the algorithmic complexity or runtime efficiency of the actual algorithm in use. A storage-centric viewpoint would suggest that the solution to this problem lies in recognizing that graphs represent a unique workload and therefore should be treated as such by adopting novel ways to access graph structured data. We approach this problem from two different aspects.
One approach is specific to graphs stored on SSDs and accelerates random access using a novel prefetcher called RASP. The second approach takes a fresh look at how graphs are accessed and suggests that trading off the low cost of random access for the approach of sequentially streaming a large set of (potentially unrelated) edges can be a winning proposition under certain circumstances: leading to a system for graphs stored on any medium (main-memory, SSD or magnetic disk) called Xstream. RASP and X-stream therefore take - diametrically opposite - storage centric viewpoints of the graph processing problem.
See  for comparison of the above two approaches. Our plan is development of an online algorithm that selects between the two approaches, possibly providing the best of both worlds.
Source Code Repository
 E. Yoneki and A. Roy Scale-up Graph Processing: A Storage-centric View. ACM SIGMOD - GRADES, New York, USA, June, 2013 (PDF).
 A. Roy, K. Nilakant, V. Dalibard, and Eiko Yoneki Mitigating I/O latency in SSD-based Graph Traversal. Technical Report, University of Cambridge, 2012 (UCAM-CL-TR-823).
 E. Yoneki and Amitabha Roy A Unified Graph Query Layer for Multiple Databases. Technical Report, University of Cambridge, 2012 (UCAM-CL-TR-820).
 A. Roy, I. Mihailovic, and W. Zwaenepoel X-Stream: Edge-centric Graph Processing using Streaming Partitions, SOSP 2013.
 K. Nilakant and E. Yoneki: On the Efficacy of APUs for Heterogeneous Graph Computation. EuroSys - SMFA, Amsterdam, April, 2014 (PDF). K. Nilakant, V. Dalibard, A. Roy, and E. Yoneki: PrefEdge: SSD Prefetcher for Large-Scale Graph Traversal. ACM Internataional Systems and Storage Conference (SYSTOR), June, 2014 (PDF).
 I. Giechaskiel: Distributed Massive Graph Triangulation, MPhil ACS Thesis, 2014 (PDF).
[P1] K. Nilakant and E. Yoneki Active Data Management for Graph Processing. Poster EuroSys 2013.
[P2] V. Dalibard and E. Yoneki
Optimizing Graph Computations: Trading Communications for
Poster EuroSys 2013.
[P3] E. Yoneki, K. Nilakant, V. Dalibard and A. Roy RASP: Large-Scale Graph Traversal with SSD Prefetching. Poster MSR Big Data Analytics, 2013.
[P4] E. Yoneki, K. Nilakant, V. Dalibard and A. Roy SAKYOMI: SSD Prefetcher for Large-Scale Graph Traversal. Poster, SOSP, 2013.
Please email to firstname.lastname@example.org.