Computer Laboratory

Technical reports

PDTL: Parallel and distributed triangle listing for massive graphs

Ilias Giechaskiel, George Panagopoulos, Eiko Yoneki

April 2015, 14 pages

Abstract

This paper presents the first distributed triangle listing algorithm with provable CPU, I/O, Memory, and Network bounds. Finding all triangles (3-cliques) in a graph has numerous applications for density and connectivity metrics. The majority of existing algorithms for massive graphs are sequential processing and distributed versions of algorithms do not guarantee their CPU, I/O, Memory or Network requirements. Our Parallel and Distributed Triangle Listing (PDTL) framework focuses on efficient external-memory access in distributed environments instead of fitting subgraphs into memory. It works by performing efficient orientation and load-balancing steps, and replicating graphs across machines by using an extended version of Hu et al.'s Massive Graph Triangulation algorithm. As a result, PDTL suits a variety of computational environments, from single-core machines to high-end clusters. PDTL computes the exact triangle count on graphs of over 6 billion edges and 1 billion vertices (e.g. Yahoo graphs), outperforming and using fewer resources than the state-of-the-art systems PowerGraph, OPT, and PATRIC by 2 times to 4 times. Our approach highlights the importance of I/O considerations in a distributed environment, which has received less attention in the graph processing literature.

Full text

PDF (0.5 MB)

BibTeX record

@TechReport{UCAM-CL-TR-866,
  author =	 {Giechaskiel, Ilias and Panagopoulos, George and Yoneki,
          	  Eiko},
  title = 	 {{PDTL: Parallel and distributed triangle listing for
         	   massive graphs}},
  year = 	 2015,
  month = 	 apr,
  url = 	 {http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-866.pdf},
  institution =  {University of Cambridge, Computer Laboratory},
  number = 	 {UCAM-CL-TR-866}
}