Department of Computer Science and Technology

Technical reports

PDTL: Parallel and distributed triangle listing for massive graphs

Ilias Giechaskiel, George Panagopoulos, Eiko Yoneki

April 2015, 14 pages

DOI: 10.48456/tr-866

Abstract

This paper presents the first distributed triangle listing algorithm with provable CPU, I/O, Memory, and Network bounds. Finding all triangles (3-cliques) in a graph has numerous applications for density and connectivity metrics. The majority of existing algorithms for massive graphs are sequential processing and distributed versions of algorithms do not guarantee their CPU, I/O, Memory or Network requirements. Our Parallel and Distributed Triangle Listing (PDTL) framework focuses on efficient external-memory access in distributed environments instead of fitting subgraphs into memory. It works by performing efficient orientation and load-balancing steps, and replicating graphs across machines by using an extended version of Hu et al.’s Massive Graph Triangulation algorithm. As a result, PDTL suits a variety of computational environments, from single-core machines to high-end clusters. PDTL computes the exact triangle count on graphs of over 6 billion edges and 1 billion vertices (e.g. Yahoo graphs), outperforming and using fewer resources than the state-of-the-art systems PowerGraph, OPT, and PATRIC by 2 times to 4 times. Our approach highlights the importance of I/O considerations in a distributed environment, which has received less attention in the graph processing literature.

Full text

PDF (0.5 MB)

BibTeX record

@TechReport{UCAM-CL-TR-866,
  author =	 {Giechaskiel, Ilias and Panagopoulos, George and Yoneki,
          	  Eiko},
  title = 	 {{PDTL: Parallel and distributed triangle listing for
         	   massive graphs}},
  year = 	 2015,
  month = 	 apr,
  url = 	 {https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-866.pdf},
  institution =  {University of Cambridge, Computer Laboratory},
  doi = 	 {10.48456/tr-866},
  number = 	 {UCAM-CL-TR-866}
}