A UNIFIED SPECTRAL SPARSIFICATION FRAMEWORK FOR DIRECTED GRAPHS

Abstract

Recent spectral graph sparsification research allows constructing nearly-linear-sized subgraphs that preserve the spectral (structural) properties of the original graph, such as the first few eigenvalues and eigenvectors of the graph Laplacian, leading to the development of a variety of nearly-linear time numerical and graph algorithms. However, there is no unified approach that allows for truly scalable spectral sparsification of both directed and undirected graphs. For the first time, we prove the existence of linear-sized spectral sparsifiers for general directed graphs, and introduce a practically efficient yet unified spectral graph sparsification approach that allows sparsifying real-world, large-scale directed and undirected graphs with guaranteed preservation of the original graph spectra. By exploiting a highly scalable (nearly-linear complexity) spectral matrix perturbation analysis framework for constructing nearly-linear-sized (directed) subgraphs, the approach preserves the key eigenvalues and eigenvectors of the original (directed) graph Laplacians. The proposed method has been validated on a variety of directed graphs obtained from public-domain sparse matrix collections, showing promising results for solving directed graph Laplacians, spectral embedding and partitioning of general directed graphs, and approximately computing (personalized) PageRank vectors.

1. INTRODUCTION

Over the past decade, mathematics and theoretical computer science (TCS) researchers have extensively studied problems of simplifying large graphs using spectral graph theory (Batson et al., 2012; Spielman & Teng, 2011; Kolev & Mehlhorn, 2015; Peng et al., 2015; Lee & Sun, 2017; Cohen et al., 2017; 2018). Recent spectral graph sparsification research allows constructing nearly-linear-sized subgraphs that preserve the spectral (structural) properties of the original graph, such as the first few eigenvalues and eigenvectors of the graph Laplacian. These results can potentially lead to a variety of nearly-linear time numerical and graph algorithms for solving large sparse matrices and partial differential equations (PDEs), graph-based semi-supervised learning (SSL), computing the stationary distributions of Markov chains and personalized PageRank vectors, spectral graph partitioning and data clustering, max flow and multi-commodity flow of undirected graphs, nearly-linear time circuit simulation and verification, etc. (Koutis et al., 2010; Spielman & Teng, 2011; Christiano et al., 2011; Spielman & Teng, 2014; Kelner et al., 2014; Cohen et al., 2017; 2018; Feng, 2016; 2018). However, there is no unified approach that allows for truly scalable spectral sparsification of both directed and undirected graphs.
For example, the state-of-the-art sampling-based methods for spectral sparsification are only applicable to undirected graphs (Spielman & Srivastava, 2011; Koutis et al., 2010; Spielman & Teng, 2014); the latest algorithmic breakthrough in spectral sparsification of directed graphs (Cohen et al., 2017; 2018) can only handle strongly connected directed graphs (footnote 1). This inevitably limits its applicability to real-world graphs, since many directed graphs are not strongly connected, such as the graphs used in chip design automation (e.g., timing analysis) tasks and in machine learning and data mining tasks.



Footnote 1: A strongly connected directed graph is a directed graph in which every node can be reached from every other node by following edge directions.
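
The strong-connectivity condition can be checked directly from its definition. The following is a minimal illustrative sketch, not part of the proposed method: the function names `reachable` and `is_strongly_connected` are hypothetical, and treating the empty graph as strongly connected is a convention assumed here. It relies on the standard observation that a directed graph is strongly connected exactly when one chosen root reaches every node along the edges and also along the reversed edges.

```python
from collections import deque


def reachable(adj, start):
    """Return the set of nodes reachable from `start` by breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def is_strongly_connected(nodes, edges):
    """Check strong connectivity of a directed graph given as (u, v) edge pairs."""
    nodes = list(nodes)
    if not nodes:
        return True  # convention assumed for this sketch
    fwd, rev = {}, {}
    for u, v in edges:
        fwd.setdefault(u, []).append(v)  # follow edge directions
        rev.setdefault(v, []).append(u)  # follow reversed directions
    root = nodes[0]
    everyone = set(nodes)
    # The root must reach every node, and every node must reach the root.
    return reachable(fwd, root) >= everyone and reachable(rev, root) >= everyone


cycle = [(0, 1), (1, 2), (2, 0)]   # directed 3-cycle
chain = [(0, 1), (1, 2)]           # acyclic path
print(is_strongly_connected(range(3), cycle))
print(is_strongly_connected(range(3), chain))
```

The directed cycle passes the check, while the acyclic chain does not, because node 0 cannot be reached from node 2. Any directed acyclic graph with at least two nodes fails in the same way.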

