A UNIFIED SPECTRAL SPARSIFICATION FRAMEWORK FOR DIRECTED GRAPHS

Abstract

Recent spectral graph sparsification research allows constructing nearly-linear-sized subgraphs that well preserve the spectral (structural) properties of the original graph, such as the first few eigenvalues and eigenvectors of the graph Laplacian, leading to the development of a variety of nearly-linear time numerical and graph algorithms. However, there is no unified approach that allows for truly-scalable spectral sparsification of both directed and undirected graphs. For the first time, we prove the existence of linear-sized spectral sparsifiers for general directed graphs, and introduce a practically-efficient yet unified spectral graph sparsification approach that allows sparsifying real-world, large-scale directed and undirected graphs with guaranteed preservation of the original graph spectra. By exploiting a highly-scalable (nearly-linear complexity) spectral matrix perturbation analysis framework for constructing nearly-linear-sized (directed) subgraphs, our approach well preserves the key eigenvalues and eigenvectors of the original (directed) graph Laplacians. The proposed method has been validated using various kinds of directed graphs obtained from public-domain sparse matrix collections, showing promising results for solving directed graph Laplacians, spectral embedding and partitioning of general directed graphs, and approximately computing (personalized) PageRank vectors.

1. INTRODUCTION

Many research problems for simplifying large graphs by leveraging spectral graph theory have been extensively studied by mathematics and theoretical computer science (TCS) researchers in the past decade (Batson et al., 2012; Spielman & Teng, 2011; Kolev & Mehlhorn, 2015; Peng et al., 2015; Lee & Sun, 2017; Cohen et al., 2017; 2018). Recent spectral graph sparsification research allows constructing nearly-linear-sized subgraphs that well preserve the spectral (structural) properties of the original graph, such as the first few eigenvalues and eigenvectors of the graph Laplacian. The related results can potentially lead to the development of a variety of nearly-linear time numerical and graph algorithms for solving large sparse matrices and partial differential equations (PDEs), graph-based semi-supervised learning (SSL), computing the stationary distributions of Markov chains and personalized PageRank vectors, spectral graph partitioning and data clustering, max-flow and multi-commodity flow of undirected graphs, nearly-linear time circuit simulation and verification algorithms, etc. (Koutis et al., 2010; Spielman & Teng, 2011; Christiano et al., 2011; Spielman & Teng, 2014; Kelner et al., 2014; Cohen et al., 2017; 2018; Feng, 2016; 2018). However, there is no unified approach that allows for truly-scalable spectral sparsification of both directed and undirected graphs.
For example, the state-of-the-art sampling-based methods for spectral sparsification are only applicable to undirected graphs (Spielman & Srivastava, 2011; Koutis et al., 2010; Spielman & Teng, 2014); the latest algorithmic breakthrough in spectral sparsification of directed graphs (Cohen et al., 2017; 2018) can only handle strongly-connected directed graphs^1, which inevitably limits its applications when confronting real-world graphs, since many directed graphs may not be strongly connected, such as the graphs used in chip design automation (e.g., timing analysis) tasks as well as the graphs used in machine learning and data mining tasks. Consequently, there is still a pressing need for the development of highly-robust (theoretically-rigorous) and truly-scalable (nearly-linear complexity) algorithms for reducing real-world large-scale (undirected and directed) graphs while preserving key graph spectral (structural) properties. In summary, we make the following contributions:
• We, for the first time, prove the existence of linear-sized spectral sparsifiers for general directed graphs, and introduce a practically-efficient yet unified spectral sparsification approach that allows simplifying real-world, large-scale directed and undirected graphs with guaranteed preservation of the original graph spectra.
• We exploit a highly-scalable (nearly-linear complexity) spectral matrix perturbation analysis framework for constructing ultra-sparse (directed) subgraphs that well preserve the key eigenvalues and eigenvectors of the original graph Laplacians. Unlike the prior state-of-the-art methods that are only suitable for handling specific types of graphs (e.g., undirected or strongly-connected directed graphs (Spielman & Srivastava, 2011; Cohen et al., 2017)), the proposed approach is more general and thus allows for truly-scalable spectral sparsification of a much wider range of real-world complex graphs.
• Through extensive experiments on real-world directed graphs, we show how the proposed directed graph spectral sparsification method can be exploited for computing PageRank vectors, clustering directed graphs, and developing directed graph Laplacian solvers. The spectrally-sparsified directed graphs constructed by the proposed approach can potentially lead to much faster numerical and graph-related algorithms. For example, spectrally-sparsified social (data) networks allow for more efficient modeling and analysis of large social (data) networks; spectrally-sparsified neural networks allow for more scalable model training and processing in emerging machine learning tasks; spectrally-sparsified web graphs allow for much faster computation of personalized PageRank vectors; and spectrally-sparsified integrated circuit networks will lead to more efficient partitioning, modeling, simulation, optimization and verification of large chip designs.
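To make the PageRank application above concrete, the following is a minimal, illustrative sketch (not the paper's method) of computing a (personalized) PageRank vector by power iteration on a directed graph's transition matrix; the 4-node example graph, the function name, and all parameter values are hypothetical choices for illustration. On a spectrally-sparsified graph, the same iteration simply runs on the sparsified adjacency matrix.

```python
import numpy as np

def pagerank(A, alpha=0.85, personalize=None, tol=1e-10, max_iter=200):
    """Power iteration for (personalized) PageRank on a directed graph.

    A[i, j] = 1 if there is an edge i -> j (dense only for illustration).
    Dangling nodes (no out-edges) teleport uniformly to all nodes.
    """
    n = A.shape[0]
    out_deg = A.sum(axis=1)
    # Row-stochastic transition matrix; dangling rows become uniform.
    P = np.where(out_deg[:, None] > 0,
                 A / np.maximum(out_deg, 1)[:, None],
                 1.0 / n)
    v = np.full(n, 1.0 / n) if personalize is None else personalize / personalize.sum()
    x = v.copy()
    for _ in range(max_iter):
        x_new = alpha * (P.T @ x) + (1 - alpha) * v
        if np.abs(x_new - x).sum() < tol:
            break
        x = x_new
    return x

# Tiny 4-node directed cycle: by symmetry, PageRank is uniform.
A = np.array([[0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1],
              [1, 0, 0, 0]], dtype=float)
pr = pagerank(A)
```

Passing a nonuniform `personalize` vector biases the teleportation distribution, yielding a personalized PageRank vector centered on the chosen seed nodes.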

2. RELATED WORKS

Directed graph symmetrization. When dealing with directed graph sparsification, it is natural to apply symmetrization methods that convert asymmetric directed graphs into symmetric undirected graphs, so that existing spectral graph theories can be applied to directed graphs after symmetrization. In the following, given a directed graph with adjacency matrix^2 A, we review the most popular graph symmetrization methods:
• A + A^T symmetrization simply ignores edge directions, which is the simplest and most efficient way to symmetrize a directed graph. However, edge directions may play an important role in directed graphs. As shown in Figure 1, edges (8, 1) and (4, 5) appear equally important in the symmetrized undirected graph A + A^T. However, in the original directed graph, edge (8, 1) is much more important than edge (4, 5), since removing edge (8, 1) leads to the loss of more connections: removing edge (4, 5) only affects walks from node 4 to other nodes and walks from other nodes to node 5, whereas removing edge (8, 1) not only affects walks from node 8 to other nodes and walks from other nodes to node 1, but also leaves nodes 5, 6, 7 and 8 with no access to any of nodes 1, 2, 3 and 4.
• Bibliographic symmetrization (Satuluri & Parthasarathy, 2011) adopts AA^T + A^T A as the adjacency matrix after symmetrization, taking both the in-going and out-going edges into consideration. However, it does not scale to large graphs, since it creates much denser undirected graphs after symmetrization. Disconnected graphs can also result from AA^T + A^T A symmetrization, as shown in Figure 1.
• Random-walk symmetrization (Chung, 2005) is based on random walks and preserves the normalized cut after symmetrization.
This is also the symmetrization approach used in recent work on spectral sparsification of directed graphs (Cohen et al., 2017). However, it only works on strongly-connected, aperiodic directed graphs. For
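The three symmetrizations above can be sketched as follows. This is an illustrative NumPy sketch, not code from the paper: the 4-node example graph is hypothetical (it is not Figure 1), and the random-walk variant follows Chung (2005) in the form A_rw = (Π P + P^T Π) / 2, where P is the row-stochastic transition matrix and Π is the diagonal matrix of its stationary distribution; as noted above, this variant requires a strongly-connected, aperiodic graph.

```python
import numpy as np

def simple_symmetrization(A):
    # Ignore edge directions: undirected adjacency A + A^T.
    return A + A.T

def bibliographic_symmetrization(A):
    # Co-citation plus co-reference: AA^T + A^T A (typically much denser).
    return A @ A.T + A.T @ A

def random_walk_symmetrization(A, n_iter=1000):
    # Chung (2005): A_rw = (Pi P + P^T Pi) / 2, where P is the row-stochastic
    # transition matrix and Pi = diag(pi) holds its stationary distribution.
    # Only valid for strongly-connected, aperiodic directed graphs.
    n = A.shape[0]
    P = A / A.sum(axis=1, keepdims=True)
    pi = np.full(n, 1.0 / n)
    for _ in range(n_iter):          # power iteration for pi^T P = pi^T
        pi = pi @ P
    Pi = np.diag(pi)
    return (Pi @ P + P.T @ Pi) / 2.0

# Hypothetical strongly-connected, aperiodic 4-node directed graph.
A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 0]], dtype=float)

S1 = simple_symmetrization(A)
S2 = bibliographic_symmetrization(A)
S3 = random_walk_symmetrization(A)
```

All three results are symmetric matrices, so standard undirected spectral machinery applies to them; the density blow-up of the bibliographic variant is visible even on small graphs, since AA^T and A^T A connect every pair of nodes sharing an in- or out-neighbor.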



^1 A strongly connected directed graph is a directed graph in which any node can be reached from any other node following edge directions.
^2 The concept of the adjacency matrix for a directed graph will be further introduced in Section 3.1.

