A UNIFIED SPECTRAL SPARSIFICATION FRAMEWORK FOR DIRECTED GRAPHS

Abstract

Recent spectral graph sparsification research allows constructing nearly-linear-sized subgraphs that can well preserve the spectral (structural) properties of the original graph, such as the first few eigenvalues and eigenvectors of the graph Laplacian, leading to the development of a variety of nearly-linear time numerical and graph algorithms. However, there is no unified approach that allows for truly-scalable spectral sparsification of both directed and undirected graphs. For the first time, we prove the existence of linear-sized spectral sparsifiers for general directed graphs, and introduce a practically-efficient yet unified spectral graph sparsification approach that allows sparsifying real-world, large-scale directed and undirected graphs with guaranteed preservation of the original graph spectra. By exploiting a highly-scalable (nearly-linear complexity) spectral matrix perturbation analysis framework for constructing nearly-linear-sized (directed) subgraphs, our approach can well preserve the key eigenvalues and eigenvectors of the original (directed) graph Laplacians. The proposed method has been validated using various kinds of directed graphs obtained from public-domain sparse matrix collections, showing promising results for solving directed graph Laplacians, spectral embedding, and partitioning of general directed graphs, as well as for approximately computing (personalized) PageRank vectors.

1. INTRODUCTION

Many research problems concerning the simplification of large graphs by leveraging spectral graph theory have been extensively studied by mathematics and theoretical computer science (TCS) researchers in the past decade (Batson et al., 2012; Spielman & Teng, 2011; Kolev & Mehlhorn, 2015; Peng et al., 2015; Lee & Sun, 2017; Cohen et al., 2017; 2018). Recent spectral graph sparsification research allows constructing nearly-linear-sized subgraphs that can well preserve the spectral (structural) properties of the original graph, such as the first few eigenvalues and eigenvectors of the graph Laplacian. The related results can potentially lead to the development of a variety of nearly-linear time numerical and graph algorithms for solving large sparse matrices and partial differential equations (PDEs), graph-based semi-supervised learning (SSL), computing the stationary distributions of Markov chains and personalized PageRank vectors, spectral graph partitioning and data clustering, max-flow and multi-commodity flow of undirected graphs, nearly-linear time circuit simulation and verification algorithms, etc. (Koutis et al., 2010; Spielman & Teng, 2011; Christiano et al., 2011; Spielman & Teng, 2014; Kelner et al., 2014; Cohen et al., 2017; 2018; Feng, 2016; 2018). However, there is no unified approach that allows for truly-scalable spectral sparsification of both directed and undirected graphs.
For example, the state-of-the-art sampling-based methods for spectral sparsification are only applicable to undirected graphs (Spielman & Srivastava, 2011; Koutis et al., 2010; Spielman & Teng, 2014), while the latest algorithmic breakthrough in spectral sparsification of directed graphs (Cohen et al., 2017; 2018) can only handle strongly-connected directed graphs, which inevitably limits its applicability to real-world graphs: many directed graphs are not strongly connected, such as the graphs used in chip design automation (e.g., timing analysis) tasks as well as the graphs used in machine learning and data mining tasks. Consequently, there is still a pressing need for the development of highly-robust (theoretically-rigorous) and truly-scalable (nearly-linear complexity) algorithms for reducing real-world large-scale (undirected and directed) graphs while preserving key graph spectral (structural) properties. In summary, we make the following contributions:
• We, for the first time, prove the existence of linear-sized spectral sparsifiers for general directed graphs, and introduce a practically-efficient yet unified spectral sparsification approach that allows simplifying real-world, large-scale directed and undirected graphs with guaranteed preservation of the original graph spectra.
• We exploit a highly-scalable (nearly-linear complexity) spectral matrix perturbation analysis framework for constructing ultra-sparse (directed) subgraphs that can well preserve the key eigenvalues and eigenvectors of the original graph Laplacians. Unlike prior state-of-the-art methods that are only suitable for specific types of graphs (e.g., undirected or strongly-connected directed graphs (Spielman & Srivastava, 2011; Cohen et al., 2017)), the proposed approach is more general and thus allows for truly-scalable spectral sparsification of a much wider range of real-world complex graphs.
• Through extensive experiments on real-world directed graphs, we show how the proposed directed graph spectral sparsification method can be exploited for computing PageRank vectors, clustering directed graphs, and developing directed graph Laplacian solvers. The spectrally-sparsified directed graphs constructed by the proposed approach can potentially lead to much faster numerical and graph-related algorithms. For example, spectrally-sparsified social (data) networks allow for more efficient modeling and analysis of large social (data) networks; spectrally-sparsified neural networks allow for more scalable model training and processing in emerging machine learning tasks; spectrally-sparsified web graphs allow for much faster computation of personalized PageRank vectors; and spectrally-sparsified integrated circuit networks will lead to more efficient partitioning, modeling, simulation, optimization, and verification of large chip designs.

2. RELATED WORKS

Directed graph symmetrization. When dealing with directed graph sparsification, it is natural to apply symmetrization methods that convert asymmetric directed graphs into symmetric undirected graphs, so that existing spectral graph theory can be applied after symmetrization. In the following, given a directed graph and its corresponding adjacency matrix A, we review the most popular graph symmetrization methods:
• A + A^T symmetrization simply ignores the edges' directions, which is the simplest and most efficient way to symmetrize a directed graph. However, edge directions may play an important role in directed graphs. As shown in Figure 1, edges (8,1) and (4,5) appear equally important in the symmetrized undirected graph A + A^T. In the original directed graph, however, edge (8,1) is much more important than edge (4,5), since removing edge (8,1) destroys far more connections: removing edge (4,5) only affects walks from node 4 to other nodes and walks from other nodes to node 5, whereas removing edge (8,1) not only affects walks from node 8 to other nodes and walks from other nodes to node 1, but also leaves no access from nodes 5, 6, 7, and 8 to any of nodes 1, 2, 3, and 4.
• Bibliographic symmetrization (Satuluri & Parthasarathy, 2011) adopts AA^T + A^TA as the adjacency matrix after symmetrization, so as to take both in-going and out-going edges into consideration. However, it does not scale to large graphs, since it creates much denser undirected graphs after symmetrization; it can also create disconnected graphs, as shown in Figure 1.
• Random-walk symmetrization (Chung, 2005) is based on random walks and allows the normalized cut to be preserved after symmetrization.
This is also the symmetrization approach used in recent work on spectral sparsification of directed graphs (Cohen et al., 2017). However, it only works on strongly-connected, aperiodic directed graphs. For example, the random-walk-based symmetrization cannot be applied to the directed graph shown in Figure 1, since that graph is periodic.

Table 1: Symbols used in this paper.
G = (V, E_G, w_G): directed (undirected) graph
G_u = (V, E_Gu, w_Gu): undirected graph
S = (V, E_S, w_S): sparsifier of graph G
S_u = (V, E_Su, w_Su): sparsifier of graph G_u
L_G: Laplacian matrix of graph G
L_Gu: Laplacian matrix of graph G_u
L_S: Laplacian matrix of sparsifier S
L_Su: Laplacian matrix of sparsifier S_u

Cheeger's inequality for directed graphs. In undirected graph problems, Cheeger's inequality plays a significant role in spectral analysis, connecting the Cheeger constant (conductance) with the spectral properties (eigenvalues of the graph Laplacian matrix) of a graph. In (Chung, 2005), Cheeger's inequality has been extended to directed graphs based on the random-walk Laplacian symmetrization scheme mentioned earlier; a bound for the smallest eigenvalue of the directed graph Laplacian is also provided. However, the related theoretical results only apply to strongly-connected and aperiodic directed graphs, which are rare in real-world applications.

Spectral sparsification of directed graphs. The latest algorithmic breakthrough in spectral sparsification of strongly-connected aperiodic graphs builds on the results in (Chung, 2005): it converts strongly-connected graphs into Eulerian graphs via Eulerian scaling, and subsequently sparsifies the undirected graphs obtained via directed graph symmetrization (Chung, 2005) by leveraging existing undirected graph spectral sparsification methods (Cohen et al., 2017).
It has been shown that such an approach can potentially lead to the development of almost-linear-time algorithms for solving asymmetric linear systems, computing the stationary distribution of a Markov chain, and computing expected commute times in a directed graph, etc. (Cohen et al., 2017; 2018).

3.1. LAPLACIANS FOR DIRECTED AND UNDIRECTED GRAPHS

Consider a directed graph G = (V, E_G, w_G), with V denoting the set of vertices, E_G the set of directed edges, and w_G the associated edge weights. Let n = |V| and m = |E_G| denote the sizes of the node and edge sets, respectively. In the following, D_G denotes the diagonal matrix with D_G(i,i) equal to the (weighted) out-degree of node i, and A_G denotes the adjacency matrix of G:

A_G(i,j) = w_G(i,j) if (i,j) ∈ E_G, and A_G(i,j) = 0 otherwise.

The directed Laplacian matrix can then be constructed as follows (Cohen et al., 2017):

L_G = D_G − A_G^T.

For better readability, the symbols used in this paper are summarized in Table 1. It can be shown that any directed (undirected) graph Laplacian constructed this way satisfies the following properties: I) each column sum (and, for undirected graphs, each row sum) is equal to zero; II) all off-diagonal elements are non-positive; III) the Laplacian matrix is asymmetric and indefinite for directed graphs (symmetric and positive semidefinite for undirected graphs).
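As a concrete sketch, the directed Laplacian and its properties can be checked numerically. We adopt the column-sum-zero convention L_G = D_G − A_G^T used above (with D_G holding weighted out-degrees); the toy graph and weights below are purely illustrative:

```python
import numpy as np

# Toy directed graph: (tail i, head j, weight) triples, i.e., edge i -> j.
edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 0, 1.0), (2, 3, 0.5), (3, 0, 1.0)]
n = 4

A = np.zeros((n, n))               # adjacency: A[i, j] = w_G(i, j)
for i, j, w in edges:
    A[i, j] = w

D = np.diag(A.sum(axis=1))         # D[i, i] = weighted out-degree of node i
L = D - A.T                        # directed Laplacian L_G = D_G - A_G^T

assert np.allclose(L.sum(axis=0), 0.0)          # Property I: column sums are zero
assert ((L - np.diag(np.diag(L))) <= 0).all()   # Property II: off-diagonals non-positive
assert not np.allclose(L, L.T)                  # Property III: asymmetric
```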

3.2. SPECTRAL SPARSIFICATION OF (UN)DIRECTED GRAPHS

Graph sparsification aims to find a subgraph (sparsifier) S = (V, E_S, w_S) that has the same set of vertices but far fewer edges than the original graph G. There are two types of sparsification methods: cut sparsification methods preserve the cuts of the original graph through random sampling of edges (Benczúr & Karger, 1996), whereas spectral sparsification methods preserve the graph spectral (structural) properties, such as distances between vertices, effective resistances, cuts in the graph, as well as the stationary distributions of Markov chains (Cohen et al., 2017; 2018; Spielman & Teng, 2011). Spectral graph sparsification is therefore a much stronger notion than cut sparsification. For undirected graphs, spectral sparsification aims to find an ultra-sparse subgraph proxy that is spectrally similar to the original one. G and S are said to be σ-spectrally similar if the following condition holds for all real vectors x ∈ R^V:

x^T L_S x / σ ≤ x^T L_G x ≤ σ x^T L_S x,

where L_G and L_S denote the symmetric diagonally dominant (SDD) Laplacian matrices of graphs G and S, respectively. The relative condition number then satisfies κ(L_G, L_S) ≤ σ^2, implying that a smaller relative condition number (or σ^2) corresponds to a higher (better) spectral similarity between the two graphs. For directed graphs, the subgraph S can be considered spectrally similar to the original graph G if the condition number, i.e., the ratio between the largest and smallest singular values of L_S^+ L_G, is close to 1 (Cohen et al., 2017; 2018), where L_S^+ denotes the Moore-Penrose pseudoinverse of L_S. Since the singular values of L_S^+ L_G are the square roots of the eigenvalues of (L_S^+ L_G)^T (L_S^+ L_G), spectral sparsification of directed graphs is equivalent to finding an ultra-sparse subgraph S such that the condition number of (L_S^+ L_G)^T (L_S^+ L_G) is small enough.
While (L_S^+ L_G)^T (L_S^+ L_G) can be written as L_G^T (L_S L_S^T)^+ L_G, this matrix is not equal to (L_S L_S^T)^+ (L_G L_G^T). The two products do, however, share the same eigenvalues under mild conditions, according to the following theorem (Horn & Johnson, 2012):

Theorem 3.1. Suppose that X ∈ R^{m'×n'} and Y ∈ R^{n'×m'} with m' ≤ n'. Then the n' eigenvalues of YX are the m' eigenvalues of XY together with n' − m' zeros; that is, p_YX(t) = t^{n'−m'} p_XY(t). If m' = n' and at least one of X or Y is nonsingular, then XY and YX are similar.

Based on Theorem 3.1, L_G^T (L_S L_S^T)^+ L_G and (L_S L_S^T)^+ (L_G L_G^T) share the same eigenvalues when a small value is added to each diagonal entry of L_G. Under this condition, spectral sparsification of directed graphs is equivalent to finding an ultra-sparse subgraph S such that the condition number of (L_S L_S^T)^+ (L_G L_G^T) is small enough. Theorem 3.2 shows that both L_G L_G^T and L_S L_S^T are Laplacian matrices of undirected graphs.

Theorem 3.2. For any directed graph G = (V, E_G, w_G) with directed Laplacian L_G, a symmetrized undirected graph G_u = (V, E_Gu, w_Gu) can be obtained via the Laplacian symmetrization L_Gu = L_G L_G^T. L_Gu is positive semidefinite (PSD) and has the all-one vector in its null space, while the corresponding undirected graph may include negative edge weights.

The proof is given in the Appendix. If we can show that there exists an ultra-sparse subgraph S whose symmetrized undirected graph S_u (with L_Su = L_S L_S^T) is a spectral sparsifier of G_u (with L_Gu = L_G L_G^T), then the directed subgraph S is a spectral sparsifier of G; detailed proofs are given in the Appendix. The core idea of our approach is to leverage this novel spectrum-preserving Laplacian symmetrization procedure to convert directed graphs into undirected ones that may have negative-weighted edges (as shown in Figure 1).
Such a Laplacian symmetrization scheme will immediately allow us to exploit existing methods for spectral sparsification.
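A minimal numeric check of Theorem 3.2's claims can be sketched as follows (illustrative toy graph; we again use the column-sum-zero Laplacian convention L_G = D_G − A_G^T):

```python
import numpy as np

# Toy directed graph: node 0 has two outgoing edges (to 1 and 2), which
# will induce a negative-weight edge between nodes 1 and 2 in G_u.
n = 4
A = np.zeros((n, n))
for i, j, w in [(0, 1, 1.0), (0, 2, 0.5), (1, 3, 1.0), (2, 3, 1.0), (3, 0, 1.0)]:
    A[i, j] = w
D = np.diag(A.sum(axis=1))
L = D - A.T                        # directed Laplacian (column sums are zero)

L_Gu = L @ L.T                     # spectrum-preserving symmetrization

assert np.allclose(L_Gu, L_Gu.T)                 # symmetric
assert np.allclose(L_Gu @ np.ones(n), 0.0)       # all-one null vector
assert np.linalg.eigvalsh(L_Gu).min() > -1e-9    # positive semidefinite
# A positive off-diagonal entry of L_Gu corresponds to a negative-weight
# edge in the symmetrized undirected graph G_u:
off = L_Gu - np.diag(np.diag(L_Gu))
assert (off > 1e-12).any()
```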

4. A PRACTICAL FRAMEWORK FOR UNIFIED SPECTRAL SPARSIFICATION

To apply our theoretical results to real-world directed graphs, the following concerns should be addressed:
• The undirected graph corresponding to L_G L_G^T may become too dense to compute explicitly, imposing high cost during spectral sparsification.
• It can be quite challenging to convert the sparsified undirected graph back into its corresponding directed sparsifier L_S, even when L_Su is available.
To address these concerns, we propose a practically-efficient framework for unified spectral graph sparsification with the following desired features: 1) our approach does not require explicitly forming L_G L_G^T, but only matrix-vector multiplications with it; 2) our approach can effectively identify the most spectrally-critical edges for dramatically decreasing the relative condition number; 3) although our approach requires computing L_S L_S^T, the density of the L_Su matrix can be effectively controlled by carefully pruning spectrally-similar edges through the proposed edge-similarity checking scheme.

4.1. INITIAL SUBGRAPH SPARSIFIER CONSTRUCTION

Motivated by recent research on low-stretch spanning trees (Elkin et al., 2008; Abraham & Neiman, 2012) and spectral perturbation analysis (Feng, 2016; 2018) for nearly-linear-time spectral sparsification of undirected graphs, we propose a practically-efficient algorithm for sparsifying general directed graphs that first constructs an initial subgraph sparsifier with the following procedure:
• Compute D^{-1}(A_G + A_G^T) as a new adjacency matrix, where D denotes the diagonal matrix with each diagonal element equal to the corresponding row (column) sum of (A_G + A_G^T). Recent research shows that such split transformations can effectively reduce graph irregularity while preserving critical graph connectivity, distances between node pairs, the minimal edge weight on a path, as well as out-degrees and in-degrees when using push-based and pull-based vertex-centric programming (Nodehi Sabet et al., 2018).
• Construct a maximum spanning tree (MST) based on D^{-1}(A_G + A_G^T), which allows us to effectively control the number of outgoing edges of each node, so that the undirected graph obtained after Laplacian symmetrization will not be too dense.
• Recover the direction of each edge in the MST, and make sure each node of the sparsifier has at least one outgoing edge whenever it has one in the original graph, to achieve stronger connectivity in the initial directed sparsifier.
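The construction above can be sketched with SciPy's spanning-tree routine (a maximum spanning tree is a minimum spanning tree on negated weights); the graph, weights, and helper structure below are illustrative only:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree

n = 5
edges = [(0, 1, 1.0), (1, 2, 4.0), (2, 0, 2.0), (2, 3, 1.0),
         (3, 4, 3.0), (4, 2, 1.0), (0, 3, 0.5)]
A = np.zeros((n, n))
for i, j, w in edges:
    A[i, j] = w

S = A + A.T                                   # drop edge directions
d = S.sum(axis=1)
M = S / d[:, None]                            # D^{-1}(A_G + A_G^T), row-normalized
M = np.maximum(M, M.T)                        # symmetric weights for the tree step

# Maximum spanning tree == minimum spanning tree on negated weights.
mst = minimum_spanning_tree(csr_matrix(np.where(M > 0, -M, 0.0)))
assert mst.nnz == n - 1                       # spanning tree of a connected graph

# Recover the direction(s) of each tree edge from the original digraph.
init = np.zeros((n, n))
for i, j in zip(*mst.nonzero()):
    if A[i, j] > 0:
        init[i, j] = A[i, j]
    if A[j, i] > 0:
        init[j, i] = A[j, i]
```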

4.2. SPECTRAL SPARSIFICATION VIA RIEMANNIAN DISTANCE MINIMIZATION

The Riemannian distance δ_2 between positive definite matrices is arguably the most natural and useful distance on the positive definite cone S^n_{++} (Bonnabel & Sepulchre, 2010), and can be computed by (Lim et al., 2019):

δ_2 : S^n_{++} × S^n_{++} → R_+,  δ_2(L_Su, L_Gu) = ( Σ_{i=1}^n log^2 λ_i )^{1/2},

where λ_max = λ_1 ≥ ... ≥ λ_{n'} ≥ 1 ≥ λ_{n'+1} ≥ ... ≥ λ_n denote the eigenvalues of L_Su^+ L_Gu in descending order, and v_1, v_2, ..., v_n denote the corresponding eigenvectors. Since both L_Su and L_Gu are PSD matrices, we have λ_i ≥ 0, which leads to the following inequality:

δ_2(L_Su, L_Gu) ≤ Σ_{i=1}^{n'} log λ_i + Σ_{i=n'+1}^{n} log(1/λ_i) ≤ max{ n' log λ_1, n log(1/λ_n) }.

The Courant-Fischer theorem allows computing the extreme generalized eigenvalues by:

λ_1 = max_{|x|≠0, x^T 1=0} (x^T L_Gu x)/(x^T L_Su x),  λ_n = min_{|x|≠0, x^T 1=0} (x^T L_Gu x)/(x^T L_Su x),   (5)

where 1 ∈ R^n denotes the all-one vector. Assigning each node in the graph an integer value of either 0 or 1, the corresponding Laplacian quadratic form measures the boundary size (cut) of a node set. For example, if a node set Q consists of the nodes assigned the value 1 while all other nodes are assigned 0, then Q and its boundary ∂_Gu(Q) can be represented as Q = {p ∈ V : x(p) = 1} and ∂_Gu(Q) = {(p,q) ∈ E : p ∈ Q, q ∉ Q}. Since the number of edges crossing the boundary can be computed as x^T L_Gu x = |∂_Gu(Q)|, the following holds:

λ_1 = max_{|x|≠0, x^T 1=0} (x^T L_Gu x)/(x^T L_Su x) ≥ max_{|x|≠0, x(p)∈{0,1}} (x^T L_Gu x)/(x^T L_Su x) = |∂_Gu(Q)| / |∂_Su(Q)|,

which implies that the maximum cut mismatch between G_u and S_u is bounded by λ_1. Leveraging the dominant generalized eigenvectors allows us to identify the edges crossing the maximally-mismatched boundary ∂_Gu(Q). Including such crossing edges in S_u dramatically decreases the maximum mismatch (λ_1), thereby improving the spectral approximation of the sparsifier.
Consequently, the following optimization problem for spectral graph sparsification is proposed:

min_{L_Su} max_x (x^T L_Gu x)/(x^T L_Su x) + β ||L_Su||_1,   (8)

which effectively minimizes the largest generalized eigenvalue (the upper bound of the mismatch) and thus the Riemannian distance (δ_2) while including a minimum number of edges in the subgraph S_u.
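The quantities λ_1 and δ_2 can be computed numerically on a toy undirected pair (a 4-cycle G_u and its spanning path S_u; graphs and sizes are illustrative). We restrict to the complement of the all-one null space so that L_Su is positive definite there:

```python
import numpy as np
from scipy.linalg import eigh

def lap(n, edges):
    # Undirected Laplacian from (i, j, w) edge triples.
    L = np.zeros((n, n))
    for i, j, w in edges:
        L[i, i] += w; L[j, j] += w
        L[i, j] -= w; L[j, i] -= w
    return L

n = 4
L_Gu = lap(n, [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (0, 3, 1.0)])  # cycle
L_Su = lap(n, [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)])               # path

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(n), rng.standard_normal((n, n - 1))])
Q, _ = np.linalg.qr(X)
U = Q[:, 1:]                                  # orthonormal basis orthogonal to 1

lams = eigh(U.T @ L_Gu @ U, U.T @ L_Su @ U, eigvals_only=True)
lam_max = lams[-1]
delta2 = np.sqrt((np.log(lams) ** 2).sum())   # Riemannian distance delta_2

assert lam_max >= 1.0 - 1e-9                  # S_u's edges are a subset of G_u's
x = rng.standard_normal(n); x -= x.mean()
# Any test vector's mismatch ratio is bounded by lam_max:
assert (x @ L_Gu @ x) / (x @ L_Su @ x) <= lam_max + 1e-9
```

For this pair, λ_1 = 1 + R_eff(0,3) = 4, where R_eff is the effective resistance of the chord's endpoints in the path.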

4.3. A UNIFIED SPECTRAL PERTURBATION ANALYSIS FRAMEWORK

As aforementioned, spectral sparsification of (un)directed graphs can be effectively achieved by solving (8). To this end, we exploit the following spectral perturbation analysis framework. Given the generalized eigenvalue problem L_Gu v_i = λ_i L_Su v_i for i = 1, ..., n, the eigenvectors v_i can be scaled to satisfy the following orthogonality requirements:

v_i^T L_Gu v_j = λ_i if i = j, and 0 if i ≠ j;  v_i^T L_Su v_j = 1 if i = j, and 0 if i ≠ j.   (9)

Consider the following first-order generalized eigenvalue perturbation problem:

L_Gu (v_i + δv_i) = (λ_i + δλ_i)(L_Su + δL_Su)(v_i + δv_i),   (10)

where a small perturbation δL_Su of L_Su leads to perturbed generalized eigenvalues and eigenvectors λ_i + δλ_i and v_i + δv_i. Keeping only the first-order terms, (10) becomes:

L_Gu δv_i = λ_i L_Su δv_i + λ_i δL_Su v_i + δλ_i L_Su v_i.   (11)

Writing δv_i = Σ_j ψ_{i,j} v_j, (11) can be expressed as:

Σ_j ψ_{i,j} L_Gu v_j = λ_i L_Su (Σ_j ψ_{i,j} v_j) + λ_i δL_Su v_i + δλ_i L_Su v_i.   (12)

Based on the orthogonality properties in (9), multiplying v_i^T on both sides of (12) results in:

λ_i v_i^T δL_Su v_i + δλ_i = 0.   (13)

The task of spectral sparsification of general (un)directed graphs is then to recover as few extra edges as possible into the initial directed subgraph S, such that the largest generalized eigenvalues, or equivalently the condition number of L_Su^+ L_Gu, can be dramatically mitigated. Rearranging (13) simply leads to:

δλ_i / λ_i = −v_i^T δL_Su v_i.   (14)

Keeping only the first-order terms, δL_Su can be expanded as δL_Su = δL_S L_S^T + L_S δL_S^T, where δL_S = w_G(p,q) e_{p,q} e_p^T for an off-subgraph edge (p,q) ∈ E_G \ E_S, e_p ∈ R^n denotes the standard basis vector with the p-th element equal to 1 and all others 0, and e_{p,q} = e_p − e_q. The spectral sensitivity of each off-subgraph edge (p,q) can then be expressed as:

ζ_{p,q} = v_i^T (δL_S L_S^T + L_S δL_S^T) v_i.   (15)

It is obvious that (15) can be leveraged to rank the spectral importance of each off-subgraph edge.
Consequently, spectral sparsification of general graphs can be achieved by recovering only a few mutually dissimilar edges with large sensitivity values. In this work, the following t-step power iteration is proposed to efficiently approximate the dominant generalized eigenvector:

v_1 ≈ h_t = (L_Su^+ L_Gu)^t h_0,   (16)

where h_0 denotes a random vector. When the number of power iterations is small (e.g., t ≤ 3), h_t is a linear combination of the first few dominant generalized eigenvectors corresponding to the largest few eigenvalues. The spectral sensitivity of an off-subgraph edge (p,q) can then be approximately computed as:

ζ_{p,q} ≈ h_t^T (δL_S L_S^T + L_S δL_S^T) h_t.   (17)

Computing h_t through power iterations requires solving the linear system of equations L_Su x = b for t times; note that only L_Su needs to be explicitly computed for the generalized power iterations. The latest Lean Algebraic Multigrid (LAMG) solver is leveraged for computing h_t, since it can handle undirected graphs with negative edge weights and achieves O(m) runtime complexity for solving large graph Laplacian matrices (Livne & Brandt, 2012).
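The power iteration (16) and the sensitivity formula (17) can be sketched as follows. For clarity, a dense pseudoinverse replaces the LAMG solves used in the paper, and the toy 5-node graph is illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n, t = 5, 3
G_edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 4, 1.0), (4, 0, 1.0),
           (0, 2, 1.0), (1, 3, 1.0)]
S_edges = G_edges[:5]                         # directed 5-cycle as the subgraph

def dir_lap(edge_list):
    A = np.zeros((n, n))
    for p, q, w in edge_list:
        A[p, q] = w
    return np.diag(A.sum(axis=1)) - A.T       # column sums are zero

L_G, L_S = dir_lap(G_edges), dir_lap(S_edges)
L_Gu, L_Su = L_G @ L_G.T, L_S @ L_S.T

# t-step power iteration: h_t ~ (L_Su^+ L_Gu)^t h_0.
h = rng.standard_normal(n)
L_Su_pinv = np.linalg.pinv(L_Su)
for _ in range(t):
    h = L_Su_pinv @ (L_Gu @ h)
    h -= h.mean()                             # stay orthogonal to the null space

def basis(i):
    v = np.zeros(n); v[i] = 1.0
    return v

zeta = {}
for p, q, w in G_edges[5:]:                   # off-subgraph edges
    dL = w * np.outer(basis(p) - basis(q), basis(p))  # dL_S = w e_{p,q} e_p^T
    dLu = dL @ L_S.T + L_S @ dL.T                     # first-order dL_Su
    zeta[(p, q)] = h @ dLu @ h                        # sensitivity (17)
```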

4.4. EDGE SPECTRAL SIMILARITIES

To avoid recovering redundant edges into the subgraph, it is indispensable to check the spectral similarities between candidate off-subgraph edges. In other words, only the off-subgraph edges that are spectrally critical (have high spectral sensitivity scores) but not spectrally similar to each other are recovered into the initial subgraph. To this end, we exploit the following spectral embedding of off-subgraph edges using the approximate dominant generalized eigenvector h_t computed by (16):

τ_{p,q} = h_t^T e_{p,q} e_p^T L_S^T h_t,   (18)

which helps estimate the spectral similarities among different off-subgraph edges. To improve accuracy, we can always compute r = O(log n) approximate dominant generalized eigenvectors h_t^(1), ..., h_t^(r) to obtain an r-dimensional spectral embedding vector T_{p,q} for each edge (p,q). The spectral similarity between two edges (p_i, q_i) and (p_j, q_j) is defined as:

β_{i,j} = ||T_{p_i,q_i} − T_{p_j,q_j}|| / max(||T_{p_i,q_i}||, ||T_{p_j,q_j}||).

If (1 − β_{i,j}) < ε for a given constant ε, edge (p_i, q_i) is considered spectrally dissimilar from edge (p_j, q_j).
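The embedding and similarity score can be sketched as follows; we read (18) as τ_{p,q} = (h_t^T e_{p,q})(e_p^T L_S^T h_t), one scalar per random starting vector, which is an assumption about the garbled transposes in the source. Graph and sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n, t, r = 5, 3, 4
G_edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 4, 1.0), (4, 0, 1.0),
           (0, 2, 1.0), (1, 3, 1.0)]
S_edges = G_edges[:5]

def dir_lap(edge_list):
    A = np.zeros((n, n))
    for p, q, w in edge_list:
        A[p, q] = w
    return np.diag(A.sum(axis=1)) - A.T

L_S, L_G = dir_lap(S_edges), dir_lap(G_edges)
L_Gu, L_Su_pinv = L_G @ L_G.T, np.linalg.pinv(L_S @ L_S.T)

def power_vec():
    h = rng.standard_normal(n)
    for _ in range(t):
        h = L_Su_pinv @ (L_Gu @ h)
        h -= h.mean()
    return h

hs = [power_vec() for _ in range(r)]          # r approximate eigenvectors
cand = [(0, 2), (1, 3)]                       # candidate off-subgraph edges
T = {(p, q): np.array([(h[p] - h[q]) * (L_S[:, p] @ h) for h in hs])
     for p, q in cand}                        # tau_{p,q} per random start

Ti, Tj = T[(0, 2)], T[(1, 3)]
beta = np.linalg.norm(Ti - Tj) / max(np.linalg.norm(Ti), np.linalg.norm(Tj))
```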

4.5. ALGORITHM FLOW AND COMPLEXITY

Algorithm 1 Algorithm Flow for Directed Graph Sparsification
Input: L_G, L_S, d_out, iter_max, λ_limit, α, ε
1: Compute the largest generalized eigenvector h_t and the largest generalized eigenvalue λ_max; let iter = 1;
2: while λ_max > λ_limit and iter < iter_max do
3:   Compute the spectral sensitivity ζ_{p,q} of each off-subgraph edge (p,q) ∈ E_{G\S};
4:   Sort the spectral sensitivities in descending order and collect the top α% off-subgraph edges into the edge list E_list = [(p_1, q_1), (p_2, q_2), ...];
5:   E_addlist = Edge_Similarities_Checking(E_list, L_G, L_S, d_out, ε);
6:   Add the edges in E_addlist to the sparsifier, and update L_S, h_t, and λ_max; iter = iter + 1;
7: end while
8: Return L_S.

Algorithm 1 shows the algorithm flow for directed graph sparsification, where L_G is the Laplacian matrix of the original graph, L_S is the Laplacian matrix of the initial spanning tree, d_out is the user-defined outgoing degree limit for each node, iter_max is the maximum number of iterations, λ_limit is the desired maximum generalized eigenvalue, and ε is the edge-similarity threshold. The procedure Edge_Similarities_Checking(E_list, L_G, L_S, d_out, ε) is introduced in the Appendix (Algorithm 2). Since each iteration involves only sparse matrix-vector operations and O(m)-time Laplacian solves, the overall complexity of the flow is nearly linear in the number of edges.
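One iteration of the flow above can be sketched numerically. For clarity, the sketch uses an undirected toy pair (G_u, S_u), where the sensitivity (15) specializes to ζ_{p,q} = w (v_1[p] − v_1[q])^2; all graphs and names are illustrative:

```python
import numpy as np
from scipy.linalg import eigh

def lap(n, edges):
    L = np.zeros((n, n))
    for i, j, w in edges:
        L[i, i] += w; L[j, j] += w
        L[i, j] -= w; L[j, i] -= w
    return L

n = 5
G_edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 4, 1.0),
           (0, 4, 1.0), (0, 2, 1.0), (1, 3, 1.0)]   # 5-cycle + two chords
S_edges = G_edges[:4]                               # spanning path 0-1-2-3-4
L_Gu = lap(n, G_edges)

rng = np.random.default_rng(3)
Q, _ = np.linalg.qr(np.column_stack([np.ones(n), rng.standard_normal((n, n - 1))]))
U = Q[:, 1:]                                        # basis orthogonal to 1

def lam_and_vec(L_Su):
    w, V = eigh(U.T @ L_Gu @ U, U.T @ L_Su @ U)
    return w[-1], U @ V[:, -1]                      # largest eigenpair in R^n

lam_before, v1 = lam_and_vec(lap(n, S_edges))

# Rank the off-subgraph edges by sensitivity and add the best one.
zeta = {(p, q): w * (v1[p] - v1[q]) ** 2 for p, q, w in G_edges[4:]}
best = max(zeta, key=zeta.get)
lam_after, _ = lam_and_vec(lap(n, S_edges + [(best[0], best[1], 1.0)]))

assert lam_before >= 1.0 - 1e-9
assert lam_after <= lam_before + 1e-9               # adding an edge cannot hurt
```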

5. EXPERIMENTAL RESULTS

The proposed algorithm for spectral sparsification of directed graphs has been implemented in MATLAB and C++. Extensive experiments have been conducted to evaluate the proposed method on various types of directed graphs obtained from public-domain data sets (Davis & Hu, 2011). Table 2 shows comprehensive directed graph spectral sparsification results for a variety of real-world directed graphs, where |V_G| (|E_G|) denotes the number of nodes (edges) of the original directed graph G, and |E_S0| and |E_S| denote the numbers of edges in the initial subgraph S_0 and the final spectral sparsifier S, respectively. Note that we directly apply MATLAB's "eigs" function when the graph is relatively small (|E_S0| < 1E4), and otherwise switch to the LAMG solver for better efficiency when calculating the approximate generalized eigenvector h_t. We report the total runtime of the eigensolver using either the LAMG solver or the "eigs" function. λ_1/λ_{1,fin} denotes the reduction of the largest generalized eigenvalue of L_Su^+ L_Gu. Figure 2 shows the runtime scalability with respect to the number of off-subgraph edges (|E_added|) added into the final sparsifier for the graphs gre_1107 (left), big (middle), and gre_115 (right); the runtime scales linearly with the number of added edges for all three graphs. Since there are no other existing directed graph sparsification methods to compare against, we compare the proposed method with the existing undirected graph sparsification solver GRASS (Feng, 2016; 2018). To apply GRASS to the directed graphs, we first convert them into undirected ones (G_u) using A + A^T symmetrization; undirected graph sparsifiers S_u are then generated by GRASS, and the final directed graph sparsifiers are formed by restoring the edge directions in the obtained undirected sparsifiers S_u.
The experimental results are shown in Table 3, where κ represents the relative condition number between the original graph and its final sparsifier. With a similar number of edges kept in the sparsifiers, we observe that our method achieves much better sparsifiers than GRASS does. More results on directed graph sparsification are provided in the Appendix. The directed graph sparsifier can also be utilized directly as a preconditioner in a directed Laplacian solver when solving the linear system of equations Lx = b with iterative methods such as the generalized minimal residual method (GMRES) (Saad & Schultz, 1986). Table 4 shows comprehensive results on GMRES iterations with no preconditioner, with an incomplete LU factorization (ILU) preconditioner, and with the directed sparsifier Laplacian L_S as the preconditioner. The MATLAB functions gmres and ilu with default settings are used in our experiments. "relres" is the relative residual to be achieved by the three methods, "iter" is the GMRES iteration count, "nnz" is the number of non-zeros in the preconditioner, and "NC" indicates no convergence within the maximum number of iterations (500 in our experiments). We conclude that GMRES with the directed graph sparsifier as the preconditioner converges faster than the other two methods, while the sparsifier also has far fewer non-zeros than the ILU preconditioner. More results on the directed graph Laplacian solver can be found in the Appendix. Finally, we also demonstrate applications of the proposed directed graph sparsification in solving directed graph Laplacians with both direct and iterative methods, as well as in computing (personalized) PageRank vectors and directed graph partitioning, in the Appendix.
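The preconditioning experiment can be sketched in Python with SciPy. A hypothetical ring-backbone "sparsifier" stands in for the output of Algorithm 1, and we ground one node (delete its row and column) so both Laplacians become nonsingular; sizes and graphs are illustrative only:

```python
import numpy as np
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import LinearOperator, gmres, splu

rng = np.random.default_rng(4)
n = 60
A = np.zeros((n, n))
for i in range(n):                             # directed ring: strongly connected
    A[i, (i + 1) % n] = 1.0
for i, j in rng.integers(0, n, size=(15, 2)):  # a few extra directed edges
    if i != j:
        A[i, j] += 1.0
L = np.diag(A.sum(axis=1)) - A.T               # directed Laplacian

A_s = np.zeros((n, n))                         # "sparsifier": ring backbone only
for i in range(n):
    A_s[i, (i + 1) % n] = 1.0
L_s = np.diag(A_s.sum(axis=1)) - A_s.T

Lg, Ls = L[1:, 1:], L_s[1:, 1:]                # grounded, nonsingular systems
b = rng.standard_normal(n - 1)

lu = splu(csc_matrix(Ls))                      # factor the sparsifier once
M = LinearOperator((n - 1, n - 1), matvec=lu.solve)

x, info = gmres(csc_matrix(Lg), b, M=M)
assert info == 0                               # converged
assert np.linalg.norm(Lg @ x - b) <= 1e-3 * np.linalg.norm(b)
```

Since the preconditioned operator here differs from the identity by a low-rank term, GMRES converges in a handful of iterations; the same mechanism drives the iteration counts reported in Table 4.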

6. CONCLUSIONS

For the first time, this paper proves the existence of linear-sized spectral sparsifiers for general directed graphs, and proposes a practically-efficient yet unified spectral graph sparsification framework. This novel spectral sparsification approach allows sparsifying real-world, large-scale directed and undirected graphs with guaranteed preservation of the original graph spectral properties. By exploiting a highly-scalable (nearly-linear complexity) spectral matrix perturbation analysis framework for constructing nearly-linear-sized (directed) subgraphs, it enables us to well preserve the key eigenvalues and eigenvectors of the original (directed) graph Laplacians. The proposed method has been validated using various kinds of directed graphs obtained from public-domain sparse matrix collections, showing promising spectral sparsification and partitioning results for general directed graphs.

7. APPENDIX

7.1 PROOF OF THEOREM 3.2

Proof. With L_Gu = L_G L_G^T, each element of L_Gu can be written as L_Gu(i,j) = Σ_k L_G(i,k) L_G(j,k), i.e.,

L_Gu(i,j) = D_G(i,i)^2 + Σ_k A_G(k,i)^2, for i = j;
L_Gu(i,j) = Σ_k A_G(k,i) A_G(k,j) − D_G(i,i) A_G(i,j) − D_G(j,j) A_G(j,i), for i ≠ j.

It can then be shown that the following is always true:

L_Gu(i,i) + Σ_{j≠i} L_Gu(i,j) = Σ_k L_G(i,k) L_G(i,k) + Σ_{j≠i} Σ_k L_G(j,k) L_G(i,k) = Σ_k L_G(i,k) ( L_G(i,k) + Σ_{j≠i} L_G(j,k) ) = 0,

since each column of L_G sums to zero; this indicates that the all-one vector lies in the null space of L_Gu. For directed graphs, it can be shown that if a node has more than one outgoing edge, in the worst case the neighboring nodes pointed to by these outgoing edges will form a clique, possibly with negative edge weights, in the corresponding undirected graph after symmetrization. As the example in Figure 3 shows, when edge e2 is added into the initial graph G that includes a single edge e1, an extra edge (shown as a red dashed line) coupling with e1 is created in the resultant undirected graph G_u; similarly, when edge e3 is further added, two extra edges coupling with e1 and e2 are created in G_u; when the last edge e4 is added, a clique is formed. Moreover, G_u will contain a negative edge weight between nodes i and j whenever

Σ_k A_G(k,i) A_G(k,j) > D_G(i,i) A_G(i,j) + D_G(j,j) A_G(j,i).

In some cases, no clique forms even though all outgoing edges of a node are added into the subgraph, because the weights of these edges satisfy

Σ_k A_G(k,i) A_G(k,j) = D_G(i,i) A_G(i,j) + D_G(j,j) A_G(j,i).
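The clique effect in the proof can be checked numerically: when a node has several outgoing edges, the heads of those edges become mutually coupled, with negative weights, in G_u (toy star graph; illustrative only):

```python
import numpy as np

n = 4
A = np.zeros((n, n))
for j, w in [(1, 1.0), (2, 2.0), (3, 3.0)]:   # node 0 points to nodes 1, 2, 3
    A[0, j] = w
L = np.diag(A.sum(axis=1)) - A.T              # column-sum-zero directed Laplacian
L_Gu = L @ L.T                                # symmetrization

assert np.allclose(L_Gu @ np.ones(n), 0.0)    # all-one null vector
for i, j in [(1, 2), (1, 3), (2, 3)]:         # the three heads form a clique ...
    assert L_Gu[i, j] > 0                     # ... with negative edge weights
```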

7.2. EXISTENCE OF LINEAR-SIZED SPECTRAL SPARSIFIER FOR DIRECTED GRAPHS

Existing spectral sparsification methods for undirected graphs (Spielman & Teng, 2011; Batson et al., 2012; Feng, 2016) cannot handle undirected graphs with negative-weighted edges. We therefore first prove the existence of spectral sparsifiers for undirected graphs with negative-weighted edges, or equivalently, the existence of linear-sized spectral sparsifiers for directed graphs.

Theorem 7.1. For a given directed graph G and the undirected graph G_u = (V, E_Gu, w_Gu) obtained via Laplacian symmetrization, there exists a (1 + ε)-spectral sparsifier S with O(n/ε^2) edges such that the undirected graph S_u = (V, E_Su, w_Su) obtained by symmetrizing S satisfies, for any x ∈ R^n:

(1 − ε) x^T L_Gu x ≤ x^T L_Su x ≤ (1 + ε) x^T L_Gu x.   (24)

Before proving Theorem 7.1 for the symmetrized undirected graph G_u with negative edge weights, we introduce the following lemma (Batson et al., 2012).

Lemma 7.2. Let d > 0, and let u_1, u_2, ..., u_m be a set of vectors in R^n that admit the identity decomposition

Σ_{1≤i≤m} u_i u_i^T = id_{R^n},   (25)

where id_{R^n} ∈ R^{n×n} denotes the identity matrix. Then there exists a set of non-negative coefficients {t_i}_{i=1}^m with |{i : t_i ≠ 0}| ≤ dn such that, for all x ∈ R^n,

x^T id_{R^n} x ≤ Σ_i t_i x^T u_i u_i^T x ≤ (1 + ε) x^T id_{R^n} x.

Proof. Lemma 7.2 proves the existence of a sparsifier for an undirected graph with positive edge weights; we extend it to prove the existence of a sparsifier for a symmetrized undirected graph with negative edge weights. The key of our approach is to construct a set of vectors u_1, ..., u_m in R^n that admit the identity decomposition (25).
To construct the u_i, recall that the Laplacian of an undirected graph can be written as L_G = B^T W B (Spielman & Srivastava, 2011), where B ∈ R^{m×n} is the signed edge-vertex incidence matrix:

B(i, v) = 1 if v is the i-th edge's head; -1 if v is the i-th edge's tail; 0 otherwise,

and W ∈ R^{m×m} is the diagonal weight matrix with W(i, i) = w_i. The Laplacian matrix of a directed graph can be written as L_G = B^T W C, where C ∈ R^{m×n} is an injection matrix defined as:

C(i, v) = 1 if v is the i-th edge's head; 0 otherwise.

Figure 4 shows an example of constructing a directed Laplacian matrix from the B and C matrices. In the following, we show how to construct the vectors u_i. The undirected Laplacian after symmetrization can be written as L_Gu = B^T W_o B with W_o = W C C^T W^T. Since L_Gu and its pseudoinverse L_Gu^+ can be written as

L_Gu = Σ_j λ_j v_j v_j^T,   L_Gu^+ = Σ_{j: λ_j ≠ 0} (1/λ_j) v_j v_j^T,

where λ_j and v_j denote the eigenvalues and eigenvectors of L_Gu, it can be shown that

L_Gu L_Gu^+ = Σ_{j: λ_j ≠ 0} v_j v_j^T = id_{L_Gu},

where id_{L_Gu} is the identity on im(L_Gu) = ker(L_Gu)^⊥. Consequently, the matrix U ∈ R^{n×m} with u_i for i = 1, ..., m as its column vectors can be constructed as

U = [u_1, ..., u_m] = L_Gu^{+/2} B^T W_o^{1/2},

which satisfies the following equation:

U U^T = Σ_i u_i u_i^T = L_Gu^{+/2} B^T W_o B L_Gu^{+/2} = L_Gu^{+/2} L_Gu L_Gu^{+/2} = id_{L_Gu}.

According to Lemma 7.2, we can always construct a diagonal matrix T ∈ R^{m×m} with t_i as its i-th diagonal element. There will be at most O(n/ε^2) positive diagonal elements in T, which allows constructing L_Su = B^T W_o^{1/2} T W_o^{1/2} B, corresponding to the directed subgraph S that achieves the (1 + ε)-spectral approximation of G required by (24). It can be shown that each u_i with a nonzero t_i coefficient corresponds to the outgoing edges of the same node. Consequently, for directed graphs with bounded degrees, there will be O(n/ε^2) directed edges in total in the (1 + ε)-spectral sparsifier S.
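The construction above can be checked numerically. The sketch below (a toy example with hypothetical edge weights, using dense numpy arrays and the conventions L_G = B^T W C, L_Gu = L_G L_G^T) verifies both L_Gu = B^T W_o B and U U^T = id_{L_Gu}:

```python
import numpy as np

# Hypothetical toy digraph: edges given as (tail, head, weight), 0-indexed nodes.
edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 0, 1.5), (0, 2, 0.5)]
n, m = 3, len(edges)
B, C = np.zeros((m, n)), np.zeros((m, n))
W = np.diag([w for _, _, w in edges])
for i, (t, h, _) in enumerate(edges):
    B[i, h], B[i, t] = 1.0, -1.0
    C[i, h] = 1.0

L_G = B.T @ W @ C
L_Gu = L_G @ L_G.T
Wo = W @ C @ C.T @ W.T   # m x m, symmetric PSD (not diagonal in general)

def psd_power(M, p, tol=1e-9):
    """Matrix power of a symmetric PSD matrix via eigendecomposition;
    eigenvalues at (numerical) zero stay zero, giving pseudoinverse
    behavior for negative powers and a range projector for p = 0."""
    lam, Q = np.linalg.eigh(M)
    lam_p = np.zeros_like(lam)
    mask = lam > tol
    lam_p[mask] = lam[mask] ** p
    return Q @ (lam_p[:, None] * Q.T)

U = psd_power(L_Gu, -0.5) @ B.T @ psd_power(Wo, 0.5)  # columns are the u_i
proj = psd_power(L_Gu, 0.0)   # id_{L_Gu}: projector onto im(L_Gu)

print(np.allclose(B.T @ Wo @ B, L_Gu))   # L_Gu = B^T W_o B
print(np.allclose(U @ U.T, proj))        # U U^T = id_{L_Gu}
```

Both identities hold up to floating-point tolerance; note that id_{L_Gu} is the identity only on im(L_Gu), since the all-one vector spans the kernel.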

Algorithm 2 Edge_Similarities_Checking

Input: Elist, L_G, L_S, d_out, ε
1: Perform t-step power iterations with r = O(log n) initial random vectors h_0^(1), ..., h_0^(r) to compute r approximate dominant generalized eigenvectors h_t^(1), ..., h_t^(r);
2: Compute an r-dimensional embedding vector T_{p_i,q_i} ∈ R^r for each (p_i, q_i) ∈ Elist;
3: Let Eaddlist = [(p_1, q_1)];
4: for i = 2 : |Elist| do
5:   Calculate the spectral similarity score β_{i,j} between (p_i, q_i) and every edge (p_j, q_j) in Eaddlist;
6:   if 1 - β_{i,j} < ε for all (p_j, q_j) ∈ Eaddlist then
7:     Eaddlist = [Eaddlist; (p_i, q_i)];
8:   end if
9: end for
10: Return Eaddlist;
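A minimal sketch of the filtering loop in Algorithm 2 is given below. It assumes the r-dimensional edge embeddings T_{p_i,q_i} have already been computed (here they are random placeholders), and it takes the similarity score β to be the cosine similarity between embedding vectors; both are our assumptions, since this excerpt does not fix those details:

```python
import numpy as np

def edge_similarities_checking(elist, emb, eps):
    """Greedy filtering loop (Algorithm 2 sketch): keep an edge only if the
    similarity condition holds against every already-kept edge.
    `emb` maps each edge to its r-dimensional embedding vector."""
    def beta(e1, e2):  # assumed similarity score: cosine similarity
        v1, v2 = emb[e1], emb[e2]
        return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))

    eaddlist = [elist[0]]                 # the first edge is always kept
    for e in elist[1:]:
        if all(1.0 - beta(e, kept) < eps for kept in eaddlist):
            eaddlist.append(e)
    return eaddlist

rng = np.random.default_rng(0)
elist = [(0, 1), (1, 2), (2, 3), (3, 0)]
emb = {e: rng.standard_normal(8) for e in elist}   # placeholder embeddings
kept = edge_similarities_checking(elist, emb, eps=0.1)
print(kept)
```

The loop is O(|Elist| * |Eaddlist|) pairwise checks; the embedding step amortizes the expensive spectral computation across all candidate edges.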

7.4. RESULTS OF DIRECTED GRAPH SPARSIFICATION

Figure 5 shows the spectral sensitivities of all the off-subgraph edges (e2 to e19, shown in blue) in both directed and undirected graphs, calculated using MATLAB's "eigs" function and using the proposed method based on (17) with the LAMG solver, respectively. The spectral sensitivities of all the off-subgraph edges (e2 to e19) with respect to the dominant eigenvalue (λ_max or λ_1) in both directed and undirected graphs are also plotted. We observe that the spectral sensitivities for directed and undirected graphs are drastically different from each other, because the spectral sensitivities of off-subgraph edges in the directed graph depend on the edge directions. It is also observed that the approximate spectral sensitivities calculated by the proposed t-step power iterations with the LAMG solver match the true solution very well for both directed and undirected graphs. We plot the detailed reduction rates of the largest generalized eigenvalue when adding different numbers of off-subgraph edges to the sparsifiers of graphs "gre_115" and "peta" in Figure 6. It shows that the largest generalized eigenvalue can be effectively reduced when enough off-subgraph edges are included in the sparsifier.

7.5. APPLICATIONS IN DEVELOPING DIRECTED LAPLACIAN SOLVER

Consider the solution of the following linear system of equations:

L_G x = b.   (32)

Recent research has focused on efficiently solving this problem when L_G is the Laplacian matrix of an undirected graph (Kelner et al., 2014; Koutis et al., 2010). In this work, we mainly focus on solving nonsymmetric Laplacian matrices that correspond to directed graphs.

7.5.1. DIRECT METHOD FOR DIRECTED LAPLACIAN SOLVER

Lemma 7.3. When solving (32), right preconditioning can be applied, leading to the following alternative linear system of equations:

L_Gu y = b,   (33)

where the right-hand-side vector b lies in the left singular vector space of L_G. Once the solution of (33) is obtained, the solution of (32) is given by x = L_G^T y. It is obvious that solving the above equation is equivalent to solving L_G L_G^T (L_G^T)^+ x = b. In addition, L_Gu is the Laplacian matrix of an undirected graph that can be much denser than L_G. Therefore, we propose to solve the linear system L_Su ỹ = b instead, which effectively approximates (33), since S_u is sparser than G_u and more efficient to solve in practice.

We analyze the solution error based on the generalized eigenvalue problem of L_Gu and L_Su. We have V^T L_Gu V = Λ and V^T L_Su V = I, where V = [v_1, v_2, ..., v_n] and Λ is the diagonal matrix with the generalized eigenvalues λ_i ≥ 1 on its diagonal. Since the error satisfies

L_Gu y - L_Su ỹ = L_Gu (y - ỹ) + (L_Gu - L_Su) ỹ = 0,

we can write the error term as

(ỹ - y) = L_Gu^+ (L_Gu - L_Su) ỹ.

Since ỹ = Σ_i a_i v_i, the error can be further expressed as

(ỹ - y) = Σ_i a_i (1 - 1/λ_i) v_i.   (36)

Therefore, the error term (36) can be considered a combination of high-frequency errors (generalized eigenvectors with respect to high generalized eigenvalues) and low-frequency errors (generalized eigenvectors with respect to low generalized eigenvalues). After applying GS (Gauss-Seidel) relaxations, the high-frequency error terms can be efficiently removed (smoothed), while the low-frequency errors become small because (1 - 1/λ_i) approaches zero as the generalized eigenvalues approach 1. As a result, the error can be effectively eliminated using the above solution smoothing procedure.

Figure 7: GMRES convergence results for graphs (a) pesa and (b) big.
In summary, the proposed directed Laplacian solver consists of the following steps: (a) first extract a spectral sparsifier L_S of the given (un)directed graph Laplacian L_G, so that an approximate solution can be computed by exploiting its symmetrization L_Su = L_S L_S^T, i.e., by solving ỹ = L_Su^+ b instead; (b) improve the approximate solution ỹ by removing the high-frequency errors through a few steps of GS iterations (Briggs, 1987); (c) obtain the final solution as x = L_G^T ỹ. Figure 7 shows the relative residual versus the GMRES iteration number when no preconditioner is applied, when an incomplete LU (ILU) preconditioner is applied, and when the directed sparsifier Laplacian L_S is applied as the preconditioner, for graphs pesa and big. We conclude that GMRES with directed sparsifiers as preconditioners achieves a faster convergence rate than the other two methods. It is also observed that the number of nonzeros (nnz) in the preconditioner matrix created by the directed sparsifier is the lowest.
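The three solver steps above can be sketched as follows. This is a toy illustration, not the paper's implementation: the "sparsifier" is replaced by a diagonally perturbed stand-in for L_Gu, and a small diagonal shift keeps the toy system nonsingular; both are assumptions made purely so the example is self-contained:

```python
import numpy as np

def gauss_seidel(M, b, y, sweeps=50):
    """Plain Gauss-Seidel relaxation for M y = b, starting from y."""
    y = y.copy()
    for _ in range(sweeps):
        for i in range(len(b)):
            y[i] = (b[i] - M[i, :i] @ y[:i] - M[i, i + 1:] @ y[i + 1:]) / M[i, i]
    return y

# Hypothetical directed Laplacian (columns sum to zero) and its symmetrization.
L_G = np.array([[ 1.5, -1.0, -0.5],
                [ 0.0,  1.0, -2.0],
                [-1.5,  0.0,  2.5]])
L_Gu = L_G @ L_G.T + 1e-2 * np.eye(3)   # small shift: keeps the toy system SPD

# Stand-in "sparsifier" Laplacian: a perturbed copy of L_Gu (assumption).
L_Su = L_Gu + 0.3 * np.diag(np.diag(L_Gu))

b = np.array([1.0, -2.0, 1.0])
y_tilde = np.linalg.solve(L_Su, b)             # step (a): cheap approximate solve
res0 = np.linalg.norm(L_Gu @ y_tilde - b)
y_smooth = gauss_seidel(L_Gu, b, y_tilde)      # step (b): GS smoothing
res1 = np.linalg.norm(L_Gu @ y_smooth - b)
x = L_G.T @ y_smooth                           # step (c): recover x
print(res1 < res0)   # smoothing reduces the residual
```

In practice the approximate solve in step (a) would use the actual sparsified Laplacian inside an iterative method such as GMRES, as reported in Figure 7; the sketch only illustrates why GS smoothing improves the sparsifier-based solution.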

7.6. APPLICATIONS IN COMPUTING (PERSONALIZED) PAGERANK VECTORS

The idea of PageRank is to measure the importance of each web page. For example, the PageRank algorithm aims to find the most popular web pages, while the personalized PageRank algorithm aims to find the pages that a given user will most likely visit. Mathematically, the PageRank vector p satisfies the following equation:

p = A_G D_G^{-1} p,

where p is the eigenvector of A_G D_G^{-1} corresponding to the eigenvalue 1. Meanwhile, p represents the stationary distribution of random walks on graph G. However, D_G^{-1} is not defined if there exist nodes with no outgoing edges. To deal with this situation, a self-loop with a small edge weight can be added to each node. The stationary distributions of (un)directed graphs may not be unique; for example, undirected graphs with multiple connected components, or directed graphs with nodes that have no outgoing edges, may have non-unique distributions. In addition, it may take a very long time for a random walk to converge to a stationary distribution on a given (un)directed graph. To avoid such situations in PageRank, a jumping factor α, which describes the probability of jumping to a uniformly random node, can be introduced as follows:

p = (1 - α) A_G D_G^{-1} p + (α/n) 1,
p = (α/n) (I - (1 - α) A_G D_G^{-1})^{-1} 1,

where α ∈ [0, 1] is the jumping constant. After applying a Taylor (Neumann series) expansion, we obtain:

p = (α/n) Σ_i ((1 - α) A_G D_G^{-1})^i 1.

By setting a proper value of α (e.g., α = 0.15), the term (1 - α)^i decays quickly with increasing i. Instead of starting with the uniform vector (α/n) 1, a nonuniform personalization vector p_r can be applied:

p = (1 - α) A_G D_G^{-1} p + α p_r.

Figure 8 shows the application of the proposed directed graph sparsification to computing PageRank vectors, where the correlations between the PageRank results using the original graphs (x-axis) and the sparsifiers (y-axis) are plotted for graphs ibm_32 (left), mathworks_100 (middle) and gre_1107 (right).
Note that a few steps of Gauss-Seidel smoothing have been applied to remove the high-frequency errors and obtain the smoothed PageRank vectors when using the sparsified graphs. We observe that the PageRank vectors obtained from the sparsifiers approximate the results computed with the original graphs very well. Similar to the PageRank results in Figure 8, Figure 9 shows the application of the proposed directed graph sparsification to personalized PageRank, where the correlations between the personalized PageRank results using the original graphs (x-axis) and the sparsifiers (y-axis) are plotted for graphs ibm_32 (left), mathworks_100 (middle) and gre_1107 (right). Gauss-Seidel smoothing is also applied when using the sparsified graphs. We observe that the personalized PageRank vectors from the sparsifiers match the ones generated from the original graphs very well, which demonstrates the effectiveness of the sparsifiers for the personalized PageRank application.
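The damped iteration p = (1 - α) A_G D_G^{-1} p + α p_r above can be sketched as a simple power iteration. The small adjacency matrix here is hypothetical, and dangling nodes are handled with the small self-loops suggested in the text:

```python
import numpy as np

def pagerank(A, alpha=0.15, pr=None, tol=1e-10, max_iter=1000):
    """Power iteration for p = (1 - alpha) * A D^{-1} p + alpha * pr,
    where A[i, j] is the weight of edge j -> i (columns hold out-edges)."""
    n = A.shape[0]
    A = A + 1e-6 * np.eye(n)              # small self-loops for dangling nodes
    M = A / A.sum(axis=0, keepdims=True)  # column-stochastic A D^{-1}
    pr = np.full(n, 1.0 / n) if pr is None else pr
    p = pr.copy()
    for _ in range(max_iter):
        p_new = (1 - alpha) * (M @ p) + alpha * pr
        if np.linalg.norm(p_new - p, 1) < tol:
            break
        p = p_new
    return p_new

# Hypothetical 4-node digraph: entry (i, j) is the weight of edge j -> i.
A = np.array([[0, 1, 0, 1],
              [1, 0, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 0]], dtype=float)
p = pagerank(A)                               # ordinary PageRank
p_pers = pagerank(A, pr=np.array([1.0, 0, 0, 0]))  # personalized to node 0
print(p.sum())   # a probability distribution: sums to 1
```

Since the damping factor contracts the iteration by (1 - α) per step, convergence is geometric regardless of the graph structure, which is exactly why the jumping factor is introduced in the text.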

7.7. APPLICATIONS IN DIRECTED GRAPH PARTITIONING

It has been shown that partitioning and clustering of directed graphs can play significant roles in a variety of applications related to machine learning (Malliaros & Vazirgiannis, 2013), data mining, and circuit synthesis and optimization (Micheli, 1994). However, the efficiency of existing methods for partitioning directed graphs strongly depends on the complexity of the underlying graphs (Malliaros & Vazirgiannis, 2013). In this work, we propose a spectral method for directed graph partitioning. For an undirected graph, the eigenvectors corresponding to the first few smallest eigenvalues can be utilized for spectral partitioning (Spielman & Teng, 1996). For a directed graph G, on the other hand, the left singular vectors of the Laplacian L_G are required for directed graph partitioning. The eigendecomposition of its symmetrization L_Gu can be written as

L_Gu = Σ_i λ_i v_i v_i^T,

where 0 = λ_1 ≤ ... ≤ λ_k and v_1, ..., v_k, with k ≤ n, denote the Laplacian eigenvalues and eigenvectors, respectively. There may be fewer than n such eigenpairs when some nodes have no outgoing edges. In addition, the spectral properties of L_Gu are more complicated, since the eigenvalues often have multiplicity (either algebraic or geometric). For example, the eigenvalues of the symmetrization of the directed graph in Figure 10 have a few multiplicities, λ_2 = λ_3, λ_4 = λ_5 = λ_6 = λ_7, and λ_9 = λ_10, as shown in Figure 11. Therefore, we propose to exploit the eigenvectors (left singular vectors of the directed Laplacian) corresponding to the first few distinct eigenvalues (singular values of the directed Laplacian) for directed graph partitioning. For example, the partitioning result for the directed graph in Figure 10 will depend on the eigenvectors v_1, v_2, v_4, v_8 that correspond to the eigenvalues λ_1, λ_2, λ_4, λ_8.
As shown in Figure 10, the spectral partitioning results can be quite different between directed and undirected graphs with the same set of nodes and edges. In general, it is possible to first extract a spectrally-similar directed sparsifier before any of the existing partitioning algorithms are applied. Since the proposed spectral sparsification algorithm can well preserve the structural (global) properties of the original graph, the partitioning results obtained from the sparsified graph will be very similar to the original ones. Figure 12 shows the spectral graph partitioning results on the symmetrized graph G_u of the original directed graph and on its symmetrized sparsifier S_u. Very similar partitioning results are observed, indicating that the spectral properties are well preserved within the spectrally-sparsified directed graph.
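A minimal sketch of this spectral partitioning procedure is shown below, assuming the L_G = B^T W C and L_Gu = L_G L_G^T conventions from Section 7.2 and a simple sign-based 2-way split on the second eigenvector; the graph (two directed triangles joined by one weak edge) is hypothetical:

```python
import numpy as np

def directed_laplacian(edges, n):
    """Directed Laplacian L_G = B^T W C from (tail, head, weight) edges."""
    m = len(edges)
    B, C = np.zeros((m, n)), np.zeros((m, n))
    W = np.diag([w for _, _, w in edges])
    for i, (t, h, _) in enumerate(edges):
        B[i, h], B[i, t] = 1.0, -1.0
        C[i, h] = 1.0
    return B.T @ W @ C

# Two directed 3-cycles, weakly coupled by a single edge 2 -> 3.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 0, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (5, 3, 1.0),
         (2, 3, 0.1)]
n = 6
L_G = directed_laplacian(edges, n)
L_Gu = L_G @ L_G.T                      # symmetrization
lam, V = np.linalg.eigh(L_Gu)           # eigenvalues in ascending order
labels = (V[:, 1] >= 0).astype(int)     # sign split on the 2nd eigenvector
print(lam[0])   # smallest eigenvalue: numerically zero
```

A production version would use the first few eigenvectors associated with distinct eigenvalues (as proposed above) followed by k-means, and would run on the sparsified Laplacian L_Su rather than the full L_Gu.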



A strongly connected directed graph is a directed graph in which any node can be reached from any other node by following the edge directions. The adjacency matrix of a directed graph is introduced in Section 3.1.



Figure 1: Converting a directed graph G in (I) into undirected graphs using A + A^T as shown in (II), A A^T + A^T A as shown in (III), and the proposed L_G L_G^T symmetrization as shown in (IV).

Figure 2: Runtime scalability for gre_1107 (left), big (middle), gre_115 (right)

Figure 3: Edge coupling during directed Laplacian symmetrization.

Figure 4: Forming a directed Laplacian with B and C matrices.

Figure 5: The spectral sensitivity scores of off-subgraph edges (e2 to e19 in blue) for the undirected (left) and directed graph (right).

Figure 9: The correlation of personalized PageRank between the original graph and its sparsifier for graphs ibm_32 (left), mathworks_100 (middle) and gre_1107 (right) after smoothing.

Figure 11: Eigenvalue distribution of L_Gu for the directed graph in Figure 10.

Figure 12: The partitioning results between G u (left) and its sparsifier S u (right) for the 'ibm32.mtx' graph.

Summary of symbols used in this paper (before symmetrization vs. after symmetrization).

..., d_out, ε); update S_new = S + E_addlist and calculate the largest generalized eigenvector h_t,new and the largest generalized eigenvalue λ_max,new based on L_G and L_S,new; set S = S_new, h_t = h_t,new, λ_max = λ_max,new;

Results of directed graph spectral sparsification

Comparison of spectral sparsification results between the proposed method and GRASS.

Comparison of GMRES results.

Relative errors between the exact and approximate solutions of L_G x = b with or without smoothing. Table 5 shows the results of the directed Laplacian solver on different directed graphs. It reports the relative errors between the exact solution and the solution calculated by the proposed solver with and without smoothing. The errors can be dramatically reduced after smoothing, and our proposed solver can well approximate the true solution of L_G x = b.

