GRAPH LEARNING VIA SPECTRAL DENSIFICATION

Abstract

Graph learning plays an important role in many data mining and machine learning tasks, such as manifold learning, data representation and analysis, dimensionality reduction, clustering, and visualization. For the first time, we present a highly scalable spectral graph densification approach (GRASPEL) for learning graphs from data. By restricting the precision matrix in graphical Lasso to be a graph-Laplacian-like matrix, our approach learns ultra-sparse undirected graphs from potentially high-dimensional input data. A unique property of the graphs learned by GRASPEL is that the spectral embedding (or approximate effective-resistance) distances on the graph encode the similarities between the original input data points. By leveraging the latest high-performance nearly-linear-time spectral methods, ultra-sparse yet spectrally robust graphs can be learned by identifying and including the most spectrally-critical edges into the graph. Compared with prior state-of-the-art graph learning approaches, GRASPEL is more scalable and substantially improves the computational efficiency and solution quality of a variety of data mining and machine learning applications, such as manifold learning, spectral clustering (SC), and dimensionality reduction.

1. INTRODUCTION

Graph learning is playing an increasingly important role in many machine learning and data mining applications. For example, a key step in many existing machine learning methods is converting potentially high-dimensional data sets into graph representations: it is common practice to represent each (high-dimensional) data point as a node, and to assign each edge a weight encoding the similarity between the two nodes (data points). The constructed graphs can be efficiently leveraged to represent the underlying structure of a data set or the relationships between data points (Jebara et al., 2009; Maier et al., 2009; Liu et al., 2018). However, learning meaningful graphs from large data sets at scale remains a challenging problem. Several recent graph learning methods leverage emerging graph signal processing (GSP) techniques for estimating sparse graph Laplacians and show very promising results (Dong et al., 2016; Egilmez et al., 2017; Dong et al., 2019; Kalofolias & Perraudin, 2019). For example, (Egilmez et al., 2017) addresses the graph learning problem by restricting the precision matrix to be a graph Laplacian and computing a maximum a posteriori (MAP) estimate of an attractive Gaussian Markov random field (GMRF)¹, with an l1-regularization term used to promote graph sparsity; (Rabbat, 2017) provides an error analysis for inferring sparse graphs from smooth signals; (Kalofolias & Perraudin, 2019) leverages approximate nearest-neighbor (ANN) graphs to reduce the number of optimization variables; (Kumar et al., 2019) introduces a graph Laplacian learning method that imposes Laplacian spectral constraints. However, even the state-of-the-art Laplacian estimation methods do not scale well to large data sets due to their extremely high algorithmic complexity.
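To make the graphical Lasso connection concrete, the sketch below evaluates the standard penalized Gaussian log-likelihood that graphical Lasso maximizes, using a Laplacian-plus-diagonal matrix as the precision matrix (the kind of Laplacian-like restriction described above). The function name and the toy 3-node example are illustrative, not from the paper.

```python
import numpy as np

def graphical_lasso_objective(Theta, S, lam):
    """Penalized Gaussian log-likelihood maximized by graphical Lasso:
       log det(Theta) - tr(S @ Theta) - lam * ||Theta||_1 (off-diagonal),
    where S is the sample covariance and Theta the precision matrix."""
    sign, logdet = np.linalg.slogdet(Theta)
    assert sign > 0, "Theta must be positive definite"
    off_diag = Theta - np.diag(np.diag(Theta))
    return logdet - np.trace(S @ Theta) - lam * np.abs(off_diag).sum()

# A 3-node path graph's Laplacian plus a small diagonal shift, playing the
# role of the Laplacian-like precision matrix (hypothetical toy example).
L = np.array([[ 1., -1.,  0.],
              [-1.,  2., -1.],
              [ 0., -1.,  1.]])
Theta = L + 0.1 * np.eye(3)
S = np.linalg.inv(Theta)      # covariance consistent with this precision
val = graphical_lasso_objective(Theta, S, lam=0.05)
```

Note that the Laplacian-constrained formulations cited above restrict the search space of Theta; the objective itself is the same as in (Friedman et al., 2008).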
For example, solving the optimization problem for Laplacian estimation in (Dong et al., 2016; Kalofolias, 2016; Egilmez et al., 2017; Dong et al., 2019) requires O(N^2) time per iteration for N data entities, as well as nontrivial parameter tuning for controlling graph sparsity, which limits these methods to very small data sets (e.g., up to a few thousand data points); the method introduced in (Carey, 2017) leverages Isomap manifold embedding (Tenenbaum et al., 2000) for graph construction, which requires O(N^3) time for manifold construction and thus does not scale to large data sets; the latest graph learning approach (Kalofolias & Perraudin, 2019) takes advantage of ANN graphs but still runs very slowly on large data sets; the Laplacian estimation method with spectral constraints (Kumar et al., 2019) requires a good graph structure to be provided in advance, which otherwise can be very costly due to exhaustive graph structure searches. This work introduces, for the first time, a spectral graph densification approach (GRASPEL) for learning ultra-sparse graphs from data by leveraging the latest results in spectral graph theory (Feng, 2016; 2018; Zhao et al., 2018). GRASPEL is closely connected with prior GSP-based Laplacian estimation methods (Dong et al., 2016; Kalofolias, 2016; Egilmez et al., 2017; Kalofolias & Perraudin, 2019; Dong et al., 2019) and with the graphical Lasso method (Friedman et al., 2008). By treating M-dimensional data points as M graph signals, GRASPEL efficiently solves a convex problem by iteratively identifying and including the most spectrally-critical edges into the latest graph, leveraging recent nearly-linear-time spectral methods (Feng, 2016; 2018; Zhao et al., 2018).
Compared with prior spectral graph sparsification algorithms (Spielman & Srivastava, 2011; Feng, 2016), which aim to remove edges from a given graph while preserving key graph spectral properties, GRASPEL aims to add edges into the graph so that the learned graph's spectral embedding (or effective-resistance) distances encode the distances between the original input data points. Compared with state-of-the-art graph learning methods, GRASPEL is more scalable for estimating attractive Gaussian Markov random fields (GMRFs), even for very large data sets. We summarize the contributions of this work as follows:
• We propose a spectral graph densification approach (GRASPEL) that allows efficient estimation of attractive Gaussian Markov random fields (GMRFs) by leveraging the latest spectral graph theory.
• We show that the graphical Lasso problem with a Laplacian-like precision matrix can be efficiently solved by including spectrally-critical edges to dramatically reduce spectral embedding distortions.
• The key to achieving high efficiency is a spectral embedding scheme for finding spectrally-critical edges, allowing each GRASPEL iteration to be completed in O(N log N) instead of O(N^2) time.
• For the first time, we introduce a novel convergence criterion for graph learning tasks based on graph spectral stability: when the maximum embedding distortion becomes small enough, or equivalently when the graph spectrum becomes sufficiently stable, GRASPEL iterations can be terminated.
• Our experimental results show that the graphs learned from high-dimensional data using GRASPEL lead to more efficient and accurate spectral clustering (SC) as well as dimensionality reduction.
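The densification idea above can be sketched in a few lines: embed the current graph spectrally, then pick the candidate edge whose nodes are far apart in the embedding yet close in the data. This is a simplified, illustrative stand-in for GRASPEL's distortion criterion; a dense eigensolver and an exhaustive pair scan replace the paper's nearly-linear-time spectral methods, and all function names are hypothetical.

```python
import numpy as np

def laplacian(W):
    """Graph Laplacian L = D - W for a symmetric weight matrix W."""
    return np.diag(W.sum(axis=1)) - W

def spectral_embedding(L, k=2):
    """Embed nodes with the k nontrivial Laplacian eigenvectors, scaled
    by 1/sqrt(eigenvalue) so that squared embedding distances
    approximate effective-resistance distances."""
    vals, vecs = np.linalg.eigh(L)
    return vecs[:, 1:k + 1] / np.sqrt(vals[1:k + 1])

def most_critical_edge(W, X):
    """Among currently unconnected node pairs, return the pair maximizing
    (embedding distance) / (data distance) -- nodes far on the graph but
    close in the data, i.e., a spectrally-critical candidate edge."""
    U = spectral_embedding(laplacian(W))
    n = W.shape[0]
    best, best_score = None, -np.inf
    for p in range(n):
        for q in range(p + 1, n):
            if W[p, q] > 0:
                continue                      # skip existing edges
            emb = np.sum((U[p] - U[q]) ** 2)  # ~ effective resistance
            data = np.sum((X[p] - X[q]) ** 2) + 1e-12
            if emb / data > best_score:
                best, best_score = (p, q), emb / data
    return best

# 4-node path graph; node 3 is far from node 0 on the graph but close
# to it in the data, so (0, 3) is the most spectrally-critical candidate.
W = np.zeros((4, 4))
for p, q in [(0, 1), (1, 2), (2, 3)]:
    W[p, q] = W[q, p] = 1.0
X = np.array([[0.0], [1.0], [2.0], [0.1]])
edge = most_critical_edge(W, X)
```

Adding the returned edge (and re-embedding) would shrink the largest embedding distortion, which is the densification step GRASPEL iterates until the spectra stabilize.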

2. BACKGROUND OF GRAPH LEARNING VIA LAPLACIAN ESTIMATION

Given M observations on N data entities in a data matrix X = [x_1, ..., x_M] ∈ R^(N×M), each column vector of X can be considered a signal on a graph. For example, the USPS data set, which includes 9,298 images of handwritten digits with 256 pixels each, results in a feature matrix X ∈ R^(N×M) with N = 9,298 and M = 256. Recent GSP-based graph learning methods (Dong et al., 2016) estimate graph Laplacians from X to achieve the following desired characteristics.
Smoothness of Graph Signals. The graph signals corresponding to real-world data should be sufficiently smooth on the learned graph structure: the signal values should change only gradually across connected neighboring nodes. The smoothness of a signal x over an undirected graph G = (V, E, w) can be measured with the following Laplacian quadratic form:
x^T L x = Σ_{(p,q)∈E} w_{p,q} (x(p) - x(q))^2,   (1)
where L = D - W denotes the Laplacian matrix of graph G, with D and W denoting the degree and weighted adjacency matrices of G, and w_{p,q} denotes the weight of edge (p, q). A smaller value of (1) indicates a smoother signal across the graph. To quantify the smoothness Q of a set of signals X over graph G, the following matrix trace can be computed (Kalofolias, 2016): Q(X, L) = Tr(X^T L X), where Tr(•) denotes the matrix trace.
Sparsity of the Estimated Graph (Laplacian). Graph sparsity is another critical consideration in graph learning. One of the most important motivations for learning a graph is to use it in downstream data mining or machine learning tasks. Therefore, desirable graph learning algorithms should allow better capturing and understanding of the global structure (manifold) of the data set, while producing sufficiently sparse graphs.
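The smoothness measure Q(X, L) = Tr(X^T L X) can be computed either in trace form or as the equivalent weighted sum over edges from (1); the short sketch below checks both against each other on a hypothetical 3-node graph with M = 2 signals.

```python
import numpy as np

def smoothness(X, W):
    """Q(X, L) = Tr(X^T L X): total smoothness of the M graph signals
    (columns of X) over the graph with weighted adjacency matrix W."""
    L = np.diag(W.sum(axis=1)) - W       # L = D - W
    return np.trace(X.T @ L @ X)

# Toy graph: edges (0,1) with weight 1 and (1,2) with weight 2.
W = np.array([[0., 1., 0.],
              [1., 0., 2.],
              [0., 2., 0.]])
# Two signals on 3 nodes (one per column).
X = np.array([[0.0, 1.0],
              [0.5, 1.1],
              [0.6, 1.3]])

q_trace = smoothness(X, W)
# Equivalent edge-sum form of equation (1), summed over both signals.
q_edges = sum(W[p, q] * np.sum((X[p] - X[q]) ** 2)
              for p in range(3) for q in range(p + 1, 3))
```

Smaller values of either quantity indicate signals that vary more gradually across connected nodes, which is exactly the smoothness criterion the Laplacian estimation methods above optimize.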



¹ If the precision matrix of a GMRF is an M-matrix (with all non-positive off-diagonal entries), we call it an attractive GMRF (Slawski & Hein, 2015; Dong et al., 2019).

