SPARSE QUANTIZED SPECTRAL CLUSTERING

Abstract

Given a large data matrix, sparsifying, quantizing, and/or performing other entrywise nonlinear operations can have numerous benefits, ranging from speeding up iterative algorithms for core numerical linear algebra problems to providing nonlinear filters to design state-of-the-art neural network models. Here, we exploit tools from random matrix theory to make precise statements about how the eigenspectrum of a matrix changes under such nonlinear transformations. In particular, we show that very little change occurs in the informative eigenstructure, even under drastic sparsification/quantization, and consequently that very little downstream performance loss occurs when working with very aggressively sparsified or quantized spectral clustering problems. We illustrate how these results depend on the nonlinearity, we characterize a phase transition beyond which spectral clustering becomes possible, and we show when such nonlinear transformations can introduce spurious non-informative eigenvectors.

1. INTRODUCTION

Sparsifying, quantizing, and/or performing other entry-wise nonlinear operations on large matrices can have many benefits. Historically, this has been used to develop iterative algorithms for core numerical linear algebra problems (Achlioptas & McSherry, 2007; Drineas & Zouzias, 2011). More recently, it has been used to design better neural network models (Srivastava et al., 2014; Dong et al., 2019; Shen et al., 2020). A concrete example, amenable to theoretical analysis and ubiquitous in practice, is provided by spectral clustering, which can be solved by retrieving the dominant eigenvectors of X^T X, for X = [x_1, ..., x_n] ∈ R^{p×n} a large data matrix (Von Luxburg, 2007). When the amount of data n is large, the Gram "kernel" matrix X^T X can be enormous, impractical even to form, and it leads to computationally unaffordable algorithms. For instance, the Lanczos iteration, which operates through repeated matrix-vector multiplications, suffers from an O(n^2) complexity (Golub & Loan, 2013) and quickly becomes burdensome. One approach to overcoming this limitation is simple subsampling: dividing X into subsamples of size εn, for some ε ∈ (0, 1), on which one performs parallel computation, and then recombining. This yields a computational gain, but at the cost of degraded performance, since each data point x_i loses the cumulative effect of being compared to the whole dataset. An alternative cost-reduction procedure consists in uniformly randomly "zeroing-out" entries of the matrix X^T X, resulting in a sparse matrix with only an ε fraction of nonzero entries. For spectral clustering, by focusing on the eigenspectrum of the "zeroed-out" matrix, Zarrouk et al. (2020) showed that the same computational gain can be achieved at the cost of much milder performance degradation: for n/p rather large, almost no degradation is observed down to very small values of ε (e.g., ε ≈ 2% for n/p = 100).
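The uniform zeroing-out procedure can be illustrated with a minimal numpy sketch. This is not the paper's exact experimental setup: the two-class Gaussian mixture, the class-mean strength, and the kept fraction ε = 10% are illustrative assumptions, and the Bernoulli mask is symmetrized so the sparsified Gram matrix remains symmetric.

```python
# Minimal sketch (illustrative assumptions, not the paper's exact setup):
# spectral clustering from the dominant eigenvector of a uniformly
# "zeroed-out" Gram matrix, for a symmetric two-class Gaussian mixture.
import numpy as np

rng = np.random.default_rng(0)
p, n, eps = 64, 1000, 0.1            # dimension, sample size, kept fraction

# Two-class mixture x_i = y_i * mu + noise, with labels y_i in {-1, +1}
y = np.sign(rng.standard_normal(n))
mu = 3 * np.ones(p) / np.sqrt(p)     # class-mean direction (assumed strength)
X = np.outer(mu, y) + rng.standard_normal((p, n))   # p x n data matrix

G = X.T @ X / p                      # full Gram "kernel" matrix, n x n

# Keep each entry independently with probability eps (symmetrized mask),
# rescaled by 1/eps so the sparse matrix is unbiased in expectation.
upper = rng.random((n, n)) < eps
mask = np.triu(upper) | np.triu(upper, 1).T
G_sparse = G * mask / eps

# Cluster with the sign of the dominant eigenvector (eigh sorts ascending)
_, vecs = np.linalg.eigh(G_sparse)
v = np.sign(vecs[:, -1])
acc = max(np.mean(v == y), np.mean(-v == y))   # accuracy up to label flip
```

With only about 10% of the Gram entries retained, the dominant eigenvector of the sparsified matrix still aligns well with the class labels in this well-separated regime, consistent with the mild-degradation phenomenon described above.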
Previous efforts showed that it is often advantageous to perform sparsification/quantization in a nonuniform manner, rather than uniformly (Achlioptas & McSherry, 2007; Drineas & Zouzias, 2011). The focus there, however, is often on (non-asymptotic bounds of) the approximation error between

