TENSOR-BASED SKETCHING METHOD FOR THE LOW-RANK APPROXIMATION OF DATA STREAMS

Abstract

Low-rank approximation in data streams is a fundamental and significant task in computer science, machine learning and statistics. Multiple streaming algorithms have emerged over the years, and most of them are inspired by randomized algorithms, more specifically, sketching methods. However, many of these algorithms cannot leverage the information contained in data streams and consequently suffer from low accuracy. Existing data-driven methods improve accuracy, but their training cost is expensive in practice. In this paper, from a subspace perspective, we propose a tensor-based sketching method for low-rank approximation of data streams. The proposed algorithm fully exploits the structure of data streams and obtains quasi-optimal sketching matrices by performing tensor decomposition on training data. A series of experiments shows that the proposed tensor-based method can be more accurate and much faster than previous work.

1. INTRODUCTION

There are many scenarios that require batch or real-time processing of data streams arising from, e.g., video (Cyganek & Woźniak, 2017; Das, 2021), signal flow (Cichocki et al., 2015; Sidiropoulos et al., 2017), hyperspectral images (Wang et al., 2017; Zhang et al., 2019) and numerical simulations (Zhang et al., 2022; Larcher & Klein, 2019). A data stream can be seen as an ordered sequence of data continuously generated from one or several distributions (Muthukrishnan, 2005; Indyk et al., 2019), and the data per time slot can usually be represented as a matrix. Therefore, most processing methods for data streams can be considered as operations on matrices, such as matrix multiplication, linear system solution and low-rank approximation. Among these, low-rank matrix approximation plays an important role in practical applications such as independent component analysis (ICA) (Stone, 2002; Hyvärinen, 2013), principal component analysis (PCA) (Karamizadeh et al., 2020; Jolliffe & Cadima, 2016) and image denoising (Guo et al., 2015; Zhang et al., 2019). In this work, we consider low-rank approximation of matrices from a data stream. Specifically, let {A_d ∈ R^{m×n}}_{d=1}^{D} be matrices from a data stream D; then the low-rank approximation problem in D can be described as:

min_{B_d} ∥A_d − B_d∥_F,  s.t. rank(B_d) ≤ r,   (1.1)

where d = 1, 2, ..., D, ∥·∥_F denotes the Frobenius norm, and r ∈ Z_+ is a user-specified target rank.
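For concreteness, the exact solution of problem (1.1) for a single matrix is its truncated rank-r SVD (by the Eckart-Young theorem, discussed below). A minimal NumPy sketch of this baseline, for illustration only:

```python
import numpy as np

def truncated_svd_approx(A, r):
    """Best rank-r approximation of A in Frobenius norm (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

# Example: a matrix of exact rank 3 is recovered by its rank-3 truncation.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 40))
B = truncated_svd_approx(A, 3)
print(np.linalg.norm(A - B))  # essentially zero
```

Computing this full SVD for every matrix in the stream is exactly the cost that sketching methods aim to avoid.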

Related work.

A direct approach to solving problem (1.1) is to compute the truncated rank-r singular value decomposition (SVD) of each A_d in turn; the Eckart-Young theorem ensures that this yields the best rank-r approximation (Eckart & Young, 1936). However, it is too expensive to compute the truncated rank-r SVD of A_d one by one for all d = 1, 2, ..., D, particularly when m or n is large. To address this issue, many sketching algorithms have emerged, such as the SCW algorithm (Sarlos, 2006; Clarkson & Woodruff, 2009; 2017). Unfortunately, a notable weakness of sketching algorithms is that they incur higher error than the best low-rank approximation, especially when the sketching matrix is generated randomly from some distribution, such as the Gaussian, Cauchy, or Rademacher distribution (Indyk, 2006; Woolfe et al., 2008; Clarkson & Woodruff, 2009; Halko et al., 2011; Clarkson & Woodruff, 2017). To improve accuracy, a natural idea is to preprocess the past data (seen as a training set) in order to better handle future input matrices (seen as a test set). This approach, often called the data-driven approach, has gained more attention lately. For low-rank approximation, the pioneering work is (Indyk et al., 2019), which proposed a learning-based method that we henceforth refer to as IVY. In the IVY method, the sketching matrix is set to be sparse, and the values of its non-zero entries are learned instead of being set randomly as in classical methods. Specifically, learning is done by stochastic gradient descent (SGD), optimizing a loss function that measures the quality of the low-rank approximation obtained by the SCW algorithm mentioned above. To further improve accuracy, (Liu et al., 2020) followed the line of IVY by additionally optimizing the locations of the non-zero entries of the sketching matrix S, not only their values.
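The sparse sketching matrices referred to here typically follow the CountSketch pattern of Clarkson & Woodruff: one random ±1 entry per column, with IVY learning the values of these non-zero entries. A minimal construction of such a matrix (illustrative only; the function name is ours):

```python
import numpy as np

def countsketch_matrix(k, m, rng):
    """Sparse k-by-m sketching matrix with exactly one random
    +/-1 entry per column (the classical CountSketch pattern)."""
    S = np.zeros((k, m))
    rows = rng.integers(0, k, size=m)        # row index of each column's nonzero
    signs = rng.choice([-1.0, 1.0], size=m)  # random sign of each nonzero
    S[rows, np.arange(m)] = signs
    return S

rng = np.random.default_rng(0)
S = countsketch_matrix(20, 100, rng)
print((S != 0).sum(axis=0))  # exactly one nonzero per column
```

In the data-driven setting, these non-zero values (and, in (Liu et al., 2020), their positions) become the trainable parameters.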
Recently, (Indyk et al., 2021) proposed a few-shot data-driven low-rank approximation algorithm, motivated by reducing the training time cost of (Indyk et al., 2019). They proposed an algorithm, named FewShotSGD, that uses SGD to minimize a new loss function measuring the subspace distance between the sketching matrix S and the left SVD factor matrices of all training matrices. However, these data-driven approaches all involve learning mechanisms, which require iterations during the optimization process. This raises a question: can we design an efficient, e.g., non-iterative, method to obtain a better sketching matrix with both short training time and high approximation quality? This would be an important step in the development of data-driven methods, especially in scenarios requiring low latency.

Our contributions. In this work, we propose a new data-driven approach for low-rank approximation of data streams, motivated by a subspace perspective. Specifically, we observe that a perfect sketching matrix S ∈ R^{k×m} should be close to the top-k subspace of U_d, where U_d is the left SVD factor matrix of A_d. The relevance among matrices in a data stream allows us to develop a single sketching matrix S that approximates the top-k subspace of U_d for all d = 1, ..., D, suggesting that heavy learning mechanisms can be eliminated. In fact, our approach obtains the sketching matrix by minimizing a new loss function that is a relaxation of the one in IVY. Crucially, we can minimize this loss function by performing tensor decomposition on the training set, which is non-iterative. We refer to this method as the tensor-based method. As an extension of the main approach, we also develop a two-sided tensor-based algorithm, which involves two sketching matrices S and W; these can be obtained simultaneously by performing tensor decomposition once.
Both algorithms are significantly faster and more accurate than the previous data-driven approaches.
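The subspace observation above can be checked numerically: if the rows of S span the top-k left singular subspace of A, the projection residual ∥(I − P_S) U_k∥_F vanishes, whereas it is large for a random sketch. A minimal illustration (the helper `subspace_residual` is ours, not from the paper):

```python
import numpy as np

def subspace_residual(S, A, k):
    """||(I - P_S) U_k||_F, where P_S projects onto the row space of S
    and U_k holds the top-k left singular vectors of A."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    Uk = U[:, :k]
    Q, _ = np.linalg.qr(S.T)  # orthonormal basis for the row space of S
    return np.linalg.norm(Uk - Q @ (Q.T @ Uk))

rng = np.random.default_rng(0)
A = rng.standard_normal((60, 40))
k = 5
U, _, _ = np.linalg.svd(A, full_matrices=False)
ideal_S = U[:, :k].T                    # rows span the top-k subspace exactly
random_S = rng.standard_normal((k, 60)) # a generic random sketch
print(subspace_residual(ideal_S, A, k))   # ~0
print(subspace_residual(random_S, A, k))  # strictly larger
```

The proposed method seeks one S that keeps this residual small simultaneously for all matrices in the stream.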

2. PRELIMINARIES

The SCW algorithm. Randomized SVD is an efficient approach for computing the low-rank approximation of matrices from a data stream. For example, the SCW algorithm, proposed by Sarlos, Clarkson and Woodruff (Sarlos, 2006; Clarkson & Woodruff, 2009; 2017), is a classical randomized SVD algorithm. The algorithm only computes the SVDs of the compressed matrices SA and AV, and its time cost is O(r^2(m + n)) when we set k = O(r). The detailed procedure is shown in Algorithm 1.

Algorithm 1 The SCW algorithm (Sarlos, 2006; Clarkson & Woodruff, 2009; 2017).
Input: matrix A ∈ R^{m×n}, sketching matrix S ∈ R^{k×m}, and target rank r < min{m, n}
1: [∼, ∼, V^T] ← full SVD of SA
2: [AV]_r ← truncated rank-r SVD of AV
3: Â ← [AV]_r V^T
Output: low-rank approximation Â of A

In (Clarkson & Woodruff, 2009), it is proved that if S satisfies the property of the Johnson-Lindenstrauss lemma, then k = O(r log(1/δ)/ε) suffices for the output Â to satisfy ∥A − Â∥_F ≤ (1 + ε)∥A − [A]_r∥_F with probability at least 1 − δ.
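A direct NumPy transcription of Algorithm 1 may clarify the procedure (a sketch; here S is drawn Gaussian with k = 2r, one of the classical random choices mentioned above):

```python
import numpy as np

def scw(A, S, r):
    """SCW low-rank approximation: sketch A with S, then work only with
    the small matrices S @ A (k-by-n) and A @ V (m-by-k)."""
    _, _, Vt = np.linalg.svd(S @ A, full_matrices=False)  # step 1: V^T from SVD of SA
    AV = A @ Vt.T
    U, s, Wt = np.linalg.svd(AV, full_matrices=False)     # step 2: truncated SVD of AV
    AV_r = (U[:, :r] * s[:r]) @ Wt[:r, :]
    return AV_r @ Vt                                       # step 3: A_hat = [AV]_r V^T

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 100))
r, k = 10, 20
S = rng.standard_normal((k, 200))  # Gaussian sketching matrix, k = O(r)
A_hat = scw(A, S, r)

# Compare against the optimal rank-r error from a full truncated SVD.
_, s, _ = np.linalg.svd(A)
opt = np.sqrt((s[r:] ** 2).sum())
print(np.linalg.norm(A - A_hat) / opt)  # modestly above 1
```

The output is always a valid rank-r matrix, so its error can only exceed the optimum; the guarantee above bounds that excess by a (1 + ε) factor when k is chosen large enough.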

