A NEW FRAMEWORK FOR TENSOR PCA BASED ON TRACE INVARIANTS

Abstract

We consider the Principal Component Analysis (PCA) problem for tensors T ∈ (R^n)^{⊗k} of large dimension n and arbitrary order k ≥ 3. It consists in recovering a spike v_0^{⊗k} (built from a signal vector v_0 ∈ R^n) corrupted by a Gaussian noise tensor Z ∈ (R^n)^{⊗k}, so that T = β v_0^{⊗k} + Z, where β is the signal-to-noise ratio. In this paper, we propose a new framework based on tools developed by the theoretical-physics community to address this important problem. These tools are trace invariants of tensors, built by judicious contractions (an extension of the matrix product) of the indices of the tensor T. Inspired by these tools, we introduce a new process that builds, for each invariant, a matrix whose top eigenvector is correlated with the signal for β sufficiently large. We then give examples of classes of invariants for which we demonstrate that this correlation occurs above the best algorithmic threshold known so far (β ≥ n^{k/4}). This method has several algorithmic advantages: (i) it provides a detection algorithm that is linear in time and has only O(1) memory requirements; (ii) the algorithms are well suited to parallel architectures and have considerable potential for optimization, given the simplicity of the mathematical tools involved; (iii) experimental results show an improvement over the state of the art for symmetric tensor PCA. We provide experimental results for these different cases that match well with our theoretical findings.

1. INTRODUCTION

Powerful computers and acquisition devices have made it possible to capture and store real-world multidimensional data. In practical applications (Kolda & Bader (2009)), analyzing and organizing these high-dimensional arrays (formally called tensors) leads to the well-known curse of dimensionality (Gao et al. (2017), Suzuki (2019)). Thus, dimensionality reduction is frequently employed to transform a high-dimensional data set by projecting it into a lower-dimensional space while retaining most of the information and underlying structure. One of these techniques is Principal Component Analysis (PCA), which has made remarkable progress in a large number of areas thanks to its simplicity and adaptability (Jolliffe & Cadima (2016); Seddik et al. (2019)). In Tensor PCA, as introduced by Richard & Montanari (2014), we consider a model where we attempt to detect and retrieve an unknown unit vector v_0 from noise-corrupted multilinear measurements arranged in the form of a tensor T. Using the notations defined below, our model is T = β v_0^{⊗k} + Z, with Z a pure Gaussian noise tensor of order k and dimension n with independent and identically distributed (iid) standard Gaussian entries Z_{i_1,i_2,...,i_k} ∼ N(0, 1), and β the signal-to-noise ratio. Many methods have been proposed to solve this important problem. However, practical applications require optimizable and parallelizable algorithms that avoid the high computational cost caused by the unsatisfactory scalability of some of these methods. A summary of the time and space requirements of some existing methods can be found in Anandkumar et al. (2017). One way to obtain such parallelizable algorithms is through methods based on tensor contractions (Kim et al. (2018)), which are extensions of the matrix product. In recent years, tools based on tensor contractions have been developed by theoretical physicists, for whom random tensors have emerged as a generalization of random matrices.
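As a concrete illustration of the spiked model, here is a minimal numpy sketch (the helper name spiked_tensor is ours, not from the paper) that samples T = β v_0^{⊗k} + Z for a random unit signal vector:

```python
import numpy as np

def spiked_tensor(n, k, beta, seed=None):
    """Sample T = beta * v0^{tensor k} + Z with iid N(0,1) noise entries."""
    rng = np.random.default_rng(seed)
    v0 = rng.standard_normal(n)
    v0 /= np.linalg.norm(v0)               # unit signal vector
    # Rank-one spike v0 (x) v0 (x) ... (x) v0 via repeated outer products.
    spike = v0
    for _ in range(k - 1):
        spike = np.multiply.outer(spike, v0)
    Z = rng.standard_normal((n,) * k)      # pure Gaussian noise tensor
    return beta * spike + Z, v0
```

Note that this sketch uses an asymmetric noise tensor; the symmetric variant discussed later would require symmetrizing Z over all index permutations.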
In this paper, we investigate the algorithmic threshold of tensor PCA and some of its variants using this theoretical-physics approach, and we show that it leads to new insights and knowledge in tensor PCA. Tensor PCA and tensor decomposition (the recovery of multiple spikes) are motivated by the increasing number of problems in which it is crucial to exploit the tensorial structure (Sidiropoulos et al.). Recently, tensor decomposition was successfully used to address important problems in unsupervised learning (learning latent variable models, in particular latent Dirichlet allocation, Anandkumar et al. (2014), Anandkumar et al. (2015)), supervised learning (training of two-layer neural networks, Janzamin et al. (2015)) and reinforcement learning (Azizzadenesheli et al. (2016)). More recently, a fundamentally different set of mathematical tools, developed for tensors in the context of high-energy physics, has been used to approach the problem. These tools are trace invariants of degree d ∈ N, obtained by contracting pairs of indices of d copies of the tensor T. They were used in Evnin (2020) to study the highest eigenvalue of a real symmetric Gaussian tensor. Subsequently, Gurau (2020) provided a theoretical study of a function based on an infinite sum of these invariants. These results suggest a phase transition for the highest eigenvalue of a tensor at β around n^{1/2}, similar to the BBP transition in the matrix case (Baik et al. (2005)). This function therefore allows the detection of a spike; however, evaluating it involves computing an integral over an n-dimensional space, which may not be possible in polynomial time. The contribution of this paper is to use these invariant tools to build tractable algorithms with polynomial complexity. In contrast to Gurau (2020), instead of using a sum of an infinite number of invariants, we select a single trace invariant with convenient properties to build our algorithms. This lets us detect the presence of the signal in linear time and with a space requirement in O(1). Moreover, in order to recover the signal vector besides simply detecting it, we introduce new tools in the form of matrices associated with this specific invariant.

Within this framework, we show, as particular cases, that the two simplest graphs (of degree two) yield algorithms similar to tensor unfolding and to the homotopy algorithm (which is equivalent to averaged gradient descent). These two algorithms are the main practical ones known from the point of view of space and time requirements (Anandkumar et al. (2017) provides a comparison table).

Related work Tensor PCA was introduced by Richard & Montanari (2014), where the authors suggested and analyzed different methods to recover the signal vector, such as matrix unfolding and power iteration. Since then, various other methods have been proposed. Hopkins et al. (2015) introduced algorithms based on the sum-of-squares hierarchy, with the first proven algorithmic threshold of n^{k/4}. However, this class of algorithms generally requires substantial computing resources and relies on complex mathematical tools, which makes its algorithmic optimization difficult. Other methods have been inspired by different perspectives, such as homotopy (Anandkumar et al. (2017)), statistical physics (Arous et al. (2020), Ros et al. (2019), Wein et al. (2019) and Biroli et al. (2020)), quantum computing (Hastings (2020)) and statistical queries (Dudeja & Hsu (2020)).

Notations We use bold characters T, M, v for tensors, matrices and vectors, and T_{ijk}, M_{ij}, v_i for their components. [p] denotes the set {1, . . . , p}. A real tensor T is of order k if it is a member of a tensor product of k real spaces R^{n_i}, i ∈ [k]: T ∈ ⊗_{i=1}^{k} R^{n_i}. It is symmetric if T_{i_1...i_k} = T_{τ(i_1)...τ(i_k)} for all τ ∈ S_k, where S_k is the symmetric group (more details are provided in Appendix ??). For a vector v ∈ R^n, v^{⊗p} ≡ v ⊗ v ⊗ · · · ⊗ v ∈ ⊗^p R^n denotes its p-th tensor power. ⟨v, w⟩ denotes the scalar product of v and w. The operator norm, which coincides with the highest eigenvalue of a tensor of any order, is defined by ‖X‖_op ≡ max{X_{i_1,...,i_k} (w_1)_{i_1} · · · (w_k)_{i_k} : ‖w_i‖ ≤ 1, i ∈ [k]}. The trace of a matrix A is denoted Tr(A). We denote the expectation of a random variable X by E(X) and its variance by σ(X). We say that a function f is negligible compared to a positive function g, and write f = o(g), if f/g → 0 as n → ∞.

Einstein summation convention It is important to keep in mind that throughout the paper we follow the Einstein summation convention: when an index variable appears twice in a single term and is not otherwise defined, it implies summation of that term over all values of the index. For example, T_{ijk} T_{ijk} ≡ Σ_{ijk} T_{ijk} T_{ijk}. This is a common convention when addressing tensor problems and helps make the equations more comprehensible.
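The index contractions written in Einstein notation above, such as the degree-2 invariant T_{ijk} T_{ijk}, map directly onto numpy's einsum. As an illustrative sketch (the helper name degree2_invariants is ours), here are two degree-2 contractions of an order-3 tensor:

```python
import numpy as np

def degree2_invariants(T):
    """Illustrative degree-2 contractions of an order-3 tensor T
    (Einstein notation: repeated indices are summed over)."""
    # Full contraction T_ijk T_ijk: a scalar trace invariant.
    scalar = np.einsum('ijk,ijk->', T, T)
    # Partial contraction M_ab = T_aij T_bij: an n x n matrix whose
    # top eigenvector can carry information about the planted spike.
    M = np.einsum('aij,bij->ab', T, T)
    return scalar, M
```

np.einsum performs exactly the summation prescribed by the repeated indices; note that fully contracting the matrix in turn, Tr(M), recovers the scalar invariant.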

