EIGENGAME: PCA AS A NASH EQUILIBRIUM

Abstract

We present a novel view on principal component analysis (PCA) as a competitive game in which each approximate eigenvector is controlled by a player whose goal is to maximize their own utility function. We analyze the properties of this PCA game and the behavior of its gradient-based updates. The resulting algorithm, which combines elements of Oja's rule with a generalized Gram-Schmidt orthogonalization, is naturally decentralized and hence parallelizable through message passing. We demonstrate the scalability of the algorithm with experiments on large image datasets and neural network activations. We discuss how this new view of PCA as a differentiable game can lead to further algorithmic developments and insights.

1. INTRODUCTION

The principal components of data are the vectors that align with the directions of maximum variance. These have two main purposes: a) as interpretable features and b) for data compression. Recent methods for principal component analysis (PCA) focus on the latter, explicitly stating objectives to find the k-dimensional subspace that captures maximum variance (e.g., (Tang, 2019)), and leaving the problem of rotating within this subspace to, for example, a more efficient downstream singular value decomposition (SVD) step¹. This point is subtle, yet critical. For example, any pair of two-dimensional, orthogonal vectors spans all of R² and, therefore, captures maximum variance of any two-dimensional dataset. However, for these vectors to be principal components, they must, in addition, align with the directions of maximum variance, which depend on the covariance of the data. By learning the optimal subspace, rather than the principal components themselves, objectives focused on subspace error ignore the first purpose of PCA. In contrast, modern nonlinear representation learning techniques focus on learning features that are both disentangled (uncorrelated) and low-dimensional (Chen et al., 2016; Mathieu et al., 2018; Locatello et al., 2019; Sarhan et al., 2019).

It is well known that the PCA solution for a d-dimensional dataset X ∈ R^{n×d} is given by the eigenvectors of XᵀX or, equivalently, the right singular vectors of X. Impractically, the cost of computing the full SVD scales as O(min{nd², n²d}) time and O(nd) space (Shamir, 2015; Tang, 2019). For moderately sized data, randomized methods can be used (Halko et al., 2011). Beyond this, stochastic, or online, methods based on Oja's rule (Oja, 1982) or power iterations (Rutishauser, 1971) are common. Another option is to use streaming k-PCA algorithms such as Frequent Directions (FD) (Ghashami et al., 2016) or Oja's algorithm² (Allen-Zhu and Li, 2017) with storage complexity O(kd).
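The stated equivalence, eigenvectors of XᵀX versus right singular vectors of X, can be checked numerically. Below is a minimal sketch with NumPy; the synthetic dataset and its dimensions are illustrative assumptions, not data from the paper:

```python
import numpy as np

# Hypothetical small dataset: n = 200 samples, d = 5 features,
# with distinct per-feature scales so eigenvalues are well separated.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ np.diag([3.0, 2.0, 1.0, 0.5, 0.1])

# Route 1: eigenvectors of X^T X, columns sorted by descending eigenvalue.
evals, evecs = np.linalg.eigh(X.T @ X)
order = np.argsort(evals)[::-1]
components_eig = evecs[:, order]

# Route 2: right singular vectors of X (rows of Vt).
_, _, Vt = np.linalg.svd(X, full_matrices=False)
components_svd = Vt.T

# The two bases agree up to the sign of each eigenvector, so the
# magnitude of each column-wise dot product is 1.
agreement = np.abs(np.sum(components_eig * components_svd, axis=0))
print(np.allclose(agreement, 1.0))  # True
```

Note that either route recovers the actual principal components, not merely the top-k subspace; this is exactly the distinction the text draws between subspace-error objectives and PCA proper.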
Sampling or sketching methods also scale well but, again, focus on the top-k subspace (Sarlos, 2006; Cohen et al., 2017; Feldman et al., 2020).

In contrast to these approaches, we view each principal component (equivalently, eigenvector) as a player in a game whose objective is to maximize their own local utility function in controlled competition with other vectors. The proposed utility gradients are interpretable as a combination of Oja's rule and a generalized Gram-Schmidt process. We make the following contributions:

• A novel formulation of PCA as finding the Nash equilibrium of a suitable game,
• A sequential, globally convergent algorithm for approximating the Nash on full-batch data,

¹ After learning the top-k subspace V ∈ R^{d×k}, the rotation can be recovered via an SVD of XV.
² FD approximates the top-k subspace; Oja's algorithm approximates the top-k eigenvectors.
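The combination of Oja's rule with a Gram-Schmidt process mentioned above can be illustrated with a generic stochastic update. This is only a sketch of that classical pairing, not the EigenGame utility gradient itself; the function name `oja_deflation_step`, the learning rate, and the synthetic data are all hypothetical choices:

```python
import numpy as np

def oja_deflation_step(V, x, lr=0.01):
    """One stochastic update of k eigenvector estimates on a sample x.

    Illustrative sketch: every column of V takes an Oja-style Hebbian
    step on x, then a Gram-Schmidt pass re-orthonormalizes the columns
    so that later vectors are deflated against earlier ones.
    """
    k = V.shape[1]
    y = x @ V                        # projections of x onto each estimate, shape (k,)
    V = V + lr * np.outer(x, y)      # Oja-style update for all columns at once
    # Generalized Gram-Schmidt: orthogonalize column j against columns i < j.
    for j in range(k):
        for i in range(j):
            V[:, j] -= (V[:, i] @ V[:, j]) * V[:, i]
        V[:, j] /= np.linalg.norm(V[:, j])
    return V

# Usage on synthetic data with a clear eigenvalue gap.
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 4)) @ np.diag([3.0, 1.5, 0.5, 0.1])
V = np.linalg.qr(rng.normal(size=(4, 2)))[0]   # random orthonormal start
for x in X:
    V = oja_deflation_step(V, x, lr=0.005)

# Compare against the exact top right singular vector of X.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
print(np.abs(V[:, 0] @ Vt[0]))  # alignment close to 1
```

Because the update touches only a d×k matrix per sample, its storage cost matches the O(kd) streaming methods discussed above; the sequential deflation structure is also what makes a player-by-player, decentralized formulation natural.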

