THE SYMMETRIC GENERALIZED EIGENVALUE PROBLEM AS A NASH EQUILIBRIUM

Abstract

The symmetric generalized eigenvalue problem (SGEP) is a fundamental concept in numerical linear algebra. It captures the solution of many classical machine learning problems such as canonical correlation analysis, independent components analysis, partial least squares, linear discriminant analysis, principal components analysis, and others. Despite this, most general solvers are prohibitively expensive when dealing with streaming data sets (i.e., minibatches), and research has instead concentrated on finding efficient solutions to specific problem instances. In this work, we develop a game-theoretic formulation of the top-k SGEP whose Nash equilibrium is the set of generalized eigenvectors. We also present a parallelizable algorithm with guaranteed asymptotic convergence to the Nash. Current state-of-the-art methods require O(d^2 k) runtime complexity per iteration, which is prohibitively expensive when the number of dimensions (d) is large. We show how to modify this parallel approach to achieve O(dk) runtime complexity. Empirically, we demonstrate that the resulting algorithm is able to solve a variety of SGEP problem instances, including a large-scale analysis of neural network activations.

1. INTRODUCTION

This work considers the symmetric generalized eigenvalue problem (SGEP),

    Av = λBv,    (1)

where A is symmetric and B is symmetric, positive definite. While the SGEP is not a common sight in modern machine learning literature, remarkably, it underlies several fundamental problems. Most obviously, when A = X^T X, B = I, and X is a data matrix, we recover the ubiquitous SVD/PCA. However, by considering other forms of A and B we recover other well-known problems. In general, we assume A and B consist of sums or expectations over outer products (e.g., X^T Y or E[xy^T]) to enable efficient matrix-vector products. These include, but are not limited to:

Canonical Correlation Analysis (CCA): Given a dataset of paired observations (or views) x ∈ R^{d_x} and y ∈ R^{d_y} (e.g., gene expressions x and medical imaging y corresponding to the same patient), CCA returns the linear projections of x and y that are maximally correlated. CCA is particularly useful for learning multi-modal representations of data and in semi-supervised learning (McWilliams et al., 2013); it is effectively the multi-view generalization of PCA (Guo & Wu, 2019), where A and B contain the cross- and auto-covariances of the two views, respectively:

    A = [ 0          E[xy^T] ]      B = [ E[xx^T]    0       ]
        [ E[yx^T]    0       ]          [ 0          E[yy^T] ]

* Asterisk denotes equal contribution. † Work done while at DeepMind.
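As a concrete illustration of the definitions above, the following sketch uses SciPy's dense solver `scipy.linalg.eigh(A, B)` to solve equation (1) on small random matrices, checks the B-orthonormality of the resulting eigenvectors, and builds the CCA block matrices from two synthetic views. The data shapes, the ridge term, and all variable names here are illustrative choices, not part of the paper; the dense solver also exemplifies exactly the kind of general-purpose routine whose cost the paper's streaming algorithm is designed to avoid.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
d, n = 5, 100

# Build a symmetric A and a symmetric positive-definite B.
X = rng.standard_normal((n, d))
A = X.T @ X                      # symmetric (an outer-product sum, as in the text)
M = rng.standard_normal((d, d))
B = M @ M.T + d * np.eye(d)      # symmetric positive definite

# Solve A v = lambda B v directly with a dense solver.
lam, V = eigh(A, B)              # eigenvalues ascending; columns of V are eigenvectors

# Each column satisfies the SGEP, and the eigenvectors are B-orthonormal.
assert np.allclose(A @ V[:, 0], lam[0] * B @ V[:, 0])
assert np.allclose(V.T @ B @ V, np.eye(d), atol=1e-8)

# CCA as an SGEP: assemble the cross- and auto-covariance blocks of two views.
x = rng.standard_normal((n, 3))
y = rng.standard_normal((n, 2))
Cxy = x.T @ y / n
Cxx = x.T @ x / n + 1e-3 * np.eye(3)   # small ridge keeps B positive definite
Cyy = y.T @ y / n + 1e-3 * np.eye(2)
A_cca = np.block([[np.zeros((3, 3)), Cxy],
                  [Cxy.T, np.zeros((2, 2))]])
B_cca = np.block([[Cxx, np.zeros((3, 2))],
                  [np.zeros((2, 3)), Cyy]])
rho, W = eigh(A_cca, B_cca)
# The largest generalized eigenvalue is the top canonical correlation (<= 1).
```

Note that `eigh(A, B)` costs O(d^3) per solve on dense inputs; the matrix-vector-product structure mentioned in the text (A and B as sums of outer products) is what makes the paper's cheaper per-iteration updates possible.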

