A GENERALIZED EIGENGAME WITH EXTENSIONS TO DEEP MULTIVIEW REPRESENTATION LEARNING

Abstract

Generalized Eigenvalue Problems (GEPs) encompass a range of interesting dimensionality reduction methods. Development of efficient stochastic approaches to these problems would allow them to scale to larger datasets. Canonical Correlation Analysis (CCA) is one example of a GEP for dimensionality reduction which has found extensive use in problems with two or more views of the data. Deep learning extensions of CCA require large mini-batch sizes, and therefore large memory consumption, to achieve good performance in the stochastic setting, and this has limited their application in practice. Inspired by the Generalized Hebbian Algorithm, we develop an approach to solving stochastic GEPs in which all constraints are softly enforced by Lagrange multipliers. Then, by considering the integral of this Lagrangian function (its pseudo-utility), and inspired by recent formulations of Principal Components Analysis and GEPs as games with differentiable utilities, we develop a game-theoretic approach to solving GEPs. We show that our approaches share much of the theoretical grounding of the previous Hebbian and game-theoretic approaches in the linear case, but that our method permits extension to general function approximators such as neural networks for certain GEPs for dimensionality reduction, including CCA; our method can therefore be used for deep multiview representation learning. We demonstrate the effectiveness of our method for solving GEPs in the stochastic setting on canonical multiview datasets and demonstrate state-of-the-art performance for optimizing Deep CCA.

1. INTRODUCTION

A Generalised Eigenvalue Problem (GEP) is defined by two symmetric (or, more generally, Hermitian; see Stewart & Sun (1990)) matrices $A, B \in \mathbb{R}^{d \times d}$. They are usually characterised by the set of solutions to the equation

$$Aw = \lambda Bw \tag{1}$$

with $\lambda \in \mathbb{R}$, $w \in \mathbb{R}^d$, called (generalised) eigenvalue and (generalised) eigenvector respectively. Note that by taking $B = I$ we recover the standard eigenvalue problem. We shall only be concerned with the case where $B$ is positive definite, to avoid degeneracy; in this case one can find a basis of eigenvectors spanning $\mathbb{R}^d$. Without loss of generality, take $w_1, \dots, w_d$ to be such a basis of eigenvectors, with decreasing corresponding eigenvalues $\lambda_1 \geq \dots \geq \lambda_d$. The following variational characterisation (Stewart & Sun, 1990) provides a useful alternative, iterative definition: $w_k$ solves

$$\max_{w \in \mathbb{R}^d} \; w^\top A w \quad \text{subject to} \quad w^\top B w = 1, \;\; w^\top B w_j = 0 \;\; \text{for } j = 1, \dots, k-1. \tag{2}$$

There is also a simpler (non-iterative) variational characterisation for the top-$k$ subspace (that spanned by $\{w_1, \dots, w_k\}$), namely

$$\max_{W \in \mathbb{R}^{d \times k}} \; \operatorname{trace}(W^\top A W) \quad \text{subject to} \quad W^\top B W = I_k, \tag{3}$$

again see Stewart & Sun (1990); the drawback of this characterisation is that it only recovers the top-$k$ subspace and not the individual eigenvectors. We shall see that these two different characterisations lead to different algorithms for the GEP.
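To make these characterisations concrete, the following is a minimal NumPy/SciPy sketch, not part of the paper's method, that solves a small dense GEP with `scipy.linalg.eigh` and checks the defining equation (1) as well as the $B$-orthonormality and trace properties of the top-$k$ subspace from (3); the random test matrices and the choices of $d$ and $k$ are illustrative assumptions.

```python
# Minimal sketch: solve a small dense GEP and verify characterisations (1) and (3).
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
d, k = 5, 2  # illustrative dimensions

# Random symmetric A and positive-definite B, as assumed in the text.
M = rng.standard_normal((d, d))
A = (M + M.T) / 2
N = rng.standard_normal((d, d))
B = N @ N.T + d * np.eye(d)  # the shift keeps B comfortably positive definite

# eigh solves Aw = lambda * Bw directly; eigenvalues come back in ascending order,
# with eigenvectors normalised so that W^T B W = I.
eigvals, W = eigh(A, B)
eigvals, W = eigvals[::-1], W[:, ::-1]  # reorder so lambda_1 >= ... >= lambda_d

# Equation (1) for the top eigenpair.
assert np.allclose(A @ W[:, 0], eigvals[0] * (B @ W[:, 0]))

# Characterisation (3): the top-k eigenvectors are B-orthonormal and attain
# trace(W^T A W) = lambda_1 + ... + lambda_k, the maximum over B-orthonormal W.
Wk = W[:, :k]
assert np.allclose(Wk.T @ B @ Wk, np.eye(k))
assert np.isclose(np.trace(Wk.T @ A @ Wk), eigvals[:k].sum())
```

In the stochastic setting targeted by this paper, $A$ and $B$ are only available through mini-batch estimates, which is precisely the regime in which such dense solvers become impractical.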

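As a concrete instance of the abstract's statement that CCA is a GEP, recall the standard identity (stated here for orientation; the covariance-block notation is ours): for two views $x \in \mathbb{R}^{d_x}$ and $y \in \mathbb{R}^{d_y}$ with covariance blocks $\Sigma_{xx}, \Sigma_{xy}, \Sigma_{yy}$, the canonical directions are recovered by the GEP (1) with

$$A = \begin{pmatrix} 0 & \Sigma_{xy} \\ \Sigma_{yx} & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} \Sigma_{xx} & 0 \\ 0 & \Sigma_{yy} \end{pmatrix},$$

where $w = (u^\top, v^\top)^\top$ stacks the projection vectors of the two views and the positive generalised eigenvalues are the canonical correlations; $B$ is positive definite whenever the per-view covariances are, e.g. after regularisation.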
