EFFECTIVE SUBSPACE INDEXING VIA INTERPOLATION ON STIEFEL AND GRASSMANN MANIFOLDS

Abstract

We propose a novel local Subspace Indexing Model with Interpolation (SIM-I) for low-dimensional embedding of image data sets. SIM-I is constructed in two steps: in the first step we build a piece-wise linear affinity-aware subspace model under a given partition of the data set; in the second step we interpolate between several adjacent linear subspace models constructed previously, using the "center of mass" calculation on Stiefel and Grassmann manifolds. The resulting subspace indexing model built by SIM-I is a globally nonlinear low-dimensional embedding of the original data set. Furthermore, the interpolation step produces a "smoothed" version of the piece-wise linear embedding mapping constructed in the first step, and can be viewed as a regularization procedure. We provide experimental results validating the effectiveness of SIM-I: it improves PCA recovery on the SIFT data set and nearest-neighbor classification success rates on the MNIST and CIFAR-10 data sets.



Classical subspace methods of this kind seek globally linear subspace models. They therefore fail to capture the nonlinearity of the intrinsic data manifold, and ignore the local variation of the data (Saul & Roweis (2003), Strassen (1969)). Consequently, these globally linear models are often ineffective for search problems on large-scale image data sets. To resolve this difficulty, nonlinear methods such as kernel algorithms (Ham et al. (2004)) and manifold learning algorithms (Belkin et al. (2006), Guan et al. (2011)) have been proposed. However, even though these nonlinear methods significantly improve recognition performance, they face a serious computational challenge on large-scale data sets, due to the cost of matrix decompositions whose size grows with the number of training samples. Here we propose a simple method, the Subspace Indexing Model with Interpolation (SIM-I), that produces from a given data set a piece-wise linear, locality-aware, and globally nonlinear low-dimensional embedding. SIM-I is constructed in two steps: in the first step we build a piece-wise linear affinity-aware subspace model under a given partition of the data set; in the second step we interpolate between several adjacent linear subspace models constructed previously, using the "center of mass" calculation on Stiefel and Grassmann manifolds (Edelman et al. (1999), Kaneko et al. (2013), Marrinan et al. (2014)). The interpolation step outputs a "smoothed" version (Figure 1) of the original piece-wise linear model, and can be regarded as a regularization process.
Compared to the previously mentioned subspace methods, SIM-I enjoys the following advantages: (1) it captures the global nonlinearity of the data set as well as its local fluctuations; (2) it is computationally feasible for large-scale data sets, since it avoids matrix decompositions at the size of the full training set; (3) it includes a regularization step via interpolating between several adjacent pieces of subspace models. Numerical experiments on a PCA recovery task for the SIFT data set and on nearest-neighbor classification tasks for the MNIST and CIFAR-10 data sets further validate the effectiveness of SIM-I.
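To make the interpolation step concrete, one common way to compute a Grassmann "center of mass" of neighboring subspaces is the chordal (projection) mean, in the spirit of Marrinan et al. (2014): average the orthogonal projectors of the local subspaces and take the top-d eigenvectors of the result. The sketch below illustrates this particular construction; it is one standard choice of center-of-mass computation, not necessarily the exact variant used at every stage of SIM-I.

```python
import numpy as np

def grassmann_chordal_mean(bases, weights=None):
    """Chordal (projection) mean of subspaces on the Grassmann manifold.

    bases: list of D x d matrices with orthonormal columns, each spanning
           one local subspace.  Returns a D x d orthonormal basis of the
           averaged subspace: the top-d eigenvectors of the (weighted)
           mean of the orthogonal projectors P_k = U_k U_k^T.
    """
    D, d = bases[0].shape
    if weights is None:
        weights = np.full(len(bases), 1.0 / len(bases))
    P_mean = np.zeros((D, D))
    for w, U in zip(weights, bases):
        P_mean += w * (U @ U.T)
    # Eigen-decomposition of the symmetric mean projector; np.linalg.eigh
    # returns eigenvalues in ascending order, so reverse to take the top d.
    eigvals, eigvecs = np.linalg.eigh(P_mean)
    return eigvecs[:, ::-1][:, :d]

# Tiny usage example: average two nearby 1-D subspaces in R^3.
U1 = np.array([[1.0], [0.0], [0.0]])
U2 = np.array([[np.cos(0.2)], [np.sin(0.2)], [0.0]])
U_mean = grassmann_chordal_mean([U1, U2])
```

For two equally weighted one-dimensional subspaces, this mean is spanned by the angle bisector of the two lines, which matches the intuition of "smoothing" between adjacent pieces.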

2. PIECE-WISE LINEAR LOCALITY PRESERVING PROJECTION (LPP) MODEL

If an image data point $x \in \mathbb{R}^D$ is represented as a vector in a very high-dimensional space, we want to find a low-dimensional embedding $y = f(x) \in \mathbb{R}^d$, $d \ll D$, such that the embedding function $f$ retains some meaningful properties of the original image data set, ideally close to its intrinsic dimension. If we restrict ourselves to linear maps of the form $y = W^T x \in \mathbb{R}^d$, where $W = (w_{ij})_{1 \le i \le D,\, 1 \le j \le d}$ is a $D \times d$ projection matrix (assumed full rank), then such a procedure is called a linear low-dimensional embedding (see Roweis & Saul (2000); Van Der Maaten et al. (2009)). The goal is to search for a "good" projection matrix $W$, such that the projection $x \mapsto y = W^T x$ preserves certain locality in the data set (this is called a Locality Preserving Projection, or LPP; see He & Niyogi (2003)). The locality is interpreted as a kind of intrinsic relative geometric relation between the data points in the original high-dimensional space, usually represented by the affinity matrix $S = (s_{ij})_{1 \le i,j \le n}$, a symmetric matrix with non-negative entries. As an example, given unlabelled data points $x_1, \ldots, x_n \in \mathbb{R}^D$, we can take
$$s_{ij} = \exp\!\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right) \text{ when } \|x_i - x_j\| < \varepsilon, \quad \text{and } s_{ij} = 0 \text{ otherwise.}$$
Here $\sigma > 0$, $\varepsilon > 0$ is a small threshold parameter, and $\|x_i - x_j\|$ is the Euclidean norm in $\mathbb{R}^D$. Based on the affinity matrix $S = (s_{ij})$, the search for the projection matrix $W$ can be formulated as the following optimization problem
$$\min_W \; \phi(W) = \frac{1}{2} \sum_{i,j=1}^{n} s_{ij}\, \|y_i - y_j\|^2, \qquad (1)$$
in which $y_i = W^T x_i$, $y_j = W^T x_j$, and the norm $\|y_i - y_j\|$ is taken in the projected space $\mathbb{R}^d$. Usually when $\|x_i - x_j\|$ is large the affinity $s_{ij}$ is small, and vice versa. Thus (1) seeks an embedding matrix $W$ such that close pairs of image points $x_i$ and $x_j$ are mapped to close pairs of embeddings $y_i = W^T x_i$ and $y_j = W^T x_j$. This helps to preserve the local geometry of the data set, i.e., the locality.
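The affinity construction and the objective $\phi(W)$ above can be sketched directly; the data, $\sigma$, and $\varepsilon$ below are illustrative placeholders. A useful sanity check is the standard identity $\phi(W) = \mathrm{tr}(W^T X L X^T W)$, where $L = D - S$ is the graph Laplacian introduced later in this section.

```python
import numpy as np

def affinity_matrix(X, sigma=1.0, eps=1.5):
    """Heat-kernel affinity: s_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))
    when ||x_i - x_j|| < eps, and s_ij = 0 otherwise.  X is D x n."""
    n = X.shape[1]
    S = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            dist = np.linalg.norm(X[:, i] - X[:, j])
            if i != j and dist < eps:
                S[i, j] = np.exp(-dist**2 / (2 * sigma**2))
    return S

def lpp_objective(W, X, S):
    """phi(W) = (1/2) sum_ij s_ij ||W^T x_i - W^T x_j||^2 from (1)."""
    Y = W.T @ X  # d x n matrix of embeddings y_i = W^T x_i
    n = X.shape[1]
    return 0.5 * sum(S[i, j] * np.linalg.norm(Y[:, i] - Y[:, j])**2
                     for i in range(n) for j in range(n))
```

The double loop is written for clarity rather than speed; a vectorized pairwise-distance computation would be used in practice.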
To solve (1), we introduce a weighted fully-connected graph $G$ whose vertex set consists of the data points $x_1, \ldots, x_n$, with the weight on the edge connecting $x_i$ and $x_j$ given by $s_{ij} \ge 0$. The minimization then reduces to the generalized eigenvalue problem
$$X L X^T w = \lambda\, X D X^T w, \qquad (2)$$
where $X = [x_1, \ldots, x_n] \in \mathbb{R}^{D \times n}$, and $D$ and $L = D - S$ are the degree matrix and graph Laplacian of $G$ (defined below). Assume we have obtained an increasing family of eigenvalues $0 = \lambda_0 < \lambda_1 \le \ldots \le \lambda_{n-1}$, with corresponding eigenvectors $w_0, w_1, \ldots, w_{n-1}$. Then the low-dimensional embedding matrix can be taken as $W = [w_1, \ldots, w_d]$, discarding the trivial eigenvector $w_0$ associated with $\lambda_0 = 0$.



Subspace selection methods have been successful in many application problems related to dimension reduction (Zhou et al. (2010), Bian & Tao (2011), Si et al. (2010), Zhang et al. (2009)), with applications including, e.g., human face recognition (Fu & Huang (2008)) and speech and gait recognition (Tao et al. (2007)). The classical approaches to subspace selection in dimension reduction include algorithms such as Principal Component Analysis (PCA, see Jolliffe (2002)) and Linear Discriminant Analysis (LDA, see Belhumeur et al. (1997), Tao et al.).

Figure 1: The idea of "smoothing" a piece-wise linear low-dimensional embedding model: (a) the piece-wise linear low-dimensional embedding model built from LPP; (b) the regularized low-dimensional embedding obtained by taking the Stiefel/Grassmann manifold center of mass among adjacent linear pieces.

Consider the diagonal matrix $D = \mathrm{diag}(D_{11}, \ldots, D_{nn})$ where $D_{ii} = \sum_{j=1}^{n} s_{ij}$, and introduce the graph Laplacian $L = D - S$. Then the minimization problem (1), together with the normalization constraint $\sum_{i=1}^{n} D_{ii}\, y_i^2 = 1$, reduces to the generalized eigenvalue problem (2) above (see He & Niyogi (2003)).
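The reduction above can be sketched end to end, using `scipy.linalg.eigh` for the symmetric-definite generalized eigenproblem (2). The affinity matrix $S$ is assumed given; the small ridge term added to $X D X^T$ is an implementation choice for numerical stability, not part of the formulation above.

```python
import numpy as np
from scipy.linalg import eigh

def lpp_projection(X, S, d, ridge=1e-8):
    """Solve X L X^T w = lambda X D X^T w and return the eigenvectors of
    the d smallest non-trivial eigenvalues as a D x d matrix W.

    X : D x n data matrix;  S : n x n symmetric non-negative affinity matrix.
    """
    Dmat = np.diag(S.sum(axis=1))   # degree matrix, D_ii = sum_j s_ij
    L = Dmat - S                    # graph Laplacian
    A = X @ L @ X.T
    B = X @ Dmat @ X.T + ridge * np.eye(X.shape[0])  # keep B positive definite
    eigvals, eigvecs = eigh(A, B)   # eigenvalues in ascending order
    # Discard the bottom (trivial) eigenvector, keep the next d columns.
    return eigvecs[:, 1:d + 1]
```

Each returned column $w$ satisfies $X L X^T w = \lambda\, X D X^T w$ for its eigenvalue $\lambda$, and the columns are orthonormal with respect to the $X D X^T$ inner product.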

