GRASSMANNIAN CLASS REPRESENTATION IN DEEP LEARNING

Abstract

We generalize the class representative vector found in deep classification networks to linear subspaces and show that the new formulation enables the simultaneous enhancement of inter-class discrimination and intra-class feature variation. Traditionally, the logit is computed as the inner product between a feature and the class vector. In our formulation, classes are subspaces and the logit is defined as the norm of the projection of a feature onto the class subspace. Since the set of subspaces forms a Grassmann manifold, finding the optimal subspace representation for classes amounts to optimizing the loss on a Grassmannian. We integrate Riemannian SGD into existing deep learning frameworks so that the class subspaces on the Grassmannian are jointly optimized with the other model parameters in Euclidean space. Compared to the vector form, subspaces have two appealing properties: they can be multi-dimensional and they are scaleless. Empirically, we show that these distinct characteristics improve various tasks. (1) Image classification. The new formulation raises the top-1 accuracy of ResNet50-D on ImageNet-1K from 78.04% to 79.37% using the standard augmentation in 100 training epochs, confirming that subspaces have stronger representative capability than vectors. (2) Feature transfer. Subspaces give features the freedom to vary, and we observe that the intra-class variability of features increases with the subspace dimension. Consequently, the features are of higher quality for downstream tasks: the average transfer accuracy across 6 datasets improves from 77.98% to 80.12% over the strong baseline of vanilla softmax. (3) Long-tail classification. The scaleless property of subspaces benefits classification in the long-tail scenario, improving the accuracy on ImageNet-LT from 46.83% to 48.94% compared to the standard formulation. With these encouraging results, we believe that more applications could benefit from the Grassmannian class representation. Code will be released.

1. INTRODUCTION

The idea of representing classes as linear subspaces in machine learning dates back at least to 1973 (Watanabe & Pakvasa, 1973), yet it is largely ignored in the current deep learning literature. In this paper, we revisit the scheme of representing classes as linear subspaces in the deep learning context. Specifically, each class $i$ is associated with a linear subspace $S_i$, and for any feature vector $x$, the $i$-th class logit is defined as the norm of the projection

$$l_i := \|\mathrm{proj}_{S_i} x\|. \tag{1}$$

Since a subspace is a point on a Grassmann manifold (Absil et al., 2009), we call this formulation the Grassmannian class representation. In the following, we answer two critical questions:

1. Is the Grassmannian class representation useful in real applications?

2. How do we optimize the subspaces during training?
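To make Eq. (1) concrete, here is a minimal PyTorch sketch (the function and variable names are illustrative, not from the paper). It assumes each subspace $S_i$ is represented by an orthonormal basis $B_i$, in which case $\mathrm{proj}_{S_i} x = B_i B_i^\top x$ and therefore $\|\mathrm{proj}_{S_i} x\| = \|B_i^\top x\|$:

```python
import torch

def grassmannian_logits(x, bases):
    """Compute l_i = ||proj_{S_i} x|| for every class subspace S_i.

    x:     (batch, d) feature vectors.
    bases: (num_classes, d, k) orthonormal bases; bases[i] spans S_i.
           For an orthonormal B_i, ||proj_{S_i} x|| = ||B_i^T x||.
    """
    # (batch, num_classes, k): coordinates of x in each subspace basis
    coords = torch.einsum('bd,cdk->bck', x, bases)
    return coords.norm(dim=-1)  # (batch, num_classes)

# Usage with random orthonormal bases obtained via batched QR:
x = torch.randn(8, 512)                          # batch of features
Q, _ = torch.linalg.qr(torch.randn(10, 512, 4))  # 10 classes, dim-4 subspaces
logits = grassmannian_logits(x, Q)               # (8, 10)
```

Note that with $k = 1$ and a unit-norm basis vector, the logit reduces to the absolute value of the usual inner product, so the subspace formulation strictly generalizes the vector one.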

The procedure fully-connected layer → softmax → cross-entropy loss is the standard practice in deep classification networks. Each column of the weight matrix of the fully-connected layer is called the class representative vector and serves as a prototype for one class. This class representation has achieved huge success, yet it is not without imperfections.
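For reference, a minimal PyTorch sketch of this standard pipeline (shapes are illustrative; note that nn.Linear stores the transpose of the weight matrix, so the class representative vectors appear as rows of fc.weight):

```python
import torch
import torch.nn as nn

# Standard pipeline: bias-free fully-connected layer -> softmax cross-entropy.
fc = nn.Linear(512, 1000, bias=False)  # fc.weight: (1000, 512); row i is w_i
x = torch.randn(8, 512)                # batch of features

logits = fc(x)                         # logits[:, i] = <w_i, x>
labels = torch.randint(0, 1000, (8,))
loss = nn.CrossEntropyLoss()(logits, labels)  # softmax + cross-entropy
```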

