NEURAL COLLAPSE INSPIRED FEATURE-CLASSIFIER ALIGNMENT FOR FEW-SHOT CLASS INCREMENTAL LEARNING

Abstract

Few-shot class-incremental learning (FSCIL) is a challenging problem because only a few training samples are accessible for each novel class in the new sessions. Finetuning the backbone or adjusting the classifier prototypes trained in the prior sessions inevitably causes a misalignment between the features and classifier of old classes, which explains the well-known catastrophic forgetting problem. In this paper, we deal with this misalignment dilemma in FSCIL, inspired by the recently discovered phenomenon named neural collapse, which reveals that the last-layer features of the same class collapse into a single vertex, and the vertices of all classes align with the classifier prototypes, together forming a simplex equiangular tight frame (ETF). This corresponds to an optimal geometric structure for classification because it maximizes the Fisher discriminant ratio. We propose a neural collapse inspired framework for FSCIL. A group of classifier prototypes is pre-assigned as a simplex ETF for the whole label space, including the base session and all the incremental sessions. During training, the classifier prototypes are not learnable, and we adopt a novel loss function that drives the features towards their corresponding prototypes. Theoretical analysis shows that our method holds the neural collapse optimality and does not break the feature-classifier alignment in an incremental fashion. Experiments on the miniImageNet, CUB-200, and CIFAR-100 datasets demonstrate that our proposed framework outperforms state-of-the-art methods.

1. INTRODUCTION

Learning incrementally and learning from few-shot data are both common demands in real-world applications, and in many scenarios, such as robotics, the two emerge simultaneously. Despite the great success of deep learning in a closed label space, it is still challenging for a deep model to learn new classes continually with only limited samples (LeCun et al., 2015). To this end, few-shot class-incremental learning (FSCIL) was proposed to tackle this problem (Tao et al., 2020b). Compared with few-shot learning (Ravi & Larochelle, 2017; Vinyals et al., 2016), FSCIL transfers a trained model into new label spaces incrementally. It also differs from incremental learning (Cauwenberghs & Poggio, 2000; Li & Hoiem, 2017; Rebuffi et al., 2017) in that only a few (usually 5) samples are accessible for each new class in the incremental sessions. At each session's evaluation, the model is required to classify test images from all the classes encountered so far. The base session of FSCIL contains a large label space and sufficient training samples, while each incremental session only has a few novel classes and labeled images. This poses the notorious catastrophic forgetting problem (Goodfellow et al., 2013) because the novel sessions have no access to the data of the previous sessions. Due to its importance and difficulty, FSCIL has attracted much research attention. The initial solutions to FSCIL finetune the network on new session data with distillation schemes to reduce the forgetting of old classes (Tao et al., 2020b; Dong et al., 2021). However, the few-shot data in novel sessions can easily induce over-fitting.

Following studies favor training a backbone network on the base session as a feature extractor (Zhang et al., 2021; Hersche et al., 2022; Akyürek et al., 2022). For novel sessions, the backbone network is fixed and a group of novel-class prototypes (classifier vectors) is learned incrementally. But as shown in Figure 1(a), the newly added prototypes may lie close to the old-class prototypes, which impedes the ability to discriminate between old-class and novel-class samples at evaluation. As a result, adjusting the classifier prototypes is always necessary to meet two goals: (i) keep a sufficient distance between the old-class and novel-class prototypes; (ii) prevent the adjusted old-class prototypes from shifting far away from their original positions. However, the two goals rely on sophisticated loss functions or regularizers (Chen & Lee, 2021; Hersche et al., 2022; Akyürek et al., 2022), and are hard to attain simultaneously. Besides, as shown in Figure 1(a), adjustment creates a misalignment between the updated classifier and the fixed features of old classes. A recent study proposes to reserve feature space for novel classes to circumvent their conflict with old classes (Zhou et al., 2022a), but an optimal feature-classifier alignment is hard to guarantee with a learnable classifier (Pernici et al., 2021).

We point out that it is this misalignment dilemma between feature and classifier that causes the catastrophic forgetting of old classes. If the backbone network is finetuned in novel sessions, the features of old classes will easily deviate from their classifier prototypes. Alternatively, when the backbone network is fixed and a group of new prototypes for novel classes is learned incrementally, the adjustment of old-class prototypes will also induce misalignment with their fixed features. In this paper, we pose and study the following question: "Can we look for and pre-assign an optimal feature-classifier alignment, such that the model is optimized towards this fixed optimality and thus avoids conflict among sessions?"

[Figure 1: (a) prior studies: classifier prototypes are learnable; (b) our solution: classifier prototypes are pre-assigned and fixed.]

Figure 1: A popular choice in prior studies is to evolve the old-class prototypes via a delicate design of losses or regularizers to keep them separated from novel-class prototypes, but this causes misalignment. As a comparison, we pre-assign and fix an optimal feature-classifier alignment, and then train the model towards the same neural collapse optimality in each session to avoid target conflict.

1.1. MOTIVATIONS AND CONTRIBUTIONS

Neural collapse is a recently discovered phenomenon that, at the terminal phase of training (after the training error rate reaches 0), the last-layer features of the same class collapse into a single vertex, and the vertices of all classes align with their classifier prototypes and together form a simplex equiangular tight frame (ETF) (Papyan et al., 2020). A simplex ETF is a geometric structure of K vectors in R^d, d >= K - 1. All vectors have the same l2 norm of 1, and any pair of two different vectors has an inner product of -1/(K-1), which corresponds to the largest possible angle among K equiangular vectors. In particular, when d = K - 1, a simplex ETF reduces to a regular simplex such as a triangle or a tetrahedron. It describes an optimal geometric structure for classification due to the minimized within-class variance and the maximized between-class variance (Martinez & Kak, 2001), which indicates that the Fisher discriminant ratio (Fisher, 1936; Rao, 1948) is maximized. Following studies aim to theoretically explain this phenomenon (Fang et al., 2021; Han et al., 2022).
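The simplex ETF geometry described above can be constructed and checked numerically. A minimal sketch, using the standard construction M = sqrt(K/(K-1)) * U (I - (1/K) 1 1^T) with U a column-orthonormal matrix (the particular rotation U and the dimensions K, d are arbitrary choices for illustration; the code requires d >= K for simplicity, although the geometry itself only needs d >= K - 1):

```python
import numpy as np

def simplex_etf(K, d, seed=0):
    """Construct K prototype vectors in R^d forming a simplex ETF.

    M = sqrt(K/(K-1)) * U @ (I_K - (1/K) * 1 1^T), with U column-orthonormal.
    Columns of M have unit l2 norm and pairwise inner product -1/(K-1).
    """
    assert d >= K, "this simple construction draws K orthonormal columns in R^d"
    rng = np.random.default_rng(seed)
    # Random column-orthonormal U of shape (d, K) via reduced QR.
    U, _ = np.linalg.qr(rng.standard_normal((d, K)))
    center = np.eye(K) - np.ones((K, K)) / K  # centering projection
    return np.sqrt(K / (K - 1)) * U @ center  # columns are the K prototypes

M = simplex_etf(K=10, d=64)
G = M.T @ M  # Gram matrix of the prototypes
print(np.allclose(np.diag(G), 1.0))                      # unit norms
print(np.allclose(G[~np.eye(10, dtype=bool)], -1.0 / 9))  # inner products -1/(K-1)
```

The Gram-matrix check confirms the two defining ETF properties: every prototype has l2 norm 1, and every pair has inner product -1/(K-1) = -1/9 for K = 10.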

AVAILABILITY

Code is available at https://github.com/NeuralCollapseApplications/FSCIL
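The training recipe outlined in the abstract, freezing pre-assigned ETF prototypes and driving features towards them, can be illustrated with a toy objective. This is only a sketch: the function `alignment_loss` and its dot-regression-style penalty are illustrative stand-ins, not the paper's exact loss, which is defined in its method section.

```python
import numpy as np

def alignment_loss(features, prototypes, labels):
    """Toy alignment loss (illustrative stand-in for the paper's loss).

    Pulls each l2-normalized feature onto its fixed target prototype:
    L = mean_i 0.5 * (w_{y_i}^T f_i - 1)^2, which is minimized (L = 0)
    exactly when every normalized feature equals its prototype.
    prototypes is (d, K) with unit-norm columns; it is never updated.
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = prototypes[:, labels].T          # target prototype per sample, (n, d)
    cos = np.sum(f * w, axis=1)          # cosine, since both sides are unit norm
    return 0.5 * np.mean((cos - 1.0) ** 2)

# Toy check with 3 orthonormal "prototypes" (columns of the identity).
W = np.eye(3)
aligned = 5.0 * W[:, [0, 1]].T          # features pointing at their prototypes
print(alignment_loss(aligned, W, np.array([0, 1])))  # 0.0
```

Because the prototype matrix is fixed, only the backbone receives gradients, so every session optimizes towards the same pre-assigned geometry rather than shifting old-class prototypes.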

