ON THE SOFT-SUBNETWORK FOR FEW-SHOT CLASS INCREMENTAL LEARNING

Abstract

Inspired by the Regularized Lottery Ticket Hypothesis, which states that competitive smooth (non-binary) subnetworks exist within a dense network, we propose a few-shot class-incremental learning method referred to as Soft-SubNetworks (SoftNet). Our objective is to learn a sequence of sessions incrementally, where each session includes only a few training instances per class, while preserving the knowledge of previously learned sessions. SoftNet jointly learns the model weights and adaptive non-binary soft masks during a base training session, where each mask consists of a major and a minor subnetwork; the former aims to minimize catastrophic forgetting during training, and the latter aims to avoid overfitting to the few samples in each new training session. We provide comprehensive empirical validation demonstrating that SoftNet effectively tackles the few-shot incremental learning problem, surpassing the performance of state-of-the-art baselines on benchmark datasets.

1. INTRODUCTION

Lifelong Learning, or Continual Learning, is a learning paradigm for expanding knowledge and skills through sequential training on multiple tasks (Thrun, 1995). According to the accessibility of task identity during training and inference, the community often categorizes the field into specific problems, such as task-incremental (Pfülb and Gepperth, 2019; Delange et al., 2021; Yoon et al., 2020; Kang et al., 2022), class-incremental (Chaudhry et al., 2018; Kuzborskij et al., 2013; Li and Hoiem, 2017; Rebuffi et al., 2017; Kemker and Kanan, 2017; Castro et al., 2018; Hou et al., 2019; Wu et al., 2019), and task-free continual learning (Aljundi et al., 2019; Jin et al., 2021; Pham et al., 2022; Harrison et al., 2020). While standard continual learning scenarios assume a sufficiently large number of instances per task, a lifelong learner in real-world applications often suffers from insufficient training instances for each problem it must solve. This paper aims to tackle the issue of limited training instances in practical Class-Incremental Learning (CIL), referred to as Few-Shot CIL (FSCIL) (Ren et al., 2019; Chen and Lee, 2020; Tao et al., 2020; Zhang et al., 2021; Cheraghian et al., 2021; Shi et al., 2021). Two critical challenges arise in solving FSCIL problems: catastrophic forgetting and overfitting. Catastrophic forgetting (Goodfellow et al., 2013; Kirkpatrick et al., 2017), or catastrophic interference (McCloskey and Cohen, 1989), is a phenomenon in which a continual learner loses previously learned task knowledge as its weights are updated to adapt to new tasks, resulting in significant performance degradation on previous tasks. Such undesired knowledge drift is irreversible, since the scenario does not allow the model to revisit past task data.
Recent works propose to mitigate catastrophic forgetting in class-incremental learning and are often categorized into multiple directions, such as constraint-based (Rebuffi et al., 2017; Castro et al., 2018; Hou et al., 2018; 2019; Wu et al., 2019), memory-based (Rebuffi et al., 2017; Chen and Lee, 2020; Mazumder et al., 2021; Shi et al., 2021), and architecture-based methods (Mazumder et al., 2021; Serra et al., 2018; Mallya and Lazebnik, 2018; Kang et al., 2022). However, catastrophic forgetting becomes even more challenging in FSCIL: due to the small amount of training data for new tasks, the model tends to severely overfit to new classes and quickly forget old ones, deteriorating performance. Meanwhile, several works address overfitting in continual learning from various perspectives. NCM (Hou et al., 2019) and BiC (Wu et al., 2019) highlight the prediction bias problem during sequential training, whereby models are prone to assign data to classes from recently trained tasks. OCS (Yoon et al., 2022) tackles class imbalance in rehearsal-based continual learning, where the number of instances per class varies across tasks, biasing training toward dominant classes. Nevertheless, these works do not consider the overfitting caused by training on a sequence of few-shot tasks. FSLL (Mazumder et al., 2021) tackles overfitting in few-shot CIL by partially splitting model parameters across sessions through multiple sub-steps of iterative re-identification and weight selection; however, this procedure is computationally inefficient. To deploy a practical few-shot CIL model, we propose a simple yet efficient method named SoftNet that effectively alleviates both catastrophic forgetting and overfitting.
Motivated by the Lottery Ticket Hypothesis (Frankle and Carbin, 2019), which posits the existence of competitive subnetworks (winning tickets) within a randomly initialized dense neural network, we suggest a new paradigm for few-shot CIL, named the Regularized Lottery Ticket Hypothesis:

Regularized Lottery Ticket Hypothesis (RLTH). A randomly initialized dense neural network contains a regularized subnetwork that can retain prior class knowledge while providing room to learn new class knowledge through isolated training of the subnetwork.

Based on RLTH, we propose a method referred to as Soft-SubNetworks (SoftNet), illustrated in Figure 1. First, SoftNet jointly learns the randomly initialized dense model (Figure 1 (a)) and a soft mask m ∈ [0, 1]^|θ| defining the soft subnetwork (Figure 1 (b)) during base-session training. The soft mask consists of a major part of the model parameters with m = 1, given by the top-c% of model parameters, and a minor part with m < 1, given by the remaining (100 − c)% of parameters, whose mask values are sampled from a uniform distribution. Then, we freeze the major part of the pre-trained subnetwork weights to maintain prior class knowledge and update only the minor part of the weights to acquire novel class knowledge (Figure 1 (c)). We summarize our key contributions as follows:

• This paper presents a new masking-based method, Soft-SubNetwork (SoftNet), that tackles two critical challenges in few-shot class-incremental learning (FSCIL): catastrophic forgetting and overfitting.
• SoftNet trains two types of non-binary masks (subnetworks) for solving FSCIL, simultaneously preventing the continual learner from forgetting previous sessions and from overfitting to a few samples.
• We conduct a comprehensive empirical study of SoftNet against multiple class-incremental learning methods. Our method significantly outperforms strong baselines on benchmark FSCIL tasks.
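As a concrete illustration of the base-session soft mask described above, the sketch below builds a mask that assigns m = 1 to the top-c% of parameters (the major subnetwork) and m ~ Uniform[0, 1) to the rest (the minor subnetwork). This is a minimal sketch under assumptions: the function name is ours, the ranking criterion (weight magnitude) and the NumPy setting are illustrative, not the paper's exact implementation.

```python
import numpy as np

def make_soft_mask(weights, top_c=0.9, rng=None):
    """Build a soft mask m in [0, 1]^{|theta|}.

    Major subnetwork: top-c fraction of parameters (here ranked by
    magnitude, an assumption) receive m = 1.
    Minor subnetwork: the remaining (1 - c) fraction receive
    m ~ Uniform[0, 1), i.e., m < 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    flat = np.abs(np.asarray(weights)).ravel()
    k = int(top_c * flat.size)
    top_idx = np.argsort(flat)[-k:]                 # indices of the top-c% parameters
    mask = rng.uniform(0.0, 1.0, size=flat.size)    # minor part: m < 1
    mask[top_idx] = 1.0                             # major part: m = 1
    return mask.reshape(np.shape(weights))
```

During base training the effective weights are the element-wise product m ⊙ θ, so minor-subnetwork weights contribute with dampened magnitude rather than being pruned outright.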

2. RELATED WORK

Catastrophic Forgetting. Many recent works have made remarkable progress in tackling catastrophic forgetting in lifelong learning. In particular, architecture-based approaches (Mallya et al., 2018; Serrà et al., 2018; Li et al., 2019) allocate task-specific parameters to reduce interference between tasks, and Kang et al. (2022) show the existence of a sparse subnetwork, called a winning ticket, that performs well on all tasks during continual learning. However, many subnetwork-based approaches are incompatible with the FSCIL setting, since performing task inference under data imbalance is challenging. FSLL (Mazumder et al., 2021) searches for session-specific subnetworks while preserving the weights of previous sessions for incremental few-shot learning. However, its expansion process comprises another series of retraining and pruning steps, requiring excessive training time and computational cost. In contrast, our proposed method, SoftNet, jointly learns the model weights and a task-adaptive soft mask in a single training phase.
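The contrast with iterative retraining can be made concrete: in a new few-shot session, SoftNet-style training freezes the major subnetwork (m = 1) and updates only the minor one (m < 1), which amounts to masking the gradient. The sketch below shows one such SGD step; the function name, plain-SGD choice, and learning rate are illustrative assumptions, not the paper's exact optimizer.

```python
import numpy as np

def session_update(theta, grad, mask, lr=0.1):
    """One gradient step for a new few-shot session.

    Parameters with mask == 1 (major subnetwork) stay frozen to preserve
    base-session knowledge; only parameters with mask < 1 (minor
    subnetwork) are updated to learn the novel classes.
    """
    trainable = (mask < 1.0).astype(theta.dtype)  # 1 where minor, 0 where major
    return theta - lr * trainable * grad
```

Because the update direction is zero wherever m = 1, the major subnetwork is bitwise identical before and after any number of incremental sessions, which is how forgetting of the base classes is avoided without storing past data.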

Code availability: https://github.com/ihaeyong/

