BEEF: BI-COMPATIBLE CLASS-INCREMENTAL LEARNING VIA ENERGY-BASED EXPANSION AND FUSION

Abstract

Neural networks suffer from catastrophic forgetting when learning tasks sequentially, phase by phase, making them inapplicable in dynamically updated systems. Class-incremental learning (CIL) aims to enable neural networks to learn different categories across multiple stages. Recently, dynamic-structure-based CIL methods have achieved remarkable performance. However, these methods train all modules in a coupled manner and do not consider possible conflicts among modules, which can corrupt the final predictions. In this work, we propose a unifying energy-based theory and framework called Bi-Compatible Energy-Based Expansion and Fusion (BEEF) to analyze and achieve the goal of CIL. We demonstrate that independent modules can be trained in a decoupled manner while achieving bi-directional compatibility among modules through two additionally allocated prototypes, and then integrated into a unified classifier at minimal cost. Furthermore, BEEF extends to a more challenging exemplar-set setting, in which exemplars are randomly selected and imbalanced, and maintains its performance where prior methods fail dramatically. Extensive experiments on three widely used benchmarks, CIFAR-100, ImageNet-100, and ImageNet-1000, demonstrate that BEEF achieves state-of-the-art performance in both the ordinary and the challenging CIL settings.

1. INTRODUCTION

The ability to continuously acquire new knowledge is necessary in our ever-changing world and is considered a crucial aspect of human intelligence. In practical AI systems, it is expected that models can learn new concepts from a stream while retaining knowledge of previously learned concepts. However, deep neural network-based systems, despite their great success, suffer from the well-known issue of catastrophic forgetting (French, 1999; Golab & Özsu, 2003; Zhou et al., 2023b), whereby they abruptly forget prior knowledge when directly fine-tuned on new tasks. To address this challenge, the class-incremental learning (CIL) field aims to design learning paradigms that enable deep neural networks to learn novel categories across multiple stages while maintaining discrimination ability for previous ones (Rebuffi et al., 2017; Zhou et al., 2023a). Numerous approaches have been proposed to achieve the goal of CIL, with typical methods falling into two groups: regularization-based methods and dynamic-structure-based methods. Regularization-based methods (Kirkpatrick et al., 2017; Aljundi et al., 2018; Li & Hoiem, 2017; Rebuffi et al., 2017) add constraints (e.g., a parameter-drift penalty) when updating, forcing the model to retain information crucial for old categories. However, these methods often suffer from the stability-plasticity dilemma, lacking the capacity to handle all categories simultaneously. Dynamic-structure-based methods (Yan et al., 2021; Li et al., 2021) expand the model with a new module at each learning stage to enhance its capacity and learn task-specific knowledge through that module, achieving remarkable performance. Nevertheless, these methods have an intrinsic drawback. They

