A MODEL OR 603 EXEMPLARS: TOWARDS MEMORY-EFFICIENT CLASS-INCREMENTAL LEARNING

Abstract

Real-world applications require the classification model to adapt to new classes without forgetting old ones. Correspondingly, Class-Incremental Learning (CIL) aims to train a model with limited memory size to meet this requirement. Typical CIL methods save representative exemplars from former classes to resist forgetting, while recent works find that storing models from history can substantially boost performance. However, the stored models are not counted into the memory budget, which implicitly results in unfair comparisons. We find that when the model size is counted into the total budget and methods are compared at an aligned memory size, saving models does not consistently work, especially with a limited memory budget. Consequently, different CIL methods should be evaluated holistically across memory scales, measuring accuracy and memory size simultaneously. We further investigate how to construct the memory buffer for memory efficiency. By analyzing the effect of different layers in the network, we find that shallow and deep layers have different characteristics in CIL. Motivated by this, we propose a simple yet effective baseline, denoted as MEMO for Memory-efficient Expandable MOdel. MEMO extends specialized layers based on the shared generalized representations, efficiently extracting diverse representations with modest cost while maintaining representative exemplars. Extensive experiments on benchmark datasets validate MEMO's competitive performance.
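The fair comparison argued for above rests on a simple conversion: a stored model's parameter footprint can be expressed as an equivalent number of saved exemplars. A minimal sketch of this accounting, assuming float32 weights, uint8 pixels, and a roughly 0.46M-parameter backbone (the parameter count is an illustrative assumption, on the order of a CIFAR-style ResNet-32):

```python
def params_to_exemplars(num_params, bytes_per_param=4, exemplar_shape=(32, 32, 3)):
    """Convert a model's parameter footprint (float32 weights) into the
    equivalent number of stored uint8 exemplar images of the given shape."""
    exemplar_bytes = 1
    for dim in exemplar_shape:
        exemplar_bytes *= dim  # bytes per exemplar, 1 byte per pixel channel
    return (num_params * bytes_per_param) // exemplar_bytes

# Assumed parameter count for a small CIFAR backbone (illustration only).
print(params_to_exemplars(463_504))  # -> 603
```

Under these assumptions, keeping one extra backbone in memory costs as much as storing about 603 CIFAR-sized exemplars, which is where the trade-off in the title comes from.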

1. INTRODUCTION

In the open world, training data is often collected in a streaming fashion, with new classes appearing over time (Gomes et al., 2017; Geng et al., 2020). Due to storage constraints (Krempl et al., 2014; Gaber, 2012) or privacy issues (Chamikara et al., 2018; Ning et al., 2021), a practical Class-Incremental Learning (CIL) (Rebuffi et al., 2017) model must be able to update with incoming instances from new classes without revisiting former data. The absence of previous training data results in catastrophic forgetting (French, 1999) in CIL: fitting the pattern of new classes erases that of old ones and causes a performance decline. Research on CIL has attracted much interest (Zhou et al., 2021b; 2022a; b; Liu et al., 2021b; 2023; Zhao et al., 2021a; b) in the machine learning field. Saving all the streaming data for offline training is known as the performance upper bound of CIL algorithms, but it requires an unlimited memory budget for storage. Hence, in the early years, CIL algorithms were designed in a strict setting without retaining any instances from the former classes (Li & Hoiem, 2017; Kirkpatrick et al., 2017; Aljundi et al., 2017; Lee et al., 2017). Only a classification model is kept in memory, which saves the memory budget and preserves privacy during deployment. Afterward, some works noticed that saving a limited number of exemplars from former classes can boost the performance of CIL models (Rebuffi et al., 2017; Chaudhry et al., 2019). Various exemplar-based methodologies have been proposed, aiming to prevent forgetting by revisiting the old during new class learning, which steadily improves performance on CIL tasks (Rolnick et al., 2019; Castro et al., 2018; Wu et al., 2019; Isele & Cosgun, 2018). The utilization of exemplars has shifted the community's attention from the strict setting to updating the model under restricted memory availability.
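Under such a restricted memory budget, the gain of MEMO's design in the abstract (sharing the generalized shallow layers and expanding only the specialized deep block per incremental task) can be seen with a simple parameter-count comparison against expanding a full backbone per task. The per-stage counts below are hypothetical round numbers for illustration, not the paper's measurements:

```python
# Hypothetical per-stage parameter counts for a small backbone (assumptions).
SHALLOW = 200_000  # generalized early layers: shared across tasks in MEMO
DEEP = 260_000     # specialized last block: duplicated per task in MEMO

def full_expansion(num_tasks):
    """Duplicate the entire backbone for every incremental task."""
    return num_tasks * (SHALLOW + DEEP)

def memo_expansion(num_tasks):
    """Share the shallow layers once; expand only the deep block per task."""
    return SHALLOW + num_tasks * DEEP

print(full_expansion(10))  # -> 4600000
print(memo_expansion(10))  # -> 2800000
```

With these assumed counts, ten tasks cost 4.6M parameters under full expansion versus 2.8M under MEMO-style partial expansion, and the saved budget can instead hold additional exemplars.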

https://github.com/wangkiw/

