CLASS-INCREMENTAL LEARNING WITH REPETITION

Abstract

Real-world data streams naturally include the repetition of previous concepts. From a Continual Learning (CL) perspective, repetition is a property of the environment and, unlike replay, cannot be controlled by the user. Nowadays, Class-Incremental scenarios represent the leading test-bed for assessing and comparing CL strategies. This family of scenarios is very easy to use, but it never allows revisiting previously seen classes, thus completely disregarding the role of repetition. We focus on the family of Class-Incremental with Repetition (CIR) scenarios, where repetition is embedded in the definition of the stream. We propose two stochastic scenario generators that produce a wide range of CIR scenarios starting from a single dataset and a few control parameters. We conduct the first comprehensive evaluation of repetition in CL by studying the behavior of existing CL strategies under different CIR scenarios. We then present a novel replay strategy that exploits repetition and counteracts the natural imbalance present in the stream. On both CIFAR100 and TinyImageNet, our strategy outperforms other replay approaches, which are not designed for environments with repetition.

1. INTRODUCTION

Continual Learning (CL) requires a model to learn new information from a stream of experiences presented over time, without forgetting previous knowledge (Parisi et al., 2019; Lesort et al., 2020). The nature and characteristics of the data stream can vary considerably depending on the real-world environment and target application. Class-Incremental (CI) scenarios (Rebuffi et al., 2017) are the most popular ones in CL. CI requires the model to solve a classification problem where new classes appear over time. Importantly, when a set of new classes appears, the previous ones are never seen again. However, the model still needs to correctly predict them at test time. Conversely, in a Domain-Incremental (DI) scenario (van de Ven & Tolias, 2019) the model sees all the classes at the beginning and continues to observe new instances of those classes over time. The CI and DI scenarios have been very helpful in promoting and driving CL research in the last few years. However, they strongly constrain the properties of the data stream in a way that is sometimes considered unrealistic or overly limiting (Cossu et al., 2021). Recently, the idea of Class-Incremental with Repetition (CIR) scenarios has started to gather attention in CL (Cossu et al., 2021). CIR scenarios are arguably more flexible in the definition of the stream, since they allow both the introduction of new classes and the repetition of previously seen classes. Crucially, repetition is a property of the environment and cannot be controlled by the CL agent. This is very different from Replay strategies (Hayes et al., 2021), where the repetition of previous concepts is heavily structured and can be tuned at will. CIR defines a family of CL scenarios which ranges from CI (new classes only, without repetition) to DI (full repetition of all seen classes). Although appealing, there currently exists neither a quantitative analysis nor an empirical evaluation of CL strategies learning in CIR scenarios.
This is mainly because it is not obvious how to build a stream with repetition, given the large number of variables involved. How should repetition be managed over time? What should be repeated? What data should be used? In this paper, we provide two generators for CIR that, starting from a single dataset, allow building customized streams by setting only a few parameters. The generators are as easy to use as CI or DI ones. We leverage our generators to run an extensive empirical evaluation of the behavior of CL strategies in CIR scenarios. We find that knowledge accumulation happens naturally in streams with repetition. Even naive fine-tuning, which suffers complete forgetting in CI scenarios, is able to accumulate knowledge for classes that are not present in every experience. We observe that Replay strategies still provide an advantage in terms of final accuracy, even though they are not crucial to avoid catastrophic forgetting. On the one hand, distillation-based strategies like LwF (Li & Hoiem, 2018) are competitive in streams with a moderate amount of repetition. On the other hand, existing Replay strategies are not specifically designed for CIR streams. We propose a novel Replay approach, called Frequency-Aware Replay (ER-FA), designed for streams with unbalanced repetition (a few classes appear rarely, the others very frequently). ER-FA surpasses other Replay variants by a large margin on infrequent classes without losing performance on frequent ones. This leads to a moderate gain in final accuracy, with much better robustness and reduced variance across all classes. Our main contributions are: 1. The design of two CIR generators, able to create streams with repetition by setting only a few control parameters. We built both generators with Avalanche (Lomonaco et al., 2021) and we will make them publicly available to foster future research.
The generators are general enough to fit any classification dataset and are fully integrated with the Avalanche pipeline for running CL experiments. 2. We perform an extensive evaluation of the properties of CIR streams and the performance of CL strategies. We study knowledge accumulation and show that Replay, although still effective, is not crucial for the mitigation of catastrophic forgetting. Some approaches (e.g., LwF) look more promising than others in CIR scenarios. We consolidate our results with an analysis of the CL models over time through Centered Kernel Alignment (CKA) (Kornblith et al., 2019) and weights analysis. 3. We propose a novel Replay variant, ER-FA, which is designed based on the properties of CIR scenarios. ER-FA surpasses other Replay strategies in unbalanced streams and provides more robust performance on infrequent classes without losing accuracy on the frequent ones.
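The frequency-aware idea behind ER-FA can be illustrated with a short sketch. The class and method names below (`FrequencyAwareBuffer`, `update`, `sample`) are hypothetical, and the quota rule (memory share inversely proportional to observed class frequency) is one plausible instantiation of the idea, not the paper's exact implementation:

```python
import random
from collections import defaultdict

class FrequencyAwareBuffer:
    """Replay buffer that reserves more memory for classes that appear
    infrequently in the stream (hypothetical sketch, not the reference
    ER-FA implementation)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.seen_counts = defaultdict(int)  # class -> #experiences it appeared in
        self.slots = defaultdict(list)       # class -> stored (x, y) samples

    def update(self, experience):
        # experience: iterable of (x, y) pairs from the current experience
        for c in {y for _, y in experience}:
            self.seen_counts[c] += 1
        # Memory quota per class, inversely proportional to its observed
        # frequency: rare classes get a larger share of the buffer.
        inv = {c: 1.0 / n for c, n in self.seen_counts.items()}
        total = sum(inv.values())
        quotas = {c: max(1, int(self.capacity * w / total))
                  for c, w in inv.items()}
        # Add the new samples, then trim each class to its quota.
        for x, y in experience:
            self.slots[y].append((x, y))
        for c, q in quotas.items():
            if len(self.slots[c]) > q:
                self.slots[c] = random.sample(self.slots[c], q)

    def sample(self, k):
        pool = [s for items in self.slots.values() for s in items]
        return random.sample(pool, min(k, len(pool)))
```

After a class that repeats often is trimmed to a small quota, an infrequent class keeps a disproportionately large share of the buffer, which is what counteracts the stream's natural imbalance.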

2. CLASS-INCREMENTAL LEARNING WITH REPETITION GENERATORS

CIR scenarios span the range between CI and DI (Figure 1). In Table 1, we formally present and compare the properties of the three scenario types. In CIR, streams with repetition are characterized by multiple occurrences of the same class over time. To study this scenario, we propose two stream generators designed to create a stream from a finite dataset: the Slot-Based Generator (G_slot) and the Sampling-Based Generator (G_samp). G_slot generates streams by enforcing constraints on the number of class occurrences in the stream, using only two parameters. G_slot does not repeat already observed samples; therefore, the stream length is limited by the number of classes. However, it guarantees that all samples in the dataset will be observed exactly once during the lifetime of the model. Instead, G_samp generates streams according to several parametric distributions that control the stream properties. It can generate arbitrarily long streams in which old instances can also re-appear with some probability.
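The sampling-based idea can be sketched as follows. This is an illustrative simplification under assumed parameters (`p_repeat`, `samples_per_class` are hypothetical names): each experience introduces one unseen class while any remain, repeats each previously seen class with probability `p_repeat`, and draws instances with replacement, so old instances may re-appear:

```python
import random

def generate_sampling_stream(dataset_by_class, n_experiences,
                             p_repeat=0.3, samples_per_class=20, seed=0):
    """Illustrative sampling-based stream generator (a sketch of the idea,
    not the paper's G_samp implementation)."""
    rng = random.Random(seed)
    all_classes = list(dataset_by_class.keys())
    seen, stream = [], []
    for _ in range(n_experiences):
        present = set()
        if len(seen) < len(all_classes):      # introduce one new class
            new_c = all_classes[len(seen)]
            seen.append(new_c)
            present.add(new_c)
        for c in seen:                        # stochastic repetition
            if c not in present and rng.random() < p_repeat:
                present.add(c)
        # Sample with replacement, so old instances can re-appear later.
        exp = [(rng.choice(dataset_by_class[c]), c)
               for c in present for _ in range(samples_per_class)]
        stream.append(exp)
    return stream
```

Because the stream is driven by distributions rather than a fixed partition of the dataset, the generated stream can be made arbitrarily long.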

2.1. SLOT-BASED GENERATOR

The Slot-Based Generator G_slot allows careful control of the class repetitions in the generated stream with a single parameter K. G_slot takes as input a dataset D, the total number of experiences



Figure 1: Illustration of the scenario types that can be generated with episodic partial access to a finite set of concepts. The shape colors indicate whether instances in each episode are new or a mixture of old and new instances.
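The slot-based idea can be sketched as follows. This is a simplified illustration, not the paper's G_slot implementation: each class is shuffled, split into K equal slots, and each slot is assigned to a distinct experience, so each class occurs in exactly K experiences and (when the class size is divisible by K) every sample is observed exactly once:

```python
import random

def slot_based_stream(dataset_by_class, n_experiences, k, seed=0):
    """Illustrative slot-based stream generator: every class appears in
    exactly `k` of the `n_experiences` experiences, with no sample repeated.
    Any remainder samples (class size not divisible by k) are dropped
    for simplicity in this sketch."""
    rng = random.Random(seed)
    stream = [[] for _ in range(n_experiences)]
    for c, samples in dataset_by_class.items():
        samples = samples[:]
        rng.shuffle(samples)
        chunk = len(samples) // k
        # Assign the k slots of this class to k distinct experiences.
        for i, e in enumerate(rng.sample(range(n_experiences), k)):
            stream[e] += [(x, c) for x in samples[i * chunk:(i + 1) * chunk]]
    return stream
```

Setting K = 1 recovers a CI-like stream (each class appears once), while larger K introduces controlled repetition without ever repeating individual samples.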

