POPULATING MEMORY IN CONTINUAL LEARNING WITH CONSISTENCY AWARE SAMPLING

Anonymous authors
Paper under double-blind review

Abstract

Continual Learning (CL) methods aim to mitigate Catastrophic Forgetting (CF), in which knowledge from previously learned tasks is lost in favor of new ones. Among these algorithms, several have shown the value of keeping a rehearsal buffer of previously seen examples, referred to as memory. Yet, despite their popularity, limited research has examined which elements are most beneficial to store in memory. This memory is commonly populated through random sampling, with few guiding principles to aid in retaining prior knowledge. In this paper, and consistent with previous work, we find that some storage policies behave similarly given a sufficient memory size or compute budget, but when these constraints bind, results differ considerably. Based on these insights, we propose CAWS (Consistency AWare Sampling), an original storage policy that leverages a learning consistency score (C-Score) to populate the memory with elements that are easy to learn and representative of previous tasks. Because directly computing the C-Score is impractical in CL, we propose feasible and efficient proxies for the score that yield state-of-the-art results on CIFAR-100 and Tiny ImageNet.



It is clear that having a set of examples representative of the underlying distribution is critical for preserving previous knowledge. Ideally, one would save a large number of samples. Unfortunately, since storing large amounts of data results in computational overhead, we must limit the memory size and choose which elements to keep. In this paper, we argue that this memory must satisfy two fundamental requirements in order to perform reliably. The first is to contain elements that are easy to remember, i.e., that the model can learn quickly. The second is to contain elements that are representative of the underlying distribution of previous tasks.
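To make these two requirements concrete, one simple proxy for a learning-consistency score, assuming access to per-epoch predictions on the training set, is the fraction of epochs in which an example was classified correctly; the memory is then filled with the highest-scoring examples of each class. The function names below (`consistency_scores`, `select_memory`) and the equal-per-class budget are illustrative assumptions, not the paper's actual implementation:

```python
from collections import defaultdict

def consistency_scores(correct_history):
    """correct_history: {example_id: [bool per epoch]} recording whether
    the example was classified correctly at the end of each epoch.
    Returns a proxy score in [0, 1]: higher = learned faster/more stably."""
    return {ex: sum(hist) / len(hist) for ex, hist in correct_history.items()}

def select_memory(scores, labels, budget_per_class):
    """Keep the top-scoring (easiest-to-learn) examples of each class,
    giving every class an equal share of the memory budget."""
    by_class = defaultdict(list)
    for ex, score in scores.items():
        by_class[labels[ex]].append((score, ex))
    memory = []
    for cls, items in by_class.items():
        items.sort(reverse=True)                     # best-scored first
        memory.extend(ex for _, ex in items[:budget_per_class])
    return memory
```

Selecting per class rather than globally addresses the representativeness requirement in a crude way: it prevents a few very easy classes from monopolizing the buffer.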



Deep learning models have repeatedly shown state-of-the-art performance in numerous tasks, including image recognition He et al. (2016); Dosovitskiy et al. (2020), Natural Language Processing (NLP) Devlin et al. (2018); Brown et al. (2020), and games previously thought intractable, such as Go Silver et al. (2016) and StarCraft II Vinyals et al. (2019). However, as a common limitation, all these models lack versatility: when trained to perform novel tasks, they rapidly forget how to solve previous ones. This condition is known as catastrophic forgetting and is the main problem tackled by Continual Learning methods Parisi et al. (2019); Delange et al. (2021).

A variety of methods have been proposed to approach this problem. Some focus on allocating parameter sub-spaces for each new task Rusu et al. (2016); Mallya et al. (2018), others place restrictions on the gradients that are learned Kirkpatrick et al. (2017); Lopez-Paz & Ranzato (2017), while others use meta-learning to learn weights reusable across tasks Rajasegaran et al. (2020); Hurtado et al. (2021). Among these, memory-based methods such as Experience Replay Chaudhry et al. (2019); Kim et al. (2020) have consistently exhibited superior performance while remaining easy to understand. In these methods, a memory of samples from previous tasks is kept and rehearsed during training of the current task, so the model does not forget how to solve previous tasks.

Notwithstanding the popularity and effectiveness of memory-based methods, few studies have examined how the policy used to populate the memory affects the performance of CL methods. In particular, Chaudhry et al. (2018a); Wu et al. (2019); Hayes et al. (2020); Araujo et al. (2022) show that, compared with policies that focus solely on sample diversity or class balance, random selection of elements performs nearly or just as well without adding extra computation.
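The rehearsal mechanism behind Experience Replay can be sketched in a few lines. The class below is a generic illustration, not the paper's method: it maintains a fixed-size buffer via reservoir sampling (the common random storage policy the paper contrasts against), so every example seen so far has an equal chance of being stored; the names `ReplayBuffer`, `add`, and `sample` are assumed for the sketch:

```python
import random

class ReplayBuffer:
    """Fixed-size rehearsal memory populated by reservoir sampling,
    giving every example observed so far equal storage probability."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []        # stored (example, label) pairs
        self.n_seen = 0       # total number of examples observed

    def add(self, example, label):
        self.n_seen += 1
        if len(self.data) < self.capacity:
            self.data.append((example, label))
        else:
            # Overwrite a random slot with probability capacity / n_seen
            j = random.randrange(self.n_seen)
            if j < self.capacity:
                self.data[j] = (example, label)

    def sample(self, batch_size):
        """Draw a rehearsal minibatch from memory (without replacement)."""
        return random.sample(self.data, min(batch_size, len(self.data)))
```

During training on task t, each minibatch of current-task data is interleaved with a minibatch drawn via `sample`, and the combined loss is backpropagated; a storage policy like CAWS would replace the random `add` rule while keeping this rehearsal loop unchanged.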

