ONLINE BIAS CORRECTION FOR TASK-FREE CONTINUAL LEARNING

Abstract

Task-free continual learning is the machine-learning setting in which a model is trained online with data generated by a nonstationary stream. Conventional wisdom suggests that, in this setting, models are trained using an approach called experience replay, where the risk is computed with respect to both current stream observations and a small subset of past observations. In this work, we explain both theoretically and empirically how experience replay biases the outputs of the model towards recent stream observations. Moreover, we propose a simple approach that mitigates this bias online, by changing how the output layer of the model is optimized. We show that our approach significantly improves the learning performance of experience-replay approaches across different datasets. Our findings suggest that, when performing experience replay, the output layer of the model should be optimized separately from the preceding layers.

1. INTRODUCTION

In broad terms, continual learning is the process of incrementally aggregating knowledge from data that are generated by a nonstationary distribution (Lee et al., 2019; Riemer et al., 2019). The main motivation for studying continual learning is to give artificial learners the ability to learn as biological learners do: perpetually updating and refining their body of knowledge under changing external conditions (Silver et al., 2013). The inability of artificial learners to learn continually stems from the fact that they overwrite previously learned knowledge whenever they encounter new information. This phenomenon is called catastrophic forgetting (McCloskey & Cohen, 1989; French, 1999).

In this paper, we focus specifically on task-free continual learning (Aljundi et al., 2019b). In this setting, the data are presented to the learner in small minibatches, and no assumptions are made about the way the data distribution changes over time. In other words, we do not assume knowledge of whether the distribution is piecewise-stationary (that is, whether there are distinct tasks being learned) or whether it changes continuously over time (Aljundi et al., 2019b). Most task-free continual learning approaches make use of a memory which can store a small percentage (typically 10% or less) of all observed data instances. The data instances stored in memory are subsequently replayed in order to mitigate catastrophic forgetting. This simple paradigm, called replay-based continual learning, is surprisingly effective in task-free settings. Furthermore, it is also supported by findings from the field of neuroscience on how biological learning takes place (Marr, 1971; Ji & Wilson, 2007; Liu et al., 2019).

A number of continual learning methods tend to make predictions that are biased towards recently observed data (Buzzega et al., 2021; Mai et al., 2021). Several strategies have been proposed to deal with this prediction bias (also called recency bias).
Unfortunately, most of these strategies are not applicable to task-free continual learning, since they were designed for continual learning settings that consist of a task sequence, and they require knowledge of which classes the current task comprises (Wu et al., 2019; Belouadah & Popescu, 2019; Buzzega et al., 2021). One approach which is applicable to task-free continual learning is proposed by Mai et al. (2021), but it can only be performed after the end of training, hence the learner's predictions during training remain biased. In this paper, we propose a simple approach that performs online bias correction for task-free continual learning. Our contributions are as follows: a) We formally illustrate that the conventional paradigm of model training in task-free continual learning overweights the importance of current stream observations (Section 3.2), and we argue that this overweighting is a cause of the prediction bias of continual learners; b) We propose a novel metric to quantify prediction bias (Section 3.3), and we show that this bias can be effectively mitigated by appropriately modifying the parameters of only the final layer of the model, after the end of training (Section 3.4); c) We propose a novel approach called Online Bias Correction (OBC; Section 3.5), which maintains an unbiased model online, throughout the entire duration of learning (see Figure 1 for an illustration); d) We evaluate the performance of OBC extensively, and we show that it significantly improves a number of task-free continual learning methods, over multiple datasets (Section 4).

2. BACKGROUND

2.1. TASK-FREE CONTINUAL LEARNING

We define task-free continual learning as the online optimization of a model via small minibatches that are sampled from a nonstationary stream. In task-free continual learning, no strong assumptions are made about the nature of the distributional nonstationarity of the stream (Aljundi et al., 2019b). Other continual learning settings, such as task-incremental and class-incremental continual learning, assume a data distribution that is piecewise stationary, hence one that only changes at discrete points in time (Van de Ven & Tolias, 2019). The objective of continual learning is to learn from all observed data despite the nonstationary nature of the distribution (Jin et al., 2021), and, in general, previous work assumes no distributional mismatch between training and evaluation data.

Previous work in task-free continual learning mostly focuses on replay-based methods (Aljundi et al., 2019c; Jin et al., 2021). The prevalent replay paradigm is called experience replay (ER) (Isele & Cosgun, 2018; Chaudhry et al., 2019). According to the ER paradigm, each minibatch of observations received by the learner is combined with another minibatch of equal size sampled from the memory. The model is then trained for one step with the combined stream-and-memory minibatch. Moreover, the memory is typically maintained by an online memory-population algorithm called reservoir sampling (Vitter, 1985). There are multiple variants of the ER method. For instance, one approach called Maximally-Interfered Retrieval (MIR) replays the memory instances that would be most interfered with by the current minibatch of new observations. Another approach called Class-Balancing Reservoir Sampling (CBRS) (Chrysakis & Moens, 2020) modifies the memory-population algorithm to ensure that the memory remains class-balanced.
There are also other approaches that deviate from the ER paradigm, such as Greedy Sampler and Dumb Learner (GDUMB) (Prabhu et al., 2020), which only trains the model using data stored in memory, or Asymmetric Cross-Entropy (ACE) (Caccia et al., 2022), which uses a modified version of the cross-entropy loss to prevent the drift of latent representations.
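As a concrete reference point, the reservoir-sampling memory and the ER minibatch combination described above can be sketched as follows. This is a minimal illustration, not code from any of the cited works; the class and function names are our own.

```python
import random


class ReservoirMemory:
    """Fixed-capacity memory maintained with reservoir sampling (Vitter, 1985).

    After t stream instances have been observed, each of them is retained
    with probability capacity / t, so the memory approximates a uniform
    sample of the stream seen so far.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = []
        self.num_seen = 0  # total stream instances observed so far

    def update(self, instance):
        self.num_seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(instance)
        else:
            # Overwrite a uniformly random slot with prob. capacity / num_seen.
            j = random.randrange(self.num_seen)
            if j < self.capacity:
                self.buffer[j] = instance

    def sample(self, k):
        # Sample a replay minibatch without replacement.
        return random.sample(self.buffer, min(k, len(self.buffer)))


def er_combine(stream_batch, memory):
    """Form the combined ER minibatch: the incoming stream batch plus an
    equally sized batch sampled from memory, then insert the new instances."""
    replay_batch = memory.sample(len(stream_batch))
    for x in stream_batch:
        memory.update(x)
    return stream_batch + replay_batch
```

One model update per incoming batch is then performed on the output of `er_combine`; since the replay batch has constant size, the cost of each step does not depend on the memory size.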

2.2. COMPUTATIONAL COST

An important issue in task-free continual learning is computational cost. Since practical applications will likely involve large amounts of data, task-free continual learners should be designed to remain computationally tractable. In practical terms, let us assume that a model has to learn from a stream of n instances. Moreover, we assume that applications with larger streams will likely also require a larger memory size m. In real-world applications, the difference between an O(n) learning algorithm and an O(mn) algorithm could be enormous. Hence, in this work we only consider learning algorithms whose computational cost per incoming batch is independent of the memory size m, so that the computational complexity of learning from the entire stream is O(n).
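To make this asymptotic argument concrete, the following back-of-the-envelope sketch (function name and constants are our own, purely illustrative) contrasts a learner whose per-batch cost is independent of m with one that scans the entire memory on every incoming batch:

```python
def stream_cost(n_batches, memory_size, batch_size, scans_memory):
    """Rough operation count for learning from a stream of n_batches.

    A learner that only draws a constant-size replay batch does
    O(batch_size) work per incoming batch, so the total cost is O(n).
    A learner that scans the whole memory per batch does O(m) work
    per batch, for a total cost of O(mn).
    """
    per_batch = memory_size if scans_memory else 2 * batch_size
    return n_batches * per_batch


# With a stream of 1e6 batches and a memory of 1e4 instances, the
# memory-scanning learner does 500x more work in this rough count.
cheap = stream_cost(10**6, 10**4, 10, scans_memory=False)
costly = stream_cost(10**6, 10**4, 10, scans_memory=True)
```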

2.3. BIAS CORRECTION IN TASK-FREE CONTINUAL LEARNING

To the best of our knowledge, there is only one approach explicitly designed to correct for prediction biases in task-free continual learning. Mai et al. (2021) propose learning a model using conventional experience replay, and after the entire stream has been observed, they replace the final linear layer of the model with a nearest-class-mean (NCM) classifier computed using all data stored in memory.
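A minimal sketch of such an NCM classifier is given below, assuming the model's penultimate layer provides feature embeddings for the instances stored in memory. The function names are ours, for illustration only; Mai et al. (2021) should be consulted for the exact procedure.

```python
import numpy as np


def class_means_from_memory(memory_features, memory_labels):
    """Compute per-class mean embeddings from the instances stored in memory.

    memory_features: array of shape (n, d); memory_labels: length-n sequence.
    """
    means = {}
    for c in set(memory_labels):
        idx = [i for i, y in enumerate(memory_labels) if y == c]
        means[c] = memory_features[idx].mean(axis=0)
    return means


def ncm_predict(features, class_means):
    """Assign each feature vector to the class with the closest mean embedding.

    features: array of shape (n, d); class_means: dict class_id -> (d,) mean.
    """
    labels = sorted(class_means)
    means = np.stack([class_means[c] for c in labels])  # (num_classes, d)
    # Squared Euclidean distance from each feature to each class mean.
    dists = ((features[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)
    return [labels[i] for i in dists.argmin(axis=1)]
```

Because the class means are computed from the (approximately uniform) memory rather than from the recency-skewed stream of gradient updates, this replacement removes much of the bias in the final layer, but only once training has finished.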

