ONLINE BIAS CORRECTION FOR TASK-FREE CONTINUAL LEARNING

Abstract

Task-free continual learning is the machine-learning setting where a model is trained online with data generated by a nonstationary stream. In this setting, models are conventionally trained using an approach called experience replay, where the risk is computed with respect to both the current stream observations and a small subset of past observations. In this work, we explain both theoretically and empirically how experience replay biases the outputs of the model towards recent stream observations. Moreover, we propose a simple approach to mitigate this bias online, by changing how the output layer of the model is optimized. We show that our approach significantly improves the learning performance of experience-replay approaches across several datasets. Our findings suggest that, when performing experience replay, the output layer of the model should be optimized separately from the preceding layers.

1. INTRODUCTION

In broad terms, continual learning is the process of incrementally aggregating knowledge from data that are generated by a nonstationary distribution (Lee et al., 2019; Riemer et al., 2019). The main motivation for studying continual learning is to give artificial learners the ability to learn as biological learners do: perpetually updating and refining their body of knowledge under changing external conditions (Silver et al., 2013). Artificial learners fail to learn continually because they overwrite previously learned knowledge whenever they encounter new information. This phenomenon is called catastrophic forgetting (McCloskey & Cohen, 1989; French, 1999).

In this paper, we focus specifically on task-free continual learning (Aljundi et al., 2019b). In this setting, the data are presented to the learner in small minibatches, and no assumption is made about the way the data distribution changes over time. In other words, the distribution may be piecewise-stationary (that is, there are distinct tasks being learned), or it may change continuously over time (Aljundi et al., 2019b).

Most task-free continual learning approaches make use of a memory which can store a small percentage (typically 10% or less) of all observed data instances. The data instances stored in memory are subsequently replayed in order to mitigate catastrophic forgetting. This simple paradigm, called replay-based continual learning, is surprisingly effective in task-free settings. Furthermore, it is also supported by findings from the field of neuroscience, in relation to how biological learning takes place (Marr, 1971; Ji & Wilson, 2007; Liu et al., 2019).

A number of continual learning methods tend to make predictions that are biased towards recently observed data (Buzzega et al., 2021; Mai et al., 2021). Several strategies have been proposed to deal with this prediction bias (also called recency bias).
Unfortunately, most of these strategies are not applicable to task-free continual learning, since they were designed for continual learning settings that consist of a task sequence, and they require knowledge of which classes the current task comprises (Wu et al., 2019; Belouadah & Popescu, 2019; Buzzega et al., 2021). One approach that is applicable to task-free continual learning is proposed by Mai et al. (2021), but it can only be performed after training ends, so the learner's predictions during training remain biased.
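To make the replay-based setup concrete, the following is a minimal sketch of an experience-replay training loop. It assumes reservoir sampling for memory management, a common choice in task-free settings (the source does not specify a particular buffer policy); the names `ReservoirBuffer`, `replay_step`, and `model_update` are illustrative, not from the paper.

```python
import random


class ReservoirBuffer:
    """Fixed-size memory updated with reservoir sampling, so every stream
    example has equal probability of residing in memory at any point."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.data = []
        self.n_seen = 0
        self.rng = random.Random(seed)

    def add(self, example):
        self.n_seen += 1
        if len(self.data) < self.capacity:
            self.data.append(example)
        else:
            # Replace a stored example with probability capacity / n_seen.
            j = self.rng.randrange(self.n_seen)
            if j < self.capacity:
                self.data[j] = example

    def sample(self, k):
        k = min(k, len(self.data))
        return self.rng.sample(self.data, k)


def replay_step(model_update, batch, buffer, replay_size):
    """One experience-replay step: the model update (risk) is computed on
    the union of the current minibatch and a batch drawn from memory."""
    replayed = buffer.sample(replay_size)
    model_update(batch + replayed)  # risk over current + past observations
    for example in batch:           # then update the memory with the stream
        buffer.add(example)
```

Because the replayed batch is only a small, fixed-size sample of the past while the current minibatch is always fully present in the update, recent observations dominate the effective training distribution, which is the source of the recency bias discussed above.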

