LEARNING WITHOUT PREJUDICES: CONTINUAL UNBIASED LEARNING VIA BENIGN AND MALIGNANT FORGETTING

Abstract

Although machine learning algorithms have achieved state-of-the-art performance in image classification, recent studies have shown that models trained on several tasks in sequence, a setting termed continual learning (CL), often suffer abrupt degradation of performance on previous tasks. A large body of CL frameworks has been devoted to alleviating this forgetting issue. However, we observe that forgetting in CL is not always unfavorable, especially when there is bias (spurious correlation) in the training data. We term such forgetting benign forgetting, and categorize detrimental forgetting as malignant forgetting. Based on this finding, our objective in this study is twofold: (a) to discourage malignant forgetting by generating previous representations, and (b) to encourage benign forgetting by employing contrastive learning in conjunction with feature-level augmentation. Extensive evaluations on biased experimental setups demonstrate that our proposed method, Learning without Prejudices, is effective for continual unbiased learning.

1. INTRODUCTION

In continual learning (CL), a model learns a sequence of tasks, accumulating existing knowledge for each new task. This is preferable in practice, where a model cannot retrieve previously used data owing to privacy, limited data capacity, or an online streaming setup. The main challenge in CL is to alleviate "catastrophic forgetting," whereby a model forgets prior information while training on new information (McCloskey & Cohen, 1989). A line of recent works has been dedicated to mitigating this issue. Regularization-based methods keep the current model close to the previous one by penalizing changes in the parameters learned in previous tasks (Kirkpatrick et al., 2017; Chaudhry et al., 2018; Aljundi et al., 2018; 2019a; Ahn et al., 2019; Dhar et al., 2019; Douillard et al., 2020). Replay-based methods store samples of prior tasks in a buffer and employ them along with present samples (Robins, 1995; Lopez-Paz & Ranzato, 2017; Buzzega et al., 2020; Aljundi et al., 2019b; Mai et al., 2021; Lin et al., 2021; Madaan et al., 2021; Chaudhry et al., 2021; Bonicelli et al., 2022). Generator-based methods generate prior samples and feed them into current tasks (Shin et al., 2017; Kemker & Kanan, 2017; Xiang et al., 2019; Ostapenko et al., 2019; Liu et al., 2020; Yin et al., 2020).

A common assumption of the above-mentioned methods is that the training dataset is well-distributed. However, a source dataset is often biased, and a machine learning algorithm may perceive the bias as meaningful information, leading to poor generalizability of the model (Kim et al., 2019; Jeon et al., 2022). In the experiment in Section 3.1, we show that biased distributions are detrimental to the robustness of models in existing CL scenarios. Thus, we propose a new type of CL, termed "continual unbiased learning (CUL)", in which the dataset of each task exhibits a different bias.
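To make the regularization-based family concrete, the following is a minimal sketch (not taken from any of the cited works) of an EWC-style penalty: current parameters are discouraged from drifting away from those learned on the previous task, weighted per parameter by an importance estimate such as the Fisher information. All names here (`ewc_penalty`, `fisher`, `lam`) are illustrative assumptions.

```python
import numpy as np

def ewc_penalty(params, old_params, fisher, lam=1.0):
    """Quadratic penalty keeping the current parameters close to
    those learned on the previous task; `fisher` holds a per-parameter
    importance weight and `lam` trades off stability vs. plasticity."""
    return 0.5 * lam * float(np.sum(fisher * (params - old_params) ** 2))

# Toy usage: parameters with high importance (large Fisher weight)
# are penalized more for drifting from their previous values.
old = np.array([1.0, -2.0, 0.5])
new = np.array([1.2, -2.0, 1.5])
fisher = np.array([10.0, 10.0, 0.1])  # per-parameter importance
loss_reg = ewc_penalty(new, old, fisher, lam=1.0)
```

In training, `loss_reg` would simply be added to the loss of the current task, so gradient descent balances fitting new data against preserving important prior parameters.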
With CUL, we aim to make a model trained on any task unbiased, considering all models as candidates for application. This is particularly desirable in practice, where a model designed for a specific purpose is deployed for long periods and training datasets with divergent distributions are fed to it sequentially to update it.

