GRADIENT-BASED MEMORY EDITING FOR TASK-FREE CONTINUAL LEARNING

Anonymous

Abstract

Continual learning often assumes knowledge of (strict) task boundaries and identities for the instances in a data stream, i.e., a "task-aware" setting. In practice, however, practitioners can rarely expose such task information to the model, motivating "task-free" continual learning methods. Recent attempts at task-free continual learning focus on developing memory-construction and replay strategies such that model performance on previously seen instances is best retained. In this paper, looking from a complementary angle, we propose to "edit" memory examples so that the model better retains past performance when the memory is replayed. Such memory editing is achieved by making gradient updates to memory examples so that they are more likely to be "forgotten" by the model as it views new instances in the future. Experiments on five benchmark datasets show that our proposed method can be seamlessly combined with baselines to significantly improve performance and achieve state-of-the-art results. 1

1. INTRODUCTION

Accumulating past knowledge and adapting to evolving environments are key traits of human intelligence (McClelland et al., 1995). While contemporary deep neural networks have achieved impressive results in a range of machine learning tasks (Goodfellow et al., 2015), they have not yet manifested the ability to learn continually over evolving data streams (Ratcliff, 1990). These models suffer from catastrophic forgetting (McCloskey & Cohen, 1989; Robins, 1995) when trained in an online fashion, i.e., performance drops on previously seen examples during the sequential learning process. To this end, continual learning (CL) methods have been developed to alleviate catastrophic forgetting when models are trained on non-stationary data streams (Goodfellow et al., 2013). Most existing work on continual learning assumes that, when models are trained on a stream of tasks sequentially, task specifications such as task boundaries or identities are exposed to the models. These task-aware CL methods make explicit use of task specifications to avoid catastrophic forgetting, e.g., by consolidating parameters important to previous tasks (Kirkpatrick et al., 2017; Zenke et al., 2017; Nguyen et al., 2018), distilling knowledge from previous tasks (Li & Hoiem, 2017; Rannen et al., 2017), or separating task-specific model parameters (Rusu et al., 2016; Serrà et al., 2018). However, in practice, it is more likely that data instances arrive in a sequential, non-stationary fashion without task identities or boundaries, a setting commonly termed task-free continual learning (Aljundi et al., 2018). Recent attempts have been made to tackle this setting (Aljundi et al., 2018; Zeno et al., 2018; Lee et al., 2020).
These efforts revolve around regularization- and model-expansion-based approaches, which rely on inferring task boundaries or identities (Aljundi et al., 2018; Lee et al., 2020) or on performing online parameter-importance estimation (Zeno et al., 2018) to consolidate or separate model parameters. In another line of work, memory-based CL methods have achieved strong results in the task-free setting (Aljundi et al., 2019b). These methods store a small set of previously seen instances in a fixed-size memory and utilize them for replay (Robins, 1995; Rolnick et al., 2019) or regularization (Lopez-Paz & Ranzato, 2017; Chaudhry et al., 2019a). The core problem in memory-based CL methods is how to manage the memory instances (e.g., which to replace with new instances) and how to replay them under a restricted computation budget, so that model performance on previous data is maximally preserved.

1 Code has been uploaded in the supplementary materials and will be published.
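To make the memory-replay mechanism above concrete, the following toy sketch maintains a fixed-size memory via reservoir sampling and interleaves replayed examples with incoming ones; it also includes a gradient-ascent "editing" step on a stored example in the spirit of the abstract's idea. The logistic-regression learner, step sizes, and all names here are illustrative assumptions for exposition, not the paper's actual method or implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

MEM_SIZE = 50  # fixed memory budget (illustrative)
DIM = 4        # feature dimension of the toy stream

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_step(w, X, y, lr=0.1):
    """One SGD step of the logistic-regression loss on batch (X, y)."""
    p = sigmoid(X @ w)
    return w - lr * X.T @ (p - y) / len(y)

w = np.zeros(DIM)
memory_X, memory_y = [], []
for t in range(1000):
    # toy non-stationary stream: the relevant feature drifts over time
    x = rng.normal(size=DIM)
    y = float(x[t // 500 % DIM] > 0)

    # replay: train on the new example together with a sampled memory batch
    batch_X, batch_y = [x], [y]
    if memory_X:
        idx = rng.integers(len(memory_X), size=min(10, len(memory_X)))
        batch_X += [memory_X[i] for i in idx]
        batch_y += [memory_y[i] for i in idx]
    w = grad_step(w, np.array(batch_X), np.array(batch_y))

    # "memory editing" in the spirit of the abstract (illustrative reading):
    # nudge one stored example along the input gradient that increases its
    # current loss, i.e. toward being "forgotten"
    if memory_X:
        k = int(rng.integers(len(memory_X)))
        xm, ym = memory_X[k], memory_y[k]
        g = (sigmoid(xm @ w) - ym) * w  # d(loss)/d(x) for logistic loss
        memory_X[k] = xm + 0.01 * g

    # reservoir sampling: every example seen so far is retained in the
    # size-MEM_SIZE memory with equal probability
    if len(memory_X) < MEM_SIZE:
        memory_X.append(x)
        memory_y.append(y)
    else:
        j = rng.integers(t + 1)
        if j < MEM_SIZE:
            memory_X[j], memory_y[j] = x, y

print(len(memory_X))  # memory never exceeds the fixed budget
```

Reservoir sampling is one common choice for the memory-management problem described above because it requires no task boundaries: each incoming example displaces a stored one with probability MEM_SIZE / (t + 1), keeping a uniform sample of the stream.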

