GRAPH-BASED CONTINUAL LEARNING

Abstract

Despite significant advances, continual learning models still suffer from catastrophic forgetting when exposed to incrementally available data from non-stationary distributions. Rehearsal approaches alleviate the problem by maintaining and replaying a small episodic memory of previous samples, often implemented as an array of independent memory slots. In this work, we propose to augment such an array with a learnable random graph that captures pairwise similarities between its samples, and use it not only to learn new tasks but also to guard against forgetting. Empirical results on several benchmark datasets show that our model consistently outperforms recently proposed baselines for task-free continual learning.

1. INTRODUCTION

Recent breakthroughs of deep neural networks often hinge on the ability to repeatedly iterate over stationary batches of training data. When exposed to incrementally available data from non-stationary distributions, such networks often fail to learn new information without forgetting much of their previously acquired knowledge, a phenomenon known as catastrophic forgetting (Ratcliff, 1990; McCloskey & Cohen, 1989; French, 1999). Despite significant advances, this limitation remains a long-standing challenge for computational systems that aim to continually learn from dynamic data distributions (Parisi et al., 2019). Among the various proposed solutions, rehearsal approaches, which store samples from previous tasks in an episodic memory and regularly replay them, are among the earliest and most successful strategies against catastrophic forgetting (Lin, 1992; Rolnick et al., 2019). An episodic memory is typically implemented as an array of independent slots, each holding one example coupled with its label. During training, these stored samples are interleaved with those from the new task, allowing for simultaneous multi-task learning as if the resulting data were independently and identically distributed.

While such approaches are effective in simple settings, they require sizable memory and are often impaired by memory constraints, performing rather poorly on complex datasets. A possible explanation is that slot-based memories fail to exploit the relational structure between samples: semantically similar items are treated independently both during training and at test time. In marked contrast, relational memory is a prominent feature of biological systems that has been strongly linked to successful memory retrieval and generalization (Prince et al., 2005). Humans, for example, encode event features into cortical representations and bind them together in the medial temporal lobe, resulting in a durable yet flexible form of memory (Shimamura, 2011).

In this paper, we introduce a novel Graph-based Continual Learning model (GCL) that exhibits some characteristics of relational memory. More specifically, we explicitly model pairwise similarities between samples, including both those in the episodic memory and those in the current task. These similarities allow for representation transfer between samples and provide a resilient means of guarding against catastrophic forgetting. Our contributions are twofold: (1) We propose the use of random graphs to represent relational structures between samples. While similar notions of dependency have been proposed in the literature (Louizos et al., 2019; Yao et al., 2020), the application of random graphs to task-free continual learning is novel, at least to the best of our knowledge.
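To make the slot-based rehearsal setup concrete, the sketch below pairs a fixed-size episodic memory (filled here by reservoir sampling, a common choice for task-free streams) with a learnable matrix of pairwise-similarity logits standing in for the graph over memory samples described above. This is a minimal illustration under our own assumptions, not the paper's implementation; the names GraphEpisodicMemory, write, adjacency, and sample_for_replay are hypothetical.

```python
import torch
import torch.nn as nn


class GraphEpisodicMemory(nn.Module):
    """Illustrative slot memory augmented with a learnable similarity graph (not the authors' code)."""

    def __init__(self, num_slots: int, feature_dim: int):
        super().__init__()
        self.num_slots = num_slots
        # Independent slots: one example and one label per slot.
        self.register_buffer("x", torch.zeros(num_slots, feature_dim))
        self.register_buffer("y", torch.zeros(num_slots, dtype=torch.long))
        self.filled = 0   # number of occupied slots
        self.seen = 0     # number of stream examples observed so far
        # Unnormalised pairwise-similarity logits between slots: a learnable
        # stand-in (assumption) for the random graph over memory samples.
        self.edge_logits = nn.Parameter(torch.zeros(num_slots, num_slots))

    @torch.no_grad()
    def write(self, x_new: torch.Tensor, y_new: torch.Tensor) -> None:
        """Reservoir sampling keeps the buffer an approximately uniform sample of the stream."""
        for xi, yi in zip(x_new, y_new):
            self.seen += 1
            if self.filled < self.num_slots:
                idx = self.filled
                self.filled += 1
            else:
                j = int(torch.randint(0, self.seen, (1,)))
                if j >= self.num_slots:
                    continue  # drop this example; the memory stays a uniform sample
                idx = j
            self.x[idx], self.y[idx] = xi, yi

    def adjacency(self) -> torch.Tensor:
        """Row-normalised similarity graph over the occupied slots."""
        logits = self.edge_logits[: self.filled, : self.filled]
        return torch.softmax(logits, dim=-1)

    def sample_for_replay(self, batch_size: int):
        """Draw stored examples to interleave with the current task's mini-batch."""
        idx = torch.randint(0, self.filled, (batch_size,))
        return self.x[idx], self.y[idx]
```

In a typical rehearsal loop, `write` would be called on each incoming mini-batch and `sample_for_replay` would supply old examples to be optimised jointly with the new ones; the `adjacency` matrix is the extra ingredient a graph-based model could use to share representations between related samples during both learning and replay.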

