NEW INSIGHTS FOR THE STABILITY-PLASTICITY DILEMMA IN ONLINE CONTINUAL LEARNING

Abstract

The aim of continual learning is to learn new tasks continuously (i.e., plasticity) without forgetting previously learned knowledge from old tasks (i.e., stability). In online continual learning, wherein data arrives strictly in a streaming manner, plasticity is more vulnerable than in offline continual learning because the training signal that can be obtained from a single data point is limited. To overcome the stability-plasticity dilemma in online continual learning, we propose an online continual learning framework named the multi-scale feature adaptation network (MuFAN), which utilizes a richer context encoding extracted from different levels of a pre-trained network. Additionally, we introduce a novel structure-wise distillation loss and replace the commonly used batch normalization layer with a newly proposed stability-plasticity normalization module, so that MuFAN simultaneously maintains high plasticity and stability. MuFAN outperforms other state-of-the-art continual learning methods on the SVHN, CIFAR100, miniImageNet, and CORe50 datasets. Extensive experiments and ablation studies validate the significance and scalability of each proposed component: 1) multi-scale feature maps from a pre-trained encoder, 2) the structure-wise distillation loss, and 3) the stability-plasticity normalization module in MuFAN.

1. INTRODUCTION

Humans excel at learning new skills over their lifetimes without forgetting what they have previously learned. In contrast, in continual learning (CL) (Chen & Liu, 2018), wherein a stream of tasks is observed, a deep learning model forgets prior knowledge when learning a new task if samples from old tasks are unavailable. This problem is known as catastrophic forgetting (McCloskey & Cohen, 1989). In recent years, promising research has been conducted to address this problem (Parisi et al., 2019). However, excessive retention of old knowledge impedes the balance between preventing forgetting (i.e., stability) and acquiring new concepts (i.e., plasticity), which is referred to as the stability-plasticity dilemma (Abraham & Robins, 2005). In this study, we examine how the stability-plasticity dilemma differs between online CL and offline CL and propose a novel approach that addresses the dilemma in online CL. Most offline CL methods aim to constrain plasticity less while preventing forgetting, rather than to actively improve it, because obtaining high plasticity through iterative training is relatively easy. However, as shown in Figure 1, the learning accuracy (a proxy for plasticity) of online CL is substantially lower than that of offline CL, with a gap of 10-20% on all three CL benchmarks. That is, for online CL, wherein data arrives in a streaming manner (a single epoch), an approach is required that suppresses excessive forgetting while enhancing plasticity. To this end, we propose the multi-scale feature adaptation network (MuFAN), which consists of three components to obtain high stability and plasticity simultaneously: 1) multi-scale feature maps exploited from shallow to deeper layers of a pre-trained model, 2) a novel structure-wise distillation loss across tasks, and 3) a stability-plasticity normalization module that replaces the commonly used batch normalization layer.

Code is available at https://github.com/whitesnowdrop/MuFAN.
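To give a rough intuition for the second component, a minimal NumPy sketch of a relational distillation penalty is shown below. This is our own illustrative reading, assuming that "structure-wise" refers to preserving pairwise similarities among sample embeddings between the frozen old model and the current model; the function names and the cosine-similarity choice are assumptions, and the precise loss is defined later in the paper.

```python
import numpy as np

def pairwise_relation(feats):
    # Cosine-similarity matrix capturing the relational structure
    # among the embeddings of one mini-batch (shape: batch x batch).
    normed = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    return normed @ normed.T

def structure_wise_distillation_loss(old_feats, new_feats):
    # Penalize changes in pairwise structure between the frozen old
    # model's embeddings and the current model's embeddings.
    r_old = pairwise_relation(old_feats)
    r_new = pairwise_relation(new_feats)
    return float(np.mean((r_old - r_new) ** 2))

rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 8))
# Identical embeddings yield identical relation matrices, hence zero loss.
print(structure_wise_distillation_loss(feats, feats))  # 0.0
```

Because the penalty acts on relations between samples rather than on raw activations, it leaves the new model free to shift its feature space as long as the relative geometry of old-task samples is preserved.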

