ENERGY-BASED MODELS FOR CONTINUAL LEARNING

Anonymous authors
Paper under double-blind review

Abstract

We motivate Energy-Based Models (EBMs) as a promising model class for continual learning problems. Instead of tackling continual learning via external memory, growing models, or regularization, EBMs natively support a dynamically growing number of tasks and classes while causing less interference with old tasks. We show that EBMs adapt to a more general continual learning setting where the data distribution changes without the notion of explicitly delineated tasks. We also find that EBMs outperform the baseline methods by a large margin on several continual learning benchmarks. These observations point towards EBMs as a class of models naturally inclined towards the continual learning regime.

1. INTRODUCTION

Humans are able to rapidly learn new skills and continuously integrate them with prior knowledge. The field of Continual Learning (CL) seeks to build artificial agents with the same capabilities (Parisi et al., 2019). In recent years, CL has seen increased attention, particularly in the context of classification problems. A crucial characteristic of continual learning is the ability to learn new data without forgetting prior data. Models must also be able to incrementally learn new skills, without necessarily having a notion of an explicit task identity. However, standard neural networks (He et al., 2016; Simonyan & Zisserman, 2014; Szegedy et al., 2015) suffer from catastrophic forgetting and perform poorly in this setting. Various approaches have been proposed to mitigate catastrophic forgetting, but many rely on external memory (Lopez-Paz & Ranzato, 2017; Li & Hoiem, 2017), additional models (Shin et al., 2017), or auxiliary objectives and regularization (Kirkpatrick et al., 2017; Schwarz et al., 2018; Zenke et al., 2017; Maltoni & Lomonaco, 2019), which can limit the wide applicability of these methods.

In this work, we focus on classification tasks. Such tasks are usually tackled with a normalized probability distribution over classes (i.e., a softmax output layer) trained with a cross-entropy objective. In this paper, we argue that by viewing classification through the lens of training an unnormalized probability distribution, we can significantly improve continual learning performance in classification problems. In particular, we interpret classification as learning an Energy-Based Model (EBM) across separate classes (Grathwohl et al., 2019). Training becomes a wake-sleep process in which the energy of an input paired with its ground-truth label is decreased while the energy of the input paired with another selected class is increased. This offers the freedom to choose which classes to update during the CL process.
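The difference between the two training signals can be made concrete with a toy sketch. The following is a minimal, hypothetical illustration (not the paper's actual architecture or training code): each class y is given a weight vector w[y], the energy is defined as E(x, y) = -w[y]·x, and a contrastive EBM-style update touches only the true class and one selected negative class, whereas a softmax cross-entropy update moves every class's parameters.

```python
import math

# Toy linear energy model: E(x, y) = -w[y] . x  (lower energy = better fit).
# Hypothetical minimal setup; the per-class weights w[y] stand in for the
# label-conditional parameters described in the text.

def energy(w, x, y):
    return -sum(wi * xi for wi, xi in zip(w[y], x))

def contrastive_update(w, x, y_pos, y_neg, lr=0.1):
    """EBM-style update: lower E(x, y_pos), raise E(x, y_neg).
    Only the two selected classes are touched; every other class
    (e.g. classes from earlier tasks) is left unchanged."""
    for i, xi in enumerate(x):
        w[y_pos][i] += lr * xi   # decrease energy of the true pair
        w[y_neg][i] -= lr * xi   # increase energy of the negative pair

def cross_entropy_update(w, x, y_pos, lr=0.1):
    """Softmax cross-entropy update: the gradient (p_y - 1[y == y_pos]) * x
    is nonzero for every class, so all class parameters move."""
    logits = [-energy(w, x, y) for y in range(len(w))]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    for y in range(len(w)):
        grad = exps[y] / z - (1.0 if y == y_pos else 0.0)
        for i, xi in enumerate(x):
            w[y][i] -= lr * grad * xi

x = [1.0, -0.5, 2.0]
w_ebm = [[0.0] * 3 for _ in range(4)]  # 4 classes, zero-initialized
w_ce = [[0.0] * 3 for _ in range(4)]

contrastive_update(w_ebm, x, y_pos=0, y_neg=1)
cross_entropy_update(w_ce, x, y_pos=0)

print(w_ebm[3])  # -> [0.0, 0.0, 0.0]: untouched by the EBM update
print(w_ce[3])   # nonzero: moved by cross-entropy despite never appearing
```

In a continual learning run, classes from earlier tasks play the role of class 3 here: the EBM-style update leaves their parameters untouched unless they are explicitly chosen as negatives, which is the reduced interference the text describes.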
By contrast, the cross-entropy objective reduces the likelihood of all negative classes for every new input, producing updates that lead to forgetting. The energy function, which maps a data-and-class pair to a scalar energy, also provides a way for the model to select and filter the portions of the input that are relevant to the classification at hand. We show that this allows EBM training updates for new data to interfere less with previous data. In particular, our formulation of the energy function computes the energy of an input by learning a conditional gain, based on the class label, that serves as an attention filter selecting the most relevant information. When a new class arrives, a new conditional gain can be learned. These unique benefits are applicable across a range of continual learning tasks.

Most existing works on continual learning (Kirkpatrick et al., 2017; Zhao et al., 2020) learn a sequence of distinct tasks with clear task boundaries (Boundary-Aware). Many of these methods depend on knowing the task boundaries, which provide proper moments to consolidate knowledge. However, this scenario is not very common in the real world, and a more natural scenario is the Boundary-Agnostic

