A UNIFIED BAYESIAN FRAMEWORK FOR DISCRIMINATIVE AND GENERATIVE CONTINUAL LEARNING

Abstract

Continual learning is a learning paradigm in which learning systems are trained on a sequence of tasks. The goal is to perform well on the current task without suffering a performance drop on the previous tasks. Two notable directions among the recent advances in continual learning with neural networks are (1) variational-Bayes-based regularization, which learns priors from previous tasks, and (2) learning the structure of deep networks to adapt to new tasks. So far, these two approaches have been orthogonal. We present a novel Bayesian framework for continual learning based on learning the structure of deep neural networks, addressing the shortcomings of both approaches. The proposed framework learns the deep structure for each task by learning which weights to use, and supports inter-task transfer through the overlap of the different sparse subsets of weights learned by different tasks. An appealing aspect of our proposed continual learning framework is that it is applicable to both discriminative (supervised) and generative (unsupervised) settings. Experimental results on supervised and unsupervised benchmarks show that our model performs comparably to or better than recent advances in continual learning.

1. INTRODUCTION

Continual learning (CL) (Ring, 1997; Parisi et al., 2019) is the learning paradigm in which a single model is subjected to a sequence of tasks. At any point in time, the model is expected to (i) make predictions for the tasks it has seen so far, and (ii) when presented with training data for a new task, adapt to the new task by leveraging past knowledge where possible (forward transfer) and, ideally, benefit the previous tasks as well (backward transfer). While the desirable aspects of more mainstream transfer learning (the sharing of bias between related tasks (Pan & Yang, 2009)) might reasonably be expected here too, the principal challenge is to retain predictive power on the older tasks even after learning new tasks, thus avoiding so-called catastrophic forgetting. Real-world applications, for example in robotics or time-series forecasting, are rife with this challenging learning scenario; the ability to adapt to dynamically changing environments or evolving data distributions is essential in these domains. Continual learning is also desirable in unsupervised learning problems (Smith et al., 2019; Rao et al., 2019b), where the goal is to learn the underlying structure or latent representation of the data. Moreover, as a skill innate to humans (Flesch et al., 2018), it is naturally an interesting scientific problem to reproduce the same capability in artificial predictive modelling systems.

Existing approaches to continual learning are mainly based on three foundational ideas. The first is to constrain the parameter values so that they do not deviate significantly from their previously learned values, using some form of regularization or trade-off between the previous and newly learned weights (Schwarz et al., 2018; Kirkpatrick et al., 2017; Zenke et al., 2017; Lee et al., 2017). A natural way to accomplish this is to train a model using online Bayesian inference, whereby the posterior of the parameters learned from task t serves as the prior for task t + 1, as in Nguyen et al. (2018) and Zeno et al. (2018). This new, informed prior helps forward transfer and also prevents catastrophic forgetting by penalizing large deviations from itself. In particular, VCL (Nguyen et al., 2018) achieves state-of-the-art results by applying this simple idea to Bayesian neural networks. The second idea is to perform incremental model selection for every new task. For neural networks, this is done by evolving the structure as newer tasks are encountered (Golkar et al., 2019; Li
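The recursive prior update described above can be illustrated with a minimal sketch. For clarity it uses a conjugate Gaussian model over a single parameter instead of a Bayesian neural network, so the posterior update is exact rather than variational; all function and variable names are illustrative, not taken from VCL.

```python
import numpy as np

def gaussian_posterior(prior_mu, prior_var, data, noise_var):
    """Conjugate update: Gaussian prior over a mean parameter,
    Gaussian likelihood with known noise variance."""
    n = len(data)
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mu = post_var * (prior_mu / prior_var + np.sum(data) / noise_var)
    return post_mu, post_var

rng = np.random.default_rng(0)
task1 = rng.normal(1.0, 0.5, size=50)  # data for task 1
task2 = rng.normal(1.0, 0.5, size=50)  # data for task 2

# Task 1: start from a broad prior N(0, 10).
mu1, var1 = gaussian_posterior(0.0, 10.0, task1, noise_var=0.25)

# Task 2: the posterior from task 1 serves as the prior,
# which is the core of the online Bayesian recipe.
mu2, var2 = gaussian_posterior(mu1, var1, task2, noise_var=0.25)

# Sanity check: sequential updating matches a single batch
# update on the pooled data, so nothing is "forgotten".
mu_b, var_b = gaussian_posterior(0.0, 10.0,
                                 np.concatenate([task1, task2]),
                                 noise_var=0.25)
assert np.isclose(mu2, mu_b) and np.isclose(var2, var_b)
```

In VCL the same recursion is applied to the weights of a neural network, where the posterior is intractable and is approximated variationally; the shrinking posterior variance plays the role of the penalty on deviating from previously learned values.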

