DEEP CLASS CONDITIONAL GAUSSIANS FOR CONTINUAL LEARNING

Abstract

The current state of the art for continual learning with frozen, pre-trained embedding networks consists of simple probabilistic models defined over the embedding space, for example class-conditional Gaussians. However, in the task-incremental online setting, it has so far remained an open question how to extend these methods to the case where the embedding function must be learned from scratch. In this paper, we propose an empirical Bayesian framework that stores a fixed number of examples in memory, which are used to calculate the posterior of the probabilistic model and a conditional marginal likelihood term used to fit the embedding function. The learning of the embedding function can be interpreted as a variant of experience replay, a highly performant method for continual learning. As part of our framework, we select which examples to store by choosing the subset that minimises the KL divergence between the true posterior and the posterior induced by the subset; we show that this selection is necessary to achieve good performance. We demonstrate the performance of our method on a range of task-incremental online settings, including those with overlapping tasks, which have thus far been under-explored. Our method outperforms all of the compared methods, including several other replay-based methods, evidencing the potential of our approach.

1. INTRODUCTION

Real-world use of deep learning methods can often necessitate dynamic updating of solutions on non-stationary data streams (Farquhar & Gal, 2018; Antoniou et al., 2020). This is one of the main problems studied in continual learning, and as a result continual learning has become of increasing interest to the machine learning community, with many proposed approaches (Parisi et al., 2019) and settings (Hsu et al., 2018; Antoniou et al., 2020; Delange et al., 2021). Currently, the two biggest challenges in continual learning are catastrophic forgetting and positive transfer. Catastrophic forgetting describes the common occurrence in learning where unconstrained deep models easily forget information derived from previous data after updating on other data. Positive transfer is the ability of a model, given the current data, to improve its understanding of previous data and of what future data might imply. While significant steps have been taken towards solving these problems (Delange et al., 2021; Mai et al., 2021), in many settings there are still gains to be made (Farquhar & Gal, 2018). In common with many works in continual learning, the setting considered here is task-incremental online learning, where a data stream is split into a sequential set of tasks and methods are given information about what the current task is (van de Ven & Tolias, 2019; Prabhu et al., 2020). Each task is encapsulated by a representative dataset (considered to be sampled i.i.d. from a task distribution), which is given to a method batch by batch. Different tasks will generally be associated with different distributions, as well as different target problems. The target problems are summarised in a task objective function. In our case the task objective is classification (Hsu et al., 2018), where the classes being considered vary between tasks.
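The task-incremental online protocol described above can be sketched as a simple data-stream loop. The following is an illustrative toy sketch (not code from the paper): each task has its own distribution, its dataset is revealed batch by batch, and the task identity accompanies every batch.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_task(task_id, n=32, dim=4):
    # Each task draws i.i.d. samples from its own distribution; here the
    # class means are shifted per task purely for illustration.
    y = rng.integers(0, 2, size=n)
    x = rng.normal(loc=task_id + y[:, None], scale=1.0, size=(n, dim))
    return x, y

def stream(num_tasks=3, batch_size=8):
    # Tasks arrive sequentially; within a task, data arrives batch by batch.
    for task_id in range(num_tasks):
        x, y = make_task(task_id)
        for start in range(0, len(y), batch_size):
            # The method receives the task identity alongside each batch.
            yield task_id, x[start:start + batch_size], y[start:start + batch_size]

batches = list(stream())
```

A method consuming this stream must perform well on all tasks after the stream ends, under a fixed memory budget.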
The overall objective of a method, after seeing all of the tasks, is to perform well on all of them, given constraints on the amount of memory used by the method. Currently, one of the best ways to approach continual learning is to use a frozen pre-trained embedding function and define a simple probabilistic model on top to classify the data (Ostapenko et al., 2022).
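As a concrete illustration of such a probabilistic model, the following is a minimal sketch (our own notation, not the paper's code) of a class-conditional Gaussian classifier over a fixed embedding: fit one Gaussian per class with a shared isotropic variance, then classify by the largest class log-density.

```python
import numpy as np

def fit_class_gaussians(z, y):
    # z: (n, d) embeddings of the data; y: (n,) integer class labels.
    classes = np.unique(y)
    means = np.stack([z[y == c].mean(axis=0) for c in classes])
    # Shared isotropic variance, estimated from residuals around class means.
    var = np.mean((z - means[np.searchsorted(classes, y)]) ** 2)
    return classes, means, var

def predict(z, classes, means, var):
    # Log density of an isotropic Gaussian, up to constants shared by all
    # classes (uniform class prior assumed for this sketch).
    logits = -((z[:, None, :] - means[None, :, :]) ** 2).sum(-1) / (2 * var)
    return classes[np.argmax(logits, axis=1)]

# Toy demonstration with two well-separated synthetic "embedding" clusters.
rng = np.random.default_rng(0)
z0 = rng.normal(0.0, 0.5, size=(50, 8))
z1 = rng.normal(3.0, 0.5, size=(50, 8))
z = np.vstack([z0, z1])
y = np.array([0] * 50 + [1] * 50)
classes, means, var = fit_class_gaussians(z, y)
preds = predict(z, classes, means, var)
```

Because the per-class sufficient statistics (counts, means, second moments) can be updated incrementally, such models are naturally suited to streaming data once the embedding is fixed.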

