EEC: LEARNING TO ENCODE AND REGENERATE IMAGES FOR CONTINUAL LEARNING

Abstract

The two main impediments to continual learning are catastrophic forgetting and memory limitations on the storage of data. To cope with these challenges, we propose a novel, cognitively-inspired approach which trains autoencoders with Neural Style Transfer to encode and store images. During training on a new task, reconstructed images from encoded episodes are replayed to avoid catastrophic forgetting. To cope with image degradation, the loss on the reconstructed images is down-weighted during classifier training. When the system runs out of memory, the encoded episodes are converted into centroids and covariance matrices, which are used to generate pseudo-images during classifier training, keeping classifier performance stable while using less memory. Our approach increases classification accuracy by 13-17% over state-of-the-art methods on benchmark datasets, while requiring 78% less storage space.¹

1. INTRODUCTION

Humans continue to learn new concepts over their lifetime without the need to relearn most previous concepts. Modern machine learning systems, however, require the complete training data to be available at one time (batch learning) (Girshick, 2015). In this paper, we consider the problem of continual learning from the class-incremental perspective. Class-incremental systems are required to learn from a stream of data belonging to different classes and are evaluated in a single-headed evaluation (Chaudhry et al., 2018). In single-headed evaluation, the model is evaluated on all classes observed so far without any information indicating which class is being observed. Creating highly accurate class-incremental learning systems is a challenging problem. One simple way to create a class-incremental learner is by training the model on the data of the new classes, without revisiting the old classes. However, this causes the model to forget the previously learned classes and the overall classification accuracy decreases, a phenomenon known as catastrophic forgetting (Kirkpatrick et al., 2017). Most existing class-incremental learning methods avoid this problem by storing a portion of the training samples from the earlier learned classes and retraining the model (often a neural network) on a mixture of the stored data and new data containing new classes (Rebuffi et al., 2017; Hou et al., 2019). Storing real samples of the previous classes, however, leads to several issues. First, as pointed out by Wu et al. (2018b), storing real samples exhausts memory capacity and limits performance for real-world applications. Second, storing real samples introduces privacy and security issues (Wu et al., 2018b). Third, storing real samples is not biologically inspired, i.e. humans do not need to relearn previously known classes.
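The rehearsal strategy these methods share, mixing a few stored samples of old classes with the data of each new class, can be sketched as follows. The `ReplayBuffer`, the toy feature vectors, and the nearest-class-mean learner are illustrative stand-ins for this sketch, not the machinery of any cited method:

```python
import numpy as np

class ReplayBuffer:
    """Stores a fixed number of (feature, label) pairs per old class."""
    def __init__(self, per_class=5):
        self.per_class = per_class
        self.data = {}  # label -> list of stored feature vectors

    def add(self, x, y):
        self.data.setdefault(y, [])
        if len(self.data[y]) < self.per_class:
            self.data[y].append(x)

    def all_samples(self):
        return [(x, y) for y, xs in self.data.items() for x in xs]

def train_increment(class_means, new_x, new_y, buffer):
    """One increment: retrain on new data mixed with replayed old data."""
    mixed = list(zip(new_x, new_y)) + buffer.all_samples()
    by_class = {}
    for x, y in mixed:
        by_class.setdefault(y, []).append(x)
    for y, xs in by_class.items():
        class_means[y] = np.mean(xs, axis=0)  # recompute each class mean
    for x, y in zip(new_x, new_y):          # retain a few new samples
        buffer.add(x, y)
    return class_means

def predict(class_means, x):
    """Single-headed evaluation: compare against ALL classes seen so far."""
    labels = list(class_means)
    dists = [np.linalg.norm(x - class_means[y]) for y in labels]
    return labels[int(np.argmin(dists))]
```

Because the buffer replays a handful of old-class samples at every increment, the class means of earlier classes survive later training; dropping the buffer reproduces the forgetting behaviour described above.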
This paper explores the "strict" class-incremental learning problem in which the model is not allowed to store any real samples of the previously learned classes. The strict class-incremental learning problem is more akin to realistic learning scenarios such as a home service robot that must learn continually with limited on-board memory. This problem has been previously addressed using generative models such as autoencoders (Kemker & Kanan, 2018) or Generative Adversarial Networks (GANs) (Ostapenko et al., 2019). Most approaches for strict class-incremental learning use GANs to generate samples reflecting old class data, because GANs generate sharp, fine-grained images (Ostapenko et al., 2019). The downside of GANs, however, is that they tend to generate images which do not belong to any of the learned classes, hurting classification performance. Autoencoders, on the other hand, always generate images that relate to the learned classes, but tend to produce blurry images that are also not good for classification. To cope with these issues, we propose a novel, cognitively-inspired approach termed Encoding Episodes as Concepts (EEC) for continual learning, which utilizes convolutional autoencoders to generate previously learned class data. Inspired by models of the hippocampus (Renoult et al., 2015), we use autoencoders to create compressed embeddings (encoded episodes) of real images and store them in memory. To avoid the generation of blurry images, we borrow ideas from the Neural Style Transfer (NST) algorithm proposed by Gatys et al. (2016) to train the autoencoders. For efficient memory management, we use the notion of memory integration, from hippocampal and neocortical concept learning (Mack et al., 2018), to combine similar episodes into centroids and covariance matrices, eliminating the need to store real data.
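The memory-integration idea can be illustrated with a minimal sketch, assuming the encoded episodes of a class are roughly Gaussian in the embedding space; `consolidate` and `generate_pseudo_encodings` are hypothetical names, and the details differ from the actual procedure described later:

```python
import numpy as np

def consolidate(encodings):
    """Collapse a set of encoded episodes into a centroid and covariance matrix."""
    enc = np.asarray(encodings)
    centroid = enc.mean(axis=0)
    cov = np.cov(enc, rowvar=False)  # covariance over feature dimensions
    return centroid, cov

def generate_pseudo_encodings(centroid, cov, n, seed=0):
    """Sample pseudo-encodings from the stored Gaussian statistics."""
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(centroid, cov, size=n)
```

The memory saving comes from the shapes: N stored encodings of dimension d cost N*d floats, while the centroid plus covariance cost d + d*d floats, independent of how many episodes were merged.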
This paper contributes: 1) an autoencoder-based approach to strict class-incremental learning which uses Neural Style Transfer to produce quality samples reflecting old class data (Sec. 3.1); 2) a cognitively-inspired memory management technique that combines similar samples into a centroid/covariance representation, drastically reducing the memory required (Sec. 3.2); 3) data filtering and loss weighting techniques to manage image degradation of old classes during classifier training (Sec. 3.3). We further show that EEC outperforms state-of-the-art (SOTA) approaches on benchmark datasets by significant margins while also using far less memory.
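As an illustration of the loss-weighting idea in contribution (3), a per-sample weighted cross-entropy can down-weight reconstructed images so they influence classifier training less than real samples; the softmax formulation and the particular weight value in the usage below are illustrative assumptions, not the exact scheme of Sec. 3.3:

```python
import numpy as np

def weighted_cross_entropy(logits, labels, weights):
    """Cross-entropy where each sample's contribution is scaled by a weight.

    Degraded (reconstructed) samples can be given weights below 1 so that
    their noisier labels pull less on the classifier than real samples do.
    """
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(labels)), labels]
    return float((weights * nll).sum() / weights.sum())
```

For example, a batch mixing real images (weight 1.0) with reconstructed images (weight, say, 0.5) yields a loss dominated by the real samples while still rehearsing the old classes.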

2. RELATED WORK

Most recent approaches to class-incremental learning store a portion of the real images belonging to the old classes to avoid catastrophic forgetting. Rebuffi et al. (2017) (iCaRL) store old class images and utilize knowledge distillation (Hinton et al., 2015) for representation learning and the nearest class mean (NCM) classifier for classification of the old and new classes. Knowledge distillation uses a loss term to force the labels of the images of previous classes to remain the same when learning new classes. Castro et al. (2018) (EEIL) improve iCaRL with an end-to-end learning approach. Wu et al. (2019) also store real images and use a bias correction layer to avoid any bias toward the new classes.

To avoid storing old class images, some approaches store features from the last fully-connected layer of the neural network (Xiang et al., 2019; Hayes & Kanan, 2020; Ayub & Wagner, 2020b;d). These approaches, however, use a network pretrained on ImageNet to extract features, which gives them an unfair advantage over other approaches. Because of this reliance on a pretrained network, they cannot be applied when the new data differs drastically from ImageNet (Russakovsky et al., 2015). These difficulties have led researchers to consider generative networks. Methods employing generative networks model previous class statistics and regenerate images belonging to the old classes while learning new classes. Both Shin et al. (2017) and Wu et al. (2018a) use generative replay, in which the generator is trained on a mixture of generated old class images and real images from the new classes. This approach, however, causes images belonging to classes learned in earlier increments to semantically drift, i.e. the quality of the images degrades because of repeated training on synthesized images. Ostapenko et al. (2019) avoid semantic drift by training the GAN only once on the data of each class. Catastrophic forgetting is avoided by applying elastic weight consolidation (Kirkpatrick et al., 2017), in which changes to the important weights needed for old classes are avoided when learning new classes. They also grow their network when it runs out of memory while learning new classes, which can be difficult to apply in situations with restricted memory. One major issue with GAN-based approaches is that GANs tend to generate images that do not belong to any of the learned classes, which decreases classification accuracy.

For these reasons, most approaches only perform well on simpler datasets such as MNIST (LeCun, 1998) but perform poorly on complex datasets such as ImageNet. A conditional GAN can be used to mitigate the problem of images belonging to none of the classes, as done by Ostapenko et al. (2019); however, the performance is still poor on complex datasets such as ImageNet-50 (see Table 1 and

A preliminary version of this work was presented at the ICML Workshop on Lifelong Machine Learning (Ayub & Wagner, 2020c).
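The distillation loss used by these rehearsal methods can be sketched in its generic form, soft targets with a temperature T, in the spirit of Hinton et al. (2015); iCaRL's exact sigmoid-based formulation differs in detail, so this is an illustrative sketch rather than its implementation:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(new_logits, old_logits, T=2.0):
    """Cross-entropy between the old model's softened outputs (targets)
    and the new model's softened outputs on the same images.

    Minimizing this term pushes the new model's predictions on old-class
    images back toward what the frozen previous model produced.
    """
    targets = softmax(old_logits / T)              # soft labels from old model
    log_q = np.log(softmax(new_logits / T) + 1e-12)
    return float(-(targets * log_q).sum(axis=1).mean())
```

The loss is minimized exactly when the new model reproduces the old model's output distribution, which is how the labels of previous classes are kept from drifting while new classes are learned.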

