PBES: PCA BASED EXEMPLAR SAMPLING ALGORITHM FOR CONTINUAL LEARNING

Abstract

Traditional machine learning is both data- and computation-intensive: the most powerful models require huge quantities of data to train, and the training is highly time-consuming. In the streaming, or incremental, model of machine learning, data is received and processed in a streaming manner, i.e., the entire data stream is not stored, and the models are updated incrementally. While this is closer to the learning process of humans, a common problem associated with it is "catastrophic forgetting" (CF): because only a sketch of the data is stored rather than the data itself, older data invariably has a smaller representation in the stored sketch as more and more data arrives, which causes models to perform badly on tasks closer to the older data. One family of approaches to this problem stores an "exemplar set" of data items from the stream, but this raises the central question: how should the items to store be chosen? Current approaches are based on herding, a deterministic algorithm for selecting a random-looking sample. We propose a novel selection approach based on Principal Component Analysis (PCA) and median sampling. This approach avoids the pitfalls due to outliers, is simple to implement, and can be used across various incremental machine learning models. It also has independent use as a sampling algorithm. We achieve better performance than state-of-the-art methods.
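The selection rule outlined above (project the data with PCA, then sample around the median to sidestep outliers) can be sketched roughly as follows. This is a minimal illustration under our own assumptions: the function name `pca_median_exemplars`, the single-pass SVD, and the distance-to-median criterion are ours, not necessarily the paper's exact procedure.

```python
import numpy as np

def pca_median_exemplars(X, m, n_components=1):
    """Pick m exemplar indices from X (n_samples x n_features).

    Sketch: project the centered data onto the top principal
    component(s), then keep the m points whose projections are
    closest to the median projection. Points far from the median
    (outliers along the principal directions) are never selected.
    """
    Xc = X - X.mean(axis=0)
    # PCA via SVD of the centered data; rows of Vt are principal axes.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T          # (n, k) projected coordinates
    med = np.median(scores, axis=0)            # coordinate-wise median
    dist = np.linalg.norm(scores - med, axis=1)
    return np.argsort(dist)[:m]                # m most central points

# Toy usage on synthetic data.
X = np.random.default_rng(0).normal(size=(100, 8))
idx = pca_median_exemplars(X, m=10)
```

Using the median rather than the mean as the reference point is what gives the rule its robustness: a handful of extreme points can shift the mean projection arbitrarily, but not the median.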

1. INTRODUCTION

In traditional machine learning, one usually trains a model from a given data set that serves for training and testing purposes. The model can then be applied to various queries. On the other hand, humans learn and build "mental models" incrementally over time from a series of experiences and information sources. The corresponding machine learning scenario is known as continual learning (CL). In this scenario, a machine learning model continually keeps learning and adapting to new data; in effect, the data is viewed as a stream rather than a batch. Both humans and continual learning systems suffer from the problem of forgetting. Humans can forget information they learned in the past, whereas a machine learning model's adaptation to new incoming data suffers from so-called catastrophic forgetting (CF) due to the inaccessibility of the data of earlier tasks in CL setups. In machine learning, a trained model depends heavily on the class distribution seen in the data. The reason for CF is that the class distributions of incoming data change over time: newer classes are introduced with time, and this invariably changes the class distribution. Conceptually, the stream of data can be thought of as a sequence of tasks, where each task usually consists of data about a few new classes. Thus, different tasks have different class distributions, and the model needs to update itself in order to recognize the new classes. It is impractical to retrain a model over all the past data, and thus one "remembers" either just a sample of the older data, or else one tries to make the model itself more complex in the hope that the newer data can be incorporated into the additional model complexity. In either case, some forgetting is involved, as the entire old data cannot be stored or remembered in the model. The challenges posed by CF cause difficulties for AI models to adapt to practical systems across a multitude of fields (Prabhu et al., 2020; sem; Li et al., 2020).
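For concreteness, the exemplar-memory route mentioned above is typically built on herding, a greedy mean-matching selection. The sketch below is a common variant of that baseline (the function name and demo data are our assumptions); it picks, at each step, the point that keeps the running mean of the chosen exemplars closest to the overall mean.

```python
import numpy as np

def herding_select(X, m):
    """Deterministic herding sketch: greedily choose m indices so that
    the running mean of the chosen points tracks the mean of X."""
    mu = X.mean(axis=0)
    chosen, running = [], np.zeros_like(mu)
    for k in range(1, m + 1):
        # Gap to the target mean if each candidate were added next.
        gaps = np.linalg.norm(mu - (running + X) / k, axis=1)
        gaps[chosen] = np.inf                  # forbid repeats
        i = int(np.argmin(gaps))
        chosen.append(i)
        running += X[i]
    return chosen

# Toy usage on synthetic data.
X_demo = np.random.default_rng(1).normal(size=(50, 4))
sel = herding_select(X_demo, 5)
```

Because herding matches the mean, a few outliers can drag the target and hence the selection; this sensitivity is the weakness the PCA-plus-median approach proposed here is designed to avoid.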
In particular, different streams of data have different inter-task distributional gaps and thus pose challenges of different natures. While continual learning has been studied in various domains, such as image classification of common day-to-day objects (the CIFAR10, CIFAR100, and ImageNet datasets) and food images (the Food1K dataset), datasets with high diversity (large inter-class gaps) have not been studied previously. Such data usually has outliers, and it

