PBES: PCA-BASED EXEMPLAR SAMPLING ALGORITHM FOR CONTINUAL LEARNING

Abstract

Traditional machine learning is both data- and computation-intensive: the most powerful models require huge quantities of data to train, and training is highly time-consuming. In the streaming, or incremental, model of machine learning, data is received and processed in a streaming manner, i.e., the entire data stream is not stored, and models are updated incrementally. While this is closer to the human learning process, a common problem associated with it is "catastrophic forgetting" (CF): because only a sketch of the data is stored rather than the data itself, older data invariably has a smaller representation in the stored sketch as more data arrives, and models therefore perform badly on tasks that are closer to the older data. One approach to this problem stores an "exemplar set" of data items from the stream, but this raises the central question: how do we choose which items to store? Current approaches are based on herding, a deterministic algorithm for selecting a random-looking sample. We propose a novel selection approach based on Principal Component Analysis (PCA) and median sampling. This approach avoids pitfalls due to outliers, is simple to implement, and can be used across various incremental machine learning models. It also has independent value as a sampling algorithm. We achieve better performance than state-of-the-art methods.

1. INTRODUCTION

In traditional machine learning, one usually trains a model on a given data set that serves training and testing purposes; the model can then be applied to various queries. Humans, by contrast, learn and build "mental models" incrementally over time from a series of experiences and information sources. The corresponding machine learning scenario is known as continual learning (CL): a machine learning model continually keeps learning and adapting to new data, so in effect the data is viewed as a stream rather than a batch. Both humans and continual learning systems suffer from the problem of forgetting. Humans can forget information they learned in the past, whereas a machine learning model's adaptation to new incoming data suffers from so-called catastrophic forgetting (CF), since the data of earlier tasks is inaccessible in CL setups. A trained model depends heavily on the class distribution seen in the data, and the root cause of CF is that the class distribution of incoming data changes over time: newer classes are introduced, which invariably shifts the distribution. Conceptually, the stream of data can be thought of as a sequence of tasks, where each task usually consists of data about a few new classes. Different tasks thus have different class distributions, and the model needs to update itself in order to recognize the new classes. It is impractical to retrain a model over all the past data, so one either "remembers" just a sample of the older data or makes the model itself more complex, in the hope of incorporating the newer data into the additional model capacity. In either case some forgetting is involved, as the entire old data cannot be stored or remembered in the model. The challenges posed by CF make it difficult for AI models to adapt to practical systems across a multitude of fields (Prabhu et al., 2020; sem; Li et al., 2020).
In particular, different streams of data have different inter-task distributional gaps and thus pose challenges of different natures. While there have been continual learning studies in various domains, such as image classification of common day-to-day objects (the CIFAR10, CIFAR100, and ImageNet datasets) and food images (the Food1K dataset), datasets with high diversity (large inter-class gaps) have not been studied previously. Such data usually contains outliers, and it becomes crucial to remember data that is not an outlier. An example domain with this large data variance is sports: sports images exhibit high diversity and a large inter-class gap, and existing approaches do not perform well when classifying images of various sports. Another difficulty for learning systems is class imbalance. Labeled data may be highly imbalanced across classes, which can severely degrade the performance of the trained model. This problem is exacerbated in a continual learning scenario, where only a sample of the older data can be remembered: if the data is highly imbalanced, the rare classes may get no representation in the sample at all while the dense classes occupy all of it, potentially leading to overfitting or underfitting of the trained model. The goal of this paper is to address these problems. First, we want to design a more robust sampling scheme so that the remembered data is less prone to outliers. A real-life scenario may serve to convey the problem: imagine attending a party at a large company you have just joined. At the party you meet many people, but you want to retain the names of the "key" people.
In particular, if you cannot remember everyone's name, it is perhaps best to remember those of the people in your group or closely related ones, as well as the top company officials, and compromise on those of the friends or family members they may have invited. Second, our learning approach should handle class imbalance gracefully. There have been intriguing findings in recent rehearsal-based CL papers, i.e., approaches that maintain a fraction of previously seen data when training on new incoming classes (Prabhu et al., 2020) in class-incremental scenarios, hence mitigating CF (Goodfellow et al., 2013). However, as mentioned, in rehearsal-based class-incremental scenarios an important question arises: how should the representative memory be managed optimally? Because the number of stored data points is small compared to the number of incoming data points, during training the stored data points could either suffer from overfitting or be drowned out by the large quantity of incoming data points. A naive approach would be to progressively increase the storage size as new tasks arrive; however, this neglects an important representative-memory constraint, namely to store a fixed number of data points. Hence, an approach is required that preserves enough information about the previous classes while using a modest number of data points. The literature (Castro et al., 2018; Rebuffi et al., 2017; Wu et al., 2019; Zhao et al., 2020; He et al., 2020; Hou et al., 2019) mainly uses the herding algorithm (Welling, 2009), which is based only on the class mean, to choose the stored data points, also known as exemplars. As per Javed & Shafait (2018), the herding algorithm is no better than a random selection of data points. Many researchers have proposed other effective algorithms to select exemplars in rehearsal-based methods to mitigate CF (Kim et al., 2020; Aljundi et al., 2019a; Chen & Lin, 2021; Wiewel & Yang, 2021).
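To make the comparison concrete, the mean-based herding criterion used in the works above can be sketched as follows. This is a minimal NumPy sketch of iCaRL-style greedy herding; the function name and array shapes are our own illustrative choices, not code from any of the cited papers.

```python
import numpy as np

def herding_selection(features, m):
    """Greedily pick m exemplars whose running mean best tracks the class mean.

    features: (n, d) array of feature vectors for one class.
    Returns the list of selected row indices.
    """
    class_mean = features.mean(axis=0)
    selected = []
    running_sum = np.zeros_like(class_mean)
    for k in range(1, m + 1):
        # distance between the class mean and the exemplar mean that would
        # result from adding each candidate point as the k-th exemplar
        gaps = np.linalg.norm(class_mean - (running_sum + features) / k, axis=1)
        gaps[selected] = np.inf  # never pick the same point twice
        idx = int(np.argmin(gaps))
        selected.append(idx)
        running_sum += features[idx]
    return selected
```

Because each step only tries to match the class mean, a single far-away outlier can pull the mean and distort the whole selection, which is the weakness our sampling scheme targets.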
As mentioned before, however, none of the current approaches performed well in our experiments when the data variance is very large, as in the sports domain. We propose a novel sampling algorithm that performs better than state-of-the-art methods in CL. Our proposed continual learning system is effective for both class-balanced and class-imbalanced datasets, and remains effective even when the dataset is sparse and intra-class variation is high. To test the performance of our system in a class-imbalanced scenario, we use it for image classification in the sports domain. For our experiments we use the Sports73, Sports100, and Tiny ImageNet datasets; see Figure 1 for a sample of images from the Sports100 dataset. Our main contributions are as follows: (1) We propose PBES, a novel sampling algorithm for selecting exemplars that is robust even in the presence of outliers. (2) We show how to mitigate the class imbalance issue in continual learning settings by using KeepAugment (Gong et al., 2021), a data augmentation approach. (3) We demonstrate the effectiveness of our method on two class-imbalanced image datasets (Sports73 and Sports100) and one balanced dataset (Tiny ImageNet), showing that our overall continual learning system outperforms existing state-of-the-art approaches in all cases.
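As a rough illustration of the intuition behind combining PCA with median sampling, the sketch below ranks the feature vectors of a class by their distance to the median in a low-dimensional PCA projection and keeps the most central points. This is only a hedged reading of the idea; all names and parameters are illustrative, and it is not necessarily the exact PBES algorithm.

```python
import numpy as np

def pca_median_exemplars(features, m, n_components=2):
    """Illustrative sketch: keep the m points closest to the median of a
    low-dimensional PCA projection, down-weighting the influence of outliers.

    features: (n, d) array of feature vectors for one class.
    Returns the indices of the m most central points.
    """
    X = features - features.mean(axis=0)
    # top principal directions via SVD of the centered data matrix
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    proj = X @ vt[:n_components].T        # (n, n_components) projection
    median = np.median(proj, axis=0)      # robust center, unlike the mean
    dist = np.linalg.norm(proj - median, axis=1)
    return np.argsort(dist)[:m]
```

Because the median is insensitive to extreme values, an injected outlier ends up far from the robust center in the projected space and is never chosen as an exemplar, in contrast to mean-based herding.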

2. RELATED WORK IN CONTEXT

In this section, we review related work and place this paper in context. Intuitively, CL can be viewed as learning where the data is presented as a stream during the learning phase. Even so, it can be divided into tasks, where each task is a group of related data. For concreteness, consider a classification scenario: a task consists of the data for a set of classes, and data for a class in one task is not repeated in future tasks. There are two main settings under which continual learning operates (van de Ven et al., 2021): (i) task-incremental (TIL), where a task ID is provided along with the class data; this is useful later at test time, when the task ID of the (unknown) class is available during a classification query; and (ii) class-incremental (CIL), where no such task ID is available at test time.
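The operational difference between the two settings can be shown with a toy prediction step; the logits and the task-to-class mapping below are hypothetical.

```python
import numpy as np

# logits over all classes seen so far; task 0 owns classes {0, 1}, task 1 owns {2, 3}
logits = np.array([0.2, 0.9, 1.5, 0.1])
task_classes = {0: [0, 1], 1: [2, 3]}

# TIL: the task ID (say, task 0) is given at test time, so the prediction
# is restricted to that task's classes
til_pred = task_classes[0][int(np.argmax(logits[task_classes[0]]))]

# CIL: no task ID is given, so the model must choose among all classes seen so far
cil_pred = int(np.argmax(logits))
```

Here TIL picks class 1 (the best class within task 0), while CIL picks class 2 (the globally highest logit), which makes CIL the strictly harder setting.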

