SOLVING CONTINUAL LEARNING VIA PROBLEM DECOMPOSITION

Abstract

This paper is concerned with class incremental learning (CIL) in continual learning (CL). CIL is a popular continual learning paradigm in which a system receives a sequence of tasks with different classes in each task and is expected to predict the class of each test instance without being given any task-related information for the instance. Although many techniques have been proposed for CIL, it remains highly challenging due to the difficulty of dealing with catastrophic forgetting (CF). This paper starts from first principles and proposes a novel method to solve the problem. The definition of CIL reveals that the problem can be decomposed into two probabilities: the within-task prediction probability and the task-id prediction probability. This paper proposes an effective technique to estimate these two probabilities based on the estimation of feature distributions in the latent space using incremental PCA and Mahalanobis distance. The proposed method does not require a memory buffer to save replay data, and it outperforms strong baselines including replay-based methods.^1
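The distribution-estimation idea in the abstract can be illustrated with a minimal sketch (hypothetical code, not the authors' implementation): each class's feature distribution in the latent space is summarized by an incrementally updated mean and covariance, and a test feature is scored by its Mahalanobis distance to each class. The incremental PCA step for reducing the latent dimension is omitted here for brevity; the `ClassStats` helper and its interface are assumptions made for illustration.

```python
import numpy as np

class ClassStats:
    """Running estimate of one class's feature distribution (mean and
    covariance), updated incrementally batch by batch via Welford-style
    updates. Hypothetical helper, not the paper's exact implementation."""

    def __init__(self, dim):
        self.n = 0
        self.mean = np.zeros(dim)
        self.M2 = np.zeros((dim, dim))  # running sum of deviation outer products

    def update(self, X):
        """Incorporate a batch of latent features X of shape (batch, dim)."""
        for x in X:
            self.n += 1
            delta = x - self.mean            # deviation from old mean
            self.mean += delta / self.n      # update running mean
            self.M2 += np.outer(delta, x - self.mean)

    def mahalanobis(self, x, eps=1e-6):
        """Mahalanobis distance of a single feature x to this class."""
        cov = self.M2 / max(self.n - 1, 1) + eps * np.eye(len(self.mean))
        d = x - self.mean
        return float(np.sqrt(d @ np.linalg.solve(cov, d)))
```

A test instance would then be assigned to the class (and hence the task) whose statistics give the smallest distance, which is the role the task-id prediction probability plays in the proposed method.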

1. INTRODUCTION

Continual learning (CL) is a learning problem where a system learns and accumulates knowledge over time without forgetting the previous knowledge (Chen & Liu, 2018). The key challenge is catastrophic forgetting (CF), the phenomenon in which learning a new task corrupts the knowledge learned in the past (McCloskey & Cohen, 1989). This paper focuses on the challenging CL setting of class incremental learning (CIL) (Rebuffi et al., 2017) in the offline (or batch) mode. In this setting, the system learns a sequence of classification tasks incrementally, where each task arrives with all its training data for a set of classes. The resulting classifier can identify the class of a test instance among all the classes learned in the process, with no task information provided. The other popular setting of CL is task incremental learning (TIL), which builds a separate model for each task; in testing, the test instance is provided together with the task-id it belongs to, so that the system can use the model of that specific task to classify the instance. Existing approaches to CIL can be grouped into several categories. Regularization (Kirkpatrick et al., 2017) and distillation (Li & Hoiem, 2016) approaches try not to change the parameters or knowledge that are important to old tasks when learning a new task. Replay/memory-based approaches (Rebuffi et al., 2017) save some old data and use them jointly with the new task data to learn the new task and to preserve/adjust the old knowledge. Parameter isolation approaches (Serra et al., 2018) expand the network or mask out the parameters that are important to old tasks (see Sec. 2 for more details). Our approach is entirely different and is derived directly from the definition of the CIL setting.
Definition: Class incremental learning (CIL) learns a sequence of tasks $1, \dots, t$, where each task $i$ has a training dataset $D_i = \{(x_j^i, y_j^i)\}_{j=1}^{n_i}$ with $x_j^i \in X_i$ (input space) and $y_j^i \in Y_i$ (class label space). The class labels of different tasks are disjoint, $Y_i \cap Y_k = \emptyset$ for any $i \neq k$.^2 Let $X = \cup_{i=1}^{t} X_i$ and $Y = \cup_{i=1}^{t} Y_i$. The goal is to learn a function $f: X \to Y$ to predict the class label of a test case $x$.
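The two probabilities named in the abstract can be read off this definition. Writing $P(X_i \mid x)$ for the probability that $x$ belongs to task $i$ (notation assumed here for illustration), a sketch of the decomposition is:

```latex
% Condition the class prediction on task membership; since the label
% spaces are disjoint, P(y_j^i | x, X_k) = 0 whenever k != i:
P(y_j^i \mid x)
  = \sum_{k=1}^{t} P(y_j^i \mid x, X_k)\, P(X_k \mid x)
  = \underbrace{P(y_j^i \mid x, X_i)}_{\text{within-task prediction}}\;
    \underbrace{P(X_i \mid x)}_{\text{task-id prediction}} .
```

The first factor corresponds to what a TIL model estimates; the second is what the feature-distribution machinery (incremental PCA and Mahalanobis distance) is used to approximate.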



^1 The code is included in the Supplementary Material.

^2 In (Bang et al., 2021), tasks are allowed to share classes. For instance, the system may receive two datasets D1 and D2 consisting of classes {y1, y2} and {y1, y3, y4}, respectively. In that case we would define tasks 1 and 2 as consisting of {y1, y2} and {y3, y4}, respectively, and treat the samples of the shared label y1 as additional training data for task 1. This work does not consider this learning scenario; we leave it for future work.

