OPTIMIZING SPCA-BASED CONTINUAL LEARNING: A THEORETICAL APPROACH

Abstract

Catastrophic forgetting and the stability-plasticity dilemma are two major obstacles to continual learning. In this paper, we first propose a theoretical analysis of an SPCA-based continual learning algorithm using high-dimensional statistics. Second, building on this theory, we design OSCL (Optimized SPCA-based Continual Learning), which relies on a flexible task-optimization scheme. When optimizing a single task, catastrophic forgetting can be provably prevented. When optimizing multiple tasks, the trade-off between integrating knowledge from the new task and retaining knowledge of the old tasks can be controlled by assigning appropriate weights to the corresponding tasks in compliance with the objectives. Experimental results confirm that the various theoretical conclusions are robust across a wide range of data distributions. Moreover, several applications on synthetic and real data show that the proposed method, while being computationally efficient, achieves results comparable to the state of the art.

1. INTRODUCTION

Continual learning paradigm. Machine learning methods generally learn from samples randomly drawn from a stationary distribution. However, this scenario is rare in reality. Continual learning (CL) is a machine learning paradigm in which data arrive continuously, possibly in a non-i.i.d. way, and knowledge is accumulated over time (Schlimmer & Fisher, 1986; Ebrahimi et al., 2019; Lee et al., 2020; De Lange et al., 2021). Continual learning is essential for designing real-world machine learning systems that mimic humans. On the one hand, humans continue to acquire knowledge and solve new problems throughout their lifetimes; the goal of continual learning is to mimic this human capacity to learn from a non-stationary data stream without catastrophically forgetting previously learned knowledge (Titsias et al., 2019; Lee et al., 2020). On the other hand, when a trained model is deployed in real applications, the data distribution will consistently drift over time, so the machine learning algorithm must be able to adapt continuously to these changes (Kirkpatrick et al., 2017; Lesort et al., 2020).

Challenges in continual learning. One of the major challenges of continual learning is avoiding catastrophic forgetting (McCloskey & Cohen, 1989; Chen & Liu, 2018; Aljundi, 2019), which occurs when performance on previous tasks is severely degraded during the learning process. To take into account both the current task and the previous tasks, the stability-plasticity dilemma was introduced (Nguyen et al., 2017; Rajasegaran et al., 2019). More specifically, plasticity refers to the ability to integrate new knowledge, and stability to the capacity to retain previous knowledge (which is related to catastrophic forgetting).
Note that although the term catastrophic forgetting is strongly associated in the literature with deep neural network models, it is a fairly general phenomenon that can occur in any machine learning algorithm, as it has been observed in shallow single-layer models such as self-organizing feature maps (Richardson & Thomas, 2008; Chen & Liu, 2018).
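To make the stability-plasticity trade-off discussed above concrete, the following is a minimal sketch of the task-weighting idea in a PCA-style setting: each task contributes a covariance estimate, the estimates are mixed with task weights, and the learned subspace is the leading eigenspace of the mixture. This is an illustrative toy, not the paper's exact SPCA objective; the helper `weighted_subspace` and the weight values are hypothetical.

```python
import numpy as np

def weighted_subspace(covariances, weights, k=1):
    """Mix per-task covariance estimates with task weights and return
    the top-k principal directions of the weighted mixture.
    (Hypothetical helper illustrating the weighting idea only.)"""
    mixed = sum(w * cov for w, cov in zip(weights, covariances))
    _, eigvecs = np.linalg.eigh(mixed)  # eigenvalues in ascending order
    return eigvecs[:, -k:]              # columns = leading eigenvectors

# Two toy tasks whose dominant directions of variance differ.
rng = np.random.default_rng(0)
X_old = rng.normal(size=(200, 5)) * np.array([3.0, 1.0, 1.0, 1.0, 1.0])
X_new = rng.normal(size=(200, 5)) * np.array([1.0, 3.0, 1.0, 1.0, 1.0])
covs = [np.cov(X_old, rowvar=False), np.cov(X_new, rowvar=False)]

# A larger weight on the old task favors stability (retaining its subspace);
# a larger weight on the new task favors plasticity (adapting to it).
stable = weighted_subspace(covs, weights=[0.9, 0.1])
plastic = weighted_subspace(covs, weights=[0.1, 0.9])
```

Under the stability weighting, the learned direction stays aligned with the old task's dominant axis; under the plasticity weighting, it swings toward the new task's, which is exactly the trade-off the weights are meant to control.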

