FORWARD AND BACKWARD LIFELONG LEARNING WITH TIME-DEPENDENT TASKS

Abstract

For a sequence of classification tasks that arrive over time, lifelong learning methods can boost the effective sample size of each task by leveraging information from preceding and succeeding tasks (forward and backward learning). However, backward learning is often prone to so-called catastrophic forgetting, in which a task's performance degrades as information from succeeding tasks is repeatedly incorporated. In addition, current lifelong learning techniques are designed for i.i.d. tasks and cannot capture the higher similarity that consecutive tasks commonly exhibit. This paper presents lifelong learning methods based on minimax risk classifiers (LMRCs) that effectively exploit forward and backward learning and account for time-dependent tasks. In addition, we analytically characterize the increase in effective sample size provided by forward and backward learning in terms of the tasks' expected quadratic change. The experimental evaluation shows that LMRCs can result in a significant performance improvement, especially for reduced sample sizes.

1. INTRODUCTION

In practical scenarios, classification problems (tasks) often have limited sample sizes and arrive sequentially over time. Lifelong learning (also known as continual learning) can boost the effective sample size (ESS) of each task by leveraging information from preceding and succeeding tasks (forward and backward learning) (Ruvolo & Eaton, 2013; Lopez-Paz & Ranzato, 2017; Chen & Liu, 2018). The general goal of such approaches is to replicate humans' ability to continually improve the performance of each task by exploiting information acquired from other tasks.

The development of lifelong learning techniques is hindered by the continuous arrival of samples from tasks characterized by different underlying distributions. In particular, backward learning (also known as reverse transfer) is often prone to so-called catastrophic forgetting, in which a task's performance degrades as information from succeeding tasks is repeatedly incorporated (Kirkpatrick et al., 2017; Hurtado et al., 2021; Henning et al., 2021). More generally, lifelong learning methods face the so-called stability-plasticity dilemma: excessive usage of information from different tasks can result in a performance decrease, while moderate usage does not fully exploit the potential of lifelong learning (Rolnick et al., 2019; Ke et al., 2021). Most lifelong learning techniques are designed for tasks sampled i.i.d. from a task environment (Baxter, 2000; Maurer et al., 2016; Denevi et al., 2019), and current methods cannot capture the higher similarity that consecutive tasks commonly exhibit. For a sequence of tasks that arrive over time, it is common that the tasks are time-dependent and consecutive tasks are significantly more similar. For instance, if each task corresponds to the classification of portraits from a specific time period (Ginosar et al., 2015), the similarity between tasks is markedly higher for consecutive tasks (see Figure 1).
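The notion of time-dependent tasks whose consecutive members are more similar can be pictured with a simple simulation, not part of the proposed method: each task's class-conditional means take a small random-walk step away from the previous task's means, so per-step drift is bounded in expectation while distant tasks may differ substantially. The function name `drifting_tasks` and the Gaussian drift model are hypothetical choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def drifting_tasks(num_tasks=10, dim=2, drift_std=0.3):
    """Generate class-conditional means for a sequence of time-dependent
    binary tasks: each task's means are one small random-walk step away
    from the previous task's means, so consecutive tasks stay similar."""
    mean_pos = np.array([1.0] + [0.0] * (dim - 1))
    mean_neg = -mean_pos.copy()
    tasks = []
    for _ in range(num_tasks):
        tasks.append((mean_pos.copy(), mean_neg.copy()))
        # The expected quadratic change per step is drift_std**2 * dim,
        # loosely mirroring the "expected quadratic change" quantity
        # used in the paper's analysis.
        mean_pos = mean_pos + rng.normal(0.0, drift_std, dim)
        mean_neg = mean_neg + rng.normal(0.0, drift_std, dim)
    return tasks

tasks = drifting_tasks()
dist = lambda i, j: np.linalg.norm(tasks[i][0] - tasks[j][0])
# Consecutive tasks are typically much closer than distant ones.
print(dist(0, 1), dist(0, 9))
```

Under this model, forward/backward transfer between neighboring tasks is well justified, whereas treating all tasks as i.i.d. draws ignores the ordering that makes neighbors informative.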
In the current lifelong learning literature, only Pentina & Lampert (2015) consider scenarios with time-dependent tasks, analyzing the feasibility of transferring information from preceding tasks. On the other hand, methods designed for concept drift adaptation (Zhao et al., 2020; Tahmasbi et al., 2021; Álvarez et al., 2022) account for time-dependent underlying distributions but aim only to learn the last task in the sequence.

This paper presents lifelong learning methods based on minimax risk classifiers (LMRCs). The proposed techniques effectively exploit forward and backward learning and account for time-dependent tasks. Specifically, the main contributions presented in the paper are as follows.

