ONLINE CONTINUAL LEARNING UNDER CONDITIONAL DOMAIN SHIFT

Abstract

Existing continual learning benchmarks often assume each task's training and test data are drawn from the same distribution, which may not hold in practice. Towards making continual learning practical, in this paper we introduce a novel setting of online continual learning under conditional domain shift, in which domain shift exists between the training and test data of all tasks: P_tr(X, Y) ≠ P_te(X, Y), and the model is required to generalize to unseen domains at test time. To address this problem, we propose Conditional Invariant Experience Replay (CIER), which can simultaneously retain old knowledge, acquire new information, and generalize to unseen domains. CIER employs adversarial training to correct the shift in P(X, Y) by matching P(X|Y), which results in an invariant representation that can generalize to unseen domains during inference. Our extensive experiments show that CIER can bridge the domain gap in continual learning and significantly outperforms state-of-the-art methods. We will release our benchmarks and implementation upon acceptance.

1. INTRODUCTION

Continual learning is a promising framework towards human-level intelligence by developing models that can continuously learn over time (Ring, 1997; Parisi et al., 2019). Unlike traditional learning paradigms, continual learning methods observe a continuum of tasks and have to simultaneously perform well on all tasks with limited access to previous data. Therefore, they have to achieve a good trade-off between retaining old knowledge (French, 1999) and acquiring new skills, which is referred to as the stability-plasticity dilemma (Abraham & Robins, 2005). Continual learning is not only a challenging research problem but also has a tremendous impact on many applications (Diethe et al., 2019). In particular, a deployed model may encounter new problems over time, and re-training every time new data arrive is infeasible, especially for large, complex neural networks.

Despite recent success, existing continual learning benchmarks assume the data of each class are drawn from the same distribution during training and testing, i.e., P_tr(X, Y) = P_te(X, Y). This restrictive assumption is unlikely to hold in practice and prohibits the use of continual learning strategies in numerous real-world applications. For example, consider a continual learning agent already trained to recognize certain objects in an indoor environment; then, it moves to a new outdoor environment to acquire new skills. The agent may fail to recognize the learned objects because they are placed against a completely different background. Such an environment change is referred to as domain shift (Khosla et al., 2012; Long et al., 2015; Diethe et al., 2019), and is very natural for continual learning in practice (Diethe et al., 2019). To formalize the continual learning under domain shift problem, we need to consider the interaction between the domain and the observations.
In many computer vision applications, the causal structure is usually assumed to be Y → X, i.e., the object class is the cause of the image features (Lopez-Paz et al., 2017). This assumption ignores the effect of conditional domain shift (Zhang et al., 2013), where both the domain and the object class are causes of the image features. That is, images may come from different domains yet share the same label. Figure 1 shows the causal graph for the continual learning under conditional domain shift problem and an illustrative example of the domain shift between training and test data. Continual learning under conditional domain shift requires the model to perform previous tasks in new domains, which poses a great challenge since we do not know in hindsight which domain will be tested. Therefore, the model needs to learn the concepts presented in the data while ignoring the domains. For example, placing an object against a different background does not change the label of that object. As a result, the model needs to achieve an invariant representation using data from the various source domains observed during training, i.e., a representation under which the joint distribution P(X, Y) is the same across all source domains. To this end, we develop Conditional Invariant Experience Replay (CIER), which can simultaneously retain previous knowledge, learn new tasks, and generalize to new domains. CIER employs an episodic memory to perform experience replay and an adversarial loss to achieve an invariant representation. Specifically, CIER formulates a multiplayer minimax game such that the conditional distribution P(X|Y) is stable across each class's observed source domains. Therefore, if the prior distribution P(Y) is stable in the target domains, i.e., there is no class imbalance, CIER achieves an invariant representation in the joint distribution P(X, Y) and generalizes to unseen domains. Figure 2 provides a high-level overview of the proposed CIER method.
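To make the conditional-matching idea concrete, the sketch below penalizes the discrepancy between per-domain class-conditional feature means, a simple moment-matching proxy for aligning P(X|Y) across source domains. This is an illustrative stand-in, not the paper's method: CIER uses an adversarial minimax game with per-class domain discriminators, whereas here the discriminator is replaced by a mean-discrepancy penalty, and all function names are hypothetical.

```python
import numpy as np

def conditional_invariance_penalty(features, labels, domains):
    """Mean-discrepancy proxy for matching P(X|Y) across source domains.

    For every class y, compute the mean feature vector of each domain
    containing class y and penalize its deviation from the class-wide
    mean. CIER's adversarial loss would instead train a per-class domain
    discriminator against the feature extractor.
    """
    penalty = 0.0
    for y in np.unique(labels):
        cls_mask = labels == y
        cls_mean = features[cls_mask].mean(axis=0)
        for d in np.unique(domains[cls_mask]):
            dom_mask = cls_mask & (domains == d)
            dom_mean = features[dom_mask].mean(axis=0)
            penalty += np.sum((dom_mean - cls_mean) ** 2)
    return penalty

# Toy check: identical per-class features across domains give zero penalty.
feats = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
labels = np.array([0, 0, 1, 1])
domains = np.array([0, 1, 0, 1])
print(conditional_invariance_penalty(feats, labels, domains))  # → 0.0
```

In a full training loop, this penalty (or the adversarial loss it stands in for) would be added to the classification loss computed on both the current batch and the replayed memory batch, so the feature extractor is pushed towards domain-invariant class-conditional features.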
In summary, we formalize the continual learning under conditional domain shift problem and construct three novel benchmarks using real data with different levels of domain shift and diversity in the number of domains, tasks, and classes. Then, we develop CIER, a novel continual learning method that learns a conditionally invariant representation and generalizes to novel domains. Our extensive experiments demonstrate the limitations of existing continual learning methods when tested on unseen domains and show that the proposed CIER can mitigate such domain gaps effectively.
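The episodic memory that backs experience replay in the online setting is commonly maintained with reservoir sampling, so the buffer holds an approximately uniform sample of the stream without requiring task boundaries. The sketch below is a generic implementation of this standard component under that assumption, not the authors' released code; the class and parameter names are hypothetical.

```python
import random

class ReservoirMemory:
    """Fixed-size episodic memory filled by reservoir sampling.

    After n examples have been seen, each one remains in the buffer with
    probability capacity / n, so replay batches are drawn approximately
    uniformly over the whole stream.
    """

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.buffer = []
        self.n_seen = 0
        self.rng = random.Random(seed)

    def add(self, example):
        """Insert one streamed example, possibly evicting an old one."""
        self.n_seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(example)
        else:
            j = self.rng.randrange(self.n_seen)
            if j < self.capacity:
                self.buffer[j] = example

    def sample(self, batch_size):
        """Draw a replay batch without replacement."""
        k = min(batch_size, len(self.buffer))
        return self.rng.sample(self.buffer, k)

# Usage: stream 1000 examples, then draw a replay batch of 32.
memory = ReservoirMemory(capacity=100)
for x in range(1000):
    memory.add(x)
replay_batch = memory.sample(32)
```

In a CIER-style method, each stored example would be an (image, label, domain) triple, and the replayed batch would feed both the classification loss and the conditional-invariance term.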

2. RELATED WORK

2.1 CONTINUAL LEARNING

Continual learning aims at developing a model that can continuously learn different tasks over a data continuum. In the literature, there are different continual learning protocols with different properties of the continuum. First, a setting can be either task-free (Rebuffi et al., 2017; Aljundi et al., 2019b) or task-aware (Kirkpatrick et al., 2017; Lopez-Paz & Ranzato, 2017) based on whether a task-indicator



Figure 1: (a) A causal model for online continual learning under conditional domain shift. T is the number of tasks and N_t is the number of training samples of task t. The causal interaction between the domains and the images is not considered in the traditional setting. (b) An example of the domain shift; sampled images are extracted from the DomainNet dataset (Peng et al., 2019).

