DEJA VU: CONTINUAL MODEL GENERALIZATION FOR UNSEEN DOMAINS

Abstract

In real-world applications, deep learning models often run in non-stationary environments where the target data distribution continually shifts over time. Numerous domain adaptation (DA) methods, in both online and offline modes, have been proposed to improve cross-domain adaptation ability. However, these DA methods typically only provide good performance after a long period of adaptation, and perform poorly on new domains before and during adaptation, in what we call the "Unfamiliar Period", especially when domain shifts happen suddenly and significantly. On the other hand, domain generalization (DG) methods have been proposed to improve model generalization ability on unadapted domains. However, existing DG works are ineffective for continually changing domains due to severe catastrophic forgetting of learned knowledge. To overcome these limitations of DA and DG in handling the Unfamiliar Period during continual domain shift, we propose RaTP, a framework that focuses on improving models' target domain generalization (TDG) capability, while also achieving effective target domain adaptation (TDA) capability right after training on certain domains and forgetting alleviation (FA) capability on past domains. RaTP includes a training-free data augmentation module to prepare data for TDG, a novel pseudo-labeling mechanism to provide reliable supervision for TDA, and a prototype contrastive alignment algorithm to align different domains for achieving TDG, TDA, and FA. Extensive experiments on Digits, PACS, and DomainNet demonstrate that RaTP significantly outperforms state-of-the-art works from Continual DA, Source-Free DA, Test-Time/Online DA, Single DG, Multiple DG, and Unified DA&DG in TDG, and achieves comparable TDA and FA capabilities.

1. INTRODUCTION

A major concern in applying deep learning models to real-world applications is whether they can deal with environmental changes over time, which present significant challenges due to data distribution shifts. When the shift is small, deep learning models may be able to handle it, because their robustness is often evaluated and improved before deployment. However, when the data distribution shifts significantly for a period of time, in what we call the "Unfamiliar Period", model performance on new scenarios can deteriorate to a much lower level. For example, surveillance cameras used for environmental monitoring can work with excellent performance on clear days, but perform poorly or even become "blind" when the weather turns bad or the lighting conditions become poor (Bak et al., 2018). As another example, consider lung imaging analysis for coronaviruses: deep learning models may present excellent performance after being trained on a large number of samples for a certain variant (e.g., the Alpha variant of COVID-19), but struggle to provide accurate and timely analysis for later variants (e.g., the Delta or Omicron variant) and future types of coronaviruses (Singh et al., 2020) when they first appear. In the following, we first discuss related works, highlight their limitations in addressing the poor model performance during the Unfamiliar Period, and then introduce our approach.

Domain adaptation (DA) methods have been proposed to tackle continual data drifts in dynamic environments in either online or offline mode. For example, Continual DA (Liu et al., 2020; Rostami, 2021) starts from a labeled source domain and continually adapts the model to various target domains, while keeping the model performance from degrading significantly on seen domains.
However, existing Continual DA works often assume that the source domain can be accessed at all times, which may be difficult to guarantee in practical scenarios, especially given possible limitations on memory storage and regulations on privacy or intellectual property. Source-Free DA (Yang et al., 2021; Qu et al., 2022) overcomes this issue and achieves target adaptation without the source domain data. In addition, Test-Time or Online DA (Wang et al., 2022; Iwasawa & Matsuo, 2021; Panagiotakopoulos et al., 2022) can improve target model performance with a small training cost; however, the target domain data is only seen once by the model and the performance improvement is limited (higher improvement would require a large amount of data). With these DA methods, although the model may perform better on a new target domain after sufficient adaptation, its performance on that domain before and during the adaptation process, i.e., in the Unfamiliar Period, is often poor. In cases where the domain shift is sudden and the duration of seeing a new target domain is short, this problem becomes even more severe. In this work, we believe that for many applications, it is very important to ensure that the model also performs reasonably well in the Unfamiliar Period, i.e., before seeing a lot of target domain data. For instance, in environmental surveillance, poor performance under uncommon/unfamiliar weather or lighting conditions may cause significant security and safety risks. In the example of lung imaging analysis for coronaviruses, being able to quickly provide good performance for detecting new variants is critical for the early containment and treatment of the disease.

Domain generalization (DG) methods also address the learning problem on multiple data domains, especially for cases where the target domain is unavailable or unknown during training.
However, existing DG works are typically based on accurate supervision knowledge of the source domain data, whether drawn from a single domain (Wang et al., 2021c; Li et al., 2021) or multiple domains (Yao et al., 2022; Zhang et al., 2022), which may not be achievable in continually changing scenarios. Moreover, when DG is applied in scenarios with continual domain shifts, its focus on the target domain can cause severe catastrophic forgetting of domains that have already been learned. There are also works unifying DA and DG (Ghifary et al., 2016; Motiian et al., 2017; Jin et al., 2021); however, they can only be used in standard DA or DG individually, and thus still suffer from the same limitations. Bai et al. (2022) and Nasery et al. (2021) study smooth temporal shifts of data distribution, but they cannot handle large domain shifts over time.

Our Approach and Contribution. In this work, we focus on the Continual Domain Shift Learning (CDSL) problem, in which a learning model is first trained on a labeled source domain and then faces a series of unlabeled target domains that appear continually. Our goal, in particular, is to improve model performance before and during the training stage of each previously-unseen target domain (i.e., in the Unfamiliar Period), while also maintaining good performance in the time periods after training. Thus, we propose a framework called RaTP that optimizes three objectives: (1) improving the model generalization performance on a new target domain before and during its training, namely the target domain generalization (TDG) performance; (2) providing good model performance on a target domain right after its training, namely the target domain adaptation (TDA) performance; and (3) maintaining good performance on a trained domain after the model is trained on other domains, namely the forgetting alleviation (FA) performance.
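To make the CDSL setting and its three metrics concrete, the following minimal Python sketch shows where TDG, TDA, and FA are measured along the domain sequence. This is an illustration of the evaluation protocol only, not RaTP itself; `evaluate` and `adapt` are hypothetical placeholder functions supplied by the caller.

```python
def run_cdsl(model, source, targets, evaluate, adapt):
    """Measure TDG, TDA, and FA over a sequence of unlabeled target domains.

    `evaluate(model, domain)` returns an accuracy; `adapt(model, domain)`
    returns an (unsupervised) adapted model. Both are caller-provided stubs.
    """
    history = {"TDG": [], "TDA": [], "FA": []}
    seen = [source]  # the model starts already trained on the labeled source
    for target in targets:
        # TDG: performance on the new domain BEFORE any adaptation,
        # i.e., during the Unfamiliar Period
        history["TDG"].append(evaluate(model, target))
        model = adapt(model, target)
        # TDA: performance on the same domain right AFTER adaptation
        history["TDA"].append(evaluate(model, target))
        # FA: average performance retained on all previously seen domains
        history["FA"].append(sum(evaluate(model, d) for d in seen) / len(seen))
        seen.append(target)
    return history
```

A method strong in TDG keeps the first list high even though each target domain is evaluated before a single adaptation step has run on it.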
For improving TDG, RaTP includes a training-free data augmentation module based on Random Mixup, which can generate data outside of the current training domain. For TDA, RaTP includes a Top 2 Pseudo Labeling mechanism that places more emphasis on samples with a higher likelihood of correct classification, producing more accurate pseudo labels. Finally, for optimizing the model towards TDG, TDA, and FA at the same time, RaTP includes a Prototype Contrastive Alignment algorithm. Comprehensive experiments and ablation studies on Digits, PACS, and DomainNet demonstrate that RaTP significantly outperforms state-of-the-art works in TDG, including Continual DA, Source-Free DA, Test-Time/Online DA, Single DG, Multiple DG, and Unified DA&DG, and produces comparable performance in TDA and FA. In summary:

• We tackle an important problem in practical scenarios with continual domain shifts, i.e., improving model performance before and during training on a new target domain, in what we call the Unfamiliar Period. We also aim to achieve good model performance after training, providing the model with capabilities of target domain adaptation and forgetting alleviation.
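For intuition on the Mixup idea underlying the augmentation module, here is a minimal sketch of classic input-space Mixup, which interpolates randomly paired samples to produce inputs off the current domain's data distribution. The function name, the `alpha` parameter, and the within-batch pairing are illustrative assumptions; RaTP's actual augmentation module may differ in its details.

```python
import numpy as np

def random_mixup(batch, alpha=0.4, rng=None):
    """Mix each sample with a randomly paired sample from the same batch.

    batch: array of shape (N, ...); returns an array of the same shape
    whose rows are convex combinations of pairs of original samples.
    """
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)        # mixing coefficient in (0, 1)
    perm = rng.permutation(len(batch))  # random pairing within the batch
    return lam * batch + (1.0 - lam) * batch[perm]
```

In a generalization setting, training on such interpolated inputs (with correspondingly mixed labels or pseudo labels) exposes the model to points it would not see in the source domain alone.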

