TEMPORAL DOMAIN GENERALIZATION WITH DRIFT-AWARE DYNAMIC NEURAL NETWORKS

Abstract

Temporal domain generalization is a promising yet extremely challenging area whose goal is to learn models under temporally changing data distributions and generalize to unseen data distributions by following the trends of the change. Progress in this area is hindered by three challenges: 1) characterizing data distribution drift and its impact on models, 2) expressiveness in tracking the model dynamics, and 3) theoretical guarantees on performance. To address them, we propose a Temporal Domain Generalization with Drift-Aware Dynamic Neural Network (DRAIN) framework. Specifically, we formulate the problem in a Bayesian framework that jointly models the relation between data and model dynamics. We then build a recurrent graph generation scenario to characterize the dynamic graph-structured neural networks learned across different time points. This captures the temporal drift of model parameters and data distributions and can predict future models without access to future data. In addition, we explore theoretical guarantees of model performance under the challenging temporal DG setting and provide theoretical analyses, including uncertainty and generalization error. Finally, extensive experiments on several real-world benchmarks with temporal drift demonstrate the proposed method's effectiveness and efficiency.

1. INTRODUCTION

In machine learning, researchers often assume that training and test data follow the same distribution so that a trained model generalizes to the test data. In reality, however, this assumption is frequently violated: we cannot guarantee that a trained model will always be applied in the same domain where it was trained. This motivates Domain Adaptation (DA), which builds a bridge between source and target domains by characterizing the transformation between the data from these domains (Ben-David et al., 2010; Ganin et al., 2016; Tzeng et al., 2017). In more challenging situations where target-domain data is unavailable (e.g., no data from an unknown area, no data from the future, etc.), we need a more realistic setting named Domain Generalization (DG) (Shankar et al., 2018; Arjovsky et al., 2019; Dou et al., 2019). Most existing works in DG focus on generalization among domains with categorical indices, such as generalizing a trained model from one dataset (e.g., MNIST (LeCun et al., 1998)) to another (e.g., SVHN (Netzer et al., 2011)), from one task (e.g., image classification (Krizhevsky et al., 2012)) to another (e.g., image segmentation (Lin et al., 2014)), etc. However, in many real-world applications, the "boundary" among different domains is unavailable and difficult to detect, leading to a concept drift across the domains. For example, when a bank leverages a model to predict whether a person will be a "defaulted borrower", features like "annual income", "profession type", and "marital status" are considered. However, due to temporal changes in society, how these feature values indicate the prediction output should change accordingly, following trends that can, to some extent, be predicted over a range of time. Figure 1 shows another example: seasonal flu prediction via Twitter data, which evolves each year in many aspects.
For example, monthly active users are increasing, new friendships are formed, the age distribution is shifting under some trends, etc. Such temporal change in data distribution gradually renders models outdated. Correspondingly, an ideal, always up-to-date model would have parameters that gradually change to counter the trend of data distribution shift across time; it could even "predict" what the model parameters should look like at an arbitrary (not too distant) future time point. This requires the power of temporal domain generalization. However, as an extension of traditional DG, temporal DG is extremely challenging yet promising. Existing DG methods that treat the domain indices as a categorical variable may not be suitable for temporal DG, as they require the domain boundary as a priori knowledge to learn the mapping from source to target domains (Muandet et al., 2013; Motiian et al., 2017; Balaji et al., 2018; Arjovsky et al., 2019). Until now, temporal domain indices have been well explored only in DA (Hoffman et al., 2014; Ortiz-Jimenez et al., 2019; Wang et al., 2020) but not in DG. There are very few existing works in temporal DG due to its significant challenges. One relevant work is Sequential Learning Domain Generalization (S-MLDG) (Li et al., 2020), which proposed a DG framework over sequential domains via meta-learning (Finn et al., 2017). S-MLDG meta-trains the target model on all possible permutations of source domains, with one source domain left out for meta-test. However, S-MLDG in fact still treats the domain index as a categorical variable, and the method was only tested on categorical DG datasets. A more recent paper, Gradient Interpolation (GI) (Nasery et al., 2021), proposes a temporal DG algorithm that encourages a model to learn functions that can extrapolate to the near future by supervising the first-order Taylor expansion of the learned function. However, GI has very limited power in characterizing model dynamics because it can only learn how the activation function changes along time while keeping all the remaining parameters fixed across time.
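To make GI's core idea concrete, the first-order Taylor extrapolation in time can be sketched on a toy drifting model (an illustrative example of the extrapolation principle only, not GI's actual training procedure):

```python
import numpy as np

# Toy time-varying model: f(x, t) = w(t) * x with a linearly drifting weight w(t).
def f(x, t, w0=1.0, drift=0.5):
    return (w0 + drift * t) * x

x, t, delta = 2.0, 4.0, 1.0

# First-order Taylor expansion in time: f(x, t + delta) ≈ f(x, t) + delta * df/dt.
# Here df/dt is estimated by a central finite difference; GI instead supervises
# this first-order term during training to encourage near-future extrapolation.
eps = 1e-5
dfdt = (f(x, t + eps) - f(x, t - eps)) / (2 * eps)
f_extrapolated = f(x, t) + delta * dfdt

print(f_extrapolated, f(x, t + delta))  # extrapolation is exact for a linear drift
```

Because the toy drift is linear in time, the first-order expansion recovers the future value exactly; for nonlinear drift it is only accurate in the near future, which mirrors GI's stated scope.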
The advancement of temporal domain generalization is challenged by several critical bottlenecks, including 1) Difficulty in characterizing the data distribution drift and its influence on models. Modeling temporally evolving distributions requires making the model time-sensitive. An intuitive approach is to feed the time as an input feature to the model, which is simple yet problematic, as it discards the other features' dependency on time and on other confounding factors that change along with time (Wang et al., 2020). Another possible way is to make the model parameters a function of time. However, neither approach can generalize the model to future data unless the whole model's dynamics and the data dynamics are holistically modeled. 2) Lack of expressiveness in tracking the model dynamics. Complex tasks nowadays witness the success of large, complex models (e.g., large CNNs (Dosovitskiy et al., 2020)), where the neurons and model parameters are connected in a complex graph structure. This, however, makes tracking their model dynamics in temporal DG significantly more challenging. An expressive characterization and prediction of model dynamics requires mapping data dynamics to model dynamics, and hence to the graph dynamics of model parameters across time. This is a highly open problem, especially in the temporal DG area. 3) Difficulty in providing theoretical guarantees on performance. While there are fruitful theoretical analyses of machine learning problems under the independent and identically distributed (i.i.d.) assumption (He & Tao, 2020), extending similar analyses to out-of-distribution (OOD) problems meets substantial hurdles due to the distribution drift over temporally evolving domains. Therefore, it is essential to advance theoretical analyses of model capacity and of the theoretical relations among different temporal domain generalization models.
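The two naive approaches in challenge 1) can be contrasted on a toy drifting regression problem (a minimal illustrative sketch with a synthetic drift, not the proposed method):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy drifting concept: y = w(t) * x, where the true weight drifts linearly in time.
def make_domain(t, n=200):
    x = rng.normal(size=n)
    return x, (1.0 + 0.5 * t) * x

# Approach 1: feed time as an extra input feature, i.e., fit y ~ a*x + b*t.
# Simple but problematic: the additive time term misses how x's effect changes with t.
X, Y = [], []
for t in range(5):
    x, y = make_domain(t)
    X.append(np.stack([x, np.full_like(x, float(t))], axis=1))
    Y.append(y)
coef1, *_ = np.linalg.lstsq(np.concatenate(X), np.concatenate(Y), rcond=None)

# Approach 2: make the model parameter a function of time, w(t) = c0 + c1 * t.
# Fit a weight per source domain, then regress the weights on t to extrapolate.
w_hat = np.array([(x @ y) / (x @ x) for x, y in (make_domain(t) for t in range(5))])
T = np.stack([np.ones(5), np.arange(5.0)], axis=1)
c, *_ = np.linalg.lstsq(T, w_hat, rcond=None)
w_future = c[0] + c[1] * 6          # predicted model for the unseen domain t = 6

# Evaluate both on the unseen future domain t = 6 (true weight is 1 + 0.5*6 = 4.0).
x6, y6 = make_domain(6)
mse1 = np.mean((coef1[0] * x6 + coef1[1] * 6 - y6) ** 2)
mse2 = np.mean((w_future * x6 - y6) ** 2)
print(round(float(w_future), 2), mse1 > mse2)  # approach 2 extrapolates far better
```

Even in this trivial setting, parameters-as-a-function-of-time extrapolates while time-as-a-feature does not; the paper's point is that neither scales to real settings unless model dynamics and data dynamics are modeled holistically.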
To address all the above challenges, we propose a Temporal Domain Generalization with DRift-Aware dynamIc neural Networks (DRAIN) framework that solves all challenges above simultane-



Figure 1: An illustrative example of temporal domain generalization. Consider training a model for some classification task based on the annual Twitter dataset such that the trained model can generalize to future domains (e.g., 2023). The temporal drift of data distribution can influence the prediction model, e.g., the rotation of the decision boundary in this case.

