ADVERSARIAL CAUSAL AUGMENTATION FOR GRAPH COVARIATE SHIFT

Anonymous

Abstract

Out-of-distribution (OOD) generalization on graphs is drawing widespread attention. However, existing efforts mainly focus on the OOD issue of correlation shift, while another type, covariate shift, remains largely unexplored and is the focus of this work. From a data generation view, causal features are stable substructures in data, which play key roles in OOD generalization. Their complementary parts, environments, are unstable features that often lead to various distribution shifts. Correlation shift establishes spurious statistical correlations between environments and labels. In contrast, covariate shift means that test data contain environmental features unseen during training. Existing strategies of graph invariant learning and data augmentation suffer from limited environments or unstable causal features, which greatly limits their generalization ability under covariate shift. In view of that, we propose a novel graph augmentation strategy, Adversarial Causal Augmentation (AdvCA), to alleviate the covariate shift. Specifically, it adversarially augments the data to explore diverse distributions of the environments while keeping the causal features stable across those environments. By maintaining environmental diversity and ensuring the invariance of the causal features, it effectively alleviates the covariate shift. Extensive experimental results with in-depth analyses demonstrate that AdvCA can outperform 14 baselines on synthetic and real-world datasets with various covariate shifts.



1. INTRODUCTION

Graph learning mostly follows the assumption that training and test data are independently drawn from an identical distribution. This assumption is difficult to satisfy in the wild because of out-of-distribution (OOD) issues (Shen et al., 2021), where the training and test data come from different distributions. Hence, OOD generalization on graphs is attracting widespread attention (Li et al., 2022b). However, existing studies mostly focus on correlation shift, which is just one type of OOD issue (Ye et al., 2022; Wiles et al., 2022). Another type, covariate shift, remains largely unexplored and is the focus of our work.

Covariate shift is in stark contrast to correlation shift w.r.t. causal and environmental features of data (see Appendix C for detailed discussions of these two distribution shifts). Specifically, from a data generation view, causal features (formally defined in Assumption 1) are the substructures of the entire graphs that truly reflect the predictive property of data, while their complementary parts are the environmental features that are noncausal to the predictions. Following prior studies (Arjovsky et al., 2019; Wu et al., 2022b), we assume that causal features are stable across distributions, in contrast to the environmental features. Correlation shift denotes that environments and labels establish inconsistent statistical correlations in training and test data, whereas covariate shift means that the environmental features in test data are unseen in training data (Ye et al., 2022; Wiles et al., 2022; Gui et al., 2022).

For example, in Figure 1, the environmental features ladder and tree are different in training and test data, which forms the covariate shift (↔). Taking molecular property prediction as another example, functional groups (e.g., nitrogen dioxide (NO2)) are causal features that determine the predictive property of molecules, while scaffolds (e.g., carbon rings) are irrelevant patterns (Wu et al., 2018), which can be seen as the environments.
In practice, we often need to train models on molecular graphs collected in the past, hoping that the models can predict the properties of molecules with new scaffolds in the future (Hu et al., 2020).

Because of the differences between correlation and covariate shifts, we take a close look at the existing efforts on graph generalization. Existing efforts (Li et al., 2022b) mainly fall into the following research lines, each of which has inherent limitations in solving covariate shift.

• Invariant graph learning (Wu et al., 2022b; Liu et al., 2022; Sui et al., 2022)

However, augmentation methods are prone to destroying the causal features, so the perturbed distributions easily go out of control. For example, in Figure 1, the random strategy of DropEdge (Rong et al., 2020) will inevitably perturb the causal features (highlighted by red circles). As such, it fails to alleviate the covariate shift (↔) and can even degrade the generalization ability.

Scrutinizing the limitations of the aforementioned studies, we find that insufficient environments and unstable causal features largely hinder the ability of these generalization efforts against the covariate shift. Hence, we naturally ask a question: "Can the augmented samples simultaneously preserve the diversity of environmental features and the invariance of causal features?"

Towards this end, we first propose two principles for graph augmentation: environmental diversity and causal invariance. Specifically, environmental diversity encourages the augmentation to extrapolate unseen environments; meanwhile, causal invariance narrows the distribution gap between the augmented data and test data. To achieve these principles, we design a novel graph augmentation strategy: Adversarial Causal Augmentation (AdvCA). Specifically, we augment the graphs with a network named the adversarial augmenter. It adversarially generates masks on edges and node features, thereby performing OOD exploration to improve the environmental diversity.
To maintain the stability of the causal features, we adopt another network, named the causal generator. It generates masks that capture the causal features. Finally, we carefully combine these masks and apply them to the graph data. As shown in Figure 1, AdvCA only perturbs the environmental features while keeping the causal parts untouched. Our quantitative experiments also verify that AdvCA can narrow the distribution gap between the augmented data and test data, as illustrated in Figure 1 (↔), thereby effectively overcoming the covariate shift issues.

Our contributions can be summarized as:

• Problem: We explore one specific type of OOD issue in graph learning, covariate shift, which is of great need but largely unexplored.

• Method: We design a graph augmentation method, AdvCA, which focuses on covariate shift issues. It maintains the stability of causal features while ensuring environmental diversity.

• Experiment: We conduct extensive experiments on synthetic and real datasets. The experimental results with in-depth analyses demonstrate the effectiveness of AdvCA.
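The idea of combining a causal mask with an adversarial mask so that only environmental parts are perturbed can be illustrated with a minimal sketch. The combination rule used here (causal entries are kept at weight 1, all other entries take the adversarial mask value) is an assumption made for illustration, not necessarily the exact rule used by AdvCA. The function names `combine_masks` and `augment_edges` and the toy values are hypothetical, and in the actual method both masks are produced by learned networks rather than fixed arrays.

```python
import numpy as np


def combine_masks(adv_mask, causal_mask):
    """Combine two soft masks with values in [0, 1].

    Assumed rule (illustrative only): entries marked causal keep weight 1,
    every other entry receives the adversarial mask value.
    """
    adv_mask = np.asarray(adv_mask, dtype=float)
    causal_mask = np.asarray(causal_mask, dtype=float)
    return causal_mask + (1.0 - causal_mask) * adv_mask


def augment_edges(edge_weights, adv_mask, causal_mask):
    """Apply the combined mask to per-edge weights of a graph."""
    edge_weights = np.asarray(edge_weights, dtype=float)
    return edge_weights * combine_masks(adv_mask, causal_mask)


# Toy graph with five edges; the first two are assumed to be causal.
edge_weights = np.ones(5)
causal_mask = np.array([1.0, 1.0, 0.0, 0.0, 0.0])
adv_mask = np.array([0.2, 0.9, 0.1, 0.5, 0.0])

augmented = augment_edges(edge_weights, adv_mask, causal_mask)
print(augmented)  # causal edges stay at 1.0, the others follow adv_mask
```

In contrast, a random strategy such as DropEdge has no notion of `causal_mask`, so any edge, causal or not, may be removed.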

2. PRELIMINARIES

In this section, we first give the formal definitions of causal features, environmental features, and graph covariate shift. Then we present the problem of graph classification under covariate shift.
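As an informal illustration of covariate shift, and not the formal definition used in this paper, the following sketch treats each graph's environment as a discrete label and measures how much of the test set falls on environments that never appear in training. The function name `unseen_environment_mass` and the environment labels are hypothetical, and real environmental features are substructures rather than simple labels.

```python
from collections import Counter


def unseen_environment_mass(train_envs, test_envs):
    """Fraction of test samples whose environment never occurs in training.

    A crude proxy for covariate shift under the simplifying assumption
    that environments are discrete, directly observable labels.
    """
    if not test_envs:
        raise ValueError("test_envs must not be empty")
    seen = set(train_envs)
    test_counts = Counter(test_envs)
    unseen = sum(count for env, count in test_counts.items() if env not in seen)
    return unseen / len(test_envs)


# Toy example: three of the four test samples come from unseen environments.
train_envs = ["ladder", "ladder", "tree", "tree"]
test_envs = ["ladder", "wheel", "wheel", "star"]

mass = unseen_environment_mass(train_envs, test_envs)
print(mass)  # 0.75
```

A value of 0 means every test environment was seen during training, which is the setting where only correlation shift can occur. Larger values indicate a stronger covariate shift in this toy sense.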



Footnotes: (1) On correlation and covariate shift: we provide detailed discussions of these two distribution shifts in Appendix C. (2) On causal features: we provide a formal definition in Assumption 1.



Figure 1: P_train and P_test denote the training and test distributions. P_drop and P_ours represent the distributions of data augmented via DropEdge and AdvCA, respectively. AdvCA establishes a smaller covariate shift (↔) with the test distribution than DropEdge (↔).

gradually becomes a prevalent paradigm for OOD generalization. The main idea is to capture the causal features by minimizing the empirical risks within different environments. Unfortunately, it implicitly makes a prior assumption that all test environments are available during training. This assumption is unrealistic because training data can hardly cover all possible test environments. Learning in limited environments can only alleviate the spurious correlations that are hidden in the training data, but fails to extrapolate to test distributions with unseen environments.

• Graph data augmentation (Ding et al., 2022; Zhao et al., 2022) perturbs graph features to enrich the distribution seen during training for better generalization. It can be roughly divided into

