GRAPH DOMAIN ADAPTATION VIA THEORY-GROUNDED SPECTRAL REGULARIZATION

Abstract

Transfer learning on graphs drawn from varied distributions (domains) is in great demand across many applications. Emerging methods attempt to learn domain-invariant representations using graph neural networks (GNNs), yet their empirical performance varies and their theoretical foundation is limited. This paper aims at designing theory-grounded algorithms for graph domain adaptation (GDA). (i) As a first attempt, we derive a model-based GDA bound closely related to two GNN spectral properties: spectral smoothness (SS) and maximum frequency response (MFR). This is achieved by cross-pollinating optimal transport (OT)-based DA theory and graph filter theory. (ii) Inspired by the theoretical results, we propose algorithms that regularize the spectral properties of SS and MFR to improve GNN transferability. We further extend the GDA theory to the more challenging scenario of conditional shift, where spectral regularization still applies. (iii) More importantly, our analyses of the theory reveal which regularization improves performance in which transfer-learning scenario, (iv) in numerical agreement with extensive real-world experiments: SS and MFR regularizations bring more benefits to the scenarios of node transfer and link transfer, respectively. In a nutshell, our study paves the way toward explicitly constructing and training GNNs that capture more transferable representations across graph domains.

1. INTRODUCTION

Many applications call for "transferring" graph representations learned from one distribution (domain) to another, which we refer to as graph domain adaptation (GDA). Examples include temporally-evolved social networks (Wang et al., 2021), molecules of different scaffolds (Hu et al., 2019), and protein-protein interaction networks in various species (Cho et al., 2016). In general, this setting of transfer learning is challenging due to the data-distribution shift between the training (source) and test (target) domains (i.e. P_S(G, Y) ≠ P_T(G, Y)). In particular, such a challenge escalates for graph-structured data that are abstractions of diverse nature (You et al., 2021; 2022).

Despite the tremendous needs arising from real-world applications, current methods for GDA (as reviewed in Section 2) mostly fall short of delivering competitive target performance with theoretical guarantees. Approaches that assume distribution invariance (or adopt heuristic principles) are inevitably restricted in theory (Garg et al., 2020; Verma & Zhang, 2019). The emerging approaches (Zhang et al., 2019; Wu et al., 2020) straightforwardly apply adversarial training between source and target representations, intentionally founded on DA theory to bound the target risk (Redko et al., 2020). However, the generic DA bound is agnostic to graph data and models, and could be tailored more precisely for graphs. We therefore set out to explore the following question: How to design algorithms that boost transfer performance across different graph domains, with a grounded theoretical foundation? Our step-by-step answers are as follows.

(i) Derivation of a model-based GDA bound. Building upon the rigorous assurance established in DA theory (Section 3), we start by directly rewriting the optimal transport (OT)-based DA bound (Redko et al., 2017; Shen et al., 2018) for graphs (Corollary 1), which is closely coupled with the Lipschitz constant of graph encoders.
The nontrivial challenge here is how to formulate the GNN Lipschitz constant w.r.t. the distance metric of non-Euclidean data. Leveraging graph filter theory (Gama et al., 2020; Arghal et al., 2021), we first state that GNNs can be constructed stably w.r.t. the misalignment of edges and that of node features, multiplied respectively by two spectral properties: spectral smoothness (SS) and maximum frequency response (MFR) (Lemma 1). Subsequently, we utilize SS and MFR to formulate the GNN Lipschitz constant w.r.t. graph distances in a general form, and instantiate it as (informally) max{O(SS), O(MFR)} w.r.t. the commonly-used matching distance (Gama et al., 2020; Arghal et al., 2021) (Lemma 2). This leads to the first model-based GDA bound.

(ii) Theory-grounded spectral regularization. One potential way to tighten the DA bound is to modulate the Lipschitz constant (Section 3). Guided by the theoretical results above, we are well-motivated to propose spectral regularization (i.e. SSReg and MFRReg) to restrict the target risk bound (Section 4.2). We also extend the GDA theory to the more challenging conditional-shift scenario (Li et al., 2021a; Zhao et al., 2019) (Lemma 3), where spectral regularization still applies.

(iii) Interpretation of how theory drives practice. Our further theoretical analyses reveal which regularization improves performance in which graph transfer scenario: specifically, SSReg and MFRReg are respectively beneficial to the scenarios of node transfer and link transfer (Section 4.2), (iv) with extensive numerical evidence from un/semi-supervised (cross-species protein-protein interaction) link prediction and (temporally-shifted paper topic) node classification (Section 5).
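To make the two spectral quantities concrete, consider a polynomial graph filter with frequency response h(λ) = Σ_k θ_k λ^k over the spectrum of a graph shift operator. The sketch below is our own hypothetical illustration (not the paper's implementation): assuming such a polynomial parameterization and a discretized spectrum on [0, λ_max], it estimates MFR as max_λ |h(λ)| and SS as max_λ |h'(λ)| via finite differences; the function names `mfr` and `ss` are ours.

```python
import numpy as np

def freq_response(theta, lam):
    # Polynomial graph filter response h(lam) = sum_k theta_k * lam**k,
    # evaluated on an array of spectrum points lam.
    return sum(t * lam ** k for k, t in enumerate(theta))

def mfr(theta, lam_max=2.0, n_grid=201):
    # Maximum frequency response: max over lambda in [0, lam_max] of |h(lambda)|.
    lam = np.linspace(0.0, lam_max, n_grid)
    return float(np.abs(freq_response(theta, lam)).max())

def ss(theta, lam_max=2.0, n_grid=201):
    # Spectral smoothness: max |h'(lambda)|, estimated by finite differences.
    lam = np.linspace(0.0, lam_max, n_grid)
    h = freq_response(theta, lam)
    return float(np.abs(np.diff(h) / np.diff(lam)).max())
```

Adding penalties of this flavor to the training loss gives the spirit of SSReg/MFRReg: shrinking SS and MFR tightens the Lipschitz-dependent term of the GDA bound.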

2. RELATED WORKS

Self-supervision on graphs. Graph self-supervised learning, surging recently, learns empirically more generalizable representations by exploiting vast unlabelled graph data (please refer to (Xie et al., 2021) for a comprehensive review). The success of self-supervision largely hinges on big data and, more importantly, heuristically-designed pretext tasks. The tasks can be predictive (Velickovic et al., 2019; Hu et al., 2019; Jin et al., 2020; You & Shen, 2022; You et al., 2020b; Chien et al., 2021; Talukder et al., 2022) or contrastive (You et al., 2020a; Zhu et al., 2020b; Qiu et al., 2020; Wei et al., 2022), which provides no theoretical guarantee on target performance and, as a result, occasionally leads to "negative transfer" in practice (Hu et al., 2019; You et al., 2020a).

Transferring GNNs with explicit covariate shifts. To promote target performance, one line of work utilizes more data and makes specific assumptions. One such example is to assume access to source labels and an explicit covariate shift, i.e. P_S(Y|G) = P_T(Y|G) and P_S(G) ≠ P_T(G) in a specific way, which enables theoretical tools for certain guarantees. (Ruiz et al., 2020; Yehudai et al., 2021) study the specific setting of size generalization and use graphon theory (Lovász, 2012) to develop size-invariant representations. (Bevilacqua et al., 2021) studies transfer learning under shifting d-patterns of subgraphs and adopts the theory of GNN expressiveness (Xu et al., 2018; Morris et al., 2019) to demonstrate the existence of negative-transferring GNNs despite their universal approximation capability. Accordingly, the study proposes d-pattern classification pre-training to help escape from negative-transferring GNNs. These methods are restricted to their designated transfer learning scenarios.
Besides, some other works (Fan et al., 2021; Sui et al., 2021; Li et al., 2021b; Kenlay et al., 2021; Chen et al., 2022; Li et al., 2022; Zhang et al., 2022; Jin et al., 2022) adopt the implicit covariate shift assumption with source labels while lacking theoretical assurance; e.g. (Wu et al., 2022a; b) assume that the shift can be implicitly modeled with an environment learner (please refer to (Gui et al., 2022) for a comprehensive review).

Graph domain adaptation. To deliver a generally applicable guarantee, several methods (Dai et al., 2019; Cai et al., 2021; Zhang et al., 2019; Wu et al., 2020; Xu et al., 2022) additionally utilize target graphs to learn domain-invariant representations. According to DA theory (Ben-David et al., 2007; 2010; Redko et al., 2020; Zhang et al., 2020; Yan et al., 2017), the target risk is guaranteed to be bounded (please refer to (Redko et al., 2020) for a comprehensive review). The generic DA bound, however, is not designated for graph data or encoders, leaving room for further improvement.

3. PRELIMINARIES

Problem setup. We are given i.i.d. samples (Verma & Zhang, 2019; Zhu et al., 2021; Cong et al., 2021) and their labels {(G_n, Y_n)}_{n=1}^{N_S} from the source distribution P_S(G, Y) of graphs G ∈ 𝒢 and labels Y ∈ 𝒴, where G = {V, E} is associated with the set of nodes V and edges E, together with the node features X ∈ R^{|V|×D} and adjacency matrix A ∈ R^{|V|×|V|}. We also have access to unlabeled samples {G_n}_{n=1}^{N_T} from the marginalized target distribution ∫ P_T(G, Y) dY. Under the covariate shift assumption that P_S(G) ≠ P_T(G) and P_S(Y|G) = P_T(Y|G) (Ben-David et al., 2007; 2010), we train a graph neural network (GNN) h : 𝒢 → 𝒴 with the accessible data and then evaluate it on target samples from P_T(G, Y).
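The OT-based DA bounds that this setup builds on control target risk through a Wasserstein distance between source and target representation distributions. As a minimal, hypothetical illustration (one-dimensional embeddings, equal sample sizes; not the paper's implementation), the empirical Wasserstein-1 distance reduces to matching samples in sorted order:

```python
import numpy as np

def w1_empirical(src, tgt):
    # Empirical Wasserstein-1 distance between two equal-size 1-D samples:
    # in one dimension, the optimal transport plan matches sorted order.
    src = np.sort(np.asarray(src, dtype=float))
    tgt = np.sort(np.asarray(tgt, dtype=float))
    return float(np.mean(np.abs(src - tgt)))
```

In higher dimensions one would instead use, e.g., a sliced or adversarial estimate of this discrepancy; minimizing it alongside the source risk is the standard OT-based DA recipe that the model-based GDA bound refines for graph encoders.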

Code availability: https://github.com/Shen-Lab/

