DEBIASED GRAPH NEURAL NETWORKS WITH AGNOSTIC LABEL SELECTION BIAS

Abstract

Most existing Graph Neural Networks (GNNs) are proposed without considering the selection bias in data, i.e., the inconsistent distribution between the training and test sets. In reality, the test data are not even available during training, making the selection bias agnostic. Training GNNs on nodes selected with bias leads to significant parameter estimation bias and greatly harms the generalization ability on test nodes. In this paper, we first present an experimental investigation, which clearly shows that selection bias drastically hinders the generalization ability of GNNs, and we theoretically prove that selection bias causes biased estimation of GNN parameters. To remove this bias in GNN estimation, we then propose a novel Debiased Graph Neural Network (DGNN) with a differentiated decorrelation regularizer. The regularizer estimates a sample weight for each labeled node such that the spurious correlation among the learned embedding variables is eliminated. We analyze the regularizer from a causal view, which motivates us to differentiate the weights of the variables based on their contributions to the confounding bias. These sample weights are then used to reweight the GNN loss to eliminate the estimation bias, thus helping to improve the stability of prediction on unknown test nodes. Comprehensive experiments are conducted on several challenging graph datasets with two kinds of label selection bias. The results verify that our proposed model outperforms state-of-the-art methods and that DGNN is a flexible framework for enhancing existing GNNs.

1. INTRODUCTION

Graph Neural Networks (GNNs) are powerful deep learning algorithms on graphs with various applications (Scarselli et al., 2008; Kipf & Welling, 2016; Veličković et al., 2017; Hamilton et al., 2017). Existing GNNs mainly learn a node embedding by aggregating features from the node's neighbors, and this message-passing framework is supervised by node labels in an end-to-end manner. During training, GNNs effectively learn the correlation of structure patterns and node features with node labels, so that they can compute the embeddings of new nodes and infer their labels. One basic requirement for GNNs to make precise predictions on unseen test nodes is that the labeled training nodes and the test nodes follow the same distribution, i.e., the structure and features of both sets follow similar patterns, so that the correlation learned on the current graph generalizes to new nodes. In reality, however, there are two inevitable issues. (1) Because it is difficult to collect graphs in an unbiased environment, the relationship between a collected real-world graph and its labeled nodes is inevitably biased, and training on such a graph yields biased correlations with node labels. Take a scientist collaboration network as an example: if most scientists labeled "machine learning" (ML) collaborate with scientists labeled "computer vision" (CV), existing GNNs may learn the spurious correlation that a scientist who collaborates with CV scientists is an ML scientist. A new ML scientist who only connects with ML scientists, or with scientists in other areas, will then probably be misclassified. (2) The test nodes are usually unavailable in real scenarios, meaning that the distribution of new nodes is agnostic. Once this distribution is inconsistent with that of the training nodes, the performance of all current GNNs is hindered.
Although transfer learning can handle distribution shift, it requires prior knowledge of the test distribution, which cannot be obtained beforehand here. The agnostic label selection bias therefore greatly affects the generalization ability of GNNs on unknown test data. Data selection bias induces spurious correlation between stable and unstable variables in the learned embeddings, and we prove that, with the inevitable model misspecification, this spurious correlation further causes parameter estimation bias. Once this weakness of current GNNs under selection bias is identified, a natural question is: how can we remove the estimation bias in GNNs? In this paper, we propose a novel Debiased Graph Neural Network (DGNN) framework for stable graph learning, which jointly optimizes a differentiated decorrelation regularizer and a weighted GNN model. Specifically, the differentiated decorrelation regularizer learns a set of sample weights under differentiated variable weights, so that the spurious correlation between stable and unstable variables is greatly reduced. Based on a causal-view analysis of the decorrelation regularizer, we theoretically prove that the variable weights can be differentiated by the regression weights. Moreover, to better combine the decorrelation regularizer with GNNs, we prove that applying the regularizer to the embedding learned by the second-to-last layer is both theoretically sound and flexible. The sample weights learned by the decorrelation regularizer are then used to reweight the GNN loss so that the parameter estimation is unbiased. In summary, the contributions of this paper are three-fold: i) We investigate a new problem of learning GNNs with agnostic label selection bias. The problem setting is general and practical for real applications.
ii) We bring the idea of variable decorrelation into GNNs to alleviate the influence of bias on model learning, and propose a general framework, DGNN, that can be adapted to various GNNs. iii) We conduct experiments on real-world graph benchmarks with two kinds of agnostic label selection bias, and the experimental results demonstrate the effectiveness and flexibility of our model.
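As a concrete illustration of the decorrelation idea, the following is a minimal numpy sketch under simplifying assumptions (not the paper's exact regularizer: centering is done once with uniform weights, and all embedding variables are weighted equally rather than differentiated). It learns non-negative sample weights that shrink the off-diagonal entries of the weighted covariance matrix of the embeddings:

```python
import numpy as np

def decorrelation_weights(Z, n_iter=2000, lr=1.0, w_min=1e-3):
    """Learn non-negative sample weights w so that the weighted covariance
    between every pair of embedding dimensions of Z (n x d) is pushed
    toward zero. Minimizes sum_{i != j} Cov_w(Z_:i, Z_:j)^2 by gradient
    descent; centering is done once with uniform weights for simplicity.
    """
    n, _ = Z.shape
    Zc = Z - Z.mean(axis=0)                      # fixed (unweighted) centering
    A = np.einsum('ni,nj->nij', Zc, Zc)          # per-sample outer products
    w = np.ones(n)
    for _ in range(n_iter):
        W = w.sum()
        C = np.einsum('n,nij->ij', w, A) / W     # weighted covariance matrix
        np.fill_diagonal(C, 0.0)                 # only off-diagonal terms matter
        # exact gradient of sum_{i != j} C_ij^2 w.r.t. w (for fixed centering)
        grad = 2.0 * (np.einsum('ij,nij->n', C, A) - (C ** 2).sum()) / W
        w = np.clip(w - lr * grad, w_min, None)  # keep weights positive
    return w * n / w.sum()                       # normalize to mean one
```

The learned weights would then reweight the per-node loss terms of the GNN, e.g. `(w * ce_per_node).mean()` in place of `ce_per_node.mean()`.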

2.1. EXPERIMENTAL INVESTIGATION

We conduct an experimental investigation to examine whether state-of-the-art GNNs are sensitive to selection bias. The main idea is to run two representative GNNs, GCN (Kipf & Welling, 2016) and GAT (Veličković et al., 2017), on training sets with label selection bias.

Figure 1: Effect of selection bias on GCN and GAT.
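For intuition, label selection bias can be simulated on a toy graph by preferring to label "easy" homophilous nodes, i.e., nodes whose neighbors mostly share their label. This protocol is illustrative only; the paper's exact bias construction is not reproduced here:

```python
import numpy as np

def biased_train_mask(adj, labels, eps=0.8, n_train=20, seed=0):
    """Toy label-selection bias: prefer labeling nodes whose neighbors
    mostly share their label. eps in (0, 1): larger values give a
    stronger bias toward homophilous nodes. adj is a dense (n, n)
    0/1 adjacency matrix, labels an integer array of length n.
    """
    rng = np.random.default_rng(seed)
    n = len(labels)
    deg = adj.sum(axis=1)
    # fraction of each node's neighbors that share its label
    same = np.array([(labels[adj[i].astype(bool)] == labels[i]).mean()
                     if deg[i] > 0 else 0.0 for i in range(n)])
    p = np.where(same > 0.5, eps, 1.0 - eps)   # per-node selection weight
    p = p / p.sum()
    idx = rng.choice(n, size=n_train, replace=False, p=p)
    mask = np.zeros(n, dtype=bool)
    mask[idx] = True
    return mask
```

The resulting training mask over-represents homophilous nodes, so a GNN fit on it sees a neighborhood-label distribution that differs from the full graph's.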

In this section, we first formulate our target problem as follows:

Problem 1 (Semi-Supervised Learning on Graphs with Agnostic Label Selection Bias). Given a training graph G_train = {A_train, X_train, Y_train}, where A_train ∈ R^{N×N} (N nodes) is the adjacency matrix, X_train ∈ R^{N×D} (D features) contains the node features, and Y_train ∈ R^{n×C} (n labeled nodes, C classes) contains the labels available for training (n ≪ N), the task is to learn a GNN g_θ(·) with parameters θ that precisely predicts the labels of nodes on a test graph G_test = {A_test, X_test, Y_test}, where the distributions differ: Ψ(G_train) ≠ Ψ(G_test).
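To make the setup concrete, here is a minimal numpy sketch of one possible g_θ, a two-layer GCN in the style of Kipf & Welling (2016) (the framework allows any GNN), together with the semi-supervised loss computed only over the n labeled nodes:

```python
import numpy as np

def gcn_forward(A, X, W1, W2):
    """Two-layer GCN with symmetric normalization: a minimal sketch of
    one possible g_theta(A, X). A is a dense (N, N) adjacency matrix,
    X an (N, D) feature matrix, W1/W2 trainable weight matrices."""
    A_hat = A + np.eye(A.shape[0])              # add self-loops
    d = A_hat.sum(axis=1)
    A_norm = A_hat / np.sqrt(np.outer(d, d))    # D^{-1/2} (A + I) D^{-1/2}
    H = np.maximum(A_norm @ X @ W1, 0.0)        # ReLU hidden layer
    return A_norm @ H @ W2                      # class logits, shape (N, C)

def masked_cross_entropy(logits, y, train_mask):
    """Semi-supervised loss: only the n labeled nodes contribute."""
    z = logits - logits.max(axis=1, keepdims=True)      # stable softmax
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_p[train_mask, y[train_mask]].mean()
```

Under label selection bias, `train_mask` covers a non-representative subset of nodes, so minimizing this loss fits correlations that need not hold on G_test.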

