DIP-GNN: DISCRIMINATIVE PRE-TRAINING OF GRAPH NEURAL NETWORKS

Abstract

Graph neural network (GNN) pre-training methods have been proposed to enhance the power of GNNs. Specifically, a GNN is first pre-trained on a large-scale unlabeled graph and then fine-tuned on a separate small labeled graph for downstream applications, such as node classification. One popular pre-training method is to mask out a proportion of the edges and train a GNN to recover them. However, such a generative method suffers from graph mismatch: the masked graph input to the GNN deviates from the original graph. To alleviate this issue, we propose DiP-GNN (Discriminative Pre-training of Graph Neural Networks). Specifically, we train a generator to recover the identities of the masked edges, and simultaneously, we train a discriminator to distinguish the generated edges from the original graph's edges. The discriminator is subsequently used for downstream fine-tuning. In our pre-training framework, the graph seen by the discriminator better matches the original graph because the generator can recover a proportion of the masked edges. Extensive experiments on large-scale homogeneous and heterogeneous graphs demonstrate the effectiveness of the proposed framework. Our code will be publicly available.

1. INTRODUCTION

Graph neural networks (GNNs) have achieved superior performance in various applications, such as node classification (Kipf & Welling, 2017), knowledge graph modeling (Schlichtkrull et al., 2018), and recommendation systems (Ying et al., 2018). To enhance the power of GNNs, generative pre-training methods have been developed (Hu et al., 2020b). During the pre-training stage, a GNN incorporates topological information by training on a large-scale unlabeled graph in a self-supervised manner. Then, the pre-trained model is fine-tuned on a separate small labeled graph for downstream applications. Generative GNN pre-training is akin to masked language modeling in language model pre-training (Devlin et al., 2019): for an input graph, we first randomly mask out a proportion of the edges, and then a GNN is trained to recover the original identities of the masked edges.

One major drawback of this approach is graph mismatch: the input graph to the GNN deviates from the original one because a considerable number of edges are dropped. This changes the topological information, e.g., node connectivity, and consequently the learned node embeddings may be undesirable.

To mitigate this issue, we propose DiP-GNN (Discriminative Pre-training of Graph Neural Networks). In DiP-GNN, we simultaneously train a generator and a discriminator. The generator is trained similarly to existing generative pre-training approaches: it seeks to recover the masked edges and outputs a reconstructed graph. The reconstructed graph is then fed to the discriminator, which predicts whether each edge resides in the original graph (i.e., a true edge) or is wrongly constructed by the generator (i.e., a fake edge). After pre-training, we fine-tune the discriminator on downstream tasks. Figure 1 illustrates our training framework. Note that our work is related to Generative Adversarial Nets (GAN, Goodfellow et al. 2014), and detailed discussions are presented in Section 3.4. We remark that similar approaches have been used in natural language processing (Clark et al., 2020). However, we identify the graph mismatch problem (see Section 4.5), which is specific to graph-related applications and is not observed in natural language processing.

The proposed framework is more advantageous than generative pre-training because the reconstructed graph fed to the discriminator better matches the original graph than the masked graph seen by the generator: the generator can recover a proportion of the masked edges.
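To make the generator/discriminator interplay concrete, the following is a minimal sketch of one pre-training step under simplifying assumptions: a toy two-layer GNN over a dense adjacency matrix, dot-product edge scoring, and a greedy (argmax) generator instead of sampling. The names (ToyGNN, edge_scores, the 0.2 mask ratio) are illustrative and not taken from the paper.

```python
# Hypothetical sketch of one DiP-GNN pre-training step (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyGNN(nn.Module):
    """Two-layer message-passing network over a dense adjacency matrix."""
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.lin1 = nn.Linear(in_dim, hid_dim)
        self.lin2 = nn.Linear(hid_dim, hid_dim)

    def forward(self, x, adj):
        h = F.relu(adj @ self.lin1(x))  # first round of neighbor aggregation
        return adj @ self.lin2(h)       # second round

def edge_scores(h, pairs):
    """Score candidate edges by the dot product of endpoint embeddings."""
    return (h[pairs[:, 0]] * h[pairs[:, 1]]).sum(-1)

# Toy data: random node features and a random directed edge list.
num_nodes, in_dim, hid_dim = 100, 16, 32
x = torch.randn(num_nodes, in_dim)
edges = torch.randint(0, num_nodes, (500, 2))
adj_full = torch.zeros(num_nodes, num_nodes)
adj_full[edges[:, 0], edges[:, 1]] = 1.0

generator, discriminator = ToyGNN(in_dim, hid_dim), ToyGNN(in_dim, hid_dim)
opt = torch.optim.Adam(
    list(generator.parameters()) + list(discriminator.parameters()), lr=1e-3)

for step in range(10):
    # 1) Mask out a proportion of the edges.
    mask = torch.rand(edges.size(0)) < 0.2
    kept, masked = edges[~mask], edges[mask]
    adj_masked = torch.zeros_like(adj_full)
    adj_masked[kept[:, 0], kept[:, 1]] = 1.0

    # 2) Generator: recover the masked edges via link prediction.
    h_gen = generator(x, adj_masked)
    logits = h_gen[masked[:, 0]] @ h_gen.t()        # (num_masked, num_nodes)
    gen_loss = F.cross_entropy(logits, masked[:, 1])
    pred_dst = logits.argmax(-1)                    # generator's guessed endpoints
    gen_edges = torch.stack([masked[:, 0], pred_dst], dim=1)

    # 3) Reconstructed graph = kept edges + generated edges.
    recon_edges = torch.cat([kept, gen_edges], dim=0)
    adj_recon = torch.zeros_like(adj_full)
    adj_recon[recon_edges[:, 0], recon_edges[:, 1]] = 1.0

    # 4) Discriminator: is each edge in the original graph (1) or fake (0)?
    h_dis = discriminator(x, adj_recon)
    labels = adj_full[recon_edges[:, 0], recon_edges[:, 1]]
    dis_loss = F.binary_cross_entropy_with_logits(
        edge_scores(h_dis, recon_edges), labels)

    (gen_loss + dis_loss).backward()
    opt.step(); opt.zero_grad()
```

Note that when the generator happens to recover a masked edge correctly, that edge is labeled as true for the discriminator, which is why the discriminator's input graph matches the original graph more closely than the masked graph does.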

