TOWARDS RELIABLE LINK PREDICTION WITH ROBUST GRAPH INFORMATION BOTTLENECK

Abstract

Link prediction on graphs has achieved great success with the rise of deep graph learning. However, its robustness under edge noise remains less investigated. We reveal that the inherent edge noise, which naturally perturbs both the input topology and the target labels, leads to severe performance degradation and representation collapse. In this work, we propose an information-theory-guided principle, Robust Graph Information Bottleneck (RGIB), to extract reliable supervision signals and avoid representation collapse. Different from the general information bottleneck, RGIB decouples and balances the mutual dependence among graph topology, target labels, and representation, building new learning objectives for robust representation. We also provide two instantiations, RGIB-SSL and RGIB-REP, which benefit from different methodologies, i.e., self-supervised learning and data reparametrization, for implicit and explicit data denoising, respectively. Extensive experiments on 6 benchmarks covering various scenarios verify the effectiveness of the proposed RGIB.

1. INTRODUCTION

As a fundamental problem in graph learning, link prediction (Liben-Nowell & Kleinberg, 2007) has attracted growing interest in real-world applications like drug discovery (Ioannidis et al., 2020), knowledge graph completion (Bordes et al., 2013), and question answering (Huang et al., 2019). Recent advances from heuristic designs (Katz, 1953; Page et al., 1999) to graph neural networks (GNNs) (Kipf & Welling, 2016a; Gilmer et al., 2017; Kipf & Welling, 2016b; Zhang & Chen, 2018; Zhu et al., 2021) have achieved superior performances. Nevertheless, poor robustness in imperfect scenarios with inherent edge noise remains a practical bottleneck for current deep graph models (Gallagher et al., 2008; Ferrara et al., 2016; Wu et al., 2022a; Dai et al., 2022).

Early explorations improve the robustness of GNNs for node classification under label noise (Dai et al., 2021; Li et al., 2021) through the smoothing effect of neighboring nodes. Other methods achieve a similar goal by randomly removing edges (Rong et al., 2020) or by actively selecting informative nodes or edges and pruning task-irrelevant ones (Zheng et al., 2020; Luo et al., 2021). However, when applying these noise-robust methods to link prediction with noise, only marginal improvements are achieved (see Section 5). The reason is that edge noise naturally deteriorates both the input topology and the target labels (Figure 1(a)). Previous works that consider noise only in the input space or only in the label space cannot effectively handle such a coupled scenario. Therefore, it raises a new challenge to understand and tackle the edge noise for robust link prediction.

In this paper, we dive into the inherent edge noise and empirically show the significantly degraded performances it leads to (Section 3.1).
Then, we reveal the negative effect of the edge noise by carefully inspecting the distribution of learned representations, and discover that the graph representation is severely collapsed, as reflected by much lower alignment and poorer uniformity (Section 3.2). To solve this challenging problem, we propose the Robust Graph Information Bottleneck (RGIB) principle, building on the basic GIB for adversarial robustness (Wu et al., 2020) (Section 4.1). Conceptually, the RGIB principle introduces new learning objectives that decouple the mutual information (MI) among noisy inputs Ã, noisy labels Ỹ, and the representation H. As illustrated in Figure 1(b), RGIB generalizes the basic GIB to learn a robust representation that is resistant to edge noise. Technically, we provide two instantiations of RGIB based on different methodologies, i.e., RGIB-SSL and RGIB-REP: (1) the former utilizes contrastive pairs with automatically augmented views
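The collapse diagnosis above rests on two standard properties of contrastive-style representations, alignment and uniformity, commonly formulated as in Wang & Isola (2020): alignment is the expected squared distance between embeddings of positive pairs, and uniformity is the log of the average Gaussian-kernel similarity over all pairs. The NumPy sketch below (function names are illustrative, not taken from the paper's code) shows one way such metrics can be measured for L2-normalized node or edge embeddings.

```python
import numpy as np

def alignment(h1, h2):
    """Mean squared L2 distance between paired (positive) embeddings.

    Rows of h1 and h2 are assumed L2-normalized; h1[i] and h2[i]
    form a positive pair. Smaller values mean better alignment.
    """
    return np.mean(np.sum((h1 - h2) ** 2, axis=1))

def uniformity(h, t=2.0):
    """Log of the mean Gaussian-kernel similarity over distinct pairs.

    More negative values mean the embeddings are spread more
    uniformly over the unit hypersphere; 0 indicates full collapse
    (all embeddings identical).
    """
    # Pairwise squared distances via broadcasting: (n, n) matrix.
    sq_dists = np.sum((h[:, None, :] - h[None, :, :]) ** 2, axis=-1)
    # Keep only the upper triangle (distinct unordered pairs).
    iu = np.triu_indices(len(h), k=1)
    return np.log(np.mean(np.exp(-t * sq_dists[iu])))
```

Note the sign convention: both quantities are losses, so smaller alignment and more negative uniformity indicate a healthier representation, while a collapsed representation drives uniformity toward 0.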

