CAN GNNS LEARN HEURISTIC INFORMATION FOR LINK PREDICTION?

Abstract

Graph Neural Networks (GNNs) have shown superior performance in Link Prediction (LP). In particular, SEAL and its successors address the LP problem by classifying subgraphs extracted specifically for candidate links, achieving state-of-the-art results. Nevertheless, we question whether these methods can effectively learn information equivalent to link heuristics such as Common Neighbors and the Katz index (we refer to such information as heuristic information in this work). We show that link heuristics and GNNs capture different information. Link heuristics usually collect pair-specific information by counting the neighbors or paths shared between the two nodes of a candidate link, while GNNs learn node-wise representations through a neighborhood aggregation algorithm in which the two nodes of the candidate link pay no special attention to each other. Our further analysis shows that SEAL-type methods, which use only a GNN to model the pair-specific subgraphs, also cannot effectively capture heuristic information. A straightforward way to verify our analysis is to compare the LP performance of existing methods with that of a model that learns heuristic information independently of the GNN learning. To this end, we present ComHG¹, a simple and lightweight framework that directly Combines the embeddings of link Heuristics and the representations produced by a GNN. Experiments on OGB LP benchmarks show that ComHG outperforms all top competitors by a large margin, empirically confirming our propositions. Our experimental study also indicates that the contributions of link heuristics and the GNN to LP are sensitive to the graph degree: the former is powerful on sparse graphs, while the latter becomes dominant on dense graphs.

1. INTRODUCTION

Link Prediction (LP), which aims to predict the likelihood that a link exists between a pair of nodes in a graph, is a prominent task in graph-based data mining (Kumar et al., 2020). It has a wide range of beneficial applications, such as recommender systems (Wu et al., 2021), molecular interaction prediction (Huang et al., 2020), and knowledge graph completion (Li et al., 2022). Throughout the history of LP research, a number of link heuristics have been defined, such as Common Neighbors (CN) and the Katz index (Katz, 1953). A link heuristic usually describes a specific fact or hypothesis that gives the best interpretation of a statistical pattern in link observations (Martínez et al., 2016). The effectiveness of many link heuristics has been confirmed in various real-world LP applications (Liben-Nowell & Kleinberg, 2007; Zhou et al., 2009; Martínez et al., 2016). Recently, graph representation learning has proven powerful for LP (Perozzi et al., 2014; Zhang & Chen, 2018; Yun et al., 2021). Among the approaches in this domain, Graph Neural Networks (GNNs) have demonstrated stronger LP performance than others, such as node embedding methods based on positional encoding (Perozzi et al., 2014; Galkin et al., 2021). Prevalent modern GNNs, such as GCN (Kipf & Welling, 2017) and GAT (Veličković et al., 2018), follow a neighborhood information aggregation algorithm in which each node's representation is updated by aggregating the representations of the node itself and its neighbors. In this paper, we use the term GNNs to refer to such aggregation-based GNNs.
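
To make the distinction between pair-specific heuristics and node-wise aggregation concrete, the following is a minimal illustrative sketch in plain Python; it is not the ComHG implementation. The helper names (`build_graph`, `common_neighbors`, `truncated_katz`, `mean_aggregate`), the toy graph, the damping factor `beta`, and the walk-length cutoff `max_len` are all assumptions introduced here for illustration. The Katz index is approximated by truncating its sum over walk lengths, and the aggregation step is a plain unweighted mean over a node and its neighbors, which is simpler than the normalization or attention used in GCN or GAT.

```python
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[int, Set[int]]


def build_graph(edges: Iterable[Tuple[int, int]]) -> Graph:
    """Build an undirected adjacency map from an edge list."""
    adj: Graph = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj: Graph, u: int, v: int) -> int:
    """Pair-specific heuristic: number of neighbors shared by u and v."""
    return len(adj[u] & adj[v])


def truncated_katz(adj: Graph, u: int, v: int,
                   beta: float = 0.05, max_len: int = 3) -> float:
    """Pair-specific heuristic: sum of beta**l * (#walks of length l
    from u to v) for l = 1..max_len (a truncation, assumed here)."""
    counts = {u: 1}  # counts[w] = number of walks of the current length u -> w
    score = 0.0
    for length in range(1, max_len + 1):
        nxt: Dict[int, int] = {}
        for w, c in counts.items():
            for x in adj[w]:
                nxt[x] = nxt.get(x, 0) + c
        counts = nxt
        score += (beta ** length) * counts.get(v, 0)
    return score


def mean_aggregate(adj: Graph,
                   feats: Dict[int, List[float]]) -> Dict[int, List[float]]:
    """One node-wise aggregation step: each node averages its own
    feature vector with those of its neighbors. No candidate pair
    (u, v) appears anywhere in this computation."""
    out: Dict[int, List[float]] = {}
    for node, nbrs in adj.items():
        group = [feats[node]] + [feats[n] for n in sorted(nbrs)]
        dim = len(feats[node])
        out[node] = [sum(vec[i] for vec in group) / len(group)
                     for i in range(dim)]
    return out


# Toy graph (hypothetical): nodes 0 and 1 share the neighbors 2 and 3.
adj = build_graph([(0, 2), (0, 3), (1, 2), (1, 3), (3, 4)])
cn_score = common_neighbors(adj, 0, 1)
katz_score = truncated_katz(adj, 0, 1)
feats = {n: [1.0 if n == 0 else 0.0] for n in adj}
hidden = mean_aggregate(adj, feats)
```

Note that `common_neighbors` and `truncated_katz` both take the candidate pair `(u, v)` as input, whereas `mean_aggregate` updates every node from its own neighborhood without reference to any candidate link.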



¹ Our code is available at https://github.com/astroming/ComHG

