LINK PREDICTION WITHOUT GRAPH NEURAL NETWORKS

Abstract

Link prediction, the task of predicting missing edges based on graph features, is fundamental to many graph applications. As with several related problems, Graph Neural Networks (GNNs), which are built on an attribute-centric message-passing paradigm, have become the predominant framework for link prediction. GNNs have consistently outperformed traditional topology-based heuristics, but what contributes to their performance? Are there simpler approaches that achieve comparable or better results? To answer these questions, we first identify important limitations in how GNN-based link prediction methods handle, in both training and evaluation, the intrinsic class imbalance of the problem, which stems from graph sparsity. We then propose Gelato, a novel topology-centric framework that applies a topological heuristic to a graph enhanced with attribute information via graph learning. Our model is trained end-to-end with an N-pair loss on an unbiased training set to address class imbalance. Experiments show that Gelato is 145% more accurate, trains 11 times faster, infers 6,000 times faster, and has less than half the trainable parameters of state-of-the-art GNNs for link prediction.

1. INTRODUCTION

Machine learning on graphs supports various structured-data applications including social network analysis (Tang et al., 2008; Li et al., 2017; Qiu et al., 2018b), recommender systems (Jamali & Ester, 2009; Monti et al., 2017; Wang et al., 2019a), natural language processing (Sun et al., 2018a; Sahu et al., 2019; Yao et al., 2019), and physics modeling (Sanchez-Gonzalez et al., 2018; Ivanovic & Pavone, 2019; da Silva et al., 2020). Among the graph-related tasks, one could argue that link prediction (Lü & Zhou, 2011; Martínez et al., 2016) is the most fundamental one. This is because link prediction not only has many concrete applications (Qi et al., 2006; Liben-Nowell & Kleinberg, 2007; Koren et al., 2009) but can also be considered an (implicit or explicit) step of the graph-based machine learning pipeline (Martin et al., 2016; Bahulkar et al., 2018; Wilder et al., 2019), as the observed graph is usually noisy and/or incomplete. In recent years, Graph Neural Networks (GNNs) (Kipf & Welling, 2017; Hamilton et al., 2017; Veličković et al., 2018) have emerged as the predominant paradigm for machine learning on graphs. Following their great success in node classification (Klicpera et al., 2018; Wu et al., 2019; Zheng et al., 2020) and graph classification (Ying et al., 2018; Zhang et al., 2018a; Morris et al., 2019), GNNs have been shown to achieve state-of-the-art link prediction performance (Zhang & Chen, 2018; Yun et al., 2021; Pan et al., 2022). Compared to classical approaches that rely on expert-designed heuristics to extract topological information (e.g., Common Neighbors (Newman, 2001), Adamic-Adar (Adamic & Adar, 2003), Preferential Attachment (Barabási et al., 2002)), GNNs have the potential to discover new heuristics via supervised learning and the natural advantage of incorporating node attributes.
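To make the classical baselines concrete, the following is a minimal sketch of the three heuristics named above, assuming the graph is represented as a plain adjacency dictionary mapping each node to its set of neighbors (the toy graph and function names are illustrative, not part of any library API):

```python
import math

def common_neighbors(adj, u, v):
    # Common Neighbors (Newman, 2001): number of nodes adjacent to both u and v.
    return len(adj[u] & adj[v])

def adamic_adar(adj, u, v):
    # Adamic-Adar (Adamic & Adar, 2003): common neighbors weighted by
    # the inverse log of their degree, down-weighting high-degree hubs.
    return sum(1.0 / math.log(len(adj[w]))
               for w in adj[u] & adj[v] if len(adj[w]) > 1)

def preferential_attachment(adj, u, v):
    # Preferential Attachment (Barabási et al., 2002): product of degrees.
    return len(adj[u]) * len(adj[v])

# Toy graph: a 4-cycle {0-1, 1-2, 2-3, 3-0} with chord 1-3.
adj = {
    0: {1, 3},
    1: {0, 2, 3},
    2: {1, 3},
    3: {0, 1, 2},
}
print(common_neighbors(adj, 0, 2))         # nodes 1 and 3 -> 2
print(preferential_attachment(adj, 0, 2))  # 2 * 2 = 4
```

Each heuristic assigns a score to a candidate node pair, and pairs are ranked by score; no training is involved, which is precisely what GNN-based methods aim to improve upon.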
However, there is little understanding of which factors contribute to the success of GNNs in link prediction, and whether simpler alternatives can achieve comparable performance, as was recently found for node classification (Huang et al., 2020). GNN-based methods approach link prediction as a binary classification problem. Yet, unlike other classification problems, link prediction deals with extremely class-imbalanced data due to the sparsity of real-world graphs. We argue that class imbalance should be accounted for in both the training and the evaluation of link prediction. In addition, GNNs combine topological and attribute information by learning topology-smoothed attributes (embeddings) via message passing (Li et al., 2018). This attribute-centric mechanism has been proven

