GRAPH NEURAL NETWORKS FOR LINK PREDICTION WITH SUBGRAPH SKETCHING

Abstract

Many Graph Neural Networks (GNNs) perform poorly compared to simple heuristics on Link Prediction (LP) tasks. This is due to limitations in expressive power, such as the inability to count triangles (the backbone of most LP heuristics), and because they cannot distinguish automorphic nodes (those having identical structural roles). Both expressiveness issues can be alleviated by learning link (rather than node) representations and incorporating structural features such as triangle counts. Since explicit link representations are often prohibitively expensive, recent works have resorted to subgraph-based methods, which achieve state-of-the-art performance for LP but suffer from poor efficiency due to the high level of redundancy between subgraphs. We analyze the components of subgraph GNN (SGNN) methods for link prediction. Based on our analysis, we propose a novel full-graph GNN called ELPH (Efficient Link Prediction with Hashing) that passes subgraph sketches as messages to approximate the key components of SGNNs without explicit subgraph construction. ELPH is provably more expressive than Message Passing GNNs (MPNNs). It outperforms existing SGNN models on many standard LP benchmarks while being orders of magnitude faster. However, it shares the common GNN limitation that it is only efficient when the dataset fits in GPU memory. Accordingly, we develop a highly scalable model, called BUDDY, which uses feature precomputation to circumvent this limitation without sacrificing predictive performance. Our experiments show that BUDDY also outperforms SGNNs on standard LP benchmarks while being highly scalable and faster than ELPH.

1. INTRODUCTION

Link Prediction (LP) is an important problem in graph ML with many industrial applications. For example, recommender systems can be formulated as LP; link prediction is also a key process in drug discovery and knowledge graph construction. There are three main classes of LP methods: (i) heuristics (see Appendix C.1) that estimate the distance between two nodes (e.g. personalized page rank (PPR) (Page et al., 1999) or graph distance (Zhou et al., 2009)) or the similarity of their neighborhoods (e.g. Common Neighbors (CN), Adamic-Adar (AA) (Adamic & Adar, 2003), or Resource Allocation (RA) (Zhou et al., 2009)); (ii) unsupervised node embeddings or factorization methods, which encompass the majority of production recommendation systems (Koren et al., 2009; Chamberlain et al., 2020); and, recently, (iii) Graph Neural Networks, in particular of the Message-Passing type (MPNNs) (Gilmer et al., 2017; Kipf & Welling, 2017; Hamilton et al., 2017).¹ GNNs excel in graph- and node-level tasks, but often fail to outperform node embeddings or heuristics on common LP benchmarks such as the Open Graph Benchmark (OGB) (Hu et al., 2020). There are two related reasons why MPNNs tend to be poor link predictors. Firstly, due to the equivalence of message passing to the Weisfeiler-Leman (WL) graph isomorphism test (Xu et al., 2019; Morris et al., 2019), standard MPNNs are provably incapable of counting triangles (Chen et al., 2020) and consequently of counting Common Neighbors or computing one-hop or two-hop LP heuristics such as AA or RA. Secondly, GNN-based LP approaches combine permutation-equivariant structural node representations (obtained by message passing on the graph) with a readout function that maps from two node representations to a link probability. However, generating link representations as a function of equivariant node representations encounters the problem that all nodes u in the same orbit induced by the graph automorphism group have equal representations.
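The three neighborhood-overlap heuristics named above (CN, AA, RA) can be made concrete with a few lines of code. This is an illustrative sketch over a toy adjacency-set representation; the graph and function names are ours, not from any particular library.

```python
import math

def common_neighbors(adj, u, v):
    # CN(u, v) = |N(u) ∩ N(v)|
    return len(adj[u] & adj[v])

def adamic_adar(adj, u, v):
    # AA(u, v) = sum over common neighbors w of 1 / log(deg(w))
    return sum(1.0 / math.log(len(adj[w])) for w in adj[u] & adj[v])

def resource_allocation(adj, u, v):
    # RA(u, v) = sum over common neighbors w of 1 / deg(w)
    return sum(1.0 / len(adj[w]) for w in adj[u] & adj[v])

# Toy graph: a 4-cycle {0-1-2-3} with chord (0, 2).
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
print(common_neighbors(adj, 1, 3))  # nodes 0 and 2 are shared -> 2
```

Note that all three scores depend only on the joint neighborhood of the two endpoints, which is exactly the quantity a node-wise MPNN cannot recover.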
Therefore, the link probability p(u, v) is the same for all u in the orbit, independent of e.g. the graph distance d(u, v) (Figure 1) (Srinivasan & Ribeiro, 2019). Both limitations can be addressed by augmenting nodes with structural features (Li et al., 2020; Zhang et al., 2021; You et al., 2021). However, adding structural features amounts to computing structural node representations that are conditioned on an edge, and so can no longer be efficiently computed in parallel. For the sake of tractability, state-of-the-art methods for LP restrict computation to subgraphs enclosing a link, transforming link prediction into binary subgraph classification (Zhang & Chen, 2018; Zhang et al., 2021; Yin et al., 2022). Subgraph GNNs (SGNNs) are inspired by the strong performance of LP heuristics compared to more sophisticated techniques and are motivated as an attempt to learn data-driven LP heuristics. Despite impressive performance on benchmark datasets, SGNNs suffer from serious limitations: (i) constructing the subgraphs is expensive; (ii) subgraphs are irregular, so batching them is inefficient on GPUs; (iii) each step of inference is almost as expensive as each training step because subgraphs must be constructed for every test link. These drawbacks preclude many applications where scalability or efficient inference is required.
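To see why subgraph construction dominates SGNN cost, consider the per-link extraction step itself: one breadth-first expansion around each candidate edge. The sketch below is our own illustration of a generic SEAL-style k-hop enclosing-subgraph extraction, not code from any of the cited methods.

```python
from collections import deque

def k_hop_nodes(adj, roots, k):
    # BFS out to depth k from a set of root nodes.
    seen = {r: 0 for r in roots}
    q = deque(roots)
    while q:
        node = q.popleft()
        if seen[node] == k:
            continue
        for nbr in adj[node]:
            if nbr not in seen:
                seen[nbr] = seen[node] + 1
                q.append(nbr)
    return set(seen)

def enclosing_subgraph(adj, u, v, k=1):
    # Union of the k-hop neighborhoods of both endpoints,
    # restricted to edges whose endpoints are both retained.
    nodes = k_hop_nodes(adj, [u, v], k)
    edges = {(a, b) for a in nodes for b in adj[a] if b in nodes and a < b}
    return nodes, edges

adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
nodes, edges = enclosing_subgraph(adj, 0, 3, k=1)
```

This BFS must run once per training and per test link, and the resulting subgraphs vary in size, which is the source of limitations (i)-(iii) above.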

Main contributions. (i) We analyze the relative contributions of SGNN components and reveal which properties of the subgraphs are salient to the LP problem. (ii) Based on our analysis, we develop an MPNN (ELPH) that passes subgraph sketches as messages. The sketches allow the most important properties of the subgraphs to be summarized in the nodes. The resulting model removes the need for explicit subgraph construction and is a full-graph MPNN with similar complexity to GCN. (iii) We prove that ELPH is strictly more expressive than MPNNs for LP and that it solves the automorphic node problem. (iv) As full-graph GNNs suffer from scalability issues when the data exceeds GPU memory, we develop BUDDY, a highly scalable model that precomputes sketches and node features. (v) We provide an open-source PyTorch library for (sub)graph sketching that generates data sketches via message passing on the GPU. Our experimental evaluation shows that our methods compare favorably to the state of the art in both accuracy and speed.
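The core idea of passing sketches instead of subgraphs can be illustrated with MinHash, one of the sketch families used in this setting: each node carries a small fixed-size signature of its neighborhood, and neighborhood overlap is estimated by comparing signatures, with no subgraph ever materialized. The code below is a toy illustration under our own naming, not the paper's implementation (which combines MinHash with HyperLogLog).

```python
import random

def make_hashes(num_perm, seed=0):
    # One salted hash function per "permutation".
    rng = random.Random(seed)
    salts = [rng.getrandbits(32) for _ in range(num_perm)]
    return [lambda x, s=s: hash((s, x)) for s in salts]

def minhash_signature(items, hashes):
    # Fixed-size sketch of a set: the minimum under each hash.
    return [min(h(x) for x in items) for h in hashes]

def jaccard_estimate(sig_a, sig_b):
    # Fraction of matching minima estimates |A ∩ B| / |A ∪ B|.
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

hashes = make_hashes(256)
A = set(range(0, 100))    # neighborhood of node u
B = set(range(50, 150))   # neighborhood of node v; true Jaccard = 50/150
est = jaccard_estimate(minhash_signature(A, hashes),
                       minhash_signature(B, hashes))
```

Because signatures have constant size, they can be aggregated along edges like any other node feature, which is what makes a full-graph, GCN-complexity formulation possible.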

2. PRELIMINARIES

Notation. Let G = (V, E) be an undirected graph comprising a set V of n nodes (vertices) and a set E of e links (edges). We denote by d(u, v) the geodesic distance (shortest walk length) between nodes u and v.
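For concreteness, the geodesic distance d(u, v) on an unweighted, undirected graph is just a breadth-first search; a minimal sketch (our own illustrative code):

```python
from collections import deque

def geodesic_distance(adj, u, v):
    # BFS from u; the first time v is reached, its depth is d(u, v).
    if u == v:
        return 0
    dist = {u: 0}
    q = deque([u])
    while q:
        node = q.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                if nbr == v:
                    return dist[nbr]
                q.append(nbr)
    return float("inf")  # v unreachable from u

adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}  # path 0-1-2-3
```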



* Equal contribution. † Work done while at Twitter Inc. ‡ benjamin.chamberlain@gmail.com.
¹ GNNs are a broader category than MPNNs. Since the majority of GNNs used in practice are of the message-passing type, we use the terms synonymously.
² More precisely, WL-equivalent, which is a necessary but insufficient condition for isomorphism.



We refer to this phenomenon as the automorphic node problem and define automorphic nodes (denoted u ≅ v) to be those nodes that are indistinguishable by means of a given k-layer GNN. In contrast, transductive node embedding methods such as TransE (Bordes et al., 2013) and DeepWalk (Perozzi et al., 2014), or matrix factorization (Koren et al., 2009), do not suffer from this problem, as the embeddings are not permutation equivariant.
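The automorphic node problem is easy to reproduce numerically: on a 6-cycle with uniform node features, any permutation-equivariant message-passing update gives every node an identical representation, so a readout over node pairs cannot depend on d(u, v). The code below is a toy demonstration with a hand-written aggregation step, not any model from the paper.

```python
def mpnn_layer(adj, h):
    # A minimal MPNN step: sum-aggregate neighbors, combine with self.
    return {v: (h[v] + sum(h[u] for u in adj[v])) * 0.5 for v in adj}

n = 6
adj = {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}  # 6-cycle
h = {i: 1.0 for i in range(n)}                           # uniform features
for _ in range(3):                                       # three MPNN layers
    h = mpnn_layer(adj, h)

# Every node lies in the same automorphism orbit, so all representations
# collapse to a single value, even though d(0, 1) = 1 while d(0, 3) = 3.
```

Any readout f(h[u], h[v]) over these representations assigns the same link probability to (0, 1) and (0, 3), which is precisely the failure mode structural features are introduced to fix.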

