TWO-DIMENSIONAL WEISFEILER-LEHMAN GRAPH NEURAL NETWORKS FOR LINK PREDICTION

Abstract

Link prediction is an important application of graph neural networks (GNNs). Most existing GNNs for link prediction are based on the one-dimensional Weisfeiler-Lehman (1-WL) test. As pointed out by previous works, 1-WL-GNNs by nature learn node-level representations and therefore have poor expressive power on links. Some node labeling methods relieve this weakness but reduce efficiency. In this paper, we study a completely different approach that directly obtains node pair (link) representations based on two-dimensional Weisfeiler-Lehman (2-WL) tests. 2-WL tests use links (2-tuples) instead of nodes as message passing units, and can thus directly obtain link representations. We theoretically analyze the expressive power of 2-WL tests to discriminate non-isomorphic links, and prove that their link-discriminating power is superior to that of 1-WL. Based on different 2-WL variants, we propose a series of novel 2-WL-GNN models for link prediction. Experiments on a wide range of real-world datasets demonstrate that their performance is competitive with state-of-the-art baselines.

1. INTRODUCTION

Link prediction is a key problem on graph-structured data (Al Hasan et al., 2006; Liben-Nowell & Kleinberg, 2007; Menon & Elkan, 2011; Trouillon et al., 2016). It refers to using node characteristics and graph topology to measure how likely a link is to exist between a pair of nodes. Because predicting pairwise relations is important, link prediction has wide applications in various domains, such as recommendation in social networks (Adamic & Adar, 2003), knowledge graph completion (Nickel et al., 2015), and metabolic network reconstruction (Oyetunde et al., 2017). One class of traditional link prediction methods is heuristic methods, which use manually designed graph structural features of a target node pair, such as the number of common neighbors (CN) (Liben-Nowell & Kleinberg, 2007), preferential attachment (PA) (Barabási & Albert, 1999), and resource allocation (RA) (Zhou et al., 2009), to estimate the likelihood of link existence. Another class, embedding methods, including Matrix Factorization (MF) (Menon & Elkan, 2011) and node2vec (Grover & Leskovec, 2016), learns node embeddings from the graph structure in a transductive manner and therefore cannot generalize to unseen nodes or new graphs. Recently, with the popularity of GNNs, their application to link prediction has produced a number of cutting-edge models (Kipf & Welling, 2016; Zhang & Chen, 2018; Zhang et al., 2021; Zhu et al., 2021). Most existing GNN models for link prediction are based on the one-dimensional Weisfeiler-Lehman (1-WL) test (Weisfeiler & Leman, 1968; Shervashidze et al., 2011). The 1-WL test is a popular heuristic for detecting non-isomorphic graphs. In each update, it obtains new colors for all nodes by hashing each node's own color together with the multiset of its neighbors' colors. Vanilla GNNs, which we call 1-WL-GNNs, simulate the 1-WL test by iteratively aggregating neighboring node features to the center node to update node representations.
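The 1-WL update described above can be sketched in a few lines of Python. This is a minimal illustration and not the implementation used in this paper: the function name `wl_refine`, the adjacency-dictionary input, the integer initial colors, and the use of sorted-signature relabeling in place of a hash function are all assumptions made for the example.

```python
def wl_refine(adj, colors=None, max_iters=None):
    """Sketch of 1-WL color refinement on an undirected graph.

    adj: dict mapping each node to an iterable of its neighbors (assumed format).
    colors: optional dict of initial integer node colors; defaults to all 0.
    Returns a dict mapping each node to its final integer color.
    """
    nodes = sorted(adj)
    colors = dict(colors) if colors else {v: 0 for v in nodes}
    if max_iters is None:
        max_iters = len(nodes)  # refinement stabilizes within n rounds
    for _ in range(max_iters):
        # Signature = (own color, sorted multiset of neighbor colors).
        sigs = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in nodes
        }
        # Injective relabeling of signatures stands in for the hash.
        palette = {s: i for i, s in enumerate(sorted(set(sigs.values())))}
        new_colors = {v: palette[sigs[v]] for v in nodes}
        # Each signature contains the old color, so classes can only split;
        # an unchanged class count means the partition is stable.
        if len(set(new_colors.values())) == len(set(colors.values())):
            return new_colors
        colors = new_colors
    return colors


# Hypothetical example graph: the path 0 - 1 - 2 - 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
colors = wl_refine(path)
```

On this path graph, the two end nodes receive one color and the two middle nodes another, so nodes 1 and 2 are indistinguishable at the node level even though node 0 is adjacent to node 1 and not to node 2.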
With the node representations, 1-WL-GNNs compute link prediction scores by aggregating pairwise node representations. The Graph Auto-Encoder (GAE) and its variant VGAE (Kipf & Welling, 2016) are such models. However, 1-WL-GNNs can only discriminate links at the "node" level. Figure 1 (left) illustrates this: v_2 and v_3 are symmetric nodes in the graph and thus receive the same representation from a 1-WL-GNN, but links (v_1, v_2) and (v_1, v_3) are not symmetric. Consequently, 1-WL-GNNs cannot discriminate between links (v_1, v_2) and (v_1, v_3), even though the path between v_1 and v_2 is shorter than the path between v_1 and v_3. Although positional node embeddings or random features can alleviate

