REVISITING GRAPH NEURAL NETWORKS FOR LINK PREDICTION

Abstract

Graph neural networks (GNNs) have achieved great success in recent years. The three most common applications are node classification, link prediction, and graph classification. While there is rich literature on node classification and graph classification, GNNs for link prediction are relatively less studied and less understood. Two representative classes of methods exist: GAE and SEAL. GAE (Graph Autoencoder) first uses a GNN to learn node embeddings for all nodes, and then aggregates the embeddings of the source and target nodes as their link representation. SEAL extracts a subgraph around the source and target nodes, labels the nodes in the subgraph, and then uses a GNN to learn a link representation from the labeled subgraph. In this paper, we thoroughly discuss the differences between these two classes of methods, and conclude that simply aggregating node embeddings does not lead to effective link representations, whereas learning from properly labeled subgraphs around links provides highly expressive and generalizable link representations. Experiments on the recent large-scale OGB link prediction datasets show that SEAL has up to 195% performance gains over GAE methods, achieving new state-of-the-art results on 3 out of 4 datasets.

1. INTRODUCTION

Link prediction aims to predict potential or missing links connecting pairwise nodes in a network. It has wide applications in various fields, such as friend recommendation in social networks (Adamic & Adar, 2003), movie recommendation in Netflix (Bennett et al., 2007), protein-protein interaction prediction (Qi et al., 2006), and knowledge graph completion (Nickel et al., 2015). Traditional link prediction approaches include heuristic methods, embedding methods, and feature-based methods. Heuristic methods compute some heuristic node similarity scores as the likelihood of links (Liben-Nowell & Kleinberg, 2007), such as common neighbors, preferential attachment (Barabási & Albert, 1999), and Katz index (Katz, 1953), which can be regarded as predefined graph structure features. Embedding methods, including matrix factorization (MF) and Node2vec (Grover & Leskovec, 2016), learn free-parameter node embeddings from the observed network transductively, and thus do not generalize to unseen nodes and networks. Feature-based methods only use explicit node features and do not consider the graph structure. Recently, graph neural networks (GNNs) have emerged as powerful tools for learning over graph-structured data (Scarselli et al., 2009; Bruna et al., 2013; Duvenaud et al., 2015; Li et al., 2015; Kipf & Welling, 2016a; Niepert et al., 2016; Dai et al., 2016), and have been successfully used in link prediction as well (Kipf & Welling, 2016b; Zhang & Chen, 2018; You et al., 2019; Chami et al., 2019; Li et al., 2020). There are two main types of GNN-based link prediction methods. One is Graph Autoencoder (Kipf & Welling, 2016b), where a GNN is first applied to the entire network to learn an embedding vector for each node. Then the embeddings of the source and target nodes are aggregated to predict the target link. The second type is SEAL (Zhang & Chen, 2018; Li et al., 2020), where an enclosing subgraph is extracted around each target link.
Then the nodes in each enclosing subgraph are labeled differently according to their distances to the source and target nodes. Finally, a GNN is applied to each enclosing subgraph to learn a link representation for link prediction. At first glance, both methods seem to learn graph structure features associated with the target link, and leverage these structure features for link prediction. However, as we will see, the two methods have fundamentally different power in terms of learning the structural representations of links. We first show that by individually learning source and target node embeddings, GAE methods cannot differentiate links with different structural roles. To understand this intuitively, we give an example in Figure 1. In this graph, nodes v2 and v3 have the same structural roles (symmetric/isomorphic to each other). A GAE will learn the same node embeddings for v2 and v3, thus giving the same predicted probabilities for link (v1, v2) and link (v1, v3). However, the structural roles of link (v1, v2) and link (v1, v3) are clearly different; v1 intuitively should have unequal probabilities of connecting to v2 and v3. Next, we propose a labeling trick, which gives a label to each node as its additional feature, where the source and target nodes are labeled differently from the rest. We show that, combined with the labeling trick, a sufficiently expressive GNN can learn the same representations for two links if and only if their structural roles within the graph are the same. This way, (v1, v2) and (v1, v3) will be predicted differently in Figure 1. We further show that SEAL is such an example. Finally, we give a more practical definition of isomorphism, called local isomorphism, which defines two nodes/links as isomorphic if their local neighborhood subgraphs are isomorphic. We argue that GNNs for link prediction should aim to discriminate local isomorphism.
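To make the labeling trick concrete, the following sketch (with hypothetical helper names) assigns each node of an extracted subgraph a label built from its shortest-path distances to the source and target nodes; this is a minimal version of the idea, not SEAL's exact DRNL formula, which hashes the two distances into a single integer label:

```python
from collections import deque

def bfs_distances(adj, root):
    """Shortest-path hop counts from root over an adjacency dict."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def label_nodes(adj, src, dst):
    """Give every node the label (d(i, src), d(i, dst)).

    The source and target receive the unique labels (0, d) and (d, 0),
    so a GNN reading these labels can distinguish them from all other nodes.
    """
    d_src = bfs_distances(adj, src)
    d_dst = bfs_distances(adj, dst)
    return {i: (d_src.get(i), d_dst.get(i)) for i in adj}

# Toy graph: a 6-cycle v1-v2-...-v6-v1, with (v1, v2) as the target link.
adj = {1: [2, 6], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 6], 6: [5, 1]}
print(label_nodes(adj, 1, 2))  # v1 -> (0, 1), v2 -> (1, 0), others differ
```

Because the labels depend only on distances to the target link's endpoints, the labeling is invariant to node permutations, which is what allows the learned link representations to generalize.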
We conduct a thorough comparison among different link prediction methods, including SEAL and various GAE and embedding methods, on the recent large-scale Open Graph Benchmark (OGB) datasets (Hu et al., 2020) . We show that SEAL with the labeling trick has up to 195% higher Hits@100 than GAE methods, achieving new state-of-the-art results on 3 out of 4 datasets.
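For reference, the traditional heuristics mentioned in the introduction are simple to compute. A minimal plain-Python sketch (function names are illustrative; the Katz sum is truncated at a maximum walk length, with an assumed decay factor β = 0.05):

```python
def common_neighbors(adj, x, y):
    """Number of neighbors shared by x and y."""
    return len(set(adj[x]) & set(adj[y]))

def preferential_attachment(adj, x, y):
    """Product of the degrees of x and y."""
    return len(adj[x]) * len(adj[y])

def katz(adj, x, y, beta=0.05, max_len=5):
    """Truncated Katz index: sum_l beta^l * (#walks of length l from x to y)."""
    walks = {v: 0 for v in adj}  # walks[v] = #walks of current length from x to v
    walks[x] = 1
    score = 0.0
    for l in range(1, max_len + 1):
        nxt = {v: 0 for v in adj}
        for u in adj:
            if walks[u]:
                for v in adj[u]:
                    nxt[v] += walks[u]
        walks = nxt
        score += beta ** l * walks[y]
    return score

adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
print(common_neighbors(adj, 1, 2))         # 1 (node 3)
print(preferential_attachment(adj, 1, 4))  # 2
```

Each heuristic is a fixed, hand-designed graph structure feature; the point of the GNN-based methods compared in this paper is to learn such features instead.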

2. PRELIMINARIES

In this section, we formally define the notions of graph, permutation, isomorphism, and GNN.

Definition 1 (Graph). We consider an undirected graph G = (V, E, 𝐀), where V = {1, 2, …, n} is the set of n vertices, E ⊆ V × V is the set of edges, and 𝐀 ∈ ℝ^{n×n×k} contains the node and edge features, with the diagonal components 𝐀_{i,i,:} denoting node attributes and the off-diagonal components 𝐀_{i,j,:} denoting edge attributes. We further use A ∈ {0, 1}^{n×n} to denote the adjacency matrix of G, with A_{i,j} = 1 iff (i, j) ∈ E. If there are no node/edge features, we let 𝐀 = A; otherwise, A can be regarded as the first slice of 𝐀, i.e., A = 𝐀_{:,:,1}.

Definition 2 (Permutation). A node permutation π is a bijective mapping from {1, 2, …, n} to {1, 2, …, n}. All n! possible π's constitute the permutation group Π_n. We define π(S) = {π(i) | i ∈ S} when S is a subset of {1, 2, …, n}. We further define the permutation of 𝐀 as π(𝐀), where π(𝐀)_{π(i),π(j),:} = 𝐀_{i,j,:}; in other words, π(𝐀)_{i,j,:} = 𝐀_{π⁻¹(i),π⁻¹(j),:}.

Definition 3 (Set isomorphism). Given two n-node graphs G = (V, E, 𝐀) and G′ = (V′, E′, 𝐀′), and two node sets S ⊆ V, S′ ⊆ V′, we say (S, 𝐀) and (S′, 𝐀′) are isomorphic (denoted by (S, 𝐀) ≅ (S′, 𝐀′)) if ∃π ∈ Π_n such that S = π(S′) and 𝐀 = π(𝐀′).

When (V, 𝐀) ≅ (V′, 𝐀′), we say the two graphs G and G′ are isomorphic (abbreviated as 𝐀 ≅ 𝐀′, because V = π(V′) for any π). Note that set isomorphism is stricter than graph isomorphism: it not only requires graph isomorphism, but also requires that the permutation map the specific subset S to the subset S′. When S ⊂ V and S′ ⊂ V′, we are often more concerned with the case 𝐀 = 𝐀′, where we look for isomorphic node sets within the same graph (automorphism). For example, when S = {i}, S′ = {j} (the single-node case) and (i, 𝐀) ≅ (j, 𝐀), nodes i and j are on the same orbit of graph 𝐀 (i.e., they have symmetric positions/the same structural role within the graph). An example is v2 and v3 in Figure 1.

Definition 4 (Invariant function). A function f defined over the space of (S, 𝐀) is invariant if ∀π ∈ Π_n, f(S, 𝐀) = f(π(S), π(𝐀)).

Definition 5 (GNN). A GNN is an invariant function mapping from the space of (S, 𝐀) to ℝ^d. More specifically, a GNN first performs multiple invariant message-passing operations to compute a node embedding z_i = GNN(i, 𝐀) for every i ∈ S, and then performs a set aggregation (pooling) over {z_i | i ∈ S}, written as AGG({z_i | i ∈ S}), as the representation GNN(S, 𝐀) of the set S. Note that when |S| = 1, the set aggregation is often an identity mapping. In graph classification (S = V), we use a graph pooling layer over the node embeddings to compute the graph representation.
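Definitions 2 and 4 can be checked numerically. The following sketch (0-indexed, with plain Python lists standing in for the tensor 𝐀, and the multiset of degrees as a simple stand-in for an invariant function) verifies that permuting both S and 𝐀 leaves an invariant function's output unchanged:

```python
def permute(A, pi):
    """Return pi(A), defined by pi(A)[pi[i]][pi[j]] = A[i][j] (Definition 2)."""
    n = len(A)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            out[pi[i]][pi[j]] = A[i][j]
    return out

def degree_multiset(S, A):
    """An invariant function of (S, A): the sorted degrees of the nodes in S."""
    return sorted(sum(A[i]) for i in S)

# Adjacency matrix of the path 0-1-2-3.
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
pi = [2, 0, 3, 1]            # an arbitrary permutation of {0, 1, 2, 3}
S = {0, 1}
piS = {pi[i] for i in S}     # pi(S)

# f(S, A) == f(pi(S), pi(A)), as required by Definition 4.
assert degree_multiset(S, A) == degree_multiset(piS, permute(A, pi))
```

A GNN built from invariant message passing and an invariant set aggregation satisfies Definition 5 by construction, which is why its link predictions cannot depend on how the nodes happen to be numbered.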



Figure 1: The structural roles of link (v1, v2) and link (v1, v3) are different, but GAE will assign equal probabilities to them.

