ARE GRAPH CONVOLUTIONAL NETWORKS FULLY EXPLOITING THE GRAPH STRUCTURE?

Anonymous authors
Paper under double-blind review

Abstract

Graph Convolutional Networks (GCNs) represent the state-of-the-art for many graph-related tasks. At every layer, GCNs rely on the graph structure to define an aggregation strategy where each node updates its representation by combining information from its neighbours. A known limitation of GCNs is their inability to infer long-range dependencies. In fact, as the number of layers increases, information gets smoothed and node embeddings become indistinguishable, negatively affecting performance. In this paper we formalize four levels of injection of graph structural information, and use them to analyze the importance of long-range dependencies. We then propose a novel regularization technique based on random walks with restart, called RWRReg, which encourages the network to encode long-range information into node embeddings. RWRReg does not require additional operations at inference time, is model-agnostic, and is further supported by our theoretical analysis connecting it to the Weisfeiler-Leman algorithm. Our experimental analysis, on both transductive and inductive tasks, shows that the lack of long-range structural information greatly affects the performance of state-of-the-art models, and that the long-range information exploited by RWRReg leads to an average accuracy improvement of more than 5% on all considered tasks.

1. INTRODUCTION

Graphs are a ubiquitous data representation of many real world phenomena, with applications ranging from social networks, to chemistry, biology, and recommendation systems (Zhou et al., 2018). Graph Neural Networks (GNNs) are the generalization of deep learning to graph-structured data, and have received a great deal of attention from the research community. One class of GNN models, the Graph Convolutional Network (GCN), has proven extremely effective and is the current state-of-the-art for tasks such as graph classification, node classification, and link prediction. GCNs adopt a message passing mechanism where at each layer every node in the graph receives a message (e.g. a feature vector) from its 1-hop neighbours. The messages are then aggregated with a permutation invariant function (e.g. by mean or sum) and are used to update the node's representation vector with a learnable, possibly non-linear, transformation. The final node embedding vectors are used to make predictions, and the whole process is trained end-to-end. Empirically, the best results are obtained when the message passing procedure is repeated 2 or 3 times, as a higher number of layers leads to over-smoothing (Li et al., 2018; Xu et al., 2018b). Thus, GCNs only leverage the graph structure in the form of the 2-hop or 3-hop neighbourhood of each node. A direct consequence of this phenomenon is that GCNs are not capable of extracting and exploiting long-range dependencies between nodes. Random walks with restart (Page et al., 1998) have proven very effective at quantifying how closely related two nodes are (Tong et al., 2006), regardless of their distance in the graph. In fact, random walks with restart can capture the global structure of a graph, and have been used for many tasks including ranking, link prediction, and community detection (Jin et al., 2019).
On the other hand, random walks with restart do not consider node features, which are instead heavily exploited by GCNs. Combining GCNs and random walks with restart could then provide a powerful method to fully exploit the information contained in a graph. In this work we are not interested in establishing new state-of-the-art results, or proposing novel GNN models. We focus on studying the impact of long-range dependencies, and on identifying a first strategy, which can easily be applied to any existing model, to incorporate this information. Our Contribution. In more detail, we assess whether the injection of information on the graph structure that cannot be captured by 2 or 3-hop neighbourhoods has a significant impact on the performance of several state-of-the-art GCN models. In this regard, our contributions are fourfold. Firstly, we propose and formalize four different levels of structural information injection. Secondly, we propose a novel and practical regularization strategy, Random Walk with Restart Regularization (RWRReg), to inject structural information using random walks with restart, allowing GCNs to leverage long-range dependencies. RWRReg does not require additional operations at inference time, maintains the permutation-invariance of GCN models, and leads to an average 5% increase in accuracy on both node classification and graph classification. Thirdly, we prove a theoretical result linking random walks with restart and the Weisfeiler-Leman algorithm, providing a theoretical foundation for their use in GCNs. Fourthly, we test how the injection of structural information impacts the performance of 6 different GCN models on node classification, graph classification, and on the task of triangle counting. Results show that current state-of-the-art models lack the ability to extract long-range information, and that this severely affects their performance.
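As a concrete illustration of the random walk with restart scores mentioned above (a standard power-iteration sketch, not the paper's implementation; the restart probability and iteration count are assumptions), the stationary distribution of a walker that, at each step, returns to its starting node with probability c and otherwise moves to a uniformly random neighbour can be computed as:

```python
import numpy as np

def rwr_scores(A, restart_prob=0.15, n_iter=100):
    """Random walk with restart (personalized PageRank) scores.

    A: n x n adjacency matrix of an undirected graph.
    Returns an n x n matrix R where R[i, j] is the stationary
    probability that a walker restarting at node i is at node j,
    i.e. a proximity score between i and j.
    """
    n = A.shape[0]
    deg = A.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0                       # guard isolated nodes
    W = A / deg                               # row-stochastic transition matrix
    R = np.eye(n)                             # each walker starts at its own node
    for _ in range(n_iter):
        # with prob. restart_prob jump back to the start node,
        # otherwise take one random-walk step
        R = restart_prob * np.eye(n) + (1 - restart_prob) * R @ W
    return R
```

Each row of the returned matrix sums to one, and entries decay with (weighted) distance from the restart node, which is why these scores can rank how closely related two nodes are regardless of hop distance.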

2. INJECTING LONG-RANGE INFORMATION IN GCNS

To test if GCNs are missing important information that is encoded in the structure of a graph, we inject additional structural information into existing GCN models, and test how the performance of these models changes on several graph-related tasks. Intuitively, based on a model's performance when injected with different levels of structural information, we can understand how much information is not captured by GCNs, and whether this additional knowledge can improve performance on the considered tasks. In the rest of this section we present the notation used throughout the paper, the four levels of structural information injection that we consider, and an analytical result proving the effectiveness of using information from random walks with restart.

2.1. PRELIMINARIES

We use uppercase bold letters for matrices (M), and lowercase bold letters for vectors (v). We use plain letters with subscript indices to refer to a specific element of a matrix (M_{i,j}), or of a vector (v_i). We refer to the vector containing the i-th row of a matrix with the subscript "i, :" (M_{i,:}), while we refer to the i-th column with the subscript ":, i" (M_{:,i}). For a graph G = (V, E), where V = {1, ..., n} is the set of nodes and E ⊆ V × V is the set of edges, the input is given by a tuple (X, A). X is an n × d matrix where the i-th row contains the d-dimensional feature vector of the i-th node, and A is the n × n adjacency matrix. For the sake of clarity we restrict our presentation to undirected graphs, but similar concepts can be applied to directed graphs.
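As a toy illustration of this input representation (our own construction; the graph and feature values are arbitrary), an undirected path graph on three nodes with 2-dimensional node features would be encoded as:

```python
import numpy as np

# Undirected path graph 1 - 2 - 3 (0-indexed in code)
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)   # n x n symmetric adjacency matrix

X = np.array([[0.5, 1.0],
              [0.2, 0.3],
              [0.9, 0.1]])               # n x d node feature matrix

n, d = X.shape                           # n = 3 nodes, d = 2 features
```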

2.2. STRUCTURAL INFORMATION INJECTION

We consider four different levels of structural information injection, briefly described below. We remark that not all the injection strategies presented in this section are intended for practical use: their purpose is to help us understand the importance of the missing structural information. In particular, in Section 4 we study the impact of the different types of structural information injection, and hence quantify the information that is not exploited by current GCN models. We then discuss scalability and practicality aspects in Section 5. Adjacency Matrix. We concatenate each node's adjacency matrix row to its feature vector. This explicitly provides the GCN model with the connectivity of each node, and allows for higher-level structural reasoning when considering a neighbourhood (the model has access to the connectivity of the whole neighbourhood when aggregating messages from neighbouring nodes).
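A minimal sketch of this first injection level (our own illustration; the function name is an assumption, not from the paper):

```python
import numpy as np

def inject_adjacency(X, A):
    """Concatenate each node's adjacency row to its feature vector.

    X: n x d node feature matrix.
    A: n x n adjacency matrix.
    Returns an n x (d + n) augmented feature matrix, which is fed
    to the GCN in place of X.
    """
    return np.concatenate([X, A], axis=1)
```

Note that the augmented feature dimension grows linearly with the number of nodes, which is one reason this level of injection is analyzed for its informativeness rather than proposed for practical use.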

