ARE GRAPH CONVOLUTIONAL NETWORKS FULLY EXPLOITING THE GRAPH STRUCTURE?

Anonymous authors
Paper under double-blind review

Abstract

Graph Convolutional Networks (GCNs) represent the state of the art for many graph-related tasks. At every layer, GCNs rely on the graph structure to define an aggregation strategy in which each node updates its representation by combining information from its neighbours. A known limitation of GCNs is their inability to infer long-range dependencies: as the number of layers increases, information gets smoothed and node embeddings become indistinguishable, negatively affecting performance. In this paper we formalize four levels of injection of graph structural information and use them to analyze the importance of long-range dependencies. We then propose a novel regularization technique based on random walks with restart, called RWRReg, which encourages the network to encode long-range information into node embeddings. RWRReg requires no additional operations at inference time, is model-agnostic, and is further supported by our theoretical analysis connecting it to the Weisfeiler-Leman algorithm. Our experimental analysis, on both transductive and inductive tasks, shows that the lack of long-range structural information greatly affects the performance of state-of-the-art models, and that the long-range information exploited by RWRReg leads to an average accuracy improvement of more than 5% on all considered tasks.

1. INTRODUCTION

Graphs are a ubiquitous data representation of many real-world phenomena, with applications ranging from social networks to chemistry, biology, and recommendation systems (Zhou et al., 2018). Graph Neural Networks (GNNs) generalize deep learning to graph-structured data and have received a huge amount of attention from the research community. One class of GNN models, the Graph Convolutional Network (GCN), has proven extremely effective and is the current state of the art for tasks such as graph classification, node classification, and link prediction.

GCNs adopt a message-passing mechanism where, at each layer, every node in the graph receives a message (e.g. a feature vector) from its 1-hop neighbours. The messages are then aggregated with a permutation-invariant function (e.g. mean or sum) and used to update the node's representation vector through a learnable, possibly non-linear, transformation. The final node embedding vectors are used to make predictions, and the whole process is trained end-to-end. Empirically, the best results are obtained when the message-passing procedure is repeated 2 or 3 times, as a higher number of layers leads to over-smoothing (Li et al., 2018; Xu et al., 2018b). Thus, GCNs leverage the graph structure only in the form of the 2-hop or 3-hop neighbourhood of each node. A direct consequence of this phenomenon is that GCNs cannot extract and exploit long-range dependencies between nodes.

Random walks with restart (Page et al., 1998) have proven very effective at quantifying how closely related two nodes are (Tong et al., 2006), regardless of their distance in the graph. In fact, random walks with restart capture the global structure of a graph and have been used for many tasks, including ranking, link prediction, and community detection (Jin et al., 2019). On the other hand, random walks with restart do not consider node features, which are instead heavily exploited by GCNs. Combining GCNs and random walks with restart could thus provide a powerful method to fully exploit the information contained in a graph.
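To make the message-passing mechanism concrete, the following is a minimal NumPy sketch of a single GCN-style layer with mean aggregation over self-loop-augmented neighbourhoods. The function and variable names are ours, purely for illustration, and this is not the implementation used in any particular GCN library:

```python
import numpy as np

def gcn_layer(A, X, W, activation=np.tanh):
    """One message-passing layer: each node averages the feature
    vectors of its 1-hop neighbours (plus its own, via self-loops)
    and applies a learnable linear map and a non-linearity."""
    A_hat = A + np.eye(A.shape[0])          # adjacency with self-loops
    deg = A_hat.sum(axis=1, keepdims=True)  # node degrees
    messages = (A_hat / deg) @ X            # mean aggregation over neighbours
    return activation(messages @ W)         # learnable update

# Toy example: a 4-node path graph, 3-dim input features, 2-dim output.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.random.default_rng(0).normal(size=(4, 3))
W = np.random.default_rng(1).normal(size=(3, 2))
H = gcn_layer(A, X, W)  # stacking 2-3 such layers yields a typical GCN
```

After k such layers, a node's embedding depends only on its k-hop neighbourhood, which is exactly the locality limitation discussed above.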
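The random-walk-with-restart scores discussed above can be computed by simple power iteration. The sketch below is our own illustrative code, not the paper's implementation; the restart probability of 0.15 is a common default rather than a value taken from this work:

```python
import numpy as np

def rwr_scores(A, seed, restart=0.15, tol=1e-9, max_iter=1000):
    """Random walk with restart: at each step the walker moves to a
    uniformly random neighbour with probability 1 - restart, or jumps
    back to the seed node with probability restart. The stationary
    distribution scores how closely each node is related to the seed,
    regardless of its distance in the graph."""
    n = A.shape[0]
    P = A / A.sum(axis=0, keepdims=True)  # column-stochastic transitions
    e = np.zeros(n)
    e[seed] = 1.0                         # restart distribution
    r = e.copy()
    for _ in range(max_iter):
        r_next = (1 - restart) * (P @ r) + restart * e
        if np.abs(r_next - r).sum() < tol:
            break
        r = r_next
    return r

# On a 5-node path graph seeded at the centre, scores decay with
# distance from the seed and are symmetric around it.
A = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
r = rwr_scores(A, seed=2)
```

Unlike the k-hop aggregation of a GCN, these scores assign non-zero weight to every reachable node, which is what makes the walk statistics a natural carrier of long-range structural information.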

