THE SURPRISING POWER OF GRAPH NEURAL NETWORKS WITH RANDOM NODE INITIALIZATION

Abstract

Graph neural networks (GNNs) are effective models for representation learning on graph-structured data. However, standard GNNs are limited in their expressive power, as they cannot distinguish graphs beyond the capability of the Weisfeiler-Leman (1-WL) graph isomorphism heuristic. This limitation has motivated a large body of work, including higher-order GNNs, which are provably more powerful models. To date, higher-order invariant and equivariant networks are the only models with known universality results, but these models are hindered in practice by prohibitive computational complexity. Thus, despite their limitations, standard GNNs are commonly used, due to their strong practical performance. In practice, GNNs have shown promising performance when enhanced with random node initialization (RNI), where the idea is to train and run the models with randomized initial node features. In this paper, we analyze the expressive power of GNNs with RNI, and pose the following question: are GNNs with RNI more expressive than GNNs? We prove that this is indeed the case, by showing that GNNs with RNI are universal, the first such result for GNNs not relying on computationally demanding higher-order properties. We then empirically analyze the effect of RNI on GNNs, based on carefully constructed datasets. Our empirical findings support the superior performance of GNNs with RNI over standard GNNs. In fact, we demonstrate that the performance of GNNs with RNI is often comparable with or better than that of higher-order GNNs, while keeping the much lower memory requirements of standard GNNs. However, this improvement typically comes at the cost of slower model convergence. Somewhat surprisingly, we found that the convergence rate and the accuracy of the models can be improved by using only a partial random initialization regime.

1. INTRODUCTION

Graph neural networks (GNNs) (Scarselli et al., 2009; Gori et al., 2005) are neural architectures designed for learning functions over graph-structured data, and naturally encode desirable properties such as permutation invariance (resp., equivariance) relative to graph nodes, and node-level computation based on message passing between these nodes. These properties provide GNNs with a strong inductive bias, enabling them to effectively learn and combine both local and global graph features (Battaglia et al., 2018). As a result, GNNs have been applied to a multitude of tasks, ranging from protein classification (Gilmer et al., 2017) and synthesis (You et al., 2018), protein-protein interaction (Fout et al., 2017), and social network analysis (Hamilton et al., 2017), to recommender systems (Ying et al., 2018) and combinatorial optimization (Bengio et al., 2018; Selsam et al., 2019).

However, popular GNN architectures, primarily based on message passing (MPNNs), are limited in their expressive power. In particular, MPNNs are at most as powerful as the Weisfeiler-Leman (1-WL) graph isomorphism heuristic (Morris et al., 2019; Xu et al., 2019), and thus cannot discern between several families of non-isomorphic graphs, e.g., sets of regular graphs (Cai et al., 1992). To address this limitation, alternative GNN architectures with provably higher expressive power than MPNNs have been proposed. These models, which we refer to as higher-order GNNs, are inspired by the more powerful generalization of 1-WL to k-tuples of nodes, known as k-WL (Grohe, 2017). They are the only GNNs with an established universality result, but they are computationally very demanding. As a result, MPNNs, despite their limited expressiveness, remain the standard GNN model for graph learning applications.
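The 1-WL limitation on regular graphs can be illustrated with a minimal sketch of color refinement in plain Python. The example is ours, not taken from the paper: the function name `wl_histogram`, the two test graphs (a 6-cycle and two disjoint triangles, both 2-regular), and the shared `table` used to keep color ids comparable across graphs are all assumptions made for illustration. The second half marks a single node with a distinct initial color. This is only a deterministic stand-in for the symmetry breaking that randomized initial features provide; it is not the RNI procedure itself, which samples random features for the nodes.

```python
from collections import Counter


def wl_histogram(adj, init, rounds, table):
    """Run `rounds` of 1-WL color refinement and return the color histogram.

    adj:   dict mapping each node to a list of its neighbors
    init:  dict mapping each node to a hashable initial color
    table: dict shared across graphs that maps color signatures to integer
           ids, so that equal signatures get equal ids in every graph
    """
    colors = {v: table.setdefault(("init", init[v]), len(table)) for v in adj}
    for _ in range(rounds):
        # New color = own color plus the sorted multiset of neighbor colors.
        colors = {
            v: table.setdefault(
                (colors[v], tuple(sorted(colors[u] for u in adj[v]))),
                len(table),
            )
            for v in adj
        }
    return Counter(colors.values())


# Two non-isomorphic 2-regular graphs on six nodes (illustrative choice).
cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}

# Identical initial features: refinement never separates the two graphs.
table = {}
uniform = {v: 0 for v in range(6)}
hist_cycle = wl_histogram(cycle6, uniform, 3, table)
hist_tri = wl_histogram(triangles, uniform, 3, table)
same_uniform = hist_cycle == hist_tri

# One node marked with a distinct color (assumed stand-in for RNI).
table = {}
marked = {v: 1 if v == 0 else 0 for v in range(6)}
hist_cycle_m = wl_histogram(cycle6, marked, 3, table)
hist_tri_m = wl_histogram(triangles, marked, 3, table)
same_marked = hist_cycle_m == hist_tri_m

print(same_uniform, same_marked)
```

With identical initial colors every node in both graphs sees the same neighborhood, so the histograms coincide. Once one node is marked, the remaining nodes are separated by their distance to it, and the two graphs receive different color histograms.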

