IN-DISTRIBUTION AND OUT-OF-DISTRIBUTION GENERALIZATION FOR GRAPH NEURAL NETWORKS

Abstract

Graph neural networks (GNNs) are models that learn from structured data of varying size. Despite their popularity, the generalization behavior of GNNs remains theoretically under-explored. In this work, we expand the theoretical understanding of both in-distribution and out-of-distribution generalization of GNNs. First, we improve upon the state-of-the-art PAC-Bayes (in-distribution) generalization bound, primarily by reducing an exponential dependency on the node degree to a linear one. Second, using tools from spectral graph theory, we prove rigorous guarantees on the out-of-distribution (OOD) size generalization of GNNs, where graphs in the training set have different numbers of nodes and edges from those in the test set. To verify our theoretical findings empirically, we conduct experiments on both synthetic and real-world graph datasets. Our computed generalization gaps for the in-distribution case significantly improve upon the state-of-the-art PAC-Bayes results. For the OOD case, experiments on community classification tasks in large social networks show that GNNs achieve strong size generalization in the cases guaranteed by our theory.

1. INTRODUCTION

Graph neural networks (GNNs), first proposed in Scarselli et al. (2008), generalize artificial neural networks from processing fixed-size data to processing arbitrary graph-structured or relational data, which can vary in the number of nodes, the number of edges, and so on. GNNs and their modern variants (Bronstein et al., 2017; Battaglia et al., 2018) have achieved state-of-the-art results in a wide range of application domains, including social networks (Hamilton et al., 2017), material sciences (Xie & Grossman, 2018), drug discovery (Wieder et al., 2020), autonomous driving (Liang et al., 2020), quantum chemistry (Gilmer et al., 2020), and particle physics (Shlomi et al., 2020). Despite these empirical successes, the theoretical understanding of GNNs is somewhat limited. Existing works largely focus on analyzing the expressiveness of GNNs. In particular, Xu et al. (2018) show that GNNs are as powerful as the Weisfeiler-Lehman (WL) graph isomorphism test (Weisfeiler & Leman, 1968) in distinguishing graphs. Chen et al. (2019) further demonstrate an equivalence between graph isomorphism testing and universal approximation of permutation-invariant functions. Loukas (2019) shows that GNNs satisfying certain conditions (e.g., on depth and width) are Turing universal. Chen et al. (2020) and Xu et al. (2020a) respectively examine whether GNNs can count substructures and perform algorithmic reasoning. In the vein of statistical learning theory, generalization analyses for GNNs have been developed to bound the gap between training and testing errors using VC-dimension (Vapnik & Chervonenkis, 1971), Rademacher complexity (Bartlett & Mendelson, 2002), algorithmic stability (Bousquet & Elisseeff, 2002), and PAC-Bayes (McAllester, 2003), a Bayesian extension of PAC learning (Valiant, 1984).
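The key property noted above, that GNNs process graphs of varying size, can be made concrete with a minimal message-passing layer. The sketch below is illustrative only and not any specific architecture from the literature; the function and weight names are our own, and sum-aggregation with a ReLU is just one common choice.

```python
import numpy as np

def gnn_layer(A, H, W_self, W_nbr):
    """One message-passing layer: each node sum-aggregates its
    neighbors' features (A @ H), mixes them with its own features,
    and applies a ReLU.  A is the n x n adjacency matrix and H the
    n x d node-feature matrix; nothing depends on n, which is what
    lets the same weights handle graphs of any size."""
    messages = A @ H  # row i = sum of features of node i's neighbors
    return np.maximum(0.0, H @ W_self + messages @ W_nbr)

rng = np.random.default_rng(0)
# A 3-node path graph with 4-dimensional node features.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H = rng.normal(size=(3, 4))
W_self = rng.normal(size=(4, 8))
W_nbr = rng.normal(size=(4, 8))
out = gnn_layer(A, H, W_self, W_nbr)
print(out.shape)  # one 8-dimensional embedding per node
```

Because the learned parameters act only on the feature dimension, the same layer applies unchanged to a 3-node or a 3-million-node graph, which is precisely why size generalization is a meaningful question.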
Depending on whether the problem setup is in-distribution (ID) or out-of-distribution (OOD), i.e., whether the test data are drawn from the same distribution as the training data, we categorize the literature into two groups.
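The size-generalization variant of the OOD setup, where training graphs are smaller than test graphs, can be sketched as a simple dataset split. The helper below is a hypothetical illustration (names and threshold are ours), with each graph represented only by its node count.

```python
import random

def size_ood_split(graph_sizes, size_threshold):
    """Partition graphs into a training set of small graphs and a
    test set of strictly larger ones: the size-generalization setup,
    in which train and test graphs differ in numbers of nodes."""
    train = [s for s in graph_sizes if s <= size_threshold]
    test = [s for s in graph_sizes if s > size_threshold]
    return train, test

random.seed(0)
graph_sizes = [random.randint(10, 200) for _ in range(20)]
train, test = size_ood_split(graph_sizes, size_threshold=100)
# Every training graph is no larger than the threshold; every test
# graph is strictly larger, so the test distribution over sizes has
# support disjoint from the training distribution.
print(len(train), len(test))
```

In the ID setting one would instead split uniformly at random, so train and test share the same size distribution; the contrast between the two splits is exactly the ID/OOD distinction drawn above.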

