A PAC-BAYESIAN APPROACH TO GENERALIZATION BOUNDS FOR GRAPH NEURAL NETWORKS

Abstract

In this paper, we derive generalization bounds for two primary classes of graph neural networks (GNNs), namely graph convolutional networks (GCNs) and message passing GNNs (MPGNNs), via a PAC-Bayesian approach. Our results reveal that the maximum node degree and the spectral norm of the weights govern the generalization bounds of both models. We also show that our bound for GCNs is a natural generalization of the results developed by Neyshabur et al. (2017) for fully-connected and convolutional neural networks. For MPGNNs, our PAC-Bayes bound improves over the Rademacher complexity based bound of Garg et al. (2020), showing a tighter dependency on the maximum node degree and the maximum hidden dimension. The key ingredients of our proofs are a perturbation analysis of GNNs and a generalization of the PAC-Bayes analysis to non-homogeneous GNNs. We perform an empirical study on several synthetic and real-world graph datasets and verify that our PAC-Bayes bound is tighter than existing bounds.

1. INTRODUCTION

Graph neural networks (GNNs) (Gori et al., 2005; Scarselli et al., 2008; Bronstein et al., 2017; Battaglia et al., 2018) have become very popular recently due to their ability to learn powerful representations from graph-structured data, and have achieved state-of-the-art results in a variety of application domains such as social networks (Hamilton et al., 2017; Xu et al., 2018), quantum chemistry (Gilmer et al., 2017; Chen et al., 2019a), computer vision (Qi et al., 2017; Monti et al., 2017), reinforcement learning (Sanchez-Gonzalez et al., 2018; Wang et al., 2018), robotics (Casas et al., 2019; Liang et al., 2020), and physics (Henrion et al., 2017). Given a graph along with node/edge features, GNNs learn node/edge representations by propagating information on the graph via local computations shared across the nodes/edges. Based on the specific form of local computation employed, GNNs can be divided into two categories: graph convolution based GNNs (Bruna et al., 2013; Duvenaud et al., 2015; Kipf & Welling, 2016) and message passing based GNNs (Li et al., 2015; Dai et al., 2016; Gilmer et al., 2017). The former generalizes the convolution operator from regular graphs (e.g., grids) to graphs with arbitrary topology, whereas the latter mimics message passing algorithms and parameterizes the shared functions via neural networks.

Due to the tremendous empirical success of GNNs, there is increasing interest in understanding their theoretical properties. For example, some recent works study their expressiveness (Maron et al., 2018; Xu et al., 2018; Chen et al., 2019b), that is, what class of functions can be represented by GNNs. However, only a few works investigate why GNNs generalize so well to unseen graphs; they are either restricted to a specific model variant (Verma & Zhang, 2019; Du et al., 2019; Garg et al., 2020) or have loose dependencies on graph statistics (Scarselli et al., 2018).
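To make the distinction between the two categories concrete, the following is a minimal numpy sketch of one layer of each: a graph-convolution layer in the style of Kipf & Welling (2016), which aggregates via a normalized adjacency, and a generic message-passing step that computes messages from neighbor states and then updates each node. The function names and the specific choices (ReLU nonlinearity, sum aggregation, symmetric normalization with self-loops) are illustrative assumptions, not the exact models analyzed in this paper.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def gcn_layer(A, H, W):
    """One graph-convolution layer: H' = relu(A_hat @ H @ W), where
    A_hat is the symmetrically normalized adjacency with self-loops
    (a common choice, following Kipf & Welling, 2016)."""
    A_tilde = A + np.eye(A.shape[0])            # add self-loops
    d = A_tilde.sum(axis=1)                     # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt   # symmetric normalization
    return relu(A_hat @ H @ W)

def mpgnn_layer(A, H, W_msg, W_upd):
    """One message-passing step: each node sums messages computed from
    its neighbors' states, then applies a shared update function.
    (Sum aggregation and single-linear message/update are assumptions.)"""
    messages = relu(H @ W_msg)       # message function, shared across edges
    aggregated = A @ messages        # sum messages over neighbors
    return relu(aggregated @ W_upd)  # update function, shared across nodes

# Tiny usage example: a triangle graph with 4-dim node features.
A = np.array([[0., 1., 1.],
              [1., 0., 1.],
              [1., 1., 0.]])
rng = np.random.default_rng(0)
H = rng.normal(size=(3, 4))
H_gcn = gcn_layer(A, H, rng.normal(size=(4, 2)))
H_mp = mpgnn_layer(A, H, rng.normal(size=(4, 4)), rng.normal(size=(4, 2)))
```

Both layers apply the same shared weights at every node, which is exactly the "local computations shared across the nodes/edges" property described above.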
On the other hand, GNNs have close ties to standard feedforward neural networks, e.g., multi-layer perceptrons (MLPs) and convolutional neural networks (CNNs). In particular, if each i.i.d. sample is viewed as a node, then the whole dataset becomes a graph without edges. Therefore, GNNs can be seen as generalizations of MLPs/CNNs since they model not only the regularities within a sample but also the dependencies among samples as defined by the graph. It is therefore natural to ask whether we can generalize the recent advances on generalization bounds for MLPs/CNNs (Harvey et al., 2017; Neyshabur et al., 2017; Bartlett et al., 2017; Dziugaite & Roy, 2017; Arora et al., 2018; 2019) to GNNs, and how graph structures would affect the resulting bounds.
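The reduction described above can be checked numerically: on an edgeless graph, a graph-convolution layer with self-loops collapses to an ordinary fully-connected layer applied independently to each sample. The sketch below assumes a Kipf & Welling (2016)-style layer with symmetric normalization; with no edges, the adjacency-plus-self-loops matrix is the identity, so the graph operation disappears.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def gcn_layer(A, H, W):
    # Normalized adjacency with self-loops, as in Kipf & Welling (2016).
    A_tilde = A + np.eye(A.shape[0])
    d = A_tilde.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return relu(D_inv_sqrt @ A_tilde @ D_inv_sqrt @ H @ W)

def mlp_layer(H, W):
    # A standard fully-connected layer applied row-wise to the samples.
    return relu(H @ W)

# View a dataset of n i.i.d. samples as an edgeless graph: one node per
# sample, no edges. Then A_tilde = I, the normalization is the identity,
# and the GCN layer coincides with the MLP layer on every sample.
rng = np.random.default_rng(0)
n, d_in, d_out = 5, 4, 3
A_empty = np.zeros((n, n))          # dataset as a graph without edges
H = rng.normal(size=(n, d_in))      # each row is one sample's features
W = rng.normal(size=(d_in, d_out))
same = np.allclose(gcn_layer(A_empty, H, W), mlp_layer(H, W))
```

This is the sense in which a generalization bound for GNNs should specialize, on edgeless graphs, to a bound for MLPs, as the abstract claims for the GCN bound relative to Neyshabur et al. (2017).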

