ON SIZE GENERALIZATION IN GRAPH NEURAL NETWORKS

Abstract

Graph neural networks (GNNs) can process graphs of different sizes, but their capacity to generalize across sizes is still not well understood. Size generalization is key to numerous GNN applications, from solving combinatorial optimization problems to learning in molecular biology. In such problems, obtaining labels for and training on large graphs can be prohibitively expensive, while training on smaller graphs is possible. This paper puts forward the size-generalization question and characterizes important aspects of that problem theoretically and empirically. We prove that even for very simple tasks, such as counting the number of nodes or edges in a graph, GNNs do not naturally generalize to graphs of larger size. Instead, their generalization performance is closely related to the distribution of local patterns of connectivity and features, and to how that distribution changes from small to large graphs. Specifically, we prove that for many tasks, there are weight assignments for GNNs that perfectly solve the task on small graphs but fail on large graphs whenever there is a discrepancy between their local patterns. We further demonstrate, on several tasks, that training GNNs on small graphs results in solutions that do not generalize to larger graphs. We then formalize size generalization as a domain adaptation problem and describe two learning setups where size generalization can be improved: first, as a self-supervised learning (SSL) problem over the target domain of large graphs; second, as a semi-supervised learning problem when a few samples are available in the target domain. We demonstrate the efficacy of these solutions on a diverse set of benchmark graph datasets.

1. INTRODUCTION

Graphs are a flexible representation, widely used for representing diverse data and phenomena. Graph neural networks (GNNs), deep models that operate over graphs, have emerged as a prominent learning model (Bruna et al., 2013; Kipf and Welling, 2016; Veličković et al., 2017). They are used in the natural sciences (Gilmer et al., 2017), in social network analysis (Fan et al., 2019), for solving difficult mathematical problems (Luz et al., 2020), and for approximating solutions to combinatorial optimization problems (Li et al., 2018). In many domains, graph data vary significantly in size. This is the case in molecular biology, where molecules, represented as graphs with atoms as nodes, span from small compounds to proteins with many thousands of nodes. It is even more severe in social networks, which can reach billions of nodes. The success of GNNs for such data stems from the fact that the same GNN model can process input graphs regardless of their size. Indeed, it has been proposed that GNNs can generalize to graphs whose size differs from what they were trained on, but it is largely unknown in which problems such generalization occurs. Empirically, several papers report good generalization performance on specific tasks (Li et al., 2018; Luz et al., 2020). Other papers, like Veličković et al. (2019), show that size generalization can fail on several simple graph algorithms and can be improved by using task-specific training procedures and architectures. Given the flexibility of GNNs to operate on variable-sized graphs, a fundamental question about their generalization arises: "When do GNNs trained on small graphs generalize to large graphs?" Aside from being an intriguing theoretical question, this problem has important practical implications. In many domains, it is hard to label large graphs. For instance, in combinatorial optimization problems, labeling a large graph boils down to solving a large and hard optimization problem.
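The size-agnostic nature of message passing can be made concrete with a minimal sketch. The layer below (illustrative only, not a specific architecture from the paper) sums neighbor features with weight matrices shared across nodes, so the very same parameters apply to graphs of any size:

```python
import numpy as np

def message_passing_layer(A, X, W_self, W_neigh):
    """One GNN layer: each node combines its own features with the sum of
    its neighbors' features. The weight matrices are shared across nodes,
    so the same layer applies to graphs of any size."""
    return np.tanh(X @ W_self + A @ X @ W_neigh)

rng = np.random.default_rng(0)
W_self = rng.normal(size=(4, 4))
W_neigh = rng.normal(size=(4, 4))

# The same weights process a 5-node graph and a 50-node graph.
for n in (5, 50):
    A = (rng.random((n, n)) < 0.3).astype(float)
    A = np.triu(A, 1)
    A = A + A.T                      # symmetric adjacency, no self-loops
    X = rng.normal(size=(n, 4))
    H = message_passing_layer(A, X, W_self, W_neigh)
    print(n, H.shape)
```

The question studied in this paper is not whether such a model *can* run on larger graphs, which it trivially can, but whether the function it learned on small graphs remains correct on them.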
In other domains, it is often very hard for human raters to correctly label complex networks. One approach to this problem could be to resize graphs to a homogeneous size. This is the strategy taken in computer vision, where it is well understood how to resize an image while keeping its content. Unfortunately, there are no effective resizing procedures for graphs. It would therefore be extremely valuable to develop techniques that generalize from training on small graphs. As we discuss below, a theoretical analysis of size generalization is very challenging because it depends on several different factors, including the task, the architecture, and the data. Regarding tasks, we argue that it is important to distinguish two types: local and global. Local tasks can be solved by GNNs whose depth does not depend on the size of the input graph, for example, finding a constant-size pattern. Global tasks require that the depth of the GNN grow with the size of the input graph, for example, calculating the diameter of a graph. While a few previous works explore depth-dependent GNNs (e.g., Tang et al. (2020)), constant-depth GNNs are by far the most widely used GNN models today and are therefore the focus of this paper. We study the ability of the most expressive constant-depth message-passing neural networks (Xu et al., 2018; Morris et al., 2019) to generalize to unseen sizes. Our key observation is that generalization to graphs of different sizes is strongly related to the distribution of patterns around nodes in the graphs of interest. These patterns, dubbed d-patterns (where d is the radius of the local neighborhood), describe the local feature-connectivity structure around each node, as seen by message-passing neural networks; they are defined formally in Section 3. We study the role of d-patterns both empirically and theoretically.
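As rough intuition (the formal definition is in Section 3), a node's d-pattern can be approximated by d rounds of WL-style refinement, where each node's label summarizes its feature together with the multiset of its neighbors' labels. The sketch below, using hypothetical path graphs with uniform features, shows how the pattern distribution can shift with graph size: the longer path contains a "deep interior" 2-pattern that never occurs in the short one.

```python
import hashlib
from collections import Counter

def d_patterns(adj, feats, d):
    """Approximate each node's d-pattern with d rounds of WL-style
    refinement: a node's label combines its own feature with the sorted
    multiset of its neighbors' labels from the previous round."""
    labels = {v: str(feats[v]) for v in adj}
    for _ in range(d):
        labels = {
            v: hashlib.sha1(
                (labels[v] + "|" + ",".join(sorted(labels[u] for u in adj[v])))
                .encode()
            ).hexdigest()
            for v in adj
        }
    return labels

def path(n):
    # Path graph on n nodes: 0 - 1 - ... - (n-1).
    return {v: [u for u in (v - 1, v + 1) if 0 <= u < n] for v in range(n)}

feats = lambda n: {v: 0 for v in range(n)}
small = Counter(d_patterns(path(4), feats(4), d=2).values())
large = Counter(d_patterns(path(10), feats(10), d=2).values())

# The 10-node path contains a 2-pattern (a node whose 2-hop neighborhood
# consists only of degree-2 nodes) that is absent from the 4-node path.
print(sorted(small.values()), sorted(large.values()))
```

Hash-based refinement is only a proxy for the actual d-patterns, but it captures the phenomenon at issue: large graphs can exhibit local patterns that a model trained on small graphs has never seen.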
First, we theoretically show that when there is a significant discrepancy between the d-pattern distributions, GNNs have multiple global minima for graphs in a given size range, of which only a subset generalizes well to larger graphs. We complement our theoretical analysis with an experimental study and show that GNNs tend to converge to non-generalizing global minima when d-patterns from the large-graph distribution are not well represented in the small-graph distribution. Furthermore, we demonstrate that the size-generalization problem is accentuated in deeper GNNs. Following these observations, in the final part of this paper we discuss two learning setups that help improve size generalization by formulating the learning problem as a domain adaptation problem: (1) training the GNNs on self-supervised tasks aimed at learning the d-pattern distribution of both the target (large graphs) and source (small graphs) domains; we also propose a novel SSL task that addresses overfitting of d-patterns; (2) a semi-supervised learning setup with a limited number of labeled examples from the target domain. The idea behind both setups is to promote convergence of GNNs to local or global minima with good size-generalization properties. We show that both setups are useful in a series of experiments on synthetic and real data. To summarize, this paper makes the following contributions: (1) we identify a size-generalization problem when learning local tasks with GNNs and analyze it empirically and theoretically; (2) we link the size-generalization problem to the distribution of d-patterns and suggest approaching it as a domain adaptation problem; (3) we empirically show how several learning setups help improve size generalization.
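The multiple-global-minima phenomenon can be illustrated with a toy edge-counting example (our own construction for intuition, not the paper's proof). Two "GNN-like" models, each a node-wise map followed by a sum readout, both fit the training size perfectly, but one compensates with a size-dependent bias and therefore fails on larger graphs:

```python
def count_edges(adj, node_fn, bias=0.0):
    """Predict the edge count as a sum readout over a node-wise map of degrees."""
    degrees = [len(adj[v]) for v in adj]
    return sum(node_fn(d) for d in degrees) + bias

def cycle(n):
    # Cycle graph on n nodes: every node has degree 2 and there are n edges.
    return {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}

# Model A generalizes: sum(deg)/2 equals the edge count on any graph.
model_a = lambda d: d / 2.0

# Model B is also a global minimum on the training size n=6: it shifts the
# node map down and compensates with a bias of 6.0. The compensation is
# size-dependent, so the model fails on n=10.
model_b = lambda d: d / 2.0 - 1.0
bias_b = 6.0

print(count_edges(cycle(6), model_a))           # correct on the training size
print(count_edges(cycle(6), model_b, bias_b))   # also correct on n=6
print(count_edges(cycle(10), model_a))          # still correct on n=10
print(count_edges(cycle(10), model_b, bias_b))  # wrong: off by 4
```

Nothing in the training loss distinguishes the two models; the learning setups discussed above aim to bias optimization toward solutions like model A.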

2. RELATED WORK

Size generalization in set and graph learning. Several papers observed successful generalization across graph sizes, but the underlying reasons were not investigated (Li et al., 2018; Maron et al., 2018; Luz et al., 2020). More recently, Veličković et al. (2019) showed that when GNNs are trained to perform simple graph algorithms step by step, they generalize better to graphs of different sizes. Unfortunately, such training procedures cannot be easily applied to general tasks. Knyazev et al. (2019) studied the relationship between generalization and attention mechanisms. Tang et al. (2020) observed two issues that can harm generalization: (1) there are tasks for which a constant number of layers is not sufficient; (2) some graph learning tasks are homogeneous functions. They then suggest a new GNN architecture to deal with these issues. Our work is complementary to these works, as it explores another fundamental size-generalization problem, focusing on constant-depth GNNs. For more details on the distinction between constant-depth and variable-depth tasks, see Appendix A. Several works also studied size generalization and expressivity when learning set-structured inputs (Zweig and Bruna, 2020; Bueno and Hylton, 2020). On the more practical side, Joshi et al. (2019),

