ON SIZE GENERALIZATION IN GRAPH NEURAL NETWORKS

Abstract

Graph neural networks (GNNs) can process graphs of different sizes, but their capacity to generalize across sizes is still not well understood. Size generalization is key to numerous GNN applications, from solving combinatorial optimization problems to learning in molecular biology. In such problems, obtaining labels and training on large graphs can be prohibitively expensive, but training on smaller graphs is possible. This paper puts forward the size-generalization question and characterizes important aspects of that problem theoretically and empirically. We prove that even for very simple tasks, such as counting the number of nodes or edges in a graph, GNNs do not naturally generalize to graphs of larger size. Instead, their generalization performance is closely related to the distribution of local patterns of connectivity and features, and to how that distribution changes from small to large graphs. Specifically, we prove that for many tasks, there are weight assignments for GNNs that perfectly solve the task on small graphs but fail on large graphs if there is a discrepancy between their local patterns. We further demonstrate, on several tasks, that training GNNs on small graphs results in solutions that do not generalize to larger graphs. We then formalize size generalization as a domain-adaptation problem and describe two learning setups where size generalization can be improved: first, as a self-supervised learning (SSL) problem over the target domain of large graphs; second, as a semi-supervised learning problem when a few labeled samples are available in the target domain. We demonstrate the efficacy of these solutions on a diverse set of benchmark graph datasets.

1. INTRODUCTION

Graphs are a flexible representation, widely used for representing diverse data and phenomena. Graph neural networks (GNNs), deep models that operate over graphs, have emerged as a prominent learning model (Bruna et al., 2013; Kipf and Welling, 2016; Veličković et al., 2017). They are used in the natural sciences (Gilmer et al., 2017), in social network analysis (Fan et al., 2019), for solving difficult mathematical problems (Luz et al., 2020), and for approximating solutions to combinatorial optimization problems (Li et al., 2018). In many domains, graph data vary significantly in size. This is the case in molecular biology, where molecules, represented as graphs with atoms as nodes, span from small compounds to proteins with many thousands of nodes. Size variability is even more pronounced in social networks, which can reach billions of nodes. The success of GNNs for such data stems from the fact that the same GNN model can process input graphs regardless of their size. Indeed, it has been proposed that GNNs can generalize to graphs whose size differs from those they were trained on, but it is largely unknown in which problems such generalization occurs. Empirically, several papers report good generalization performance on specific tasks (Li et al., 2018; Luz et al., 2020). Other papers, like Veličković et al. (2019), show that size generalization can fail on several simple graph algorithms, and can be improved by using task-specific training procedures and specific architectures. Given this flexibility to operate on variable-sized graphs, a fundamental question arises about generalization in GNNs: "When do GNNs trained on small graphs generalize to large graphs?" Aside from being an intriguing theoretical question, this problem has important practical implications. In many domains, it is hard to label large graphs. For instance, in combinatorial optimization
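As a toy, hypothetical sketch of the failure mode discussed above (not the paper's construction), consider a minimal "GNN" whose readout mean-pools constant node features and scales by a single learned weight `c`. Its node-count prediction equals `c` regardless of graph size, so weights that are perfect on the training sizes fail on larger graphs. The function `predict_node_count` and the specific sizes are illustrative assumptions.

```python
# Hypothetical toy model: mean-pooling readout scaled by a learned weight c.
# Prediction = c * mean(node features) = c, independent of graph size.

def predict_node_count(node_features, c):
    # mean-pool the node features, then scale by the learned weight c
    return c * sum(node_features) / len(node_features)

# Training distribution: graphs with exactly 10 nodes, constant feature 1.
train_graphs = [[1.0] * 10 for _ in range(100)]

# The weight c = 10 solves the task perfectly on every training graph...
c = 10.0
train_err = max(abs(predict_node_count(g, c) - len(g)) for g in train_graphs)
print(train_err)  # 0.0

# ...yet the same weights fail on a larger graph with 50 nodes.
test_graph = [1.0] * 50
print(predict_node_count(test_graph, c))  # 10.0, but the true count is 50
```

A sum-pooling readout would instead predict `c * n` for an `n`-node graph and would extrapolate here, which is one reason the choice of aggregation interacts with size generalization.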

