IMPROVING GRAPH NEURAL NETWORK EXPRESSIVITY VIA SUBGRAPH ISOMORPHISM COUNTING

Abstract

While Graph Neural Networks (GNNs) have achieved remarkable results in a variety of applications, recent studies exposed important shortcomings in their ability to capture the structure of the underlying graph. It has been shown that the expressive power of standard GNNs is bounded by the Weisfeiler-Lehman (WL) graph isomorphism test, from which they inherit proven limitations such as the inability to detect and count graph substructures. On the other hand, there is significant empirical evidence, e.g. in network science and bioinformatics, that substructures are often informative for downstream tasks, suggesting that it is desirable to design GNNs capable of leveraging this important source of information. To this end, we propose a novel topologically-aware message passing scheme based on substructure encoding. We show that our architecture allows incorporating domain-specific inductive biases and that it is strictly more expressive than the WL test. Importantly, in contrast to recent works on the expressivity of GNNs, we do not attempt to adhere to the WL hierarchy; this allows us to retain multiple attractive properties of standard GNNs such as locality and linear network complexity, while being able to disambiguate even hard instances of graph isomorphism. We extensively evaluate our method on graph classification and regression tasks and show state-of-the-art results on multiple datasets including molecular graphs and social networks.

1. INTRODUCTION

The field of graph representation learning has undergone rapid growth in the past few years. In particular, Graph Neural Networks (GNNs), a family of neural architectures designed for irregularly structured data, have been successfully applied to problems ranging from social networks and recommender systems (Ying et al., 2018a) to bioinformatics (Fout et al., 2017; Gainza et al., 2020), chemistry (Duvenaud et al., 2015; Gilmer et al., 2017; Sanchez-Lengeling et al., 2019) and physics (Kipf et al., 2018; Battaglia et al., 2016), to name a few. Most GNN architectures are based on message passing (Gilmer et al., 2017), where at each layer the nodes are updated with information aggregated from their neighbours. A crucial difference from traditional neural networks operating on grid-structured data is the absence of a canonical ordering of the nodes in a graph. To address this, the aggregation function is constructed to be invariant to neighbourhood permutations and, as a consequence, to graph isomorphism. This kind of symmetry is not always desirable, and thus different inductive biases that disambiguate the neighbours have been proposed. For instance, in geometric graphs, such as 3D molecular graphs and meshes, directional biases are usually employed to model the positional information of the nodes (Masci et al., 2015; Monti et al., 2017; Bouritsas et al., 2019; Klicpera et al., 2020; de Haan et al., 2020b); for proteins, ordering information is used to disambiguate amino acids at different positions in the sequence (Ingraham et al., 2019); in multi-relational knowledge graphs, a different aggregation is performed for each relation type (Schlichtkrull et al., 2018). The structure of the graph itself does not usually take part in the aggregation function explicitly. In fact, most models rely on multiple message-passing steps as a means for each node to discover the global structure of the graph.
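To make the permutation-invariance property concrete, the following minimal Python sketch performs one message-passing step with sum aggregation over plain dictionaries. The function name, weights, and graph representation are purely illustrative and not taken from any particular GNN library.

```python
# One permutation-invariant message-passing step (illustrative sketch).
# Node features are scalars and the "update" is a hypothetical convex
# combination; real GNN layers use learned transformations instead.

def message_passing_step(adjacency, features, w_self=0.5, w_neigh=0.5):
    """Each node aggregates its neighbours' features by summation.

    Sum aggregation is invariant to any reordering of the neighbour
    list, which is the symmetry discussed in the text: the output
    depends only on the multiset of neighbour features.
    """
    new_features = {}
    for v, neighbours in adjacency.items():
        aggregated = sum(features[u] for u in neighbours)  # order-independent
        new_features[v] = w_self * features[v] + w_neigh * aggregated
    return new_features

# A triangle: every node sees the same multiset of neighbours.
adjacency = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
features = {0: 1.0, 1: 1.0, 2: 1.0}
out = message_passing_step(adjacency, features)
```

Because the aggregation is a sum, permuting each neighbour list leaves the output unchanged, which is exactly why standard message passing cannot tell structurally different neighbours apart.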
However, since message-passing GNNs are at most as powerful as the Weisfeiler-Lehman (WL) test (Xu et al., 2019; Morris et al., 2019), they are limited in their ability to adequately exploit the graph structure, e.g. by counting substructures (Arvind et al., 2019; Chen et al., 2020). This uncovers a crucial limitation of GNNs, as substructures have been widely recognised as important in the study of complex networks. For example, in molecular chemistry, functional groups and rings are related to a plethora of chemical properties, while cliques are related to protein complexes in protein-protein interaction networks and to community structure in social networks (Granovetter, 1982; Girvan & Newman, 2002). Motivated by this observation, in this work we propose Graph Substructure Network (GSN), a new symmetry-breaking mechanism for GNNs based on introducing structural biases in the aggregation function. In particular, each message is transformed differently depending on the topological relationship between the endpoint nodes. This relationship is expressed by counting the appearance of certain substructures, the choice of which allows us to provide the model with different inductive biases, based on the graph distribution at hand. The substructures are encoded as structural identifiers that are assigned to either the nodes or edges of the graph and can thus disambiguate the neighbouring nodes that take part in the aggregation. We characterise the expressivity of substructure encoding in GNNs, showing that GSNs are strictly more expressive than traditional GNNs for the vast majority of substructures while retaining the locality of message passing, as opposed to higher-order methods (Maron et al., 2019b; c; a; Morris et al., 2019) that follow the WL hierarchy (see Section 2). In the limit, our model can yield a unique representation for every isomorphism class and is thus universal.
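To illustrate the idea of structural identifiers, the sketch below counts, for each node, the triangles it participates in. Triangles are only the simplest example of the substructures GSN can encode; the function is a self-contained illustration, not the paper's actual subgraph-isomorphism counting pipeline.

```python
# Per-node triangle counts as structural identifiers (illustrative).
# A real GSN counts occurrences of arbitrary small substructures H,
# broken down per orbit; triangles are the simplest special case.

def triangle_counts(adjacency):
    """Number of triangles each node participates in, for an
    undirected graph given as an adjacency-list dict."""
    neigh = {v: set(us) for v, us in adjacency.items()}
    counts = {}
    for v in neigh:
        # Each common neighbour w of v and u closes a triangle v-u-w.
        closed = sum(len(neigh[v] & neigh[u]) for u in neigh[v])
        counts[v] = closed // 2  # every triangle at v is counted twice
    return counts

# A 4-cycle with one chord (1-3): the chord endpoints lie in two
# triangles each, the remaining two nodes in one each.
adjacency = {0: [1, 3], 1: [0, 2, 3], 2: [1, 3], 3: [0, 1, 2]}
identifiers = triangle_counts(adjacency)
```

Appending such counts to the node (or edge) features before message passing is the essence of the symmetry-breaking mechanism described above: two neighbours with identical features but different structural roles now receive different messages.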
We provide an extensive experimental evaluation on hard instances of graph isomorphism testing (strongly regular graphs), as well as on real-world networks from the social and biological domains, including the recently introduced large-scale benchmarks (Dwivedi et al., 2020; Hu et al., 2020). We observe that when choosing the structural inductive biases based on domain-specific knowledge, GSN achieves state-of-the-art results.

2. PRELIMINARIES

Let G = (V_G, E_G) be a graph with vertex set V_G and undirected edge set E_G. A subgraph G_S = (V_{G_S}, E_{G_S}) of G is any graph with V_{G_S} ⊆ V_G and E_{G_S} ⊆ E_G. When E_{G_S} includes all the edges of G with both endpoints in V_{G_S}, i.e. E_{G_S} = {(v, u) ∈ E_G : v, u ∈ V_{G_S}}, the subgraph is said to be induced.

Isomorphisms: Two graphs G, H are isomorphic (denoted G ≃ H) if there exists an adjacency-preserving bijective mapping (isomorphism) f : V_G → V_H, i.e. (v, u) ∈ E_G iff (f(v), f(u)) ∈ E_H. Given some small graph H, the subgraph isomorphism problem amounts to finding a subgraph G_S of G such that G_S ≃ H. An automorphism of H is an isomorphism that maps H onto itself. The set of all unique automorphisms forms the automorphism group of the graph, denoted Aut(H), which contains all the possible symmetries of the graph. The automorphism group yields a partition of the vertices into disjoint subsets of V_H called orbits. Intuitively, this concept allows us to group the vertices based on their structural roles, e.g. the end vertices of a path, or all the vertices of a cycle (see Figure 1). Formally, the orbit of a vertex v ∈ V_H is the set of vertices to which it can be mapped via an automorphism: Orb(v) = {u ∈ V_H : ∃ g ∈ Aut(H) s.t. g(u) = v}, and the set of all orbits, V_H / Aut(H) = {Orb(v) : v ∈ V_H}, is usually called the quotient of the action of the automorphism group on the graph H. We are interested in the unique elements of this set, which we denote {O^V_{H,1}, O^V_{H,2}, ..., O^V_{H,d_H}}, where d_H is the cardinality of the quotient. Analogously, we define edge structural roles via edge automorphisms, i.e. bijective mappings from the edge set onto itself that preserve edge adjacency (two edges are adjacent if they share a common endpoint).
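The orbit definitions above can be illustrated with a brute-force computation: enumerate all vertex permutations, keep the adjacency-preserving ones (the automorphisms), and collect Orb(v) for each vertex. This is exponential in the number of vertices and is meant only for the small substructures H considered here; the function names are illustrative.

```python
from itertools import permutations

def automorphisms(n, edges):
    """All permutations p of {0, ..., n-1} that map edges onto edges.

    Since p is a bijection, mapping every edge into the edge set
    already forces the image of the edge set to equal the edge set,
    so this condition characterises automorphisms.
    """
    edge_set = {frozenset(e) for e in edges}
    return [p for p in permutations(range(n))
            if all(frozenset((p[u], p[v])) in edge_set for u, v in edges)]

def vertex_orbits(n, edges):
    """Partition of the vertex set into orbits under Aut(H)."""
    orbit = {v: set() for v in range(n)}
    for p in automorphisms(n, edges):
        for v in range(n):
            orbit[v].add(p[v])  # Orb(v) = {p(v) : p in Aut(H)}
    return {frozenset(o) for o in orbit.values()}

# Path on 4 vertices: the two endpoints form one orbit and the two
# interior vertices the other (cf. the path example in the text).
path_orbits = vertex_orbits(4, [(0, 1), (1, 2), (2, 3)])
```

For the 4-vertex path, Aut(H) contains only the identity and the reversal, so the computation recovers exactly the two structural roles mentioned in the text.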
In particular, every vertex automorphism g induces an edge automorphism by mapping each edge {u, v} to {g(u), g(v)}.foot_0 In the same way as before, we construct the edge automorphism group, from which we deduce the partition of the edge set into edge orbits {O^E_{H,1}, O^E_{H,2}, ..., O^E_{H,d_H}}.

foot_0: Note that the edge automorphism group is larger than that of the induced automorphisms, but strictly larger only for 3 trivial cases (Whitney, 1932). However, induced automorphisms provide a more natural way to express edge structural roles.

Weisfeiler-Lehman tests: The Weisfeiler-Lehman graph-isomorphism test (Weisfeiler & Leman, 1968), also known as naive vertex refinement, 1-WL, or just WL, is a fast heuristic to decide if two graphs are isomorphic or not. The WL test proceeds as follows: every vertex v is initially assigned a colour c^0(v) that is later iteratively refined by aggregating neighbouring information:

    c^{t+1}(v) = HASH( c^t(v), {{ c^t(u) : u ∈ N(v) }} ),

where {{·}} denotes a multiset, N(v) is the neighbourhood of v, and HASH is an injective mapping of its arguments to colours.
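The 1-WL refinement admits a compact sketch: colours are repeatedly replaced by an injective relabelling of the pair (own colour, multiset of neighbour colours) until the partition stabilises. The relabelling over the sorted signatures plays the role of the HASH function; the function name and graph encoding are illustrative.

```python
def wl_colours(adjacency, max_iters=None):
    """1-WL colour refinement; returns a stable colouring of the nodes."""
    colours = {v: 0 for v in adjacency}  # uniform initial colour c^0
    for _ in range(max_iters or len(adjacency)):
        # Signature = (own colour, sorted multiset of neighbour colours).
        sig = {v: (colours[v], tuple(sorted(colours[u] for u in adjacency[v])))
               for v in adjacency}
        # Injective relabelling of the signatures: stands in for HASH.
        palette = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        new_colours = {v: palette[sig[v]] for v in adjacency}
        if new_colours == colours:  # partition stabilised
            break
        colours = new_colours
    return colours

# A star with three leaves: WL separates the centre from the leaves.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
star_colours = wl_colours(star)
```

Two graphs with different stable colour histograms are certainly non-isomorphic; identical histograms, however, are inconclusive, which is precisely the failure mode (e.g. on regular graphs) that bounds the expressivity of standard message-passing GNNs.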

