ON THE EXPRESSIVE POWER OF GEOMETRIC GRAPH NEURAL NETWORKS

Anonymous authors
Paper under double-blind review

Abstract

The expressive power of Graph Neural Networks (GNNs) has been studied extensively through the lens of the Weisfeiler-Leman (WL) graph isomorphism test. Yet, many graphs in scientific and engineering applications come embedded in Euclidean space with an additional notion of geometric isomorphism, which is not covered by the WL framework. In this work, we propose a geometric version of the WL test (GWL) for discriminating geometric graphs while respecting the underlying physical symmetries: permutations, rotation, reflection, and translation. We use GWL to characterise the expressive power of GNNs that are invariant or equivariant to physical symmetries in terms of the classes of geometric graphs they can distinguish. This allows us to formalise the advantages of equivariant GNNs over invariant GNNs: equivariant layers have greater expressive power as they enable propagating geometric information beyond local neighbourhoods, while invariant layers cannot distinguish graphs that are locally similar, highlighting their inability to compute global geometric quantities. Finally, we prove the equivalence between the universal approximation properties of geometric GNNs and our more granular discrimination-based perspective.

1. INTRODUCTION

Systems in biochemistry (Jamasb et al., 2022), material science (Chanussot et al., 2021), physical simulations (Sanchez-Gonzalez et al., 2020), and multiagent robotics (Li et al., 2020) contain both geometry and relational structure. Such systems can be modelled via geometric graphs embedded in Euclidean space. For example, molecules are represented as a set of nodes which contain information about each atom and its 3D spatial coordinates, as well as other geometric quantities such as velocity or acceleration. Notably, the geometric attributes transform along with Euclidean transformations of the system, i.e. they are equivariant to symmetry groups of rotations, reflections, and translations. Standard Graph Neural Networks (GNNs), which do not take spatial symmetries into account, are ill-suited for geometric graphs, as the geometric attributes would no longer retain their physical meaning and transformation behaviour (Bogatskiy et al., 2022; Bronstein et al., 2021). GNNs specialised for geometric graphs follow the message passing paradigm (Gilmer et al., 2017), where node features are updated in a permutation equivariant manner by aggregating features from local neighbourhoods. Crucially, in addition to permutations, the geometric attributes of the nodes transform along with Euclidean transformations of the system, i.e. they are equivariant to the Lie group of rotations (SO(d)) or rotations and reflections (O(d)). We use G as a generic symbol for these Lie groups.
We consider two classes of GNNs for geometric graphs: (1) G-equivariant models, where the intermediate features and propagated messages are equivariant geometric quantities such as vectors or tensors (Thomas et al., 2018; Anderson et al., 2019; Jing et al., 2020; Satorras et al., 2021; Brandstetter et al., 2022); and (2) G-invariant models, which only propagate local invariant scalar features such as distances and angles (Schütt et al., 2018; Xie & Grossman, 2018; Gasteiger et al., 2020). Despite promising empirical results for both classes of architectures, key theoretical questions remain unanswered: (1) How can we characterise the expressive power of geometric GNNs? And (2) what is the tradeoff between G-equivariant and G-invariant GNNs? The graph isomorphism problem (Read & Corneil, 1977) and the Weisfeiler-Leman (WL) test (Weisfeiler & Leman, 1968) for distinguishing non-isomorphic graphs have become a powerful tool for analysing the expressive power of non-geometric GNNs (Xu et al., 2019; Morris et al., 2019). The WL framework has been a major driver of progress in graph representation learning (Chen et al., 2019; Maron et al., 2019; Dwivedi et al., 2020; Bodnar et al., 2021b;a). However, the WL framework does not directly apply to geometric graphs, as they exhibit a stronger notion of isomorphism that also takes spatial symmetries into account.

Figure 1: Geometric Weisfeiler-Leman Test. GWL distinguishes non-isomorphic geometric graphs G1 and G2 by injectively assigning colours to distinct neighbourhood patterns, up to global symmetries (here G = O(d)). Each iteration expands the neighbourhood from which geometric information can be gathered (shaded for node i). Example inspired by Schütt et al. (2021).

Contributions.
In this work, we study the expressive power of geometric GNNs from the perspective of discriminating non-isomorphic geometric graphs:
• In Section 3, we propose a geometric version of the Weisfeiler-Leman graph isomorphism test, termed GWL. We use GWL to formally characterise classes of graphs that can and cannot be distinguished by G-invariant and G-equivariant GNNs. We show how invariant models have limited expressive power, as they cannot distinguish graphs whose one-hop local neighbourhoods are similar, while equivariant models distinguish a larger class of graphs by propagating geometric vector quantities beyond local neighbourhoods.
• In Section 4, we study the design space of geometric GNNs using GWL, highlighting their theoretical limitations in terms of depth and body order, as well as discussing practical implications. We show that G-invariant models cannot compute global geometric properties such as volume, area, and centroid. Synthetic experiments in Appendix C supplement our theory and highlight practical challenges in building geometric GNNs.
• In Section 5, we follow Chen et al. (2019) and prove an equivalence between a model's ability to discriminate geometric graphs and its ability to universally approximate G-invariant functions. While universality is binary, GWL's discrimination-based perspective provides a more granular and practically insightful lens to study geometric GNNs.

2. BACKGROUND

Graph Isomorphism and Weisfeiler-Leman. An attributed graph G = (A, S) with a node set V of size n consists of an n × n adjacency matrix A and a matrix of scalar features S ∈ R^{n×f}. Two attributed graphs G, H are isomorphic if there exists an edge-preserving bijection b : V(G) → V(H) such that s_i^{(G)} = s_{b(i)}^{(H)}, where the subscripts index rows in the corresponding matrices. The Weisfeiler-Leman test (WL) is an algorithm for testing whether two (attributed) graphs are isomorphic (Weisfeiler & Leman, 1968). At iteration zero, the algorithm assigns a colour c_i^{(0)} ∈ C from a countable space of colours C to each node i. Nodes are coloured the same if their features are the same; otherwise, they are coloured differently. In subsequent iterations t, WL iteratively updates the node colouring by producing a new colour c_i^{(t)} ∈ C:

c_i^{(t)} := HASH( c_i^{(t-1)}, {{ c_j^{(t-1)} | j ∈ N_i }} ),

where HASH is an injective map (i.e. a perfect hash map) that assigns a unique colour to each input, and {{·}} denotes a multiset, i.e. a set that allows for repeated elements. The test terminates when the partition of the nodes induced by the colours becomes stable. Given two graphs G and H, if there exists some iteration t for which {{ c_i^{(t)} | i ∈ V(G) }} ≠ {{ c_i^{(t)} | i ∈ V(H) }}, then the graphs are not isomorphic. Otherwise, the WL test is inconclusive, and we say it cannot distinguish the two graphs.

Group Theory. We assume basic familiarity with group theory; see Zee (2016) for an overview. We denote the action of the group G on a space X by g • x. If G acts on spaces X and Y, we say a function f : X → Y is G-equivariant if f(g • x) = g • f(x). A function f : X → Y is G-invariant if f(g • x) = f(x). The G-orbit of x ∈ X is O_G(x) = { g • x | g ∈ G } ⊆ X. When x and x′ are part of the same orbit, we write x ≃ x′. We say a function f : X → Y is G-orbit injective if f(x_1) = f(x_2) if and only if x_1 ≃ x_2, for any x_1, x_2 ∈ X.
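The colour-refinement loop above can be sketched in a few lines of Python. The idealised injective HASH is simulated by interning each signature as a fresh integer (a toy stand-in, shared between both graphs so identical signatures map to identical colours); graphs are plain adjacency dictionaries, and `wl_test` is our hypothetical helper name:

```python
from itertools import count

def wl_test(g1, g2, iterations=5):
    """Return True if the 1-WL test distinguishes the two graphs.
    Each graph is (adj, feats): an adjacency dict {node: set(neighbours)}
    and a node-feature dict. A shared interning table plays the role of
    the idealised injective HASH map."""
    fresh, table = count(), {}

    def hash_(sig):
        if sig not in table:
            table[sig] = next(fresh)
        return table[sig]

    def init(g):
        adj, feats = g
        return {i: hash_(("init", feats[i])) for i in adj}

    def step(adj, c):
        # new colour = HASH(own colour, multiset of neighbour colours)
        return {i: hash_((c[i], tuple(sorted(c[j] for j in adj[i]))))
                for i in adj}

    c1, c2 = init(g1), init(g2)
    for _ in range(iterations):
        if sorted(c1.values()) != sorted(c2.values()):
            return True   # colour histograms differ: not isomorphic
        c1, c2 = step(g1[0], c1), step(g2[0], c2)
    return sorted(c1.values()) != sorted(c2.values())

# A path and a star on 4 nodes are distinguished after one refinement...
p4 = ({0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}, {i: 0 for i in range(4)})
s4 = ({0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}, {i: 0 for i in range(4)})
assert wl_test(p4, s4)

# ...but WL is famously inconclusive on a 6-cycle vs. two triangles
# (both unlabelled and 2-regular, so every node keeps the same colour).
c6 = ({i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}, {i: 0 for i in range(6)})
t2 = ({0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4, 5}, 4: {3, 5}, 5: {3, 4}},
      {i: 0 for i in range(6)})
assert not wl_test(c6, t2)
```

The inconclusive second case is exactly the failure mode revisited in Section 4.2 for distance-based invariant models.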
Necessarily, such a function is G-invariant, since f(g • x) = f(x). We work with the permutation group over n elements, S_n, and the Lie groups G = SO(d) or G = O(d). Invariance to the translation group T(d) is conventionally handled by working with relative positions. Given one of the standard groups above and an element g, we denote by M_g (or another capital letter) its standard matrix representation.

Geometric graphs. A geometric graph G = (A, S, ⃗V, ⃗X) with a node set V is an attributed graph that is also decorated with geometric attributes: node coordinates ⃗X ∈ R^{n×d} and (optionally) vector features ⃗V ∈ R^{n×d} (e.g. velocity, acceleration). The geometric attributes transform as follows under the action of the relevant groups: (1) S_n acts on the graph via P_σ G := (P_σ A P_σ^⊤, P_σ S, P_σ ⃗V, P_σ ⃗X); (2) orthogonal transformations Q_g ∈ G act on ⃗V, ⃗X via ⃗V Q_g, ⃗X Q_g; and (3) translations ⃗t ∈ T(d) act on the coordinates ⃗X via ⃗x_i + ⃗t for all nodes i. Without loss of generality, we work with a single vector feature per node. Our results generalise to multiple vector features or higher-order tensors, in which case we would replace the matrix group representation Q_g with a more generic ρ(g). Two geometric graphs G and H are geometrically isomorphic (denoted G ≃ H) if there exists an attributed graph isomorphism b such that the geometric attributes are equivalent, up to global group actions Q_g ∈ G and ⃗t ∈ T(d):

( s_i^{(G)}, ⃗v_i^{(G)}, ⃗x_i^{(G)} ) = ( s_{b(i)}^{(H)}, Q_g ⃗v_{b(i)}^{(H)}, Q_g ( ⃗x_{b(i)}^{(H)} + ⃗t ) )  for all i ∈ V(G).

Geometric graph isomorphism and distinguishing (sub-)graph geometries have important practical implications for representation learning. For example, in molecular systems, an ideal architecture should map distinct local structural environments around atoms to distinct embeddings in representation space (Bartók et al., 2013; Pozdnyakov et al., 2020).

Geometric Graph Neural Networks.
We consider two broad classes of geometric GNN architectures. G-equivariant GNN layers update scalar and vector features from iteration t to t + 1 via learnable aggregate and update functions, AGG and UPD, respectively:

s_i^{(t+1)}, ⃗v_i^{(t+1)} := UPD( ( s_i^{(t)}, ⃗v_i^{(t)} ), AGG( {{ ( s_i^{(t)}, s_j^{(t)}, ⃗v_i^{(t)}, ⃗v_j^{(t)}, ⃗x_{ij} ) | j ∈ N_i }} ) ),   (3)

where ⃗x_{ij} = ⃗x_i − ⃗x_j denotes the relative position vector. Alternatively, G-invariant GNN layers do not update vector features and only aggregate scalar quantities from local neighbourhoods:

s_i^{(t+1)} := UPD( s_i^{(t)}, AGG( {{ ( s_i^{(t)}, s_j^{(t)}, ⃗v_i, ⃗v_j, ⃗x_{ij} ) | j ∈ N_i }} ) ).

For both models, the scalar features at the final iteration are mapped to graph-level predictions via a permutation-invariant readout f : R^{n×f} → R^{f′}. See Appendix B for concrete examples of geometric GNNs covered by our framework.
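The two layer types can be illustrated with a minimal numerical sketch (not any specific published architecture): scalar messages depend only on invariants of the tuple (s_i, s_j, ⃗x_{ij}), and the equivariant variant additionally updates vectors as scalar-weighted combinations of relative positions. The random matrices W1, W2, w are stand-ins for learnable parameters; the symmetry claims are then checked numerically against a random rotation:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, f = 5, 3, 4
W1 = rng.normal(size=(2 * f + 1, f))   # message MLP weights (toy, one layer)
W2 = rng.normal(size=(2 * f, f))       # update MLP weights
w = rng.normal(size=f)                 # scalar gate for the vector update

def layer(S, X, V, equivariant=True):
    """S: (n,f) scalars, X: (n,d) coordinates, V: (n,d) vectors.
    Fully connected message passing, row-vector convention (X acts as X @ Q)."""
    S_new, V_new = np.empty_like(S), V.copy()
    for i in range(n):
        msg, vec = np.zeros(f), np.zeros(d)
        for j in range(n):
            if j == i:
                continue
            x_ij = X[i] - X[j]
            # invariant scalar message built from (s_i, s_j, |x_ij|)
            m = np.tanh(np.concatenate([S[i], S[j], [np.linalg.norm(x_ij)]]) @ W1)
            msg += m
            vec += (m @ w) * x_ij        # equivariant: scalar-weighted x_ij
        S_new[i] = np.tanh(np.concatenate([S[i], msg]) @ W2)
        if equivariant:
            V_new[i] = V[i] + vec        # invariant layers leave V untouched
    return S_new, V_new

# Random rotation Q in SO(d) via QR decomposition.
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
Q *= np.sign(np.linalg.det(Q))           # force det(Q) = +1 (d is odd)

S, X, V = rng.normal(size=(n, f)), rng.normal(size=(n, d)), rng.normal(size=(n, d))
S1, V1 = layer(S, X, V)
S2, V2 = layer(S, X @ Q, V @ Q)
assert np.allclose(S1, S2)      # scalar outputs are G-invariant
assert np.allclose(V1 @ Q, V2)  # vector outputs are G-equivariant
```

Because every message is a function of invariants only, rotating the inputs leaves the scalars unchanged and rotates the vector updates, matching the transformation rules stated for geometric graphs above.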

3. THE GEOMETRIC WEISFEILER-LEMAN TEST

Assumptions. Analogous to the WL test, the geometric and scalar features that the nodes are equipped with come from countable subsets C ⊂ R^d and C′ ⊂ R, respectively. As a result, when we require functions to be injective, we require them to be injective over the countable set of G-orbits that are obtained by acting with G on the dataset.

Intuition. For an intuition of how to generalise WL to geometric graphs, we note that WL uses a local, node-centric procedure to update the colour of each node i using the colours of its 1-hop neighbourhood N_i. In the geometric setting, N_i is an attributed point cloud around the central node i. As a result, each neighbourhood carries two types of information: (1) neighbourhood type (invariant to G); and (2) neighbourhood geometric orientation (equivariant to G). From an axiomatic point of view, our generalisation of the WL aggregation procedure must meet two properties.

Property 1: Orbit injectivity of colours. If two neighbourhoods are the same up to an action of G (e.g. rotation), then the colours of the corresponding central nodes should be the same. Thus, the colouring must be G-orbit injective (which also makes it G-invariant) over the countable set of all orbits of neighbourhoods in our dataset.

Property 2: Preservation of local geometry. A key property of WL is that the aggregation is injective. A G-invariant colouring procedure that purely satisfies Property 1 is not sufficient because, by definition, it loses spatial properties of each neighbourhood, such as the relative pose or orientation (Hinton et al., 2011). Thus, we must additionally update auxiliary geometric information variables in a way that is G-equivariant and injective.

Geometric Weisfeiler-Leman (GWL). These intuitions motivate the following definition of the GWL test.
At initialisation, we assign to each node i ∈ V a scalar node colour c_i ∈ C′ and an auxiliary object g_i containing the geometric information associated with it:

c_i^{(0)} := HASH(s_i),   g_i^{(0)} := ( c_i^{(0)}, ⃗v_i ),

where HASH denotes an injective map over the scalar attributes s_i of node i. To define the inductive step, assume we have the colours of the nodes and the associated geometric objects at iteration t − 1. Then, we can aggregate the geometric information around node i into a new object as follows:

g_i^{(t)} := ( ( c_i^{(t-1)}, g_i^{(t-1)} ), {{ ( c_j^{(t-1)}, g_j^{(t-1)}, ⃗x_{ij} ) | j ∈ N_i }} ).

Importantly, the group G can act on the geometric objects above inductively by acting on the geometric information inside them. This amounts to rotating (or reflecting) the entire t-hop neighbourhood contained inside:

g • g_i^{(0)} := ( c_i^{(0)}, Q_g ⃗v_i ),
g • g_i^{(t)} := ( ( c_i^{(t-1)}, g • g_i^{(t-1)} ), {{ ( c_j^{(t-1)}, g • g_j^{(t-1)}, Q_g ⃗x_{ij} ) | j ∈ N_i }} ).

Clearly, the aggregation building g_i^{(t)} for any time-step t is injective and G-equivariant. Finally, we can compute the node colours at iteration t for all i ∈ V by aggregating the geometric information in the neighbourhood around node i:

c_i^{(t)} := I-HASH^{(t)}( g_i^{(t)} ),

using a G-orbit injective and G-invariant function that we denote by I-HASH. That is, for any geometric objects g, g′, I-HASH(g) = I-HASH(g′) if and only if there exists g ∈ G such that g = g • g′. Note that I-HASH is an idealised G-orbit injective function, similar to the HASH function used in WL, which is not necessarily continuous.

Overview. With each iteration, g_i aggregates geometric information in progressively larger t-hop subgraph neighbourhoods N_i^{(t)} around the node i. The node colours summarise the structure of these t-hop neighbourhoods via the G-invariant aggregation performed by I-HASH. The procedure terminates when the partition of the nodes induced by the colours does not change from the previous iteration.
Finally, given two geometric graphs G and H, if there exists some iteration t for which {{ c_i^{(t)} | i ∈ V(G) }} ≠ {{ c_i^{(t)} | i ∈ V(H) }}, then GWL deems the two graphs geometrically non-isomorphic. Otherwise, we say the test cannot distinguish the two graphs.

Invariant GWL. Since we are interested in understanding the role of G-equivariance, we also consider a more restrictive Invariant GWL (IGWL) that only updates node colours using the G-orbit injective I-HASH function and does not propagate geometric information:

c_i^{(t)} := I-HASH( ( c_i^{(t-1)}, ⃗v_i ), {{ ( c_j^{(t-1)}, ⃗v_j, ⃗x_{ij} ) | j ∈ N_i }} ).

IGWL with k-body scalars. In order to further analyse the construction of the node colouring function I-HASH, we consider IGWL^{(k)} based on the maximum number of nodes involved in the computation of G-invariant scalars (also known as the 'body order' (Batatia et al., 2022a)):

c_i^{(t)} := I-HASH^{(k)}( ( c_i^{(t-1)}, ⃗v_i ), {{ ( c_j^{(t-1)}, ⃗v_j, ⃗x_{ij} ) | j ∈ N_i }} ),

where I-HASH^{(k+1)} is defined as:

HASH( {{ I-HASH( ( c_i^{(t-1)}, ⃗v_i ), {{ ( c_{j_1}^{(t-1)}, ⃗v_{j_1}, ⃗x_{ij_1} ), . . . , ( c_{j_k}^{(t-1)}, ⃗v_{j_k}, ⃗x_{ij_k} ) }} ) | j ∈ (N_i)^k }} ),

where j = [j_1, . . . , j_k] ranges over all possible k-tuples formed of elements of N_i. Therefore, IGWL^{(k)} is constrained to extract information only from all the possible k-sized tuples of nodes (including the central node) in a neighbourhood. For instance, I-HASH^{(2)} can identify neighbourhoods only up to pairwise distances among the central node and any of its neighbours (i.e. a 2-body scalar), while I-HASH^{(3)} can do so up to distances and angles formed by any two edges (i.e. a 3-body scalar).

Figure 2: Invariant GWL Test. IGWL cannot distinguish G1 and G2 as they are 1-hop identical: the G-orbit of the 1-hop neighbourhood around each node is the same, and IGWL cannot propagate geometric orientation information beyond 1-hop (here G = O(d)).
Notably, distances and angles alone are incomplete descriptors of local geometry (Bartók et al., 2013; Pozdnyakov et al., 2020). Therefore, I-HASH^{(k)} with lower k yields a weaker colouring.
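The body-order hierarchy can be illustrated with a toy example: two neighbourhoods whose central node has two neighbours at unit distance, differing only in the angle between the two edges. The 2-body descriptor (the sorted distance multiset) collides, while adding 3-body terms (angles between edge pairs) separates them. The helper names below are ours, and rounding stands in for hashing the invariants:

```python
import numpy as np

def two_body(x_c, nbrs):
    """2-body invariants: sorted distances from the central node."""
    return tuple(sorted(round(float(np.linalg.norm(p - x_c)), 6) for p in nbrs))

def three_body(x_c, nbrs):
    """3-body invariants: distances plus angles between every edge pair."""
    angles = []
    for a in range(len(nbrs)):
        for b in range(a + 1, len(nbrs)):
            u, v = nbrs[a] - x_c, nbrs[b] - x_c
            cos = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
            angles.append(round(float(np.arccos(np.clip(cos, -1.0, 1.0))), 6))
    return two_body(x_c, nbrs), tuple(sorted(angles))

centre = np.zeros(2)
nbrs_60 = [np.array([1.0, 0.0]), np.array([np.cos(np.pi / 3), np.sin(np.pi / 3)])]
nbrs_90 = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]

assert two_body(centre, nbrs_60) == two_body(centre, nbrs_90)      # distances collide
assert three_body(centre, nbrs_60) != three_body(centre, nbrs_90)  # angles separate
```

This mirrors the claim that I-HASH^{(2)} sees neighbourhoods only up to pairwise distances, while I-HASH^{(3)} additionally resolves angles; as the text notes, even both together remain incomplete in general.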

3.1. WHAT GEOMETRIC GRAPHS CAN GWL AND IGWL DISTINGUISH?

In order to formalise the expressive power of GWL and IGWL, let us consider which geometric graphs can and cannot be distinguished by the tests. As a simple first observation, we note that when all coordinates and vectors are set equal to zero, GWL coincides with the standard WL. In this edge case, GWL has the same expressive power as WL. Next, let us consider the simplified setting of two geometric graphs G1 = (A1, S1, ⃗V1, ⃗X1) and G2 = (A2, S2, ⃗V2, ⃗X2) such that the underlying attributed graphs (A1, S1) and (A2, S2) are isomorphic. This case frequently occurs in chemistry, where molecules occur in different conformations but with the same graph topology given by the covalent bonding structure. Recall that each iteration of GWL aggregates geometric information g_i. We say G1 and G2 are k-hop distinct if for all graph isomorphisms b, there is some node i ∈ V1, b(i) ∈ V2 such that the corresponding k-hop subgraph neighbourhoods N_i^{(k)} and N_{b(i)}^{(k)} are distinct; otherwise, they are k-hop identical.

Proposition 1. GWL can distinguish any k-hop distinct geometric graphs G1 and G2 where the underlying attributed graphs are isomorphic, and k iterations are sufficient.

Proposition 2. Up to k iterations, GWL cannot distinguish any k-hop identical geometric graphs G1 and G2 where the underlying attributed graphs are isomorphic.

Additionally, we can state the following results about the more constrained IGWL.

Proposition 3. IGWL can distinguish any 1-hop distinct geometric graphs G1 and G2 where the underlying attributed graphs are isomorphic, and 1 iteration is sufficient.

Proposition 4. Any number of iterations of IGWL cannot distinguish any 1-hop identical geometric graphs G1 and G2 where the underlying attributed graphs are isomorphic.

Examples illustrating Propositions 1 and 4 are shown in Figures 1 and 2, respectively. We can now consider the more general case where the underlying attributed graphs for G1 = (A1, S1, ⃗V1, ⃗X1) and G2 = (A2, S2, ⃗V2, ⃗X2) are non-isomorphic and constructed from point clouds using radial cutoffs, as is conventional in biochemistry and material science applications.
Proposition 5. Assuming geometric graphs are constructed from point clouds using radial cutoffs, GWL can distinguish any geometric graphs G1 and G2 where the underlying attributed graphs are non-isomorphic. At most k_Max iterations are sufficient, where k_Max is the maximum graph diameter among G1 and G2.

These results enable us to compare the expressive powers of GWL and IGWL.

Theorem 6. GWL is strictly more powerful than IGWL.

This statement formalises the advantage of G-equivariant intermediate layers for graphs and geometric data, as prescribed in the Geometric Deep Learning blueprint (Bronstein et al., 2021), in addition to echoing similar intuitions in the computer vision community. As remarked by Hinton et al. (2011), translation invariant models do not understand the relationship between the various parts of an image (colloquially called the "Picasso problem"). Similarly, our results point to IGWL failing to understand how the various 1-hop neighbourhoods of a graph are stitched together. Finally, we identify a setting where this distinction between the two approaches disappears.

Proposition 7. IGWL has the same expressive power as GWL for fully connected geometric graphs.
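The k-hop subgraph neighbourhoods N_i^{(k)} underlying these propositions are standard breadth-first-search balls; a minimal sketch (the helper name `k_hop` is ours, and the node set alone is returned for brevity):

```python
from collections import deque

def k_hop(adj, i, k):
    """Nodes reachable from i within k edges (including i itself).
    adj: adjacency dict {node: set(neighbours)}."""
    seen, frontier = {i}, deque([(i, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue                      # do not expand past k hops
        for j in adj[node]:
            if j not in seen:
                seen.add(j)
                frontier.append((j, depth + 1))
    return seen

# On a path 0-1-2-3-4, the 1-hop ball around node 2 is {1,2,3},
# and the 2-hop ball covers the whole graph.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
assert k_hop(path, 2, 1) == {1, 2, 3}
assert k_hop(path, 2, 2) == {0, 1, 2, 3, 4}
```

Two graphs are then k-hop distinct exactly when some matched pair of such balls (together with their attributes and geometry) differs under every graph isomorphism.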

3.2. CHARACTERISING THE EXPRESSIVE POWER OF GEOMETRIC GNNS

We would like to characterise the maximal expressive power of geometric GNNs based on the GWL test. Firstly, we show that any message passing G-equivariant GNN can be at most as powerful as GWL in distinguishing non-isomorphic geometric graphs. Proofs are available in Appendix E.

Theorem 8. Any pair of geometric graphs distinguishable by a G-equivariant GNN is also distinguishable by GWL.

With a sufficient number of iterations, the output of G-equivariant GNNs can be equivalent to GWL if certain conditions are met regarding the aggregate, update and readout functions.

Proposition 9. G-equivariant GNNs have the same expressive power as GWL if the following conditions hold: (1) the aggregation AGG is an injective, G-equivariant multiset function; (2) the scalar part of the update UPD_s is a G-orbit injective, G-invariant multiset function; (3) the vector part of the update UPD_v is an injective, G-equivariant multiset function; and (4) the graph-level readout f is an injective multiset function.

Similar statements can be made for G-invariant GNNs and IGWL. Thus, we can directly transfer our results about GWL and IGWL to the class of GNNs bounded by the respective tests. This has several interesting practical implications, discussed subsequently.

4. UNDERSTANDING THE DESIGN SPACE OF GEOMETRIC GNNS VIA GWL

Overview. We now use the GWL framework to better understand key design choices for building geometric GNNs (Batatia et al., 2022a) : (1) Depth or number of layers; and (2) Body order of invariant scalars. In doing so, we formalise theoretical limitations of current architectures and provide practical implications. Proofs are available in Appendix F.

4.1. ROLE OF DEPTH: PROPAGATING GEOMETRIC INFORMATION

Each iteration of GWL expands the neighbourhood from which geometric information can be gathered. We leveraged this construction in Section 3.1 to formalise the number of GWL iterations required to distinguish classes of geometric graphs. Consequently, stacking multiple G-equivariant GNN layers enables the computation of compositional geometric features. This can be understood via a geometric version of computation trees (Garg et al., 2020). A computation tree T_i^{(t)} represents the maximum information contained in GWL/IGWL colours or GNN features at iteration t by 'unrolling' the message passing procedure. Geometric computation trees are constructed recursively: T_i^{(0)} = (s_i, ⃗v_i) for all i ∈ V. For t > 0, we start with a root node (s_i, ⃗v_i) and add a child subtree T_j^{(t-1)} for all j ∈ N_i, along with the relative position ⃗x_{ij} along the edge, as shown in Figure 3. To obtain the root node's embedding or colour, both scalar and geometric information is propagated from the leaves up to the root. Thus, if two nodes have identical geometric computation trees, they will be mapped to the same node embedding or colour.

Figure 3: Geometric Computation Trees for GWL and IGWL. Unlike GWL, geometric orientation information cannot flow from the leaves to the root in IGWL, restricting its expressive power. IGWL cannot distinguish G1 and G2 as all 1-hop neighbourhoods are computationally identical.

Critically, geometric orientation information cannot flow from one level to another in the computation trees for IGWL and G-invariant GNNs, as they only update scalar information. In the recursive construction procedure, we must insert a connector node (s_j, ⃗v_j) before adding the child subtree T_j^{(t-1)} for all j ∈ N_i and prevent geometric information propagation between them. As a result, even the most powerful G-invariant GNNs are restricted in their ability to compute global and non-local geometric properties.

Proposition 10.
IGWL and G-invariant GNNs cannot decide several geometric graph properties: (1) the perimeter, surface area, and volume of the bounding box/sphere enclosing the geometric graph; (2) the distance from the centroid or centre of mass; and (3) dihedral angles.

Practical Implications. Proposition 10, together with Propositions 1 and 4, highlights critical theoretical limitations of G-invariant GNNs. Our results suggest that G-equivariant GNNs should be preferred when working with large geometric graphs, such as macromolecules with thousands of nodes, where message passing is restricted to local radial neighbourhoods around each node. Motivated by these limitations, two straightforward approaches to improving G-invariant GNNs may be: (1) pre-computing non-local geometric properties as input features, e.g. models such as GemNet (Gasteiger et al., 2021) and GearNet (Zhang et al., 2022) already use two-hop dihedral angles; and (2) working with fully connected geometric graphs, as Proposition 7 suggests that G-equivariant and G-invariant GNNs can be made equally powerful when performing all-to-all message passing. This is supported by the empirical success of recent G-invariant 'Graph Transformers' (Joshi, 2020; Shi et al., 2022) for small molecules with tens of nodes, where working with full graphs is tractable.
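As an example of the first workaround, a dihedral angle is a multi-hop geometric quantity that can be pre-computed from four points and supplied as an input feature. The following is a standard computation (a hedged sketch; the function name is ours, not from any of the cited models):

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Angle between the planes (p0,p1,p2) and (p1,p2,p3), in radians."""
    b0, b1, b2 = p1 - p0, p2 - p1, p3 - p2
    n1, n2 = np.cross(b0, b1), np.cross(b1, b2)   # plane normals
    cos = n1 @ n2 / (np.linalg.norm(n1) * np.linalg.norm(n2))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

# Sanity check: four coplanar points give a dihedral angle of 0 (or pi).
square = [np.array(p, dtype=float)
          for p in [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]]
assert np.isclose(dihedral(*square), 0.0)
```

Since the dihedral involves nodes up to two hops apart, it is precisely the kind of quantity that, by Proposition 10, a purely local invariant layer cannot recover on its own.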

4.2. ROLE OF BODY ORDER: DISTINGUISHING G-ORBITS

At each iteration of GWL and IGWL, the I-HASH function assigns a G-invariant colouring to distinct geometric neighbourhood patterns. I-HASH is an idealised G-orbit injective function which is not necessarily continuous. In geometric GNNs, this corresponds to scalarising local geometric information when updating the scalar features; examples are shown in equation 11 and equation 12. We can analyse the construction of the I-HASH function and the scalarisation step in geometric GNNs via the k-body variations IGWL^{(k)}. Firstly, we formalise the relationship between the injectivity of I-HASH^{(k)} and the maximum cardinality of local neighbourhoods in a given dataset.

Proposition 11. I-HASH^{(m)} is G-orbit injective for m = max({ |N_i| | i ∈ V }), the maximum cardinality of all local neighbourhoods N_i in a given dataset.

Practical Implications. While building provably injective I-HASH^{(k)} functions may require intractably high k, the hierarchy of IGWL^{(k)} tests enables us to study the expressive power of practical G-invariant aggregators used in current geometric GNN layers: e.g. SchNet (Schütt et al., 2018), E-GNN (Satorras et al., 2021), and TFN (Thomas et al., 2018) use distances; DimeNet (Gasteiger et al., 2020) uses distances and angles. Notably, MACE (Batatia et al., 2022b) constructs a complete basis of scalars up to arbitrary body order k via the Atomic Cluster Expansion (Dusson et al., 2019), which can be G-orbit injective if the conditions in Proposition 11 are met. We can state the following about the IGWL^{(k)} hierarchy and the corresponding GNNs.

Proposition 12. IGWL^{(k)} is at least as powerful as IGWL^{(k-1)}. For k ≤ 5, IGWL^{(k)} is strictly more powerful than IGWL^{(k-1)}.

Finally, we show that IGWL^{(2)} is equivalent to WL when all the pairwise distances between the nodes are the same. A similar observation was recently made by Pozdnyakov & Ceriotti (2022).

Proposition 13.
Let G1 = (A1, S1, ⃗X1) and G2 = (A2, S2, ⃗X2) be two geometric graphs with the property that all edges have equal length. Then, IGWL^{(2)} distinguishes the two graphs if and only if WL can distinguish the attributed graphs (A1, S1) and (A2, S2).

This equivalence points to limitations of distance-based G-invariant models like SchNet (Schütt et al., 2018). These models suffer from all the well-known failure cases of WL; e.g. they cannot distinguish two equilateral triangles from a regular hexagon (Gasteiger et al., 2020).

Synthetic Experiments. Appendix C contains additional synthetic experiments supplementing our results and highlighting practical challenges in building powerful geometric GNNs, such as oversmoothing and oversquashing with increased depth, as well as designing efficient higher-order aggregators.
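The triangles-versus-hexagon failure case can be checked numerically: with unit edge length and a radial cutoff of 1.1, every node in both point clouds sees exactly two neighbours at distance 1, so any test built on 1-hop distance multisets collides, and WL on the resulting unlabelled 2-regular graphs is likewise inconclusive. The helper below is our own illustration, not code from the cited works:

```python
import numpy as np

def local_distance_multisets(points, cutoff=1.1):
    """Per-node sorted multisets of neighbour distances within the cutoff,
    returned as a sorted list so two clouds can be compared as multisets."""
    sigs = []
    for i, p in enumerate(points):
        ds = sorted(round(float(np.linalg.norm(p - q)), 6)
                    for j, q in enumerate(points)
                    if j != i and np.linalg.norm(p - q) < cutoff)
        sigs.append(tuple(ds))
    return sorted(sigs)

# Regular hexagon with unit side (circumradius 1).
hexagon = [np.array([np.cos(k * np.pi / 3), np.sin(k * np.pi / 3)])
           for k in range(6)]
# Two far-apart unit equilateral triangles.
tri = [np.array([0.0, 0.0]), np.array([1.0, 0.0]),
       np.array([0.5, np.sqrt(3) / 2])]
two_triangles = tri + [p + np.array([10.0, 0.0]) for p in tri]

# Every node sees the same multiset {1, 1}: the 2-body test cannot
# separate the two shapes.
assert local_distance_multisets(hexagon) == local_distance_multisets(two_triangles)
```

Distinguishing the two shapes requires either higher body order or the propagation of equivariant information across hops, as GWL does.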

5. DISCRIMINATION AND UNIVERSALITY

Overview. Following Chen et al. (2019), we study the equivalence between the universal approximation capabilities of geometric GNN models (Dym & Maron, 2020) and the perspective of discriminating geometric graphs introduced by GWL, generalising their results to any isomorphism induced by a compact Lie group G. We further study the number of invariant aggregators required in a continuous setting to distinguish any two neighbourhoods. Proofs are available in Appendix G. In the interest of generality, we use a general space X acted upon by a compact group G, and we are interested in the capacity of G-invariant functions over X to separate points in Y. The restriction to a smaller subset Y is useful because we would like to separately consider the case when Y is countable due to the use of countable features. Therefore, in general, the action of G on Y might not be strictly defined, since it might yield elements outside Y. For our setting, the reader could take X = (R^d × R^f)^{n×n} to be the space of n × n geometric graphs and Y = X^{n×n}, where X ⊆ R^d × R^f.

Definition 14. Let G be a compact group and C a collection of G-invariant functions from a set X to R. For a subset Y ⊆ X, we say that C is pairwise Y_G-discriminating if for any y1, y2 ∈ Y such that y1 ̸≃ y2, there exists a function h ∈ C such that h(y1) ̸= h(y2).

We note here that h is not necessarily injective, i.e. there might be y′1, y′2 for which h(y′1) = h(y′2). Therefore, pairwise discrimination is a weaker notion of discrimination than the one GWL uses.

Definition 15. Let G be a compact group and C a collection of G-invariant functions from X to R. For Y ⊆ X, we say that C is universally approximating over Y if for all G-invariant functions f from X to R and for all ε > 0, there exists h_{ε,f} ∈ C such that ∥f − h_{ε,f}∥_Y := sup_{y∈Y} |f(y) − h_{ε,f}(y)| < ε.
We first focus on the countable feature setting, which is also the setting in which the GWL test operates. Therefore, we will assume that Y is a countable subset of X.

Theorem 16. If C is universally approximating over Y, then C is also pairwise Y_G-discriminating.

This result further motivates the interest in discriminating geometric graphs, since a model that cannot distinguish two non-isomorphic geometric graphs is not universal. By further assuming that Y is finite, we obtain a result in the opposite direction. Given a collection of functions C, we define, as in Chen et al. (2019), the class C^{+L} given by all functions of the form MLP([f_1(x), . . . , f_k(x)]) with f_i ∈ C and finite k, where the MLP has L layers with ReLU hidden activations.

Theorem 17. If C is pairwise Y_G-discriminating, then C^{+2} is universally approximating over Y.

Continuous Features. The symmetries characterising geometric graphs are naturally continuous (e.g. rotations). Therefore, it is natural to ask how the results above translate to continuous G-invariant functions over a continuous subspace Y. For the rest of this section, we assume that (X, d) is a metric space, Y is a compact subset of X, and G acts continuously on X. We now produce an estimate for the number of aggregators needed to learn continuous orbit-space injective functions on a manifold X, based on results from differential geometry (Lee, 2013). A group G acts freely on X if g • x = x implies g = e_G, where e_G is the identity element of G.

Theorem 20. Let X be a smooth n-dimensional manifold and G an m-dimensional compact Lie group acting continuously on X. Suppose there exists a smooth submanifold Y of X of the same dimension such that G acts freely on it. Then, any G-orbit injective function f : X → R^d requires that d ≥ n − m.

We now apply this theorem to the local aggregation operation performed by geometric GNNs. Let X = R^{n×d} and G = S_n × O(d) or S_n × SO(d).
Let P_g and Q_g be the permutation matrix and the orthogonal matrix associated with the group element g ∈ G. Then g acts continuously on matrices X ∈ X via P_g X Q_g^⊤. G orbit-space injective functions on X are then functions on point clouds of size n that can distinguish any two different point clouds.

Theorem 21. For n ≥ d − 1 > 0 or n = d = 1, any continuous S_n × SO(d) orbit-space injective function f : R^{n×d} → R^q requires that q ≥ nd − d(d − 1)/2.

We can also generalise this to O(d), with the slightly stronger assumption that n ≥ d.

Theorem 22. For n ≥ d > 0, any continuous S_n × O(d) orbit-space injective function f : R^{n×d} → R^q requires that q ≥ nd − d(d − 1)/2.

Overall, these results show that when working with point clouds in R³, as is common in molecular or physical applications, at least q = 3(n − 1) aggregators are required. This result explains why a greater representational width can help distinguish neighbourhoods. Finally, in the particular case of the zero-dimensional subgroup S_n × {e_{SO(d)}} ≃ S_n, we obtain a statement holding for all n that generalises a result from PNA (Corso et al., 2020) regarding the aggregators of non-geometric GNNs. The original PNA result considers the case d = 1; here we extend it to arbitrary d.

Proposition 23. Any S_n-invariant injective function f : R^{n×d} → R^q requires q ≥ nd.
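The dimension count behind these bounds is simple arithmetic; the sketch below (helper name is ours) records it, together with the specialisation to R³ stated above.

```python
# Sketch of the dimension count in Theorems 21-22 (helper name is ours):
# a continuous S_n x SO(d) (or, for n >= d, S_n x O(d)) orbit-space
# injective map on n x d point clouds needs output dimension
# q >= nd - d(d - 1)/2, i.e. ambient dimension minus dim SO(d).
def min_aggregators(n, d):
    return n * d - d * (d - 1) // 2

# Point clouds in R^3 (molecular/physical applications): the bound
# reduces to q = 3n - 3 = 3(n - 1) invariant aggregators.
for n in range(2, 10):
    assert min_aggregators(n, 3) == 3 * (n - 1)

# By contrast, Proposition 23 (S_n alone, with no rotation group to
# quotient out) demands the full q >= nd.
assert min_aggregators(5, 3) < 5 * 3
```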

6. DISCUSSION

This work proposes a geometric version of the Weisfeiler-Leman graph isomorphism test (GWL) for discriminating geometric graphs while respecting the underlying spatial symmetries. We use GWL to characterise the expressive power of geometric GNNs and connect the universal approximation properties of these models to discriminating geometric graphs. GWL provides an abstraction to study the limits of geometric GNNs. In practice, it is challenging to build maximally powerful GNNs that satisfy the conditions of Proposition 9, as GWL relies on perfect colouring and aggregation functions to identify distinct neighbourhoods and propagate their geometric orientation information, respectively. Based on the intuitions gained from GWL, future work will explore building provably powerful, practical geometric GNNs for applications in biochemistry, material science, and multiagent robotics, and better characterise the trade-offs related to practical implementation choices.

A RELATED WORK

Literature on the completeness of atom-centred interatomic potentials has focused on distinguishing 1-hop local neighbourhoods (point clouds) around atoms by building spanning sets for continuous, G-equivariant multiset functions (Shapeev, 2016; Drautz, 2019; Dusson et al., 2019; Pozdnyakov et al., 2020). Recent theoretical work on geometric GNNs and their universality has shown that architectures such as TFN, GemNet and GVP-GNN (Dym & Maron, 2020; Villar et al., 2021; Gasteiger et al., 2021; Jing et al., 2020) can be universal approximators of continuous, G-equivariant or G-invariant multiset functions over point clouds, i.e. fully connected graphs. In contrast, the GWL framework studies the expressive power of geometric GNNs operating on sparse graphs from the perspective of discriminating geometric graphs and the graph isomorphism problem. The discrimination lens is potentially more granular and practically insightful than universality: a model is either universal or not, whereas there can be multiple degrees of discrimination depending on the classes of geometric graphs that can and cannot be distinguished, which our work aims to formalise.

B ADDITIONAL BACKGROUND ON GEOMETRIC GNNS

The GWL framework can be used to characterise the expressive power and theoretical limitations of two broad classes of geometric GNNs.

G-invariant GNNs. G-invariant GNN layers aggregate scalar quantities from local neighbourhoods by scalarising the geometric information. Scalar features are updated from iteration t to t + 1 via learnable aggregate and update functions, AGG and UPD, respectively:

s_i^(t+1) := UPD( s_i^(t), AGG({{ (s_i^(t), s_j^(t), ⃗v_i, ⃗v_j, ⃗x_ij) | j ∈ N_i }}) ).

For example, SchNet (Schütt et al., 2018) uses relative distances ∥⃗x_ij∥ to scalarise local geometric information, while DimeNet (Gasteiger et al., 2020) uses both distances and the angles ⃗x_ij · ⃗x_ik among triplets of nodes:

s_i^(t+1) := s_i^(t) + Σ_{j∈N_i} f₁( s_j^(t), ∥⃗x_ij∥ )   (SchNet) (11)

s_i^(t+1) := Σ_{j∈N_i} f₁( s_i^(t), s_j^(t), Σ_{k∈N_i∖{j}} f₂( s_j^(t), s_k^(t), ∥⃗x_ij∥, ⃗x_ij · ⃗x_ik ) )   (DimeNet) (12)

G-equivariant GNNs. G-equivariant GNN layers update both scalar and vector features by propagating scalar as well as vector messages, m_i^(t) and ⃗m_i^(t), respectively:

m_i^(t), ⃗m_i^(t) := AGG({{ (s_i^(t), s_j^(t), ⃗v_i^(t), ⃗v_j^(t), ⃗x_ij) | j ∈ N_i }})   (Aggregate)

s_i^(t+1), ⃗v_i^(t+1) := UPD( (s_i^(t), ⃗v_i^(t)), (m_i^(t), ⃗m_i^(t)) )   (Update)

For example, PaiNN (Schütt et al., 2021) interaction layers aggregate scalar and vector features via learnt filters conditioned on the relative distance:

m_i^(t) := s_i^(t) + Σ_{j∈N_i} f₁( s_j^(t), ∥⃗x_ij∥ )   (15)

⃗m_i^(t) := ⃗v_i^(t) + Σ_{j∈N_i} f₂( s_j^(t), ∥⃗x_ij∥ ) ⊙ ⃗v_j^(t) + Σ_{j∈N_i} f₃( s_j^(t), ∥⃗x_ij∥ ) ⊙ ⃗x_ij   (16)

E-GNN (Satorras et al., 2021) and GVP-GNN (Jing et al., 2020) use similar operations.
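The invariance of equation (11)-style scalar updates can be verified numerically with a toy stand-in (ours, not SchNet's learnt filter) for f₁:

```python
import numpy as np

rng = np.random.default_rng(0)

def schnet_style_update(s, x):
    # Toy stand-in for equation (11): f1 is an arbitrary fixed
    # nonlinearity of (s_j, ||x_ij||); learnt weights omitted.
    n = len(s)
    out = s.copy()
    for i in range(n):
        for j in range(n):
            if i != j:
                out[i] += np.tanh(s[j] + np.linalg.norm(x[i] - x[j]))
    return out

s = rng.normal(size=4)          # scalar node features
x = rng.normal(size=(4, 3))     # node coordinates in R^3

# Random rotation Q (via QR decomposition) and translation t.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
t = rng.normal(size=3)

out1 = schnet_style_update(s, x)
out2 = schnet_style_update(s, x @ Q.T + t)
# Updated scalars are invariant to rotations and translations,
# since only relative distances enter the update.
assert np.allclose(out1, out2)
```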
The update step applies a gated non-linearity (Weiler et al., 2018) to the vector features, which learns to scale their magnitude using their norm concatenated with the scalar features:

s_i^(t+1) := m_i^(t) + f₄( m_i^(t), ∥⃗m_i^(t)∥ ),   ⃗v_i^(t+1) := ⃗m_i^(t) + f₅( m_i^(t), ∥⃗m_i^(t)∥ ) ⊙ ⃗m_i^(t).   (17)

The updated scalar features are both G-invariant and T(d)-invariant, as the only geometric information used is the relative distances, while the updated vector features are G-equivariant and T(d)-invariant, as they aggregate G-equivariant, T(d)-invariant vector quantities from the neighbours.

Another example of G-equivariant GNNs is the e3nn framework (Geiger & Smidt, 2022), which can be used to instantiate Tensor Field Network (Thomas et al., 2018), Cormorant (Anderson et al., 2019), SEGNN (Brandstetter et al., 2022), and MACE (Batatia et al., 2022b). These models use higher order spherical tensors h̃_{i,l} ∈ R^{(2l+1)×f} as node features, starting from order l = 0 up to arbitrary l = L. The first two orders correspond to scalar and vector features, respectively. The higher order tensors h̃_i are updated via tensor products of neighbourhood features h̃_j, for all j ∈ N_i, with the higher order spherical harmonic representations Y of the relative displacement x̂_ij = ⃗x_ij / ∥⃗x_ij∥:

h̃_i^(t+1) := h̃_i^(t) + Σ_{j∈N_i} Y(x̂_ij) ⊗_w h̃_j^(t),   (18)

where the weights w of the tensor product are computed via a learnt radial basis function of the relative distance, i.e. w = f(∥⃗x_ij∥). To obtain the entry m₃ ∈ {−l₃, . . . , +l₃} of the order-l₃ part of the updated higher order tensors h̃_i^(t+1), we can expand the tensor product in equation 18 as:

h̃_{i,l₃m₃}^(t+1) := h̃_{i,l₃m₃}^(t) + Σ_{l₁m₁,l₂m₂} C^{l₃m₃}_{l₁m₁,l₂m₂} Σ_{j∈N_i} f_{l₁l₂l₃}(∥⃗x_ij∥) Y_{l₁}^{m₁}(x̂_ij) h̃_{j,l₂m₂}^(t),

where C^{l₃m₃}_{l₁m₁,l₂m₂} are the Clebsch-Gordan coefficients ensuring that the updated features are equivariant.
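The equivariance claim above can also be checked numerically with a toy stand-in for equations (16)-(17); the gates below are arbitrary fixed functions of invariants, not PaiNN's learnt filters:

```python
import numpy as np

rng = np.random.default_rng(1)

def vector_update(s, v, x):
    # Toy stand-in for equations (16)-(17): vector messages are built
    # from equivariant quantities (v_j, x_ij) scaled by invariant
    # gates (functions of s_j and ||x_ij||).
    n = len(s)
    m = v.copy()
    for i in range(n):
        for j in range(n):
            if i != j:
                d = np.linalg.norm(x[i] - x[j])
                m[i] += np.tanh(s[j] * d) * v[j]
                m[i] += np.cos(s[j] + d) * (x[i] - x[j])
    # Gated nonlinearity: rescale each vector by a function of its norm.
    for i in range(n):
        m[i] *= np.tanh(np.linalg.norm(m[i]))
    return m

s = rng.normal(size=3)
v = rng.normal(size=(3, 3))
x = rng.normal(size=(3, 3))
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
t = rng.normal(size=3)

# Rotating the inputs rotates the outputs; translating the positions
# changes nothing, matching the G-equivariant, T(d)-invariant claim.
assert np.allclose(vector_update(s, v @ Q.T, x @ Q.T + t),
                   vector_update(s, v, x) @ Q.T)
```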
Notably, when restricting the tensor product to only scalars (l = 0) and vectors (l = 1), we obtain updates of a form similar to equations 15, 16 and 17.

Applications. Systems in biochemistry, material science, physical simulations, and multiagent robotics can be modelled using geometric GNNs. Invariant GNNs have shown strong performance for protein design (Zhang et al., 2022; Dauparas et al., 2022) and electrocatalysis (Gasteiger et al., 2021; Shi et al., 2022), while equivariant GNNs are being used within learnt interatomic potentials for molecular dynamics (Schütt et al., 2021; Batzner et al., 2022; Batatia et al., 2022b).

C SYNTHETIC EXPERIMENTS FOR GEOMETRIC GNN DESIGN SPACE

We perform three simple synthetic experiments to highlight the practical challenges of building maximally powerful geometric GNNs. We hope that our synthetic experiments and the associated code can be a pedagogical tool for exploring the geometric GNN design space in future work.

Setup and Hyperparameters. We experiment with the following models: (1) SchNet (Schütt et al., 2018) and DimeNet (Gasteiger et al., 2020) as representative G-invariant GNNs; (2) E-GNN (Satorras et al., 2021) and GVP-GNN (Jing et al., 2020) as representative G-equivariant GNNs which use scalars and vectors in R³; and (3) TFN (Thomas et al., 2018) and MACE (Batatia et al., 2022b) to study higher order G-equivariant GNNs using spherical tensors. For SchNet and DimeNet, we use the implementation from PyTorch Geometric (Fey & Lenssen, 2019). For E-GNN, GVP-GNN, and MACE, we adapt implementations from the respective authors. Our TFN implementation is based on e3nn (Geiger & Smidt, 2022), and we also re-implement MACE by incorporating the EquivariantProductBasisBlock from its authors into our TFN layer. We set scalar feature channels to 128 for SchNet, DimeNet, and E-GNN, and scalar/vector/tensor feature channels to 64 for GVP-GNN, TFN, and MACE.
TFN and MACE use order L = 2 tensors by default, and MACE uses local body order 4 by default. We train all models for 100 epochs using the Adam optimiser with an initial learning rate of 1e-4, which we reduce by a factor of 0.9 with a patience of 25 epochs when performance plateaus. All results are averaged across 10 random seeds.

Identifying neighbourhood fingerprints: counterexamples from Pozdnyakov et al. (2020). GWL uses a node colouring function I-HASH for distinguishing G-orbits of neighbourhoods, i.e. a neighbourhood fingerprint. In geometric GNNs, this corresponds to a scalarisation step where local geometric information from subsets of neighbours is aggregated to compute G-invariant scalars (the number of nodes involved is termed the body order). In Table 1, we train single layer geometric GNNs to distinguish the counterexamples using updated scalar features. Unsurprisingly, we find that most layers computing 2- or 3-body scalarisations fail the task. Notably, training higher body order MACE layers to distinguish the chiral and non-chiral 4-body counterexamples should be theoretically possible, but proved challenging in practice. This highlights the difficulty of designing as well as optimising continuous, high body order neighbourhood fingerprints.

Table 1: Counterexamples from Pozdnyakov et al. (2020). We train single layer geometric GNNs to distinguish each counterexample pair of local neighbourhoods that are indistinguishable using k-body scalarisation. Most current geometric GNN layers are restricted to body order 2 or 3 and fail the tasks. Distinguishing the 4-body counterexamples should be theoretically possible with higher body order MACE layers, but proved challenging in practice. This highlights the difficulty of designing as well as optimising high body order neighbourhood fingerprints beyond simple distances and angles. Anomalous results are marked in red and expected results in green.

Table 2: Rotationally symmetric structures. We train single layer G-equivariant GNNs to distinguish two distinct rotated versions of each L-fold symmetric structure. We find that layers using order L tensors are unable to identify the orientation of structures with rotation symmetry higher than L-fold. This issue is particularly prevalent for E-GNN and GVP-GNN (tensor order 1).

Identifying neighbourhood orientation: rotationally symmetric structures. GWL is able to perfectly aggregate G-equivariant geometric information without losing neighbourhood orientation by making use of an auxiliary nested geometric object g_i. On the other hand, G-equivariant GNNs aggregate geometric information by summing neighbourhood features in fixed dimensional spaces using either Cartesian vectors or higher order spherical tensors, which come with trade-offs between tractability and empirical performance.


In Table 2, we study how rotational symmetries interact with tensor order in G-equivariant GNNs. We evaluate current G-equivariant layers on their ability to distinguish the orientation of structures with rotational symmetry. An L-fold symmetric structure does not change when rotated by an angle 2π/L around a point (in 2D) or an axis (in 3D). We consider two distinct rotated versions of each L-fold symmetric structure and train single layer G-equivariant GNNs to classify the two orientations using the updated geometric features. We find that layers using order L tensors are unable to identify the orientation of structures with rotation symmetry higher than L-fold. This observation can be attributed to the spherical harmonics used as the underlying basis, which exhibit rotational symmetry themselves. Layers such as E-GNN and GVP-GNN using Cartesian vectors (corresponding to tensor order 1) are popular, as working with higher order tensors can be computationally intractable for many applications. However, E-GNN and GVP-GNN are particularly poor at discriminating the orientation of rotationally symmetric structures.

Table 3: k-chain geometric graphs (k = 4). Test accuracy for an increasing number of layers (columns: ⌊k/2⌋, ⌊k/2⌋ + 1 = 3, ⌊k/2⌋ + 2, ⌊k/2⌋ + 3, ⌊k/2⌋ + 4):

IGWL (Inv.): 50%, 50%, 50%, 50%, 50%
SchNet: 50.0 ± 0.00 at every depth
DimeNet: 50.0 ± 0.00 at every depth

k-chains are (⌊k/2⌋ + 1)-hop distinguishable, and (⌊k/2⌋ + 1) GWL iterations are theoretically sufficient to distinguish them. We train geometric GNNs with an increasing number of layers to distinguish k = 4-chains. G-equivariant GNNs may require more iterations than prescribed by GWL, pointing to preliminary evidence of oversmoothing and oversquashing when geometric information is propagated across multiple layers using fixed dimensional feature spaces. IGWL and G-invariant GNNs are unable to distinguish k-chains for any k ≥ 2 and G = O(3).
This may have implications for modelling of periodic materials which naturally exhibit such symmetries (Levine & Steinhardt, 1984) .
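One way to see why order-1 (Cartesian vector) aggregators in particular cannot encode the orientation of a rotationally symmetric neighbourhood, sketched under the simplifying assumption of an unweighted vector sum (the construction below is ours, not taken from the experiments):

```python
import numpy as np

# An L-fold symmetric neighbourhood in 2D: L unit vectors at angles
# theta + 2*pi*k/L around the central node, for k = 0, ..., L-1.
def l_fold_star(L, theta=0.0):
    angles = theta + 2 * np.pi * np.arange(L) / L
    return np.stack([np.cos(angles), np.sin(angles)], axis=1)

# For L >= 2 the plain sum of neighbour vectors (the kind of quantity
# an order-1 Cartesian aggregator produces) is identically zero, so it
# carries no information about the orientation theta of the structure.
for L in range(2, 8):
    for theta in (0.0, 0.3, 1.1):
        assert np.allclose(l_fold_star(L, theta).sum(axis=0), 0.0,
                           atol=1e-12)
```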


Propagating geometric information: k-chains. In addition to perfect aggregation, GWL also assumes perfect propagation of G-equivariant geometric information, which implies that the test can be run for any number of iterations without loss of information. In geometric GNNs, G-equivariant information is propagated by summing features from multiple layers in fixed dimensional spaces, which may lead to distortion or loss of information from distant nodes. To study the practical implications of depth in propagating geometric information beyond local neighbourhoods, we consider k-chain geometric graphs which generalise the examples from Schütt et al. (2021). Each pair of k-chains consists of k + 2 nodes, with k nodes arranged in a line and differentiated by the orientation of the 2 end points. Thus, k-chain graphs are (⌊k/2⌋ + 1)-hop distinguishable, and (⌊k/2⌋ + 1) GWL iterations are theoretically sufficient to distinguish them. In Table 3, we train G-equivariant and G-invariant GNNs with an increasing number of layers to distinguish k-chains. Despite the supposed simplicity of the task, especially for small chain lengths, we find that popular G-equivariant GNNs such as E-GNN and TFN may require more iterations than prescribed by GWL. Notably, as the length of the chain grows beyond k = 4, all G-equivariant GNNs tended to lose performance and required more than (⌊k/2⌋ + 1) iterations to solve the task. IGWL and G-invariant GNNs are unable to distinguish k-chains. Table 3, together with Table 2, points to preliminary evidence of the oversmoothing and oversquashing phenomena (Nt & Maehara, 2019; Alon & Yahav, 2021; Topping et al., 2022) for geometric GNNs. These issues are most evident for E-GNN, which uses a single vector feature to aggregate and propagate geometric information. This may have implications for modelling macromolecules, where long-range interactions often play important roles.
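The k-chain construction can be sketched as follows (the coordinates are our illustrative choice, and the toy check below looks only at distances, not at the full local geometry):

```python
import numpy as np

# A pair of k-chains (generalising Schutt et al., 2021): k + 2 nodes,
# k collinear nodes plus two end points that either bend the same way
# ("cis") or opposite ways ("trans").
def k_chain(k, same_side=True):
    line = [(float(i), 0.0) for i in range(k)]
    left = (-1.0, 1.0)
    right = (float(k), 1.0 if same_side else -1.0)
    return np.array([left] + line + [right])

k = 4
cis, trans = k_chain(k, True), k_chain(k, False)

# Every edge (consecutive pair) has the same length in both graphs,
# so edge-distance information alone cannot separate them ...
e1 = sorted(np.linalg.norm(cis[i + 1] - cis[i]) for i in range(k + 1))
e2 = sorted(np.linalg.norm(trans[i + 1] - trans[i]) for i in range(k + 1))
assert np.allclose(e1, e2)

# ... but the graphs are geometrically non-isomorphic: the two end
# points sit at different distances, a global quantity that only
# becomes visible after propagating information along the chain.
d1 = np.linalg.norm(cis[0] - cis[-1])
d2 = np.linalg.norm(trans[0] - trans[-1])
assert not np.isclose(d1, d2)
```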
Studying both of these issues is an exciting avenue for future work towards building provably powerful, practical geometric GNNs.

D PROOFS FOR WHAT GWL AND IGWL CAN DISTINGUISH

The following results are a consequence of the construction of GWL as well as the definitions of k-hop distinct and k-hop identical geometric graphs. Note that k-hop distinct geometric graphs are also (k + 1)-hop distinct. Similarly, k-hop identical geometric graphs are also (k − 1)-hop identical, but not necessarily (k + 1)-hop identical. Given two distinct neighbourhoods N₁ and N₂, the G-orbits of the corresponding geometric multisets g₁ and g₂ are mutually exclusive, i.e. O_G(g₁) ∩ O_G(g₂) ≡ ∅. By the properties of I-HASH, this implies c₁ ≠ c₂. Conversely, if N₁ and N₂ were identical up to group actions, their G-orbits would overlap, i.e. g₁ = g · g₂ for some g ∈ G and O_G(g₁) = O_G(g₂) ⇒ c₁ = c₂.

Proposition 1. GWL can distinguish any k-hop distinct geometric graphs G₁ and G₂ where the underlying attributed graphs are isomorphic, and k iterations are sufficient.

Proof of Proposition 1. The k-th iteration of GWL identifies the G-orbit of the k-hop subgraph N_i^(k) at each node i via the geometric multiset g_i^(k). G₁ and G₂ being k-hop distinct implies that there exists some bijection b and some node i ∈ V₁, b(i) ∈ V₂ such that the corresponding k-hop subgraphs N_i^(k) and N_{b(i)}^(k) are distinct. Thus, the G-orbits of the corresponding geometric multisets g_i^(k) and g_{b(i)}^(k) are mutually exclusive, i.e. O_G(g_i^(k)) ∩ O_G(g_{b(i)}^(k)) ≡ ∅ ⇒ c_i^(k) ≠ c_{b(i)}^(k). Thus, k iterations of GWL are sufficient to distinguish G₁ and G₂.

Proposition 2. Up to k iterations of GWL cannot distinguish any k-hop identical geometric graphs G₁ and G₂ where the underlying attributed graphs are isomorphic.

Proof of Proposition 2. G₁ and G₂ being k-hop identical implies that for all bijections b and all nodes i ∈ V₁, b(i) ∈ V₂, the corresponding k-hop subgraphs N_i^(k) and N_{b(i)}^(k) are identical up to group actions. Thus, the G-orbits of the corresponding geometric multisets overlap, i.e. O_G(g_i^(k)) = O_G(g_{b(i)}^(k)) ⇒ c_i^(k) = c_{b(i)}^(k). Thus, up to k iterations of GWL cannot distinguish G₁ and G₂.

Proposition 3. IGWL can distinguish any 1-hop distinct geometric graphs G₁ and G₂ where the underlying attributed graphs are isomorphic, and 1 iteration is sufficient.

Proof of Proposition 3. Each iteration of IGWL identifies the G-orbit of the 1-hop local neighbourhood N_i^(1) at each node i.
G₁ and G₂ being 1-hop distinct implies that there exists some bijection b and some node i ∈ V₁, b(i) ∈ V₂ such that the corresponding 1-hop local neighbourhoods N_i^(1) and N_{b(i)}^(1) are distinct. Thus, the G-orbits of the corresponding geometric multisets g_i^(1) and g_{b(i)}^(1) are mutually exclusive, i.e. O_G(g_i^(1)) ∩ O_G(g_{b(i)}^(1)) ≡ ∅ ⇒ c_i^(1) ≠ c_{b(i)}^(1). Thus, 1 iteration of IGWL is sufficient to distinguish G₁ and G₂.

Proposition 4. Any number of iterations of IGWL cannot distinguish any 1-hop identical geometric graphs G₁ and G₂ where the underlying attributed graphs are isomorphic.

Proof of Proposition 4. Each iteration of IGWL identifies the G-orbit of the 1-hop local neighbourhood N_i^(1) at each node i, but cannot identify G-orbits beyond 1-hop, by the construction of IGWL, as no geometric information is propagated. G₁ and G₂ being 1-hop identical implies that for all bijections b and all nodes i ∈ V₁, b(i) ∈ V₂, the corresponding 1-hop local neighbourhoods N_i^(1) and N_{b(i)}^(1) are identical up to group actions. Thus, the G-orbits of the corresponding geometric multisets g_i^(1) and g_{b(i)}^(1) overlap, i.e. O_G(g_i^(1)) = O_G(g_{b(i)}^(1)) ⇒ c_i^(t) = c_{b(i)}^(t) for every iteration t. Thus, any number of IGWL iterations cannot distinguish G₁ and G₂.

Proposition 5. Assuming geometric graphs are constructed from point clouds using radial cutoffs, GWL can distinguish any geometric graphs G₁ and G₂ where the underlying attributed graphs are non-isomorphic. At most k_Max iterations are sufficient, where k_Max is the maximum graph diameter among G₁ and G₂.

Proof of Proposition 5. We assume that a geometric graph G = (A, S, ⃗V, ⃗X) is constructed from a point cloud (S, ⃗V, ⃗X) using a radial cutoff r. Thus, the adjacency matrix is defined as a_ij = 1 if ∥⃗x_i − ⃗x_j∥₂ ≤ r, and 0 otherwise, for all a_ij ∈ A. Such construction procedures are conventional for geometric graphs in biochemistry and material science.
Given geometric graphs G₁ and G₂ where the underlying attributed graphs are non-isomorphic, identify k_Max, the maximum of the graph diameters of G₁ and G₂, and choose any arbitrary nodes i ∈ V₁, j ∈ V₂. We can define the k_Max-hop subgraphs N_i^(k_Max) and N_j^(k_Max) at i and j, respectively. Note that N_i^(k_Max) covers the entire graph, i.e. N_i^(k_Max) = V₁ for all i ∈ V₁ (and similarly for G₂). If N_i^(k_Max) and N_j^(k_Max) were identical up to group actions, the point clouds (S₁, ⃗V₁, ⃗X₁) and (S₂, ⃗V₂, ⃗X₂) would be identical up to group actions, and the radial cutoff construction would yield isomorphic attributed graphs, contradicting our assumption. Thus, the G-orbits of the corresponding geometric multisets are mutually exclusive, i.e. O_G(g_i^(k_Max)) ∩ O_G(g_j^(k_Max)) ≡ ∅ ⇒ c_i^(k_Max) ≠ c_j^(k_Max). Thus, k_Max iterations of GWL are sufficient to distinguish G₁ and G₂.

Theorem 6. GWL is strictly more powerful than IGWL.

Proof of Theorem 6. Firstly, the GWL class contains IGWL if GWL can learn the identity when updating g_i for all i ∈ V, i.e. g_i^(t) = g_i^(t−1) = g_i^(0) ≡ (s_i, ⃗v_i). Thus, GWL is at least as powerful as IGWL, which does not update g_i. Secondly, to show that GWL is strictly more powerful than IGWL, it suffices to show that there exists a pair of geometric graphs that can be distinguished by GWL but not by IGWL. We may consider any k-hop distinct geometric graphs for k > 1, where the underlying attributed graphs are isomorphic. Proposition 1 states that GWL can distinguish any such graphs, while Proposition 4 states that IGWL cannot distinguish them. An example is the pair of graphs in Figures 1 and 2.

Proposition 7. IGWL has the same expressive power as GWL for fully connected geometric graphs.

Proof of Proposition 7. We prove this by contradiction. Assume that there exists a pair of fully connected geometric graphs G₁ and G₂ which GWL can distinguish but IGWL cannot. If the underlying attributed graphs of G₁ and G₂ are isomorphic, then by Proposition 1 and Proposition 4, G₁ and G₂ are 1-hop identical but k-hop distinct for some k > 1.
For all bijections b and all nodes i ∈ V₁, b(i) ∈ V₂, the local neighbourhoods N_i^(1) and N_{b(i)}^(1) are identical up to group actions, and O_G(g_i^(1)) = O_G(g_{b(i)}^(1)) ⇒ c_i^(1) = c_{b(i)}^(1). Additionally, there exists some bijection b and some nodes i ∈ V₁, b(i) ∈ V₂ such that the k-hop subgraphs N_i^(k) and N_{b(i)}^(k) are distinct, and O_G(g_i^(k)) ∩ O_G(g_{b(i)}^(k)) ≡ ∅ ⇒ c_i^(k) ≠ c_{b(i)}^(k). However, as G₁ and G₂ are fully connected, for any k, N_i^(1) = N_i^(k) and N_{b(i)}^(1) = N_{b(i)}^(k) are identical up to group actions. Thus, O_G(g_i^(1)) = O_G(g_i^(k)) = O_G(g_{b(i)}^(1)) = O_G(g_{b(i)}^(k)) ⇒ c_i^(1) = c_i^(k) = c_{b(i)}^(1) = c_{b(i)}^(k). This is a contradiction. If G₁ and G₂ are non-isomorphic and fully connected, then for any arbitrary i ∈ V₁, j ∈ V₂ and any k-hop neighbourhood, we know that N_i^(1) = N_i^(k) and N_j^(1) = N_j^(k). Thus, a single iteration of GWL and IGWL identifies the same G-orbits and assigns the same node colours, i.e. O_G(g_i^(1)) = O_G(g_i^(k)) ⇒ c_i^(1) = c_i^(k) and O_G(g_j^(1)) = O_G(g_j^(k)) ⇒ c_j^(1) = c_j^(k). This is a contradiction.

E PROOFS FOR EQUIVALENCE BETWEEN GWL AND GEOMETRIC GNNS

Our proofs adapt the techniques used in Xu et al. (2019) and Morris et al. (2019) for connecting WL with GNNs. Note that we omit the relative position vectors ⃗x_ij in the GWL and geometric GNN updates for brevity, as relative position vectors can be merged into the vector features.

Theorem 8. Any pair of geometric graphs distinguishable by a G-equivariant GNN is also distinguishable by GWL.

Proof of Theorem 8. Consider two geometric graphs G and H. The theorem states that if the GNN graph-level readout outputs f(G) ≠ f(H), then the GWL test will always determine G and H to be non-isomorphic. We prove this by contradiction. Suppose that after T iterations a GNN graph-level readout outputs f(G) ≠ f(H), but the GWL test cannot decide that G and H are non-isomorphic, i.e. G and H always have the same collection of node colours for iterations 0 to T. Thus, for iterations t and t + 1 for any t = 0, . . . , T − 1, G and H have the same collection of node colours {c_i^(t)} as well as the same collection of neighbourhood geometric multisets ((c_i^(t), g_i^(t)), {{(c_j^(t), g_j^(t)) | j ∈ N_i}}) up to group actions. Otherwise, the GWL test would have produced different node colours at iteration t + 1 for G and H, as different geometric multisets get unique new colours. We will show that, on the same graph, for nodes i and k, if (c_i^(t), g_i^(t)) = (c_k^(t), g · g_k^(t)), we always have GNN features (s_i^(t), ⃗v_i^(t)) = (s_k^(t), Q_g ⃗v_k^(t)) for any iteration t. This holds for t = 0 because GWL and the GNN start with the same initialisation. Suppose this holds for iteration t.
At iteration t + 1, if for any i and k, (c_i^(t+1), g_i^(t+1)) = (c_k^(t+1), g · g_k^(t+1)), then:

((c_i^(t), g_i^(t)), {{(c_j^(t), g_j^(t)) | j ∈ N_i}}) = ((c_k^(t), g · g_k^(t)), {{(c_j^(t), g · g_j^(t)) | j ∈ N_k}})

By our assumption on iteration t,

((s_i^(t), ⃗v_i^(t)), {{(s_j^(t), ⃗v_j^(t)) | j ∈ N_i}}) = ((s_k^(t), Q_g ⃗v_k^(t)), {{(s_j^(t), Q_g ⃗v_j^(t)) | j ∈ N_k}})

As the same aggregate and update operations are applied at each node within the GNN, the same inputs, i.e. neighbourhood features, are mapped to the same output. Thus, (s_i^(t+1), ⃗v_i^(t+1)) = (s_k^(t+1), Q_g ⃗v_k^(t+1)). By induction, if (c_i^(t), g_i^(t)) = (c_k^(t), g · g_k^(t)), we always have GNN node features (s_i^(t), ⃗v_i^(t)) = (s_k^(t), Q_g ⃗v_k^(t)) for any iteration t. This creates valid mappings ϕ_s, ϕ_v such that s_i^(t) = ϕ_s(c_i^(t)) and ⃗v_i^(t) = ϕ_v(c_i^(t), g_i^(t)) for any i ∈ V. Thus, if G and H have the same collection of node colours and geometric multisets, then G and H also have the same collection of GNN neighbourhood features:

((s_i^(t), ⃗v_i^(t)), {{(s_j^(t), ⃗v_j^(t)) | j ∈ N_i}}) = ((ϕ_s(c_i^(t)), ϕ_v(c_i^(t), g_i^(t))), {{(ϕ_s(c_j^(t)), ϕ_v(c_j^(t), g_j^(t))) | j ∈ N_i}})

Thus, the GNN will output the same collection of node scalar features {s_i^(T)} for G and H, and the permutation-invariant graph-level readout will output f(G) = f(H). This is a contradiction. Similarly, G-invariant GNNs of the form in Equation 4 can be at most as powerful as IGWL.

Theorem 24. Any pair of geometric graphs distinguishable by a G-invariant GNN is also distinguishable by IGWL.

Proof. The proof follows similarly to the proof of Theorem 8.

Proof of Theorem 9. Consider a GNN for which the conditions hold. We will show that, with a sufficient number of iterations t, the output of this GNN is equivalent to GWL, i.e. s_i^(t) ≡ c_i^(t). Let G and H be any geometric graphs which the GWL test decides are non-isomorphic at iteration T.
Because the graph-level readout function is injective, i.e. it maps multisets of node scalar features to unique embeddings, it suffices to show that the GNN's neighbourhood aggregation, with sufficient iterations, embeds G and H into different multisets of node features. For this proof, we replace G-orbit injective functions with injective functions over the equivalence classes generated by the actions of G. Thus, all elements belonging to the same G-orbit are first mapped to the same representative of the equivalence class, denoted by the square brackets [. . . ], followed by an injective map. The result is G-orbit injective. Let us assume the GNN updates node scalar and vector features as:

s_i^(t) = UPD_s( (s_i^(t−1), ⃗v_i^(t−1)), AGG({{(s_i^(t−1), s_j^(t−1), ⃗v_i^(t−1), ⃗v_j^(t−1)) | j ∈ N_i}}) )   (22)

⃗v_i^(t) = UPD_v( (s_i^(t−1), ⃗v_i^(t−1)), AGG({{(s_i^(t−1), s_j^(t−1), ⃗v_i^(t−1), ⃗v_j^(t−1)) | j ∈ N_i}}) )

with the aggregation function AGG being G-equivariant and injective, the scalar update function UPD_s being G-invariant and injective, and the vector update function UPD_v being G-equivariant and injective. The GWL test updates the node colour c_i^(t) and geometric multiset g_i^(t) as:

c_i^(t) = h_s( (c_i^(t−1), g_i^(t−1)), {{(c_j^(t−1), g_j^(t−1)) | j ∈ N_i}} )

g_i^(t) = h_v( (c_i^(t−1), g_i^(t−1)), {{(c_j^(t−1), g_j^(t−1)) | j ∈ N_i}} )

where h_s is a G-invariant and injective map, and h_v is a G-equivariant and injective operation (e.g. in equation 6, expanding the geometric multiset by copying). We will show by induction that at any iteration t, there always exist injective functions φ_s and φ_v such that s_i^(t) = φ_s(c_i^(t)) and ⃗v_i^(t) = φ_v(c_i^(t), g_i^(t)). This holds for t = 0 because the initial node features are the same for GWL and the GNN: c_i^(0) ≡ s_i^(0) and g_i^(0) ≡ (s_i^(0), ⃗v_i) for all i ∈ V(G), V(H). Suppose this holds for iteration t.
At iteration t + 1, substituting s_i^(t) with φ_s(c_i^(t)) and ⃗v_i^(t) with φ_v(c_i^(t), g_i^(t)) gives us

s_i^(t+1) = UPD_s( (φ_s(c_i^(t)), φ_v(c_i^(t), g_i^(t))), AGG({{(φ_s(c_i^(t)), φ_s(c_j^(t)), φ_v(c_i^(t), g_i^(t)), φ_v(c_j^(t), g_j^(t))) | j ∈ N_i}}) )

⃗v_i^(t+1) = UPD_v( (φ_s(c_i^(t)), φ_v(c_i^(t), g_i^(t))), AGG({{(φ_s(c_i^(t)), φ_s(c_j^(t)), φ_v(c_i^(t), g_i^(t)), φ_v(c_j^(t), g_j^(t))) | j ∈ N_i}}) )

The composition of multiple injective functions is injective. Therefore, there exist some injective functions g_s and g_v such that:

s_i^(t+1) = g_s( (c_i^(t), g_i^(t)), {{(c_j^(t), g_j^(t)) | j ∈ N_i}} )

⃗v_i^(t+1) = g_v( (c_i^(t), g_i^(t)), {{(c_j^(t), g_j^(t)) | j ∈ N_i}} )

We can then consider:

s_i^(t+1) = g_s ∘ h_s^(−1)( h_s( (c_i^(t), g_i^(t)), {{(c_j^(t), g_j^(t)) | j ∈ N_i}} ) )

⃗v_i^(t+1) = g_v ∘ h_v^(−1)( h_v( (c_i^(t), g_i^(t)), {{(c_j^(t), g_j^(t)) | j ∈ N_i}} ) )

Then, we can denote φ_s = g_s ∘ h_s^(−1) and φ_v = g_v ∘ h_v^(−1) as injective functions, because the composition of injective functions is injective. Hence, for any iteration t + 1, there exist injective functions φ_s and φ_v such that s_i^(t+1) = φ_s(c_i^(t+1)) and ⃗v_i^(t+1) = φ_v(c_i^(t+1), g_i^(t+1)). At the T-th iteration, the GWL test decides that G and H are non-isomorphic, which means the multisets of node colours {c_i^(T)} differ for G and H. Since φ_s is injective, the multisets of node scalar features {s_i^(T)} = {φ_s(c_i^(T))} must also differ, and the injective graph-level readout maps them to distinct embeddings, i.e. f(G) ≠ f(H).

A weaker set of conditions is sufficient for a G-invariant GNN to be at least as expressive as IGWL.

Proposition 25. G-invariant GNNs have the same expressive power as IGWL if the following conditions hold: (1) the aggregation ψ and update ϕ are G-orbit injective, G-invariant multiset functions; (2) the graph-level readout f is an injective multiset function.

Proof. The proof follows similarly to the proof of Theorem 9.

F GEOMETRIC GNN DESIGN SPACE PROOFS

Proposition 10.
IGWL and G-invariant GNNs cannot decide several geometric graph properties: (1) perimeter, surface area, and volume of the bounding box/sphere enclosing the geometric graph; (2) distance from the centroid or centre of mass; and (3) dihedral angles.

Proof of Proposition 10. Following Garg et al. (2020), we say that a class of models decides a geometric graph property if there exists a model belonging to this class such that, for any two geometric graphs that differ in the property, the model is able to distinguish the two geometric graphs. In Figure 4, we provide an example of two geometric graphs that demonstrate the proposition. G₁ and G₂ differ in the following geometric graph properties:

• Perimeter, surface area, and volume of the bounding box enclosing the geometric graph: (32 units, 40 units², 16 units³) vs. (28 units, 24 units², 8 units³).

• Multiset of distances from the centroid or centre of mass: {0.00, 1.00, 1.00, 2.45, 2.45} vs. {0.40, 1.08, 1.08, 2.32, 2.32}.

• Dihedral angles: ∠(ljkm) = ((⃗x_jk × ⃗x_lj) · (⃗x_jk × ⃗x_mk)) / (|⃗x_jk × ⃗x_lj| |⃗x_jk × ⃗x_mk|) is clearly different for the two graphs.

However, according to Proposition 4 and Theorem 24, both IGWL and G-invariant GNNs cannot distinguish these two geometric graphs, and therefore cannot decide any of these properties. We can also show this by constructing geometric computation trees for any number of IGWL or G-invariant GNN iterations, as illustrated in Figure 3. We observe that the geometric computation trees of any pair of isomorphic nodes are identical, as all 1-hop neighbourhoods are computationally identical. Therefore, the sets of node colours or node scalar features will also be identical, which implies that G₁ and G₂ cannot be distinguished.

Proposition 11. I-HASH^(m) is G-orbit injective for m = max({|N_i| | i ∈ V}), the maximum cardinality of all local neighbourhoods N_i in a given dataset.

Proof of Proposition 11.
As m is the maximum cardinality of all local neighbourhoods N_i under consideration, any distinct neighbourhoods N_1 and N_2 must have distinct multisets of m-body scalars. As I-HASH^{(m)} computes scalars involving up to m nodes, it is able to distinguish any such N_1 and N_2. Thus, I-HASH^{(m)} is G-orbit injective.

Proposition 12. IGWL^{(k)} is at least as powerful as IGWL^{(k-1)}. For k ≤ 5, IGWL^{(k)} is strictly more powerful than IGWL^{(k-1)}.

Proof of Proposition 12. By construction, I-HASH^{(k)} computes G-invariant scalars from all possible tuples of up to k elements formed by the elements of a neighbourhood and the central node. Thus, the I-HASH^{(k)} class contains I-HASH^{(k-1)}, and I-HASH^{(k)} is at least as powerful as I-HASH^{(k-1)}. Thus, the corresponding test IGWL^{(k)} is at least as powerful as IGWL^{(k-1)}.

Secondly, to show that IGWL^{(k)} is strictly more powerful than IGWL^{(k-1)} for k ≤ 5, it suffices to exhibit a pair of geometric neighbourhoods that can be distinguished by IGWL^{(k)} but not by IGWL^{(k-1)}:

Proposition 13. Let G_1 = (A_1, S_1, ⃗X_1) and G_2 = (A_2, S_2, ⃗X_2) be two geometric graphs with the property that all edges have equal length. Then, IGWL^{(2)} distinguishes the two graphs if and only if WL can distinguish the attributed graphs (A_1, S_1) and (A_2, S_2).

Proof of Proposition 13. Let c and k be the colours produced by IGWL^{(2)} and WL, respectively, and let i and j be two nodes belonging to any two graphs as in the statement of the result. We prove the statement inductively. Clearly, c^{(0)}_i = k^{(0)}_i for all nodes i, and c^{(0)}_i = c^{(0)}_j if and only if k^{(0)}_i = k^{(0)}_j. Now, assume that the statement holds at iteration t: that is, c^{(t)}_i = c^{(t)}_j if and only if k^{(t)}_i = k^{(t)}_j for all i, j.
Note that c^{(t+1)}_i = c^{(t+1)}_j if and only if c^{(t)}_i = c^{(t)}_j and {{(c^{(t)}_p, ∥⃗x_{ip}∥) | p ∈ N_i}} = {{(c^{(t)}_p, ∥⃗x_{jp}∥) | p ∈ N_j}}, since the norm of the relative vectors is the only injective invariant that IGWL^{(2)} can compute (up to a scaling). Since all the norms are equal, by the induction hypothesis, this is equivalent to k^{(t)}_i = k^{(t)}_j and {{k^{(t)}_p | p ∈ N_i}} = {{k^{(t)}_p | p ∈ N_j}}. Therefore, this is equivalent to k^{(t+1)}_i = k^{(t+1)}_j.

Proof of Theorem 16. Given any y ∈ Y, we can construct the G-invariant function over X, δ_y(x) = 0 if y ≃ x and 1 otherwise. Therefore, δ_y can be approximated with some ε < 0.5 over Y by some function h ∈ C. Hence, h(y) ̸= h(y′) for any y, y′ ∈ Y with y ̸≃ y′, and C is pairwise Y G-discriminating.

The following two Lemmas follow from Chen et al. (2019) with minor adaptations.

Lemma 26. If C is pairwise Y G-discriminating, then for all y ∈ Y, there exists a function δ_y ∈ C^{+1} such that for all y′, δ_y(y′) = 0 if and only if y ≃ y′.

Proof. For any y_1, y_2 ∈ Y such that y_1 ̸≃ y_2, let δ_{y_1,y_2} be the function that distinguishes y_1 and y_2, that is, δ_{y_1,y_2}(y_1) ̸= δ_{y_1,y_2}(y_2). Then, we can define a function δ̃_{y,y′} ∈ C:

δ̃_{y,y′}(x) = |δ_{y,y′}(x) − δ_{y,y′}(y)|  { = 0 if x ≃ y;  > 0 if x ≃ y′;  ≥ 0 otherwise }   (30)

This function is already similar to the δ_y function whose existence we want to prove. To obtain a function that is strictly positive over all x ∈ Y with x ̸≃ y, we can construct δ_y as a sum over all the δ̃_{y,y′}:

δ_y(x) = Σ_{y′∈Y, y′̸≃y} δ̃_{y,y′}(x)  { = 0 if x ≃ y;  > 0 if x ̸≃ y and x ∈ O_G(Y) ⊇ Y;  ≥ 0 otherwise }   (31)

Given the finite set of functions {δ_{y,y′}}, notice that δ̃_{y,y′}(x) = ReLU(δ_{y,y′}(x) − δ_{y,y′}(y)) + ReLU(δ_{y,y′}(y) − δ_{y,y′}(x)). Then δ_y is obtained by summing all these functions over y′ ∈ Y with y′ ̸≃ y, so δ_y ∈ C^{+1}.

Lemma 27.
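The induction in the proof of Proposition 13 can be checked mechanically. Below is a minimal colour-refinement sketch of ours (not the paper's code): WL uses a constant edge label, while an IGWL^{(2)}-style test labels each edge with its length. On a unit square (all edges of equal length) the two refinements induce identical partitions, as the proposition predicts:

```python
import numpy as np

def refine(colours, neighbours, edge_label):
    # One round: signature = (own colour, multiset of (neighbour colour, edge label)).
    sig = {v: (colours[v], tuple(sorted((colours[u], edge_label(v, u)) for u in neighbours[v])))
           for v in colours}
    relabel = {s: i for i, s in enumerate(sorted(set(sig.values())))}
    return {v: relabel[sig[v]] for v in colours}

pos = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (1.0, 1.0), 3: (0.0, 1.0)}
neighbours = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}  # 4-cycle, unit edges

wl = {v: 0 for v in neighbours}
igwl2 = {v: 0 for v in neighbours}
for _ in range(3):
    wl = refine(wl, neighbours, lambda v, u: 0.0)           # plain WL
    igwl2 = refine(igwl2, neighbours, lambda v, u: round(   # distance-labelled WL
        float(np.hypot(pos[v][0] - pos[u][0], pos[v][1] - pos[u][1])), 6))

print(wl == igwl2)  # True: identical partitions when all edge lengths coincide
```

With unequal edge lengths the distance label can split colour classes that plain WL cannot, which is exactly the extra power (and its limit) that the proposition characterises.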
Let C be a class of G-invariant functions from X → R such that for any y, y′ ∈ Y ⊆ X, where Y is finite, there is a δ_y ∈ C with the property δ_y(y′) = 0 if and only if y ≃ y′. Then C^{+1} is universally approximating over Y.

Proof. Let f be the continuous G-invariant function we wish to approximate on Y. For y ∈ Y, define r_y := ½ min_{y′∈Y, y′̸≃y} δ_y(y′). Define the bump function with radius r > 0, b_r : R → R, as b_r(s) = ψ(s/r), where ψ(z) = ReLU(z + 1) − 2ReLU(z) + ReLU(z − 1) is the triangular "hat" function with ψ(0) = 1 and support (−1, 1). Define k_y := |Y ∩ O_G(y)|^{-1}. Since Y is finite and the intersection with the orbit of y contains y, k_y is finite and well-defined. We can define the G-invariant function h from X to R as:

h(x) = Σ_{y∈Y} k_y f(y) b_{r_y}(δ_y(x))   (32)

Notice that h|_Y = f|_Y and h ∈ C^{+1}. Therefore, C^{+1} is universally approximating.

Theorem 17. If C is pairwise Y G-discriminating, then C^{+2} is universally approximating over Y.

Proof. The result follows directly from the two Lemmas above.

Lemma 28. Let X, Y be topological spaces and h : X × Y → R a continuous function. Then, if Y is compact, f(x) = inf_{y∈Y} h(x, y) is continuous.

Proof. The open sets (−∞, a) and (b, ∞) form a subbasis for the topology of R, so it suffices to show that their preimages under f are open. First, notice x ∈ f^{-1}((−∞, a)) if and only if (x, y) ∈ h^{-1}((−∞, a)) for some y ∈ Y. Therefore, f^{-1}((−∞, a)) = p_X(h^{-1}((−∞, a))), where p_X : X × Y → X is the projection onto the first argument. Since p_X is continuous and open, it follows that p_X(h^{-1}((−∞, a))) is open. When x ∈ f^{-1}((b, ∞)), it implies that for all y ∈ Y, h(x, y) > b. This means that for all x ∈ f^{-1}((b, ∞)) and y ∈ Y, we have (x, y) ∈ h^{-1}((b, ∞)). Since h^{-1}((b, ∞)) is open, there exists an open box U_{x,y} × V_{x,y} ⊆ h^{-1}((b, ∞)) containing (x, y). Then, the union ∪_{y∈Y} U_{x,y} × V_{x,y} covers {x} × Y. Since Y is compact, there exists a finite subcover ∪_{k=1}^{K_x} U_{x,y_k} × V_{x,y_k} of size K_x. Then notice that the open set A_x := ∩_{k=1}^{K_x} U_{x,y_k} is a neighbourhood of x and A_x × Y ⊆ h^{-1}((b, ∞)).
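The hat function above can be checked numerically. This is a small sanity check of ours (the extracted formula had a sign garbled; the ReLU hat below has ψ(0) = 1 and support (−1, 1), which is exactly what h|_Y = f|_Y requires):

```python
import numpy as np

relu = lambda z: np.maximum(z, 0.0)
# Triangular "hat": 0 outside [-1, 1], peak value 1 at z = 0.
psi = lambda z: relu(z + 1.0) - 2.0 * relu(z) + relu(z - 1.0)
b = lambda s, r: psi(s / r)  # bump of radius r, evaluated at s = delta_y(x) >= 0

print(psi(-2.0), psi(-1.0), psi(0.0), psi(0.5), psi(1.0), psi(2.0))
print(b(0.0, 0.3), b(0.3, 0.3))  # 1 on the orbit, 0 once s >= r
```

Since δ_y(y′) ≥ 2r_y for every y′ ̸≃ y, the bump b_{r_y} vanishes there and equals 1 exactly on the orbit of y, so the weighted sum (32) reproduces f on Y.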
Therefore, A_x ⊆ f^{-1}((b, ∞)) and, since x ∈ f^{-1}((b, ∞)) was chosen arbitrarily, f^{-1}((b, ∞)) is open.

Lemma 31. If C, a class of functions over a compact set Y, can locate every isomorphism class, then C^{+2} is universally approximating over Y.

Proof. Consider any continuous, G-invariant function f on X. Since Y ⊆ X is compact, f is uniformly continuous when restricted to Y. In other words, for all ε > 0, there exists r > 0 such that for all y_1, y_2 ∈ Y, if d(y_1, y_2) < r, then |f(y_1) − f(y_2)| < ε. Let y ∈ Y and define B_G(y, r) := ∪_{y′∈O_G(y)} B(y′, r) to be the union of all the open balls of radius r centred on the orbit of y. Using the function δ_y from Definition 29, there exists r_y such that δ_y^{-1}([0, r_y)) ⊆ B_G(y, r) for any y ∈ Y. Since δ_y is continuous, δ_y^{-1}([0, r_y)) is open. Therefore, {δ_y^{-1}([0, r_y))}_{y∈Y} is an open cover for Y. Since Y is compact, we can find a finite subcover {δ_y^{-1}([0, r_y))}_{y∈Y_0}, where Y_0 is a finite subset of Y.

We can now use the functions δ_y to construct a set of continuous G-invariant functions that forms a partition of unity for this finite cover. For y_0 ∈ Y_0, we construct the function ϕ_{y_0}(y′) = max(r_{y_0} − δ_{y_0}(y′), 0) and the function ϕ(y′) = Σ_{y_*∈Y_0} ϕ_{y_*}(y′), both of which are continuous. Noticing that supp(ϕ_{y_0}) = δ_{y_0}^{-1}([0, r_{y_0})) and that ϕ(y′) > 0 for any y′ ∈ Y, the functions ψ_{y_0}(y′) = ϕ_{y_0}(y′)/ϕ(y′) form a partition of unity with Σ_{y_0∈Y_0} ψ_{y_0}(y′) = 1 for all y′ ∈ Y. Notice that we can write any G-invariant function f as:

f(y′) = f(y′) Σ_{y_0∈Y_0} ψ_{y_0}(y′) = Σ_{y_0∈Y_0 : y′∈δ_{y_0}^{-1}([0, r_{y_0}))} f(y′) ψ_{y_0}(y′)

The intuition is that, because f is continuous, we can approximate f(y′) in the expression above by the value f(y_0), since y′ is in the neighbourhood of some y_0. Thus, the function that approximates f is h(y′) = Σ_{y_0∈Y_0} f(y_0) ψ_{y_0}(y′). We now show that h can approximate f with arbitrary accuracy.
If y′ ∈ δ_{y_0}^{-1}([0, r_{y_0})), then there exists g ∈ G such that d(y′, g • y_0) < r. Using the fact that f is uniformly continuous, this implies |f(y′) − f(g • y_0)| < ε. Because f is invariant, f(y_0) = f(g • y_0), which implies |f(y′) − f(y_0)| < ε. Then we have:

|f(y′) − Σ_{y_0∈Y_0} f(y_0) ψ_{y_0}(y′)| = |Σ_{y_0∈Y_0 : y′∈δ_{y_0}^{-1}([0, r_{y_0}))} (f(y′) − f(y_0)) ψ_{y_0}(y′)|   (34)
  ≤ Σ_{y_0∈Y_0 : y′∈δ_{y_0}^{-1}([0, r_{y_0}))} |f(y′) − f(y_0)| ψ_{y_0}(y′) < ε

Finally, to see that h is in C^{+2}, we can use an MLP with one hidden layer to approximate the ψ_{y_0}, followed by one final layer to compute the linear combination of the ψ_{y_0}.

Theorem 19. If C, a class of functions over Y, is pairwise Y G-discriminating, then C^{+2} can also universally approximate any continuous function over Y.

Proof. The proof follows from the two Lemmas above.
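The partition-of-unity construction in Lemma 31 can also be sanity-checked numerically. Below is a one-dimensional sketch of ours, with hinge functions ϕ(d) = max(r − d, 0) centred at a few points standing in for the δ_{y_0}-based bumps; wherever at least one ϕ is positive, the normalised weights ψ sum to one:

```python
import numpy as np

centres = np.array([0.0, 1.0, 2.0])  # stand-ins for the finite subcover Y_0
r = 1.5                              # every x in [0, 2] lies within r of a centre

x = np.linspace(0.0, 2.0, 41)
phi = np.maximum(r - np.abs(x[:, None] - centres[None, :]), 0.0)  # shape (41, 3)
psi = phi / phi.sum(axis=1, keepdims=True)                        # normalise

print(np.allclose(psi.sum(axis=1), 1.0))  # True: a partition of unity on [0, 2]
```

Weighting the ψ columns by target values f(y_0), as in h(y′) = Σ f(y_0) ψ_{y_0}(y′), then yields a piecewise-blended approximation whose error is controlled by the cover radius, mirroring the ε-bound in (34).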

G.2 NUMBER OF AGGREGATORS IN CONTINUOUS SETTING

Theorem 20. Let X be a smooth n-dimensional manifold and G an m-dimensional compact Lie group acting continuously on X. Suppose there exists a smooth submanifold Y of X of the same dimension such that G acts freely on it. Then, any continuous G-orbit injective function f : X → R^d requires that d ≥ n − m.

Proof. Suppose, for the sake of contradiction, that there exists an orbit-space injective and continuous function f : X → R^d with d < n − m. Since Y is a submanifold of the same dimension as X, f must also be orbit-space injective over Y. By the Quotient Manifold Theorem (Lee, 2013), Y/G is a topological manifold of dimension n − m. The map f induces an injective function g : Y/G → R^d. This map is also continuous because, for an open set V ⊆ R^d, g^{-1}(V) = π_Y(f^{-1}(V)); since f is continuous and π_Y is an open map, this set is open. Because Y/G is a manifold, there exist an open set U ⊆ Y/G and a homeomorphism ψ : U → R^{n−m}. Then the composition h = g ∘ ψ^{-1} is a continuous and injective map from R^{n−m} to R^d. By the Invariance of Domain Theorem (Bredon, 2013, Corollary 19.9), h is open and is a homeomorphism onto its image h(R^{n−m}) ⊆ R^d. By the Invariance of Dimension Theorem (Bredon, 2013, Corollary 19.10), this forces d = n − m, contradicting d < n − m.

Theorem 21. For n ≥ d − 1 > 0 or n = d = 1, any continuous S_n × SO(d) orbit-space injective function f : R^{n×d} → R^q requires that q ≥ nd − d(d − 1)/2.

Proof. We now consider the case when G = S_n × SO(d). First, notice that the proof of Theorem 22 also holds for this group, since it is a subgroup of S_n × O(d). However, we can obtain a stronger result and show the statement holds for n ≥ d − 1. In what follows, we reuse the notation from that proof. We define the set Z′ = {X ∈ X | ∃ 1 ≤ i_1 < . . . < i_{d−1} ≤ n s.t. x_{i_1}, . . . , x_{i_{d−1}} are linearly independent}, containing the point clouds with d − 1 linearly independent row-vectors. Define M_X to be the set of all (d − 1) × (d − 1) minors of the matrix X.
Then, we can construct the continuous function h(X) = max_{m∈M_X} |m| and notice that Z′ coincides with the open set h^{-1}((0, ∞)). Then, the set V = Y ∩ Z′ is also open and non-empty. Therefore, V is a submanifold of X of the same dimension, and the action of G is well-defined and continuous on V. We can show again that this action is free. As in the proof of Theorem 22, we have that P_g = I_n, so it remains to inspect the case X = XQ_g. Any non-trivial rotation Q_g must rotate at least a two-dimensional subspace of R^d. Since the rows of the matrix X span a (d − 1)-dimensional subspace of R^d, Q_g cannot leave X invariant unless Q_g = I_d. Applying Theorem 20 again yields the result. For n = d = 1, ∥•∥ : R^{1×d} → R is, as before, G orbit-space injective.

Theorem 22. For n ≥ d > 0, any continuous S_n × O(d) orbit-space injective function f : R^{n×d} → R^q requires that q ≥ nd − d(d − 1)/2.

Proof. First, suppose that n ≥ d > 1. Consider the subspace Y = {X ∈ X | ∥x_i∥ ̸= ∥x_j∥, ∀i < j}, where the norm is the standard Euclidean norm. Consider the function g : X → R given by g(X) = min_{i<j} |∥x_i∥ − ∥x_j∥|. By standard analysis, this function is continuous, and notice that Y = g^{-1}((0, ∞)), which means that Y is open in X. We also define the set Z = {X ∈ X | ∃ 1 ≤ i_1 < . . . < i_d ≤ n, |det(x_{i_1}, . . . , x_{i_d})| > 0}, containing the point clouds whose row-vectors span R^d. As above, this set is the preimage of the absolute determinant over (0, ∞), which makes Z open in X. Then, the set W = Y ∩ Z is also open and non-empty. Therefore, W is a submanifold of X of the same dimension, and the action of G is well-defined and continuous on W. We can show this action is free. We investigate the solutions of the equation P_g X Q_g^⊤ = X ⇐⇒ P_g X = X Q_g for X ∈ W. Since orthogonal transformations preserve norms and the rows of X have different norms, it follows that P_g = I_n ⇒ X = XQ_g. We know that a subset of the rows of X span the whole of R^d.
Define the sub-matrix of X containing these rows by X_* ∈ R^{d×d}. Then, we have X_* Q_g = X_* ⇒ Q_g = (X_*)^{-1} X_* = I_d. This proves that the action is free, and applying Theorem 20 yields the result. For the trivial case when n = 1, notice that ∥•∥ : R^{1×d} → R is G orbit-space injective.

Proposition 23. Any S_n-invariant injective function f : R^{n×d} → R^q requires q ≥ nd.

Proof. Reusing the notation from above, notice that for all n ≥ 1, S_n acts freely on the submanifold Y as shown above. Seeing S_n as a zero-dimensional Lie group and applying Theorem 20 yields the result.
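As a sanity check on these lower bounds (a worked instance of ours, not from the paper), consider point clouds of n nodes in d = 3 dimensions under S_n × SO(3):

```latex
q \;\ge\; nd - \frac{d(d-1)}{2} \;=\; 3n - 3,
\qquad\text{e.g. } n = 5,\; d = 3 \;\Rightarrow\; q \ge 12 .
```

This matches the naive degrees-of-freedom count: n points contribute 3n coordinates, and quotienting by the 3-dimensional rotation group removes exactly 3.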

H ALTERNATIVE GWL FORMULATION FOR SO(2)

The version of GWL presented in the main text makes use of a geometric neighbourhood object g that keeps increasing in size at each iteration. A natural question to ask is whether a more compact encoding of the orientation of a neighbourhood can be achieved. In this section, we show this is possible for SO(2), which has some particularly convenient properties: it is commutative and its (discrete) subgroups are particularly simple. First, we point out why, when working with the standard representation of SO(d) acting on vectors in R^d, the orientation of a neighbourhood cannot be (injectively) encoded into another vector obtained from permutation-invariant aggregation.

Definition 32 (Representation). A representation of a group G on a vector space V is a group homomorphism from G to GL(V), the general linear group on V. That is, a representation is a map ρ : G → GL(V) such that ρ(g_1 g_2) = ρ(g_1)ρ(g_2).

Denote by ρ_0 the usual representation of O(d) on R^d. Let 𝒳 ⊂ R^d be a countable feature space, X = {x_1, . . . , x_n} ⊂ 𝒳 a multiset of features with associated representations ρ_1, . . . , ρ_n : G → SO(d). Furthermore, assume that G acts on R^{n×d} via gX := [ρ_1(g)x_1, . . . , ρ_n(g)x_n]. We want to find a multiset function f : 𝒳^n → R^d that is O(d)-equivariant and injective over the orbit of X, denoted by O_G(X). A first observation is that if O(d) acts (faithfully) via ρ_0 on R^d, then there is no such function, due to the presence of discrete rotational symmetries in certain sets of input vectors (i.e. snowflake-like structures).

Proposition 33 (The Discrete Symmetry Problem). Let X ∈ R^{n×2} be a point cloud with n points having (discrete) rotational symmetry, i.e. there exist π ∈ S_n and g ̸= id ∈ O(2) s.t. πX = gX. Then, any permutation-invariant and O(d)-equivariant function f : R^{n×2} → R^2 has the property that f(X) = f(gX) = 0 for all such g.

Proof.
πX = gX ⟹ f(πX) = f(gX) ⟹ f(X) = ρ_0(g)f(X) ⟹ f(X) = 0,

where the last step follows because the only vector fixed by a non-trivial orthogonal transformation ρ_0(g) is the zero vector. This result easily extends to any dimension d by finding such a set of vectors that span a two-dimensional plane and performing a rotation in that plane. The proposition above shows that we need to find an alternative representation ρ. We show that this is possible for G = SO(2), and for the rest of this section we assume d = 2.

Theorem 34. There exists a representation ρ : G → SO(R^2) and a multiset function HASH_v sending X to R^2 \ {0} that is unique for each multiset X ⊂ 𝒳 of bounded size and satisfies HASH_v(gX) = ρ(g)HASH_v(X). Furthermore, for any two X_1, X_2 ⊂ 𝒳 we have that ∥HASH_v(X_1)∥ = ∥HASH_v(X_2)∥ iff there is g ∈ G such that gX_1 = X_2.

Proof. Let O_G(X) be the orbit of X generated by the action of the group G = SO(2) acting on R^{n×d}/S_n (i.e. the space of multisets of 2D vectors) as in the statement of Theorem 34. Then, SO(2) acts transitively on O_G(X) by construction. Denoting by G(X) the isotropy/stabiliser group of X, the function ϕ_X : O_G(X) → SO(2)/G(X) given by ϕ_X(gX) = gG(X) is well-defined and, moreover, it is an equivariant homeomorphism of G-spaces (e.g. see Proposition 4.1 in Bredon (1972)). Notice that because SO(2) is abelian, G(X) is a normal subgroup and SO(2)/G(X) is not only a manifold, but also a (Lie) group. We will show that we can find a faithful orthogonal representation of this quotient group. Because we are working with finite neighbourhoods, the group G(X) is a finite cyclic subgroup (a standard result) if X ̸= 0; otherwise, G(X) = SO(2). Therefore, let us denote by α(gG(X)) the smallest angle of a rotation in the equivalence class gG(X). If G(X) is a finite cyclic subgroup generated by rotations of an angle θ, then α(SO(2)/G(X)) = [0, θ). If G(X) = SO(2), then α(SO(2)/G(X)) = 0. Thus, we can
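As a quick numerical illustration of the Discrete Symmetry Problem (a sketch of ours, not a construction from the paper), take vector-sum aggregation, which is permutation-invariant and O(2)-equivariant: on a 3-fold rotationally symmetric neighbourhood it collapses to the zero vector, exactly as Proposition 33 predicts:

```python
import numpy as np

# A 3-fold symmetric "snowflake" neighbourhood: unit vectors 120 degrees apart.
# Rotating by 120 degrees permutes the points, so pi X = g X for some g != id.
angles = np.array([0.0, 2 * np.pi / 3, 4 * np.pi / 3])
X = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # shape (3, 2)

# Vector-sum aggregation f(X): permutation-invariant and O(2)-equivariant.
f_X = X.sum(axis=0)
print(np.allclose(f_X, 0.0))  # True: the symmetric cloud is mapped to zero
```

Any other aggregator with the same two symmetry properties suffers the same collapse on such inputs, which is why the proof below switches to a different representation ρ.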



The same result applies for the bounding sphere, not shown in the figure.



i, and distinguishes (sub-)graphs by comparing a G-orbit injective colouring of g^{(k)}


To probe the practical implications of scalarisation body order, we evaluate current geometric GNN layers on their ability to discriminate counterexamples from Pozdnyakov et al. (2020). Each counterexample consists of a pair of local neighbourhoods that are indistinguishable when comparing their sets of k-body scalars, i.e. I-HASH^{(k)} and geometric GNN layers with body order k cannot distinguish the neighbourhoods. The 3-body counterexample corresponds to Fig. 1(b) in Pozdnyakov et al. (2020), 4-body chiral to Fig. 2(e), and 4-body non-chiral to Fig. 2(f); the 2-body counterexample is based on the two local neighbourhoods in our running example.



Proposition 2. Up to k iterations, GWL cannot distinguish any k-hop identical geometric graphs G_1 and G_2 whose underlying attributed graphs are isomorphic.

Proof of Proposition 2. The k-th iteration of GWL identifies the G-orbit of the k-hop subgraph N^{(k)}_i at each node i via the geometric multiset g^{(k)}_i. G_1 and G_2 being k-hop identical implies that for all bijections b and all nodes i ∈ V_1, b(i) ∈ V_2, the corresponding k-hop subgraphs N^{(k)}_i and N^{(k)}_{b(i)} are identical up to group actions. Thus, the G-orbits of the corresponding geometric multisets overlap, i.e. O_G(g^{(k)}_i) = O_G(g^{(k)}_{b(i)}).

and N^{(k_Max)}_j = V_2 for all j ∈ V_2. Due to the assumed construction procedure of geometric graphs, N

2) would have yielded isomorphic graphs. The k_Max-th iteration of GWL identifies the G-orbit of the k_Max-hop subgraph N^{(k_Max)}_i at each node i via the geometric multiset g^{(k_Max)}_i. Since the subgraphs N^{(k_Max)}_i and N^{(k_Max)}_j are distinct for arbitrary nodes i ∈ V_1, j ∈ V_2, the G-orbits of the corresponding geometric multisets are distinct, i.e. O_G(g^{(k_Max)}_i) ̸= O_G(g^{(k_Max)}_j).

Proposition 9. G-equivariant GNNs have the same expressive power as GWL if the following conditions hold: (1) The aggregation AGG is an injective, G-equivariant multiset function. (2) The scalar part of the update UPD s is a G-orbit injective, G-invariant multiset function. (3) The vector part of the update UPD v is an injective, G-equivariant multiset function. (4) The graph-level readout f is an injective multiset function.

The multisets of node colours {c^{(T)}_i} are different for G and H. The GNN's node scalar features {s^{(T)}_i} = {φ_s(c^{(T)}_i)} must then also be different for G and H because of the injectivity of φ_s.

Figure 4: Two geometric graphs for which IGWL and GNNs cannot distinguish their perimeter, surface area, volume of the bounding box/sphere, distance from the centroid, and dihedral angles. The centroid is denoted by a red point and distances from it are denoted by dotted red lines. The bounding box enclosing the geometric graph is denoted by the dotted green lines.

• For k = 3 and G = O(3) or SO(3), for the local neighbourhood from Figure 1 in Schütt et al. (2021), two configurations with different angles between the neighbouring nodes can be distinguished by IGWL^{(3)} but not by IGWL^{(2)}.
• For k = 4 and G = O(3) or SO(3), the pair of local neighbourhoods from Figure 1 in Pozdnyakov et al. (2020) can be distinguished by IGWL^{(4)} but not by IGWL^{(3)}.
• For k = 5 and G = O(3), the pair of local neighbourhoods from Figure 2(e) in Pozdnyakov et al. (2020) can be distinguished by IGWL^{(5)} but not by IGWL^{(4)}.
• For k = 5 and G = SO(3), the pair of local neighbourhoods from Figure 2(f) in Pozdnyakov et al. (2020) can be distinguished by IGWL^{(5)} but not by IGWL^{(4)}.

G UNIVERSALITY AND DISCRIMINATION PROOFS

G.1 EQUIVALENCE BETWEEN UNIVERSALITY AND DISCRIMINATION

The results in this subsection use the proofs from Chen et al. (2019) with minor adaptations.

Theorem 16. If C is universally approximating over Y, then C is also pairwise Y G-discriminating.

Otherwise, we say G 1 and G 2 are k-hop identical if all N

Counterexamples from Pozdnyakov et al. (2020).


Theorem 18. If C can universally approximate any continuous G-invariant function on Y, then C is also pairwise Y G-discriminating.

Proof. Consider y, y′ ∈ Y such that y ̸≃ y′. Then, consider the function δ_y(x) = inf_{g∈G} d(y, gx) = min_{g∈G} d(y, gx), where the second equality follows from the compactness of G, and note δ_y(y′) > 0. This function is G-invariant. To show that it is continuous, notice that the function h(x, g) = d(y, gx) is given by the composition d_y ∘ a, where a : X × G → X is the continuous group action and d_y : X → R is given by d_y(x) = d(y, x), which is also continuous. Since the composition of continuous functions is continuous and δ_y(x) = inf_{g∈G} h(x, g), it follows from Lemma 28 that δ_y is a continuous function. Given a universally approximating class of functions C, we can find a function f approximating δ_y with precision ε < δ_y(y′)/2 and, therefore, f(y) ̸= f(y′).

Definition 29. Let C be a class of functions X → R and Y ⊆ X. We say that C can locate every orbit over Y if for any y ∈ Y and any ε > 0 there exists δ_y ∈ C such that:
1. For all y′ ∈ Y, δ_y(y′) ≥ 0.
2. For all y′ ∈ Y, if y ≃ y′, then δ_y(y′) = 0.
3. There exists r_y > 0 such that if δ_y(y′) < r_y for any y′ ∈ Y, then there is a g ∈ G such that d(y′, g • y) < ε.

Notice that since δ_y ∈ C, it is G-invariant, and then for any y_* ∈ O_G(y′), δ_y(y′) = δ_y(y_*) and there exists g ∈ G such that d(y_*, g • y) < ε. Therefore, intuitively, one should see δ_y as a sort of "distance function" measuring how far all y_* ∈ O_G(y′) are from the orbit of y. In other words, when δ_y(y_*) is low, it means that the entire orbit of y_* is close to the orbit of y.

Lemma 30. If C is pairwise Y G-discriminating and Y is compact, then C^{+1} can locate every orbit over Y.

Proof. Select an arbitrary y ∈ Y. For y′ ̸≃ y, let δ_{y,y′} be the function in C separating y and y′. Consider the radius r_{y,y′} := ½ |δ_{y,y′}(y) − δ_{y,y′}(y′)| > 0 and define the set A_{y′} := {x ∈ X : |δ_{y,y′}(x) − δ_{y,y′}(y′)| < r_{y,y′}}; for y′ ≃ y, take instead A_{y′} := B(y′, ε), the open ball in X centred at y′ with radius ε. Clearly, ∪_{y′∈Y} A_{y′} forms a cover for Y.
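The orbit distance δ_y(x) = min_{g∈G} d(y, gx) used in the proof of Theorem 18 can be approximated numerically for G = SO(2) by scanning rotation angles on a fine grid; the following is a sketch of ours (exact only in the limit of the grid, using the Frobenius metric on point clouds):

```python
import numpy as np

def orbit_distance(Y, X, n_angles=720):
    # min over sampled rotations g of d(Y, gX); d = Frobenius norm on R^{n x 2}
    thetas = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    best = np.inf
    for t in thetas:
        R = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
        best = min(best, np.linalg.norm(Y - X @ R.T))
    return best

Y = np.array([[1.0, 0.0], [0.0, 2.0], [-1.0, 1.0]])
g = np.array([[np.cos(0.3), -np.sin(0.3)], [np.sin(0.3), np.cos(0.3)]])

# delta_Y vanishes on the orbit of Y (up to the angular grid error):
print(orbit_distance(Y, Y @ g.T) < 0.05)  # True: g.Y lies in the orbit of Y
```

G-invariance and continuity of this quantity are exactly what the proof establishes; the grid resolution controls the approximation error, which shrinks linearly with the angular spacing.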
Since Y is compact, there exists a finite subcover given by a finite subset Y_0 ⊆ Y such that Y ⊆ ∪_{y′∈Y_0} A_{y′}. We construct the function δ_y over X as δ_y(y_*) := Σ_{y′∈Y_0\O_G(y)} δ̃_{y,y′}(y_*), where δ̃_{y,y′}(y_*) := max(4/3 r_{y,y′} − |δ_{y,y′}(y_*) − δ_{y,y′}(y′)|, 0). Since each δ_{y,y′} is continuous and G-invariant, so is δ_y. Finally, it can be shown that δ_y can indeed locate the orbit of y over Y:

1. Clearly, δ_y(x) ≥ 0 for any x ∈ X.

2. For any y_* ≃ y, we have |δ_{y,y′}(y_*) − δ_{y,y′}(y′)| = |δ_{y,y′}(y) − δ_{y,y′}(y′)| = 2 r_{y,y′} > 4/3 r_{y,y′} for every y′ ∈ Y_0 \ O_G(y), so every term vanishes and δ_y(y_*) = 0.

3. Suppose y_* ∈ Y does not lie in ∪_{g∈G} B(g • y, ε). Then, there must be a y′ ∈ Y_0 \ O_G(y) such that y_* ∈ A_{y′}. Therefore, |δ_{y,y′}(y_*) − δ_{y,y′}(y′)| < r_{y,y′} < 4/3 r_{y,y′}. Then, we have 4/3 r_{y,y′} − |δ_{y,y′}(y_*) − δ_{y,y′}(y′)| > 4/3 r_{y,y′} − r_{y,y′} = 1/3 r_{y,y′} > 0 ⇒ δ̃_{y,y′}(y_*) > 1/3 r_{y,y′}. Therefore, we can set r_y := 1/3 min_{y′∈Y_0\O_G(y)} r_{y,y′} > 0. If δ_y(y_*) < r_y, it follows that for all y′ ∈ Y_0 \ O_G(y), δ̃_{y,y′}(y_*) < 1/3 r_{y,y′}, which implies y_* ∈ ∪_{g∈G} B(g • y, ε). Finally, this proves there is a g ∈ G such that d(y_*, g • y) < ε.

Since the absolute value function can be realised using ReLU activations, it is easy to see that δ_y ∈ C^{+1}.

(Continuing the proof of Theorem 34.) We can construct the orthogonal representation η : SO(2)/G(X) → SO(R^d) given by η(gG(X)) = R_{α(gG(X)) · 2π/θ} if G(X) is finite, and η(gG(X)) = I otherwise. It can be checked that in either case η is an injective group homomorphism and, therefore, a faithful representation of the group. Given u = [1, 0] ∈ S^1 ⊂ R^2, we can define a map ω_u : SO(R^d) → S^1 given by ω_u(R_α) = R_α u, which is another homeomorphism. Thus, the composition ψ := ω_u ∘ η ∘ ϕ_X is an injective function. Moreover, because the quotient map π : SO(2) → SO(2)/G(X) is a group homomorphism, we can compose it with η to lift the representation of SO(2)/G(X) to a representation ρ = η ∘ π of SO(2). Therefore, it follows that ψ is also equivariant with respect to this representation of SO(2). Finally, we can extend ψ to a function that is injective for any multiset by constructing a ψ_{O_G(X)} for each orbit O_G(X), where X is a representative of the orbit, using a map ω_u with u = [I-HASH(X), 0] (i.e. the norm encodes the orbit).

Alternative SO(2) Geometric Weisfeiler-Leman Test. Given a geometric graph (G, X, H), the SO(2)-GWL algorithm consists of the following steps:
1. For each node v ∈ G and u ∈ N(v), initialise:
2. For each node v ∈ G, update m^{l+1}_v := HASH_v(m^l_{u_1}, . . . , m^l_{u_{d(v)}}).
3.
Go back to step 2 until the partition induced by {{∥m^l_v∥ : v ∈ G}} becomes stable.
4. Return the colours of the nodes {{∥m^L_v∥ : v ∈ G}} at the last iteration L.

