SPATIAL ATTENTION KINETIC NETWORK WITH E(N)-EQUIVARIANCE

Abstract

Neural networks that are equivariant to rotations, translations, reflections, and permutations on n-dimensional geometric space have shown promise in physical modeling, from approximating potential energy surfaces to forecasting the time evolution of dynamical systems. Current state-of-the-art methods employ spherical harmonics to encode higher-order interactions among particles, which are computationally expensive. In this paper, we propose a simple alternative functional form that uses neurally parametrized linear combinations of edge vectors to achieve equivariance while still universally approximating node environments. Incorporating this insight, we design spatial attention kinetic networks with E(n)-equivariance, or SAKE, which are competitive with state-of-the-art models on many-body system modeling tasks while being significantly faster.

1. INTRODUCTION

Encoding the relevant symmetries of systems of interest into the inductive biases of deep learning architectures has been shown to be crucial in physical modeling. Graph neural networks (GNNs) (Kipf and Welling, 2016; Xu et al., 2018; Gilmer et al., 2017; Hamilton et al., 2017; Battaglia et al., 2018), for instance, preserve permutation equivariance by applying indexing-invariant pooling functions among nodes (particles) and edges (pair-wise interactions), and have emerged as a powerful workhorse in a wide range of modeling tasks for many-body systems (Satorras et al., 2021). When describing not only the topology of the system but also the geometry of the state, the relevant symmetry groups for three-dimensional systems are SO(3) (rotational equivariance), SE(3) (rotational and translational equivariance), and E(3) (additionally with reflectional equivariance).

A ubiquitous and naturally invariant first attempt to encode the geometry of such systems is to employ only radial information, i.e., interparticle distances. This alone has empirically shown utility in predicting quantum chemical potential energies, and considerable effort has been made in the fine-tuning of radial filters to achieve quantum chemical accuracy (1 kcal/mol, the empirical threshold for qualitatively reliable predictions of the behavior of a quantum mechanical system) and beyond (Schütt et al., 2017). Nonetheless, radial information alone is not sufficient to fully describe node environments, that is, the spatial distribution of neighbors around individual particles: the relative locations of particles around a central node can change drastically while the distances to those neighbors remain unaltered. To describe node environments completely, one needs to address these remaining degrees of freedom. Current state-of-the-art approaches encode angular distributions by employing a truncated series of spherical harmonics to generate higher-order feature representations; while these models have been shown to be data-efficient for learning properties of physical systems, the features are expensive to compute, with the expense growing rapidly with the order of harmonics included (Thomas et al., 2018; Klicpera et al., 2021a; Fuchs et al., 2020; Batzner et al., 2021; Anderson et al., 2019). This prohibitive cost prevents this otherwise performant class of models from being employed in materials and drug design, where rapid simulations of large systems are crucial to provide quantitative insights.

Here, we design a simple functional form, which we call spatial attention, that uses the norms of a set of neurally parametrized linear combinations of edge vectors to describe the node environment. Though simple in form, easy to engineer, and ultra-fast to compute, spatial attention is capable of universally approximating any function defined on the local node environment while preserving E(n)-invariance/equivariance in arbitrary n-dimensional space. After demonstrating the approximation universality and invariance of spatial attention, we incorporate it into a novel neural network architecture that uses spatial attention to parametrize fictitious velocities and positions equivariantly, which we call a spatial attention kinetic network with E(n)-equivariance, or SAKE (pronounced saké (sah-keh), like the Japanese rice wine; implementation: https://github.com/choderalab/sake). To demonstrate the robustness and versatility of SAKE, we benchmark its performance on potential energy approximation and on dynamical-system forecasting and sampling tasks.
On popular benchmarks, SAKE achieves performance competitive with state-of-the-art models across a wide range of invariant (MD17: Table 1, QM9: Table 3, ISO17: Table 2) and equivariant (N-body charged particle: Table 4, walking motion: Table 6) tasks while requiring only a fraction of their training and inference time.
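To make the central construction concrete, the following is a minimal NumPy sketch of spatial attention's invariant building block: take the norms of linear combinations of the edge vectors around a node. This is an illustrative sketch, not the released implementation; the function name spatial_attention_features and the dimensions are assumptions, and the combination weights, which in the model are neurally parametrized from invariant features, are drawn at random here to keep the sketch self-contained.

import numpy as np

def spatial_attention_features(x_v, x_neighbors, lam):
    """E(n)-invariant description of a node environment.

    x_v:         (3,) position of the central node
    x_neighbors: (M, 3) positions of its M neighbors
    lam:         (D, M) combination weights (random placeholders here;
                 a real model predicts them from invariant edge features)
    Returns the (D,) norms of D linear combinations of edge vectors.
    """
    edges = x_neighbors - x_v                # (M, 3) edge vectors
    combos = lam @ edges                     # (D, 3) linear combinations
    return np.linalg.norm(combos, axis=-1)   # (D,) invariant norms

# Invariance check under a random rigid motion (rotation + translation).
rng = np.random.default_rng(0)
x_v, x_nbr = rng.normal(size=3), rng.normal(size=(5, 3))
lam = rng.normal(size=(4, 5))
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthogonal matrix
t = rng.normal(size=3)                        # random translation

f = spatial_attention_features(x_v, x_nbr, lam)
f_moved = spatial_attention_features(x_v @ R + t, x_nbr @ R + t, lam)
assert np.allclose(f, f_moved)  # identical features after rigid motion

The invariance follows directly: translations cancel when forming edge vectors, every edge vector rotates with the frame while the combination weights do not, and norms are unchanged by orthogonal transformations.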

2. BACKGROUND

In this section, we provide some theoretical background on physical modeling, equivariance, and graph neural networks to lay the groundwork for the exposition of spatial attention kinetic networks.

2.1. EQUIVARIANCE: PERMUTATIONAL, ROTATIONAL, TRANSLATIONAL, AND REFLECTIONAL

A function f : X → Y is said to be equivariant to a symmetry group G if

f(T_g(x)) = S_g(f(x)), ∀ g ∈ G,    (1)

for some equivalent transformations on the two spaces, T_g : X → X and S_g : Y → Y, respectively. On an n-dimensional space X = Y = R^n: if, for a permutation P, T_g(x) = Px and S_g(y) = Py satisfy Equation 1, we say f is permutationally equivariant; if T_g(x) = xR, where R ∈ R^(n×n) is a rotation matrix (RR^T = I), and S_g(y) = yR, we say f is rotationally equivariant; if T_g(x) = x + ∆x and S_g(y) = y + ∆x, where ∆x ∈ R^n, we say f is translationally equivariant; finally, if T_g(x) = Ref_θ(x) and S_g(y) = Ref_θ(y), where Ref_θ is a reflection on n-dimensional space, we say f is reflectionally equivariant.
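As a quick numerical illustration of these definitions (a toy check, not code from the paper), the centroid of a point cloud is rotationally and translationally equivariant, while interparticle distances are invariant, i.e., S_g is the identity:

import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(10, 3))                  # positions of 10 particles
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthogonal R, RR^T = I
dx = rng.normal(size=3)                       # random translation ∆x

centroid = lambda x: x.mean(axis=0)
distances = lambda x: np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)

# Rotational equivariance: f(xR) = f(x)R.
assert np.allclose(centroid(x @ R), centroid(x) @ R)
# Translational equivariance: f(x + ∆x) = f(x) + ∆x.
assert np.allclose(centroid(x + dx), centroid(x) + dx)
# Interparticle distances are invariant: S_g is the identity.
assert np.allclose(distances(x @ R + dx), distances(x))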






2.2. GRAPH NEURAL NETWORKS

Modern GNNs, which exchange and summarize information among nodes and edges, are better analyzed through the spatial rather than the spectral lens, according to Wu et al. (2019)'s classification. Following the framework of Gilmer et al. (2017); Xu et al. (2018); Battaglia et al. (2018), for a node v with neighbors u ∈ N(v) in a graph G, with h^(k)_v denoting the feature of node v at the k-th layer (or k-th round of message passing) and h^(0)_v ∈ R^C the initial node feature in the embedding space, the k-th message-passing step of a GNN can be written as three steps. First, an edge update,

h^(k+1)_euv = ϕ^e(h^(k)_u, h^(k)_v, h^(k)_euv),    (2)

where the feature embeddings h_u and h_v of two connected nodes u and v update their edge embedding h_euv; next, neighborhood aggregation,

a^(k+1)_v = ρ^(e→v)({h^(k+1)_euv, u ∈ N(v)}),    (3)

where the edges incident to a node v pool their embeddings to form the aggregated neighbor embedding a_v; and finally, a node update,

h^(k+1)_v = ϕ^v(a^(k+1)_v, h^(k)_v).    (4)
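The following minimal NumPy sketch implements one such message-passing step. It is an illustration of Equations 2-4 under assumed choices (single linear layers with ReLU for ϕ^e and ϕ^v, sum aggregation for ρ^(e→v), dense adjacency, and arbitrary dimensions), not the paper's implementation.

import numpy as np

rng = np.random.default_rng(2)
C = 8                                                # feature dimension
relu = lambda z: np.maximum(z, 0.0)
W_e = rng.normal(size=(3 * C, C)) / np.sqrt(3 * C)   # weights of ϕ^e
W_v = rng.normal(size=(2 * C, C)) / np.sqrt(2 * C)   # weights of ϕ^v

def message_passing_step(h, h_e, adj):
    """One round of Equations 2-4 on dense (N, N) edge tensors.

    h:   (N, C) node features h^(k)
    h_e: (N, N, C) edge features h^(k)_euv
    adj: (N, N) binary adjacency, adj[u, v] = 1 iff u ∈ N(v)
    """
    N = h.shape[0]
    # Edge update (Eq. 2): ϕ^e applied to (h_u, h_v, h_euv) for every pair.
    h_u = np.repeat(h[:, None, :], N, axis=1)        # (N, N, C), row index u
    h_v = np.repeat(h[None, :, :], N, axis=0)        # (N, N, C), column index v
    h_e_new = relu(np.concatenate([h_u, h_v, h_e], axis=-1) @ W_e)
    # Neighborhood aggregation (Eq. 3): sum-pool the edges incident to v.
    a = np.einsum("uv,uvc->vc", adj, h_e_new)        # (N, C)
    # Node update (Eq. 4): ϕ^v applied to (a_v, h_v).
    h_new = relu(np.concatenate([a, h], axis=-1) @ W_v)
    return h_new, h_e_new

# One step on a random 5-node graph.
N = 5
h1, e1 = message_passing_step(
    rng.normal(size=(N, C)),
    rng.normal(size=(N, N, C)),
    (rng.random((N, N)) < 0.5).astype(float),
)

Because the sum in Equation 3 is indexing-invariant, the step is permutation-equivariant, matching the discussion of GNN inductive biases in Section 1.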

