NEURAL FIELD DISCOVERY DISENTANGLES EQUIVARIANCE IN INTERACTING DYNAMICAL SYSTEMS

Abstract

Systems of interacting objects often evolve under the influence of underlying field effects that govern their dynamics, e.g. electromagnetic fields in physics, or map topologies and traffic rules in traffic scenes. While the interactions between objects depend on local information, the underlying fields depend on global states. Pedestrians and vehicles in traffic scenes, for example, follow different traffic rules and social norms depending on their absolute geolocation. The entanglement of global and local effects makes recently popularized equivariant networks inapplicable, since they fail to capture global information. To address this, we propose to disentangle local object interactions (which are equivariant to global roto-translations and depend on relative positions and orientations) from external global field effects (which depend on absolute positions and orientations). We theorize the presence of latent fields, which we aim to discover without observing them directly, inferring them instead from the dynamics alone. We propose neural fields to learn the latent fields, and model the interactions with equivariant graph networks operating in local coordinate frames. We combine the two components in a graph network that transforms field effects into local frames and operates solely there. Our experiments show that we can accurately discover the underlying fields in charged particle settings, traffic scenes, and gravitational n-body problems, and effectively use them to learn the system and forecast future trajectories.

Figure 1: N-body system simulation with underlying gravitational field. We uncover fields that underlie interacting systems using trajectories only. Panels: Input Trajectories; Discovered Field.

1. INTRODUCTION

Systems of interacting objects are omnipresent in nature, with examples ranging from the subatomic to the astronomical scale (including colliding particles and n-body systems of celestial objects) as well as human-centric settings like traffic scenes, governed by social dynamics. Most of these systems do not evolve in a vacuum; instead, they evolve under the influence of underlying fields. For example, electromagnetic fields may govern the dynamics of charged particles. In traffic scenes, the road network and traffic rules govern the actions of traffic scene participants. N-body systems might swirl around supermassive black holes that create gravitational fields.

Earlier work on learning interacting systems proposed graph networks (Kipf et al., 2018; Battaglia et al., 2016; Sanchez-Gonzalez et al., 2020), while state-of-the-art methods for interacting systems propose equivariant graph networks (Walters et al., 2021; Satorras et al., 2021; Kofinas et al., 2021; Brandstetter et al., 2022) to model dynamics while respecting their underlying symmetries. These networks exhibit increased robustness and performance, while maintaining parameter efficiency due to weight sharing. They are, however, not compatible with underlying field effects, since they can only capture local states, such as relative positions, while fields depend on absolute states (positions or orientations). In other words, global fields violate the strict equivariance hypothesis. We are, thus, in need of an augmented notion of equivariance that encapsulates both local and global effects.

In many real-world settings, strict SE(3) equivariance (equivariance to the special Euclidean group of translations and rotations) does not hold.
(Formally, a function $f$ that predicts trajectories in interacting systems is equivariant to translations if $f(x) + \tau = f(x + \tau)$ for a translation vector $\tau$, and equivariant to rotations if $Q f(x) = f(Qx)$ for a rotation matrix $Q$.) That is, even if the symmetries exist in a particular setting, they only manifest themselves in local interactions, and they are entangled with global effects that stem from absolute states. For instance, pedestrians and vehicles in traffic scenes operate within egocentric perspectives; individual local coordinate frames allow for differentiating objects based on relative states. Objects, however, behave differently depending on their absolute geolocation, e.g. people in different countries or cities follow different traffic rules and social norms, and exhibit different driving habits. In such cases, global transformations are not even properly defined; it would be illogical to perform a translation or a rotation within the global coordinate frame of the Earth, since the result would correspond to a different location on the globe. Even within a local region of space, though, certain effects may depend only on global object states. N-body systems from physics, for example, exhibit roto-translational symmetries, since gravitational forces depend only on relative positions. Their dynamics, however, may be influenced by external gravitational fields, which are either unknown or not subject to transformations. Thus, strict equivariance is violated, since equivariant object interactions are entangled with global field effects.

In this work, we disentangle the interactions between objects (which are equivariant to global roto-translations and depend on relative positions and orientations) from external field effects (which depend on absolute positions and orientations). Furthermore, we propose neural fields to model the underlying field effects. Neural fields depend on absolute positions, and potentially orientations, and predict latent force fields.
We model discrete object interactions with equivariant local coordinate frame graph networks, and continuous field effects with neural fields.

We make the following contributions. First, we introduce the notion of entangled equivariance that intertwines global and local effects, and propose a novel architecture that disentangles equivariant local object interactions from global field effects. Second, we introduce neural fields to discover global latent fields in interacting dynamical systems, and infer them by observing the dynamics alone. Third, we propose an approximately equivariant graph network that extends local coordinate frame graph networks by introducing an auxiliary origin node, resulting in a mixture of global and local coordinate frames. Finally, we conduct experiments on a number of field settings, and observe that explicitly modelling fields is necessary for effective future forecasting, while their unsupervised discovery opens a window for model explainability. We term our method Aether, inspired by the postulated medium that permeates all of space and allows for the propagation of light.
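
The equivariance conditions stated above, and the way a global field violates them, can be checked numerically. The following is a minimal sketch under stated assumptions, not the Aether model: `pairwise_step` is a hypothetical toy update driven only by relative positions, and `external_field` is a hypothetical field that depends on absolute positions. Positions are stored as rows, so a rotation $Q$ is applied as `p @ Q.T`.

```python
import numpy as np

def pairwise_step(p, dt=0.1):
    """Toy update that depends only on relative positions (illustrative assumption)."""
    diff = p[None, :, :] - p[:, None, :]                  # diff[i, j] = p_j - p_i
    dist = np.linalg.norm(diff, axis=-1) + np.eye(len(p))  # add 1 on the diagonal to avoid 0/0
    force = (diff / dist[..., None] ** 3).sum(axis=1)      # inverse-square attraction
    return p + dt * force

def external_field(p):
    """Hypothetical global field: acts along x, with strength set by absolute y."""
    return np.stack([np.sin(p[:, 1]), np.zeros(len(p))], axis=-1)

def field_step(p, dt=0.1):
    """Same toy update, entangled with the global field."""
    return pairwise_step(p, dt) + dt * external_field(p)

rng = np.random.default_rng(0)
p = rng.normal(size=(5, 2))                # 5 objects in 2D
tau = np.array([1.5, -0.7])                # translation vector
theta = 0.3
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # rotation matrix

for name, f in [("pairwise only", pairwise_step), ("with field", field_step)]:
    trans_ok = np.allclose(f(p + tau), f(p) + tau)   # f(x + tau) == f(x) + tau
    rot_ok = np.allclose(f(p @ Q.T), f(p) @ Q.T)     # f(Qx) == Q f(x)
    print(f"{name}: translation equivariant={trans_ok}, rotation equivariant={rot_ok}")
```

In this toy setting, the interaction-only update satisfies both conditions, while adding the position-dependent field breaks them, which is the entanglement that motivates treating the two effects separately.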

2. BACKGROUND

In this section, we introduce background knowledge on interacting systems, local coordinate frame graph networks, and neural fields, which will serve as a foundation for our method.

2.1. INTERACTING DYNAMICAL SYSTEMS

An interacting dynamical system comprises trajectories of $N$ objects, recorded for $T$ timesteps. The snapshot of the $i$-th object at timestep $t$ describes the state $x_i^t = [p_i^t, u_i^t]$, $i \in \{1, \ldots, N\}$, $t \in \{1, \ldots, T\}$, where $p$ denotes the position and $u$ denotes the velocity, using $[\cdot, \cdot]$ to denote vector concatenation along the feature dimension. Interacting dynamical systems can be naturally formalized as spatio-temporal geometric graphs (Battaglia et al., 2016; Kipf et al., 2018; Graber & Schwing, 2020), $\mathcal{G} = \{\mathcal{G}^t\}_{t=1}^{T}$, with graph snapshots $\mathcal{G}^t = (\mathcal{V}^t, \mathcal{E}^t)$ at different timesteps. The set of graph nodes $\mathcal{V}^t = \{v_1^t, \ldots, v_N^t\}$ describes the objects in the system; $v_i^t$ corresponds to $x_i^t$. The set of edges $\mathcal{E}^t \subseteq \{(v_j^t, v_i^t) \mid (v_j^t, v_i^t) \in \mathcal{V}^t \times \mathcal{V}^t\}$ describes pair-wise object interactions; $(v_j^t, v_i^t)$ corresponds to an interaction from node $j$ to node $i$. Finally, we denote the graph neighbors of node $v_i$ with $\mathcal{N}(i)$.
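
The notation above maps onto a simple array layout. The sketch below is illustrative only: the sizes, the random data, and the helper names `fully_connected_edges` and `neighbors` are assumptions, as is the choice of a fully connected edge set without self-loops (the text only requires the edges to be a subset of all node pairs). Indices are 0-based in the code, whereas the text counts from 1, and neighbors are taken to be the senders of incoming edges.

```python
import numpy as np

T, N, D = 4, 3, 2                      # timesteps, objects, spatial dimensions (illustrative)
rng = np.random.default_rng(0)
positions = rng.normal(size=(T, N, D))   # p_i^t
velocities = rng.normal(size=(T, N, D))  # u_i^t

# x_i^t = [p_i^t, u_i^t]: concatenate along the feature dimension -> shape (T, N, 2D)
states = np.concatenate([positions, velocities], axis=-1)

def fully_connected_edges(n):
    """Directed edges (j, i), meaning an interaction from node j to node i.

    Assumption: every ordered pair with j != i is an edge.
    """
    return [(j, i) for i in range(n) for j in range(n) if j != i]

def neighbors(edges, i):
    """Graph neighbors N(i), taken here as the senders j of edges (j, i)."""
    return sorted(j for (j, k) in edges if k == i)

edges = fully_connected_edges(N)

# One snapshot G^t = (V^t, E^t) per timestep; node features are the states x_i^t.
graph = [{"nodes": states[t], "edges": edges} for t in range(T)]

print(states.shape)          # (T, N, 2D)
print(neighbors(edges, 0))   # senders into node 0
```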

2.2. LOCAL COORDINATE FRAME GRAPH NETWORKS

Local coordinate frame graph networks have been popularized in recent years (Kofinas et al., 2021; Luo et al., 2022) as a method to achieve SE(3) equivariance, due to their low computational overhead and high performance. Kofinas et al. (2021) proposed LoCS and introduced local coordinate frames

