AUTOREGRESSIVE GRAPH NETWORK FOR LEARNING MULTI-STEP PHYSICS

Abstract

In this work, we propose an Autoregressive Graph Network (AGN) that learns forward physics using a temporal inductive bias. Currently, temporal state space information is provided as additional input to a GN when generating roll-out physics simulations. While this improves the network's predictive performance over multiple time steps, a temporal model enables the network to induce and learn temporal biases. In dynamical systems, the arrow of time simplifies possible interactions in the sense that current observations can be assumed to depend on preceding states. Our proposed GN encodes temporal state information using an autoregressive encoder that computes latent temporal embeddings over multiple time steps in parallel during a single forward pass. We perform case studies that compare multi-step forward predictions against baseline data-driven one-step GNs as well as multi-step sequential models across diverse datasets featuring different particle interactions. When conditioned on the optimal number of historical states, our approach outperforms the baseline GN and physics-induced GNs, each in 8 out of 10 particle physics datasets. Further, through an energy analysis we find that our method not only accumulates the least roll-out error but also conserves energy more efficiently than a baseline Graph Transformer Network while having an order of magnitude fewer parameters.

1. INTRODUCTION

In recent years, there has been growing interest in learning physics with the help of deep learning coupled with other techniques such as inductive biases, physics-informed loss functions and meta-learning (Fragkiadaki et al. (2016); Battaglia et al. (2016); Xu et al. (2019); Hall et al. (2021)). Relational networks such as Graph Networks (GNs) can decompose and learn the dynamics of a physics system on the basis of particle interactions within their neighborhoods (Battaglia et al. (2016); Li et al. (2018); Sanchez-Gonzalez et al. (2020)). Across science and engineering, particle states often contain system- and particle-specific properties such as mass, density, velocity, particle type, etc. that are required to approximate the dynamics of a system. In general, given the current state of a system of particles along with particle-specific local properties and global system properties, it is possible to apply GNs to predict the trajectory of the system (Sanchez-Gonzalez et al. (2018; 2020)). Often referred to as the forward problem, this setting assumes knowledge of the physical properties of the particles and therefore utilizes the observations to construct a suitable model that predicts the trajectory of the system of particles. The solution to a typical forward dynamics problem governed by an ODE involving particles can be parameterized using a GNN by learning from the current state or from a history of previous particle states. There are strong benefits to training on entire sequences or multiple time steps (Mohajerin (2017); Xu et al. (2019)), as one-step GNs tend to be unstable and accumulate error in the long run. While prior work has shown that concatenating a history of previous states enables a trained simulator such as a GNN to predict the next state more accurately, a sequential model can additionally capture certain symmetries, e.g., the arrow of time and conservation of energy and momentum.
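The history-conditioned rollout described above can be sketched as follows. Note that `rollout`, `step_fn`, the history length `H`, and the array shapes are illustrative assumptions for a generic one-step simulator, not the paper's actual interface; the sketch shows how predictions are fed back in, which is why one-step errors compound over long rollouts:

```python
import numpy as np

def rollout(step_fn, history, n_steps):
    """Roll out a learned one-step simulator autoregressively.

    step_fn maps a stack of the H most recent states, shape (H, n, d),
    to the next state, shape (n, d). Each prediction is appended to the
    conditioning window, so any one-step error compounds over the rollout.
    """
    states = list(history)              # H initial conditioning states
    H = len(history)
    preds = []
    for _ in range(n_steps):
        nxt = step_fn(np.stack(states[-H:]))
        preds.append(nxt)
        states.append(nxt)              # feed the prediction back in
    return np.stack(preds)
```

With a ground-truth integrator in place of `step_fn` this recovers the exact trajectory; with a learned surrogate, the gap between the two grows with `n_steps`.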
Sequential models such as RNNs, LSTMs, GRUs and Transformers have been applied to 1D time series and N-body systems (Chen et al. (2018); Zhang et al. (2020); Han et al. (2022)). While appealing choices for modeling dynamical systems due to their implicit memory mechanisms, they require sequential computations that incur significant memory overhead as the lookback length and/or the dimensionality of the problem increases. Further, to improve predictive performance, the original feature space is often mapped to a hidden latent space on which Transformers and the family of sequential models operate. The map from the original feature space to a latent space at each instant in time is often performed using linear layers such as an MLP. Much of the previous work on physics-based deep learning has found success in learning a linear representation of the dynamics in a latent feature space. While MLPs model the relational information between features as an undirected and fully connected graph, a recurrent or autoregressive model learns a causal graph between them. In a particle-based system, such a causal graph can encode numerous dependencies between state space variables. Fig. 1 illustrates the common causal relations learned by sequential models and the structural differences in the graphs encoded in the latent space. The key difference between using an autoregressive and a non-autoregressive graph to map a state space to a latent space reduces to mapping a particle's single k × d dimensional feature vector to the latent space versus mapping k separate embeddings, each of which captures a different causal relation between state space variables. Our central hypothesis is therefore that a latent space encoded with a causal graph enables particle-based ML surrogate models to learn long-range dependencies that obey conservation principles much better than an undirected graph.
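The distinction between an undirected, fully connected latent graph and a causal one can be made concrete with a masked attention sketch. This is a generic single-head causal attention over k temporal states, not the paper's AGN encoder; the weight matrices `Wq`, `Wk`, `Wv` are illustrative assumptions:

```python
import numpy as np

def causal_attention(X, Wq, Wk, Wv):
    """Single-head attention over k temporal states X of shape (k, d).

    The lower-triangular mask encodes the arrow of time as a DAG:
    embedding t attends only to states 0..t, yet all k temporal
    embeddings are computed in one parallel forward pass. Removing
    the mask recovers the undirected, fully connected case.
    """
    k = X.shape[0]
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    mask = np.tril(np.ones((k, k), dtype=bool))   # edge t <- {0..t}
    scores = np.where(mask, scores, -np.inf)      # block future states
    A = np.exp(scores - scores.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)
    return A @ V                                  # k causal embeddings
```

Because of the mask, the first temporal embedding is a function of the first state alone, while the last embedding may depend on the entire history, matching the directed structure in Fig. 1.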
Therefore, in the absence of a structure on the state space variables, spurious inductive biases, such as the next states of a system of particles affecting the initial states, can be learned, when in practice such scenarios are not encountered when approximating the dynamics of a forward simulation problem.

Figure 1: Structure of the state space with/without the autoregressive property. The temporal state nodes constitute an asymmetric or DAG-like temporal graph, which further contains a state space graph within each node. The directed edges do not allow interactions between the previous and the next state variables, while the undirected edges allow such interactions.

The key contributions of the proposed approach are as follows:
• An Autoregressive Graph Encoder (AGN) that explicitly induces the arrow-of-time bias (i.e., previous states affect future states) to capture causal relations between state space variables in the latent space.
• The induced temporal bias enables the Graph Network to achieve superior energy conservation and roll-out error accumulation over long time horizons.
• Comparable multi-step prediction performance against a Graph Transformer model whilst requiring an order of magnitude fewer parameters.




To avoid sequential computations while retaining relational information about previous states, masking has been employed as a successful strategy for enabling feed-forward neural networks to enjoy the best of both worlds (Germain et al. (2015); Van Den Oord et al. (2016); Papamakarios et al. (2017)). Autoregressive models have been widely used across machine learning to learn conditional dependencies when modeling distributions (Rezende & Mohamed (2015); Papamakarios et al. (2017)) and to learn long-range predictions using masked/causal Transformer-like models (Ghazvininejad et al. (2019); Shi et al. (2020); Han et al. (2022)).
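A minimal sketch of this masking strategy, in the spirit of MADE-style masked feed-forward layers (the function name and shapes are illustrative, not taken from any of the cited works): zeroing the weights from inputs j ≥ i makes output i depend only on earlier inputs, so a single matrix multiply yields all autoregressive conditionals at once, with no sequential loop.

```python
import numpy as np

def masked_linear(X, W, b):
    """Feed-forward layer with a strictly lower-triangular weight mask.

    X: (batch, d) inputs, W: (d, d) weights, b: (d,) bias.
    Masking enforces output i = f(inputs 0..i-1), i.e. an
    autoregressive dependency structure, in one parallel pass.
    """
    mask = np.tril(np.ones_like(W), k=-1)   # keep W[i, j] only if j < i
    return X @ (W * mask).T + b
```

Stacking such layers (with appropriate masks, as in MADE) preserves the autoregressive property through depth while remaining a plain feed-forward computation.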

