EFFICIENT DISCOVERY OF DYNAMICAL LAWS IN SYMBOLIC FORM

Abstract

We propose a transformer-based sequence-to-sequence model that recovers scalar ordinary differential equations (ODEs) in symbolic form from time-series data of a single observed solution trajectory of the ODE. Our method is efficiently scalable: after one-time pretraining on a large set of ODEs, we can infer the governing law of a new observed solution in a few forward passes of the model. First, we generate and make available a large dataset of more than 3M ODEs together with more than 63M numerical solutions for different initial conditions, which may serve as a useful benchmark for future work on machine learning for dynamical systems. We then show that our model performs on par with or better than existing methods across various test cases in terms of accurate symbolic recovery of the ODE, especially for more complex expressions. Reliably recovering the symbolic form of dynamical laws is important, as it allows for further dissemination of the inferred dynamics as well as meaningful modifications for predictions under interventions.

1. INTRODUCTION

Science is commonly described as the "discovery of natural laws through experimentation and observation". Researchers in the natural sciences increasingly turn to machine learning (ML) to aid the discovery of natural laws from observational data alone, which is often abundantly available, hoping to bypass expensive and cumbersome targeted experimentation. While there may be fundamental limitations to what can be extracted from observations alone, recent successes of ML across the natural sciences provide ample reason for excitement. In this work, we focus on ordinary differential equations, a ubiquitous description of dynamical natural laws in physics, chemistry, and systems biology. For a first-order ODE ẏ := dy/dt = f(y, t), we call f (which uniquely defines the ODE) the underlying dynamical law. Informally, our goal is then to infer f in symbolic form given discrete time-series observations of a single solution {y_i := y(t_i)}_{i=1}^{n} of the underlying ODE. Contrary to "black-box" techniques such as Neural Ordinary Differential Equations (NODE) (Chen et al., 2018) that aim at inferring a possible f as an arguably opaque neural network, we focus specifically on symbolic regression. From the perspective of the sciences, a law of nature is useful insofar as it applies more broadly than to merely describe a single observation. In particular, the reason to learn a dynamical law in the first place is to dissect and understand it as well as to make predictions about situations that differ from the observed one. From this perspective, a symbolic representation of the law (in our case the function f) has several advantages over black-box representations: it is compact and directly interpretable, it is amenable to analytical treatment, and it allows for meaningful changes, thus enabling the assessment of interventions and counterfactuals.
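To make the setting concrete, the observations {y_i := y(t_i)}_{i=1}^{n} are typically produced by numerically integrating the (unknown) law f from some initial condition. The following is a minimal sketch of how such a discrete solution trajectory can be generated with a classical Runge-Kutta (RK4) scheme; the function name `rk4_trajectory` and the example law f(y) = -0.5·y are illustrative choices, not part of the paper's method:

```python
import math

def rk4_trajectory(f, y0, t0, t1, n):
    """Classical fourth-order Runge-Kutta integration of a scalar
    autonomous ODE ydot = f(y), returning the discrete observations
    {(t_i, y_i)} that a symbolic-regression model would consume."""
    h = (t1 - t0) / n
    t, y = t0, y0
    traj = [(t, y)]
    for _ in range(n):
        k1 = f(y)
        k2 = f(y + 0.5 * h * k1)
        k3 = f(y + 0.5 * h * k2)
        k4 = f(y + h * k3)
        y += (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
        traj.append((t, y))
    return traj

# Example law f(y) = -0.5*y with known closed-form solution y(t) = y0*exp(-t/2),
# so the numerical trajectory can be checked against the analytic one.
traj = rk4_trajectory(lambda y: -0.5 * y, y0=2.0, t0=0.0, t1=1.0, n=100)
t_end, y_end = traj[-1]
print(y_end, 2.0 * math.exp(-0.5))
```

For a smooth law like this, RK4 with 100 steps agrees with the closed-form solution to high precision, which is the kind of convergence check the dataset construction described below relies on.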
In this work, we develop Neural Symbolic Ordinary Differential Equation (NSODE), a sequence-to-sequence transformer that efficiently infers governing ODEs in symbolic form from a single observed solution trajectory and makes use of massive pretraining. We first (randomly) generate a total of >3M scalar, autonomous, non-linear, first-order ODEs, together with a total of >63M numerical solutions from various (random) initial conditions. All solutions are carefully checked for convergence of the numerical integration. This dataset is unprecedented in both its scale and diversity and will be made publicly available alongside the code that was used to generate it. We then devise NSODE as a sequence-to-sequence transformer that maps observed trajectories, i.e., numeric sequences of the form {(t_i, y_i)}_{i=1}^{n}, directly to symbolic equations as strings, e.g., "y ** 2 + 1.64 * cos(y)", which is the prediction for f. This example directly highlights the benefit
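Since the model outputs f as a plain string, a natural sanity check is to turn the predicted string back into a callable and compare it against derivative estimates from the observed trajectory. The sketch below illustrates this for the example expression above; the restricted-`eval` parsing and the finite-difference check are illustrative assumptions, not the paper's evaluation protocol:

```python
import math

# Hypothetical model output: the predicted law f as a string.
pred = "y ** 2 + 1.64 * cos(y)"

# Turn the string into a callable; the evaluation namespace is restricted
# to a few math functions rather than full builtins.
f_hat = eval("lambda y: " + pred,
             {"cos": math.cos, "sin": math.sin, "exp": math.exp})

# Simulate observations of the true ODE ydot = y^2 + 1.64*cos(y)
# with small forward-Euler steps on a short horizon.
dt, y = 1e-4, 0.1
ys = [y]
for _ in range(2000):
    y += dt * (y ** 2 + 1.64 * math.cos(y))
    ys.append(y)

# Central finite differences approximate ydot at interior points;
# a faithful symbolic candidate should match them closely.
resid = max(
    abs((ys[i + 1] - ys[i - 1]) / (2 * dt) - f_hat(ys[i]))
    for i in range(1, len(ys) - 1)
)
print(resid)
```

A small residual indicates the candidate expression is numerically consistent with the observed dynamics; this is exactly why a symbolic prediction is convenient to validate, modify, and reuse.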

