HOMOTOPY-BASED TRAINING OF NEURALODES FOR ACCURATE DYNAMICS DISCOVERY

Abstract

Conceptually, Neural Ordinary Differential Equations (NeuralODEs) are an attractive way to extract dynamical laws from time series data, as they are natural extensions of the traditional differential equation-based modeling paradigm of the physical sciences. In practice, however, NeuralODEs suffer from long training times and suboptimal results, especially on longer-duration data, where they may fail to fit the data altogether. While methods have been proposed to stabilize NeuralODE training, many of them place a strong constraint on the functional form the trained NeuralODE can take, a constraint that the actual underlying governing equation is not guaranteed to satisfy. In this work, we present a novel NeuralODE training algorithm that leverages two tools from the chaos and mathematical optimization communities, synchronization and homotopy optimization, to overcome this training obstacle without any changes to the model architecture. Compared to conventional training methods, our algorithm achieves drastically lower loss values, demonstrating that architectural constraints are unnecessary for effective NeuralODE training. Experiments on both simulated and real systems with complex temporal behaviors demonstrate that NeuralODEs trained with our algorithm accurately capture true long-term behaviors and correctly extrapolate into the future.

1. INTRODUCTION

Predicting the evolution of a time-varying system and discovering the mathematical models that govern it is paramount to both deeper scientific understanding and potential engineering applications. The centuries-old paradigm for tackling this problem was to either ingeniously deduce empirical rules from experimental data, or mathematically derive physical laws from first principles. However, the complexity of the systems of interest has grown so much that these traditional approaches are now often insufficient. This has led to a growing interest in using machine learning methods to infer dynamical laws from data. One school of thought, exemplified by the seminal works of Schmidt & Lipson (2009) and Brunton et al. (2016), focuses on deducing the exact symbolic form of the governing equations from data using techniques such as genetic algorithms or sparse regression. While these methods are powerful in that they output mathematical equations that are directly human-interpretable, they require prior information on the possible terms that may enter the underlying equation. This hinders the application of symbolic approaches to scenarios where there is insufficient prior information on the candidate terms, or to complex, nonlinear systems whose governing equations involve non-elementary functions. On the other hand, neural network-based methods, such as Raissi et al. (2018), leverage the universal approximation capabilities of neural networks to model the underlying dynamics of the system without explicitly involving mathematical formulae. Of the various architectural designs in the literature, Neural Ordinary Differential Equations (NeuralODEs) (Chen et al., 2018) stand out in particular because they seamlessly incorporate neural networks inside ordinary differential equations (ODEs), thus bridging the expressivity and flexibility of neural networks with the de facto mathematical language of the physical sciences.
Subsequent works have expanded on this idea, including blending NeuralODEs with partial information on the form of the governing equation to produce "grey-box" dynamics models (Rackauckas et al., 2021), and endowing NeuralODEs with mathematical structures that the system must satisfy (Greydanus et al., 2019; Finzi et al., 2020). However, despite the conceptual elegance of NeuralODEs, training these models tends to result in long training times and sub-optimal results, a problem that is further exacerbated as the length of the training data grows (Ghosh et al., 2020; Finlay et al., 2020). Different methods have been proposed to tackle this problem, but the majority of approaches to date place either strong (Choromanski et al., 2020; Hasani et al., 2021) or semi-strong constraints (Finlay et al., 2020; Kidger et al., 2021) on the functional form the NeuralODE can take, something the underlying governing equation is not guaranteed to satisfy.

Contributions. We introduce a novel training algorithm that does not require architectural constraints to accurately train NeuralODEs on long time series data. As our algorithm is inspired by ideas from the chaos and mathematical optimization literature, we provide a background survey of these ideas before presenting both a general framework and a specific implementation of our algorithm. Experiments on systems of varying difficulty demonstrate that our method consistently outperforms conventional gradient descent-based training, with the resulting trained NeuralODEs having both higher interpolation and extrapolation capabilities than their counterparts. In particular, for the relatively simple Lotka-Volterra system, we report a ×10^2 improvement in interpolation error and a staggering ×10^7 improvement in extrapolation error, showcasing the power of our new approach.

2. BACKGROUND

2.1 NEURAL ORDINARY DIFFERENTIAL EQUATIONS

A NeuralODE (Chen et al., 2018) is a model of the form

    du/dt = U(t, u; θ),   u(t = t_0) = u_0,   (1)

where u_0 ∈ R^n is the initial condition or input given to the model, and U(·, ·; θ): R × R^n → R^n is a neural network with parameters θ ∈ R^m that governs the dynamics of the model state u ∈ R^n over time t ∈ R. The value of the model state at a given time can then be evaluated by numerically integrating equation 1 starting from the initial condition.

In this paper, we concern ourselves with the problem of training NeuralODEs on time series data. Specifically, given a monotonically increasing sequence of time points {t^(i)}_{i=0}^{N} and the corresponding vector-valued measurements {û^(i) ∈ R^n}_{i=0}^{N}, we wish to train a NeuralODE on the data to learn the underlying governing equation and forecast future data.

Conventionally, NeuralODE training starts by using an ordinary differential equation (ODE) solver to numerically integrate equation 1 to obtain the model state u at the given time points:

    {u^(i)(θ)}_{i=0}^{N} = ODESolve(du/dt = U(t, u; θ), {t^(i)}_{i=0}^{N}, u_0),   (2)

with u^(i)(θ) being shorthand for u(t^(i); θ). Afterwards, the loss function L(θ): R^m → R is computed according to

    L(θ) = (1/(N+1)) Σ_i l(u^(i)(θ), û^(i)),   (3)

where l(u, û) is the pairwise loss function. In this paper, we adopt the widely used mean-squared error l(u, û) = ||u − û||^2 / n, but other metrics such as the L1 loss can be used (Finzi et al., 2020; Kim et al., 2021).

Training is performed by minimizing equation 3 via gradient descent. A non-trivial aspect of this process is that computing ∇_θ L requires differentiating through the ODESolve operation. This can be done either by directly backpropagating through the internals of the ODE solver algorithm, which returns accurate gradients but is memory intensive, or by the "adjoint method", which solves an auxiliary set of ODEs to obtain gradients at a low memory cost, but can yield inaccurate gradients.
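The forward pass of this conventional training setup (equations 1-3) can be sketched as follows. This is a minimal illustration, not the paper's implementation: all names (U, ode_solve, loss), the one-hidden-layer network, and the fixed-step RK4 integrator are our own assumptions; a practical implementation would use an ODE suite with automatic differentiation so that ∇_θ L can actually be computed.

```python
# Illustrative sketch of equations 1-3: a neural vector field U(t, u; theta),
# an ODESolve over the measurement time points, and the MSE loss L(theta).
# All names and the RK4 scheme are assumptions made for this sketch.
import numpy as np

rng = np.random.default_rng(0)
n, hidden = 2, 16  # state dimension n, hidden-layer width
theta = {
    "W1": rng.normal(scale=0.1, size=(hidden, n + 1)),  # +1 input for time t
    "b1": np.zeros(hidden),
    "W2": rng.normal(scale=0.1, size=(n, hidden)),
    "b2": np.zeros(n),
}

def U(t, u, theta):
    """Neural vector field U(t, u; theta): a one-hidden-layer MLP."""
    x = np.concatenate(([t], u))
    h = np.tanh(theta["W1"] @ x + theta["b1"])
    return theta["W2"] @ h + theta["b2"]

def ode_solve(theta, ts, u0):
    """Integrate du/dt = U(t, u; theta) from u(t_0) = u0 through the time
    points ts with fixed-step RK4, returning u^(i)(theta) at each t^(i)."""
    us, u = [u0], u0
    for t0, t1 in zip(ts[:-1], ts[1:]):
        steps = 10                      # RK4 sub-steps per interval
        h, t = (t1 - t0) / steps, t0
        for _ in range(steps):
            k1 = U(t, u, theta)
            k2 = U(t + h / 2, u + h / 2 * k1, theta)
            k3 = U(t + h / 2, u + h / 2 * k2, theta)
            k4 = U(t + h, u + h * k3, theta)
            u = u + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
            t += h
        us.append(u)
    return np.stack(us)                 # shape (N + 1, n)

def loss(theta, ts, u_hat):
    """Equation 3: mean over i of the pairwise MSE ||u - u_hat||^2 / n."""
    u = ode_solve(theta, ts, u_hat[0])  # use first measurement as u_0
    return np.mean(np.sum((u - u_hat) ** 2, axis=1) / n)

# Toy data: N + 1 = 5 measurements of a 2-d trajectory.
ts = np.linspace(0.0, 1.0, 5)
u_hat = np.stack([np.array([np.cos(t), np.sin(t)]) for t in ts])
print(loss(theta, ts, u_hat))           # scalar L(theta) >= 0
```

Gradient descent then updates theta against this scalar loss; the sketch omits the backward pass, which in practice comes from backpropagating through the solver or from an adjoint method as described above.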
In this paper, we embrace recent advances in the field and use the "symplectic-adjoint method", which offers the best of both worlds: a low memory footprint together with improved accuracy guarantees (Matsubara et al., 2021).

