S-SOLVER: NUMERICALLY STABLE ADAPTIVE STEP SIZE SOLVER FOR NEURAL ODES

Abstract

A neural ordinary differential equation (ODE) is a relation between an unknown function and its derivatives, where the derivative is parameterized by a neural network. Obtaining a solution to a neural ODE therefore requires a solver that performs numerical integration. Dopri5 is one of the most popular neural ODE solvers and the default solver in torchdiffeq, a PyTorch library of ODE solvers. It is an adaptive step size solver based on the Runge-Kutta (RK) family of numerical methods. These methods rely on an estimate of the local truncation error to select and adjust the integration step size, which determines the numerical stability of the solution. A step size that is too large leads to numerical instability, while a step size that is too small may cause the solver to take unnecessarily many steps, which is computationally expensive and may even cause rounding errors to build up. Accurate local truncation error estimation is therefore paramount for choosing a step size that yields an accurate, numerically stable, and fast solution to the ODE. In this paper we propose a novel local truncation error approximation that is the first to consider solutions of four different RK orders, yielding a more reliable error estimate. This leads to a novel solver, S-SOLVER (Stable Solver), which is more numerically stable and therefore more accurate. We demonstrate S-SOLVER's competitive performance in experiments on image recognition with ODE-Net, learning Hamiltonian dynamics with Symplectic ODE-Net, and continuous normalizing flows (CNF).

1. INTRODUCTION

Neural ODEs are continuous depth deep learning models that combine neural networks and ODEs. Since their introduction in (Chen et al., 2018), they have been used in many applications, such as stochastic differential equations (Li et al., 2020), physically informed modeling (Sanchez-Gonzalez et al., 2019; Zhong et al., 2020), free-form continuous generative models (Grathwohl et al., 2019; Finlay et al., 2020), mean-field games (Ruthotto et al., 2020), and irregularly sampled time series (Rubanova et al., 2019). Neural ODEs parameterize the derivative of the hidden state with a neural network, and therefore learn non-linear mappings via differential equations. A differential equation is a relation between an unknown function and its derivatives. Ordinary differential equations involve derivatives with respect to a single independent variable (as opposed to several, as in partial differential equations), e.g. time: dx/dt = f(t, x). Typically, an ODE is formulated as an initial value problem (IVP) of the following form: given the derivative dx/dt, a time interval t ∈ (a, b), and an initial value (i.e., x at time t = a), the solution to the IVP yields x evaluated at time t = b. Approximating x(b) requires numerical integration, so the various ODE solvers differ in how they perform this integration. Adaptive step size solvers are among the most popular solvers for neural ODEs. In fact, the default solver in torchdiffeq (a library of ODE solvers implemented in PyTorch) is Dopri5, the Dormand-Prince 5(4) embedded adaptive step size method of the Runge-Kutta (RK) family. Adaptive step size RK solvers compute two approximations, one of order p and another of order p − 1, and compare them to obtain the local truncation error, which is used to determine the integration step size. Specifically, the error determines whether to accept or reject the solution step under the current step size, and how to modify the step size for the next step.
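The accept/reject and step-size-adjustment logic of embedded RK pairs described above can be sketched in a few lines. The following is a minimal illustration using the simplest embedded pair, Heun-Euler 2(1), rather than the Dormand-Prince 5(4) pair of Dopri5; the test ODE dx/dt = -x, the tolerance, and the safety/clipping constants are illustrative choices, not part of the paper's method.

```python
import math

def f(t, x):
    # Illustrative test ODE dx/dt = -x, with exact solution x(t) = x0 * exp(-t).
    return -x

def adaptive_heun_euler(f, t0, t1, x0, tol=1e-6, h=0.1):
    """Embedded RK 2(1) pair (Heun / Euler) with adaptive step size control.

    A sketch of the generic accept/reject logic used by embedded RK
    solvers such as Dopri5, not Dopri5 itself.
    """
    t, x = t0, x0
    while t1 - t > 1e-12:
        h = min(h, t1 - t)                 # do not step past the endpoint
        k1 = f(t, x)
        k2 = f(t + h, x + h * k1)
        x_low = x + h * k1                 # order-1 (Euler) solution
        x_high = x + 0.5 * h * (k1 + k2)   # order-2 (Heun) solution
        err = abs(x_high - x_low)          # local truncation error estimate
        if err <= tol:                     # accept the step, keep higher order
            t, x = t + h, x_high
        # Adjust h for the next attempt: safety factor 0.9, growth/shrink
        # clipped to [0.2, 5.0]; exponent 1/2 matches the order-1 error estimate.
        h *= min(5.0, max(0.2, 0.9 * (tol / max(err, 1e-16)) ** 0.5))
    return x

x1 = adaptive_heun_euler(f, 0.0, 1.0, 1.0)
```

A rejected step leaves t and x unchanged and retries with a smaller h, while an accepted step may grow h; this is the mechanism by which an inaccurate error estimate leads either to instability (h too large) or to wasted steps (h too small).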
A step size that is too large leads

