NEURAL JUMP ORDINARY DIFFERENTIAL EQUATIONS: CONSISTENT CONTINUOUS-TIME PREDICTION AND FILTERING

Abstract

Combinations of neural ODEs with recurrent neural networks (RNNs), such as GRU-ODE-Bayes or ODE-RNN, are well suited to model irregularly observed time series. While those models outperform existing discrete-time approaches, no theoretical guarantees for their predictive capabilities are available. Assuming that the irregularly sampled time series data originates from a continuous stochastic process, the $L^2$-optimal online prediction is the conditional expectation given the currently available information. We introduce the Neural Jump ODE (NJ-ODE), which provides a data-driven approach to learn, continuously in time, the conditional expectation of a stochastic process. Our approach models the conditional expectation between two observations with a neural ODE and jumps whenever a new observation is made. We define a novel training framework, which allows us to prove theoretical guarantees for the first time. In particular, we show that the output of our model converges to the $L^2$-optimal prediction. This can be interpreted as a solution to a special filtering problem. We provide experiments showing that the theoretical results also hold empirically. Moreover, we experimentally show that our model outperforms the baselines in more complex learning tasks and give comparisons on real-world datasets.

1. INTRODUCTION

Stochastic processes are widely used in many fields to model time series that exhibit random behaviour. In this work, we focus on processes that can be expressed as solutions of stochastic differential equations (SDEs) of the form $dX_t = \mu(t, X_t)\,dt + \sigma(t, X_t)\,dW_t$, with certain assumptions on the drift $\mu$ and the diffusion $\sigma$. With respect to the $L^2$-norm, the best prediction of a future value of the process is provided by the conditional expectation given the current value. If the drift and diffusion are known or a good estimate is available, the conditional expectation can be approximated by a Monte Carlo (MC) simulation. However, since $\mu$ and $\sigma$ are usually unknown, this approach strongly depends on the assumptions made on their parametric form. A more flexible approach is given by neural SDEs, where the drift $\mu$ and diffusion $\sigma$ are modelled by neural networks (Tzen & Raginsky, 2019; Li et al., 2020; Jia & Benson, 2019). Nevertheless, modelling the diffusion can be avoided if one is only interested in forecasting the behaviour instead of sampling new paths. An alternative, widely used approach is to use recurrent neural networks (RNNs), where a neural network dynamically updates a latent variable with the observations of a discrete input time series. RNNs are successfully applied to tasks for which time series are regularly sampled, such as speech or text recognition. However, observations are often made at irregular times. The standard approach of dividing the timeline into equally sized intervals and imputing or aggregating observations might lead to a significant loss of information (Rubanova et al., 2019). Frameworks that overcome this issue are the GRU-ODE-Bayes (Brouwer et al., 2019) and the ODE-RNN (Rubanova et al., 2019), which combine an RNN with a neural ODE (Chen et al., 2018). In standard RNNs, the

