NEURAL ODE PROCESSES

Abstract

Neural Ordinary Differential Equations (NODEs) use a neural network to model the instantaneous rate of change in the state of a system. However, despite their apparent suitability for dynamics-governed time-series, NODEs present a few disadvantages. First, they are unable to adapt to incoming data-points, a fundamental requirement for real-time applications imposed by the natural direction of time. Second, time-series are often composed of a sparse set of measurements that could be explained by many possible underlying dynamics. NODEs do not capture this uncertainty. In contrast, Neural Processes (NPs) are a new class of stochastic processes providing uncertainty estimation and fast data-adaptation, but lack an explicit treatment of the flow of time. To address these problems, we introduce Neural ODE Processes (NDPs), a new class of stochastic processes determined by a distribution over Neural ODEs. By maintaining an adaptive data-dependent distribution over the underlying ODE, we show that our model can successfully capture the dynamics of low-dimensional systems from just a few data-points. At the same time, we demonstrate that NDPs scale up to challenging high-dimensional time-series with unknown latent dynamics such as rotating MNIST digits.

1. INTRODUCTION

Many time-series that arise in the natural world, such as the state of a harmonic oscillator, the populations in an ecological network or the spread of a disease, are the product of some underlying dynamics. Sometimes, as in the case of a video of a swinging pendulum, these dynamics are latent and do not manifest directly in the observation space. Neural Ordinary Differential Equations (NODEs) (Chen et al., 2018), which use a neural network to parametrise the derivative of an ODE, have become a natural choice for capturing the dynamics of such time-series (Yıldız et al., 2019; Rubanova et al., 2019; Norcliffe et al., 2020; Kidger et al., 2020; Morrill et al., 2020). However, despite their fundamental connection to dynamics-governed time-series, NODEs present certain limitations that hinder their adoption in these settings. Firstly, NODEs cannot adjust their predictions as more data is collected without retraining the model. This ability is particularly important for real-time applications, where it is desirable that models adapt to incoming data points as time passes and more data is collected. Secondly, without a large number of regularly spaced measurements, there is usually a range of plausible underlying dynamics that can explain the data. However, NODEs do not capture this uncertainty in the dynamics. As many real-world time-series consist of sparse sets of measurements, often irregularly sampled, the model can fail to represent the diversity of suitable solutions. In contrast, the Neural Process (NP) family (Garnelo et al., 2018a;b) offers a class of (neural) stochastic processes designed for uncertainty estimation and fast adaptation to changes in the observed data. However, NPs modelling time-indexed random functions lack an explicit treatment of time. Designed for the general case of an arbitrary input domain, they treat time as an unordered set and do not explicitly consider the time-delay between different observations.
To address these limitations, we introduce Neural ODE Processes (NDPs), a new class of stochastic processes governed by stochastic data-adaptive dynamics. Our probabilistic Neural ODE formulation relies on and extends the framework provided by NPs, and runs parallel to other attempts to incorporate application-specific inductive biases in this class of models, such as Attentive NPs (Kim et al., 2019), ConvCNPs (Gordon et al., 2019), and MPNPs (Day et al., 2020). We demonstrate that NDPs can adaptively capture many potential dynamics of low-dimensional systems when faced with limited amounts of data. Additionally, we show that our approach scales to high-dimensional time-series with latent dynamics, such as rotating MNIST digits (Casale et al., 2018). Our code and datasets are available at https://github.com/crisbodnar/ndp.

2. BACKGROUND AND FORMAL PROBLEM STATEMENT

Problem Statement. We consider modelling random functions $\mathcal{F}: \mathcal{T} \to Y$, where $\mathcal{T} = [t_0, \infty)$ represents time and $Y \subset \mathbb{R}^d$ is a compact subset of $\mathbb{R}^d$. We assume $\mathcal{F}$ has a distribution $\mathcal{D}$, induced by another distribution over some underlying dynamics that govern the time-series. Given a specific instantiation $F$ of $\mathcal{F}$, let $C = \{(t_i^C, y_i^C)\}_{i \in I_C}$ be a set of samples from $F$ with some index set $I_C$. We refer to $C$ as the context points, as denoted by the superscript $C$. For a given context $C$, the task is to predict the values $\{y_j^T\}_{j \in I_T}$ that $F$ takes at a set of target times $\{t_j^T\}_{j \in I_T}$, where $I_T$ is another index set. We call $T = \{(t_j^T, y_j^T)\}_{j \in I_T}$ the target set. Additionally, let $t^C = \{t_i \mid i \in I_C\}$ and define $y^C$, $t^T$ and $y^T$ analogously. Conventionally, as in Garnelo et al. (2018b), the target set forms a superset of the context set and we have $C \subseteq T$. Optionally, it might also be natural to consider that the initial time and observation $(t_0, y_0)$ are always included in $C$. During training, we let the model learn from a dataset of (potentially irregular) time-series sampled from $\mathcal{F}$. We are interested in learning the underlying distribution over the dynamics as well as the induced distribution over functions. We note that when the dynamics are not latent and manifest directly in the observation space $Y$, the distribution over ODE trajectories and the distribution over functions coincide.

Neural ODEs. NODEs are a class of models that parametrise the velocity $\dot{z}$ of a state $z$ with the help of a neural network, $\dot{z} = f_\theta(z, t)$. Given the initial time $t_0$ and a target time $t_i^T$, NODEs predict the corresponding state $\hat{y}_i^T$ by performing the following integration and decoding operations:

$$z(t_0) = h_1(y_0), \qquad z(t_i^T) = z(t_0) + \int_{t_0}^{t_i^T} f_\theta(z(t), t)\,dt, \qquad \hat{y}_i^T = h_2(z(t_i^T)),$$

where $h_1$ and $h_2$ can be neural networks.
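The encode-integrate-decode operations above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a small randomly initialised MLP for $f_\theta$, linear maps for $h_1$ and $h_2$, a fixed-step Euler solver in place of an adaptive one, and illustrative dimensions of our own choosing.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: observation y in R^1, latent state z in R^4.
D_Y, D_Z, D_H = 1, 4, 16

# Randomly initialised weights stand in for trained parameters.
W1 = rng.normal(0, 0.1, (D_Z + 1, D_H))   # f_theta hidden layer (input: z and t)
W2 = rng.normal(0, 0.1, (D_H, D_Z))       # f_theta output layer
H1 = rng.normal(0, 0.1, (D_Y, D_Z))       # encoder h1 (linear)
H2 = rng.normal(0, 0.1, (D_Z, D_Y))       # decoder h2 (linear)

def f_theta(z, t):
    """Neural network parametrising the velocity dz/dt = f_theta(z, t)."""
    inp = np.concatenate([z, [t]])
    return np.tanh(inp @ W1) @ W2

def node_predict(y0, t0, t_target, steps=100):
    """Encode y0, integrate the ODE from t0 to t_target (Euler), decode."""
    z = y0 @ H1                     # z(t0) = h1(y0)
    dt = (t_target - t0) / steps
    t = t0
    for _ in range(steps):          # z(t_i) = z(t0) + integral of f_theta
        z = z + dt * f_theta(z, t)
        t += dt
    return z @ H2                   # y_hat = h2(z(t_i))

y_hat = node_predict(np.array([0.5]), t0=0.0, t_target=1.0)
print(y_hat.shape)  # (1,)
```

In practice one would replace the Euler loop with an adaptive solver and backpropagate through it (e.g. via the adjoint method of Chen et al., 2018), but the structure of the computation is the same.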
When the dimensionality of $z$ is greater than that of $y$ and $h_1$, $h_2$ are linear, the resulting model is an Augmented Neural ODE (Dupont et al., 2019) with input-layer augmentation (Massaroli et al., 2020). The extra dimensions offer the model additional flexibility as well as the ability to learn higher-order dynamics (Norcliffe et al., 2020).

Neural Processes (NPs). NPs model a random function $\mathcal{F}: X \to Y$, where $X \subseteq \mathbb{R}^{d_1}$ and $Y \subseteq \mathbb{R}^{d_2}$. The NP represents a given instantiation $F$ of $\mathcal{F}$ through the global latent variable $z$,



Figure 1: Schematic diagram of Neural ODE Processes. Left: Observations from a time series, the context set, are encoded and aggregated to form $r$, which parametrises the latent variables $D$ and $L_0$. Middle: A sample is drawn from $L_0$ and $D$, initialising and conditioning the ODE, respectively. Each sample produces a plausible, coherent trajectory. Right: Predictions at a target time $t_i^T$ are made by decoding the state of the ODE, $l(t_i^T)$, together with $t_i^T$. An example is shown with the connection from the ODE position plot to the Predictions plot. Middle & right: the bold lines in each plot refer to the same sample, fainter lines to other samples. All: the plots are illustrations only.
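The pipeline the figure describes can be sketched end to end as follows. All names, dimensions, and the choice of mean aggregation, Gaussian latents, and a fixed-step Euler solver are our own illustrative assumptions, standing in for the trained networks and solver of the actual model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes: observation y in R^1, ODE state l in R^2,
# global dynamics latent d in R^2, representation r in R^8.
D_Y, D_L, D_D, D_R = 1, 2, 2, 8

# Randomly initialised linear maps stand in for trained networks.
ENC = rng.normal(0, 0.1, (1 + D_Y, D_R))               # encoder for (t, y) pairs
TO_STATS = rng.normal(0, 0.1, (D_R, 2 * (D_L + D_D)))  # r -> means / log-stds
DERIV = rng.normal(0, 0.1, (D_L + D_D + 1, D_L))       # derivative net, conditioned on d
DEC = rng.normal(0, 0.1, (D_L + 1, D_Y))               # decoder for (l(t), t)

def ndp_predict(context, t_target, steps=50):
    """One stochastic forward pass through an NDP-style model."""
    # Left: encode each (t_i, y_i) context pair and aggregate (mean) to form r.
    r = np.mean([np.concatenate([[t], y]) @ ENC for t, y in context], axis=0)
    # r parametrises the latent variables L0 (initial state) and D (dynamics).
    mu, log_sigma = np.split(r @ TO_STATS, 2)
    sample = mu + np.exp(log_sigma) * rng.normal(size=mu.shape)
    l, d = sample[:D_L], sample[D_L:]
    # Middle: integrate the latent ODE (Euler), derivative conditioned on d.
    t0 = context[0][0]
    dt = (t_target - t0) / steps
    t = t0
    for _ in range(steps):
        l = l + dt * np.tanh(np.concatenate([l, d, [t]]) @ DERIV)
        t += dt
    # Right: decode the ODE state l(t_target) together with t_target.
    return np.concatenate([l, [t_target]]) @ DEC

context = [(0.0, np.array([0.1])), (0.5, np.array([0.3]))]
y_hat = ndp_predict(context, t_target=1.0)
print(y_hat.shape)  # (1,)
```

Repeating the call draws fresh samples of $L_0$ and $D$, so each forward pass yields one coherent trajectory, matching the faint-versus-bold lines in the figure.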

