ANAMNESIC NEURAL DIFFERENTIAL EQUATIONS WITH ORTHOGONAL POLYNOMIAL PROJECTIONS

Abstract

Neural ordinary differential equations (Neural ODEs) are an effective framework for learning dynamical systems from irregularly sampled time series data. These models provide a continuous-time latent representation of the underlying dynamical system, in which new observations at arbitrary time points can be used to update the latent state. However, existing parameterizations of the dynamics function limit the model's ability to retain global information about the time series: the piece-wise integration of the latent process between observations can result in a loss of memory of the dynamic patterns of previously observed data points. We propose PolyODE, a Neural ODE that models the latent continuous-time process as a projection onto a basis of orthogonal polynomials. This formulation enforces long-range memory and preserves a global representation of the underlying dynamical system. Our construction is backed by favourable theoretical guarantees, and in a series of experiments we demonstrate that it outperforms previous work in the reconstruction of past and future data and in downstream prediction tasks. Our code is available at https://github.com/edebrouwer/polyode.

1. INTRODUCTION

Time series are ubiquitous in many fields of science and, as such, represent an important but challenging data modality for machine learning. Their temporal nature, along with their potentially high dimensionality, makes them arduous to manipulate as mathematical objects. A long-standing line of research has thus focused on learning informative time series representations, such as simple vectors, that capture local and global structure in the data (Franceschi et al., 2019; Gu et al., 2020). Such architectures include recurrent neural networks (Malhotra et al., 2017), temporal transformers (Zhou et al., 2021) and neural ordinary differential equations (neural ODEs) (Chen et al., 2018). In particular, neural ODEs have emerged as a popular choice for time series modelling due to their sequential nature and their ability to handle irregularly sampled time series. By positing an underlying continuous-time dynamic process, neural ODEs sequentially process irregularly sampled time series via piece-wise numerical integration of the dynamics between observations. The flexibility of this model family arises from the use of neural networks to parameterize the temporal derivative, and different choices of parameterization lead to different properties. For instance, bounding the output of the neural network can enforce Lipschitz constants on the temporal process (Onken et al., 2021).

The problem this work tackles is that the piece-wise integration of the latent process between observations can fail to retain a global representation of the time series. Specifically, each update of the hidden state triggered by a new observation can erase memory of the dynamical states the model was previously in. This pathology limits the utility of neural ODEs whenever information about the recent and distant past must be retained; i.e., current neural ODE formulations are amnesic. We illustrate this effect in Figure 1, where backward integration of a learned neural ODE (that is competent at forecasting) quickly diverges, indicating that the state retains only local information about the future dynamics.
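To make this piece-wise integration scheme concrete, the sketch below shows an ODE-RNN-style update loop in the spirit of Rubanova et al. (2019): the latent state is integrated between observation times and then discretely updated whenever a new observation arrives. This is a minimal illustration under assumed design choices (the torchdiffeq solver, a GRU-based jump update, and all dimensions are ours for the example), not the paper's implementation.

    # Minimal sketch of piece-wise Neural ODE integration with jump updates,
    # assuming PyTorch and torchdiffeq (pip install torchdiffeq).
    import torch
    import torch.nn as nn
    from torchdiffeq import odeint

    class ODERNN(nn.Module):
        def __init__(self, obs_dim: int, hidden_dim: int):
            super().__init__()
            # Neural network parameterizing the temporal derivative dh/dt.
            self.dynamics = nn.Sequential(
                nn.Linear(hidden_dim, 64), nn.Tanh(), nn.Linear(64, hidden_dim)
            )
            # Discrete update applied when a new observation arrives.
            self.update = nn.GRUCell(obs_dim, hidden_dim)
            self.hidden_dim = hidden_dim

        def forward(self, times, observations):
            # times: (T,) increasing observation times; observations: (T, obs_dim).
            h = torch.zeros(1, self.hidden_dim)
            for i in range(len(times)):
                if i > 0:
                    # Integrate the latent state between consecutive observations.
                    span = torch.stack([times[i - 1], times[i]])
                    h = odeint(lambda t, y: self.dynamics(y), h, span)[-1]
                # Jump update from the new observation: this is the step that can
                # overwrite past information, making the model "amnesic".
                h = self.update(observations[i : i + 1], h)
            return h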


Figure 1: Illustration of the ability of PolyODE to reconstruct past trajectories. The solid lines show forecasting trajectories conditioned on past observations for NODE (blue) and PolyODE (red). The dotted lines show the backward reconstruction of the past trajectories conditioned on the latent process at the last observation. We observe that PolyODE accurately reconstructs the past trajectories while NODE quickly diverges; PolyODE is also more accurate in terms of forecasting.

One strategy that has been explored to address this pathology is to regularize the model to capture long-range patterns by reconstructing the time series from the last observation, using an auto-encoder architecture (Rubanova et al., 2019). This class of approaches incurs higher complexity and provides no guarantees on the retention of the history of a time series. In contrast, our work proposes an alternative parameterization of the dynamics function that, by design, captures long-range memory within a neural ODE. Inspired by the recent successes of the HiPPO framework (Gu et al., 2020), we achieve this by enforcing that the dynamics of the hidden process follow the dynamics of the projection of the observed temporal process onto a basis of orthogonal polynomials (see the sketch after the contributions below). The resulting model, PolyODE, is a new neural ODE architecture that encodes long-range past information in the latent process and is thus anamnesic. As depicted in Figure 1, the resulting time series embeddings are able to reconstruct the past time series with significantly better accuracy.

Contributions. (1) We propose a novel dynamics function for a neural ODE, resulting in PolyODE, a model that learns a global representation of high-dimensional time series and is capable of long-term forecasting and reconstruction by design. PolyODE is the first investigation of the potential of the HiPPO operator for neural ODE architectures. (2) Methodologically, we highlight the practical challenges in learning PolyODE and show how adaptive ODE solvers can overcome them. Theoretically, we provide bounds characterizing the quality of time series reconstruction when using PolyODE. (3) Empirically, we assess the ability of the learnt embeddings to reconstruct the past of the time series and study their utility as inputs for downstream predictive tasks. We show that our model provides better time series representations than several existing neural ODE architectures, as measured by the accuracy of downstream predictions on chaotic time series and on irregularly sampled data from patients in the intensive care unit.
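To give intuition for the projection at the heart of PolyODE, the snippet below illustrates, in a simplified static setting, how a short vector of orthogonal-polynomial coefficients summarizes an entire trajectory on a window rescaled to [-1, 1]: the signal is projected onto Legendre polynomials and can then be reconstructed anywhere on the window. PolyODE instead evolves such coefficients continuously through an ODE as observations arrive (the HiPPO operator); the function names and the quadrature-based projection here are illustrative assumptions, not the paper's code.

    # Why a fixed-size coefficient vector acts as long-range memory: project a
    # signal onto Legendre polynomials, then reconstruct it from the coefficients.
    import numpy as np
    from numpy.polynomial import legendre

    def project(f, num_coeffs: int, quad_order: int = 64) -> np.ndarray:
        """Project f: [-1, 1] -> R onto the first num_coeffs Legendre polynomials."""
        x, w = legendre.leggauss(quad_order)  # Gauss-Legendre nodes and weights
        fx = f(x)
        # c_n = (2n + 1)/2 * int_{-1}^{1} f(x) P_n(x) dx, since int P_n^2 = 2/(2n+1).
        return np.array([
            (2 * n + 1) / 2 * np.sum(w * fx * legendre.legval(x, np.eye(num_coeffs)[n]))
            for n in range(num_coeffs)
        ])

    def reconstruct(coeffs: np.ndarray, x: np.ndarray) -> np.ndarray:
        """Evaluate the truncated Legendre expansion sum_n c_n P_n(x)."""
        return legendre.legval(x, coeffs)

    # The whole history on the window is summarized by num_coeffs numbers, from
    # which the past trajectory can be read back, unlike a generic hidden state.
    f = lambda x: np.sin(3 * x) + 0.5 * x ** 2
    c = project(f, num_coeffs=8)
    xs = np.linspace(-1, 1, 200)
    print(np.max(np.abs(reconstruct(c, xs) - f(xs))))  # small reconstruction error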

Time series modelling in machine learning:

There is a vast literature on the use of machine learning for time series modelling, and we highlight some of the ideas that have been explored to adapt diverse kinds of models to irregular time series data. Although not naturally well suited to learning representations of such data, discrete-time models such as recurrent neural networks (Hochreiter and Schmidhuber, 1997; Cho et al., 2014) have been modified to handle it. Models such as mTANs (Shukla and Marlin, 2021) leverage an attention-based approach to interpolate sequences, creating discrete-time data from irregularly sampled data. Another strategy has been architectural modifications to the recurrence equations, e.g. CT-GRU (Mozer et al., 2017), GRU-D (Che et al., 2018) and Unitary RNNs (Arjovsky et al., 2016). Much more closely aligned with our work, and a natural fit for irregularly sampled data, is research that uses differential equations to model continuous-time processes (Chen et al., 2018). By parameterizing the derivative of a time series using neural networks and integrating the dynamics over unobserved time points, this class of models is well suited to handle irregularly sampled data. It includes models such as ODE-RNN (Rubanova et al., 2019), ODE-LSTM (Lechner and Hasani, 2020) and Neural CDE (Kidger et al., 2020).

