LATENT NEURAL ODES WITH SPARSE BAYESIAN MULTIPLE SHOOTING

Abstract

Training dynamic models, such as neural ODEs, on long trajectories is a hard problem that in practice requires various tricks, such as trajectory splitting, to make training work. These methods are often heuristics with poor theoretical justification and require iterative manual tuning. We propose a principled multiple shooting technique for neural ODEs that splits trajectories into manageable short segments, which are optimised in parallel, while ensuring probabilistic control of continuity across consecutive segments. We derive variational inference for our shooting-based latent neural ODE models and propose amortized encodings of irregularly sampled trajectories with a transformer-based recognition network featuring temporal attention and relative positional encoding. We demonstrate efficient and stable training, and state-of-the-art performance on multiple large-scale benchmark datasets.

1. INTRODUCTION

Dynamical systems, from biological cells to the weather, evolve according to their underlying mechanisms, often described by differential equations. In data-driven system identification we aim to learn the rules governing a dynamical system by observing the system over a time interval [0, T] and fitting a model of the underlying dynamics to the observations by gradient descent. Such optimisation suffers from the curse of length: the complexity of the loss function grows with the length of the observed trajectory (Ribeiro et al., 2020). For even moderate T the loss landscape can become highly complex, and gradient descent fails to produce a good fit (Metz et al., 2021). To alleviate this problem, previous works resort to cumbersome heuristics, such as iterative training and trajectory splitting (Yildiz et al., 2019; Kochkov et al., 2021; Han et al., 2022; Lienen & Günnemann, 2022). The optimal control literature has a long history of multiple shooting methods, in which trajectory fitting is split into piecewise segments that are easy to optimise, with constraints ensuring continuity across the segments (van Domselaar & Hemker, 1975; Bock & Plitt, 1984; Baake et al., 1992). Multiple-shooting-based models have simpler loss landscapes and are practical to fit by gradient descent (Voss et al., 2004; Heiden et al., 2022; Turan & Jäschke, 2022; Hegde et al., 2022).

Inspired by this line of work, we develop a shooting-based latent neural ODE model (Chen et al., 2018; Rubanova et al., 2019; Yildiz et al., 2019; Massaroli et al., 2020). Our multiple shooting formulation generalizes standard approaches by sparsifying the shooting variables in a probabilistic setting to account for irregularly sampled time grids and redundant shooting variables. We furthermore introduce an attention-based (Vaswani et al., 2017) encoder architecture for latent neural ODEs that is compatible with our sparse shooting formulation and can handle noisy and partially observed high-dimensional data.
Consequently, our model achieves state-of-the-art results, naturally handles long observation intervals, and is stable and quick to train. Our contributions are:

• We introduce a latent neural ODE model with quick and stable training on long trajectories.

• We derive sparse Bayesian multiple shooting, a Bayesian version of multiple shooting with efficient utilization of shooting variables and a continuity-inducing prior.

• We introduce a transformer-based encoder with novel time-aware attention and relative positional encodings, which efficiently handles data observed at arbitrary time points.
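To make the multiple shooting idea concrete, the following is a minimal NumPy sketch (not the paper's model) of a deterministic multiple shooting loss: the trajectory is split into segments, each segment is integrated from its own shooting variable, and a soft penalty ties the end of each segment to the next shooting variable. The function names, the fixed-step Euler solver, and the quadratic data-fit term are illustrative assumptions; the paper replaces the continuity penalty with a continuity-inducing prior in a Bayesian formulation.

```python
import numpy as np

def euler_solve(f, s0, t0, t1, n_steps=20):
    """Integrate ds/dt = f(s) from t0 to t1 with fixed-step Euler
    (a stand-in for a proper ODE solver)."""
    s = np.asarray(s0, dtype=float)
    dt = (t1 - t0) / n_steps
    for _ in range(n_steps):
        s = s + dt * f(s)
    return s

def multiple_shooting_loss(f, shooting_vars, seg_times, obs, lam=1.0):
    """Sum of per-segment data-fit terms plus a continuity penalty.

    shooting_vars: initial states s_1..s_N of the segments (these are the
      variables optimised in parallel); seg_times: segment boundaries
      t_0..t_N; obs: observations at the segment start points; lam: weight
      of the soft continuity constraint. All names are hypothetical.
    """
    fit, continuity = 0.0, 0.0
    for i in range(len(shooting_vars)):
        # Each short segment is integrated independently, so gradients
        # only flow through a short solve -> simpler loss landscape.
        s_end = euler_solve(f, shooting_vars[i], seg_times[i], seg_times[i + 1])
        fit += np.sum((shooting_vars[i] - obs[i]) ** 2)
        if i + 1 < len(shooting_vars):
            # Penalise the gap between this segment's endpoint and the
            # next shooting variable (continuity across segments).
            continuity += np.sum((s_end - shooting_vars[i + 1]) ** 2)
    return fit + lam * continuity
```

For example, with the linear dynamics f(s) = -s, shooting variables placed on the true solution exp(-t) incur only the small Euler discretisation error in the continuity term, whereas misplaced shooting variables are penalised.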

