LEARNING CONTINUOUS-TIME DYNAMICS BY STOCHASTIC DIFFERENTIAL NETWORKS

Abstract

Learning continuous-time stochastic dynamics is a fundamental and essential problem in modeling sporadic time series, whose observations are irregular and sparse in both time and dimension. For a given system whose latent states and observed data are multivariate, it is generally impossible to derive a precise continuous-time stochastic process that describes the system behaviors. To address this problem, we apply the variational Bayesian method and propose a flexible continuous-time stochastic recurrent neural network named the Variational Stochastic Differential Network (VSDN), which models the complicated dynamics of sporadic time series with neural Stochastic Differential Equations (SDEs). VSDNs capture the stochastic dependencies among latent states and observations with deep neural networks. We also incorporate two differential Evidence Lower Bounds to train the models efficiently. Through comprehensive experiments, we show that VSDNs outperform state-of-the-art continuous-time deep learning models and achieve remarkable performance on prediction and interpolation tasks for sporadic time series.

1. INTRODUCTION AND RELATED WORKS

Many real-world systems exhibit complicated stochastic dynamics over a continuous time period. The challenges in modeling such stochastic dynamics mainly come from two sources. First, the underlying state transitions of many systems are often uncertain, as the systems operate in unpredictable environments and their states are continuously affected by unknown disturbances. Second, the monitoring data collected may be sparse and irregularly spaced as a result of the sampling strategy or data corruption. Such a sporadic data sequence loses a large amount of information about the system behaviors hidden in the intervals between observations. To accurately model and analyze the dynamics of these systems, it is important to reliably and efficiently represent the continuous-time stochastic process based on the discrete-time observations. In some domains, the derivation of the continuous-time stochastic model relies heavily on human knowledge, and many studies focus on its inference problem (Ryder et al., 2018; Särkkä et al., 2015). But in more domains (e.g., video analysis (Vondrick et al., 2016) and human activity detection (Rubanova et al., 2019)), it is difficult and sometimes intractable to derive an accurate model that captures the underlying temporal evolution from the collected data sequence. Although some studies have approximated the stochastic process from collected data, the majority of these methods define the system dynamics with a linear model (Macke et al., 2011; Yu et al., 2009b;a), which cannot represent high-dimensional data with nonlinear relationships well. Recently, Neural Ordinary Differential Equation (ODE) studies (Chen et al., 2018; Rubanova et al., 2019; Jia & Benson, 2019; De Brouwer et al., 2019; Yildiz et al., 2019; Kidger et al., 2020) have introduced deep learning models that learn an ODE and apply it to approximate continuous-time dynamics.
Nevertheless, these methods generally neglect the randomness of the latent state trajectories and impose simplifying assumptions on the data distribution (e.g., Gaussian), which strongly limits their capability to model complicated continuous-time stochastic processes. Compared to an ODE, a Stochastic Differential Equation (SDE) (Jørgensen et al., 2020) is a more practical tool for modeling a continuous-time stochastic process. Recently there have been studies on bridging the gap between deep neural networks and SDEs (Ha et al., 2018). In (Hegde et al., 2019; Liu et al., 2020; Peluchetti & Favaro, 2020; Wang et al., 2019; Kong et al., 2020), SDEs are introduced to define more robust and accurate deep learning architectures for supervised learning problems (e.g., classification and regression). These studies focus on the design of neural network architectures and are orthogonal to our work on modeling sporadic time series. In (Tzen & Raginsky, 2019a;b), the authors studied theoretical guarantees for the optimization and inference problems of Neural SDEs. In (Li et al., 2020), a stochastic adjoint method is proposed to efficiently compute gradients for Neural SDEs.

The contributions of this paper are three-fold: 1. We incorporate the continuous-time variants of the VAE and IWAE losses into VSDN to train continuous-time stochastic neural networks with latent state trajectories. 2. We propose an efficient and flexible network architecture for VSDN that can model complicated stochastic processes from high-dimensional sporadic data sequences. 3. We conduct comprehensive experiments to show that VSDN outperforms state-of-the-art deep learning methods in modeling continuous-time dynamics and achieves remarkable performance in the prediction and interpolation of irregular or sporadic time series.

The rest of this paper is organized as follows. In Section 2, we first present the continuous-time variant of the VAE loss, and then derive a continuous-time IWAE loss to train continuous-time state-space models with deep neural networks. In Section 3, we propose the deep learning architecture of VSDN. Comprehensive experiments are presented in Section 4, and the conclusion is given in Section 5.
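Before turning to the formulation, the modeling idea can be made concrete with a small sketch: where a Neural ODE evolves a state by a deterministic vector field, an SDE adds a diffusion term driven by Brownian motion, so each integration yields a random trajectory. The drift and diffusion below are fixed toy functions standing in for the neural networks a Neural SDE would learn, and the Euler–Maruyama scheme is the standard first-order simulation method; this is an illustrative sketch, not the VSDN architecture itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for learned networks: in a neural SDE, drift f and
# diffusion g would be parameterized by neural networks.
def drift(x):
    return -x                          # pull the state towards zero

def diffusion(x):
    return 0.5 * np.ones_like(x)       # constant noise scale

def euler_maruyama(x0, t0, t1, n_steps):
    """Simulate dX_t = f(X_t) dt + g(X_t) dW_t on [t0, t1]."""
    dt = (t1 - t0) / n_steps
    x = np.array(x0, dtype=float)
    path = [x.copy()]
    for _ in range(n_steps):
        # Brownian increment: dW ~ N(0, dt * I)
        dw = rng.normal(0.0, np.sqrt(dt), size=x.shape)
        x = x + drift(x) * dt + diffusion(x) * dw
        path.append(x.copy())
    return np.stack(path)

path = euler_maruyama(x0=[1.0, -1.0], t0=0.0, t1=1.0, n_steps=100)
print(path.shape)  # (101, 2): one sampled latent trajectory
```

Setting `diffusion` to zero recovers an ODE trajectory, which is exactly the deterministic special case the methods above are restricted to.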

2. CONTINUOUS-TIME VARIATIONAL BAYES

In this section, we first introduce the basic notations and formulate our problem. We then define continuous-time variants of the Variational Auto-Encoding (VAE) and Importance-Weighted Auto-Encoding (IWAE) lower bounds to enable efficient training of our models. Due to the page limit, we present all derivations in Appendix A.
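For orientation, recall the standard discrete-time IWAE bound that these continuous-time variants generalize: with K samples z_k drawn from a proposal q(z|x), the estimator log (1/K) Σ_k p(x, z_k)/q(z_k|x) lower-bounds log p(x) in expectation and tightens as K grows. The sketch below illustrates this on a toy one-dimensional Gaussian model; the model, proposal, and helper names are ours for illustration only, not the bounds derived in this paper.

```python
import numpy as np

def log_normal(x, mean, std):
    # log density of N(mean, std^2) evaluated at x
    return -0.5 * np.log(2 * np.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)

def iwae_bound(x, k, rng):
    """K-sample IWAE bound for a toy latent-variable model:
    prior z ~ N(0, 1), likelihood x | z ~ N(z, 1), proposal q(z | x) = N(x/2, 1).
    """
    z = rng.normal(x / 2, 1.0, size=k)          # z_k ~ q(z | x)
    log_w = (log_normal(z, 0.0, 1.0)            # log p(z_k)
             + log_normal(x, z, 1.0)            # + log p(x | z_k)
             - log_normal(z, x / 2, 1.0))       # - log q(z_k | x)
    # log (1/K) sum_k w_k, computed stably via log-sum-exp
    m = log_w.max()
    return m + np.log(np.mean(np.exp(log_w - m)))

rng = np.random.default_rng(0)
elbo = iwae_bound(1.5, 1, rng)      # K = 1 recovers a one-sample ELBO estimate
tight = iwae_bound(1.5, 64, rng)    # larger K tightens the bound in expectation
```

In the continuous-time setting of this section, the discrete latent samples z_k are replaced by sampled latent state trajectories, but the importance-weighting structure is the same.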

2.1. BASIC NOTATIONS AND PROBLEM FORMULATION

Throughout this paper, we define X_t ∈ R^{d_1} as the continuous-time latent state at time t and Y_n ∈ R^{d_2} as the n-th discrete-time observation, collected at time t_n; d_1 and d_2 are the dimensions of the latent state and the observation, respectively. X_{<t} is the continuous latent trajectory before time t and X_{≤t} is the path up to time t. Y_{n_1:n_2} is the sequence of data points from index n_1 to n_2, and X_{t_{n_1}:t_{n_2}} is the continuous-time state trajectory from t_{n_1} to t_{n_2}. Y_{<t} = {Y_n | t_n < t} denotes the historical observations before time t and Y_{≥t} = {Y_n | t_n ≥ t} denotes the current and future observations. For simplicity, we also assume that the initial value of the latent state is constant; the results in this paper can be easily extended to the case where the initial state is also a random variable. Given K data sequences {y^{(i)}_{1:n_i}}, i = 1, ..., K, the target of our study is to learn an accurate continuous-time generative model G* that maximizes the average log-likelihood:

G* = argmax_G (1/K) Σ_{i=1}^{K} log P_G(y^{(i)}_{1:n_i}).

In this paper, we propose a new continuous-time stochastic recurrent network called the Variational Stochastic Differential Network (VSDN), which incorporates SDEs into a recurrent neural model to effectively capture continuous-time stochastic dynamics based only on sparse or irregular observations. Taking advantage of the capacity of deep neural networks, VSDN has higher flexibility and generalizability in modeling nonlinear stochastic dependencies in high-dimensional observations. Compared to Neural ODEs, VSDN incorporates a latent state trajectory to capture the underlying factors of the system dynamics. This trajectory helps the model represent the data distribution more flexibly and generate the output data more accurately than Neural ODEs. Parallel to the theoretical analyses (Tzen & Raginsky, 2019a;b) and gradient computations (Li et al., 2020), our study focuses on a feasible variational loss and a flexible recurrent architecture that allow Neural SDEs to model sporadic data.
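As a toy illustration of the split between historical observations {Y_n | t_n < t} and current or future observations {Y_n | t_n ≥ t} defined above, a sporadic series can be stored as irregularly spaced (t_n, y_n) pairs and partitioned around a reference time t. The helper below is ours for illustration only, not part of the VSDN implementation.

```python
def split_observations(series, t):
    """Partition a sporadic series of (t_n, y_n) pairs around time t."""
    historical = [(tn, yn) for tn, yn in series if tn < t]   # {Y_n | t_n < t}
    future = [(tn, yn) for tn, yn in series if tn >= t]      # {Y_n | t_n >= t}
    return historical, future

# Irregular observation times: the gaps carry no data at all.
series = [(0.1, 2.0), (0.7, -1.3), (1.9, 0.4), (3.2, 5.6)]
hist, fut = split_observations(series, t=1.0)
print(hist)  # [(0.1, 2.0), (0.7, -1.3)]
print(fut)   # [(1.9, 0.4), (3.2, 5.6)]
```

Prediction conditions on the historical set alone, while interpolation may condition on observations from both sides of a query time.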

