CHARACTERISTIC NEURAL ORDINARY DIFFERENTIAL EQUATIONS

Abstract

We propose Characteristic Neural Ordinary Differential Equations (C-NODEs), a framework for extending Neural Ordinary Differential Equations (NODEs) beyond ODEs. While NODE models the evolution of latent variables as the solution to an ODE, C-NODE models the evolution of the latent variables as the solution of a family of first-order partial differential equations (PDEs) along curves on which the PDEs reduce to ODEs, referred to as characteristic curves. This reduction along characteristic curves allows PDEs to be analyzed with standard techniques for ODEs, in particular the adjoint sensitivity method. We also derive C-NODE-based continuous normalizing flows, which describe the density evolution of latent variables along multiple dimensions. Empirical results demonstrate the improvements provided by the proposed method for irregularly sampled time series prediction on the MuJoCo, PhysioNet, and Human Activity datasets, and for classification and density estimation on the CIFAR-10, SVHN, and MNIST datasets, given a computational budget similar to that of existing NODE methods. The results also provide empirical evidence that the learned curves improve system efficiency, requiring fewer parameters and function evaluations than the baselines.

1. INTRODUCTION

Deep learning and differential equations share many connections, and techniques in the intersection have led to insights in both fields. One predominant connection is based on certain neural network architectures resembling numerical integration schemes, leading to the development of Neural Ordinary Differential Equations (NODEs) (Chen et al., 2019b). NODEs use a neural network parameterization of an ODE to learn a mapping from observed variables to a latent variable that is the solution to the learned ODE. A central benefit of NODEs is their constant memory cost when backward passes are computed using the adjoint sensitivity method rather than by backpropagating through individual forward solver steps. Backpropagating through adaptive differential equation solvers to train NODEs often results in extensive memory use, as mentioned in Chen et al. (2019b). Moreover, NODEs provide a flexible probability density representation often referred to as continuous normalizing flows (CNFs). However, since NODEs can only represent solutions to ODEs, the class of functions is somewhat limited and may not apply to more general problems that do not have smooth and one-to-one mappings. To address this limitation, a series of analyses based on methods from differential equations have been employed to enhance the representation capabilities of NODEs, such as the theory of controlled differential equations (Kidger et al., 2020), learning higher-order ODEs (Massaroli et al., 2021), augmenting the dynamics (Dupont et al., 2019), and considering dynamics with delay terms (Zhu et al., 2021). Additionally, certain works consider generalizing the ODE case to partial differential equations (PDEs), such as Ruthotto & Haber (2020) and Sun et al. (2019). However, these PDE-based methods do not use the adjoint method, removing the primary advantage of constant memory cost.
This leads us to the central question motivating this work: can we combine the benefits of the rich function class of PDEs with the efficiency of the adjoint method? To do so, we propose a continuous-depth neural network that solves a PDE over parametric curves that reduce the PDE to an ODE. Such curves are known as characteristics, and they define the solution of the PDE in terms of an ODE (Griffiths et al., 2015). The proposed Characteristic Neural Ordinary Differential Equation (C-NODE) learns both the characteristics and the ODE along the characteristics to solve the PDE over the data space. This allows for a richer class of models while retaining the memory efficiency of the adjoint method. C-NODE also extends existing methods, improving their empirical accuracy on classification and time series prediction tasks and their image quality on generation tasks.
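To make the method of characteristics concrete, the following is a minimal sketch (not the paper's implementation) for the linear advection PDE $u_t + c\,u_x = 0$: along the characteristic curve $dx/dt = c$, the PDE reduces to the ODE $du/dt = 0$, so the solution is constant along each curve. The names `g`, `c`, and `solve_along_characteristic` are illustrative.

```python
import numpy as np

# Method-of-characteristics sketch for the linear advection PDE
#     u_t + c u_x = 0,  u(x, 0) = g(x).
# Along the characteristic dx/dt = c, the PDE reduces to du/dt = 0,
# so u is constant along each characteristic curve.

c = 1.5                      # constant advection speed
g = lambda x: np.exp(-x**2)  # initial condition u(x, 0)

def solve_along_characteristic(x0, t1, n_steps=1000):
    """Integrate the characteristic system dx/dt = c, du/dt = 0
    with forward Euler, starting from (x0, g(x0))."""
    dt = t1 / n_steps
    x, u = x0, g(x0)
    for _ in range(n_steps):
        x += dt * c    # the characteristic curve moves with speed c
        u += dt * 0.0  # the PDE reduces to du/dt = 0 along the curve
    return x, u

# The result matches the exact solution u(x, t) = g(x - c t):
x1, u1 = solve_along_characteristic(x0=0.0, t1=2.0)
assert abs(u1 - g(x1 - c * 2.0)) < 1e-8
```

C-NODE generalizes this picture by parameterizing both the characteristic curves and the dynamics along them with neural networks, rather than fixing them analytically as done here.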

2. RELATED WORK

In a NODE, the hidden state $u(t)$ evolves as $du(t)/dt = f(u(t), t, \theta)$ for a neural network $f$, so that
$$u(t_1) = u(t_0) + \int_{t_0}^{t_1} \frac{du(t)}{dt}\,dt = u(t_0) + \int_{t_0}^{t_1} f(u(t), t, \theta)\,dt.$$
Numerical integration can then be treated as a black box, using numerical schemes beyond the forward Euler method to achieve higher numerical precision. However, since black-box integrators can take an arbitrary number of intermediate steps, backpropagating through the individual steps would require too much memory, as each step must be saved. Chen et al. (2019b) addressed this problem by using adjoint backpropagation, which has constant memory usage. For a given loss function $L(u(t_1))$ on the terminal state of the hidden state, the adjoint $a(t)$ is governed by another ODE,
$$\frac{da(t)}{dt} = -a(t)^\top \frac{\partial f(u(t), t, \theta)}{\partial u}, \qquad a(t_1) = \frac{\partial L}{\partial u(t_1)},$$
which dictates the gradient with respect to the parameters. The gradient of $L(u(t_1))$ can then be calculated by solving this adjoint ODE rather than backpropagating through the calculations involved in the numerical integration. However, assuming that the hidden state is governed by an ODE imposes a limitation on the expressiveness of the mapping.

Figure 1: Comparison of traditional NODE (left) and proposed C-NODE (right). The solution to NODE is the solution to a single ODE, whereas C-NODE represents a series of ODEs that form the solution to a PDE. Each color in C-NODE represents the solution to an ODE with a different initial condition. NODE represents a single ODE and can only represent u(x, t) along one dimension, for example, u(x = 0, t).

For example, Dupont et al. (2019) describe a notable limitation of NODEs: the inability to represent dynamical systems with intersecting trajectories. In response to such limitations, many works have tried to increase the expressiveness of the mapping. Dupont et al. (2019) proposed to solve the intersecting trajectories problem by augmenting the vector space, lifting the points into additional dimensions; Zhu et al. (2021) included time delay in the equation to represent dynamical systems of greater complexity; Massaroli et al. (2021) proposed to condition the vector field on the inputs, allowing the integration limits to be conditioned on the input; Massaroli et al. (2021) and Norcliffe et al. (2020) additionally proposed, and proved, that a second-order ODE system can efficiently solve the intersecting trajectories problem. We note, however, that the interpretation of NODE as a continuous form of ResNet is also problematic, owing to the fact that the empirical behavior of the ResNet does not match the theoretical properties (Krishnapriyan et al., 2022; Ott et al., 2021). As such, alternative interpretations of the process represented by the ODE have been considered. In Zhang et al. (2019), the authors considered an augmentation where the augmented
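The adjoint construction discussed above can be illustrated with a minimal, self-contained sketch (not the paper's implementation, and deliberately using forward Euler instead of an adaptive solver). For the scalar dynamics $f(u, \theta) = \theta u$ with loss $L = u(t_1)$, the adjoint ODE integrated backward from $a(t_1) = 1$ yields $a(t_0) = e^{\theta(t_1 - t_0)}$, the exact gradient $\partial L/\partial u(t_0)$. All names below are illustrative.

```python
import math

# Sketch of NODE-style adjoint sensitivity for the scalar ODE
#     du/dt = f(u, theta) = theta * u,   loss L = u(t1).
# The adjoint solves da/dt = -a * df/du with a(t1) = dL/du(t1) = 1,
# integrated backward in time; a(t0) is the gradient dL/du(t0).

theta, t0, t1, n = 0.7, 0.0, 1.0, 100_000
dt = (t1 - t0) / n

f = lambda u: theta * u   # dynamics
df_du = lambda u: theta   # Jacobian of f w.r.t. u (constant here)

# Forward pass: u(t1) = u(t0) + \int f(u, theta) dt, via forward Euler
u = 1.0
for _ in range(n):
    u += dt * f(u)

# Backward pass: integrate the adjoint ODE from t1 down to t0
a = 1.0                        # a(t1) = dL/du(t1)
for _ in range(n):
    a -= dt * (-a * df_du(u))  # step backward in time

grad_numeric = a               # dL/du(t0) from the adjoint
grad_exact = math.exp(theta * (t1 - t0))
assert abs(grad_numeric - grad_exact) < 1e-3
```

Only the terminal state and the adjoint variable are carried through the backward pass, which is the source of the constant memory cost; in the general (nonlinear) case the state is recomputed backward alongside the adjoint rather than stored at every solver step.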

