LEARNING PDE SOLUTION OPERATOR FOR CONTINUOUS MODELING OF TIME-SERIES Anonymous

Abstract

Learning underlying dynamics from data is important and challenging in many real-world scenarios. Incorporating differential equations (DEs) to design continuous networks has drawn much attention recently, the most prominent of which is the Neural ODE. Most prior works make specific assumptions on the type of DEs or restrict them to first- or second-order DEs, making the models specialized for certain problems. Furthermore, due to the use of numerical integration, they suffer from computational expense and numerical instability. Building upon the recent Fourier neural operator (FNO), this work proposes a partial differential equation (PDE) based framework which improves the dynamics modeling capability and circumvents the need for costly numerical integration. FNO is difficult to apply directly to real applications because it is mainly confined to physical PDE problems. To fill this void, we propose a continuous-in-time FNO to deal with irregularly-sampled time-series and provide a theoretical result demonstrating its universality. Moreover, we reveal an intrinsic property of PDEs that increases the stability of the model. Extensive numerical evidence shows that our method covers a broader range of problems, including synthetic data, image classification, and irregular time-series. Our framework opens up a new way toward a continuous representation of neural networks that can be readily adopted for real-world applications.

1. INTRODUCTION

The modeling of time-series data plays an important role in various applications in our everyday lives, including climate forecasting (Schneider, 2001; Mudelsee, 2019), medical sciences (Stoffer & Ombao, 2012; Jensen et al., 2014), and finance (Chatigny et al., 2020; Andersen et al., 2005). Numerous deep learning architectures (Connor et al., 1994; Hochreiter & Schmidhuber, 1997; Cho et al., 2014) have been developed to learn sequential patterns from diverse time-series datasets. In recent years, leveraging differential equations (DEs) to design continuous networks has attracted increasing attention, first sparked by neural ordinary differential equations (ODEs) (Chen et al., 2018). Differential equations, which characterize the rates of change and interaction of continuously varying quantities, have become an indispensable mathematical language for describing time-evolving real-world phenomena (Cannon & Dostrovsky, 2012; Sundén & Fu, 2016; Black & Scholes, 2019). By virtue of their ability to represent and predict the world around us, incorporating differential equations into neural networks has reinvigorated research in continuous deep learning, offering new theoretical perspectives on neural networks. Moreover, they provide memory efficiency, invertibility, and the ability to handle irregular time-series (Rubanova et al., 2019; Chen et al., 2019; Dong et al., 2020). Despite their eminent success, Neural ODEs have yet to be successfully applied to complex and large-scale tasks due to the limited expressiveness of ODEs. In response, several works enhance the expressiveness of Neural ODEs (Gholami et al., 2019; Gu et al., 2021). Another line of work introduces more diverse differential equations, such as controlled differential equations (Kidger et al., 2020), delay differential equations (Zhu et al., 2020; Anumasa & PK, 2021), and integro-differential equations (Zappala et al., 2022).
In real applications, however, we usually know little about the underlying dynamics of the time-evolving system. In general, it is hard to know how the temporal states evolve, what kind of differential equation they follow, how the variables depend on each other, or what order of derivatives is involved. Therefore, it is necessary to develop a model that can learn an extended class of differential equations covering more diverse applications, in a data-driven manner (Holt et al., 2022). In this work, we propose a novel partial differential equation (PDE) based framework that can learn a broad range of dynamics without prior knowledge of the governing equations. PDEs, which relate the various partial derivatives of multivariable states, represent much more general dynamics, including ODEs as a special case. There have been several attempts to design neural networks through the lens of PDEs (Eliasof et al., 2021; Ruthotto & Haber, 2020; Ben-Yair et al., 2021; Sun et al., 2020; Kim et al., 2020b). Most of the prior works are designed under specific assumptions about the type or structure of the PDEs. As the underlying dynamics are unknown in real-world data, however, a model should not presume knowledge of the underlying PDE structure; the structure needs to be learned from data. Moreover, because the appropriate properties of a PDE differ for each given problem (Eliasof et al., 2021), it is necessary to represent as wide a range of PDEs as possible. To this end, we adopt the Fourier neural operator (FNO) (Li et al., 2021), an emerging model that directly parametrizes the PDE solution operator without prior information on the governing PDE. By learning the solution operator, the model automatically learns diverse equations in a completely data-driven way, and attains the solution in a single call without numerical integration, which incurs computational expense and numerical instability.
Because FNO has mainly been studied on mathematically well-posed PDE problems, adapting it to practical applications faces several obstacles. First, due to its notion of discrete time, FNO is difficult to transfer directly to the irregularly-sampled time-series that commonly arise in real-world problems. To render it more suitable for continuous time-series, we propose a continuous-in-time FNO, termed CTFNO, that can be evaluated at arbitrary time points. By learning a continuous-time solution operator, CTFNO can flexibly capture diverse time-series data. Moreover, we demonstrate the representational power of CTFNO via a rigorous proof of a universal approximation theorem. Second, we develop a network architecture that guarantees stability and leads to well-posed learning problems. Ensuring the stability of the model makes it robust to noisy observations and alleviates overfitting. We also verify that it enhances adversarial robustness for image classification. The results of various experiments show that our model provides superior performance on a wide array of real-world data, with applications in time-series and image classification.
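To make the "solution in a single call at an arbitrary time" idea concrete, the following minimal sketch is purely illustrative — the function names, the sinusoidal time features, and the small MLP parameterization are our own assumptions, not the CTFNO architecture. It shows an operator-style network that maps an initial state and a continuous query time t directly to the state at time t, with no ODE solver in the loop:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def solution_operator(a, t, W1, W2, freqs):
    """Map an initial state a to the state at an arbitrary time t in one
    forward pass (no numerical integration). Hypothetical sketch only."""
    # continuous time t enters through sinusoidal features, so any real t
    # (including irregular observation times) can be queried directly
    t_feat = np.concatenate([np.sin(freqs * t), np.cos(freqs * t)])
    h = relu(W1 @ np.concatenate([a, t_feat]))
    return W2 @ h
```

Because the network is a plain function of (a, t), evaluating it at t = 0.13, 0.7, or 2.5 costs one forward pass each, in contrast to a Neural ODE, which must integrate from 0 up to every query time.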

2. BACKGROUND

Fourier Neural Operator Let Ω ⊂ R^n be a bounded domain, and let A = A(Ω; R^{d_a}) and U = U(Ω; R^{d_u}) be spaces of functions from Ω to R^{d_a} and R^{d_u}, respectively. The Fourier neural operator (FNO) (Li et al., 2021) learns a map G : A → U between infinite-dimensional function spaces, with a special focus on PDEs. An input function a ∈ A is a source or initial function, and u ∈ U is the corresponding solution. For a fairly general class of PDEs, the solution is represented by a convolution operator whose kernel G : R^n → R^{d_u × d_a} is called a Green's function:

u(x) = G(a)(x) = ∫_Ω G(x − y) a(y) dy, ∀x ∈ Ω. (1)

This solution formula elucidates an elegant way to design FNO. The overall computational flow of FNO for approximating the convolution operator (1) is given as

a --P--> v_0 --L_1--> v_1 --L_2--> ··· --L_L--> v_L --Q--> u,

for a given depth L. To recover a form of universality, the input function a concatenated with the position x, denoted [a(x); x] ∈ R^{d_a + n}, is lifted to a higher-dimensional representation v_0(x) ∈ R^{d_v} by P(a)(x) := P[a(x); x] with a matrix P ∈ R^{d_v × (d_a + n)}. Q is a projection operator of the form Q(v)(x) := Qv(x) for Q ∈ R^{d_u × d_v}. The Fourier layers L_ℓ are defined as follows.

Definition 2.1 (Fourier layers (Li et al., 2021)). For a convolution operator K_ℓ : U(Ω; R^{d_v}) → U(Ω; R^{d_v}) with kernel κ_ℓ : R^n → R^{d_v × d_v}, a linear transform W_ℓ : R^{d_v} → R^{d_v}, R_ℓ = F(κ_ℓ) : R^n → C^{d_v × d_v} with Fourier transform F, and an activation function σ : R → R applied componentwise, the ℓ-th Fourier layer L_ℓ is defined, for all v ∈ U(Ω; R^{d_v}) and x ∈ Ω, by

L_ℓ(v)(x) := σ(W_ℓ v(x) + K_ℓ(v)(x)) = σ(W_ℓ v(x) + F^{-1}(R_ℓ · (Fv))(x)),

where R_ℓ · (Fv) : ξ ↦ R_ℓ(ξ)(Fv)(ξ) for ξ = (ξ_1, ..., ξ_n) ∈ R^n. Both the Fourier transform F and the inverse Fourier transform F^{-1} are implemented by the fast Fourier transform (Nussbaumer, 1981) with frequencies truncated at maximum modes max_i |ξ_i| ≤ N.
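As a concrete illustration, the action of a single Fourier layer on a function sampled on a uniform 1-D grid can be sketched in a few lines of NumPy. The grid size, number of retained modes, and the choice σ = ReLU are our own assumptions for illustration, not tied to any particular implementation:

```python
import numpy as np

def fourier_layer(v, W, R, n_modes):
    """One Fourier layer: sigma(W v + F^{-1}(R . F v)), with sigma = ReLU.

    v : (grid, d_v) real samples of the input function on a uniform grid
    W : (d_v, d_v)  pointwise linear transform W_l
    R : (n_modes, d_v, d_v) complex spectral weights R_l(xi), one per mode
    """
    v_hat = np.fft.rfft(v, axis=0)[:n_modes]       # F v, truncated to low modes
    k_hat = np.einsum('kij,kj->ki', R, v_hat)      # multiply each mode by R[k]
    # pad the truncated spectrum back to full length before inverting
    full = np.zeros((v.shape[0] // 2 + 1, v.shape[1]), dtype=complex)
    full[:n_modes] = k_hat
    k_v = np.fft.irfft(full, n=v.shape[0], axis=0)  # F^{-1}(R . F v)
    return np.maximum(0.0, v @ W.T + k_v)           # sigma(W v + K(v))
```

Because pointwise multiplication in Fourier space realizes the convolution K_ℓ(v) = κ_ℓ * v, the layer evaluates a global integral operator at FFT cost rather than by explicit quadrature over Ω.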
Treatment of time-varying problems When applied to time-dependent PDEs, the original FNO can only learn an operator that maps the initial function to a solution for a single fixed time. To




