CONTINUOUS PDE DYNAMICS FORECASTING WITH IMPLICIT NEURAL REPRESENTATIONS

Abstract

Effective data-driven PDE forecasting methods often rely on fixed spatial and/or temporal discretizations. This raises limitations for real-world applications, like weather prediction, where flexible extrapolation at arbitrary spatiotemporal locations is required. We address this problem by introducing a new data-driven approach, DINO, that models a PDE's flow with continuous-time dynamics of spatially continuous functions. This is achieved by embedding spatial observations independently of their discretization, via Implicit Neural Representations, in a small latent space temporally driven by a learned ODE. This separate and flexible treatment of time and space makes DINO the first data-driven model to combine the following advantages: it extrapolates at arbitrary spatial and temporal locations; it can learn from sparse irregular grids or manifolds; and, at test time, it generalizes to new grids or resolutions. DINO outperforms alternative neural PDE forecasters in a variety of challenging generalization scenarios on representative PDE systems.

1. INTRODUCTION

Modeling the dynamics and predicting the temporal evolution of physical phenomena is paramount in many fields, e.g. climate modeling, biology, fluid mechanics and energy (Willard et al., 2022). Classical solutions rely on a well-established physical paradigm: the evolution is described by differential equations derived from physical first principles, and then solved using numerical analysis tools, e.g. finite elements, finite volumes or spectral methods (Olver, 2014). The availability of large amounts of data from observations or simulations has motivated data-driven approaches to this problem (Brunton & Kutz, 2022), leading to a rapid development of the field with deep learning methods. The main motivations for this research track include developing surrogate or reduced order models that can approximate high-fidelity full order models at reduced computational costs (Kochkov et al., 2021), complementing classical solvers, e.g. to account for additional components of the dynamics (Yin et al., 2021), or improving low-fidelity models (De Avila Belbute-Peres et al., 2020). Most of these attempts rely on workhorses of deep learning like CNNs (Ayed et al., 2020) or GNNs (Li et al., 2020; Pfaff et al., 2021; Brandstetter et al., 2022). They all require prior space discretization, either on regular or irregular grids, such that they only capture the dynamics on the train grid and cannot generalize outside it. Neural operators, a recent trend, learn mappings between function spaces (Li et al., 2021b; Lu et al., 2021) and thus alleviate some limitations of prior discretization approaches. Yet, they still rely on fixed grid discretization for training and inference: e.g., regular grids for Li et al. (2021b) or a free-form but predetermined grid for Lu et al. (2021). Hence, the number and/or location of the sensors have to be fixed across train and test, which is restrictive in many situations (Prasthofer et al., 2022).
Mesh-agnostic approaches for solving canonical PDEs (Partial Differential Equations) are another trend (Raissi et al., 2019; Sirignano & Spiliopoulos, 2018). In contrast to physics-agnostic grid-based approaches, they aim at solving a known PDE as usual solvers do, and cannot cope with unknown dynamics. This idea was concurrently developed for computer graphics, e.g. for learning 3D shapes (Sitzmann et al., 2020; Mildenhall et al., 2020; Tancik et al., 2020), and coined as Implicit Neural Representations (INRs). When used as solvers, these methods can only tackle a single initial value problem and are not designed for long-term forecasting outside the training horizon. Because of these limitations, none of the above approaches can handle situations encountered in many practical applications such as: different geometries, e.g. phenomena lying on a Euclidean plane or an Earth-like sphere; variable sampling, e.g. irregular observation grids that may evolve at train and test time as in adaptive meshing (Berger & Oliger, 1984); scarce training data, e.g. when observations are only available at a few spatiotemporal locations; multi-scale phenomena, e.g. in large-scale dynamical systems such as climate modeling, where integrating intertwined subgrid scales, a.k.a. the closure problem, is ubiquitous (Zanna & Bolton, 2021). These considerations motivate the development of new machine learning models that improve existing approaches on several of these aspects. In our work, we aim at forecasting PDE-based spatiotemporal physical processes with a versatile model tackling the aforementioned limitations. We adopt an agnostic approach, i.e. not assuming any prior knowledge on the physics. We introduce DINO (Dynamics-aware Implicit Neural representations), a model operating continuously in space and time, with the following contributions. Continuous flow learning.
DINO aims at learning the PDE's flow to forecast its solutions, in a continuous manner, so that it can be trained on any spatial and temporal discretization and applied to another. To this end, DINO embeds spatial observations into a small latent space via INRs; it then models the continuous-time evolution by a learned latent Ordinary Differential Equation (ODE). Space-time separation. To efficiently encode different sequences, we propose a novel INR parameterization, amplitude modulation, implementing a space-time separation of variables. This simplifies the learned dynamics, reduces the number of parameters and greatly improves performance. Spatiotemporal versatility. DINO combines the benefits of prior models; cf. Table 1.

2. PROBLEM DESCRIPTION

Problem setting. We aim at modeling, via a data-driven approach, the temporal evolution of a continuous, fully-observed, deterministic spatiotemporal phenomenon. It is described by trajectories v : R → V in a set Γ; we use v t ≜ v(t) ∈ V. We focus on Initial Value Problems, where only v t at any time t is required to infer v t′ for t′ > t. Hence, trajectories share the same dynamics but differ by their initial condition v 0 ∈ V. R is the temporal domain and V is a functional space of the form Ω → R n , where Ω ⊂ R p is a compact spatial domain and n the number of observed values. In other words, v t is a spatial function of x ∈ Ω, with vectorial output v t (x) ∈ R n ; cf. examples of Section 5.1. To this end, we consider the setting illustrated in Figure 1. We observe a finite training set of trajectories D with a free-form spatial observation grid X tr ⊂ Ω and on discrete times t ∈ T ⊂ [0, T ]. At test time, we are only given a new initial condition v 0 , with observed values v 0 | X ts on a new observation grid X ts , potentially different from X tr . Inference is performed on both train and test trajectories given only the initial condition, on a new free-form grid X ′ ⊂ Ω and times t ∈ T ′ ⊂ [0, T ′ ]. The inference grid X ′ comprises observed positions (respectively X tr and X ts for train and test trajectories) and unobserved positions corresponding to spatial interpolation. Note that the inference temporal horizon is larger than the train one: T < T ′ . For simplicity, In-s refers to data in X ′ on the observation grid (X tr for train / X ts for test), Out-s to data in X ′ outside the observation grid; In-t refers to times within the train horizon T ⊂ [0, T ], and Out-t to times in T ′ \ T ⊂ (T, T ′ ], beyond T , up to the inference horizon T ′ .

[Figure 1: illustration of the observation grids X tr , X ts and of the In-s / Out-s spatial regions over the horizon [0, T ′ ].]

Evaluation scenarios. The desired properties in Section 1 call for spatiotemporally continuous forecasting models. We select six criteria that our approach should meet; cf. column titles of Table 1. First, the model should be robust to the change of initial condition v 0 , i.e. generalize to test trajectories (col. 1). Second, it should extrapolate beyond the train conditions: in space, on a test observation grid that differs from the train one, i.e. X ′ = X ts ̸ = X tr (In-s, col. 2), and outside the observed train and test grid, i.e. on X ′ \ X ts , X ′ \ X tr (Out-s, col. 3); in time, between train snapshots (In-t, col. 5) and beyond the observed train horizon T (Out-t, col. 6). Finally, it should adapt to free-form spatial domains, i.e. to various geometries (e.g. manifolds) or irregular grids (col. 4). See also Figure 1.
Objective. To satisfy these requirements, we learn the flow Φ of the physical system:

Φ : (V × R) → V, (v t , τ ) → Φ τ (v t ) = v t+τ , ∀v ∈ Γ, t ∈ R. (1)

Learning the flow is a common strategy in sequential models to better generalize beyond the train time horizon. Yet, so far, it has always been learned with discretized models, which poses generalization issues violating our requirements. We describe these issues in Section 3.
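For intuition, the flow satisfies the semigroup property Φ τ2 (Φ τ1 (v t )) = Φ τ1+τ2 (v t ): composing short-horizon predictions reaches any horizon, which is what sequential flow models exploit to extrapolate beyond the train horizon. A minimal sketch with a scalar linear ODE dv/dt = a·v, whose exact flow is Φ τ (v) = e^{aτ} v (the ODE and all constants below are illustrative choices, not from the paper):

```python
import math

A = -0.5  # illustrative decay rate for dv/dt = A * v

def flow(v, tau):
    # Exact flow of the linear ODE: Phi_tau(v) = exp(A * tau) * v
    return math.exp(A * tau) * v

# Semigroup property: two short predictions compose into one long prediction,
# which is how learning the flow enables extrapolation beyond the train horizon.
v0 = 2.0
assert abs(flow(flow(v0, 0.3), 0.7) - flow(v0, 1.0)) < 1e-12
```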

3. RELATED WORK

We review current data-driven approaches for PDE modeling and the representative methods listed in Table 1. We express the forecasting rule using the notations in Eq. (1): t is an arbitrary time; τ is an arbitrary time interval; δt is a fixed, predetermined time interval (a model hyperparameter). Sequential discretized models. Most sequential dynamics models are learned on a fixed observed grid X tr and use discretized models, e.g. CNNs or GNNs, to process the observations. CNNs require observations on a regular grid but can be extended to irregular grids through interpolation (Chae et al., 2021). GNNs are more flexible as they handle irregular grids, at an additional memory and computational cost. Yet, prediction on new grids X ′ ̸ = X tr fails experimentally for both CNNs and GNNs, as these discretized models are biased towards the training grid X tr , as we later show in Section 5. We distinguish two types of temporal models, which both extrapolate beyond the train horizon due to their sequential nature. • Autoregressive models, v t | X → v t+δt | X (Long et al., 2018; de Bézenac et al., 2018; Pfaff et al., 2021; Brandstetter et al., 2022), predict the sequence from t only at fixed time increments δt and not in between. • Time-continuous extensions, (v t | X , τ ) → v t+τ | X , using numerical solvers (Yin et al., 2021; Iakovlev et al., 2021) solve this limitation as they provide a prediction at arbitrary times, thus removing the dependency on the time discretization. Operator learning. Recently, operator-based models aim at finding a parameterized mapping between function spaces. They define in theory space-continuous models. First, neural operators (Kovachki et al., 2021) attempt to replace standard convolution with continuous alternatives. Fourier Neural Operator (FNO, Li et al., 2021b) applies convolution in the spectral domain via the Fast Fourier Transform (FFT).
Graph Neural Operator (GNO, Li et al., 2020) performs convolution on a local interaction grid described by a graph. Second, DeepONet (Lu et al., 2021) uses a coordinate-based neural network to output a prediction at arbitrary time and space locations given a function observed on a fixed grid. Three types of temporal models were used for operators, each with some limitations. • The standard approach, v 0 → v t , models the output at a given time t ∈ [0, T ] within the train horizon (Li et al., 2020). • A sequential extension, v t → v t+δt , was proposed in Li et al. (2021a). • Finally, a time-continuous version, v 0 → (t ∈ [0, T ] → v t ), in DeepONet proposes a solution at arbitrary time and space locations. The first and third approaches are not designed to generalize beyond the train horizon, i.e. when t > T , as they are not sequential. The second solves this limitation but is only able to predict solutions from t at fixed time increments of δt and not in-between. Furthermore, all existing approaches make restrictive assumptions on the space discretization. They lack flexibility when encoding spatial observations: FNO is limited to uniform Cartesian observation grids due to the FFT, and while one concurrent follow-up alleviates this issue (Li et al., 2022), it still cannot perform predictions on unobserved spatial locations; GNO does not adapt well to changing observation grids, as for the GNN-based models in the previous paragraph; DeepONet is limited to input observations on fixed observation locations. The latter are chosen at random spatial positions but must remain fixed throughout training and testing.
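The contrast between fixed-increment autoregressive prediction and time-continuous prediction can be sketched with a toy scalar dynamics dv/dt = −v standing in for a learned model (all names and constants below are illustrative assumptions):

```python
import math

def f(v):
    return -v  # toy vector field standing in for a learned dynamics model

DT = 0.1  # fixed time increment of an autoregressive model

def autoregressive(v0, n_steps):
    # v_t -> v_{t+dt}: can only reach times that are multiples of DT
    v = v0
    for _ in range(n_steps):
        v = v + DT * f(v)  # one learned step (here: an explicit Euler step)
    return v

def time_continuous(v0, tau, n=1000):
    # (v_t, tau) -> v_{t+tau}: a numerical solver reaches any horizon tau
    h, v = tau / n, v0
    for _ in range(n):
        k2 = f(v + h * f(v) / 2)  # midpoint (RK2) stage
        v = v + h * k2
    return v

# The time-continuous model predicts at tau = 0.37, unreachable with DT = 0.1.
pred = time_continuous(1.0, 0.37)
```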

Spatiotemporal INRs.

Another class of models relies on coordinate-based neural networks, called Implicit Neural Representations (INRs, Sitzmann et al., 2020; Fathony et al., 2021; Tancik et al., 2020). These space-continuous models share a similar objective as operators, despite constituting a separate research field. INRs for spatiotemporal data take time as an input along with the spatial coordinates. Physics-informed neural networks (PINNs, Raissi et al., 2019) are one such example; as discussed above, they solve a single initial value problem for a known PDE and are thus not suited to forecasting unknown dynamics.

4. MODEL

We present DINO, the first space/time-continuous model that tackles all prediction tasks of Section 2, without the above limitations. We specify DINO's inference procedure (Section 4.1), illustrated in Figure 2 (left), its components (Section 4.2), its training (Section 4.3), and the decoder implementation (Section 4.4).

4.1. INFERENCE MODEL

As explained in Section 2, we aim at estimating the flow Φ in Eq. (1), so that our model can be trained on an observed grid X tr and perform inference given a new one X ts , both possibly irregular. To this end, we leverage a space- and time-continuous formulation, independent of a given data discretization. At inference, DINO starts from a single initial condition v 0 ∈ V and uses a flow to forecast its dynamics. DINO first embeds spatial observations from v 0 into a latent vector α 0 of small dimension d α via an encoder of spatial functions E φ : V → R dα (ENC). Then, it unrolls a latent time-continuous dynamics model f ψ : R dα → R dα given this initial condition (DYN). Finally, it decodes latent vectors via a decoder D ϕ : R dα → V into a spatial function (DEC). At any time t, D ϕ takes as input α t and outputs a function ṽ t : Ω → R n . This results in the following model, illustrated in Figure 2 (left) and whose components are detailed in Section 4.2:

(ENC) α 0 = E φ (v 0 ), (DYN) dα t /dt = f ψ (α t ), (DEC) ∀t, ṽ t = D ϕ (α t ). (2)
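The inference pipeline of Eq. (2) can be sketched end-to-end with toy stand-ins for each component (the architectures, sizes and the explicit Euler solver below are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
d_alpha, n_hidden, n_points = 8, 16, 50

# (DEC) toy decoder: maps a latent alpha to function values at coordinates xs.
W1 = rng.normal(size=(n_hidden, d_alpha))
W2 = rng.normal(size=(1, n_hidden))
def decode(alpha, xs):
    # evaluate v~_t(x) at each spatial location x (space-continuous: any grid)
    feats = np.tanh(np.outer(xs, np.ones(d_alpha)) * alpha)  # (n_points, d_alpha)
    return (W2 @ np.tanh(W1 @ feats.T)).ravel()              # (n_points,)

# (DYN) latent vector field f_psi, here a fixed linear map (toy assumption).
A = rng.normal(size=(d_alpha, d_alpha)) * 0.1
def f_psi(alpha):
    return A @ alpha

def unroll(alpha0, t, n_steps=100):
    # integrate d alpha/dt = f_psi(alpha) with explicit Euler up to time t
    h, alpha = t / n_steps, alpha0.copy()
    for _ in range(n_steps):
        alpha = alpha + h * f_psi(alpha)
    return alpha

alpha0 = rng.normal(size=d_alpha)          # (ENC) would come from auto-decoding v_0
xs = np.linspace(0.0, 1.0, n_points)       # arbitrary inference grid X'
v_hat = decode(unroll(alpha0, t=1.0), xs)  # forecast at time t on grid xs
```

Note that the inference grid `xs` is chosen freely at evaluation time; nothing in the pipeline depends on the grid the model was trained on.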

4.2. COMPONENTS

Encoder: α t = E φ (v t ). The encoder computes a latent vector α t given the observation v t at any time t. It is used in two different contexts, for train and test respectively. At train time, given an observed trajectory v T = {v t } t∈T , it encodes any v t into α t (see Section 4.3). At inference time, only v 0 is available, and then only α 0 is computed, to be used as the initial value for the dynamics. Given the decoder D ϕ , α t is a solution to the inverse problem D ϕ (α t ) = v t . We solve this inverse problem with auto-decoding (Park et al., 2019). Denoting ℓ dec (ϕ, α t ; v t ) = ∥D ϕ (α t ) − v t ∥ 2 2 the decoding loss, where ∥•∥ 2 is the Euclidean norm of a function and K the number of update steps, auto-decoding defines E φ as:

E φ (v t ) = α K t , where α 0 t = 0; ∀k ≥ 0, α k+1 t = α k t − η∇ α ℓ dec (ϕ, α k t ; v t ), and φ = ϕ. (3)

In practice, we observe a discretization (X tr , X ts ) and accordingly approximate the norm in ℓ dec as in Eq. (6). Compared to auto-encoding, auto-decoding underfits less (Kim et al., 2019) and is more flexible: without requiring a specialized encoder architecture, it handles free-form (irregular or on a manifold) observation grids, as long as the decoder shares the same property. Decoder: ṽ t = D ϕ (α t ). We define a flexible decoder using a coordinate-based INR network with parameters conditioned on α t . An INR I θ : Ω → R n is a space-continuous model parameterized by θ ∈ R d θ defined on the domain Ω. It approximates functions independently of the observation grid, e.g. it handles irregular grids and changing observation positions, unlike FNO and DeepONet. Thus, it constitutes a flexible alternative to operators, suitable for auto-decoding. To implement the conditioning of the INR's parameters, we use a hypernetwork (Ha et al., 2017) h ϕ : R dα → R d θ , as illustrated in Figure 3. It generates the high-dimensional parameters θ t ∈ R d θ of the INR given the low-dimensional latent vector α t ∈ R dα .
Hence, the decoder D ϕ , parameterized by ϕ, is defined as:

∀x ∈ Ω, ṽ t (x) = D ϕ (α t )(x) ≜ I h ϕ (α t ) (x). (4)

The decoder's predictions at all spatial locations x ∈ Ω thus all depend on α t . We provide further details on the precise implementation in Section 4.4. Dynamics model: dα t /dt = f ψ (α t ). Finally, the dynamics model f ψ : R dα → R dα defines a flow via an ODE in the latent space. The initial condition can be defined at any time t by encoding with E φ the corresponding input function v t . Overall flow. Combined altogether, our components define the following flow in the input space, which can approximate the data flow Φ in Eq. (1):

∀(t, τ ), (v t , τ ) → D ϕ ( E φ (v t ) + ∫ t t+τ f ψ (α τ ′ ) dτ ′ ), where α t = E φ (v t ). (5)

To summarize, DINO defines a time-continuous latent temporal model with a space-continuous emission function D ϕ , combining the flexibility of space and time continuity. This is fully novel to our knowledge, as prior latent approaches are discretized (cf. Fraccaro (2018) for state-space models).
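The auto-decoding encoder of Eq. (3) can be sketched with a toy linear decoder, for which the gradient of the decoding loss is available in closed form (the linear decoder, the sizes and the step size are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs, d_alpha = 40, 6

# Toy linear decoder D_phi(alpha) = Phi @ alpha, evaluated on the observed grid.
Phi = rng.normal(size=(n_obs, d_alpha))
alpha_true = rng.normal(size=d_alpha)
v_t = Phi @ alpha_true  # observed values v_t restricted to the grid

def encode(v, K=2000, eta=0.005):
    # Eq. (3): start from alpha^0 = 0 and take K gradient steps on ell_dec
    alpha = np.zeros(d_alpha)
    for _ in range(K):
        grad = 2 * Phi.T @ (Phi @ alpha - v)  # grad of ||Phi @ alpha - v||^2
        alpha = alpha - eta * grad
    return alpha

alpha_hat = encode(v_t)  # a code whose decoding matches the observations
```

Since the encoder is just gradient descent through the decoder, it works unchanged for any observation grid on which the decoder can be evaluated, which is the flexibility the text highlights.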

4.3. TRAINING

We present the training procedure, illustrated in Figure 2 (right), for the previous components. We use a two-stage optimization process, close to recent work in video prediction (Yan et al., 2021). Given the train sequences D, we first use auto-decoding to obtain the latent vectors α T = {α v t } t∈T ,v∈D and the decoder parameters ϕ. We then learn the parameters ψ of the dynamics by modeling the latent flow over α v t , ∀v ∈ D. We detail this procedure in Appendix D.1; it can be formalized as the following two-stage optimization problem, which we solve in parallel without inducing training instability (cf. Appendix D.2):

min ψ ℓ dyn (ψ, α T ) ≜ E v∈D,t∈T ∥ α v t − (α v 0 + ∫ 0 t f ψ (α v τ ) dτ ) ∥ 2 2
s.t. (α T , ϕ) = arg min α T ,ϕ ℓ dec (ϕ, α T ) ≜ E v∈D,x∈X tr ,t∈T ∥ v t (x) − D ϕ (α v t )(x) ∥ 2 2 . (6)
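A toy, fully linear instance of this two-stage procedure can be solved in closed form: latent codes are fit by least squares instead of gradient-based auto-decoding, and a linear dynamics model is regressed on the recovered latent trajectory (all sizes and the linear choices are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
T_steps, n_obs, d_alpha, h = 20, 50, 4, 0.1

# Synthetic training trajectory: v_t = Phi @ alpha_t, with linear latent dynamics.
Phi = rng.normal(size=(n_obs, d_alpha))
A_true = rng.normal(size=(d_alpha, d_alpha)) * 0.5
alphas_true = [rng.normal(size=d_alpha)]
for _ in range(T_steps - 1):
    alphas_true.append(alphas_true[-1] + h * A_true @ alphas_true[-1])
V = np.stack([Phi @ a for a in alphas_true])  # observations, shape (T, n_obs)

# Stage 1 (ell_dec): recover latent codes alpha_t given the decoder
# (closed form via least squares, standing in for auto-decoding).
alphas = np.stack([np.linalg.lstsq(Phi, v, rcond=None)[0] for v in V])

# Stage 2 (ell_dyn): fit a linear dynamics f_psi(alpha) = A @ alpha by
# regressing the Euler increments (alpha_{t+1} - alpha_t) / h on alpha_t.
X, Y = alphas[:-1], (alphas[1:] - alphas[:-1]) / h
A_hat = np.linalg.lstsq(X, Y, rcond=None)[0].T
```

Here the exact generating dynamics is recovered because the data is noiseless and linear; in DINO, both stages are instead optimized jointly by gradient descent.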

4.4. DECODER IMPLEMENTATION VIA AMPLITUDE-MODULATED INRS

We now specify our implementation of the decoder D ϕ in Eq. (4). This includes the definition of the INR architecture I θ and of the hypernetwork h ϕ . We introduce for the latter a new method called amplitude modulation, which implements a space-time separation of variables. I θ as FourierNet. We implement I θ as a FourierNet, a state-of-the-art INR architecture, which instantiates a Multiplicative Filter Network (MFN, Fathony et al., 2021). A FourierNet relies on the recursion in Eq. (7), where x ∈ Ω is an input spatial location, z (l) (x) is the hidden feature vector at layer l for x and s ω (l) (x) = [cos(ω (l) x), sin(ω (l) x)] is a Fourier basis:

z (0) (x) = s ω (0) (x), z (l) (x) = ( W (l−1) z (l−1) (x) + b (l−1) ) ⊙ s ω (l) (x) for l ∈ ⟦1, L − 1⟧, z (L) (x) = W (L−1) z (L−1) (x) + b (L−1) , (7)

where we fix W (0) = 0, b (0) = 1, s ω (0) (x) = x, and ⊙ is the Hadamard product. Denoting W = [W (l) ] L−1 l=1 , b = [b (l) ] L−1 l=1 , ω = [ω (l) ] L−1 l=1 , we fit a FourierNet to an input function v observed on a grid X by learning {W, b, ω} s.t. ∀x ∈ X , z (L) (x) = v(x). In practice, we observe that fixing ω, uniformly sampled, performs similarly to learning them, so we exclude them from the training parameters. FourierNets are interpretable, a property we leverage to separate time and space via amplitude modulation. Fathony et al. (2021) show that for some M ≫ L ∈ N, there exists a set of coefficients {c (m) j } M m=1 that depend only on {W, b}, as well as a set of parameters {γ (m) } M m=1 that depend only on those of the filters ω, s.t. the j-th dimension of z (L) (x) can be expressed as:

z (L) j (x) = ∑ M m=1 c (m) j s γ (m) (x) + bias. (8)

Eq. (8) involves a basis of spatial functions {s γ (m) } M m=1 evaluated on x, and the amplitudes of this basis, {c (m) j } M m=1 . Note that Eq. (8) can be extended to other choices of s ω (l) (Fathony et al., 2021).
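A minimal FourierNet forward pass following the recursion of Eq. (7), for a scalar input x with untrained random parameters (the sizes and initializations are illustrative; under amplitude modulation, only W and b would be generated by the hypernetwork while the frequencies ω stay shared across the sequence):

```python
import numpy as np

rng = np.random.default_rng(3)
L, m, n_out = 4, 16, 1  # number of layers, frequencies per filter, output dim
d = 2 * m               # hidden width: [cos, sin] pairs

def s(omega, x):
    # Fourier filter s_omega(x) = [cos(omega x), sin(omega x)] for scalar x
    return np.concatenate([np.cos(omega * x), np.sin(omega * x)])

# fixed, uniformly sampled frequencies omega^(l), l = 1..L-1 (not trained)
omegas = [rng.uniform(-16, 16, size=m) for _ in range(L - 1)]
# W^(l), b^(l), l = 1..L-1: the parameters amplitude modulation would generate
Ws = [rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(L - 2)]
Ws.append(rng.normal(scale=d ** -0.5, size=(n_out, d)))  # W^(L-1)
bs = [rng.normal(scale=0.1, size=d) for _ in range(L - 2)]
bs.append(rng.normal(scale=0.1, size=n_out))             # b^(L-1)

def fouriernet(x):
    z = s(omegas[0], x)                    # z^(1): W^(0) = 0, b^(0) = 1
    for l in range(1, L - 1):              # hidden layers of Eq. (7)
        z = (Ws[l - 1] @ z + bs[l - 1]) * s(omegas[l], x)
    return Ws[-1] @ z + bs[-1]             # linear output layer z^(L)

y = fouriernet(0.3)  # evaluable at any x, independently of any grid
```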
5. EXPERIMENTS

5.1. EXPERIMENTAL SETTING

Datasets. We consider the following PDEs defined over a spatial domain Ω, with further details in Appendix C. • 2D Wave equation (Wave) is a second-order PDE ∂ 2 u/∂t 2 = c 2 ∆u. u is the displacement w.r.t. the rest position and c is the wave traveling speed. We consider its first-order form, so that v t = (u t , ∂u t /∂t) has a two-dimensional output (n = 2). • 2D Navier-Stokes (Navier-Stokes, Stokes, 1851) corresponds to incompressible fluid dynamics ∂v/∂t = −u · ∇v + ν∆v + f , v = ∇ × u, ∇ · u = 0, where u is the velocity field and v the vorticity. ν is the viscosity and f is a constant forcing term; n = 1. • 3D Spherical shallow water (Shallow-Water, Galewsky et al., 2004) involves the vorticity w, tangent to the sphere's surface, and the thickness of the fluid h. The input is v t = (w t , h t ); n = 2. Baselines. We reimplement representative models from Section 3 and Table 1 and adapt them to our multi-dimensional datasets. • CNODE (Ayed et al., 2020) combines a CNN and an ODE solver to handle regular grids. • MP-PDE (Brandstetter et al., 2022) uses a GNN to handle free-form grids, yet is unable to predict outside the observation grid. We developed an interpolative extension, I-MP-PDE, to handle this limitation: it performs bicubic interpolation on the observed grid and training is done on the resulting interpolation. • MNO (Li et al., 2021a) is an autoregressive version of FNO (Li et al., 2021b) for regular grids; it can be evaluated on new uniform grids. • DeepONet (Lu et al., 2021), considered autoregressively (Wang & Perdikaris, 2021), where we remove time from the trunk net's input, can be evaluated on new spatial locations without interpolation. • SIREN (Sitzmann et al., 2020) and MFN (Fathony et al., 2021) are two INR methods which we extend to fit our setting. We consider an agnostic setting, i.e. without knowledge of the differential equation, and perform sequence conditioning to generalize to more than one trajectory.
This is achieved by learning a latent vector with auto-decoding; it is then concatenated to the spatial coordinates. Tasks. We evaluate models on various forecasting tasks, which combine the evaluation scenarios of Section 2. Performance is measured by the prediction Mean Squared Error (MSE) given only an initial condition. • Space and time generalization. We consider a uniform grid X ′ for inference. Training is performed on different observation grids X tr subsampled from X ′ with different ratios s ∈ {5%, 25%, 50%, 100%}, where s = 100% corresponds to the full inference grid, i.e. X tr = X ′ . In this setting, we consider that all trajectories (train and test) share the same observation grid X tr = X ts . We evaluate the MSE on X ′ over the train time interval (In-t) and beyond it (Out-t) at each subsampling ratio. • Flexibility w.r.t. input grid. We vary the test observation grid, i.e. X ts ̸ = X tr , and perform inference on X ′ = X ts , i.e. on the test observation grid (In-s), under two settings: ▷ Generalizing across grids: X tr , X ts are subsampled differently from the same uniform grid; s tr (resp. s ts ) is the train (resp. test) subsampling ratio. ▷ Generalizing across resolutions: X tr , X ts are subsampled with the same ratio s from two uniform grids with different resolutions; the train resolution is fixed to r tr = 64 while we vary the test resolution r ts ∈ {32, 64, 256}. • Data on a manifold. We consider a PDE on a sphere and combine several evaluation scenarios, as described later. • Finer time resolution. We consider an inference time grid T ′ with a finer resolution than the train one T .
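The grid-subsampling protocol of the space-generalization tasks can be sketched as follows (the array shapes and uniform random sampling are our illustrative reading of the setup):

```python
import numpy as np

rng = np.random.default_rng(4)
r = 64  # resolution of the uniform inference grid X'
full_grid = np.stack(np.meshgrid(np.arange(r), np.arange(r)), -1).reshape(-1, 2)

def subsample(grid, ratio, rng):
    # Draw an irregular observation grid by keeping a random fraction of the
    # full inference grid X' (ratios s in {5%, 25%, 50%, 100%}).
    k = max(1, int(ratio * len(grid)))
    idx = rng.choice(len(grid), size=k, replace=False)
    return grid[idx]

X_tr = subsample(full_grid, 0.05, rng)   # sparse train observation grid
X_ts = subsample(full_grid, 0.25, rng)   # test grid may differ from train grid
```

Evaluating a trained model on `full_grid` then covers both In-s (points in the observation grid) and Out-s (the remaining, unobserved points).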

5.2. RESULTS

Space and time generalization. We report prediction MSE in Table 2 for varying subsampling ratios s ∈ {5%, 25%, 100%} on Navier-Stokes and Wave. Appendix A provides a fine-grained evaluation inside the train observation grid (In-s) or outside it (Out-s) and additionally reports results for s = 50%. We visualize some predictions in Appendix B. DINO is compared to all baselines when s = 100%, i.e. X′ = X_tr = X_ts; otherwise it is compared only to models which handle irregular grids and prediction at arbitrary spatial locations (DeepONet, SIREN, MFN, I-MP-PDE).
• General analysis. All models degrade as the subsampling ratio s decreases. DINO performs competitively overall: it achieves the best Out-t performance in all subsampling settings, outperforms all baselines at low subsampling ratios, and performs comparably to the competitive discretized (MP-PDE, CNODE) and operator (MNO) alternatives when s = 100%, i.e. when observation and inference grids are equal. Note that this fully observed setting is favorable for CNODE, MP-PDE and MNO, which are designed to perform inference on the observation grid. This can be seen in Table 2, where DINO is slightly outperformed only in a few settings; MP-PDE is significantly better only on Wave for In-t. Overall, CNNs and GNNs exhibit good performance for spatially local dynamics like Wave, while INRs (like DINO) and MNO are better suited to global dynamics like Navier-Stokes.
• Analysis per model. MP-PDE is the most competitive baseline across datasets as it combines a strong and flexible encoder (GNNs) with a good dynamics model; however, it cannot predict outside the observation grid (Out-s). To keep a strong competitor, we extend this baseline into its interpolative version I-MP-PDE in subsampled settings. I-MP-PDE is competitive at high subsampling ratios, e.g. s ∈ {50%, 100%}, but underperforms w.r.t. DINO at lower subsampling ratios due to the accumulated interpolation error.
MNO is a competitive baseline on Navier-Stokes, performing on par with MP-PDE and DINO inside the training horizon (In-t); its Out-t performance degrades more significantly than that of other models, especially DINO. DeepONet is more flexible than MP-PDE as it can predict at arbitrary locations. As no interpolation error is introduced, it outperforms I-MP-PDE for s = 5% on train data; yet, it underperforms its alternatives, especially on Out-t. Finally, SIREN and MFN correctly fit the train horizon (In-t) on train data, yet generalize poorly outside this horizon (Out-t) or on new initial conditions (test). This is in accordance with our analysis of Section 3; we highlight that this is not the case for DINO, which extrapolates temporally and generalizes to new initial conditions thanks to its sequential modeling of the flow. Thus, DINO is currently the state-of-the-art INR model for spatiotemporal data.
• Modulation. Modulating both amplitudes and frequencies (row "DINO (no sep.)" in Table 2) degrades performance w.r.t. DINO (row "DINO" in Table 2), which only modulates amplitudes. Amplitude modulation enables long temporal extrapolation and reduces the number of parameters. Hence, as opposed to DINO (no sep.), which is outperformed by some baselines, time-space variable separation is an essential ingredient for DINO to reach state-of-the-art levels.
Flexibility w.r.t. input grid. We consider Navier-Stokes in Table 3 and compare DINO to the most competitive baselines, MP-PDE and MNO (with s = 100% subsampling ratio).
• Generalizing across grids. In Table 3a, the test observation grid X_ts differs from the train one X_tr. This occurs when sensors differ between two observed trajectories. We vary the subsampling ratio of the train observation grid, s_tr, and of the test one, s_ts. We report test MSE on new grids X′ = X_ts.
We observe that DINO is very robust to changing grids between train and test, while MP-PDE's performance degrades, especially at low subsampling ratios, e.g. 5%. For reference, we report in Table 6, Appendix A (result col. 1) the performance when X′ = X_tr, where MP-PDE is substantially better.
• Generalizing across spatial resolutions. In Table 3b we vary the test resolution r_ts. We train at resolution r_tr = 64 and perform inference at resolutions r_ts ∈ {32, 64, 256}. For that, we build a high-fidelity 256×256 simulation dataset and downscale it to obtain the other resolutions. DINO's performance is the most stable across resolutions in both the uniform and irregular settings. MNO is also relatively stable but is only applicable to uniform grids, while MP-PDE is particularly brittle, especially at a 5% ratio.
Data on manifold. We consider in Figure 5 Shallow-Water in a super-resolution setting: the test resolution is twice the train one, close to weather prediction applications. Test MSE (In-t / Out-t): I-MP-PDE 1.908E-3 / 7.240E-3; DINO 1.063E-4 / 6.466E-4. We observe an irregular 3D Euclidean coordinate grid X_tr = X_ts ⊂ R³ shared between train and test. It uniformly samples Euclidean positions on the sphere via the quasi-uniform skipped latitude-longitude grid (Weller et al., 2012). We predict the PDE on test trajectories with a conventional latitude-longitude inference grid X′. At Earth scale, X_tr corresponds to a resolution of about 300 km, and X′ to 150 km. DINO significantly outperforms I-MP-PDE, making it a viable candidate for this complex setting.
Finer time resolution. We consider in Table 4 a longer and ten times finer test temporal grid T′ than the train grid T on Navier-Stokes. We observe the same uniform spatial grid across train and test and perform inference on this grid. We compare DINO, which performs prediction with an ODE solver, to interpolating coarser predictions obtained at the train resolution (I-DINO).
We report the corresponding test MSE. We observe that the ODE solver accurately extrapolates outside the train temporal grid, outperforming interpolation. This confirms that DINO benefits from its continuous-time modeling of the flow, providing consistency and stability across temporal resolutions.
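To illustrate why a continuous-time model can be queried on a finer time grid than the train one, here is a minimal sketch where a toy linear latent dynamics stands in for the learned f_ψ (the dynamics and all names are our own assumptions, not the trained model): the ODE solution is evaluated directly on the fine grid T′, while the I-DINO-style baseline interpolates coarse predictions made on T.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Toy latent dynamics d(alpha)/dt = A @ alpha, standing in for the learned f_psi.
A = np.array([[0.0, -1.0], [1.0, 0.0]])      # rotation: alpha(t) = (cos t, sin t)
f_psi = lambda t, a: A @ a
alpha0 = np.array([1.0, 0.0])

t_train = np.linspace(0.0, 2.0, 11)          # coarse train time grid T
t_fine = np.linspace(0.0, 2.0, 101)          # 10x finer test grid T'

# Continuous-time prediction: one solve, queried at arbitrary time points.
sol = solve_ivp(f_psi, (0.0, 2.0), alpha0, dense_output=True, rtol=1e-8, atol=1e-8)
pred_fine = sol.sol(t_fine).T                # (101, 2)

# Interpolation baseline: predict on T, then linearly interpolate to T'.
pred_coarse = sol.sol(t_train).T
interp_fine = np.stack(
    [np.interp(t_fine, t_train, pred_coarse[:, d]) for d in range(2)], axis=1)

true_fine = np.stack([np.cos(t_fine), np.sin(t_fine)], axis=1)
mse_ode = np.mean((pred_fine - true_fine) ** 2)
mse_interp = np.mean((interp_fine - true_fine) ** 2)
```

On this toy system the solver's error is orders of magnitude below the interpolation error, mirroring the DINO vs. I-DINO comparison of Table 4.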

6. CONCLUSION

We propose DINO, a novel space- and time-continuous data-driven PDE forecaster. DINO handles free-form spatiotemporal conditions encountered in many applications, where existing methods fail. DINO outperforms recent PDE forecasters on a variety of PDEs and spatiotemporal generalization settings, including evaluation on unseen sparse irregular meshes and resolutions. There are many promising directions for future work, such as scaling DINO to real-world problems, e.g. weather forecasting, or incorporating recent strategies to adapt to changing dynamics (Kirchmeyer et al., 2022).

A FULL RESULTS

We provide in Table 5 a more detailed version of Table 2 for the space-time extrapolation problem, where we report performance In-s (on the observation grid) and Out-s (outside it); we also add s = 50%. Then, we report in Table 6 a more detailed version of Table 3a, which includes results for X_ts = X_tr. This corresponds to our generalization across grids problem.

E COMPLEMENTARY ANALYSES

We detail in this section additional experiments, allowing us to further analyze and assess the performance of DINO.

E.1 LONG-TERM TEMPORAL EXTRAPOLATION

We provide in Table 8 an analysis of error accumulation over time for long-term extrapolation. More precisely, we generate a Navier-Stokes dataset with longer trajectories and report MSE for T ′ = T + ∆T where ∆T ∈ {T, 5T, 10T, 50T }. Note that ∆T = T is the setting of our main experiments (T ′ = 2T ). We observe that DINO's MSE in long-term forecasting is more than an order of magnitude smaller than for (I-)MP-PDE. This demonstrates the extrapolation abilities of our model.

E.2 INRS' ADVANTAGE OVER INTERPOLATION

We report in Table 9 the MSE of bicubic interpolation, of our FourierNet (auto-decoding with amplitude modulation but without a dynamics model) and of DINO (with a dynamics model) on the train set within In-t for both Navier-Stokes and Wave. This corresponds to the MSE averaged over all training frames within the train horizon, not only the initial condition v_0. We observe that FourierNet is better than interpolation. Indeed, interpolation is poorly adapted to sparse observation grids: the interpolation errors are clearly visible in Figure 7, first row (5% setting). Interestingly, DINO's MSE is only slightly worse than FourierNet's, showing that we correctly learned the dynamics of the latent modulations α_t. I-MP-PDE, which combines bicubic interpolation with MP-PDE, is then expectedly outperformed by DINO in this challenging 5% setting. This shows the advantage of using INRs instead of standard bicubic interpolation between observed spatial locations.
SST. We evaluate DINO on real-world data to further assess its applicability. Following de Bézenac et al. (2018) and Donà et al. (2021), we model the Sea Surface Temperature (SST) of the Atlantic ocean, derived from the data-assimilation engine NEMO (Nucleus for European Modeling of the Ocean, Madec & NEMO System Team) using E.U. Copernicus Marine Service Information. Accurately modeling SST dynamics is critical in weather forecasting or planning of coastal activities. This problem is particularly challenging as SST dynamics are only partially observed: several unobserved variables affecting the dynamics (e.g. the sea water flow) need to be estimated from data. For this experiment, we consider trajectories collected from three geographical zones (17 to 20) following the initial train / test split of de Bézenac et al. (2018). Notably, T = 9 d, which includes τ = 4 d of conditioning frames, i.e. models are tested to predict v_t, t ∈ ⟦τ, T⟧, from v_t, t ∈ ⟦0, τ−1⟧.
Incorporating consecutive time steps.
To model SST, which includes non-Markovian data and thus does not correspond to an initial value problem as in Section 2, we modify our dynamics model in a similar fashion to Yıldız et al. (2019) to integrate a history of several consecutive observations v_t, t ∈ ⟦0, τ−1⟧, instead of only the initial observation v_0. In more detail, we define a neural ODE over an augmented state [α_t, α′_t], where α_t is our auto-decoded state and α′_t is an encoding of τ = 4 past auto-decoded observations via a neural network c_ξ. We adjust our inference and training settings as follows:
• inference: we compute α′_{τ−1} = c_ξ(α_0, …, α_{τ−1}) and then unroll our neural ODE from the initial condition [α_{τ−1}, α′_{τ−1}] to obtain [α_t, α′_t] for all t > τ−1: ∀t ∈ ⟦0, τ−1⟧, α_t = e_φ(v_t), α′_{τ−1} = c_ξ(α_0, …, α_{τ−1}), d[α_t, α′_t]/dt = f_ψ([α_t, α′_t]);
• training: for all t, we infer α′_{t+τ−1} = c_ξ(α_t, …, α_{t+τ−1}) and fit the above neural ODE on the [α_t, α′_t] obtained for all t ∈ ⟦0, T − τ + 1⟧.
This experiment confirms that our space- and time-continuous framework can easily be extended to incorporate refined temporal models.
Results. We report in Table 10 test MSE for DINO and VarSep (Donà et al., 2021), the current state of the art on SST, retrained on the same training data. DINO notably outperforms VarSep in prediction performance. This demonstrates DINO's potential to handle complex real-world spatiotemporal dynamics. We also provide some visualizations of DINO's train and test predictions in Figure 10. We make two observations. First, DINO fits the train data very accurately. Second, on the test data, the dynamics of low frequencies seem to be correctly modeled while the prediction of high-frequency dynamics is less accurate. Larger-scale experiments would be required to effectively evaluate the model's performance on this challenging dataset. Given the complexity of the data, this is out of the scope of the paper.
Yet, these experiments already demonstrate that DINO behaves competitively w.r.t. the previous state of the art.
Implementation choices. We choose a similar INR and dynamics architecture as for our Shallow-Water experiment. For c_ξ, which takes as input four consecutive α_t's, we encode each α_t individually through a four-layer fully connected network and then feed the encodings to a single linear layer.
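The augmented-state construction above can be sketched in a few lines of numpy, with random linear maps standing in for the learned history encoder c_ξ and dynamics f_ψ, and a plain Euler loop in place of a full neural ODE solver (all components here are illustrative stand-ins, not the trained model):

```python
import numpy as np

rng = np.random.default_rng(0)
d, tau = 4, 4                      # latent size and history length tau = 4

# Stand-ins for the learned components (random weights; illustrative only).
W_c = rng.normal(scale=0.1, size=(d, tau * d))         # history encoder c_xi
W_f = rng.normal(scale=0.1, size=(2 * d, 2 * d))       # dynamics f_psi

c_xi = lambda history: W_c @ np.concatenate(history)   # -> alpha'_{tau-1}
f_psi = lambda state: W_f @ state                      # d[alpha, alpha']/dt

# Auto-decoded codes of the tau conditioning frames (random stand-ins here).
alphas = [rng.normal(size=d) for _ in range(tau)]

# Inference: build the augmented initial state, then unroll with Euler steps.
state = np.concatenate([alphas[-1], c_xi(alphas)])     # [alpha, alpha'] at tau-1
dt, n_steps = 0.1, 20
traj = [state]
for _ in range(n_steps):
    state = state + dt * f_psi(state)                  # explicit Euler step
    traj.append(state)
traj = np.array(traj)                                  # (n_steps + 1, 2 * d)
pred_alphas = traj[:, :d]                              # decode these with D_phi
</```python>```

Only the first d components of each augmented state are decoded into predictions; the remaining components carry the history information forward in time.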



https://data.marine.copernicus.eu/product/GLOBAL_ANALYSIS_FORECAST_PHY_001_024/description




Figure 1: (Left) We represent time contexts. The train trajectory consists of training snapshots (■), observed in a train interval [0, T ] denoted In-t. The line (-) in continuation is a forecasting of this trajectory beyond In-t, in (T, T ′ ] denoted Out-t. The line below (-, test) is a forecasting from a new initial condition v 0 (■) on In-t and Out-t. (Middle and right) We illustrate spatial contexts. (Middle) Dots (•) correspond to the train observation grid X tr , denoted In-s. Out-s denotes the complementary domain Ω \ X tr . (Right) New test observation grid X ts , used as an initial point for forecasting (left).

Figure 2: Proposed DINO model. Inference (left): given a new initial condition observed on a grid X_ts, v_0 |_{X_ts}, forecasting amounts to decoding α_t into ṽ_t, by unrolling α_0 with a time-continuous ODE dynamics model f_ψ. Train (right): given an observation grid X_tr and a space-continuous decoder D_ϕ, α_t is learned by auto-decoding s.t. D_ϕ(α_t) |_{X_tr} = v_t |_{X_tr}; its evolution is then modeled with f_ψ.

We first present an overview of the model, then introduce each of its components (Section 4.2) and how they are trained (Section 4.3, Figure 2 (right)). Finally, we detail our implementation based on amplitude modulation, a novel INR parameterization for spatiotemporal data which performs separation of variables (Section 4.4).

Figure 3: Decoding via INR Eq. (4).


Figure 5: Data on manifold. DINO's Shallow-Water super-resolution test prediction (top) against the reference (middle), with test MSE (↓) (bottom).

Figure 10: DINO's prediction examples on SST.

Comparison of data-driven approaches to spatiotemporal PDE forecasting.



Space and time generalization. Train and test observation grids are equal and subsampled from a uniform 64×64 grid, used for inference. We report MSE (↓) on the inference time interval T′, divided within the training horizon (In-t, T) and beyond it (Out-t, outside T), across subsampling ratios.

Flexibility w.r.t. input grid. The observed test / train grids differ (X_ts ≠ X_tr). We report test MSE (↓) for Navier-Stokes on X′ = X_ts (In-s). Green, yellow and red denote excellent, good and poor MSE, respectively. Generalization across grids: X_tr, X_ts are subsampled with different ratios s_tr ≠ s_ts among {5, 50, 100}% from the same uniform 64×64 grid. Generalization across resolutions: X_ts (resp. X_tr) is subsampled at the same ratio s ∈ {5, 100}% from different uniform grids with resolution r_ts ∈ {32, 64, 256} (resp. r_tr = 64).

Finer time resolution. Test MSE (↓) under T ′ for Navier-Stokes.

Space and time generalization. The train and test observation grids are equal; they are subsampled with a ratio s from a uniform 64×64 grid, fixed here to be the inference grid X′. We report MSE (↓) on X′ (on the observation grid In-s, outside it Out-s, or on both, Full) and on the inference time interval T′, divided within the training horizon (In-t, T) and beyond it (Out-t, outside T), across subsampling ratios s ∈ {5%, 25%, 50%, 100%}. Best in bold and second best underlined.

Long-term extrapolation performance of DINO and (I-)MP-PDE in the space and time generalization experiment for test trajectories on Out-t ((T, T′] with T′ = T + ∆T); cf. Table 2 and Section 5.1.

MSE reconstruction error (In-s and Out-s) of train sequences within the train horizon (In-t) for three different methods: interpolation of observed points in X_tr, FourierNet learned over individual frames in X_tr, and DINO (FourierNet with a dynamics model). Similarly to the previous SIREN baseline, we concatenate the per-trajectory context code to the space and time coordinates at the first layer. The hidden layer size is fixed to 256 and we use the default parameter initialization with a frequency scale ω_s of 64, higher than DINO's. The size of the context code is d_α = 800. The learning rate is 10⁻³.
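The conditioning scheme described above can be sketched as a SIREN forward pass where the per-trajectory context code is concatenated to the coordinates at the first layer (a minimal sketch: the weights are random and the frequency scale is a placeholder, not the trained baseline's settings):

```python
import numpy as np

def siren_forward(coords, code, params, omega=30.0):
    """Minimal SIREN forward pass with a per-trajectory context code
    concatenated to the (x, y, t) coordinates at the first layer.
    Illustrative sketch; weights here are random, not trained."""
    h = np.concatenate([coords, code])
    for W, b in params[:-1]:
        h = np.sin(omega * (W @ h + b))    # sine activations
    W, b = params[-1]
    return W @ h + b                       # linear output layer

rng = np.random.default_rng(0)
d_in, d_code, d_hidden, d_out = 3, 8, 256, 1    # (x, y, t) + context code
sizes = [d_in + d_code, d_hidden, d_hidden, d_out]
params = [(rng.uniform(-1, 1, (m, n)) / n, np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]

u = siren_forward(np.array([0.5, 0.5, 0.1]), rng.normal(size=d_code), params)
```

Because the code enters only through the first layer's input, two trajectories sharing the same coordinates but different codes produce different field values.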

SST test prediction performance for DINO and VarSep (Donà et al., 2021).

ACKNOWLEDGEMENTS

We thank Emmanuel de Bézenac and Jérémie Donà for helpful insights and discussions on this project. We also acknowledge financial support from DL4CLIM (ANR-19-CHIA-0018-01) and DEEPNUM (ANR-21-CE23-0017-02) ANR projects. This study has been conducted using E.U. Copernicus Marine Service Information.

REPRODUCIBILITY STATEMENT

We present in Section 5.1 our experimental setting with datasets, baselines and forecasting tasks. The train and test settings are detailed in Appendix C, including more information on the chosen physical PDE systems. We describe DINO's pseudo-code in Algorithm 1 and provide implementation and hyperparameter details in Appendix D. We provide our source code at https://github.com/mkirchmeyer/DINo.

+

< l a t e x i t s h a 1 _ b a s e 6 4 = " M A c X M p J V n L h 8 P 6 T q 0 4 O B g 3 J w h S w = " > A A A D u 3 i c n V L L b t R A E K y N g Y T w S u D I x W K F h A C t v B E i X B C R 4 M A F k a B s E i k b I X t 2 s j G Z t S 1 7 H B G t + A I E J / g T f o Y L Z / g E b l S 3 n Q g S 8 R B j 2 e 6 u 7 q q Z 7 p 6 k c G n l o + h z Z y Y 4 c / b c 7 N z 5 + Q s X L 1 2 + s r B 4 d a P K 6 9 L Y g c l d X m 4 l c W V d m t m B T 7 2 z W 0 V p 4 0 n i 7 G a y / 1 j i m w e 2 r N I 8 W / e H h d 2 Z x O M s 3 U 1 N 7 A m t 3 X m 5 0 I 1 6 k a 7 w t N F v j e 6 j 7 8 X 7 L + u f 3 q 3 m i 5 0 X G G K E H A Y 1 J r D I 4 G k 7 x K j 4 b K O P C A W x H U y J l b R S j V u 8 w T y 5 N b M s M 2 K i + / y O 6 W 2 3 a E Z f N C t l G + 7 i + J Z k h r h J T s 6 8 k r b s F m q 8 V m V B f 6 c 9 V U 0 5 2 y H / S a s 1 I e q x R / R v v K P M f + V J L R 6 7 e K A 1 p K y p U E S q M 6 1 K r V 2 R k 4 c / V e W p U B A T e 8 R 4 S d s o 8 6 j P o X I q r V 1 6 G 2 v 8 q 2 Y K K r 5 p c 2 t 8 + 2 N 1 m c 7 F 8 q S N T j O d J 7 q n U 3 W L Z 2 2 1 z 9 t z i S f z u E 3 + U H P G 2 p H X Z E 9 b f 4 i 7 x 9 j / K k r X T i o 2 m C h K D z z / B 6 o 8 5 E z l n o y O J 0 y P 9 7 p / 8 h a f N j a W e v 3 7 v X t r U X d l C c 2 a w 3 X c w C 3 e 4 m W s 4 C l W M d A u v c U H f A w e B i Z 4 F b g m d a b T c q 7 h l x X U P w D v 2 M N z < / l a t e x i t > + < l a t e x i t s h a 1 _ b a s e 6 4 = " M A c X M p J V n L h 8 P 6 T q 0 4 O B g 3 J w h S w = " > A A A D u 3 i c n V L L b t R A E K y N g Y T w S u D I x W K F h A C t v B E i X B C R 4 M A F k a B s E i k b I X t 2 s j G Z t S 1 7 H B G t + A I E J / g T f o Y L Z / g E b l S 3 n Q g S 8 R B j 2 e 6 u 7 q q Z 7 p 6 k c G n l o + h z Z y Y 4 c / b c 7 N z 5 + Q s X L 1 2 + s r B 4 d a P K 6 9 L Y g c l d X m 4 l c W V d m t m B T 7 2 z W 0 V p 4 0 n i 7 G a y / 1 j i m w e 2 r N I 8 W / e H h d 2 Z x O M s 3 U 1 N 7 A m t 3 X m 5 0 I 1 6 k a 7 w t N F v j e 6 j 7 8 X 7 L + u f 3 q 3 m i 5 0 X G G K E H A Y 
1 J r D I 4 G k 7 x K j 4 b K O P C A W x H U y J l b R S j V u 8 w T y 5 N b M s M 2 K i + / y O 6 W 2 3 a E Z f N C t l G + 7 i + J Z k h r h J T s 6 8 k r b s F m q 8 V m V B f 6 c 9 V U 0 5 2 y H / S a s 1 I e q x R / R v v K P M f + V J L R 6 7 e K A 1 p K y p U E S q M 6 1 K r V 2 R k 4 c / V e W p U B A T e 8 R 4 S d s o 8 6 j P o X I q r V 1 6 G 2 v 8 q 2 Y K K r 5 p c 2 t 8 + 2 N 1 m c 7 F 8 q S N T j O d J 7 q n U 3 W L Z 2 2 1 z 9 t z i S f z u E 3 + U H P G 2 p H X Z E 9 b f 4 i 7 x 9 j / K k r X T i o 2 m C h K D z z / B 6 o 8 5 E z l n o y O J 0 y P 9 7 p / 8 h a f N j a W e v 3 7 v X t r U X d l C c 2 a w 3 X c w C 3 e 4 m W s 4 C l W M d A u v c U H f A w e B i Z 4 F b g m d a b T c q 7 h l x X U P w D v 2 M N z < / l a t e x i t > + < l a t e x i t s h a 1 _ b a s e 6 4 = " M A c X M p J V n L h 8 P 6 T q 0 4 O B g 3 J w h S w = " > A A A D u 3 i c n V L L b t R A E K y N g Y T w S u D I x W K F h A C t v B E i X B C R 4 M A F k a B s E i k b I X t 2 s j G Z t S 1 7 H B G t + A I E J / g T f o Y L Z / g E b l S 3 n Q g S 8 R B j 2 e 6 u 7 q q Z 7 p 6 k c G n l o + h z Z y Y 4 c / b c 7 N z 5 + Q s X L 1 2 + s r B 4 d a P K 6 9 L Y g c l d X m 4 l c W V d m t m B T 7 2 z W 0 V p 4 0 n i 7 G a y / 1 j i m w e 2 r N I 8 W / e H h d 2 Z x O M s 3 U 1 N 7 A m t 3 X m 5 0 I 1 6 k a 7 w t N F v j e 6 j 7 8 X 7 L + u f 3 q 3 m i 5 0 X G G K E H A Y 1 J r D I 4 G k 7 x K j 4 b K O P C A W x H U y J l b R S j V u 8 w T y 5 N b M s M 2 K i + / y O 6 W 2 3 a E Z f N C t l G + 7 i + J Z k h r h J T s 6 8 k r b s F m q 8 V m V B f 6 c 9 V U 0 5 2 y H / S a s 1 I e q x R / R v v K P M f + V J L R 6 7 e K A 1 p K y p U E S q M 6 1 K r V 2 R k 4 c / V e W p U B A T e 8 R 4 S d s o 8 6 j P o X I q r V 1 6 G 2 v 8 q 2 Y K K r 5 p c 2 t 8 + 2 N 1 m c 7 F 8 q S N T j O d J 7 q n U 3 W L Z 2 2 1 z 9 t z i S f z u E 3 + U H P G 2 p H X Z E 9 b f 4 i 7 x 9 j / K k r X T i o 2 m C h K D z z / B 6 o 8 5 E z l n o y O J 0 y P 9 7 p / 8 h a f N j a W e v 3 7 v X t r U X d l C 
c 2 a w 3 X c w C 3 e 4 m W s 4 C l W M d A u v c U H f A w e B i Z 4 F b g m d a b T c q 7 h l x X U P w D v 2 M N z < / l a t e x i t > + < l a t e x i t s h a 1 _ b a s e 6 4 = " M A c X M p J V n L h 8 P 6 T q 0 4 O B g 3 J w h S w = " > A A A D u 3 i c n V L L b t R A E K y N g Y T w S u D I x W K F h A C t v B E i X B C R 4 M A F k a B s E i k b I X t 2 s j G Z t S 1 7 H B G t + A I E J / g T f o Y L Z / g E b l S 3 n Q g S 8 R B j 2 e 6 u 7 q q Z 7 p 6 k c G n l o + h z Z y Y 4 c / b c 7 N z 5 + Q s X L 1 2 + s r B 4 d a P K 6 9 L Y g c l d X m 4 l c W V d m t m B T 7 2 z W 0 V p 4 0 n i 7 G a y / 1 j i m w e 2 r N I 8 W / e H h d 2 Z x O M s 3 U 1 N 7 A m t 3 X m 5 0 I 1 6 k a 7 w t N F v j e 6 j 7 8 X 7 L + u f 3 q 3 m i 5 0 X G G K E H A Y 1 J r D I 4 G k 7 x K j 4 b K O P C A W x H U y J l b R S j V u 8 w T y 5 N b M s M 2 K i + / y O 6 W 2 3 a E Z f N C t l G + 7 i + J Z k h r h J T s 6 8 k r b s F m q 8 V m V B f 6 c 9 V U 0 5 2 y H / S a s 1 I e q x R / R v v K P M f + V J L R 6 7 e K A 1 p K y p U E S q M 6 1 K r V 2 R k 4 c / V e W p U B A T e 8 R 4 S d s o 8 6 j P o X I q r V 1 6 G 2 v 8 q 2 Y K K r 5 p c 2 t 8 + 2 N 1 m c 7 F 8 q S N T j O d J 7 q n U 3 W L Z 2 2 1 z 9 t z i S f z u E 3 + U H P G 2 p H X Z E 9 b f 4 i 7 x 9 j / K k r X T i o 2 m C h K D z z / B 6 o 8 5 E z l n o y O J 0 y P 9 7 p / 8 h a f N j a W e v 3 7 v X t r U X d l C c 2 a w 3 X c w C 3 e 4 m W s 4 C l W M d A u v c U H f A w e B i Z 4 F b g m d a b T c q 7 h l x X U P w D v 2 M N z < / l a t e x i t > x F i 2 u 6 u 7 a q a 7 J 6 u L v H V R 9 L W 3 E F y 4 e O n y 4 p W l q 9 e u 3 7 i 5 v H J r q 6 2 6 x t i h q Y q q 2 c 7 S 1 h Z 5 a Y c u d 4 X d r h u b T r P C j r K D Z x I f H d q m z a v y j T u q 7 e 4 0 n Z T 5 X m 5 S R y gx F i 2 u 6 u 7 a q a 7 J 6 u L v H V R 9 L W 3 E F y 4 e O n y 4 p W l q 9 e u 3 7 i 5 v H J r q 6 2 6 x t i h q Y q q 2 c 7 S 1 h Z 5 a Y c u d 4 X d r h u b T r P C j r K D Z x I f H d q m z a v y j T u q 7 e 4 0 n Z T 5 X m 5 S R y gr z Y l 
a v C X 6 N 9 5 x 5 r / y X C 0 W e 3 g o N W j W V A r i q k u 9 S i 1 d c S c P f 6 r K U q E k 5 u w p 4 4 Z 2 K s z j P o f C q a R 2 1 9 t Y 4 t 8 k 0 6 H O T 3 1 u j e 9 / r C 6 X u S i e t N V p p / N E 9 s x E X e G Z r / a 5 P 5 f z 3 D z u k j + R n J l 0 5 B 3 Z j f c n u H e C / a + i 6 9 p p x R Z z i q 4 H l v 8 D U Z 5 w p u 6 e T E 8 m T I / 3 e n D 6 F p 8 1 t o b 9 w Y g f c Z q 2 9 5 6 5 5 8 y 9 d 2 5 S m q y 2 U f S l M x V c u j x 9 Z e b q 7 L X r N 2 7 e m p u / v V 4 X T Z X q f l q Y o h o k q t Y m y 3 X f Z t b o Q V l p N U 6 M 3 k j 2 n 7 n 9 j U N d 1 V m R v 7 Z H p d 4 a q 1 G e 7 W S p s o Q G s T L l r t q 2 2 3 M L U T e S F V 4 0 e t 5 Y W J 5 e + / r 5 / d t P q 8 V 8 5 Y g f c Z q 2 9 5 6 5 5 8 y 9 d 2 5 S m q y 2 U f S l M x V c u j x 9 Z e b q 7 L X r N 2 7 e m p u / v V 4 X T Z X q f l q Y o h o k q t Y m y 3 X f Z t b o Q V l p N U 6 M 3 k j 2 n 7 n 9 j U N d 1 V m R v 7 Z H p d 4 a q 1 G e 7 W S p s o Q G s T L l r t q 2 2 3 M L U T e S F V 4 0 e t 5 Y W J 5 e + / r 5 / d t P q 8 V 8 5R V t 2 C z X e q L K g v 9 O e q a a c 7 Z j / x G t N i T r s E / 0 b 7 y T z X 3 l S i 8 M e H m s N K W s q F Z H q j F d p t C t y 8 v C n q h w V S m J i j x m v a B t l n v Q 5 V E 6 t t U t v Y 4 1 / 0 0 x B x T c + t 8 H 3 P 1 a X 6 1 w s T 9 r q t N N 5 p n t m q m 7 x w l f 7 0 p 9 L P J n H f f J H m j P R j h y R PR V t 2 C z X e q L K g v 9 O e q a a c 7 Z j / x G t N i T r s E / 0 b 7 y T z X 3 l S i 8 M e H m s N K W s q F Z H q j F d p t C t y 8 v C n q h w V S m J i j x m v a B t l n v Q 5 V E 6 t t U t v Y 4 1 / 0 0 x B x T c + t 8 H 3 P 1 a X 6 1 w s T 9 r q t N N 5 p n t m q m 7 x w l f 7 0 p 9 L P J n H f f J H m j P R j h y R P< l a t e x i t s h a 1 _ b a s e 6 4 = " l 4 g j t S W h n s 3 m f q w P p z t R j s Y U V q y F h T J Y i t L n U q j X T F n t z / q S p D h Y q Y t Q e M a 9 q p M E / 6 7 A u n l t p t b 2 O J f 5 V M i 1 o / d b k N v v 2 x u k L m o n j S V q e d z h P Z M x d 1 h e e u 2 h f u X N a z 8 7 
h P f i Q 5 Q + n I E d k T 5 0 d 4 c I r 9 r 6 L t 2 l n F F r O K t g e G / 0 N R j j h T e 0 8 G p x O m x 3 s d n r 3 F 5 4 3 1 p V 7 4 q P f w J S / 4 E t o 1 j d u 4 i y 5 v 8 T J W 8 R R r 6 H M P j Q / 4 i E / e M 6 / y 3 n j j N n W q 4 z i 3 8 M v y 3 v 0 A c X H H B A = = < / l a t e x i t > z (1) t (x) < l a t e x i t s h a 1 _ b a s e 6 4 = " p q e F V w k t J k w T q 1 r IM X c 3 P 0 p K 0 2 F k p i x x 9 y v a M f C P K m z K 5 x a c j e 1 D W X / q 0 Q a 1 P i x j W 3 w 7 Y / Z 5 d I X x Z t 2 O l 1 3 n s u Z q a g r r N l s X 9 p 7 G c / 0 4 w H 5 g c R M p C L v y G 6 t H + D h K f a / i q Z q Z x U 7 z C i a G m j + D 0 Q 5 Y E / N n I x P O 0 y P c + 2 f n e L z x u b K w H 8 8 e L T B A V 9 B t 2 Z x C 3 e x z C l + g l W 8 w D q G M u k f 8 Q m f n T W n d t 4 7 H 7 r Q m Z 7 l 3 M Q v y z n 6 A e y j y F U = < / l a t e x i t > µ t(1) < l a t e x i t s h a 1 _ b a s e 6 4 = " p q e F V w k t J k w T q 1 r IR B j 2 e 6 u 7 q q Z 7 p 6 k c G n l o + h z Z y Y 4 c / b c 7 N z 5 + Q s X L 1 2 + s r B 4 d a P K 6 9 L Y g c l d X m 4 l c W V d m t m B T 7 2 z W 0 V p 4 0 n i 7 G a y / 1 j i m w e 2 r N I 8 W / e H h d 2 Z x O M s 3 U 1 N 7 A m t 3 X m 5 0 I 1 6 k a 7 w t N F v j e 6 j 7 8 X 7 L + u f 3 q 3l u 7 u 6 q 2 a 6 e 9 I q 1 7 W N o q + 9 h e D M 2 X P n F y / 0 L 1 6 6 f O X q 0 v K 1 z b p s T K Z G W Z m X Z i t N a p X r Q o 2 s t r n a q o x K 5 m m u x u n e u o u P 9 5 W p d V m 8 s g e V 2 p k n s 0 L v 6 i y l u 7 u 6 q 2 a 6 e 9 I q 1 7 W N o q + 9 h e D M 2 X P n F y / 0 L 1 6 6 f O X q 0 v K 1 z b p s T K Z G W Z m X Z i t N a p X r Q o 2 s t r n a q o x K 5 m m u x u n e u o u P 9 5 W p d V m 8 s g e V 2 p k n s 0 L v 6 i y e 0 j T J P + x w q p 9 H a p b e J x r 9 p p q D i G 5 / b 4 v s f q y t 0 L p Y n n e n M p v N K 9 8 x V 3 W L d V / v a n 0 s 8 m c c j 8 m P N G W t H D s m e e j / G 4 z P s fe 0 j T J P + x w q p 9 H a p b e J x r 9 p p q D i G 5 / b 4 v s f q y t 0 L p Y n n e n M p v N K 9 8 x V 3 W L d V / v a n 0 
s 8 m c c j 8 m P N G W t H D s m e e j / G 4 z P s f R V t 2 C z X e q L K g v 9 O e q a a c 7 Z j / x G t N i T r s E / 0 b 7 y T z X 3 l S i 8 M e H m s N K W s q F Z H q j F d p t C t y 8 v C n q h w V S m J i j x m v a B t l n v Q 5 V E 6 t t U t v Y 4 1 / 0 0 x B x T c + t 8 H 3 P 1 a X 6 1 w s T 9 r q t N N 5 p n t m q m 7 x w l f 7 0 p 9 L P J n H f f J H m j P R j h y R PR V t 2 C z X e q L K g v 9 O e q a a c 7 Z j / x G t N i T r s E / 0 b 7 y T z X 3 l S i 8 M e H m s N K W s q F Z H q j F d p t C t y 8 v C n q h w V S m J i j x m v a B t l n v Q 5 V E 6 t t U t v Y 4 1 / 0 0 x B x T c + t 8 H 3 P 1 a X 6 1 w s T 9 r q t N N 5 p n t m q m 7 x w l f 7 0 p 9 L P J n H f f J H m j P R j h y R PR V t 2 C z X e q L K g v 9 O e q a a c 7 Z j / x G t N i T r s E / 0 b 7 y T z X 3 l S i 8 M e H m s N K W s q F Z H q j F d p t C t y 8 v C n q h w V S m J i j x m v a B t l n v Q 5 V E 6 t t U t v Y 4 1 / 0 0 x B x T c + t 8 H 3 P 1 a X 6 1 w s T 9 r q t N N 5 p n t m q m 7 x w l f 7 0 p 9 L P J n H f f J H m j P R j h y R PR V t 2 C z X e q L K g v 9 O e q a a c 7 Z j / x G t N i T r s E / 0 b 7 y T z X 3 l S i 8 M e H m s N K W s q F Z H q j F d p t C t y 8 v C n q h w V S m J i j x m v a B t l n v Q 5 V E 6 t t U t v Y 4 1 / 0 0 x B x T c + t 8 H 3 P 1 a X 6 1 w s T 9 r q t N N 5 p n t m q m 7 x w l f 7 0 p 9 L P J n H f f J H m j P R j h y R P f P + C A 9 O s f 9 V l K 6 d V W w x U Z Q e O P 4 P V X n E m c o 9 G Z 9 O m B 7 v 9 e D s L T 5 v b C 7 3 B 4 / 6 D z e i 3 t o y 2 r W A 2 7 i D e 7 z F K 1 j D c 6 x j q F 1 6 j w / 4 G K w G J n g b Z G 3 q X M d z b u G X F T Q / A E i / w k g = < / l a t e x i t > x < l a t e x i t s h a 1 _ b a s e 6 4 = " M A c X M p J V n L h 8 P 6 T q 0 4 O B g 3 J w h S w = " > AR B j 2 e 6 u 7 q q Z 7 p 6 k c G n l o + h z Z y Y 4 c / b c 7 N z 5 + Q s X L 1 2 + s r B 4 d a P K 6 9 L Y g c l d X m 4 l c W V d m t m B T 7 2 z W 0 V p 4 0 n i 7 G a y / 1 j i m w e 2 r N I 8 W / e H h d 2 Z x O M s 3 U 1 N 7 A m t 3 X m 5 0 I 
We consider latent shift transformations (Figure 4), detailed in Eq. (9). Eq. (9) extends Eq. (7) by introducing a shift term µ_t^{(l-1)} at each layer l, defined as µ_t^{(l-1)} = W′^{(l-1)} α_t. The INR's parameters are defined as h_ϕ(α_t) = {W; b + W′α_t; ω}, where ϕ = {W, b, W′} are h's parameters. Thus, amplitude modulation separates time and space. We show in Table 5 that it significantly improves performance, particularly for time extrapolation.
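As an illustration, a single shift-modulated INR layer can be sketched as below. This is a minimal sketch, not the paper's exact architecture: the sine nonlinearity and the frequency scale `omega` are SIREN-style assumptions (the paper's INR may use a different filter, e.g. a FourierNet as in Fathony et al., 2021), and all tensor names are illustrative. The key point it shows is the separation of time and space: the spatial weights `W`, `b` are shared across time, while only the bias shift `mu_t = Wp @ alpha_t` depends on the time-varying code `alpha_t`.

```python
import numpy as np

def modulated_inr_layer(x, W, b, Wp, alpha_t, omega=30.0):
    """One INR layer with amplitude (shift) modulation: the bias is shifted
    by mu_t = Wp @ alpha_t, so alpha_t alone carries temporal information.
    Sine activation and omega=30.0 are illustrative SIREN-style choices."""
    mu_t = Wp @ alpha_t              # layer-wise shift, depends only on time
    return np.sin(omega * (W @ x + b + mu_t))

rng = np.random.default_rng(0)
x = rng.normal(size=2)                               # spatial coordinate
W = rng.normal(size=(16, 2)); b = rng.normal(size=16)  # shared spatial weights
Wp = rng.normal(size=(16, 8))                        # hypernetwork weights W'
alpha_t = rng.normal(size=8)                         # latent code at time t
out = modulated_inr_layer(x, W, b, Wp, alpha_t)
```

With a zero code, the layer reduces to an unmodulated INR layer, which makes the role of the shift term explicit.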

5. EXPERIMENTS

We assess the spatiotemporal versatility of DINO, following Section 2. We introduce our experimental setting (Section 5.1), which includes a variety of challenging PDE datasets, state-of-the-art baselines and forecasting tasks. Then, we present and comment on the experimental results (Section 5.2).

Published as a conference paper at ICLR 2023

B PREDICTION

We display the test predictions of DINO (Figure 6) and I-MP-PDE (Figure 7) for various subsampling levels when X = X_tr = X_ts. Predictions are performed on a 64×64 uniform grid, which defines the observation grid X via different subsampling rates. Yellow points correspond to the observation grid X (In-s) while purple points indicate off-grid points (Out-s). For each model, the first row contains the predicted trajectory from 0 to T′ and the second row the corresponding error maps w.r.t. the reference data (the darker the pixel, the lower the error). The prediction for I-MP-PDE at t = 0 is the interpolated initial condition. Figure 8 shows the prediction MSE per frame for DINO on Wave with its corresponding observation grid X.

C DETAILED DESCRIPTION OF DATASETS

We choose the time grids T (resp. T′) as regular grids in [0, T] (resp. [0, T′]) with a given temporal resolution and fix T′ = 2T + δt, where δt is the step size of the temporal grid. Hence, we always consider 10 consecutive frames for In-t and 10 more for Out-t. The range of T depends on the nature of the dataset. We provide below further details on the choice of these parameters and other experimental parameters, such as the number of observed trajectories.

2D Wave equation (Wave). It is a second-order PDE:

∂²u/∂t² = c²∆u,

where u is the displacement at each point in space w.r.t. the rest position and c ∈ R*₊ is the speed of the traveling wave. We transform the equation to a first-order form, considering the input v_t = (u_t, ∂u_t/∂t), so that the dimension of v_t(x) at each point x ∈ Ω is n = 2. We generate our dataset for speed c = 2 with periodic boundary conditions. The domain is Ω = [-1, 1]², and the initial displacement u₀ is a Gaussian function:

u₀(x) = a exp(-((x₁ - b₁)² + (x₂ - b₂)²) / (2r²)),

where the height of the peak displacement is a ∼ U(2, 4), the location of the peak displacement is (b₁, b₂) ∼ U(-1, 1)², and the standard deviation is r ∼ U(0.25, 0.3). The initial time derivative is ∂u_t/∂t |_{t=0} = 0. Each snapshot is generated on a uniform 64×64 grid. Each sequence is generated with fixed interval δt = 0.25. We set the train horizon T = 2.25 and the inference horizon T′ = 4.75. We generated 512 train trajectories and 32 test trajectories.

2D Navier-Stokes (Navier-Stokes, Stokes, 1851). This dataset corresponds to an incompressible fluid dynamics described by:

∂w/∂t = -u · ∇w + ν∆w + f,   w = ∇ × u,   ∇ · u = 0,

where u is the velocity field and w the vorticity; both lie on a spatial domain Ω with periodic boundary conditions. ν is the viscosity and f is a constant forcing term over the domain Ω. The input v_t is w_t (n = 1). The spatial domain is Ω = [-1, 1]², the viscosity is ν = 1 × 10⁻³, and the forcing term is set as:

f(x) = 0.1 (sin(2π(x₁ + x₂)) + cos(2π(x₁ + x₂))).

The full spatial grid is of dimension 64×64 or 256×256 according to the experiments in Section 5.
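To make the Wave initial condition concrete, the sampling procedure described above can be sketched as follows. The domain [-1, 1]², the 1/(2r²) Gaussian normalization, and the function name are assumptions for illustration; parameter ranges match the text.

```python
import numpy as np

def sample_wave_ic(rng, n=64):
    """Sample a Wave initial state on an n x n uniform grid over [-1, 1]^2
    (assumed domain): a Gaussian displacement bump with height a ~ U(2, 4),
    peak location (b1, b2) ~ U(-1, 1)^2, standard deviation r ~ U(0.25, 0.3),
    and zero initial velocity, giving a state of dimension 2 per point."""
    a = rng.uniform(2, 4)
    b1, b2 = rng.uniform(-1, 1, size=2)
    r = rng.uniform(0.25, 0.3)
    xs = np.linspace(-1, 1, n)
    X, Y = np.meshgrid(xs, xs, indexing="ij")
    u0 = a * np.exp(-((X - b1) ** 2 + (Y - b2) ** 2) / (2 * r ** 2))
    v0 = np.zeros_like(u0)  # initial time derivative du/dt at t = 0
    return np.stack([u0, v0])

state = sample_wave_ic(np.random.default_rng(0))
```

Each trajectory is then rolled out from such a state with a numerical wave solver; only the initial-condition sampling is sketched here.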
We sample initial conditions as in Li et al. (2021b) to create different trajectories. The first 20 steps of the trajectories are cut off as they are too noisy and not informative in terms of dynamics. Trajectories are collected with δt = 1. We set the training horizon T = 19 and the inference horizon T′ = 39. We generated 512 train trajectories and 32 test trajectories.

3D spherical shallow water (Shallow-Water, Galewsky et al., 2004). The following problem was originally presented for numerical model testing of global shallow-water equations. They can be written as:

du/dt = -f k × u - g∇h + ν∆u,
dh/dt = -h∇ · u + ν∆h,

where d/dt is the material derivative, k is the unit vector orthogonal to the spherical surface, u is the velocity field tangent to the surface of the sphere, which can be transformed into the vorticity w = ∇ × u, and h is the thickness of the fluid layer. Note that the data we observe at each time t is v_t = (w_t, h_t). f, g, ν, Ω are parameters of the Earth; cf. Galewsky et al. (2004) for details.


The initial conditions are slightly modified from Galewsky et al. (2004), as detailed below, to create symmetric phenomena in the northern and southern hemispheres. The initial zonal velocity u₀ contains two non-null symmetric bands, one in each hemisphere, which are parallel to the circles of latitude. At each latitude ϕ and longitude θ:

u₀(ϕ, θ) = ((u_max / e_n) exp(1 / ((ϕ - ϕ₀)(ϕ - ϕ₁))), 0) if ϕ ∈ (ϕ₀, ϕ₁), symmetrically if ϕ ∈ (-ϕ₁, -ϕ₀), and (0, 0) otherwise,

where u_max is the maximum velocity, ϕ₀ = π/7, ϕ₁ = π/2 - ϕ₀, and e_n = exp(-4/(ϕ₁ - ϕ₀)²). The water height h₀ is initialized by solving a boundary value condition problem as in Galewsky et al. (2004). It is then perturbed by adding the following h′₀ to h₀:

h′₀(ϕ, θ) = ĥ cos(ϕ) exp(-(θ/α)²) exp(-((ϕ₂ - ϕ)/β)²),

where ϕ₂ = π/4, ĥ = 120 m, α = 1/3 and β = 1/15 are constants defined in Galewsky et al. (2004). We simulate this phenomenon with Dedalus (Burns et al., 2020) on a latitude-longitude (lat-lon) grid. The size of the grid is 128 (lat) × 256 (lon). We take different initial conditions by sampling u_max ∼ U(60, 80) to generate long trajectories. These long trajectories are then sliced into shorter ones. For simulation, we take one snapshot per hour (of internal simulation time), i.e. δt = 1 h. We stop the simulation at the 320th hour. To construct a dataset rich in dynamical phenomena, we take the snapshots within the last 160 h of a long trajectory and slice them into 8 shorter trajectories. Also note that the data is scaled into a reasonable range: the height h is scaled by a factor of 3 × 10³, and the vorticity w by a factor of 2. In each short trajectory, T = 9 h and T′ = 19 h. In total, we generated 16 long trajectories (i.e. 128 short trajectories) for train and 2 for test (i.e. 16 short trajectories).
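The symmetric zonal jet above can be sketched numerically as follows. This is a reconstruction under stated assumptions: the symmetric form is assumed to mirror the original Galewsky et al. (2004) profile to the southern hemisphere via |ϕ|, and the function name is illustrative.

```python
import numpy as np

phi0 = np.pi / 7
phi1 = np.pi / 2 - phi0
e_n = np.exp(-4 / (phi1 - phi0) ** 2)   # normalizes the jet peak to u_max

def zonal_jet(phi, u_max):
    """Symmetric Galewsky-style zonal velocity: one jet per hemisphere in the
    latitude band (phi0, phi1) (mirrored via |phi|, an assumption), zero
    elsewhere. The exponent is maximal at the band's midpoint, where the
    profile reaches u_max."""
    p = np.abs(phi)
    u = np.zeros_like(p)
    band = (p > phi0) & (p < phi1)
    u[band] = (u_max / e_n) * np.exp(1.0 / ((p[band] - phi0) * (p[band] - phi1)))
    return u

phi = np.linspace(-np.pi / 2, np.pi / 2, 128)   # 128-point latitude grid
u0 = zonal_jet(phi, u_max=70.0)
```

The profile is smooth and compactly supported in each band, which is what makes this test case numerically well behaved.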

D IMPLEMENTATION

We provide our code at https://github.com/mkirchmeyer/DINo.

D.1 ALGORITHM

We detail the algorithm of DINO for training and test via pseudo-code in Algorithm 1. Training consists in solving Eq. ( 6) w.r.t. ψ, α T , ϕ. Inference involves optimization only to find α 0 . 
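The inference step (finding α₀ for a new observation) can be sketched as gradient descent on a reconstruction loss with the decoder frozen. The sketch below is a dependency-free toy, not the paper's implementation: the decoder is a fixed linear map instead of an INR, and finite differences stand in for backpropagation; the 300-step budget matches the inference setting reported in Appendix D.3.

```python
import numpy as np

def infer_alpha0(decode, v0, d_alpha, steps=300, lr=0.1, eps=1e-4):
    """Auto-decoding inference: recover the latent code alpha_0 of a new
    observation v0 by gradient descent on ||decode(alpha) - v0||^2, with
    the decoder's weights frozen. Finite-difference gradients keep this
    sketch self-contained; the actual model backpropagates instead."""
    alpha = np.zeros(d_alpha)
    loss = lambda a: np.mean((decode(a) - v0) ** 2)
    for _ in range(steps):
        grad = np.array([(loss(alpha + eps * e) - loss(alpha - eps * e)) / (2 * eps)
                         for e in np.eye(d_alpha)])
        alpha = alpha - lr * grad
    return alpha

# Toy frozen "decoder": a fixed linear map from latent code to a 32-dim frame.
rng = np.random.default_rng(0)
D = rng.normal(size=(32, 4))
alpha_true = rng.normal(size=4)
alpha_hat = infer_alpha0(lambda a: D @ a, D @ alpha_true, d_alpha=4)
```

Once α₀ is recovered, the trajectory is obtained by unrolling the latent ODE from α₀; no further per-frame optimization is needed.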

D.2 CONVERGENCE ANALYSIS

In practice, we observe no training instability induced by the two-stage learning process of Eq. ( 6) and Algorithm 1: the objectives are non-conflicting. To assess this, we track the evolution of the auto-decoding loss ℓ dec and the dynamics loss ℓ dyn throughout training on Navier-Stokes (s = 100%) in Figure 9 . We observe that both losses smoothly converge until the end of training.

D.3 TIME EFFICIENCY

Our auto-decoding strategy coupled with a latent neural ODE makes DINO computationally efficient compared to our best competitor MP-PDE.

Inferring α₀ via auto-decoding. Given a decoder and an observation frame v₀, finding α₀ corresponds to solving an inverse problem, cf. Eq. (3). At inference, we use 300 steps to infer α₀; using fewer steps is possible but results in slight underfitting. This represents 2.76 s for 64 trajectories on a single Nvidia Tesla V100 GPU. Note that, as we unroll the dynamics in the latent space, there is no need to relearn α_t for t > 0. Moreover, this differs from training, where α_t is continuously optimized for all t ∈ [0, T] within the train horizon, in parallel with our INR decoder. Overall, we trained MP-PDE and DINO for approximately 7 days each, such that there is no major additional training cost for DINO.

Latent neural ODE. Unrolling the dynamics with a neural ODE is efficient (0.35 s for 19 time predictions for 64 trajectories on a single Nvidia Tesla V100 GPU). Indeed, the latent space is small (at most 100 dimensions) and the dynamics model f_ψ is a simple four-layer MLP. With the same latent dynamics model, an RK4 numerical scheme only requires four function evaluations per step, versus one for a discretized alternative like a standard ResNet. This incurs a minor computational cost but enables DINO to operate at different temporal resolutions, unlike e.g. MP-PDE.

In comparison, the official code of MP-PDE takes 312 s for inference on the same hardware for the same number of trajectories (vs 3 s for DINO). MP-PDE requires building an adjacency matrix and for this reason incurs a high memory cost, especially as the number of nodes increases. Interpolation also significantly increases its inference time. This is not the case for DINO, which is faster.
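The four-evaluations-per-step claim is visible in a standard RK4 step on the latent ODE, sketched below. This is textbook RK4, not DINO-specific code; names are illustrative.

```python
import numpy as np

def rk4_step(f, alpha, t, dt):
    """One RK4 step on the latent ODE d(alpha)/dt = f(alpha, t): exactly
    four evaluations of the dynamics f per step, whose cost depends only
    on the latent dimension, not on the spatial grid."""
    k1 = f(alpha, t)
    k2 = f(alpha + dt / 2 * k1, t + dt / 2)
    k3 = f(alpha + dt / 2 * k2, t + dt / 2)
    k4 = f(alpha + dt * k3, t + dt)
    return alpha + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

# Sanity check on d(alpha)/dt = -alpha, whose exact solution is exp(-t).
alpha = np.array([1.0])
for i in range(100):
    alpha = rk4_step(lambda a, t: -a, alpha, i * 0.01, 0.01)
```

Because the step size dt is a free parameter, the same trained dynamics can be unrolled at any temporal resolution, which is the source of DINO's temporal flexibility.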

D.4 ADDITIONAL IMPLEMENTATION DETAILS

We use PyTorch (Paszke et al., 2019) to implement DINO and our baselines. Hyperparameters are further defined in Appendix D.5. The dynamics model f_ψ is a multilayer perceptron with the swish activation function (Hendrycks & Gimpel, 2016; Ramachandran et al., 2017). Its input and output sizes equal the latent space size d_α, and all hidden layers share the same size. DINO's parameters are initialized with the default initialization in PyTorch, defining ϕ₀, ψ₀, ω in Algorithm 1. We recall that ω is fixed throughout training to reduce the number of optimized parameters without loss of performance. As in related work (Sitzmann et al., 2020; Fathony et al., 2021), the frequency parameters ω are scaled by a factor ω_s, considered as a hyperparameter. For dynamics learning, we use an RK4 integrator via TorchDiffEq (Chen et al., 2018) and apply exponential Scheduled Sampling (Bengio et al., 2015) to stabilize training. In practice, modulations α_t are learned channel-wise, such that I_θ : Ω → R^{d_c} has separate parameters per output dimension, making predictions less correlated across channels. We optimize all parameters ϕ, α, ψ using Adam (Kingma & Ba, 2015) with decay parameters (β₁, β₂) = (0.9, 0.999).
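Exponential scheduled sampling can be sketched as the schedule below: during training, the ground-truth frame is fed with a probability that decays exponentially with the step, otherwise the model's own prediction is fed back. The decay rate k is illustrative, not the paper's value.

```python
def teacher_forcing_prob(step, k=0.997):
    """Exponential scheduled sampling schedule (Bengio et al., 2015):
    probability of feeding the ground-truth frame, rather than the model's
    own prediction, at a given training step. k = 0.997 is illustrative."""
    return k ** step
```

Early in training the model is mostly conditioned on ground truth, which stabilizes learning; later it increasingly trains on its own rollouts, which matches the inference-time setting.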

D.5 HYPERPARAMETERS

We list the hyperparameters of DINO for each dataset in Table 7 . In practice, we observe it is beneficial to decay the learning rates η ϕ , η α when the loss reaches a plateau.
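The decay-on-plateau heuristic can be sketched as below. This is a minimal stand-in for a scheduler such as `torch.optim.lr_scheduler.ReduceLROnPlateau`; the patience, factor, and threshold values are illustrative, not the paper's.

```python
def plateau_decay(lr, losses, patience=50, factor=0.5, min_delta=1e-4):
    """Halve the learning rate when the best loss over the last `patience`
    recorded values has not improved on the earlier best by more than
    min_delta. All hyperparameter values here are illustrative."""
    if len(losses) <= patience:
        return lr
    best_before = min(losses[:-patience])
    recent_best = min(losses[-patience:])
    if recent_best > best_before - min_delta:   # no significant improvement
        return lr * factor
    return lr
```

Applied after each epoch to the recorded loss history, this leaves η_ϕ and η_α untouched while the loss keeps improving and decays them once it plateaus.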

D.6 BASELINES IMPLEMENTATION

We detail in the following the hyperparameters and architectures used in our experiments for the considered baselines, which we reimplemented for our paper.

• CNODE is implemented with four two-dimensional convolutional layers with 64 hidden features, ReLU activations, 3 × 3 kernels and zero padding. The learning rate is fixed to 10⁻³. We use an adjoint method for integration as in Chen et al. (2018).
• MNO. We use the FNO architecture of Li et al. (2021b) with three FNO blocks, GeLU activations, 12 modes and a width of 32. The learning rate is fixed to 10⁻³.
• DeepONet. We consider an autoregressive formulation of DeepONet. We choose a width of 1000 for hidden features and a depth of 4 for both trunk and branch nets, with ReLU activations. The learning rate is fixed to 10⁻⁵.
• MP-PDE. We adapt the implementation of Brandstetter et al. (2022) to handle two- and three-dimensional PDEs. We use a time window of 1 with the pushforward trick. Batch size and number of neighbors are fixed to 8. The learning rate is fixed to 10⁻³. We use ReLU activations.
• SIREN. To represent data in space and time, SIREN takes space and time coordinates (x, t) as input. To handle multiple trajectories, we concatenate an optimizable per-trajectory context code α to the coordinates, as in DINO. We fix the hidden layer size of SIREN to 256. We initialize the parameters and use the default input scale as in Sitzmann et al. (2020). The size of the context code is d_α = 800. The learning rate is 10⁻³.

