Φ-DVAE: LEARNING PHYSICALLY INTERPRETABLE REPRESENTATIONS WITH NONLINEAR FILTERING

Abstract

Incorporating unstructured data into physical models is a challenging problem that is emerging in data assimilation. Traditional approaches focus on well-defined observation operators whose functional forms are typically assumed to be known. This prevents these methods from achieving a consistent model-data synthesis in configurations where the mapping from data-space to model-space is unknown. To address these shortcomings, in this paper we develop a physics-informed dynamical variational autoencoder (Φ-DVAE) for embedding diverse data streams into time-evolving physical systems described by differential equations. Our approach combines a standard (possibly nonlinear) filter for the latent state-space model with a VAE, to embed the unstructured data stream into the latent dynamical system. A variational Bayesian framework is used for the joint estimation of the embedding, latent states, and unknown system parameters. To demonstrate the method, we look at three examples: video datasets generated by the advection and Korteweg-de Vries partial differential equations, and a velocity field generated by the Lorenz-63 system. Comparisons with relevant baselines show that the Φ-DVAE provides a data-efficient dynamics encoding methodology that is competitive with standard approaches, with the added benefit of incorporating a physically interpretable latent space.

1. INTRODUCTION

Physical models, as represented by ordinary, stochastic, or partial differential equations, are ubiquitous throughout engineering and the physical sciences. These differential equations are the synthesis of scientific knowledge into mathematical form. However, as descriptions of reality they are imperfect (Judd & Smith, 2004), leading to the well-known problem of model misspecification (Box, 1979). At least since Kalman (1960), physical modellers have sought to reconcile their models with observations (Anderson & Moore, 1979). Such approaches usually solve the inverse problem of recovering model parameters from data, the data assimilation (DA) problem of inferring the state of a time-evolving process, or both. For the inverse problem, Bayesian methods are common (Tarantola, 2005; Stuart, 2010): a prior belief over model parameters Λ is updated with data y to give a posterior distribution p(Λ|y), which describes the uncertainty in the parameters given the data and modelling assumptions. DA can also proceed from a Bayesian viewpoint, where inference is cast in terms of a nonlinear state-space model (SSM) (Law et al., 2015; Reich & Cotter, 2015). The SSM is typically the combination of a time-discretised differential equation and an observation process: uncertainty enters the model through extrinsic, additive errors. For a latent state variable u_n representing some discretised system at time n, with observations y_n, the object of interest is the filtering distribution p(u_n | y_{1:n}), where y_{1:n} := {y_k}_{k=1}^n. The joint filtering and parameter estimation problem, which targets p(u_n, Λ | y_{1:n}), has also received significant attention in the literature (see, e.g., Kantas et al. (2015) and references therein).
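The filtering recursion described above can be illustrated with a minimal sketch. The scalar linear-Gaussian model, its parameter values, and the simulated data below are purely illustrative choices, not the systems studied in this paper; for such a model the filtering distribution p(u_n | y_{1:n}) is Gaussian and is computed exactly by the Kalman filter.

```python
import numpy as np

# Minimal Kalman filter for a scalar linear-Gaussian SSM:
#   u_n = a * u_{n-1} + w_n,  w_n ~ N(0, q)   (time-discretised dynamics)
#   y_n = h * u_n + v_n,      v_n ~ N(0, r)   (known observation operator)
# The filter recursively computes p(u_n | y_{1:n}) = N(m_n, P_n).

def kalman_filter(ys, a=0.9, h=1.0, q=0.1, r=0.5, m0=0.0, p0=1.0):
    means, variances = [], []
    m, p = m0, p0
    for y in ys:
        # Predict: push the previous posterior through the dynamics.
        m_pred = a * m
        p_pred = a * p * a + q
        # Update: condition on the new observation y_n.
        s = h * p_pred * h + r          # innovation variance
        k = p_pred * h / s              # Kalman gain
        m = m_pred + k * (y - h * m_pred)
        p = (1.0 - k * h) * p_pred
        means.append(m)
        variances.append(p)
    return np.array(means), np.array(variances)

# Simulate a short trajectory and noisy observations from the same model.
rng = np.random.default_rng(0)
u, ys = 0.0, []
for _ in range(50):
    u = 0.9 * u + rng.normal(scale=np.sqrt(0.1))
    ys.append(u + rng.normal(scale=np.sqrt(0.5)))

means, variances = kalman_filter(np.array(ys))
```

In the nonlinear, non-Gaussian settings treated later, this exact recursion is replaced by approximate nonlinear filters, but the predict/update structure is the same.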
This has been well studied in, e.g., electrical engineering (Storvik, 2002), geophysics (Bocquet & Sakov, 2013), neuroscience (Ditlevsen & Samson, 2014), chemical engineering (Kravaris et al., 2013), biochemistry (Dochain, 2003), and hydrology (Moradkhani et al., 2005), to name a few. Typically in data assimilation tasks, while parameters of an observation model may be unknown, the observation model itself is assumed known (Kantas et al., 2015). This assumption breaks down in settings where data arrive in various modalities, such as videos, images, or audio, hindering the ability to perform inference. In such cases, however, the underlying variation in the data stream is often due to a latent physical process, which is typically at least partially known. In this work, these data streams are video data and velocity fields. We develop a variational Bayes (VB) (Blei et al., 2017) methodology which jointly solves the inverse and filtering problems for the case in which the observation operator is unknown. We model this unknown mapping with a variational autoencoder (VAE) (Kingma & Welling, 2014), which encodes the assumed time-dependent observations y_{1:N} into pseudo-data x_{1:N} in a latent space. On this latent space, we stipulate that the pseudo-observations are taken from a known dynamical system, given by a stochastic ordinary differential equation (ODE) or partial differential equation (PDE) with possibly unknown coefficients. The differential equation is also assumed to have stochastic forcing, which accounts for possible model misspecification. The stipulated system gives a structured prior p(x_{1:N} | Λ), which acts as a physics-informed regulariser whilst also enabling inference over the unknown Λ. This prior is approximated using classical nonlinear filtering algorithms.
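The structured prior p(x_{1:N} | Λ) induced by a stochastically forced differential equation can be sketched with a time discretisation. The damped-oscillator drift, the parameter pair Λ = (damping, frequency), and all numerical values below are hypothetical stand-ins chosen for illustration; the point is only that Euler-Maruyama turns an SDE with stochastic forcing into a Markov prior over the latent trajectory.

```python
import numpy as np

# Illustrative structured prior: a stochastically forced ODE
#   dx = f(x; Λ) dt + σ dW
# discretised by Euler-Maruyama. Here f is a damped oscillator with
# Λ = (damping, frequency); both f and the values are illustrative.

def simulate_prior(lam, n_steps=200, dt=0.01, sigma=0.05, seed=1):
    damping, freq = lam
    rng = np.random.default_rng(seed)
    x = np.zeros((n_steps, 2))
    x[0] = [1.0, 0.0]                  # initial position and velocity
    for n in range(1, n_steps):
        pos, vel = x[n - 1]
        drift = np.array([vel, -2.0 * damping * vel - freq**2 * pos])
        forcing = sigma * np.sqrt(dt) * rng.standard_normal(2)
        x[n] = x[n - 1] + dt * drift + forcing   # stochastic forcing term
    return x

xs = simulate_prior(lam=(0.3, 2.0))
```

Sampling (or filtering against) such a discretised system is what makes the prior tractable; the stochastic forcing is what absorbs model misspecification.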
Our framework is fully probabilistic: inference proceeds from a derived evidence lower bound (ELBO), enabling joint estimation of unknown network parameters and unknown dynamical coefficients via VB. To set the scene for this work, we now review the relevant literature.
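A toy version of the ELBO used in such VB schemes can be written down directly for a Gaussian model. The linear encoder and decoder below are hypothetical stand-ins (in practice both are neural networks), and the dimensions and values are illustrative; the structure shown is the standard reconstruction-minus-KL decomposition with the reparameterisation trick.

```python
import numpy as np

# Illustrative ELBO for a Gaussian VAE with prior p(x) = N(0, I):
#   ELBO = E_{q_phi(x|y)}[log p_theta(y|x)] - KL(q_phi(x|y) || p(x)).
# enc_mu / enc_logvar / decode are stand-ins for the networks.

def elbo_single(y, enc_mu, enc_logvar, decode, n_mc=32, seed=0):
    rng = np.random.default_rng(seed)
    mu, logvar = enc_mu(y), enc_logvar(y)
    # KL(N(mu, diag(exp(logvar))) || N(0, I)) in closed form.
    kl = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)
    # Monte Carlo reconstruction term via the reparameterisation trick:
    # x = mu + std * eps, eps ~ N(0, I).
    std = np.exp(0.5 * logvar)
    recon = 0.0
    for _ in range(n_mc):
        x = mu + std * rng.standard_normal(mu.shape)
        y_hat = decode(x)
        recon += -0.5 * np.sum((y - y_hat) ** 2)  # Gaussian log-lik, unit variance
    return recon / n_mc - kl

# Toy 1D latent, 3D data, with hypothetical linear encoder/decoder.
W = np.array([[1.0], [0.5], [-0.5]])
y = np.array([1.0, 0.5, -0.5])
val = elbo_single(
    y,
    enc_mu=lambda y: W.T @ y / np.sum(W**2),
    enc_logvar=lambda y: np.array([-2.0]),
    decode=lambda x: (W @ x).ravel(),
)
```

Maximising this quantity over the encoder/decoder parameters (and, in the Φ-DVAE setting, over the dynamical coefficients Λ) is the optimisation the ELBO derivation sets up.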

2. RELATED WORK

As introduced above, VAEs (Kingma & Welling, 2014) are a popular high-dimensional encoder. A VAE defines a generative model that learns low-dimensional representations x of high-dimensional data y using VB. To perform efficient inference, a variational approximation q_ϕ(x|y) is made to the intractable posterior p(x|y). The variational parameters ϕ are estimated via optimisation of the ELBO. This unsupervised learning approach infers latent representations of high-dimensional data. Recent works have extended the VAE to high-dimensional time-series data y_{1:N}, indexed by time n, with the aim of jointly learning latent representations x_{1:N} and a dynamical system that evolves them. These dynamical variational autoencoder (DVAE) methods (Girin et al., 2021) enforce the dynamics with a structured prior p(x_{1:N}) on the latent space. Various DVAE methods have been proposed. The Kalman variational autoencoder (KVAE) of Fraccaro et al. (2017) is a popular approach, which encodes y_{1:N} into latent variables x_{1:N} that are assumed to be observations of a linear Gaussian state-space model (LGSSM) driven by latent dynamic states u_{1:N}. The assumed linear dynamics are jointly learnt with the encoder and decoder via Kalman filtering/smoothing. Another approach is the Gaussian process variational autoencoder (GPVAE) (Pearce, 2020; Jazbec et al., 2021; Fortuin et al., 2020), which models x_{1:N} as a temporally correlated Gaussian process (GP). The Markovian variant of Zhu et al. (2022) allows for a similar Kalman procedure as in the KVAE, except that in this instance the dynamics are known and are given by a stochastic differential equation (SDE) approximation to the GP (Hartikainen & Sarkka, 2010). A related approach is provided for control applications in Watter et al. (2015) and Hafner et al. (2019), where locally linear embeddings are estimated. Yildiz et al. (2019) propose the so-called ODE²VAE, which encodes the data to an initial condition that is integrated through time using a Bayesian neural ODE (Chen et al., 2018); only this trajectory is used to generate the reconstructions via the decoder network. A related class of methods are deep SSMs (Bayer & Osendorfer, 2014; Krishnan et al., 2015; Karl et al., 2017). These works assume that the parametric form of the SSM is unknown, and replace the transition and emission distributions with neural network (NN) models, which are trained based on an ELBO. They harness the representational power of deep NNs to directly model transitions between high-dimensional states; more emphasis is placed on generative modelling and prediction than on representation learning or system identification. We also note the related VAE works of Wu et al. (2021), Franceschi et al. (2020), and Babaeizadeh et al. (2022), which use VAE-type architectures for similar video prediction tasks. In Chung et al. (2015), the variational recurrent neural network (VRNN) attempts to capture variation in highly structured time-series data by pairing a recurrent NN for learning nonlinear state transitions with a sequential latent random variable model. Methods to include physical information inside autoencoders have been studied in the physics community. A popular approach uses SINDy (Brunton et al., 2016) for discovery of low-dimensional latent dynamical systems using autoencoders (Champion et al., 2019). A predictive framework is given in Lopez & Atzberger (2021), which aims to learn nonlinear dynamics by jointly optimizing
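The SINDy approach mentioned above can be sketched in a few lines: sparse coefficients ξ of a model du/dt = Θ(u)ξ are recovered by sequentially thresholded least squares over a library Θ of candidate functions. The library, toy data, and threshold below are illustrative choices, not those of the cited works.

```python
import numpy as np

# Sketch of SINDy-style sparse system identification:
# solve du/dt ≈ Θ(u) ξ, then repeatedly zero small coefficients and
# refit the remaining ones (sequentially thresholded least squares).

def sindy(U, dUdt, library, threshold=0.1, n_iter=10):
    Theta = library(U)                               # candidate-function matrix
    xi, *_ = np.linalg.lstsq(Theta, dUdt, rcond=None)
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        for k in range(xi.shape[1]):                 # refit each state dim
            big = ~small[:, k]
            if big.any():
                xi[big, k], *_ = np.linalg.lstsq(
                    Theta[:, big], dUdt[:, k], rcond=None
                )
    return xi

# Toy data from du/dt = -0.5 u, with exact derivatives for clarity.
t = np.linspace(0, 5, 200)
u = np.exp(-0.5 * t)[:, None]
dudt = -0.5 * u
lib = lambda U: np.hstack([np.ones_like(U), U, U**2])  # library [1, u, u^2]
xi = sindy(u, dudt, lib)
```

On this toy problem the procedure recovers the single active term, ξ ≈ (0, -0.5, 0); in the autoencoder setting of Champion et al. (2019) the same regression is performed on latent coordinates learnt jointly with the network.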

