REINFORCEMENT LEARNING-BASED ESTIMATION FOR PARTIAL DIFFERENTIAL EQUATIONS

Abstract

In systems governed by nonlinear partial differential equations such as fluid flows, the design of state estimators such as Kalman filters relies on a reduced-order model (ROM) that projects the original high-dimensional dynamics onto a computationally tractable low-dimensional space. However, ROMs are prone to large errors, which negatively affects the performance of the estimator. Here, we introduce the reinforcement learning reduced-order estimator (RL-ROE), a ROM-based estimator in which the correction term that takes in the measurements is given by a nonlinear policy trained through reinforcement learning. The nonlinearity of the policy enables the RL-ROE to compensate efficiently for errors of the ROM, while still taking advantage of the imperfect knowledge of the dynamics. Using examples involving the Burgers and Navier-Stokes equations, we show that in the limit of very few sensors, the trained RL-ROE outperforms a Kalman filter designed using the same ROM. Moreover, it yields accurate high-dimensional state estimates for reference trajectories corresponding to various physical parameter values, without direct knowledge of the latter.

1. INTRODUCTION

Active control of turbulent flows has the potential to cut down emissions across a range of industries through drag reduction in aircraft and ships or improved efficiency of heating and air-conditioning systems, among many other examples (Brunton & Noack, 2015). But real-time feedback control requires inferring the state of the system from sparse measurements using an algorithm called a state estimator, which typically relies on a model for the underlying dynamics (Simon, 2006). Among state estimators, the Kalman filter is by far the most well known thanks to its optimality for linear systems, which has led to its widespread use in numerous applications (Kalman, 1960; Zarchan, 2005). However, continuous systems such as fluid flows are governed by partial differential equations (PDEs) which, when discretized, yield high-dimensional and oftentimes nonlinear dynamical models with hundreds or thousands of state variables. These high-dimensional models are too expensive to integrate with common state estimation techniques, especially in the context of embedded systems. Thus, state estimators for control are instead designed based on a reduced-order model (ROM) of the system, in which the underlying dynamics are projected onto a low-dimensional subspace that is computationally tractable (Barbagallo et al., 2009; Rowley & Dawson, 2017). A key challenge is that ROMs provide a simplified and imperfect description of the dynamics, which negatively affects the performance of the state estimator. One potential solution is to improve the accuracy of the ROM through the inclusion of additional closure terms (Ahmed et al., 2021). In this paper, we leave the ROM untouched and instead propose a new design paradigm for the estimator itself, which we call a reinforcement learning reduced-order estimator (RL-ROE).
The RL-ROE is constructed from the ROM in an analogous way to a Kalman filter, with the crucial difference that the linear filter gain function, which takes in the current measurement data, is replaced by a nonlinear policy trained through reinforcement learning (RL). The flexibility of the nonlinear policy, parameterized by a neural network, enables the RL-ROE to compensate for errors of the ROM while still taking advantage of the imperfect knowledge of the dynamics. Indeed, we show that in the limit of sparse measurements, the trained RL-ROE outperforms a Kalman filter designed using the same ROM and displays robust estimation performance across different dynamical regimes. To our knowledge, the RL-ROE is the first application of RL to state estimation of parametric PDEs.

2.1. PROBLEM FORMULATION

Consider the parametric discrete-time nonlinear system given by

z_{k+1} = f(z_k; µ),      (1a)
y_k = C z_k + n_k,        (1b)

where z_k ∈ R^n and y_k ∈ R^p are respectively the state and measurement at time k, f : R^n → R^n is a time-invariant nonlinear map from current to next state, n_k ∈ R^p is observation noise (assumed zero unless stated otherwise), µ ∈ R is a physical parameter, and C ∈ R^{p×n} is a linear map from state to measurement. In this study, we assume that the dynamics given in (1) are obtained from a high-fidelity numerical discretization of a nonlinear partial differential equation (PDE), which typically requires a large number n of continuous state variables (on the order of at least a few hundred). Nonetheless, our work is applicable to any high-dimensional nonlinear system of the form (1). We do not account for exogenous control inputs to the system, which is left for future work. Here, we will focus on the post-transient dynamics of (1); these are the dynamics observed once the transients associated with the initial condition have died down. In particular, we consider systems whose post-transient dynamics are described by an attractor that is either a steady state, a periodic limit cycle, or a quasi-periodic limit cycle, which encompasses the behavior of a large class of physical systems. The nature of the attractor is independent of the initial condition but depends on the value of µ, which we will consider to be in a range [µ_1, µ_2].

The purpose of the present work is to combine reduced-order modeling (ROM) and reinforcement learning (RL) to construct a state estimator that solves the following problem: given a sequence of measurements {y_1, ..., y_k} from a post-transient reference trajectory of (1), estimate the high-dimensional state z_k at current time k without knowledge of µ itself. The ROM procedure, which follows standard practices, is described in Section 2.2.
The integration of the ROM with RL to solve the estimation problem, which constitutes the main novelty of the paper, is described in Section 2.3.
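As a concrete illustration of the setup in (1), the sketch below evolves a toy nonlinear map standing in for the discretized PDE and extracts sparse point measurements through C. All names and the particular form of f are illustrative placeholders, not part of the actual solvers used in this work.

```python
import numpy as np

# Toy stand-in for system (1): z_{k+1} = f(z_k; mu), y_k = C z_k.
# The map f below is an illustrative diffusion + advection-like update
# on a periodic grid, not the paper's actual discretization.
n, p = 200, 3                                    # state and measurement dimensions

def f(z, mu):
    """One nonlinear time step; mu plays the role of the physical parameter."""
    L = np.roll(z, 1) - 2 * z + np.roll(z, -1)   # discrete Laplacian z[i-1] - 2 z[i] + z[i+1]
    dz = np.roll(z, -1) - np.roll(z, 1)          # central difference z[i+1] - z[i-1]
    return z + 0.1 * (mu * L - z * dz)

# C picks out p point sensors at fixed grid locations (sparse measurements)
sensor_idx = np.linspace(0, n - 1, p, dtype=int)
C = np.zeros((p, n))
C[np.arange(p), sensor_idx] = 1.0

z = np.sin(2 * np.pi * np.arange(n) / n)         # initial condition
for k in range(50):                              # integrate past the transient
    z = f(z, mu=0.5)
y = C @ z                                        # measurement y_k, here noise-free (n_k = 0)
print(y.shape)                                   # (3,)
```

Note that the estimator only ever sees the p-dimensional vector y, while the goal is to recover the full n-dimensional state z.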

2.2. REDUCED-ORDER MODEL

Since the high dimensionality of (1) renders online estimation impractical, it is customary to formulate a reduced-order model (ROM) of the dynamics (Rowley & Dawson, 2017). First, one chooses a suitable linearly independent set of modes {u_1, ..., u_r}, where u_i ∈ R^n, defining an r-dimensional subspace of R^n in which most of the dynamics is assumed to take place. Stacking these modes as columns of a matrix U ∈ R^{n×r}, one can then express z_k ≈ U x_k, where the reduced-order state x_k ∈ R^r represents the coordinates of z_k in the subspace. Finally, one finds a ROM for the dynamics of x_k, which is vastly cheaper to evolve than (1) when r ≪ n. There exist various ways to find an appropriate set of modes U and a corresponding ROM for the dynamics of x_k (Taira et al., 2017). In this work, we employ the Dynamic Mode Decomposition (DMD), a purely data-driven algorithm that has found numerous applications in fields ranging from fluid dynamics to neuroscience (Schmid, 2010; Kutz et al., 2016). Importantly, we seek a single ROM to describe dynamics corresponding to various parameter values µ ∈ [µ_1, µ_2], since the state estimator that we will later construct based on this ROM does not have knowledge of µ. In order to apply DMD, we first construct a training dataset by solving (1) for values of µ belonging to a finite set S ⊂ [µ_1, µ_2], resulting in a concatenated collection of snapshots Z_train = {Z_µ}_{µ∈S}, where each Z_µ = {z^µ_0, ..., z^µ_K} is a post-transient trajectory of (1a) for a specific value µ ∈ S. The DMD then seeks a best-fit linear model of the dynamics in the form of a matrix A ∈ R^{n×n} such that z^µ_{k+1} ≈ A z^µ_k for all k and µ, and computes the modes U as the r leading principal component analysis (PCA) modes of Z_train.
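The DMD step above can be sketched as follows; this is a minimal illustration assuming the snapshots have already been collected into a matrix with one column per time step (a random placeholder stands in for actual trajectories of (1a)).

```python
import numpy as np

# Minimal sketch of the DMD fit: a least-squares linear model A of the
# snapshot dynamics, and the r leading PCA (POD) modes U of the data.
# Z_train is an illustrative placeholder for the concatenated snapshots.
rng = np.random.default_rng(1)
n, K, r = 100, 60, 5
Z_train = rng.standard_normal((n, K))        # placeholder snapshot matrix

X, Xp = Z_train[:, :-1], Z_train[:, 1:]      # snapshot pairs (z_k, z_{k+1})

# Best-fit linear model z_{k+1} ≈ A z_k in the least-squares sense
A = Xp @ np.linalg.pinv(X)                   # A in R^{n x n}

# r leading PCA modes of Z_train via the singular value decomposition;
# the columns of U are orthonormal, as used in the projection below
U_full, _, _ = np.linalg.svd(Z_train, full_matrices=False)
U = U_full[:, :r]                            # U in R^{n x r}
```

For large n one would avoid forming the full n × n matrix A explicitly and work directly in the reduced coordinates, but the explicit form above mirrors the definitions in the text.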
The transformation z_k ≈ U x_k and the orthogonality of U then yield a linear discrete-time ROM of the form

x_k = A_r x_{k-1} + w_{k-1},      (2a)
y_k = C_r x_k + v_k,              (2b)

where A_r = U^T A U ∈ R^{r×r} and C_r = CU ∈ R^{p×r} are the reduced-order state-transition and observation models, respectively. The (unknown) non-Gaussian process noise w_k and observation noise v_k account for the mismatch between the ROM and the true high-dimensional dynamics (1).
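The projection defining the ROM (2) amounts to two small matrix products, after which the r-dimensional model can be evolved cheaply. The sketch below uses illustrative placeholder matrices for A, U, and C, with shapes matching the definitions above.

```python
import numpy as np

# Projection onto the reduced subspace: A_r = U^T A U and C_r = C U.
# A, U, and C are illustrative placeholders with the right shapes.
rng = np.random.default_rng(2)
n, r, p = 100, 5, 3
A = rng.standard_normal((n, n)) / n                 # placeholder full-order model
U, _ = np.linalg.qr(rng.standard_normal((n, r)))    # orthonormal modes, U^T U = I
C = rng.standard_normal((p, n))                     # placeholder observation map

A_r = U.T @ A @ U                                   # reduced state-transition model (r x r)
C_r = C @ U                                         # reduced observation model (p x r)

# Evolving the r-dimensional ROM is far cheaper than the full model:
x = rng.standard_normal(r)                          # reduced state x_0
for k in range(10):
    x = A_r @ x                                     # x_k = A_r x_{k-1}, noise terms omitted
y = C_r @ x                                         # predicted measurement
print(y.shape)                                      # (3,)
```

In this noise-free sketch w_k and v_k are dropped; in practice they are unknown and non-Gaussian, which is precisely what motivates replacing the Kalman filter's linear gain with a learned nonlinear policy.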

