NONLINEAR RECONSTRUCTION FOR OPERATOR LEARNING OF PDES WITH DISCONTINUITIES

Abstract

A large class of hyperbolic and advection-dominated PDEs can have solutions with discontinuities. This paper investigates, both theoretically and empirically, the operator learning of PDEs with discontinuous solutions. We rigorously prove, in terms of lower approximation bounds, that methods which entail a linear reconstruction step (e.g. DeepONet or PCA-Net) fail to efficiently approximate the solution operator of such PDEs. In contrast, we show that certain methods employing a nonlinear reconstruction mechanism can overcome these fundamental lower bounds and approximate the underlying operator efficiently. The latter class includes Fourier Neural Operators and a novel extension of DeepONet termed shift-DeepONet. Our theoretical findings are confirmed by empirical results for the linear advection equation, the inviscid Burgers' equation and the compressible Euler equations of aerodynamics.

1. INTRODUCTION

Many interesting phenomena in physics and engineering are described by partial differential equations (PDEs) with discontinuous solutions. The most common types of such PDEs are nonlinear hyperbolic systems of conservation laws (Dafermos, 2005), such as the Euler equations of aerodynamics, the shallow-water equations of oceanography and the MHD equations of plasma physics. It is well-known that solutions of these PDEs develop finite-time discontinuities such as shock waves, even when the initial and boundary data are smooth. Other examples include the propagation of waves with jumps in linear transport and wave equations, crack and fracture propagation in materials (Sun & Jin, 2012), moving interfaces in multiphase flows (Drew & Passman, 1998) and the motion of very sharp gradients as propagating fronts and traveling-wave solutions of reaction-diffusion equations (Smoller, 2012). Approximating such (propagating) discontinuities is considered to be extremely challenging for traditional numerical methods (Hesthaven, 2018), as resolving them can require very small grid sizes. Although bespoke numerical methods such as high-resolution finite-volume methods, discontinuous Galerkin finite-element methods and spectral viscosity methods (Hesthaven, 2018) have been used successfully in this context, their very high computational cost prohibits their extensive use, particularly for many-query problems such as uncertainty quantification (UQ), optimal control and (Bayesian) inverse problems (Lye et al., 2020), necessitating the design of fast machine learning-based surrogates. As the task at hand is to learn the underlying solution operator that maps input functions (initial and boundary data) to output functions (the solution at a given time), recently developed operator learning methods can be employed in this infinite-dimensional setting (Higgins, 2021).
These methods include operator networks (Chen & Chen, 1995) and their deep version, DeepONet (Lu et al., 2019; 2021), where two sets of neural networks (branch and trunk nets) are combined in a linear reconstruction procedure to obtain an infinite-dimensional output. DeepONets have been used very successfully for different PDEs (Lu et al., 2021; Mao et al., 2020b; Cai et al., 2021; Lin et al., 2021). An alternative framework is provided by neural operators (Kovachki et al., 2021a), wherein the affine functions within DNN hidden layers are generalized to infinite dimensions by replacing them with kernel integral operators, as in (Li et al., 2020a; Kovachki et al., 2021a; Li et al., 2020b). A computationally efficient form of neural operators is the Fourier Neural Operator (FNO) (Li et al., 2021a), where a translation-invariant kernel is evaluated in Fourier space, leading to many successful applications for PDEs (Li et al., 2021a;b; Pathak et al., 2022). Currently available theoretical results for operator learning (e.g. Lanthaler et al. (2022); Kovachki et al. (2021a;b); De Ryck & Mishra (2022b); Deng et al. (2022)) leverage the regularity (or smoothness) of solutions of the PDE to prove that frameworks such as DeepONet, FNO and their variants approximate the underlying operator efficiently. Although such regularity holds for many elliptic and parabolic PDEs, it is obviously destroyed when discontinuities appear in the solutions, as in the hyperbolic PDEs mentioned above. Thus, a priori, it is unclear if existing operator learning frameworks can efficiently approximate PDEs with discontinuous solutions.
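To make the FNO construction described above concrete, the following is a minimal NumPy sketch of its core ingredient: a Fourier layer that applies a translation-invariant kernel by truncating to the lowest Fourier modes, multiplying by learned complex weights, and transforming back, plus a pointwise linear term and a nonlinearity. Function names, array shapes and the choice of ReLU are illustrative assumptions, not taken from the paper or the reference implementation.

```python
import numpy as np

def spectral_conv_1d(v, weights, k_max):
    """Core of one Fourier layer: FFT along the grid, keep only the lowest
    k_max modes, multiply each retained mode by a learned complex matrix,
    then inverse FFT. `v` has shape (n_grid, channels); `weights` has shape
    (k_max, channels, channels)."""
    v_hat = np.fft.rfft(v, axis=0)                 # (n_grid//2 + 1, channels)
    out_hat = np.zeros_like(v_hat)
    # mode-by-mode multiplication of retained coefficients by the weights
    out_hat[:k_max] = np.einsum("kc,kcd->kd", v_hat[:k_max], weights)
    return np.fft.irfft(out_hat, n=v.shape[0], axis=0)

def fno_layer(v, weights, W, k_max):
    """One full FNO layer: spectral convolution plus a pointwise linear
    (channel-mixing) term, followed by a nonlinearity (ReLU here, for
    simplicity)."""
    return np.maximum(spectral_conv_1d(v, weights, k_max) + v @ W, 0.0)
```

Because the kernel acts only on a fixed number of Fourier modes, the layer's parameter count is independent of the grid resolution, which is the source of the FNO's discretization invariance.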
This explains the paucity of theoretical and (to a lesser extent) empirical work on operator learning of PDEs with discontinuous solutions, and provides the rationale for the current paper, where:

• Using a lower bound, we rigorously prove approximation error estimates showing that operator learning architectures such as DeepONet (Lu et al., 2021) and PCA-Net (Bhattacharya et al., 2021), which entail a linear reconstruction step, fail to efficiently approximate solution operators of prototypical PDEs with discontinuities. In particular, the approximation error decays, at best, linearly in network size.

• We rigorously prove that using a nonlinear reconstruction procedure within an operator learning architecture can lead to the efficient approximation of prototypical PDEs with discontinuities. In particular, the approximation error can decay exponentially in network size, even after discontinuity formation. This result is shown for two types of architectures with nonlinear reconstruction, namely the widely used Fourier Neural Operator (FNO) of (Li et al., 2021a) and a novel variant of DeepONet that we term shift-DeepONet.

• We supplement the theoretical results with extensive experiments in which FNO and shift-DeepONet consistently outperform DeepONet and other baselines for PDEs with discontinuous solutions, such as linear advection, the inviscid Burgers' equation, and both the one- and two-dimensional compressible Euler equations of gas dynamics.

2. METHODS

Setting. Given compact domains D ⊂ R^d, U ⊂ R^{d′}, we consider the approximation of operators G : X → Y, where X ⊂ L^2(D) and Y ⊂ L^2(U) are the input and output function spaces. In the following, we focus on the case where ū ↦ G(ū) maps initial data ū to the solution, at some time t > 0, of an underlying time-dependent PDE. We assume the input ū to be sampled from a probability measure µ ∈ Prob(X).

DeepONet. DeepONet (Lu et al., 2021) will be our prototype for operator learning frameworks with linear reconstruction. To define it, let x := (x_1, . . . , x_m) ∈ D be a fixed set of sensor points. Given an input function ū ∈ X, we encode it by the point values E(ū) = (ū(x_1), . . . , ū(x_m)) ∈ R^m. DeepONet is formulated in terms of two neural networks. The first is the branch net β, which maps the point values E(ū) to coefficients, resulting in a mapping

    β : R^m → R^p,  E(ū) ↦ (β_1(E(ū)), . . . , β_p(E(ū))).    (2.1)

The second neural network is the so-called trunk net τ(y) = (τ_1(y), . . . , τ_p(y)), which is used to define a mapping

    τ : U → R^p,  y ↦ (τ_1(y), . . . , τ_p(y)).    (2.2)

While the branch net provides the coefficients, the trunk net provides the "basis" functions in an expansion of the output function of the form

    N_DON(ū)(y) = Σ_{k=1}^{p} β_k(ū) τ_k(y),  ū ∈ X, y ∈ U.    (2.3)
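The branch/trunk construction of (2.1)-(2.3) can be sketched in a few lines of NumPy: the branch net maps the m sensor values to p coefficients, the trunk net maps a query point y to p basis values, and the output is their inner product, which is exactly the linear reconstruction step (2.3). The helper names `mlp` and `deeponet` and all network sizes are illustrative assumptions; the paper's actual networks are deep neural networks trained on data.

```python
import numpy as np

def mlp(x, params):
    """Tiny fully connected network with tanh activations; `params` is a
    list of (weight, bias) pairs, the last layer being linear."""
    for W, b in params[:-1]:
        x = np.tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b

def deeponet(u_sensors, y, branch_params, trunk_params):
    """Linear-reconstruction DeepONet: the branch net maps the m sensor
    values E(u) to p coefficients beta_k, the trunk net maps a query point
    y to p basis values tau_k, and the output is the inner product
    sum_k beta_k(u) tau_k(y), i.e. eq. (2.3)."""
    beta = mlp(u_sensors, branch_params)   # shape (p,)
    tau = mlp(y, trunk_params)             # shape (p,)
    return beta @ tau                      # scalar N_DON(u)(y)
```

Note that for a fixed input ū the map y ↦ N_DON(ū)(y) always lies in the p-dimensional linear span of the trunk-net functions τ_1, . . . , τ_p; this is the linearity of the reconstruction that the lower bounds in this paper exploit.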




