HT-NET: HIERARCHICAL TRANSFORMER BASED OPERATOR LEARNING MODEL FOR MULTISCALE PDES

Abstract

Complex nonlinear interplays of multiple scales give rise to many interesting physical phenomena and pose significant difficulties for the computer simulation of multiscale PDE models in areas such as reservoir simulation, high-frequency scattering, and turbulence modeling. In this paper, we introduce a hierarchical transformer (HT) scheme to efficiently learn the solution operator for multiscale PDEs. We construct a hierarchical architecture with a scale-adaptive interaction range, such that the features can be computed in a nested manner and with a controllable linear cost. Self-attentions over a hierarchy of levels can be used to encode and decode the multiscale solution space across all scales. In addition, we adopt an empirical H^1 loss function to counteract the spectral bias of the neural network approximation for multiscale functions. In the numerical experiments, we demonstrate the superior performance of the HT scheme compared with state-of-the-art (SOTA) methods for representative multiscale problems.

1. INTRODUCTION

Partial differential equation (PDE) models with multiple temporal/spatial scales are ubiquitous in physics, engineering, and other disciplines. They are of tremendous importance in making predictions for challenging practical problems such as reservoir modeling, high-frequency scattering, and atmosphere circulation, to name a few. The complex nonlinear interplays of characteristic scales cause major difficulties in the computer simulation of multiscale PDEs. Since resolving all characteristic scales is prohibitively expensive, sophisticated multiscale methods have been developed to efficiently and accurately solve multiscale PDEs by incorporating microscopic information. However, most of them are designed for problems with fixed input parameters. Recently, several novel methods such as the Fourier neural operator (FNO) (Li et al., 2021), the Galerkin transformer (GT) (Cao, 2021), and the deep operator network (DeepONet) (Lu et al., 2021) have been developed to directly learn the operator (mapping) between infinite-dimensional spaces for PDE problems, by taking advantage of the enhanced expressibility of deep neural networks and advanced architectures such as feature embedding, channel mixing, and self-attention. Such methods can deal with an ensemble of input parameters and have great potential as efficient forward and inverse solvers for PDE problems. However, for multiscale problems, most existing operator learning schemes essentially capture only the smooth part of the solution space, and resolving the intrinsic multiscale features remains a major challenge. In this paper, we design a hierarchical transformer based operator learning method, so that the accurate, efficient, and robust computer simulation of multiscale PDE problems with an ensemble of input parameters becomes feasible.
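To make the operator-learning paradigm concrete, the following is a minimal numpy sketch of the spectral-convolution idea at the core of FNO (Li et al., 2021): transform the features to Fourier space, apply learned weights to a truncated set of low modes, and transform back. This is an illustrative toy, not the HT-Net architecture; the function name and the identity-weight usage are purely for demonstration.

```python
import numpy as np

def spectral_conv_1d(v, weights, k_max):
    """FNO-style spectral convolution on a uniform 1-D grid:
    go to Fourier space, keep and reweight the lowest k_max modes,
    zero the rest, and transform back to physical space."""
    v_hat = np.fft.rfft(v)                     # Fourier coefficients of v
    out_hat = np.zeros_like(v_hat)
    out_hat[:k_max] = weights * v_hat[:k_max]  # truncate + multiply
    return np.fft.irfft(out_hat, n=len(v))

# Usage: acts as a learnable low-pass filter on a noisy signal.
n = 64
x = np.linspace(0, 1, n, endpoint=False)
v = np.sin(2 * np.pi * x) + 0.1 * np.random.randn(n)
w = np.ones(4, dtype=complex)                  # identity weights on kept modes
out = spectral_conv_1d(v, w, k_max=4)
```

The mode truncation is exactly why such parameterizations favor the smooth part of the solution: energy above `k_max` is discarded in every layer.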
Our main contributions can be summarized as follows:
• we develop a novel transformer architecture that decomposes the input-output mapping into a hierarchy of levels; the features are updated in a nested manner based on the hierarchical local aggregation of self-attentions, with linear computational cost;
• we adopt the empirical H^1 loss, which counteracts the spectral bias and enhances the ability to capture the oscillatory features of the multiscale solution space;
• the resulting scheme achieves significantly better accuracy and generalization for multiscale input parameters than state-of-the-art (SOTA) models.
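One way to realize an empirical H^1 loss on a uniform grid is to add a mean-square error on finite-difference gradients to the usual mean-square error on values. The sketch below is an assumption about one possible discretization, not the paper's exact implementation; it illustrates why the gradient term penalizes high-frequency mismatch that an L^2 loss barely sees.

```python
import numpy as np

def h1_loss(u_pred, u_true, h):
    """Empirical (squared) H^1 loss on a uniform 1-D grid with spacing h:
    L^2 error of the values plus L^2 error of finite-difference gradients.
    The gradient term scales with the frequency of the error, counteracting
    the spectral bias toward smooth components."""
    l2 = np.mean((u_pred - u_true) ** 2)
    grad_err = np.gradient(u_pred, h) - np.gradient(u_true, h)
    return l2 + np.mean(grad_err ** 2)

# Usage: a tiny but highly oscillatory error is nearly invisible in L^2
# yet dominates the H^1 loss.
x = np.linspace(0, 1, 1000, endpoint=False)
h = x[1] - x[0]
u_true = np.sin(2 * np.pi * x)
u_pred = u_true + 0.01 * np.sin(100 * np.pi * x)   # small, fast perturbation
```

Here the value error has amplitude 0.01, but its derivative error has amplitude 0.01 × 100π ≈ 3.1, so the H^1 loss is orders of magnitude larger than the plain L^2 loss.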

2. BACKGROUND AND RELATED WORK

We briefly introduce multiscale PDEs in § 2.1, summarize relevant multiscale numerical methods in § 2.2, and review neural solvers, in particular neural operators, in § 2.3.

2.1. MULTISCALE PDES

Multiscale PDEs, in a narrow sense, refer to PDEs with rapidly varying coefficients, which arise from a wide range of applications in heterogeneous and random media. In a broader sense, they may form a hierarchy of models at different scales, derived systematically from fundamental laws of physics such as quantum mechanics (E & Engquist, 2003). Outstanding multiscale PDE models include the following.

Multiscale elliptic equations are an important class of prototypical examples, such as the following second-order elliptic equation in divergence form,

-∇·(a(x)∇u(x)) = f(x),  x ∈ D,
u(x) = 0,  x ∈ ∂D,    (2.1)

where 0 < a_min ≤ a(x) ≤ a_max for all x ∈ D, and the forcing term f ∈ H^{-1}(D; R). The coefficient-to-solution map is S : L^∞(D; R_+) → H^1_0(D; R), such that u = S(a). In Li et al. (2021), a smooth coefficient a(x) is considered and S can be well resolved by the FNO parameterization. The setup a(x) ∈ L^∞ allows rough coefficients with fast oscillation (e.g., a(x) = a(x/ε) with ε ≪ 1), high contrast ratio a_max/a_min ≫ 1, and even a continuum of non-separable scales. The rough-coefficient case is much harder from both the scientific computing (Branets et al., 2009) and operator learning perspectives.

The Navier-Stokes equations model the flow of incompressible fluids, which becomes turbulent due to the simultaneous interaction of a wide range of temporal and spatial scales of motion.

The Helmholtz equation models time-harmonic acoustic waves. Its numerical solution exhibits severe difficulties in the high-wave-number regime due to the interaction of high-frequency waves with the numerical mesh.
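The difficulty of rough coefficients can be seen already in one dimension. The sketch below is an illustrative finite-difference solve of the 1-D analogue of equation 2.1 with an oscillatory coefficient a(x) = a(x/ε); the function names and parameter choices are purely for demonstration.

```python
import numpy as np

def solve_elliptic_1d(a_fn, f_fn, n, eps):
    """Finite-difference solve of -(a(x) u'(x))' = f(x) on (0,1),
    u(0) = u(1) = 0, the 1-D analogue of equation (2.1).
    Flux form: a is evaluated at cell midpoints x_{i+1/2}."""
    h = 1.0 / n
    x = np.linspace(0, 1, n + 1)               # grid nodes x_0..x_n
    a_mid = a_fn((x[:-1] + x[1:]) / 2, eps)    # a at midpoints
    # tridiagonal system for interior unknowns u_1..u_{n-1}
    A = np.zeros((n - 1, n - 1))
    for k in range(n - 1):
        A[k, k] = (a_mid[k] + a_mid[k + 1]) / h**2
        if k > 0:
            A[k, k - 1] = -a_mid[k] / h**2
        if k < n - 2:
            A[k, k + 1] = -a_mid[k + 1] / h**2
    u_int = np.linalg.solve(A, f_fn(x[1:-1]))
    return x, np.concatenate(([0.0], u_int, [0.0]))

# Rough, rapidly oscillating coefficient a(x) = a(x/eps), bounded away from 0.
a = lambda x, eps: 2.0 + np.sin(2 * np.pi * x / eps)
f = lambda x: np.ones_like(x)
x, u = solve_elliptic_1d(a, f, n=512, eps=1 / 16)
```

Resolving the oscillation requires the mesh width h to be well below ε, which is exactly the 1/ε cost scaling that multiscale solvers are designed to avoid.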

2.2. MULTISCALE SOLVERS FOR MULTISCALE PDES

For multiscale PDEs, the computational cost of classical numerical methods, such as finite element methods, finite difference methods, etc., usually scales proportionally to 1/ε ≫ 1. Multiscale solvers have been developed whose computational cost is independent of ε, by incorporating microscopic information.

Asymptotic and numerical homogenization. Asymptotic homogenization (Bensoussan et al., 1978) is an elegant analytical approach for multiscale PDEs with scale separation, e.g., a(x) = a(x/ε) with ε ≪ 1 in equation 2.1. For general multiscale PDEs, possibly with a continuum of scales, numerical homogenization (Engquist & Souganidis, 2008) offers an effective numerical approach that aims to identify low-dimensional approximation spaces whose bases are adapted to the corresponding multiscale operator and can be constructed efficiently (e.g., with localized bases). See Appendix A for details.
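In one dimension, asymptotic homogenization admits a classical closed form: for a periodic coefficient a(x/ε), the effective coefficient is the harmonic mean of a over the unit cell, a* = (∫₀¹ 1/a(y) dy)⁻¹. The following sketch computes it numerically; it illustrates the scale-separated setting only, not the general numerical homogenization machinery above.

```python
import numpy as np

def homogenized_coeff_1d(a_fn, m=100000):
    """1-D periodic homogenization: the effective coefficient for
    -(a(x/eps) u')' = f is the harmonic mean of a over the unit cell,
    a* = ( integral_0^1 1/a(y) dy )^{-1}, approximated by a midpoint rule."""
    y = np.linspace(0, 1, m, endpoint=False)
    return 1.0 / np.mean(1.0 / a_fn(y))

# For a(y) = 2 + sin(2*pi*y), the exact value is sqrt(2^2 - 1) = sqrt(3).
a_star = homogenized_coeff_1d(lambda y: 2.0 + np.sin(2 * np.pi * y))
print(a_star)  # ≈ 1.732
```

Note that a* is strictly smaller than the arithmetic mean of a (here, 2), which is why naive coarse-graining by averaging the coefficient overestimates the effective conductivity.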

