MESH-INDEPENDENT OPERATOR LEARNING FOR PDES USING SET REPRESENTATIONS

Abstract

Operator learning, which learns mappings between infinite-dimensional function spaces, is an attractive alternative to traditional numerical methods for solving partial differential equations (PDEs). In practice, the functions of a physical system are often observed through sparse or even irregularly distributed measurements; the functions are therefore discretized and usually represented by finite structured arrays, which are provided as data of input-output pairs. After training on such arrays, the solution produced by the model should be independent of the discretization of the input function and should be queryable at any point of the continuous domain. Architectures for operator learning should therefore be flexibly compatible with arbitrary numbers and locations of measurements; otherwise, scalability is restricted whenever observations differ in measurement format. In this study, we propose treating discretized functions as set-valued data and construct an attention-based model, called the mesh-independent operator learner (MIOL), which properly handles both input functions and query coordinates of the solution function by detaching the dependence on the input and output meshes. Our models, pre-trained on benchmark datasets for operator learning, are evaluated on downstream tasks to demonstrate generalization to varying discretization formats of the system, a natural characteristic of the continuous solutions of PDEs.

1. INTRODUCTION

Partial differential equations (PDEs) are among the most successful mathematical tools for representing physical systems, expressing governing equations over infinitesimal segments of the domain of interest together with problem-specific boundary conditions or forcing functions (Mizohata, 1973). The governing PDEs, which are shared globally across the entire domain, can be interpreted as interactions between infinitesimal segments with respect to their geometric structures and values. Because the equations hold throughout the domain, the system can be analyzed in a continuous manner with respect to its inputs and outputs. In general, identifying appropriate governing equations for unknown systems is very challenging without domain expertise, yet numerous processes in many complex systems remain unknown. Even when the governing equation of a system is known, solving it with conventional numerical methods incurs substantial time and memory costs, and the computation can become intractable for complex, large-scale systems.

Motivation. In recent years, operator learning has been gaining attention as an alternative to conventional numerical methods, pursuing mappings between infinite-dimensional input/output function spaces in a data-driven manner without any problem-specific knowledge of the system (Nelsen & Stuart, 2021; Li et al., 2020a;b; 2021b; Lu et al., 2019; 2021; Cao, 2021; Kovachki et al., 2021). Intuitively, for an underlying PDE L_a u = f defined on a continuous bounded domain Ω with system parameters a ∈ A, forcing function f ∈ F, and solution u ∈ U, the goal of operator learning is to approximate the inverse operator G = L_a^{-1} f : A → U or G : F → U with a parametric model G_θ. Without loss of generality, when the input function is a, the output function can be computed as u = G_θ(a).
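To make this setup concrete, the sketch below discretizes a toy one-dimensional problem: the "operator" G maps an input function a to its antiderivative, standing in for the inverse operator L_a^{-1}. The function names and the choice of operator are illustrative assumptions, not the paper's benchmark problems.

```python
import numpy as np

# Toy operator: G maps a forcing function a to its antiderivative u,
# u(x) = \int_0^x a(s) ds, a stand-in for the inverse operator L_a^{-1}.
def apply_operator(a_values, xs):
    # Cumulative trapezoidal rule approximates the integral on the grid.
    du = 0.5 * (a_values[1:] + a_values[:-1]) * np.diff(xs)
    return np.concatenate([[0.0], np.cumsum(du)])

xs = np.linspace(0.0, 1.0, 101)          # discretization of the domain
a_vals = np.sin(np.pi * xs)              # input function a sampled at xs
u_vals = apply_operator(a_vals, xs)      # solution u = G(a) sampled at xs

# One training example for operator learning is the array pair (a_vals, u_vals);
# the model G_theta is fit to many such pairs, never seeing a or u directly.
print(u_vals[-1])
```

The pair of finite arrays `(a_vals, u_vals)` is exactly the kind of point-wise discretized data discussed above; the learning problem is to recover the underlying map between the continuous functions from such pairs.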
Because the operator G_θ should be able to capture interactions between elements of the system input a to discover the governing PDEs, G_θ is approximated by a series of integral operators with parameterized kernels that iteratively update the system input toward the output (Nelsen & Stuart, 2021; Li et al., 2020a;b; 2021b; Cao, 2021; Kovachki et al., 2021). In practice, one successful architecture for operator learning is DeepONet (Lu et al., 2019; 2021). To process input functions and query coordinates, DeepONets consist of two sub-networks, called the branch network and the trunk network, respectively. DeepONets can be queried at any coordinate y through the trunk network; however, they use a fixed discretization of the system input a in the branch network (Lu et al., 2019; 2021). Another promising framework is neural operators, which consist of several integral operators with parameterized kernels mapping between infinite-dimensional functions. Neural operators can adapt to different resolutions of the system input a in a mesh-invariant manner (Li et al., 2020a;b). However, the implemented architectures of neural operators typically assume the same sampling points for the input and output functions (Li et al., 2020a;b; 2021b; Kovachki et al., 2021; Lu et al., 2022); that is, they have not been formulated to decouple the discretization of the system input from that of the solution, so the solution of a neural operator cannot be flexibly queried at an arbitrary coordinate y. In addition, Fourier neural operators (FNOs), widely used for operator learning owing to their efficiency and accuracy (Li et al., 2021b; Pathak et al., 2022), are limited to uniform grid discretizations because of their use of the FFT. The limitations and schematics of the existing studies are presented in Table 1 and Figure 6.

Contributions.
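The branch-trunk decomposition of DeepONet described above can be sketched in a few lines. The sketch below uses untrained random weights and is only meant to show the interface: the trunk accepts any query coordinate y, while the branch expects the input a sampled at a fixed set of m sensor locations, which is the limitation noted in the text. All network sizes and names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(params, x):
    # A tiny tanh MLP used for both sub-networks (weights are random here).
    for W, b in params[:-1]:
        x = np.tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b

def init_mlp(sizes, rng):
    return [(rng.normal(0, 0.5, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

m, p = 50, 16                        # m fixed input sensors, p latent features
branch = init_mlp([m, 64, p], rng)   # encodes the discretized input a
trunk  = init_mlp([1, 64, p], rng)   # encodes a query coordinate y

sensors = np.linspace(0, 1, m)
a_vals  = np.sin(2 * np.pi * sensors)      # a must be sampled at the FIXED sensors
ys      = rng.uniform(0, 1, (8, 1))        # but the solution is queried anywhere

# DeepONet-style output: inner product of branch and trunk features,
# u(y) ~ <branch(a), trunk(y)>.
b_feat = mlp(branch, a_vals)               # shape (p,)
t_feat = mlp(trunk, ys)                    # shape (8, p)
u_at_y = t_feat @ b_feat                   # shape (8,)
print(u_at_y.shape)
```

Note that changing the number or location of the sensors would change the branch input dimension m, so a trained branch network cannot accept a different discretization of a, whereas the trunk places no restriction on y.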
There has been limited discussion of generalization across broader variations of discretization, such as different sampling points for the input and output functions with arbitrary numbers and irregular distributions. With these considerations, we treat the observational data as a set, without intrinsic assumptions about the data, and construct what we call a mesh-independent operator learner (MIOL), as shown in Figure 1. MIOL is a fully attentional encoder-processor-decoder architecture, where the encoder and decoder are designed in the spirit of set encoder-decoder frameworks (Zaheer et al., 2017; Lee et al., 2019) to detach the dependence on the input and output meshes from the processor, which operates on a smaller, fixed number of vectors in latent space. Attention mechanisms have been shown to be not only efficient at expressing pair-wise interactions (Cao, 2021; Kovachki et al., 2021; Guibas et al., 2021; Pathak et al., 2022) but also flexible at processing data in a modality-agnostic way (Vaswani et al., 2017; Jaegle et al., 2021; 2022). Finally, we conducted several experiments on benchmark PDE datasets for operator learning (Li et al., 2021b). The results show that our model is not only competitive on the original benchmark tasks but also robustly applicable when the discretizations of the input and output functions are allowed to be different, unstructured, and irregularly distributed, which are natural consequences of continuous solutions of physical systems but are not flexibly supported by existing models.
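The encoder-processor-decoder idea can be illustrated with plain cross-attention, in the spirit of the set and Perceiver-style frameworks cited above. In the sketch below, a variable-size set of (coordinate, value) measurements is absorbed into a fixed number of latent vectors by the encoder, and arbitrary query coordinates read the solution out of the latents in the decoder; the processor step is omitted. This is a minimal illustrative sketch with random weights, not the paper's MIOL implementation, and all sizes and names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def attention(Q, K, V):
    # Scaled dot-product attention; softmax over the key axis.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

d, n_latent = 32, 8
latents = rng.normal(0, 1, (n_latent, d))   # fixed number of latent vectors

# Input: a SET of (coordinate, value) pairs with arbitrary size and locations.
n_in = 73                                   # any number of measurements
coords = rng.uniform(0, 1, (n_in, 1))
values = np.sin(2 * np.pi * coords)
Wc, Wv = rng.normal(0, 0.3, (1, d)), rng.normal(0, 0.3, (1, d))
tokens = coords @ Wc + values @ Wv          # per-measurement embeddings

# Encoder: latents cross-attend to the input set (size-independent).
z = attention(latents, tokens, tokens)      # (n_latent, d), fixed shape
# Processor: would refine z here with self-attention; omitted for brevity.
# Decoder: arbitrary query coordinates cross-attend to the latents.
queries = rng.uniform(0, 1, (5, 1)) @ Wc    # any output mesh
u_hat = attention(queries, z, z) @ rng.normal(0, 0.3, (d, 1))
print(u_hat.shape)                          # one scalar per query point
```

Because the softmax in the encoder sums over the input set, the latent array `z` is invariant to permutations of the measurements and has the same fixed shape for any input size, which is what detaches the processor from the input and output meshes.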



Comparison with other studies. Because full observations of the input/output functions are infeasible, the observed data are provided as a set of input-output pairs, which are point-wise finite discretizations of the functions. The output value at a coordinate y can be expressed as u(y) = [G_θ(a)](y), which can be viewed as operating G : A × Y → U with two input placeholders, a ∈ A and y ∈ Y. Then, two considerations arise for the model with respect to the input function a and the query coordinate y: (1) the output of the model should not depend on any particular discretization format of a, and (2) the model should be able to output a solution at any query coordinate y. The measurements of a system are often sparsely and irregularly distributed owing to the geometry of the domain, environmental conditions, or inoperative equipment (Belbute-Peres et al., 2020; Lienen & Günnemann, 2022). In addition, popular numerical methods for solving PDEs, such as finite element methods, often use unstructured meshes to discretize the domain, and adaptive remeshing schemes are commonly deployed where regions of the domain require different resolutions depending on the accuracy of the prediction (Brenner et al., 2008; Huang & Russell, 2010). Thus, the model should aggregate global information over the measurements to process [G_θ(a)](y).
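The two considerations above can be checked operationally. The sketch below uses a deliberately simple kernel-smoothing model as a stand-in for [G_θ(a)](y): it aggregates the whole measurement set, so its prediction at a query y is unaffected by the ordering, count, or placement of the sample points, up to sampling noise. The function `g_theta` and its bandwidth are illustrative assumptions, not the proposed model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for [G_theta(a)](y): aggregates the measurement set with a
# smooth kernel, so the prediction depends on a only through its sampled
# values, not on the ordering or number of sample points (illustrative only).
def g_theta(coords, values, y, width=0.05):
    w = np.exp(-((coords - y) ** 2) / (2 * width ** 2))
    return (w * values).sum() / w.sum()

a = lambda x: np.sin(2 * np.pi * x)   # the underlying input function
y = 0.37                              # an arbitrary query coordinate

# Two different, irregular discretizations of the SAME input function a.
xs1 = np.sort(rng.uniform(0, 1, 2000))
xs2 = np.sort(rng.uniform(0, 1, 3000))
u1 = g_theta(xs1, a(xs1), y)
u2 = g_theta(xs2, a(xs2), y)

# Consideration (1): predictions from different meshes should closely agree.
print(abs(u1 - u2))
# Consideration (2): the same model answers at any other query coordinate.
print(g_theta(xs1, a(xs1), 0.91))
```

A mesh-dependent model, by contrast, would be tied to one array shape and ordering and could not even be evaluated on the second discretization, let alone agree with its own prediction from the first.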

