CAPE: CHANNEL-ATTENTION-BASED PDE PARAMETER EMBEDDINGS FOR SCIML

Abstract

Scientific Machine Learning (SciML) is concerned with the development of machine learning methods for emulating physical systems governed by partial differential equations (PDEs). ML-based surrogate models substitute inefficient and often non-differentiable numerical simulation algorithms and find applications in areas such as weather forecasting, molecular dynamics, and medicine. While a number of ML-based methods for approximating the solutions of PDEs have been proposed in recent years, they typically do not take the parameters of the PDEs into account, making it difficult for the ML surrogate models to generalize to PDE parameters not seen during training. We propose a new channel-attention-based parameter embedding (CAPE) component for scientific machine learning models, together with a simple and effective curriculum learning strategy. The CAPE module can be combined with any neural PDE solver, allowing it to adapt to unseen PDE parameters without harming the original model's ability to find approximate solutions. The curriculum learning strategy provides a seamless transition between teacher-forcing and fully auto-regressive training. We evaluate CAPE in conjunction with the curriculum learning strategy on a PDE benchmark and obtain consistent and significant improvements over the base models. The experiments also show several advantages of CAPE, such as its improved generalization to unseen PDE parameters without a substantial increase in inference time or parameter count. An implementation of the method and experiments are available at https://anonymous.4open.

1. INTRODUCTION

Many real-world phenomena, ranging from weather forecasting to molecular dynamics and quantum systems, can be modeled with partial differential equations (PDEs). While for some problems the mathematical description of these equations is available, finding their solutions is complex and usually requires numerical treatment. Numerical simulation methods have been developed over many years and have achieved a high level of accuracy in solving these equations. However, numerical methods are resource-intensive and time-consuming, even when run on large supercomputers, to obtain sufficiently accurate results. High-resolution and high-dimensional hydrodynamic-type field equations are especially computationally demanding. The situation becomes even worse if it is necessary to perform simulations with various PDE parameters, since a numerical simulation is required for each initial condition and each configuration of PDE parameters.

Recently, there has been rapidly growing interest in machine learning methods for solving PDEs due to their various applications in science and engineering (Guo et al., 2021). For example, several prior studies have reported that ML models can estimate solutions more efficiently than classical numerical simulators (Li et al., 2021a; Stachenfeld et al., 2021). Moreover, using neural networks as surrogate models allows us to compute derivatives with respect to the input variables. Differentiable surrogate models allow one to use backpropagation and automatic differentiation to solve so-called inverse problems, which have numerous real-world applications but are difficult to solve using traditional numerical methods (Coros et al., 2013; Allen et al., 2022). A considerable number of papers have shown the advantage of ML-based surrogate models (Li et al., 2020; 2021a; Stachenfeld et al., 2021).

Figure 1: The standard autoregressive approach (left) and the proposed CAPE approach (right), which consists of two interdependent steps.

The majority of these methods, however, are purely data-driven, which does not allow us to change the PDE parameters. Although a few models take PDE parameters into account, they are tailored to specific neural networks and cannot be used with other state-of-the-art methods. This makes it difficult for the SciML community to develop models that generalize not only over initial conditions but also over different types of PDEs and PDE parameters.

To overcome the shortcomings of existing data-driven SciML models, a straightforward approach would include the PDE parameters as additional input. However, this naive method requires modification of the base network, which is potentially harmful to its accuracy. An alternative approach attaches an external parameter embedding module to the network. However, there are many possible module structures and ways to provide the embedded parameter information to the base network, and it is in general non-trivial to select the best one. In this paper, we propose a new and effective parameter embedding module that utilizes channel attention, inspired by implicit-method numerical solvers and by style transfer in ML (see Sec. 2.3 and Sec. 2.4). The crucial idea is that a neural network generates intermediate (approximated) field data for future time steps, which are then interpolated by a base model such as the FNO (Li et al., 2021a) to predict the field data for the next time step. CAPE can be combined with any existing autoregressive neural PDE solver. Fig. 1 illustrates the proposed CAPE framework.

We make the following contributions. First, we propose the CAPE module, which can be combined with any existing neural PDE solver and effectively transfers the PDE parameter information to the base network (BASE).
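The channel-attention idea behind CAPE can be illustrated with a minimal, self-contained NumPy sketch. This is not the paper's exact architecture: the names (`embed_params`, `channel_attention`) and the two-layer MLP are hypothetical stand-ins showing how a PDE parameter vector can be embedded and turned into per-channel gates that modulate a base model's feature maps.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def embed_params(pde_params, w1, w2):
    """Embed a vector of PDE parameters (e.g. viscosity) with a small MLP,
    producing one attention logit per feature channel."""
    h = np.tanh(pde_params @ w1)  # hidden representation of the parameters
    return h @ w2                 # (channels,) logits

def channel_attention(features, pde_params, w1, w2):
    """Gate each feature channel by an attention weight derived from the
    PDE parameters; `features` has shape (channels, spatial)."""
    gates = sigmoid(embed_params(pde_params, w1, w2))  # values in (0, 1)
    return features * gates[:, None]                   # broadcast over space

rng = np.random.default_rng(0)
channels, spatial, n_params, hidden = 8, 64, 2, 16
w1 = rng.normal(size=(n_params, hidden))
w2 = rng.normal(size=(hidden, channels))
params = np.array([1e-2, 0.5])  # hypothetical (viscosity, advection speed)
features = rng.normal(size=(channels, spatial))

out = channel_attention(features, params, w1, w2)
print(out.shape)  # (8, 64)
```

The design point this sketch captures is that the base network's feature maps are reweighted channel-wise as a function of the PDE parameters, so the parameter information reaches the solver without changing the base network's own layers.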
Second, we propose a simple but effective curriculum learning strategy that seamlessly bridges teacher-forcing and auto-regressive training. Third, we perform extensive experiments on various PDEs with a large number of different parameters, evaluating the effectiveness and efficiency of the proposed method in comparison with state-of-the-art methods.

2. CAPE: A FRAMEWORK FOR NEURAL PDE SOLVERS

2.1. BACKGROUND: PARTIAL DIFFERENTIAL EQUATIONS

Following the notation of Brandstetter et al. (2022), we consider partial differential equations (PDEs) over a time dimension $t \in [0, T]$ and spatial dimensions $x = [x_1, \ldots, x_D] \in X \subseteq \mathbb{R}^D$, which can be written as

$$\partial_t u = F(t, x, u, \partial_x u, \partial_{xx} u, \ldots), \quad (t, x) \in [0, T] \times X, \tag{1}$$
$$u(0, x) = u_0(x), \quad x \in X,$$
$$B[u](t, x) = 0, \quad (t, x) \in [0, T] \times \partial X,$$

where $u : [0, T] \times X \to \mathbb{R}^c$ is the solution of the PDE, with $c$ the field dimension, used to describe field quantities such as velocity, pressure, and density; $u_0(x)$ is the initial condition at time $t = 0$; and $B[u](t, x) = 0$ are the boundary conditions at $x \in \partial X$, the boundary of the domain $X$. Here, $\partial_x u, \partial_{xx} u, \ldots$ denote the partial derivatives of the solution $u$ with respect to the spatial domain, while $\partial_t u$ is the derivative with respect to time. The functional $F$ describes the possibly non-linear interactions between the PDE's terms.

2.2. PROBLEM DEFINITION

We consider PDEs (Sec. 2.1) whose solution is described as a temporal sequence of field data $(u^k)_{k=0,\ldots,N} := u^0, u^1, \ldots, u^N$, where $u^k$ is the field data at time step $t_k$, that is, the state of the physical system at time $t_k$.
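Given such a temporal sequence, training can feed the model either the ground-truth field at each step (teacher forcing) or its own previous prediction (fully auto-regressive rollout). The curriculum strategy mentioned in the contributions interpolates between the two; the sketch below illustrates one simple way such an interpolation could work, with a hypothetical `step_model` standing in for a neural PDE solver (this is an assumption for illustration, not the paper's exact schedule).

```python
import numpy as np

def step_model(u, dt=0.1):
    """Stand-in for a neural PDE solver: a simple damped update (hypothetical)."""
    return u * (1.0 - dt)

def curriculum_rollout(u_truth, p_auto, rng):
    """Unroll the model over a trajectory. With probability `p_auto` the model
    is fed its own prediction (auto-regressive); otherwise the ground-truth
    field (teacher forcing). p_auto = 0 is pure teacher forcing; p_auto = 1
    is a fully auto-regressive rollout."""
    preds = []
    u = u_truth[0]
    for k in range(len(u_truth) - 1):
        u_next = step_model(u)
        preds.append(u_next)
        u = u_next if rng.random() < p_auto else u_truth[k + 1]
    return np.stack(preds)

rng = np.random.default_rng(0)
# Toy ground-truth trajectory: 10 time steps of a 4-point field.
traj = np.linspace(1.0, 0.1, 10)[:, None] * np.ones((10, 4))

# Ramp p_auto from 0 to 1 over training, easing from teacher forcing
# into fully auto-regressive rollouts.
for epoch, p in enumerate(np.linspace(0.0, 1.0, 5)):
    preds = curriculum_rollout(traj, p, rng)
    mse = np.mean((preds - traj[1:]) ** 2)  # training loss against the truth
```

Starting with teacher forcing stabilizes early training, while ending fully auto-regressive matches the inference-time setting in which the model must consume its own predictions.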

