SOBOLEV TRAINING FOR THE NEURAL NETWORK SOLUTIONS OF PDES

Abstract

Approximating the numerical solutions of partial differential equations (PDEs) using neural networks is a promising application of deep learning. The smooth architecture of a fully connected neural network is appropriate for finding the solutions of PDEs; the corresponding loss function can also be designed intuitively and guarantees convergence for various kinds of PDEs. However, the rate of convergence has been considered a weakness of this approach. This paper introduces a novel loss function for training neural networks to find the solutions of PDEs, making the training substantially more efficient. Inspired by recent studies that incorporate derivative information into the training of neural networks, we develop a loss function that guides a neural network to reduce the error in the corresponding Sobolev space. Surprisingly, a simple modification of the loss function can make the training process similar to Sobolev Training, although solving PDEs with neural networks is not a fully supervised learning task. We provide several theoretical justifications for this approach for the viscous Burgers equation and the kinetic Fokker-Planck equation. We also present several simulation results showing that, compared with the traditional L2 loss function, the proposed loss function guides the neural network to significantly faster convergence. Moreover, we provide empirical evidence that the proposed loss function, together with iterative sampling techniques, performs better in solving high-dimensional PDEs.

1. INTRODUCTION

Deep learning has achieved remarkable success in many scientific fields, including computer vision and natural language processing. Beyond engineering, deep learning has also been successfully applied to scientific computing. In particular, the use of neural networks for the numerical integration of partial differential equations (PDEs) has emerged as an important new application of deep learning. Being universal approximators (Cybenko, 1989; Hornik et al., 1989; Li, 1996), neural networks can approximate solutions of complex PDEs. To find the neural network solution of a PDE, a neural network is trained on the domain wherein the PDE is defined. Training a neural network comprises feeding the input data through a forward pass and minimizing a predefined loss function with respect to the network parameters through a backward pass. In the traditional supervised learning setting, the loss function is designed to guide the neural network to produce the same output as the target data for the given input data. When solving PDEs with neural networks, however, the target values that correspond to the analytic solution are not available. One possible way to guide the neural network to produce the same output as the solution of the PDE is to penalize the neural network for failing to satisfy the PDE itself (Sirignano & Spiliopoulos, 2018; Berg & Nyström, 2018; Raissi et al., 2019; Hwang et al., 2020). Unlike traditional mesh-based schemes such as the finite difference method (FDM) and the finite element method (FEM), neural networks are inherently mesh-free function approximators. As mesh-free function approximators, neural networks can avoid the curse of dimensionality (Sirignano & Spiliopoulos, 2018) and approximate the solutions of PDEs on complex geometries (Berg & Nyström, 2018).

Recently, Hwang et al. (2020) showed that neural networks can approximate the solutions of kinetic Fokker-Planck equations under not only various kinds of kinetic boundary conditions but also several irregular initial conditions. Moreover, they showed that the neural networks automatically approximate macroscopic physical quantities, including the kinetic energy, the entropy, the free energy, and the asymptotic behavior of the solutions. Further issues, including the inverse problem, were investigated by Raissi et al. (2019); Jo et al. (2020). Although the neural network approach can solve several complex PDEs in various kinds of settings, it generally requires relatively high computational cost compared to traditional mesh-based schemes. To resolve this issue, we propose a novel loss function using Sobolev norms. Inspired by a recent study that incorporated derivative information into the training of neural networks (Czarnecki et al., 2017), we develop a loss function that efficiently guides neural networks to the solutions of PDEs. We prove that the H1 and H2 norms of the approximation errors converge to zero as our loss functions tend to zero for the 1-D heat equation, the 1-D viscous Burgers equation, and the 1-D kinetic Fokker-Planck equation. Moreover, we show via several simulation results that the number of epochs needed to achieve a certain accuracy is significantly reduced as the order of derivatives in the loss function gets higher, provided that the solution is smooth. This study might pave the way for overcoming the issue of high computational cost when solving PDEs using neural networks.

The main contributions of this work are threefold: 1) We introduce novel loss functions that enable the Sobolev Training of neural networks for solving PDEs. 2) We prove that the proposed loss functions guarantee the convergence of neural networks in the corresponding Sobolev spaces, although it is not a supervised learning task.
3) We empirically demonstrate the effect of Sobolev Training for several regression problems and the improved performance of our loss functions in solving several PDEs, including the heat equation, Burgers' equation, the Fokker-Planck equation, and the high-dimensional Poisson equation.
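As a concrete illustration of the loss functions discussed above, consider the following minimal sketch, which is not taken from the paper and uses finite differences on a grid in place of automatic differentiation. It contrasts a plain L2 PDE-residual loss with an H1-style Sobolev loss that also penalizes the spatial derivative of the residual, for the 1-D heat equation u_t = u_xx; all function names are illustrative.

```python
import numpy as np

def residual_heat(u, dx, dt):
    """PDE residual r = u_t - u_xx for the 1-D heat equation,
    approximated with central finite differences on a (t, x) grid."""
    u_t = (u[2:, 1:-1] - u[:-2, 1:-1]) / (2 * dt)
    u_xx = (u[1:-1, 2:] - 2 * u[1:-1, 1:-1] + u[1:-1, :-2]) / dx**2
    return u_t - u_xx

def sobolev_loss(u, dx, dt, order=1):
    """L2 residual loss; with order >= 1, add the squared x-derivative
    of the residual, an H1-style Sobolev penalty."""
    r = residual_heat(u, dx, dt)
    loss = np.mean(r**2)
    if order >= 1:
        r_x = (r[:, 2:] - r[:, :-2]) / (2 * dx)
        loss += np.mean(r_x**2)
    return loss

# u(t, x) = exp(-t) * sin(x) solves u_t = u_xx exactly, so both losses
# vanish up to discretization error.
x = np.linspace(0.0, np.pi, 101)
t = np.linspace(0.0, 1.0, 101)
T, X = np.meshgrid(t, x, indexing="ij")
u_exact = np.exp(-T) * np.sin(X)
dx, dt = x[1] - x[0], t[1] - t[0]
print(sobolev_loss(u_exact, dx, dt, order=0))
print(sobolev_loss(u_exact, dx, dt, order=1))
```

In an actual training loop, u would be the network output on collocation points and the derivatives would come from automatic differentiation; the point here is only the structure of the loss, where the higher-order term penalizes the residual in a stronger norm.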

2. RELATED WORKS

Training neural networks to approximate the solutions of PDEs has been intensively studied over the past decades. For example, Lagaris et al. (1998; 2000) used neural networks to solve ordinary differential equations (ODEs) and PDEs on a predefined set of grid points. Subsequently, Sirignano & Spiliopoulos (2018) proposed a method to solve high-dimensional PDEs by approximating the solution with a neural network. They focused on the fact that traditional finite mesh-based schemes become computationally intractable as the dimension grows. Because neural networks are mesh-free function approximators, however, they can solve high-dimensional PDEs by incorporating mini-batch sampling. Furthermore, the authors showed the convergence of the neural network to the solution of quasilinear parabolic PDEs under certain conditions. Recently, Raissi et al. (2019) reported that one can use observed data to solve PDEs using physics-informed neural networks (PINNs). Notably, PINNs can solve a supervised regression problem on observed data while satisfying any physical properties given by nonlinear PDEs. A significant advantage of PINNs is that the data-driven discovery of PDEs, also called the inverse problem, is possible with a small change in the code. The authors provided several numerical simulations for various types of nonlinear PDEs, including the Navier-Stokes equation and Burgers' equation. The first theoretical justification for PINNs was provided by Shin et al. (2020), who showed that a sequence of neural networks converges to the solutions of linear elliptic and parabolic PDEs in the L2 sense as the number of observed data increases. There also exists a study aiming to enhance the convergence of PINNs (van der Meer et al., 2020).

Additionally, several works relate deep neural networks to PDEs without directly approximating their solutions. For instance, Long et al. (2018) attempted to discover the hidden physics model from data by learning differential operators. A fast, iterative PDE solver was proposed by learning to modify each iteration of an existing solver (Hsieh et al., 2019). A deep backward stochastic differential equation (BSDE) solver was proposed and investigated in Weinan et al. (2017); Han et al. (2018) for solving high-dimensional parabolic PDEs by reformulating them using BSDEs.

The main strategy of the present study is to leverage derivative information while solving PDEs via neural networks. Czarnecki et al. (2017) first proposed Sobolev Training, which uses derivative information of the target function when training a neural network by slightly modifying

