MOMENTUM DIMINISHES THE EFFECT OF SPECTRAL BIAS IN PHYSICS-INFORMED NEURAL NETWORKS

Anonymous authors
Paper under double-blind review

Abstract

Physics-informed neural network (PINN) algorithms have shown promising results in solving a wide range of problems involving partial differential equations (PDEs). However, even for the simplest PDEs, PINNs often fail to converge to desirable solutions when the target function contains high-frequency modes, a phenomenon known as spectral bias. In the present work, we exploit neural tangent kernels (NTKs) to investigate the training dynamics of PINNs evolving under stochastic gradient descent with momentum (GDM), and demonstrate that GDM significantly reduces the effect of spectral bias. We also examine why training a model via the Adam optimizer can accelerate convergence while reducing spectral bias. Moreover, our numerical experiments confirm that wide-enough networks trained with GDM or Adam still converge to desirable solutions, even in the presence of high-frequency features.

1. INTRODUCTION

Physics-informed neural networks (PINNs) have been proposed as alternatives to traditional numerical solvers for partial differential equations (PDEs) (Raissi et al., 2019; 2020; Sirignano & Spiliopoulos, 2018; Tripathy & Bilionis, 2018). In PINNs, a PDE describing the physical domain knowledge of a problem is added as a regularization term to an empirical loss function. Although PINNs have shown remarkable performance in solving a wide range of problems in science and engineering (Cai et al., 2022; Kharazmi et al., 2019; Sun et al., 2020; Kissas et al., 2020; Tartakovsky et al., 2018), they often fail to converge to accurate solutions when the target function contains high-frequency features, regardless of the simplicity of the PDE itself (Krishnapriyan et al., 2021; Wang et al., 2021). This phenomenon, known as spectral bias, arises even in the simplest linear PDEs (Wang et al., 2021; Moseley et al., 2021; Krishnapriyan et al., 2021).

Spectral bias is not limited to PINNs. Rahaman et al. (2019) empirically showed that all fully-connected feed-forward neural networks (NNs) are biased against learning the complex components of target functions. Furthermore, Cao et al. (2019) theoretically proved that when infinitely-wide networks are trained with squared loss, the eigenvalues of the neural tangent kernel (NTK) (Jacot et al., 2018) give the exact convergence rates of the corresponding components of the target function. Thus, spectral bias occurs when some of the eigenvalues of the NTK are large in absolute value while others are small. Recently, utilizing the NTK of infinitely-wide PINNs, Wang et al. (2022) examined the gradient flow of these networks during training. They proved that the training error decays as e^{-κ_i t}, where the κ_i are the eigenvalues of the NTK. Thus, the components of the target function corresponding to the smaller eigenvalues decay more slowly, which causes spectral bias.
To tackle the issue of spectral bias, they proposed assigning a weight to each term of the loss function and updating it dynamically. Although the results showed some improvement, as the frequency of the target function increased, their proposed PINN still failed to converge to solutions of the PDEs. Moreover, as assigning weights can result in indefinite kernels, the training process could become extremely unstable. Of note, compared to typical NNs, analyzing the effect of spectral bias for PINNs is more challenging, as the loss function is regularized by adding the PDE residual. Thus, the study of Wang et al. (2022) was limited to training the model only with GD.

Some studies proposed an alternative approach in which, instead of modifying the loss-function terms, a high-frequency PDE is solved in a few successive steps. In these methods, it is assumed that the optimal solution of low-frequency PDEs is close to that of high-frequency PDEs.
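To make the weighting idea concrete, here is a schematic toy of our own construction (it does not reproduce the cited algorithm's actual update rule): two quadratic loss terms with mismatched curvatures stand in for the residual and boundary terms of a PINN loss, and a hypothetical balancing rule rescales each weight from the current gradient magnitudes.

```python
import numpy as np

# Toy stand-in for a two-term PINN loss: two quadratic components with
# very different curvatures, mimicking NTK eigenvalues of the residual
# and boundary terms. All numbers here are illustrative assumptions.
kappas = np.array([9.0, 0.05])
eta = 0.1

# Baseline: unweighted gradient descent leaves the flat component slow.
theta_gd = np.ones(2)
for _ in range(100):
    theta_gd = theta_gd - eta * kappas * theta_gd

# Dynamically weighted descent with a hypothetical balancing rule.
theta = np.ones(2)
for _ in range(100):
    grads = kappas * theta
    # Rescale each term so its weighted gradient matches the largest
    # gradient magnitude (an assumed rule, chosen for illustration).
    lam = np.max(np.abs(grads)) / (np.abs(grads) + 1e-12)
    theta = theta - eta * lam * grads

print("unweighted GD:", theta_gd)
print("weighted GD:  ", theta)
```

In this toy, the weighting equalizes the effective step sizes, so the flat (small-curvature) component converges far faster than under unweighted GD; it also hints at the fragility noted above, since the weights grow without bound as gradients vanish.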

