A STABLE AND SCALABLE METHOD FOR SOLVING INITIAL VALUE PDES WITH NEURAL NETWORKS

Abstract

Unlike conventional grid- and mesh-based methods for solving partial differential equations (PDEs), neural networks have the potential to break the curse of dimensionality, providing approximate solutions to problems where using classical solvers is difficult or impossible. While global minimization of the PDE residual over the network parameters works well for boundary value problems, catastrophic forgetting impairs applicability to initial value problems (IVPs). In an alternative local-in-time approach, the optimization problem can be converted into an ordinary differential equation (ODE) on the network parameters and the solution propagated forward in time; however, we demonstrate that current methods based on this approach suffer from two key issues. First, following the ODE produces an uncontrolled growth in the conditioning of the problem, ultimately leading to unacceptably large numerical errors. Second, as the ODE methods scale cubically with the number of model parameters, they are restricted to small neural networks, significantly limiting their ability to represent intricate PDE initial conditions and solutions. Building on these insights, we develop Neural-IVP, an ODE-based IVP solver which prevents the network from becoming ill-conditioned and runs in time linear in the number of parameters, enabling us to evolve the dynamics of challenging PDEs with neural networks.

1. INTRODUCTION

Partial differential equations (PDEs) are needed to describe many phenomena in the natural sciences. PDEs that model complex phenomena cannot be solved analytically, and many numerical techniques are used to compute their solutions. Classical techniques such as finite differences rely on grids and provide efficient and accurate solutions when the dimensionality is low (d = 1, 2). Yet the computational and memory costs of using grids or meshes scale exponentially with the dimension, making it extremely challenging to solve PDEs accurately in more than 3 dimensions. Neural networks have shown considerable success in modeling and reconstructing functions on high-dimensional structured data such as images or text, but also for unstructured tabular data and spatial functions. Neural networks sidestep the "curse of dimensionality" by learning representations of the data that enable them to perform efficiently. In this respect, neural networks have similar benefits and drawbacks as Monte Carlo methods. The approximation error ϵ converges at a rate ϵ ∝ 1/√n from statistical fluctuations, where n is the number of data points or Monte Carlo samples. Expressed inversely, we need n ∝ e^{2 log(1/ϵ)} samples to achieve error ϵ: compute grows exponentially in the number of significant digits rather than exponentially in the dimension, as it does for grids. For many problems this tradeoff is favorable, and an approximate solution is much better than no solution. Thus, it is natural to consider neural networks for solving PDEs whose dimensionality makes standard approaches intractable. While first investigated in Dissanayake & Phan-Thien (1994) and Lagaris et al. (1998), recent developments by Yu et al. (2018) and Sirignano & Spiliopoulos (2018) have shown that neural networks can successfully approximate the solution by forcing them to satisfy the dynamics of the PDE on collocation points in the spatio-temporal domain.
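The dimension-independent ϵ ∝ 1/√n rate above is easy to verify numerically. The following is a minimal sketch in plain NumPy, with an arbitrary integrand chosen only for illustration:

```python
import numpy as np

# Monte Carlo estimate of the integral of f(x) = sum_i x_i^2 over the
# unit hypercube [0,1]^d; the exact value is d/3. The error shrinks
# like 1/sqrt(n) regardless of the dimension d, matching eps ∝ 1/√n.
def mc_error(d, n, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.random((n, d))         # n uniform samples in [0,1]^d
    estimate = np.mean(np.sum(x**2, axis=1))
    return abs(estimate - d / 3)

# Quadrupling the sample count roughly halves the error, even in
# 10 dimensions, where a grid-based rule would be prohibitively large.
for n in [1_000, 4_000, 16_000]:
    print(n, mc_error(d=10, n=n))
```

The same loop run with d = 3 or d = 100 shows comparable error magnitudes at equal n, which is exactly the tradeoff described above: accuracy is paid for in samples, not in dimension.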
In particular, global collocation approaches have proven effective for solving boundary value problems, where the neural network can successfully approximate the solution. However, for initial value problems (IVPs), treating time as merely another spatial dimension leads to complications for the neural network such as catastrophic forgetting. Some heuristics have been developed to ameliorate this problem, such as increasing the number of collocation points as time progresses, but then the computational cost of training the neural network becomes impractical. Recently, Du & Zaki (2021) and Bruna et al. (2022) have provided two methods that follow a novel local-in-time approach for training neural networks to solve IVPs by updating the network parameters sequentially through time rather than having a fixed set of parameters model the whole spatio-temporal domain. These methods have proven successful for a variety of PDEs, but they currently suffer from two shortcomings. First, the conditioning of the linear systems required to follow the ODE on the network parameters degrades over time, leading to longer solving times and ultimately to a complete breakdown of the solution. Second, the current methodologies lack the capacity to represent difficult initial conditions and solutions, as their runtime scales cubically in the number of network parameters, limiting their ability to use large neural networks. In this work we provide a local-in-time IVP solver (Neural-IVP) that circumvents the shortcomings of Du & Zaki (2021) and Bruna et al. (2022) and thus enables us to solve challenging PDEs. In particular:

• Leveraging fast matrix-vector multiplies and preconditioned conjugate gradients, we develop an approach that scales only linearly in the number of parameters, allowing us to use considerably larger neural networks and more data.
• We further improve the representational power and quality of the fit to initial conditions through the use of last-layer linear solves and sinusoidal embeddings.

• We show how following the parameter ODE leads the network parameters to an increasingly poorly conditioned region of the parameter space, and we show how this relates to exact and approximate parameter symmetries in the network.

• Using regularization, restarts, and last-layer finetuning, we are able to prevent the parameters from reaching these poorly conditioned regions, thereby stabilizing the method.

We provide a code implementation at https://github.com/mfinzi/neural-ivp.
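To illustrate the first point, a Gram-type system (JᵀJ + μI)x = b of the kind that arises when following the parameter ODE can be solved matrix-free with conjugate gradients, touching J only through matrix-vector products. The sketch below uses a random stand-in for the network Jacobian and SciPy's CG; in practice the products Jᵀ(Jv) would come from autodiff JVP/VJP calls, and Neural-IVP's actual preconditioner is not reproduced here:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

# Random stand-in for the network Jacobian (n_data x n_params);
# the n_params x n_params Gram matrix J^T J is never formed explicitly.
rng = np.random.default_rng(0)
n_data, n_params = 200, 500
J = rng.standard_normal((n_data, n_params)) / np.sqrt(n_data)
mu = 1e-3  # Tikhonov-style regularization keeps the system well posed

def gram_mv(v):
    # Two matvecs per CG iteration: cost and memory linear in n_params.
    return J.T @ (J @ v) + mu * v

A = LinearOperator((n_params, n_params), matvec=gram_mv)
b = J.T @ rng.standard_normal(n_data)
x, info = cg(A, b)   # info == 0 signals convergence
print(info, np.linalg.norm(gram_mv(x) - b))
```

Forming and factorizing JᵀJ directly would cost O(n_params³), which is the cubic bottleneck discussed above; the matrix-free route replaces it with a modest number of linear-cost products.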

2. BACKGROUND

Given a spatial domain X ⊆ R^D, we consider the evolution of a time-dependent function u : X × [0, T] → R^k which at all times belongs to some function space U and whose dynamics are governed by

∂_t u(x, t) = L[u](x, t)   for (x, t) ∈ X × [0, T],
u(x, 0) = u_0(x)           for x ∈ X,
u(x, t) = h(x, t)          for (x, t) ∈ ∂X × [0, T],

where u_0 ∈ U is the initial condition, h is the spatial boundary condition, and L is the (possibly nonlinear) operator containing the spatial derivatives. We can represent PDEs with higher-order derivatives in time, such as the wave equation ∂_t² ϕ = ∆ϕ, by reducing them to a system of first-order-in-time equations u := [ϕ, ∂_t ϕ], where in this example L[u_0, u_1] = [u_1, ∆u_0].
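As a concrete instance of this reduction, the sketch below evolves the 1-D wave equation as the first-order system u = [ϕ, ∂_t ϕ], using a classical finite-difference grid and explicit Euler stepping purely to illustrate the reduction (grid resolution and step size are arbitrary illustrative choices, not part of the neural solver):

```python
import numpy as np

# Wave equation d^2/dt^2 phi = Laplacian(phi) rewritten as the
# first-order system u = [u0, u1] = [phi, d/dt phi], with
# L[u0, u1] = [u1, Laplacian(u0)] as in the text.
nx, dt = 200, 1e-4
dx = 1.0 / nx
x = np.linspace(0, 1, nx)
u0 = np.exp(-100 * (x - 0.5) ** 2)   # initial phi: a Gaussian bump
u1 = np.zeros(nx)                    # initial velocity d/dt phi = 0

def laplacian(f):
    lap = np.zeros_like(f)
    lap[1:-1] = (f[2:] - 2 * f[1:-1] + f[:-2]) / dx**2
    return lap  # zero at the ends keeps phi fixed on the boundary

for _ in range(1000):                # evolve to t = 0.1
    # one explicit Euler step of du/dt = L[u]
    u0, u1 = u0 + dt * u1, u1 + dt * laplacian(u0)
```

By t = 0.1 the bump has split into two half-amplitude pulses traveling in opposite directions, the expected d'Alembert behavior; only the first-order structure of the update matters for the discussion above.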

Global Collocation Methods

The first approaches for solving PDEs via neural networks are based on the idea of sampling uniformly on the whole spatio-temporal domain and ensuring that the neural network obeys the PDE by minimizing the PDE residual (or a proxy for it). This approach was initially proposed by Dissanayake & Phan-Thien (1994) and Lagaris et al. (1998), who used neural networks as approximate solutions. More recently, advances in automatic differentiation, compute, and neural network architectures have enabled successful applications such as the Deep Galerkin Method (Sirignano & Spiliopoulos, 2018), the Deep Ritz Method (Yu et al., 2018), and PINNs (Raissi et al., 2019), which have revitalized interest in using neural networks to solve PDEs.

Learning From Simulations

Not all approaches use neural networks as a basis function to represent the PDE solution. Some approaches focus on directly learning the PDE operator, as in Lu et al. (2019) or Kovachki et al. (2021), where the operator can be learned from simulation. However, as these methods typically use grids, their purpose is to accelerate existing solvers rather than to tackle new problems. Other approaches that do not rely on collocation points exploit specific structure of elliptic and semi-linear parabolic PDEs, such as E et al. (2017) and Han et al. (2018).
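The collocation idea shared by these methods can be stated in a few lines: sample random space-time points and penalize the squared PDE residual there. The sketch below does this for the 1-D heat equation u_t = u_xx, with finite-difference derivatives standing in for autodiff to keep the example self-contained; the trial function is a known exact solution, so the residual loss should be near zero, whereas a PINN would parameterize u with a network and minimize this loss over its weights:

```python
import numpy as np

rng = np.random.default_rng(0)

def u(x, t):
    # Exact heat-equation solution used here as the trial function.
    return np.exp(-np.pi**2 * t) * np.sin(np.pi * x)

def pde_residual(u, x, t, h=1e-4):
    # Central finite differences approximate u_t and u_xx; a PINN
    # implementation would use automatic differentiation instead.
    u_t = (u(x, t + h) - u(x, t - h)) / (2 * h)
    u_xx = (u(x + h, t) - 2 * u(x, t) + u(x - h, t)) / h**2
    return u_t - u_xx            # residual of u_t = u_xx

# Random collocation points in the spatio-temporal domain [0,1] x [0,1].
x, t = rng.random(1000), rng.random(1000)
loss = np.mean(pde_residual(u, x, t) ** 2)
print(loss)  # near zero: this trial function satisfies the PDE
```

Minimizing this loss over network weights is the global strategy that, as discussed above, works well for boundary value problems but runs into catastrophic forgetting when time is treated as just another input coordinate.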

