NEURAL NETWORK APPROXIMATIONS OF PDES BEYOND LINEARITY: REPRESENTATIONAL PERSPECTIVE

Abstract

A burgeoning line of research has developed deep neural networks capable of approximating the solutions to high-dimensional PDEs, opening related lines of theoretical inquiry focused on explaining how these models appear to evade the curse of dimensionality. However, most theoretical analyses thus far have been limited to simple linear PDEs. In this work, we take a step towards studying the representational power of neural networks for approximating solutions to nonlinear PDEs. We focus on a class of PDEs known as nonlinear variational elliptic PDEs, whose solutions minimize an Euler-Lagrange energy functional $E(u) = \int_\Omega L(\nabla u)\,dx$. We show that if composing a function with Barron norm $b$ with $L$ produces a function of Barron norm at most $B_L b^p$, the solution to the PDE can be $\epsilon$-approximated in the $L^2$ sense by a function with Barron norm $O\big((dB_L)^{\max\{p \log(1/\epsilon),\, p^{\log(1/\epsilon)}\}}\big)$. By a classical result due to Barron (1993), this correspondingly bounds the size of a 2-layer neural network needed to approximate the solution. Treating $p$, $\epsilon$, and $B_L$ as constants, this quantity is polynomial in the dimension, thus showing that neural networks can evade the curse of dimensionality. Our proof technique involves "neurally simulating" (preconditioned) gradient descent in an appropriate Hilbert space, which converges exponentially fast to the solution of the PDE, and is such that we can bound the increase of the Barron norm at each iterate. Our results subsume and substantially generalize analogous prior results for linear elliptic PDEs.

1. INTRODUCTION

Scientific applications have become one of the new frontiers for the application of deep learning (Jumper et al., 2021; Tunyasuvunakool et al., 2021; Sønderby et al., 2020). PDEs are one of the fundamental modeling tools in scientific domains, and designing neural network-aided solvers, particularly in high dimensions, is of widespread interest in many domains (Hsieh et al., 2019; Brandstetter et al., 2022). One of the most common approaches to solving PDEs with neural networks is to parameterize the solution as a neural network and minimize a loss which characterizes the solution (Sirignano & Spiliopoulos, 2018; E & Yu, 2017). The hope in doing so is to obtain a method which computationally avoids the "curse of dimensionality", i.e., one that scales less than exponentially with the ambient dimension. To date, neither theoretical analysis nor empirical applications have yielded a precise characterization of the range of PDEs for which neural network-aided methods outperform classical methods. Active research on the empirical side (Han et al., 2018; E et al., 2017; Li et al., 2020a;b) has explored several families of PDEs, e.g., Hamilton-Jacobi-Bellman and Black-Scholes, where neural networks have been demonstrated to outperform classical grid-based methods. On the theory side, a recent line of works (Marwah et al., 2021; Chen et al., 2021; 2022) has considered the following fundamental question:

For what families of PDEs can the solution be represented by a small neural network?

The motivation for this question is computational: the computational complexity of fitting a neural network (by minimizing some objective) grows with its size. Specifically, these works focus on understanding when the approximating neural network can be sub-exponential in size, thus avoiding the curse of dimensionality. Unfortunately, the techniques introduced in this line of work have so far only been applicable to linear PDEs.

In this paper, we take the first step beyond linear PDEs, with a particular focus on nonlinear variational elliptic PDEs. These equations have the form $-\operatorname{div}(\nabla L(\nabla u)) = 0$ and are instances of nonlinear Euler-Lagrange equations. Equivalently, $u$ is the minimizer of the energy functional $E(u) = \int_\Omega L(\nabla u)\,dx$. This paradigm is very generic: its origins are in Lagrangian formulations of classical mechanics, and for different $L$, a variety of variational problems can be modeled or learned (Schmidt & Lipson, 2009; Cranmer et al., 2020). These PDEs have a variety of applications in scientific domains, e.g., (non-Newtonian) fluid dynamics (Koleva & Vulkov, 2018), meteorology (Weller et al., 2016), and nonlinear diffusion equations (Burgers, 2013).

Our main result is to show that when the function $L$ has "low complexity", so does the solution. The notion of complexity we work with is the Barron norm of the function, similar to Chen et al. (2021); Lee et al. (2017). This is a frequently used notion of complexity, as a function with a small Barron norm can be represented by a small two-layer neural network, due to a classical result (Barron, 1993).
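As a concrete illustration (an added example, not part of the original text): for a quadratic integrand the Euler-Lagrange equation reduces to the linear Laplace equation, which is the regime prior work covers, while a non-quadratic $L$ gives a genuinely nonlinear PDE.

```latex
% Quadratic case: L(v) = \tfrac{1}{2}\lVert v\rVert^2, so \nabla L(\nabla u) = \nabla u and
-\operatorname{div}\big(\nabla L(\nabla u)\big) = -\Delta u = 0 .
% Minimal-surface-type case: L(v) = \sqrt{1 + \lVert v\rVert^2} (uniformly convex
% only on bounded ranges of the gradient), giving the nonlinear equation
-\operatorname{div}\!\left(\frac{\nabla u}{\sqrt{1 + \lVert\nabla u\rVert^2}}\right) = 0 .
```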
Mathematically, our proof techniques are based on "neurally unfolding" iterative preconditioned gradient descent in an appropriate function space: namely, we show that each iterate can be represented by a neural network with Barron norm not much worse than that of the previous iterate, along with showing a bound on the number of required steps. Importantly, our results go beyond the typical non-parametric bounds on the size of an approximator network, which can be easily derived by combining classical regularity results for solutions of nonlinear variational PDEs (De Giorgi, 1957; Nash, 1957; 1958) with universal approximation results (Yarotsky, 2017).
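To build intuition for the unfolded-gradient-descent viewpoint, the following is a minimal numerical sketch, assuming a 1-D domain, the minimal-surface-type integrand $L(v) = \sqrt{1+v^2}$, nonzero boundary data (so the minimizer is nontrivial), and plain unpreconditioned gradient descent on a finite-difference discretization of $E$. None of these choices come from the paper, which works with preconditioned descent in function space; this only illustrates descending on an energy of the form $\int_\Omega L(\nabla u)\,dx$.

```python
import numpy as np

# Toy sketch (assumed setup, not the paper's construction): minimize a
# finite-difference discretization of E(u) = int_0^1 L(u'(x)) dx by
# plain gradient descent. L(v) = sqrt(1 + v^2) is smooth and convex.
def L_prime(v):
    return v / np.sqrt(1.0 + v**2)

n = 20                       # number of grid cells on [0, 1]
h = 1.0 / n
u = np.zeros(n + 1)
u[-1] = 1.0                  # boundary data u(0) = 0, u(1) = 1 (assumed here;
                             # the paper uses u = 0 on the whole boundary)

for _ in range(8000):
    g = np.diff(u) / h                      # g_i = (u[i+1] - u[i]) / h
    # For E_h(u) = h * sum_i L(g_i):  dE_h/du_j = L'(g_{j-1}) - L'(g_j)
    grad = L_prime(g[:-1]) - L_prime(g[1:])
    u[1:-1] -= 0.02 * grad                  # update interior nodes only

# With this integrand and boundary data the continuum minimizer is the
# straight line u(x) = x, so the iterates approach the linear interpolant.
x = np.linspace(0.0, 1.0, n + 1)
print(np.max(np.abs(u - x)))
```

The step size 0.02 is chosen below the stability threshold of the discrete Hessian (whose largest eigenvalue is at most $4 \sup L'' / h$); the paper's preconditioning serves precisely to make the analogous convergence rate dimension-friendly.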

2. OVERVIEW OF RESULTS

Let $\Omega \subset \mathbb{R}^d$ be a bounded open set with $0 \in \Omega$, and let $\partial\Omega$ denote the boundary of $\Omega$. Furthermore, we assume that the domain $\Omega$ is such that the Poincaré constant $C_p$ is greater than 1 (see Theorem 2 for the exact definition of the Poincaré constant). We first define the energy functional whose minimizers are characterized by a nonlinear variational elliptic PDE, i.e., the Euler-Lagrange equation of the energy functional.

Definition 1 (Energy functional). For all $u : \Omega \to \mathbb{R}$ such that $u|_{\partial\Omega} = 0$, we consider an energy functional of the form $E(u) = \int_\Omega L(\nabla u)\,dx$, where $L : \mathbb{R}^d \to \mathbb{R}$ is a smooth and uniformly convex function, i.e., there exist constants $0 < \lambda \le \Lambda$ such that for all $x \in \mathbb{R}^d$ we have $\lambda I_d \preceq D^2 L(x) \preceq \Lambda I_d$. Further, without loss of generality (see footnote), we assume that $\lambda \le 1/C_p$.

Note that due to the uniform convexity of $L$, the minimizer $u^\star$ exists and is unique. The proof of existence and uniqueness is standard (e.g., Theorem 3.3 in Fernández-Real & Ros-Oton (2020)). Writing down the condition for stationarity, we can derive a (nonlinear) elliptic PDE for the minimizer of the energy functional in Definition 1.

Lemma 1. Let $u^\star : \Omega \to \mathbb{R}$ be the unique minimizer of the energy functional in Definition 1. Then for all $\varphi \in H^1_0(\Omega)$ the minimizer $u^\star$ satisfies
$$DE[u^\star](\varphi) = \int_\Omega \nabla L(\nabla u^\star) \cdot \nabla\varphi \,dx = 0,$$
where $DE[u](\varphi)$ denotes the directional derivative of the energy functional at $u$ in the direction $\varphi$. Thus, the minimizer $u^\star$ of the energy functional satisfies the PDE
$$-\operatorname{div}\big(\nabla L(\nabla u)\big) = 0 \quad \forall x \in \Omega, \qquad (3)$$
and $u(x) = 0$ for all $x \in \partial\Omega$. Here $\operatorname{div}$ denotes the divergence operator.
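The computation behind Lemma 1 is the standard first-variation argument; written out, consistent with the definitions above:

```latex
\frac{d}{dt}\bigg|_{t=0} E(u^\star + t\varphi)
  = \int_\Omega \nabla L(\nabla u^\star) \cdot \nabla\varphi \,dx
  = -\int_\Omega \operatorname{div}\!\big(\nabla L(\nabla u^\star)\big)\,\varphi \,dx
  = 0
  \qquad \text{for all } \varphi \in H^1_0(\Omega),
```

where the middle equality is integration by parts (the boundary term vanishes because $\varphi = 0$ on $\partial\Omega$). Since $\varphi$ is arbitrary, $-\operatorname{div}(\nabla L(\nabla u^\star)) = 0$ in $\Omega$.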



Footnote: Since $\lambda$ is only a lower bound on the strong convexity constant, by choosing a weaker lower bound we can always ensure $\lambda \le 1/C_p$.

