BREAKING THE CURSE OF DIMENSIONALITY FOR PARAMETRIC ELLIPTIC PDES

Abstract

Motivated by recent empirical success, we examine how neural network-based ansatz classes can break the curse of dimensionality for high-dimensional, nonlinear elliptic partial differential equations (PDEs) with variational structure. The high dimensionality of the PDEs can be induced either through a high-dimensional physical domain or through a high-dimensional parameter space. The latter includes parametric right-hand sides, parametric domains, and material constants. Our main result shows that any scheme that computes neural network-based $W^{1,p}$-approximations leverages the extraordinary approximation capabilities of neural networks and is thus able to beat the curse of dimensionality if the ground truth solution is smooth or possesses Barron regularity. Popular examples of $W^{1,p}$-convergent schemes include the Deep Ritz Method and physics-informed neural networks. We present numerical experiments supporting our theoretical findings.

1. INTRODUCTION

High-dimensional partial differential equations (PDEs) arise naturally in applications with a high-dimensional domain, a high-dimensional parameter space, or both. The former includes the Schrödinger equation in quantum physics, the Black-Scholes equation in finance, and the Hamilton-Jacobi-Bellman equation in control theory; we refer to Weinan et al. (2021); Bellman (1954). On the other hand, problems with a high-dimensional parameter space are ubiquitous in engineering applications, for instance, in the form of varying material properties, varying right-hand sides, or even varying computational domains, as discussed in Hennigh et al. (2021); Ohlberger & Rave (2016). For problems with a high-dimensional physical domain, classical mesh-based approximation schemes face the curse of dimensionality, meaning that the computational cost increases exponentially with the dimension of the problem. In the case of parametric problems, one is typically interested in querying the PDE solution for many different parameter instances, possibly with low inference time. To this end, classical methods need to repeatedly solve the equations for every required parameter instance, a potentially prohibitively expensive or slow computational task, see Biegler et al. (2007). Even assuming additional, favorable structure of the solution of a high-dimensional PDE (be it a latent low-dimensionality of the solution or a high degree of smoothness), it remains a challenge for classical methods to approximate the solution with acceptable accuracy, especially in situations of non-linear solution manifolds, as discussed in Ohlberger & Rave (2016); Lee & Carlberg (2020).
Artificial neural networks have shown great potential in the approximation of high-dimensional functions, for instance in computer vision, classification, and natural language processing tasks, and are known to possess extraordinary approximation capabilities, with the possibility to achieve dimension-independent approximation rates for certain function classes, see Ma et al. (2022); Barron (1993); Yarotsky (2017); Gühring & Raslan (2021); Gühring et al. (2020). Therefore, investigating artificial neural networks as ansatz classes for the solution of PDEs or PDE solution operators has recently gained increased interest for high-dimensional and parametric problems. We refer to Kutyniok et al. (2022); Weinan & Wojtowytsch (2022); Jentzen et al. (2021); Chen et al. (2021) for theoretical studies. Successful empirical results of neural network-based applications to PDEs posed in high-dimensional spaces include Hermann et al. (2020); Yu & E (2018); Han et al. (2018); Sirignano & Spiliopoulos (2018). For the parametric setting, we direct the reader to Li et al. (2021); Khoo et al. (2021); Lee & Carlberg (2020); Geist et al. (2021). Of the aforementioned contributions, the theoretical works either focus on approximation theoretic results or consider linear problems without parametric dependencies. We discuss the relation to our contribution in detail in Section 1.1. The approximation theoretic results guarantee the existence of a neural network with desirable approximation rates but provide no practical way to compute the neural network. For non-linear, parametric PDEs with $p$-structure, our results deliver the necessary PDE analysis, in a setting suitable for neural network ansatz functions, to alleviate this problem. Instead of explicitly constructing a neural network, we show that it suffices to find a neural network approximation that is close "in energy" to the ground truth solution.
This makes a significant difference: energy approximations can be found by using the variational energy as a loss function or, more generally, by any $W^{1,p}$-convergent approximation scheme, which is a natural property of a reasonable approximation algorithm. To summarize, our main contributions are the following:

• We show that every energy convergent approximation scheme for the $p$-Dirichlet energy utilizing neural network ansatz functions can leverage the extraordinary approximation capabilities of neural networks. Further, we explain which assumptions on the ground truth solution allow neural networks to beat the curse of dimensionality. We extend these results to parametric problems, where the neural network approximates simultaneously in the physical and the parameter space. Our contributions are the first quantitative error estimates for parametric and non-linear elliptic PDEs for neural network approximation schemes.

• From a mathematical point of view, the analysis of non-linear ansatz classes is novel in the case of the $p$-Laplacian. Existing literature exclusively exploits strategies based on optimality conditions (Galerkin orthogonality) that are only available for linear ansatz classes and, hence, exclude neural networks. Further, to the best of our knowledge, error estimates for parametric problems have not been considered in the existing literature.

1.1. MAIN RESULT AND RELATED WORK

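For clarity, we present our main result for the case of a parametric right-hand side with homogeneous Neumann boundary conditions. However, different boundary conditions and parametric dependencies are covered by our analysis; we explain this in the Appendix and refer to Section C. Consider a physical domain $\Omega \subseteq \mathbb{R}^{d_\Omega}$, $d_\Omega \in \mathbb{N}$, and a parameter space $\mathcal{P} \subseteq \mathbb{R}^{d_{\mathcal{P}}}$, $d_{\mathcal{P}} \in \mathbb{N}$. Further, let $p \in (1, \infty)$ be fixed and denote by $(\boldsymbol{f}(\tau, \cdot))_{\tau \in \mathcal{P}}$ a parametric family of right-hand sides. We study the non-linear $p$-Laplace problem as a prototypical non-linear elliptic PDE. More precisely, we want to find $\boldsymbol{u}^* : \mathcal{P} \times \Omega \to \mathbb{R}$ satisfying

$$-\operatorname{div}\big(|\nabla_x \boldsymbol{u}^*(\tau, x)|^{p-2} \nabla_x \boldsymbol{u}^*(\tau, x)\big) = \boldsymbol{f}(\tau, x) \quad \text{for a.e. } (\tau, x)^\top \in \mathcal{P} \times \Omega, \tag{1}$$

subject, for simplicity, to homogeneous Neumann boundary conditions. The case $p = 2$ retrieves the classical Poisson equation. In this example, the parametric dependencies are induced through the right-hand side and both $\Omega$ and $\mathcal{P}$ may be high-dimensional. Then, we seek a neural network $\boldsymbol{u}_\theta$ with input $(\tau, x)^\top \in \mathbb{R}^{d_{\mathcal{P}}} \times \mathbb{R}^{d_\Omega}$ that approximates the solution $\boldsymbol{u}^* : \mathcal{P} \times \Omega \to \mathbb{R}$ simultaneously in the physical domain $\Omega$ and the parameter space $\mathcal{P}$. Essential for the statement of our result is the reformulation of equation 1 as a minimization problem. We find $\boldsymbol{u}^* \in L^p(\mathcal{P}, W^{1,p}(\Omega))$ as a minimizer of $\boldsymbol{E} : L^p(\mathcal{P}, W^{1,p}(\Omega)) \to \mathbb{R}$, for every $\boldsymbol{v} \in L^p(\mathcal{P}, W^{1,p}(\Omega))$ defined by

$$\boldsymbol{E}(\boldsymbol{v}) := \int_{\mathcal{P}} \Big[ \frac{1}{p} \int_\Omega |\nabla_x \boldsymbol{v}(\tau, x)|^p \, dx - \int_\Omega \boldsymbol{f}(\tau, x)\, \boldsymbol{v}(\tau, x) \, dx \Big] \, d\tau. \tag{2}$$

Then, our main result is the following.

Theorem 1. Let $\Omega \subseteq \mathbb{R}^{d_\Omega}$, $d_\Omega \in \mathbb{N}$, [...]

$$-\operatorname{div}\big(|\nabla_x \boldsymbol{u}^*|^{p-2} \nabla_x \boldsymbol{u}^*\big) = \boldsymbol{f} \quad \text{in } \mathcal{P} \times \Omega, \qquad \partial_n \boldsymbol{u}^* = 0 \quad \text{on } \mathcal{P} \times \partial\Omega. \tag{3}$$

Let $\boldsymbol{M} \subset W^{1,p}(\mathcal{P} \times \Omega)$ be any subset that contains the zero function and let $\boldsymbol{v} \in \boldsymbol{M}$ be arbitrary.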
Setting $\widetilde{\boldsymbol{M}} := \{ \boldsymbol{u} \in \boldsymbol{M} \mid \|\nabla_x \boldsymbol{u}\|_{L^p(\mathcal{P} \times \Omega)^d} \le 2 \|\nabla_x \boldsymbol{u}^*\|_{L^p(\mathcal{P} \times \Omega)^d} \}$, it holds

$$\|\nabla_x \boldsymbol{v} - \nabla_x \boldsymbol{u}^*\|_{L^p(\mathcal{P} \times \Omega)^d} \lesssim \begin{cases} \delta(\boldsymbol{v})^{1/p} + \inf_{\tilde{\boldsymbol{v}} \in \widetilde{\boldsymbol{M}}} \|\nabla_x \tilde{\boldsymbol{v}} - \nabla_x \boldsymbol{u}^*\|^{2/p}_{L^p(\mathcal{P} \times \Omega)^d} & \text{if } p \in [2, \infty), \\ \delta(\boldsymbol{v})^{1/2} + \inf_{\tilde{\boldsymbol{v}} \in \widetilde{\boldsymbol{M}}} \|\nabla_x \tilde{\boldsymbol{v}} - \nabla_x \boldsymbol{u}^*\|^{p/2}_{L^p(\mathcal{P} \times \Omega)^d} & \text{if } p \in (1, 2), \end{cases}$$

where $\delta(\boldsymbol{v}) := \boldsymbol{E}(\boldsymbol{v}) - \inf_{\tilde{\boldsymbol{v}} \in \widetilde{\boldsymbol{M}}} \boldsymbol{E}(\tilde{\boldsymbol{v}})$ is the optimization error and the implicit constants depend on $p$, $\Omega$ and $\|\boldsymbol{f}\|_{L^{p'}(\mathcal{P} \times \Omega)}$ only.

We stress again that the choice of Neumann boundary conditions is for simplicity of presentation. A similar result holds for Dirichlet boundary conditions employing an appropriate penalty scheme. The fact that $\boldsymbol{f}(\tau, \cdot)$ is mean-value-free for a.e. $\tau \in \mathcal{P}$ serves to guarantee the well-posedness in the Neumann boundary value case. The passage from $\boldsymbol{M}$ to $\widetilde{\boldsymbol{M}}$ is of purely technical nature, rooted in Lemma 7, and is not relevant for the interpretation of the result.

Leveraging the Power of Approximation-Theoretical Results. As we discuss below, suitable approximation theorems allow us to estimate the infimum in Theorem 1 and to deduce error decay rates. Note that there is no further requirement on the approximating function $\boldsymbol{v} \in \boldsymbol{M}$ than that it is a "good" quasi-optimizer of $\boldsymbol{E} : \boldsymbol{M} \to \mathbb{R}$, i.e., that $\delta(\boldsymbol{v})$ is sufficiently small. Thus, any algorithm that produces approximate solutions that converge in energy is able to fully leverage the approximation capabilities of neural network ansatz classes, up to the exponent $2/p$ or $p/2$, which is due to the non-linearity. Furthermore, energy convergence is equivalent to convergence in the $W^{1,p}$-semi-norm. We stress the drastic difference between our contribution and mere approximation theoretical results, which only guarantee the existence of a well-approximating network, yet do not reveal how such an approximation should be found. In this sense, our contribution is orthogonal to approximation theoretical results, as it can be combined with these to extend them.
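Finally, note that analyzing the effect of a solution scheme on the achievable value of $\delta(\boldsymbol{v})$ is a difficult problem, typically connected to a non-convex optimization task, which we do not study in this article.

Using Smoothness to Beat the Curse of Dimensionality. We can utilize quantitative universal approximation results to estimate the infimum in Theorem 1. In some situations, this allows us to beat the curse of dimensionality. Assume that the solution $\boldsymbol{u}^*$ to equation 3 is a member of $W^{k,p}(\mathcal{P} \times \Omega)$ for some $k \in \mathbb{N}$, $k > 1$. Then, for every $n \in \mathbb{N}$, we may use Theorem 4.9 in Gühring & Raslan (2021) to guarantee the existence of a fully connected neural network architecture with $\mathrm{ReLU}^2$-activation and with parameter space $\Theta_n$ of dimension $O(n)$ such that, setting $\boldsymbol{M} = \mathcal{F}_{\Theta_n}$, where $\mathcal{F}_{\Theta_n}$ denotes the realization set of the ansatz class, it holds

$$\inf_{\psi \in \Theta_n} \|\nabla_x \boldsymbol{u}_\psi - \nabla_x \boldsymbol{u}^*\|_{L^p(\mathcal{P} \times \Omega)^d} \lesssim \|\boldsymbol{u}^*\|^{\frac{2}{p}}_{W^{k,p}(\mathcal{P} \times \Omega)} \Big( \frac{1}{n} \Big)^{\frac{2}{p} \cdot \frac{k-1}{d_\Omega + d_{\mathcal{P}}}}$$

for the case $p \ge 2$ and with $\frac{2}{p}$ replaced by $\frac{p}{2}$ in the case $p < 2$. Hence, for arbitrary $\boldsymbol{u}_\theta \in \boldsymbol{M}$, $\theta \in \Theta_n$, we get

$$\|\nabla_x \boldsymbol{u}_\theta - \nabla_x \boldsymbol{u}^*\|_{L^p(\mathcal{P} \times \Omega)^d} \lesssim \begin{cases} \delta_n(\boldsymbol{u}_\theta)^{1/p} + \|\boldsymbol{u}^*\|_{W^{k,p}(\mathcal{P} \times \Omega)} \big( \frac{1}{n} \big)^{\frac{2}{p} \cdot \frac{k-1}{d_\Omega + d_{\mathcal{P}}}} & \text{if } p \in [2, \infty), \\ \delta_n(\boldsymbol{u}_\theta)^{1/2} + \|\boldsymbol{u}^*\|_{W^{k,p}(\mathcal{P} \times \Omega)} \big( \frac{1}{n} \big)^{\frac{p}{2} \cdot \frac{k-1}{d_\Omega + d_{\mathcal{P}}}} & \text{if } p \in (1, 2), \end{cases}$$

where $\delta_n(\boldsymbol{u}_\theta) := \boldsymbol{E}(\boldsymbol{u}_\theta) - \inf_{\psi \in \Theta_n} \boldsymbol{E}(\boldsymbol{u}_\psi)$. This shows, given sufficient smoothness of $\boldsymbol{u}^*$, that the error of the neural network approximation does not decay exponentially slowly in the dimension $n \in \mathbb{N}$ of the parameter space $\Theta_n$. More precisely, the result requires dimension-dependent smoothness, with the smoothness parameter $k \in \mathbb{N}$, $k > 1$, scaling like $k \sim d_\Omega + d_{\mathcal{P}}$. However, the assumption of smoothness is very natural in the context of (linear) elliptic PDEs and holds also in the parametric case, see Lemma B, which gives an easily verifiable criterion for when the smoothness assumption holds.

Employing Barron Regularity to Beat the Curse of Dimensionality.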
A different situation where the curse of dimensionality can be circumvented is when $\boldsymbol{u}^*$ is a member of the Barron space $\mathcal{B}$, or can be "well-approximated" by Barron functions. We refer to Barron (1993); Ma et al. (2022) for a definition of the Barron space. In essence, members of $\mathcal{B}$ can be approximated with respect to the $H^1$-norm by shallow neural networks with a dimension-independent rate of $n^{-1/2}$, where $n \in \mathbb{N}$ is the width of the shallow network. Hence, setting $p = 2$ and assuming $\boldsymbol{u}^* \in \mathcal{B}$, we can estimate for an arbitrary shallow neural network $\boldsymbol{u}_\theta \in \boldsymbol{M}$

$$\|\nabla_x \boldsymbol{u}_\theta - \nabla_x \boldsymbol{u}^*\|_{L^2(\mathcal{P} \times \Omega)^d} \lesssim \delta_n^{\frac{1}{2}} + \Big( \frac{1}{n} \Big)^{\frac{1}{2}} \|\boldsymbol{u}^*\|_{\mathcal{B}}.$$

The assumption $\boldsymbol{u}^* \in \mathcal{B}$ is too restrictive in general, cf. the discussion in Weinan & Wojtowytsch (2022). However, assuming that the data $\boldsymbol{f} \in L^2(\mathcal{P} \times \Omega)$ (and possibly coefficients) are of Barron regularity, it was recently established that the solution $\boldsymbol{u}^*$ can be approximated by Barron functions with Barron norm growing only polynomially in the dimension, yielding the rate $n^{-1/2}$ for shallow networks of width $(dn)^{C \log(n)}$, where $C$ is a constant; we refer to Chen et al. (2021). Note that the result in Chen et al. (2021) so far holds only for linear elliptic PDEs and special activation functions, and does not include parametric dependencies. Our result then shows that this error decay rate is preserved for any energy-convergent approximation scheme and is, in fact, not a mere approximation result.
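The dimension dependence of the two rates discussed above can be made concrete with a short Python sketch. It only evaluates the exponents that appear in the text, namely $\frac{2}{p} \cdot \frac{k-1}{d_\Omega + d_{\mathcal{P}}}$ for $p \ge 2$ (with $\frac{p}{2}$ in place of $\frac{2}{p}$ for $p < 2$) and the Barron exponent $\frac{1}{2}$. The function and variable names are hypothetical, implicit constants and the optimization error are ignored, and the chosen values of $k$ and of the dimensions are purely illustrative.

```python
def sobolev_rate_exponent(p, k, d_omega, d_param):
    """Exponent r of the approximation term n**(-r) for a W^{k,p}-regular solution.

    Hypothetical helper: it evaluates (2/p) * (k-1)/(d_omega + d_param) for p >= 2
    and (p/2) * (k-1)/(d_omega + d_param) for 1 < p < 2, as stated in the text.
    """
    if p <= 1 or k <= 1:
        raise ValueError("requires p > 1 and k > 1")
    nonlinearity_factor = 2.0 / p if p >= 2 else p / 2.0
    return nonlinearity_factor * (k - 1) / (d_omega + d_param)


# Barron case (p = 2): the error decays like n**(-1/2) in every dimension.
BARRON_RATE_EXPONENT = 0.5


if __name__ == "__main__":
    p = 2.0
    for d_omega, d_param in ((1, 1), (5, 5), (50, 50)):
        total_dim = d_omega + d_param
        # Fixed smoothness: the exponent deteriorates as the dimension grows.
        fixed_k = sobolev_rate_exponent(p, 3, d_omega, d_param)
        # Smoothness scaling with the dimension (k - 1 = d_omega + d_param).
        scaled_k = sobolev_rate_exponent(p, total_dim + 1, d_omega, d_param)
        print(f"dim={total_dim:3d}  k=3: {fixed_k:.3f}  "
              f"k=dim+1: {scaled_k:.3f}  Barron: {BARRON_RATE_EXPONENT}")
```

With fixed $k$ the exponent shrinks like one over the total dimension, whereas letting $k - 1$ grow proportionally to $d_\Omega + d_{\mathcal{P}}$ keeps it bounded away from zero, which is the sense in which smoothness is "dimension-dependent" in the discussion above.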

Related Work

For PDEs that admit a stochastic representation, several situations are known in which the curse of dimensionality can be circumvented, see Jentzen et al. (2021); Han et al. (2018); Weinan et al. (2021). These results are of approximation theoretic nature and do not provide a way to construct the approximating network. Further, this approach crucially relies on the stochastic representation of the PDE's solution and, thus, is not generally applicable. The works Xu ( 2020 2022) are similar to our contribution since they consider elliptic equations and provide a Céa type Lemma and, consequently, are not only approximation theoretic results. However, they analyze neither non-linear nor parametric equations. The contributions in Chen et al. (2021); Weinan & Wojtowytsch (2022) mark the beginning of a regularity theory for elliptic equations with respect to Barron spaces. These results are complementary to our analysis in the sense that they can be combined with our contribution. For instance, the main result of Chen et al. (2021) states that a solution to a linear elliptic PDE with Barron data is "almost" of Barron regularity and can be approximated with a polynomial rate with respect to the dimension. Our analysis then guarantees that every neural network approximation that is close in energy to the ground truth solution realizes this rate.

Notation

For a Banach space $X$ with norm $\|\cdot\|_X : X \to \mathbb{R}_{\ge 0}$, we denote by $X^*$ its (topological) dual space equipped with the norm $\|\cdot\|_{X^*} : X^* \to \mathbb{R}_{\ge 0}$, defined by $\|x^*\|_{X^*} := \sup_{\|x\|_X \le 1} \langle x^*, x \rangle_X$ for all $x^* \in X^*$. Here, $\langle \cdot, \cdot \rangle_X : X^* \times X \to \mathbb{R}$ denotes the duality pairing, defined by $\langle x^*, x \rangle_X := x^*(x)$ for all $x^* \in X^*$ and $x \in X$.

For $p \in [1, \infty]$, we denote by $L^p(\Omega)$ the space of (Lebesgue-)measurable functions $u : \Omega \to \mathbb{R}$ that are integrable in $p$-th power, i.e., $\int_\Omega |u|^p \, dx < \infty$ if $p \in [1, \infty)$ and $\operatorname{ess\,sup}_{x \in \Omega} |u(x)| < \infty$ if $p = \infty$. Endowed with the norm $\|u\|_{L^p(\Omega)} := (\int_\Omega |u|^p \, dx)^{\frac{1}{p}}$ if $p \in [1, \infty)$ and $\|u\|_{L^\infty(\Omega)} := \operatorname{ess\,sup}_{x \in \Omega} |u(x)|$ if $p = \infty$, the space $L^p(\Omega)$ forms a Banach space, which is separable if $p \in [1, \infty)$ and reflexive if $p \in (1, \infty)$, cf. (Adams & Fournier, 2003, Chapter 2).

For $k \in \mathbb{N}$ and $p \in [1, \infty]$, we denote by $W^{k,p}(\Omega)$ the space of functions in $L^p(\Omega)$ with distributional derivatives up to $k$-th order in $L^p(\Omega)$. Endowed with the norm $\|u\|_{W^{k,p}(\Omega)} := \sum_{l=0}^{k} \|D^l u\|_{L^p(\Omega)}$, the space $W^{k,p}(\Omega)$ forms a Banach space, which is separable if $p \in [1, \infty)$ and reflexive if $p \in (1, \infty)$, cf. (Adams & Fournier, 2003, Chapter 3). For $k \in \mathbb{N}$ and $p \in [1, \infty]$, we denote by $W^{k,p}_0(\Omega)$ the closure of all compactly supported, smooth functions $C^\infty_c(\Omega)$ in $W^{k,p}(\Omega)$.

We always denote parameter-dependent functions by boldface letters, e.g., $\boldsymbol{u}, \boldsymbol{v}, \boldsymbol{w}, \ldots$, and parameter-independent functions by non-boldface letters, e.g., $u, v, w, \ldots$. In the same spirit, we denote by $\boldsymbol{E} : L^p(\mathcal{P}; W^{1,p}(\Omega)) \to \mathbb{R}$ the parametric $p$-Dirichlet energy (cf. equation 2) and by $E : W^{1,p}(\Omega) \to \mathbb{R}$ the non-parametric $p$-Dirichlet energy (cf. equation 6).

2. PROOF OF THE MAIN RESULT

In this section, we provide the proof of Theorem 1. For clarity, we first consider the non-parametric case and extend the results afterwards to include parametric dependencies.

2.1. PROOF OF THE NON-PARAMETRIC SETTING

The main step in the proof of the non-parametric version of Theorem 1 is to show that convergence in energy, i.e., $E(u_n) \to E(u^*)$ $(n \to \infty)$ (cf. equation 6) for a neural network approximation $u_n \in M$ of the ground truth solution $u^*$, is equivalent to the convergence $u_n \to u^*$ $(n \to \infty)$ in the Sobolev topology. To establish this in a quantitative fashion, we need an optimal measure of the convexity of the $p$-Dirichlet energy (cf. equation 6). This is given through the bi-variate, symmetric mapping $\rho_F^2 : W^{1,p}(\Omega) \times W^{1,p}(\Omega) \to \mathbb{R}$, defined by

$$\rho_F^2(v, w) := \|F(\nabla v) - F(\nabla w)\|^2_{L^2(\Omega)^d} \quad \text{for all } v, w \in W^{1,p}(\Omega),$$

where $F : \mathbb{R}^d \to \mathbb{R}^d$ is defined by

$$F(a) := |a|^{\frac{p-2}{2}} a \quad \text{for all } a \in \mathbb{R}^d.$$

The map $\rho_F^2$ is the optimal distance measure for the $p$-Dirichlet problem. This is embodied in the two-sided estimate proved in the next proposition, see equation 7, that relates convergence in energy to convergence in terms of $\rho_F^2$. In the literature, $\rho_F^2$ is referred to as the natural distance. Next, we establish a Céa type Lemma in terms of $\rho_F^2$, see Lemma 5. This decomposes the distance of $u_n$ to $u^*$ into an optimization error and an approximation theoretic contribution. Finally, we study the relation of $\rho_F^2$ to the standard Sobolev topology in Lemma 7.

From a technical perspective, the central estimate in equation 7 is proved via a Taylor expansion of the $p$-Dirichlet energy around its minimizer $u^* \in W^{1,p}(\Omega)$. However, care needs to be taken since $E : W^{1,p}(\Omega) \to \mathbb{R}$ is not twice continuously differentiable, and a subtle regularization procedure needs to be employed to rigorously carry out the expansion.

Proposition 2.
Let $\Omega \subseteq \mathbb{R}^d$, $d \in \mathbb{N}$, be a bounded domain, $f \in W^{1,p}(\Omega)^*$, $p \in (1, \infty)$, and let $U \subseteq W^{1,p}(\Omega)$ be a closed subspace such that Poincaré's inequality is valid, i.e., there exists a constant $C_P > 0$ such that for every $v \in U$, it holds

$$\|v\|_{L^p(\Omega)} \le C_P \|\nabla v\|_{L^p(\Omega)^d}.$$

Moreover, define $E : U \to \mathbb{R}$ for every $v \in U$ by

$$E(v) := \frac{1}{p} \int_\Omega |\nabla v|^p \, dx - \langle f, v \rangle_{W^{1,p}(\Omega)}. \tag{6}$$

Then, the following statements apply:

(i) There exists a unique minimizer $u^* \in U$ for $E : U \to \mathbb{R}$.

(ii) There exists a constant $c(p) > 0$, depending only on $p \in (1, \infty)$ and not depending on $d \in \mathbb{N}$, such that for every $v \in U$, it holds

$$c(p)^{-1} \rho_F^2(v, u^*) \le E(v) - E(u^*) \le c(p)\, \rho_F^2(v, u^*). \tag{7}$$

Moreover, we can choose $c(p) > 0$ such that $(p \mapsto c(p)) \in C^0(1, \infty)$.

Proof. The proof is provided in the Appendix, see Section A.1.

Remark 3 (The Case $p = 2$). In the case $p = 2$, we retrieve the well-known Dirichlet energy. Further, equality holds in equation 7 with constant $\frac{1}{2}$. More precisely, for every $v \in U$, we have that

$$E(v) - E(u^*) = \frac{1}{2} \|\nabla v - \nabla u^*\|^2_{L^2(\Omega)^d} = \frac{1}{2} \rho_F^2(v, u^*),$$

since $F(a) = a$ for $p = 2$. This can be shown by a straightforward Taylor expansion of $E : U \to \mathbb{R}$ around $u^* \in U$, cf. Müller & Zeinhofer (2022).

Remark 4 (The Role of the Space $U$). The space $U$ encodes boundary conditions; for example, $U = W^{1,p}_0(\Omega)$ is an admissible choice. However, when choosing $U = W^{1,p}(\Omega)$ and requiring that the right-hand side $f \in W^{1,p}(\Omega)^*$ vanishes on constant functions, Proposition 2 stays valid with the exception of the uniqueness of the minimizer $u^* \in U$. In this case, $u^* \in U$ is only determined up to additive constants. This can be seen by considering the energy $E$ on the quotient space $W^{1,p}(\Omega)$ modulo the constant functions. On this space, a Poincaré inequality is available.

An immediate consequence of Proposition 2 is the following Céa type lemma.

Lemma 5 (Céa Lemma). Let the assumptions of Proposition 2 be satisfied. Moreover, let $M \subseteq U$ be an arbitrary subset.
Then, there exists a constant $c(p) > 0$, depending only on $p \in (1, \infty)$ and not depending on $d \in \mathbb{N}$, such that for every $v \in M$, it holds

$$\rho_F^2(v, u^*) \le c(p) \Big( \delta(v) + \inf_{\tilde{v} \in M} \rho_F^2(\tilde{v}, u^*) \Big),$$

where $\delta(v) := E(v) - \inf_{\tilde{v} \in M} E(\tilde{v})$ is the optimization error. Moreover, we can choose $c(p) > 0$ such that $(p \mapsto c(p)) \in C^0(1, \infty)$.

Proof of Lemma 5. Let $v \in M$ be arbitrary. Then, by referring to Proposition 2, we find that

$$c(p)^{-1} \rho_F^2(v, u^*) \le E(v) - \inf_{\tilde{v} \in M} E(\tilde{v}) + \inf_{\tilde{v} \in M} E(\tilde{v}) - E(u^*) \le \delta(v) + c(p) \inf_{\tilde{v} \in M} \rho_F^2(\tilde{v}, u^*). \qquad \square$$

Remark 6. Note that we do not need to impose any structure on the set $M$; in particular, it does not need to possess a linear structure. This, in contrast to classical formulations of Céa's Lemma, allows us to choose $M$ as an ansatz class consisting of neural networks.

In order to arrive at error decay rates in the Sobolev topology, we need the relation of $\rho_F^2$ to the $W^{1,p}(\Omega)$-semi-norm.

Lemma 7 (Relation Between Natural Distance and $W^{1,p}$-Semi-Norm). Let $\Omega \subseteq \mathbb{R}^d$, $d \in \mathbb{N}$, be a bounded domain and $p \in (1, \infty)$. Then, there exists a constant $c(p) > 0$, depending only on $p \in (1, \infty)$ and not depending on $d \in \mathbb{N}$, such that the following relations apply:

(i) If $p \in [2, \infty)$, then for every $u, v \in W^{1,p}(\Omega)$, it holds

$$c(p)^{-1} \|\nabla u - \nabla v\|^p_{L^p(\Omega)^d} \le \rho_F^2(u, v) \le c(p) \big( \|\nabla u\|_{L^p(\Omega)^d} + \|\nabla v\|_{L^p(\Omega)^d} \big)^{p-2} \|\nabla u - \nabla v\|^2_{L^p(\Omega)^d}.$$

(ii) If $p \in (1, 2)$, then for every $u, v \in W^{1,p}(\Omega)$, it holds

$$c(p)^{-1} \rho_F^2(u, v) \le \|\nabla u - \nabla v\|^p_{L^p(\Omega)^d} \le c(p) \big( \|\nabla u\|_{L^p(\Omega)^d} + \|\nabla v\|_{L^p(\Omega)^d} \big)^{\frac{p(2-p)}{2}} \rho_F^2(u, v)^{\frac{p}{2}}.$$

Moreover, we can choose $c(p) > 0$ such that $(p \mapsto c(p)) \in C^0(1, \infty)$.

Proof. The proof is provided in the Appendix, see A.

We are now able to prove the main result in a setting excluding parametric dependencies.

Theorem 8. Let $f \in W^{1,p}(\Omega)^*$, $p \in (1, \infty)$, be such that $\langle f, c \rangle_{W^{1,p}(\Omega)} = 0$ for all $c \in \mathbb{R}$.
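Moreover, let $u^* \in W^{1,p}(\Omega)$ be a weak solution of the $p$-Laplace problem with homogeneous Neumann boundary conditions, i.e., $u^* \in W^{1,p}(\Omega)$ minimizes $E : W^{1,p}(\Omega) \to \mathbb{R}$, for every $v \in W^{1,p}(\Omega)$ defined by

$$E(v) := \frac{1}{p} \int_\Omega |\nabla v|^p \, dx - \langle f, v \rangle_{W^{1,p}(\Omega)}.$$

Let $M \subset W^{1,p}(\Omega)$ be any subset that contains the zero function and let $v \in M$ be arbitrary. Setting $\widetilde{M} := \{ u \in M \mid \|\nabla u\|_{L^p(\Omega)^d} \le 2 \|\nabla u^*\|_{L^p(\Omega)^d} \}$, it holds

$$\|\nabla v - \nabla u^*\|_{L^p(\Omega)^d} \lesssim \begin{cases} \delta(v)^{1/p} + \inf_{\tilde{v} \in \widetilde{M}} \|\nabla \tilde{v} - \nabla u^*\|^{2/p}_{L^p(\Omega)^d} & \text{if } p \in [2, \infty), \\ \delta(v)^{1/2} + \inf_{\tilde{v} \in \widetilde{M}} \|\nabla \tilde{v} - \nabla u^*\|^{p/2}_{L^p(\Omega)^d} & \text{if } p \in (1, 2), \end{cases}$$

where $\delta(v) := E(v) - \inf_{\tilde{v} \in \widetilde{M}} E(\tilde{v})$ is the optimization error and the implicit constant depends on $p$, $\Omega$ and $\|f\|_{W^{1,p}(\Omega)^*}$ only.

Proof. ad $p \in [2, \infty)$. If $p \in [2, \infty)$, then, using the relation of the natural distance to Sobolev semi-norms (cf. Lemma 7), Céa's Lemma 5, and the coercivity estimate in Lemma 13, we obtain

$$\begin{aligned} \|\nabla v - \nabla u^*\|^p_{L^p(\Omega)^d} &\le c(p)\, \rho_F^2(v, u^*) \\ &\le c(p) \Big( \delta(v) + \inf_{\tilde{v} \in \widetilde{M}} \rho_F^2(\tilde{v}, u^*) \Big) \\ &\le c(p) \Big( \delta(v) + \inf_{\tilde{v} \in \widetilde{M}} \big( \|\nabla \tilde{v}\|_{L^p(\Omega)^d} + \|\nabla u^*\|_{L^p(\Omega)^d} \big)^{p-2} \|\nabla \tilde{v} - \nabla u^*\|^2_{L^p(\Omega)^d} \Big) \\ &\le c(p) \Big( \delta(v) + 3^{p-2} \|\nabla u^*\|^{p-2}_{L^p(\Omega)^d} \inf_{\tilde{v} \in \widetilde{M}} \|\nabla \tilde{v} - \nabla u^*\|^2_{L^p(\Omega)^d} \Big) \\ &\le c(p) \Big( \delta(v) + 3^{p-2} c(p, \Omega) \|f\|^{\frac{p-2}{p-1}}_{W^{1,p}(\Omega)^*} \inf_{\tilde{v} \in \widetilde{M}} \|\nabla \tilde{v} - \nabla u^*\|^2_{L^p(\Omega)^d} \Big). \end{aligned}$$

Taking the $p$-th root yields the claim in the case $p \in [2, \infty)$.

ad $p \in (1, 2]$. If $p \in (1, 2]$, then, again using the relation of the natural distance to Sobolev semi-norms (cf. Lemma 7) and Céa's Lemma 5, we obtain

$$\|\nabla v - \nabla u^*\|_{L^p(\Omega)^d} \le c(p) \big( \|\nabla v\|_{L^p(\Omega)^d} + \|\nabla u^*\|_{L^p(\Omega)^d} \big)^{\frac{2-p}{2}} \Big( \delta(v)^{\frac{1}{2}} + \inf_{\tilde{v} \in M} \|\nabla \tilde{v} - \nabla u^*\|^{\frac{p}{2}}_{L^p(\Omega)^d} \Big).$$

Thus, it remains to estimate the first factor on the right-hand side. We use the coercivity estimate of Lemma 13 to obtain

$$\|\nabla v\|_{L^p(\Omega)^d} \le c(p, \Omega) \big( E(v) + \|f\|^{p'}_{W^{1,p}(\Omega)^*} \big)^{\frac{1}{p}} \le c(p, \Omega) \big( \delta(v)^{\frac{1}{p}} + \|f\|^{\frac{1}{p-1}}_{W^{1,p}(\Omega)^*} \big),$$

where we used that $0 \in M$ to be able to estimate $E(v) \le \delta(v)$. As a result, again applying Lemma 13, it follows that

$$\big( \|\nabla v\|_{L^p(\Omega)^d} + \|\nabla u^*\|_{L^p(\Omega)^d} \big)^{\frac{2-p}{2}} \le c(p, \Omega) \big( \delta(v)^{\frac{2-p}{2p}} + \|f\|^{\frac{2-p}{2p-2}}_{W^{1,p}(\Omega)^*} \big) = c(p, \Omega) \big( \delta(v)^{\frac{1}{p} - \frac{1}{2}} + \|f\|^{\frac{2-p}{2p-2}}_{W^{1,p}(\Omega)^*} \big).$$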
Assuming $\delta(v) \le 1$, it holds $\delta(v)^{\frac{1}{2}} + \delta(v)^{\frac{1}{p}} \le 2 \delta(v)^{\frac{1}{2}}$, which concludes the proof.
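The natural distance $\rho_F^2$ is built from the pointwise map $F(a) = |a|^{\frac{p-2}{2}} a$, and its relation to Sobolev semi-norms rests on the pointwise equivalence of $|F(a) - F(b)|^2$ and $(|a| + |b|)^{p-2} |a - b|^2$ (Lemma 11 (ii) in the Appendix). The following Python sketch samples random vector pairs and reports the range of the ratio of these two quantities. It is an illustration only and proves nothing; the helper names, the Gaussian sampling, and the sample sizes are assumptions made for this example.

```python
import math
import random


def euclidean_norm(a):
    return math.sqrt(sum(x * x for x in a))


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def F(a, p):
    """Pointwise map F(a) = |a|^((p-2)/2) * a for a vector given as a list."""
    norm = euclidean_norm(a)
    if norm == 0.0:
        return [0.0 for _ in a]
    scale = norm ** ((p - 2.0) / 2.0)
    return [scale * x for x in a]


def ratio_range(p, dim, samples=2000, seed=0):
    """Smallest and largest sampled value of
    |F(a) - F(b)|^2 / ((|a| + |b|)^(p-2) * |a - b|^2)
    over random Gaussian pairs (a, b) in R^dim (hypothetical sampling choice).
    """
    rng = random.Random(seed)
    smallest, largest = float("inf"), 0.0
    for _ in range(samples):
        a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        b = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        numerator = squared_distance(F(a, p), F(b, p))
        denominator = ((euclidean_norm(a) + euclidean_norm(b)) ** (p - 2.0)
                       * squared_distance(a, b))
        ratio = numerator / denominator
        smallest, largest = min(smallest, ratio), max(largest, ratio)
    return smallest, largest


if __name__ == "__main__":
    for p in (1.5, 2.0, 3.0):
        for dim in (2, 50):
            low, high = ratio_range(p, dim)
            print(f"p={p}  d={dim:2d}  ratio in [{low:.3f}, {high:.3f}]")
```

For $p = 2$ the map $F$ is the identity, so the ratio is identically one; for other exponents the sampled ratios stay in a bounded interval whose endpoints do not visibly drift with the dimension, in line with the statement that $c(p)$ does not depend on $d$.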

2.2. PROOF OF THE PARAMETRIC SETTING

As detailed in the introduction, the energy formulation we use for a $p$-Laplace problem with a parametric right-hand side $\boldsymbol{f} \in L^{p'}(\mathcal{P} \times \Omega)$ and parameter space $\mathcal{P} \subseteq \mathbb{R}^{d_{\mathcal{P}}}$, $d_{\mathcal{P}} \in \mathbb{N}$, is defined, for every $\boldsymbol{v} \in L^p(\mathcal{P}, W^{1,p}(\Omega))$, by

$$\boldsymbol{E}(\boldsymbol{v}) := \int_{\mathcal{P}} \Big[ \frac{1}{p} \int_\Omega |\nabla_x \boldsymbol{v}(\tau, \cdot)|^p \, dx - \int_\Omega \boldsymbol{f}(\tau, \cdot)\, \boldsymbol{v}(\tau, \cdot) \, dx \Big] \, d\tau.$$

Before proving the error decay rates of Theorem 1, we need to identify the correct function space $\boldsymbol{U}$ for the definition of $\boldsymbol{E}$. In the case of a parametric right-hand side, this is straightforward and the space $\boldsymbol{U}$ is a standard Bochner space, see Proposition 9. For varying domains or a varying exponent as a parametric dependency, the corresponding function spaces are intricate; we refer to Appendix C. Next, we need to guarantee that the minimizer of $\boldsymbol{E}$ indeed solves the parametric problem. This is carried out in Proposition 9 and is encoded in the fact that $\boldsymbol{u}^*(\tau, \cdot) \in W^{1,p}(\Omega)$ for a.e. $\tau \in \mathcal{P}$ minimizes $E_\tau$ in the notation of this proposition. Proceeding to derive error estimates, we want to mimic the strategy of the non-parametric case. This crucially relies on the fact that the constants in equation 7 do not depend on the right-hand side. As a consequence, we can prove a two-sided estimate as in Proposition 9 with an analogue of $\rho_F^2 : W^{1,p}(\Omega) \times W^{1,p}(\Omega) \to \mathbb{R}$ given by $\boldsymbol{\rho}_F^2 : L^p(\mathcal{P}, W^{1,p}(\Omega)) \times L^p(\mathcal{P}, W^{1,p}(\Omega)) \to \mathbb{R}$, for every $\boldsymbol{v}, \boldsymbol{w} \in L^p(\mathcal{P}, W^{1,p}(\Omega))$ defined by

$$\boldsymbol{\rho}_F^2(\boldsymbol{v}, \boldsymbol{w}) := \int_{\mathcal{P}} \rho_F^2(\boldsymbol{v}(\tau, \cdot), \boldsymbol{w}(\tau, \cdot)) \, d\tau.$$

Finally, we can proceed as in the non-parametric case and derive a Céa Lemma.

Proposition 9 (Variable Right-Hand Sides). Let $\Omega \subseteq \mathbb{R}^{d_\Omega}$, $d_\Omega \in \mathbb{N}$, and $\mathcal{P} \subseteq \mathbb{R}^{d_{\mathcal{P}}}$, $d_{\mathcal{P}} \in \mathbb{N}$, be bounded domains and $p \in (1, \infty)$. Assume that $U \subset W^{1,p}(\Omega)$ is a closed subspace that satisfies a Poincaré inequality, as in Proposition 2. Moreover, we define the Bochner-Lebesgue space $\boldsymbol{U} := L^p(\mathcal{P}, U)$.
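For fixed $\boldsymbol{f} \in L^{p'}(\mathcal{P} \times \Omega)$, we define the variable right-hand side $p$-Dirichlet energy $\boldsymbol{E} : \boldsymbol{U} \to \mathbb{R}$ for every $\boldsymbol{v} \in \boldsymbol{U}$ by

$$\boldsymbol{E}(\boldsymbol{v}) := \int_{\mathcal{P}} \Big[ \frac{1}{p} \int_\Omega |\nabla_x \boldsymbol{v}(\tau, \cdot)|^p \, dx - \int_\Omega \boldsymbol{f}(\tau, \cdot)\, \boldsymbol{v}(\tau, \cdot) \, dx \Big] \, d\tau,$$

where the gradient $\nabla_x$ for a.e. $\tau \in \mathcal{P}$ is to be understood with respect to the variable $x \in \Omega$ only. Then, the following statements apply:

(i) There exists a unique (parametric) minimizer $\boldsymbol{u}^* \in \boldsymbol{U}$ of $\boldsymbol{E} : \boldsymbol{U} \to \mathbb{R}$.

(ii) For a.e. $\tau \in \mathcal{P}$, the function $\boldsymbol{u}^*(\tau, \cdot) \in U$ is the unique minimizer of $E_\tau : U \to \mathbb{R}$, for every $v \in U$ defined by

$$E_\tau(v) := \frac{1}{p} \int_\Omega |\nabla v|^p \, dx - \int_\Omega \boldsymbol{f}(\tau, \cdot)\, v \, dx.$$

(iii) Furthermore, for every $\boldsymbol{v} \in \boldsymbol{U}$, it holds

$$c(p)^{-1} \boldsymbol{\rho}_F^2(\boldsymbol{v}, \boldsymbol{u}^*) \le \boldsymbol{E}(\boldsymbol{v}) - \boldsymbol{E}(\boldsymbol{u}^*) \le c(p)\, \boldsymbol{\rho}_F^2(\boldsymbol{v}, \boldsymbol{u}^*), \tag{9}$$

where $c(p) > 0$ is the constant from Proposition 2 (ii).

Proof. We prove this for more general parametric dependencies in the Appendix, see C.

Remark 10. Requiring $\boldsymbol{f}(\tau, \cdot)$ to be mean-value-free for a.e. $\tau \in \mathcal{P}$, we may, again, drop the assumption of a Poincaré inequality on the space $U$, as we explained in Remark 4. In this case, we cannot expect the minimizer to be unique.

The estimate in equation 9 is the key to establishing a Céa type Lemma for the energy $\boldsymbol{E} : \boldsymbol{U} \to \mathbb{R}$. In the situation of Proposition 9, we can accomplish this as in Lemma 5. More precisely, for any fixed $\boldsymbol{v} \in \boldsymbol{M} \subset \boldsymbol{U}$, it holds

$$\boldsymbol{\rho}_F^2(\boldsymbol{v}, \boldsymbol{u}^*) \le c(p) \Big( \delta(\boldsymbol{v}) + \inf_{\tilde{\boldsymbol{v}} \in \boldsymbol{M}} \boldsymbol{\rho}_F^2(\tilde{\boldsymbol{v}}, \boldsymbol{u}^*) \Big),$$

where $\delta(\boldsymbol{v}) := \boldsymbol{E}(\boldsymbol{v}) - \inf_{\tilde{\boldsymbol{v}} \in \boldsymbol{M}} \boldsymbol{E}(\tilde{\boldsymbol{v}})$ is the parametric optimization error. With all the previous work, the Main Theorem can now be proved in a similar way as in the case without parameters. We postpone the proof to the Appendix, see A.2.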
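For $p = 2$, the two-sided estimate of Proposition 9 (iii) reduces to the identity $\boldsymbol{E}(\boldsymbol{v}) - \boldsymbol{E}(\boldsymbol{u}^*) = \frac{1}{2} \boldsymbol{\rho}_F^2(\boldsymbol{v}, \boldsymbol{u}^*)$, which can be checked numerically on a toy problem. The Python sketch below uses assumed data that do not appear in the paper: $\mathcal{P} = (-1, 1)$, $\Omega = (0, 1)$, the mean-value-free right-hand side $\boldsymbol{f}(\tau, x) = \tau \cos(\pi x)$ with Neumann solution $\boldsymbol{u}^*(\tau, x) = \tau \cos(\pi x) / \pi^2$ of $-\partial_x^2 \boldsymbol{u}^* = \boldsymbol{f}$, and a hand-picked perturbation $\boldsymbol{w} = \boldsymbol{v} - \boldsymbol{u}^*$. All integrals are approximated by a midpoint rule.

```python
import math


def integrate(g, n_tau=100, n_x=100):
    """Midpoint rule for the integral of g(tau, x) over (-1, 1) x (0, 1)."""
    h_tau, h_x = 2.0 / n_tau, 1.0 / n_x
    total = 0.0
    for i in range(n_tau):
        tau = -1.0 + (i + 0.5) * h_tau
        for j in range(n_x):
            x = (j + 0.5) * h_x
            total += g(tau, x)
    return total * h_tau * h_x


# Assumed toy data (not from the paper): -u_xx = f with Neumann conditions.
def f(tau, x):
    return tau * math.cos(math.pi * x)  # mean-value-free in x for every tau


def u_star(tau, x):
    return tau * math.cos(math.pi * x) / math.pi ** 2


def u_star_x(tau, x):  # x-derivative of u_star
    return -tau * math.sin(math.pi * x) / math.pi


def w(tau, x):  # perturbation v - u_star (hypothetical choice)
    return 0.3 * tau ** 2 * math.cos(2.0 * math.pi * x)


def w_x(tau, x):  # x-derivative of w
    return -0.6 * math.pi * tau ** 2 * math.sin(2.0 * math.pi * x)


def energy(u, u_x):
    """Parametric Dirichlet energy (p = 2) of a function u with x-derivative u_x."""
    return integrate(lambda t, x: 0.5 * u_x(t, x) ** 2 - f(t, x) * u(t, x))


energy_gap = (energy(lambda t, x: u_star(t, x) + w(t, x),
                     lambda t, x: u_star_x(t, x) + w_x(t, x))
              - energy(u_star, u_star_x))
half_distance = 0.5 * integrate(lambda t, x: w_x(t, x) ** 2)

if __name__ == "__main__":
    print(f"E(v) - E(u*)      = {energy_gap:.6f}")
    print(f"1/2 rho_F^2(v,u*) = {half_distance:.6f}")
```

Both quantities approximate $0.036\,\pi^2$ for this particular choice of data, since the cross terms $\int_\Omega \partial_x \boldsymbol{u}^* \, \partial_x \boldsymbol{w} \, dx$ and $\int_\Omega \boldsymbol{f} \boldsymbol{w} \, dx$ vanish for every $\tau$.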

3. NUMERICAL EXAMPLES

We give two examples, one with a high-dimensional physical domain and one with a high-dimensional parametric right-hand side. Our goal is to investigate the error in dependence on the dimension. To find good neural network approximations, we employ a Deep Ritz Method for training.

Example 1. As a PDE posed on a high-dimensional physical domain, we consider

$$-\Delta u + u = f \quad \text{in } \Omega,$$

with homogeneous Neumann boundary conditions, and $\Omega = (0, 1)^{d_\Omega}$, $d_\Omega \in \mathbb{N}$. As manufactured solution, we use $u \in W^{1,2}(\Omega)$, for every $x = (x_1, \ldots, x_{d_\Omega})^\top \in \Omega$ defined by

$$u(x) := c \cdot \sum_{i=1}^{d_\Omega} \cos(\pi x_i),$$

where $c > 0$ is chosen such that $u$ has uniform $L^2(\Omega)$ norm. Then, the right-hand side $f \in L^2(\Omega)$, for every $x \in \Omega$, is given via $f(x) = (\pi^2 + 1)\, u(x)$.

Example 2. As a problem with a high-dimensional parametric right-hand side, we consider

$$-\boldsymbol{u}'' + \boldsymbol{u} = \boldsymbol{f} \quad \text{in } \mathcal{P} \times \Omega,$$

with homogeneous Neumann boundary conditions, parameter space $\mathcal{P} = [-1, 1]^{d_{\mathcal{P}}}$, $d_{\mathcal{P}} \in \mathbb{N}$, and physical domain $\Omega = (0, 1)$. As manufactured solution and right-hand side, we use $\boldsymbol{u} \in L^2(\mathcal{P}, W^{1,2}(\Omega))$ and $\boldsymbol{f} \in L^2(\mathcal{P} \times \Omega)$, for every $(\tau, x)^\top = (\tau_1, \ldots, \tau_{d_{\mathcal{P}}}, x)^\top \in \mathcal{P} \times \Omega$ defined by

$$\boldsymbol{u}(\tau, x) := \sum_{k=0}^{d_{\mathcal{P}} - 1} \frac{\tau_k}{k^2 \pi^2 + 1} \cos(k \pi x) \quad \text{and} \quad \boldsymbol{f}(\tau, x) = \sum_{k=0}^{d_{\mathcal{P}} - 1} \tau_k \cos(k \pi x).$$

Neural Network Architecture and Training. We employ fully-connected $\mathrm{ReLU}^2$-networks with four hidden layers and varying width, as well as a Deep Ritz energy formulation as a loss function. To solve the minimization problem, we employ the Adam optimizer with learning rate set to 0.001. The appearing integrals are discretized using Monte-Carlo approximations, where new random points are drawn for every update in the gradient descent. The optimization is run until no further improvement is seen in approximating the ground truth, which in our examples typically happens after around 10,000 to 20,000 iterations.
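The two manufactured solutions and the Monte-Carlo discretization of the energy can be sanity-checked without any neural network. The Python sketch below makes several assumptions that are not stated in the text: the solution of Example 1 is read as $c$ times a sum of cosines with $c = \sqrt{2 / d_\Omega}$ (which gives unit $L^2(\Omega)$ norm), the loss is taken to be the standard Ritz energy $\int_\Omega \frac{1}{2} |\nabla u|^2 + \frac{1}{2} u^2 - f u \, dx$ of $-\Delta u + u = f$, and all function names and sample sizes are hypothetical. The first part estimates this energy at the exact solution by uniform sampling, the second checks the PDE of Example 2 with a central finite difference.

```python
import math
import random


def example1_u(x):
    """Manufactured solution of Example 1, read as c * sum_i cos(pi * x_i)."""
    c = math.sqrt(2.0 / len(x))  # assumption: normalizes the L2 norm to one
    return c * sum(math.cos(math.pi * xi) for xi in x)


def example1_grad_squared(x):
    """Squared Euclidean norm of the gradient of example1_u at x."""
    c = math.sqrt(2.0 / len(x))
    return sum((c * math.pi * math.sin(math.pi * xi)) ** 2 for xi in x)


def ritz_energy_monte_carlo(dim, n_samples, seed=0):
    """Monte-Carlo estimate of the assumed Ritz energy
    int 1/2 |grad u|^2 + 1/2 u^2 - f u dx over (0, 1)^dim, with f = (pi^2 + 1) u.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = [rng.random() for _ in range(dim)]
        u = example1_u(x)
        f = (math.pi ** 2 + 1.0) * u
        total += 0.5 * example1_grad_squared(x) + 0.5 * u * u - f * u
    return total / n_samples


def example2_residual(tau, x, h=1e-3):
    """Central finite-difference residual of -u'' + u - f for Example 2 at x."""
    def u(y):
        return sum(t / (k * k * math.pi ** 2 + 1.0) * math.cos(k * math.pi * y)
                   for k, t in enumerate(tau))

    f = sum(t * math.cos(k * math.pi * x) for k, t in enumerate(tau))
    u_xx = (u(x + h) - 2.0 * u(x) + u(x - h)) / (h * h)
    return -u_xx + u(x) - f


if __name__ == "__main__":
    exact_energy = -0.5 * (math.pi ** 2 + 1.0)  # value at the unit-norm solution
    for dim in (2, 5, 20):
        estimate = ritz_energy_monte_carlo(dim, n_samples=40000)
        print(f"d={dim:2d}  Monte-Carlo energy {estimate:8.4f}  exact {exact_energy:8.4f}")
    print("Example 2 residual:", example2_residual([0.5, -1.0, 0.25, 0.8], 0.3))
```

Under the stated normalization, testing the weak form with $u$ itself gives $\int_\Omega |\nabla u|^2 + u^2 \, dx = (\pi^2 + 1) \|u\|^2_{L^2(\Omega)}$, so the energy at the exact solution equals $-\frac{1}{2} (\pi^2 + 1)$ in every dimension, which provides a reference value for the sampled estimate.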
Although it is impossible for us to quantify how well the empirically found solutions resolve the minimization, i.e., how large the quantity $\delta(\boldsymbol{u}_\theta)$ in Theorem 1 is, the experiments still confirm the promising behavior of neural networks for solving high-dimensional and parametric problems.

A PROOFS

Here, we collect the proofs that were deferred to the Appendix.

A.1 PROOF OF PROPOSITION 2

In order to prove Proposition 2, we need some preparation. The first two lemmas analyze the point-wise properties of the function F : R d → R d , for every a ∈ R d defined by F (a) = |a| p-2 a, that induces the natural distance measure ρ 2 F : W 1,p (Ω) × W 1,p (Ω) → R, which is essential for the error analysis of the p-Laplacian. Lemma 11. Let p ∈ (1, ∞) and d ∈ N. Then, there exists a constant c(p) > 0, depending only on p ∈ (1, ∞) and not depending on d ∈ N, such that the following statements apply: (i) For every a, b ∈ R d , it holds c(p) -1 |F (a) -F (b)| 2 ≤ (|a| p-2 a -|b| p-2 b) • (a -b) ≤ c(p) |F (a) -F (b)| 2 . (ii) For every a, b ∈ R d , it holds c(p) -1 |F (a) -F (b)| 2 ≤ (|a| + |b|) p-2 |a -b| 2 ≤ c(p) |F (a) -F (b)| 2 . Moreover, we can choose c(p) > 0 such that (p → c(p)) ∈ C 0 (1, ∞). Proof. See (Diening et al., 2007, Appendix) or (Diening & Ettwein, 2008, Appendix) . Furthermore, carefully reviewing the proofs in (Diening et al., 2007, Appendix) reveals that the constants c(p) > 0, p ∈ (1, ∞), in Lemma 11 depend continuously on p ∈ (1, ∞). Lemma 12. Let p ∈ (1, ∞) and d ∈ N. Then, there exits a constant c(p) > 0, depending only on p ∈ (1, ∞) and not depending on d ∈ N, such that for every a, b ∈ R d with |a|+|b| > 0, we have that c(p) -1 |F (a) -F (b)| 2 ≤ ˆ1 0 D 2 ϕ(τ a + (1 -τ )b) : (a -b) ⊗ (a -b) (1 -τ ) dτ ≤ c(p) |F (a) -F (b)| 2 , where ϕ ∈ C 1 (R d ) ∩ C 2 (R d \ {0}), defined by ϕ(a) := 1 p |a| p for all a ∈ R d , denotes the p-Dirichlet density. Moreover, we can choose c(p) > 0 such that (p → c(p)) ∈ C 0 (1, ∞). Proof. We introduce the abbreviation η (Růžička, 2004, p. 73, ineq. (1.35) )), for every a, b ∈ R d with |a| + |b| > 0, we obtain 2 : R d × R d \ {(0, 0) ⊤ } → R ≥0 , for every a, b ∈ R d with |a| + |b| > 0 defined by η 2 (a, b) := ˆ1 0 D 2 ϕ(τ a + (1 -τ )b) : (a -b) ⊗ (a -b) (1 -τ ) dτ . Using D 2 ϕ(a) : b ⊗ b ≥ min{1, p -1}|a| p-2 |b| 2 for all a ∈ R d \ {0}, b ∈ R d (cf. 
η 2 (a, b) ≥ min{1, p -1} ˆ1 0 |τ a + (1 -τ )b| p-2 |a -b| 2 (1 -τ ) dτ . With the help of Jensen's inequality applied with respect to the measure dµ = (1 -τ )dτ , i.e., in particular, we use that dµ([0, 1]) = 1 2 , for every a, b ∈ R d with |a| + |b| > 0, we observe that 2 ˆ1 0 |τ a + (1 -τ )b|(1 -τ ) dτ p ≤ ˆ1 0 |τ a + (1 -τ )b| p (1 -τ ) dτ . Then, we continue in equation 11 by incorporating equation 12 and, thus, find that for every a, b ∈ R d with |a| + |b| > 0, it holds η 2 (a, b) ≥ min{1, p -1} ˆ1 0 |τ a + (1 -τ )b| p (1 -τ ) dτ |a -b| 2 (|a| + |b|) 2 ≥ min{1, p -1} 2 ˆ1 0 |τ a + (1 -τ )b| (1 -τ ) dτ p |a -b| 2 (|a| + |b|) 2 . ( ) For every a, b ∈ R d , it holds 2 ˆ1 0 |τ a + (1 -τ )b| (1 -τ ) dτ ≥ 1 6 (|a| + |b|) , which is based on that for |a| > |b| and τ ∈ [ 2 3 , 1], it holds |τ a + (1 -τ )b| ≥ 1 3 |a| > 1 6 (|a| + |b|), and for |b| ≥ |a| and τ ∈ [0, 1 3 ], it holds |τ a + (1 -τ )b| ≥ 1 3 |b| ≥ 1 6 (|a| + |b|). Using equation 14 in equation 13, for every a, b ∈ R d with |a| + |b| > 0, we deduce that η 2 (a, b) ≥ min{1, p -1} 1 6 p (|a| + |b|) p-2 |a -b| 2 . Resorting to Lemma 11, we conclude the existence of a constant c(p) > 0, depending only on p ∈ (1, ∞), with (p → c(p)) ∈ C 0 (1, ∞), such that for every a, b ∈ R d with |a| + |b| > 0, it holds η 2 (a, b) ≥ c(p) -1 |F (a) -F (b)| 2 . On the other hand, since also D 2 ϕ(a) : b ⊗ b ≤ max{1, p -2}|a| p-2 |b| 2 for all a ∈ R d \ {0}, b ∈ R d , which, again, follows very similarly to (Růžička, 2004, p. 73, ineq. (1.35 )), we find that η 2 (a, b) ≤ max{1, p -2} ˆ1 0 |τ a + (1 -τ )b| p-2 (1 -τ ) dτ |a -b| 2 . 
Since, appealing to (Diening et al., 2007, Lemma 6.1), there is a constant $c(p) > 0$, depending only on $p \in (1,\infty)$, with $(p \mapsto c(p)) \in C^0(1,\infty)$, such that for every $a, b \in \mathbb{R}^d$ with $|a|+|b| > 0$, it holds
$$\int_0^1 |\tau a + (1-\tau)b|^{p-2}\,\mathrm{d}\tau \le c(p)\,(|a|+|b|)^{p-2} ,$$
we deduce from equation 15 that $\eta^2(a,b) \le \max\{1,p-1\}\,c(p)\,(|a|+|b|)^{p-2}\,|a-b|^2$ for all $a, b \in \mathbb{R}^d$ with $|a|+|b| > 0$, which, resorting again to Lemma 11, completes the proof of Lemma 12.

Proof of Proposition 2. ad (i). The $p$-Dirichlet energy $E\colon U \to \mathbb{R}$ is proper, strictly convex, and continuous, hence, weakly lower semi-continuous. In addition, the validity of Poincaré's inequality guarantees the coercivity of $E\colon U \to \mathbb{R}$, so that the direct method in the calculus of variations yields, cf. Dacorogna (2007), the existence of a unique minimizer $u^* \in U$ of $E\colon U \to \mathbb{R}$.

ad (ii). We proceed similarly to (Diening & Kreuzer, 2008, Lemma 16). Again, we employ the notation $\phi \in C^1(\mathbb{R}^d) \cap C^2(\mathbb{R}^d \setminus \{0\})$, defined by $\phi(a) := \frac{1}{p}|a|^p$ for all $a \in \mathbb{R}^d$, for the $p$-Dirichlet density. Since $D\phi \in C^0(\mathbb{R}^d)^d$ with $|D\phi(a)| = |a|^{p-1}$ for all $a \in \mathbb{R}^d$, the $p$-Dirichlet energy is continuously Fréchet differentiable with
$$\langle DE(u), v\rangle_U := \int_\Omega D\phi(\nabla u)\cdot\nabla v\,\mathrm{d}x - \langle f, v\rangle_{W^{1,p}(\Omega)}$$
for all $u, v \in U$. In particular, due to the minimality of $u^* \in U$, we have that $DE(u^*) = 0$ in $U^*$, i.e., for every $v \in U$, it holds
$$\langle DE(u^*), v\rangle_U = 0 . \tag{16}$$
However, $E\colon U \to \mathbb{R}$ is not twice continuously Fréchet differentiable.
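Therefore, we consider regularizations $(\phi_\varepsilon)_{\varepsilon>0} \subseteq C^2(\mathbb{R}^d)$, defined by $\phi_\varepsilon(a) := \frac{1}{p}(\varepsilon^2 + |a|^2)^{\frac{p}{2}}$ for every $\varepsilon > 0$ and $a \in \mathbb{R}^d$, having the following properties:

(α) $\phi_\varepsilon(a) \to \phi(a)$ $(\varepsilon \to 0)$ for all $a \in \mathbb{R}^d$ and $\phi_\varepsilon(a) \le \frac{2^{p/2}}{p}\,(|a|^p + \varepsilon^p)$ for all $a \in \mathbb{R}^d$ and $\varepsilon > 0$,

(β) $(D\phi_\varepsilon)(a) \to (D\phi)(a)$ $(\varepsilon \to 0)$ for all $a \in \mathbb{R}^d$ and $|(D\phi_\varepsilon)(a)| \le 2^{\frac{p-1}{2}}\,(|a|^{p-1} + \varepsilon^{p-1})$ for all $a \in \mathbb{R}^d$ and $\varepsilon > 0$,

(γ) $(D^2\phi_\varepsilon)(a) \to (D^2\phi)(a)$ $(\varepsilon \to 0)$ for all $a \in \mathbb{R}^d \setminus \{0\}$ and $|(D^2\phi_\varepsilon)(a)| \le (p-1)\,2^{\frac{p-2}{2}}\,(\varepsilon^{p-2} + |a|^{p-2})$ for all $a \in \mathbb{R}^d$ and $\varepsilon > 0$.

Inasmuch as $(\phi_\varepsilon)_{\varepsilon>0} \subseteq C^2(\mathbb{R}^d)$ satisfies (α), (β), and (γ), it is easily checked that for every $\varepsilon > 0$, the regularized $p$-Dirichlet energy $E_\varepsilon\colon U \to \mathbb{R}$, defined by
$$E_\varepsilon(v) := \int_\Omega \phi_\varepsilon(\nabla v)\,\mathrm{d}x - \langle f, v\rangle_{W^{1,p}(\Omega)}$$
for every $v \in U$, is twice continuously Fréchet differentiable. In consequence, using Taylor's formula and Fubini's theorem, for every $\varepsilon > 0$ and $v \in U$, we obtain
$$\begin{aligned}
E_\varepsilon(v) - E_\varepsilon(u^*) &= \langle DE_\varepsilon(u^*), v-u^*\rangle_U + \int_0^1 D^2E_\varepsilon(\tau v + (1-\tau)u^*)[v-u^*, v-u^*]\,(1-\tau)\,\mathrm{d}\tau \\
&= \int_\Omega D\phi_\varepsilon(\nabla u^*)\cdot\nabla(v-u^*)\,\mathrm{d}x - \langle f, v-u^*\rangle_{W^{1,p}(\Omega)} + \int_0^1\!\!\int_\Omega D^2\phi_\varepsilon(\tau\nabla v + (1-\tau)\nabla u^*) : \nabla(v-u^*)\otimes\nabla(v-u^*)\,\mathrm{d}x\,(1-\tau)\,\mathrm{d}\tau \\
&= \int_\Omega D\phi_\varepsilon(\nabla u^*)\cdot\nabla(v-u^*)\,\mathrm{d}x - \langle f, v-u^*\rangle_{W^{1,p}(\Omega)} + \int_\Omega\!\int_0^1 D^2\phi_\varepsilon(\tau\nabla v + (1-\tau)\nabla u^*) : \nabla(v-u^*)\otimes\nabla(v-u^*)\,(1-\tau)\,\mathrm{d}\tau\,\mathrm{d}x .
\end{aligned} \tag{17}$$

Next, given (α), (β), and (γ), we may apply Lebesgue's dominated convergence theorem in equation 17. Hence, passing to the limit $\varepsilon \to 0$ in equation 17, using equation 16 in doing so, for every $v \in U$, we find that
$$\begin{aligned}
E(v) - E(u^*) &= \int_\Omega D\phi(\nabla u^*)\cdot\nabla(v-u^*)\,\mathrm{d}x - \langle f, v-u^*\rangle_{W^{1,p}(\Omega)} + \int_\Omega\!\int_0^1 D^2\phi(\tau\nabla v + (1-\tau)\nabla u^*) : \nabla(v-u^*)\otimes\nabla(v-u^*)\,(1-\tau)\,\mathrm{d}\tau\,\mathrm{d}x \\
&= \langle DE(u^*), v-u^*\rangle_U + \int_\Omega\!\int_0^1 D^2\phi(\tau\nabla v + (1-\tau)\nabla u^*) : \nabla(v-u^*)\otimes\nabla(v-u^*)\,(1-\tau)\,\mathrm{d}\tau\,\mathrm{d}x \\
&= \int_\Omega\!\int_0^1 D^2\phi(\tau\nabla v + (1-\tau)\nabla u^*) : \nabla(v-u^*)\otimes\nabla(v-u^*)\,(1-\tau)\,\mathrm{d}\tau\,\mathrm{d}x .
\end{aligned} \tag{18}$$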
Apart from that, resorting to Lemma 12, we deduce the existence of a constant $c(p) > 0$, depending only on $p \in (1,\infty)$, with $(p \mapsto c(p)) \in C^0(1,\infty)$, such that for every $v \in U$, it holds
$$c(p)^{-1}\,\rho_F^2(v, u^*) \le \int_\Omega\!\int_0^1 D^2\phi(\tau\nabla v + (1-\tau)\nabla u^*) : \nabla(v-u^*)\otimes\nabla(v-u^*)\,(1-\tau)\,\mathrm{d}\tau\,\mathrm{d}x \le c(p)\,\rho_F^2(v, u^*) . \tag{19}$$
Eventually, by combining equation 18 and equation 19, we conclude the assertion of Proposition 2.
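The point-wise quantities in Lemma 11 and Lemma 12 can be probed numerically. The following is a minimal sketch, not part of the original analysis: it evaluates $\eta^2(a,b)$ once by a midpoint rule and once through the Taylor remainder identity $\eta^2(a,b) = \phi(a) - \phi(b) - D\phi(b)\cdot(a-b)$, and it evaluates $|F(a)-F(b)|^2$ with $F(a) = |a|^{\frac{p-2}{2}}a$. All function names and the sample vectors are our own illustrative choices.

```python
import math


def norm(a):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in a))


def F(a, p):
    """Map F(a) = |a|^{(p-2)/2} a, with F(0) = 0."""
    n = norm(a)
    if n == 0.0:
        return [0.0 for _ in a]
    return [n ** ((p - 2.0) / 2.0) * x for x in a]


def natural_distance_sq(a, b, p):
    """Point-wise quantity |F(a) - F(b)|^2."""
    return sum((x - y) ** 2 for x, y in zip(F(a, p), F(b, p)))


def eta2_closed_form(a, b, p):
    """Taylor remainder phi(a) - phi(b) - Dphi(b).(a-b) for phi(a) = |a|^p / p."""
    nb = norm(b)
    grad_b = [0.0 for _ in b] if nb == 0.0 else [nb ** (p - 2.0) * y for y in b]
    lin = sum(g * (x - y) for g, x, y in zip(grad_b, a, b))
    return norm(a) ** p / p - nb ** p / p - lin


def eta2_quadrature(a, b, p, n=4000):
    """Midpoint rule for int_0^1 D^2phi(ta+(1-t)b):(a-b)x(a-b) (1-t) dt.

    Assumes the segment between a and b does not pass through the origin.
    """
    d = [x - y for x, y in zip(a, b)]
    dd = sum(x * x for x in d)
    total = 0.0
    for i in range(n):
        t = (i + 0.5) / n
        z = [t * x + (1.0 - t) * y for x, y in zip(a, b)]
        nz = norm(z)
        zd = sum(x * y for x, y in zip(z, d))
        # D^2phi(z):d(x)d = |z|^{p-2}|d|^2 + (p-2)|z|^{p-4}(z.d)^2
        hess = nz ** (p - 2.0) * dd + (p - 2.0) * nz ** (p - 4.0) * zd * zd
        total += hess * (1.0 - t)
    return total / n


def lemma11_ratio(a, b, p):
    """Ratio (|a|+|b|)^{p-2}|a-b|^2 / |F(a)-F(b)|^2 from Lemma 11 (ii)."""
    d = [x - y for x, y in zip(a, b)]
    return (norm(a) + norm(b)) ** (p - 2.0) * norm(d) ** 2 / natural_distance_sq(a, b, p)
```

Sampling `lemma11_ratio` and `eta2_closed_form(a, b, p) / natural_distance_sq(a, b, p)` over random vectors for a fixed $p$ gives an empirical impression of the two-sided bounds; it does not, of course, replace the proofs.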

A.2 THE REMAINING PROOFS

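Proof of Lemma 7. The following proof is inspired by (Nakov & Toulopoulos, 2021, Section 3.1).

ad (i). By referring to Lemma 11 (ii), we deduce the existence of a constant $c(p) > 0$, depending only on $p \in (1,\infty)$, with $(p \mapsto c(p)) \in C^0(1,\infty)$, such that for every $u, v \in W^{1,p}(\Omega)$, it holds
$$\|\nabla u - \nabla v\|^p_{L^p(\Omega)^d} \le \int_\Omega |\nabla u - \nabla v|^2\,(|\nabla u| + |\nabla v|)^{p-2}\,\mathrm{d}x \le c(p)\,\rho_F^2(u,v) ,$$
and, using Hölder's inequality with respect to the exponents $(\frac{p}{2}, \frac{p}{p-2})$,
$$\begin{aligned}
c(p)^{-1}\,\rho_F^2(u,v) &\le \int_\Omega |\nabla u - \nabla v|^2\,(|\nabla u| + |\nabla v|)^{p-2}\,\mathrm{d}x \\
&\le \Big(\int_\Omega |\nabla u - \nabla v|^p\,\mathrm{d}x\Big)^{\frac{2}{p}} \Big(\int_\Omega (|\nabla u| + |\nabla v|)^p\,\mathrm{d}x\Big)^{\frac{p-2}{p}} \\
&\le \big(\|\nabla u\|_{L^p(\Omega)^d} + \|\nabla v\|_{L^p(\Omega)^d}\big)^{p-2}\,\|\nabla u - \nabla v\|^2_{L^p(\Omega)^d} .
\end{aligned}$$

ad (ii). By referring to Lemma 11 (ii), we deduce the existence of a constant $c(p) > 0$, depending only on $p \in (1,\infty)$, with $(p \mapsto c(p)) \in C^0(1,\infty)$, such that for every $u, v \in W^{1,p}(\Omega)$, using Hölder's inequality with respect to the exponents $(\frac{2}{p}, \frac{2}{2-p})$, it holds
$$\begin{aligned}
\|\nabla(u-v)\|^p_{L^p(\Omega)^d} &\le \Big(\int_\Omega |\nabla(u-v)|^2\,(|\nabla u| + |\nabla v|)^{p-2}\,\mathrm{d}x\Big)^{\frac{p}{2}} \Big(\int_\Omega (|\nabla u| + |\nabla v|)^p\,\mathrm{d}x\Big)^{\frac{2-p}{2}} \\
&\le \big(\|\nabla u\|_{L^p(\Omega)^d} + \|\nabla v\|_{L^p(\Omega)^d}\big)^{\frac{2p-p^2}{2}} \Big(\int_\Omega |\nabla(u-v)|^2\,(|\nabla u| + |\nabla v|)^{p-2}\,\mathrm{d}x\Big)^{\frac{p}{2}} \\
&\le c(p)\,\big(\|\nabla u\|_{L^p(\Omega)^d} + \|\nabla v\|_{L^p(\Omega)^d}\big)^{\frac{p(2-p)}{2}}\,\rho_F^2(u,v)^{\frac{p}{2}} ,
\end{aligned}$$
and
$$c(p)^{-1}\,\rho_F^2(u,v) \le \int_\Omega |\nabla u - \nabla v|^2\,(|\nabla u| + |\nabla v|)^{p-2}\,\mathrm{d}x \le \int_\Omega |\nabla u - \nabla v|^p\,\frac{|\nabla u - \nabla v|^{2-p}}{(|\nabla u| + |\nabla v|)^{2-p}}\,\mathrm{d}x \le \|\nabla u - \nabla v\|^p_{L^p(\Omega)^d} . \qquad \square$$

Lemma 13 (Coercivity of the $p$-Dirichlet Energy). Let $f \in W^{1,p}(\Omega)^*$, $p \in (1,\infty)$, be such that $\langle f, c\rangle_{W^{1,p}(\Omega)} = 0$ for all $c \in \mathbb{R}$. Moreover, we define $E\colon W^{1,p}(\Omega) \to \mathbb{R}$ for every $v \in W^{1,p}(\Omega)$ by
$$E(v) := \frac{1}{p}\int_\Omega |\nabla v|^p\,\mathrm{d}x - \langle f, v\rangle_{W^{1,p}(\Omega)} .$$
Then, for every $v \in W^{1,p}(\Omega)$, we can estimate
$$\|\nabla v\|_{L^p(\Omega)^d} \le c(p,\Omega)\,\Big(\|f\|^{\frac{1}{p-1}}_{W^{1,p}(\Omega)^*} + E(v)^{\frac{1}{p}}\Big) .$$
For a minimizer $u^* \in W^{1,p}(\Omega)$ of $E\colon W^{1,p}(\Omega) \to \mathbb{R}$, this reduces to
$$\|\nabla u^*\|_{L^p(\Omega)^d} \le c(p,\Omega)\,\|f\|^{\frac{1}{p-1}}_{W^{1,p}(\Omega)^*} .$$
The constant $c(p,\Omega) > 0$ depends continuously on $p$ and on the domain $\Omega$.

Proof.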
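Using that $f \in W^{1,p}(\Omega)^*$ vanishes on constant functions, the Poincaré-Wirtinger inequality, and the $\varepsilon$-Young inequality, for every $v \in W^{1,p}(\Omega)$ and $\varepsilon > 0$, abbreviating $\langle v\rangle_\Omega := \frac{1}{|\Omega|}\int_\Omega v\,\mathrm{d}x$, it holds
$$\begin{aligned}
E(v) &= \frac{1}{p}\,\|\nabla v\|^p_{L^p(\Omega)^d} - \langle f, v - \langle v\rangle_\Omega\rangle_{W^{1,p}(\Omega)} \\
&\ge \frac{1}{p}\,\|\nabla v\|^p_{L^p(\Omega)^d} - c(p,\varepsilon)\,\|f\|^{p'}_{W^{1,p}(\Omega)^*} - \varepsilon\,\|v - \langle v\rangle_\Omega\|^p_{W^{1,p}(\Omega)} \\
&\ge \Big(\frac{1}{p} - \varepsilon\,C_P\Big)\,\|\nabla v\|^p_{L^p(\Omega)^d} - c(p,\varepsilon)\,\|f\|^{p'}_{W^{1,p}(\Omega)^*} ,
\end{aligned} \tag{21}$$
where $c(p,\varepsilon) := (p\varepsilon)^{1-p'}\,(p')^{-1}$. Hence, choosing $\varepsilon > 0$ sufficiently small (depending on the value of the Poincaré constant $C_P$) in equation 21, for every $v \in W^{1,p}(\Omega)$, we find that
$$\|\nabla v\|^p_{L^p(\Omega)^d} \le c(p,\Omega)\,\Big(E(v) + \|f\|^{p'}_{W^{1,p}(\Omega)^*}\Big) ,$$
where the dependence on the Poincaré constant leads to the dependence on the domain $\Omega$. Taking the $p$-th root and using $\frac{p'}{p} = \frac{1}{p-1}$ yields the first estimate; for a minimizer $u^*$, the second estimate follows from $E(u^*) \le E(0) = 0$.

Remark 14 (On the Constant $c(p,\Omega)$). While the dependence of $c(p,\Omega)$ on $p$ can directly be understood from the proof, the dependence on $\Omega$ stems from the constant appearing in the Poincaré inequality, which we call the Poincaré constant and denote by $C_P$. For convex domains, one has that
$$C_P \le \Big(\frac{\operatorname{diam}(\Omega)}{\pi_p}\Big)^p , \qquad \text{where } \pi_p := \frac{2\pi\,(p-1)^{\frac{1}{p}}}{p\,\sin(\pi/p)} ;$$
we refer to Esposito et al. (2013) or Koerber (2018).

Finally, we provide the missing proof of the Main Theorem.

Proof of Theorem 1. Let $p \ge 2$ and $v \in M$ be arbitrary. Using Proposition 9 and the inequality 10, we estimate
$$\|\nabla_x v - \nabla_x u^*\|^p_{L^p(P\times\Omega)^d} = \int_P \|\nabla(v(\tau) - u^*(\tau))\|^p_{L^p(\Omega)^d}\,\mathrm{d}\tau \le c(p)\int_P \rho_F^2(v(\tau), u^*(\tau))\,\mathrm{d}\tau \le c(p)\,\Big(\delta(v) + \inf_{\tilde v \in M} \int_P \rho_F^2(\tilde v(\tau), u^*(\tau))\,\mathrm{d}\tau\Big) =: (*) .$$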
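We proceed by utilizing the relation of the natural distance $\rho_F^2\colon W^{1,p}(\Omega) \times W^{1,p}(\Omega) \to \mathbb{R}$ to the Sobolev topology from Lemma 7 and, subsequently, apply Hölder's inequality with the exponents $(\frac{p}{p-2}, \frac{p}{2})$. Writing $\delta := \delta(v)$ and suppressing multiplicative constants that depend only on $p$, we obtain
$$\begin{aligned}
(*) &\le \delta + \inf_{\tilde v \in M} \int_P \big(\|\nabla\tilde v(\tau)\|_{L^p(\Omega)^d} + \|\nabla u^*(\tau)\|_{L^p(\Omega)^d}\big)^{p-2}\,\|\nabla\tilde v(\tau) - \nabla u^*(\tau)\|^2_{L^p(\Omega)^d}\,\mathrm{d}\tau \\
&\le \delta + \inf_{\tilde v \in M} \Big(\int_P \big(\|\nabla\tilde v(\tau)\|_{L^p(\Omega)^d} + \|\nabla u^*(\tau)\|_{L^p(\Omega)^d}\big)^{p}\,\mathrm{d}\tau\Big)^{\frac{p-2}{p}}\,\|\nabla_x\tilde v - \nabla_x u^*\|^2_{L^p(P\times\Omega)^d} \\
&\le \delta + 3^{p-2}\,\|\nabla_x u^*\|^{p-2}_{L^p(P\times\Omega)^d}\,\inf_{\tilde v \in M} \|\nabla_x\tilde v - \nabla_x u^*\|^2_{L^p(P\times\Omega)^d} \\
&\le \delta + 3^{p-2}\,c(p,\Omega)\,\|f\|^{\frac{p-2}{p-1}}_{L^{p'}(P\times\Omega)}\,\inf_{\tilde v \in M} \|\nabla_x\tilde v - \nabla_x u^*\|^2_{L^p(P\times\Omega)^d} .
\end{aligned}$$
This implies the assertion in the case $p \ge 2$. The proof in the case $p < 2$ works similarly and is therefore omitted.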
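To get a feeling for the size of the constant in Remark 14, the following minimal sketch evaluates $\pi_p$ and the resulting bound on the Poincaré constant for convex domains. It reads the bound as $C_P \le (\operatorname{diam}(\Omega)/\pi_p)^p$, which is our interpretation of the remark; the function names are illustrative.

```python
import math


def pi_p(p):
    """Constant pi_p = 2*pi*(p-1)^(1/p) / (p*sin(pi/p)) for p in (1, inf)."""
    if p <= 1.0:
        raise ValueError("p must be larger than 1")
    return 2.0 * math.pi * (p - 1.0) ** (1.0 / p) / (p * math.sin(math.pi / p))


def poincare_bound_convex(p, diam):
    """Upper bound (diam / pi_p)^p, assuming the reading of Remark 14 stated above."""
    return (diam / pi_p(p)) ** p
```

For $p = 2$, the formula reduces to $\pi_2 = \pi$, so that the bound becomes $(\operatorname{diam}(\Omega)/\pi)^2$.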

B SMOOTHNESS ASSUMPTION

For linear elliptic equations with parametric right-hand side, higher-order Sobolev regularity holds true, provided the right-hand side $f \in L^2(P \times \Omega)$ and the domain $\Omega \subset \mathbb{R}^{d_\Omega}$, $d_\Omega \in \mathbb{N}$, are smooth enough. In the following, we denote by $H^k_{\int}(\Omega)$ the Sobolev space of functions in $H^k(\Omega)$ with vanishing mean value.

Lemma 15. Let $k \in \mathbb{N}$ be fixed and let $P \subset \mathbb{R}^{d_P}$, $d_P \in \mathbb{N}$, be open and $\Omega \subset \mathbb{R}^{d_\Omega}$, $d_\Omega \in \mathbb{N}$, be a domain with boundary $\partial\Omega \in C^{k+1,1}$. Let, furthermore, $f \in C^{k+2}(P, H^k_{\int}(\Omega))$ be given. Then, the weak solution $u^* \in L^2(P, H^1_{\int}(\Omega))$ to
$$-\Delta u = f \quad \text{in } P \times \Omega , \qquad \partial_n u = 0 \quad \text{on } P \times \partial\Omega ,$$
is a member of the space $H^{k+2}(P \times \Omega)$.

Proof. We define $e\colon H^{k+2}_{\int}(\Omega) \times P \to H^k_{\int}(\Omega)$ for every $u \in H^{k+2}_{\int}(\Omega)$ and $\tau \in P$ by
$$e(u,\tau) := -\Delta u - f(\tau) \quad \text{in } H^k_{\int}(\Omega) .$$
Then, the zero level set $\{(u,\tau)^\top \in H^{k+2}_{\int}(\Omega) \times P \mid e(u,\tau) = 0\}$ is parametrized through the solution map $(\tau \mapsto u^*(\tau))\colon P \to H^{k+2}_{\int}(\Omega)$. In fact, standard elliptic regularity theory yields that $u^*(\tau) \in H^{k+2}(\Omega)$ for a.e. $\tau \in P$, see for instance Grisvard (2011). The implicit function theorem for Banach spaces guarantees that $u^* \in C^{k+2}(P, H^{k+2}(\Omega))$ provided the partial derivative of $e$ with respect to the first component, i.e., $\partial_1 e(u,\tau)\colon H^{k+2}_{\int}(\Omega) \to H^k_{\int}(\Omega)$, for every $(u,\tau)^\top \in H^{k+2}_{\int}(\Omega) \times P$ given via
$$\partial_1 e(u,\tau)[v] = -\Delta v \quad \text{in } H^k_{\int}(\Omega) \quad \text{for all } v \in H^{k+2}_{\int}(\Omega) ,$$
is a linear homeomorphism. By the same elliptic regularity result as used above, this is indeed the case. Hence, the assertion of the lemma follows.
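A one-dimensional toy instance may help to illustrate the setting of Lemma 15. The following minimal sketch is our own example and does not appear in the analysis above: on $\Omega = (0,1)$ with the hypothetical right-hand side $f(\tau,x) = \tau\cos(\pi x)$, which has vanishing mean, the mean-free Neumann solution is $u^*(\tau,x) = \tau\cos(\pi x)/\pi^2$, and it is as smooth in $\tau$ as $f$ is. The code checks the equation and the boundary condition by finite differences.

```python
import math


def f(tau, x):
    """Hypothetical parametric right-hand side with zero mean on (0, 1)."""
    return tau * math.cos(math.pi * x)


def u_star(tau, x):
    """Closed-form mean-free solution of -u'' = f(tau, .) with u'(0) = u'(1) = 0."""
    return tau * math.cos(math.pi * x) / math.pi ** 2


def pde_residual(tau, x, h=1e-3):
    """|-u_xx - f| at (tau, x), with a central second difference in x."""
    uxx = (u_star(tau, x + h) - 2.0 * u_star(tau, x) + u_star(tau, x - h)) / h ** 2
    return abs(-uxx - f(tau, x))


def neumann_residual(tau, h=1e-6):
    """Largest |u_x| at the two boundary points, via central differences."""
    left = (u_star(tau, h) - u_star(tau, -h)) / (2.0 * h)
    right = (u_star(tau, 1.0 + h) - u_star(tau, 1.0 - h)) / (2.0 * h)
    return max(abs(left), abs(right))
```

Since the problem is linear, the parameter derivative $\partial_\tau u^*$ solves the same equation with right-hand side $\partial_\tau f$, which is the mechanism behind the regularity in the parameter direction.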

C MORE GENERAL PARAMETRIC DEPENDENCIES

In the main part of the manuscript, we only considered parametric dependencies that were induced through a parameter-dependent right-hand side. This was done to keep the technicalities minimal, yet does not constitute the full generality of our analysis. In this section, we outline more general parametric dependencies, including varying integrability exponents, domains, and material tensors. In every situation, we guarantee the well-posedness of the parametric problem and provide an analogue of Proposition 9, which, consequently, allows us to deduce error decay rates. As the error decay rates, assuming smoothness, follow the same pattern as in Theorem 1, we do not explicitly state them.

C.1 PARAMETRIC EXPONENTS AND PARAMETRIC RIGHT-HAND SIDES

We begin with a problem where both the exponent $p \in (1,\infty)$ and the right-hand side $f$ are allowed to vary in a parameter space $P$. More precisely, we seek $u^*\colon P \times \Omega \to \mathbb{R}$ satisfying
$$-\operatorname{div}\big(|\nabla_x u^*(\tau,x)|^{p(\tau)-2}\,\nabla_x u^*(\tau,x)\big) = f(\tau,x) \quad \text{for a.e. } (\tau,x)^\top \in P \times \Omega ,$$
subject to suitable boundary conditions. The precise statement is the following.

Proposition 16 (Variable Exponents). Let $\Omega \subseteq \mathbb{R}^{d_\Omega}$, $d_\Omega \in \mathbb{N}$, and $P \subseteq \mathbb{R}^{d_P}$, $d_P \in \mathbb{N}$, be bounded domains and $p \in L^\infty(P)$ such that there exist $p^-, p^+ \in (1,\infty)$ with $p^- \le p(\tau) \le p^+$ for a.e. $\tau \in P$. Moreover, we define the variable exponent Lebesgue space$^{4}$
$$L^{p(\cdot)}(P \times \Omega) := \Big\{ v \in L^0(P \times \Omega) \;\Big|\; \int_P\!\int_\Omega |v(\tau,x)|^{p(\tau)}\,\mathrm{d}x\,\mathrm{d}\tau < \infty \Big\} ,$$
and the variable exponent Bochner-Lebesgue space
$$U := \Big\{ v \in L^{p(\cdot)}(P \times \Omega) \;\Big|\; v(\tau,\cdot) \in W^{1,p(\tau)}_0(\Omega) \text{ for a.e. } \tau \in P ,\; |\nabla_x v| \in L^{p(\cdot)}(P \times \Omega) \Big\} ,$$
where the gradient $\nabla_x$ for a.e. $\tau \in P$ is to be understood with respect to the variable $x \in \Omega$ only. For fixed $f \in L^{p'(\cdot)}(P \times \Omega)$, i.e., $f \in L^0(P \times \Omega)$ and $\int_P\int_\Omega |f(\tau,x)|^{p'(\tau)}\,\mathrm{d}x\,\mathrm{d}\tau < \infty$, where $p' \in L^\infty(P)$ is defined by $p'(\tau) := \frac{p(\tau)}{p(\tau)-1}$ for a.e. $\tau \in P$, we define the variable exponent $p(\cdot)$-Dirichlet energy $E\colon U \to \mathbb{R}$ for every $v \in U$ by
$$E(v) := \int_P \Big[ \frac{1}{p(\tau)} \int_\Omega |\nabla_x v(\tau,\cdot)|^{p(\tau)}\,\mathrm{d}x - \int_\Omega f(\tau,\cdot)\,v(\tau,\cdot)\,\mathrm{d}x \Big]\,\mathrm{d}\tau .$$
Then, the following statements apply:

(i) There exists a unique (parametric) minimizer $u^* \in U$ of $E\colon U \to \mathbb{R}$.

(ii) For a.e. $\tau \in P$, $u^*(\tau,\cdot) \in W^{1,p(\tau)}_0(\Omega)$ is a unique minimizer of $E_\tau\colon W^{1,p(\tau)}_0(\Omega) \to \mathbb{R}$, for every $v \in W^{1,p(\tau)}_0(\Omega)$ defined by
$$E_\tau(v) := \frac{1}{p(\tau)} \int_\Omega |\nabla v|^{p(\tau)}\,\mathrm{d}x - \int_\Omega f(\tau,\cdot)\,v\,\mathrm{d}x .$$

(iii) For a.e. $\tau \in P$ and $v \in W^{1,p(\tau)}_0(\Omega)$, it holds
$$c(p(\tau))^{-1}\,\|F_\tau(\nabla v) - F_\tau(\nabla_x u^*(\tau,\cdot))\|^2_{L^2(\Omega)^d} \le E_\tau(v) - E_\tau(u^*(\tau,\cdot)) \le c(p(\tau))\,\|F_\tau(\nabla v) - F_\tau(\nabla_x u^*(\tau,\cdot))\|^2_{L^2(\Omega)^d} ,$$
where $F_\tau\colon \mathbb{R}^d \to \mathbb{R}^d$, $\tau \in P$, is defined by $F_\tau(a) := |a|^{\frac{p(\tau)-2}{2}} a$ for all $a \in \mathbb{R}^d$ and $c(p(\tau)) > 0$ is the constant from Proposition 2.

(iv) Furthermore, for every $v \in U$, it holds
$$\operatorname*{ess\,inf}_{\tau \in P} c(p(\tau))^{-1}\,\rho_F^2(v,u^*) \le E(v) - E(u^*) \le \operatorname*{ess\,sup}_{\tau \in P} c(p(\tau))\,\rho_F^2(v,u^*) ,$$
where
$$\rho_F^2(v,u^*) := \int_P \|F_\tau(\nabla_x v(\tau,\cdot)) - F_\tau(\nabla_x u^*(\tau,\cdot))\|^2_{L^2(\Omega)^d}\,\mathrm{d}\tau .$$

Proof. ad (i). The space $U$ equipped with the norm $\|\cdot\|_U := \|\cdot\|_{L^{p(\cdot)}(P\times\Omega)} + \|\,|\nabla_x \cdot|\,\|_{L^{p(\cdot)}(P\times\Omega)}$, where
$$\|v\|_{L^{p(\cdot)}(P\times\Omega)} := \inf\Big\{ \lambda > 0 \;\Big|\; \int_P\!\int_\Omega \Big|\frac{v(\tau,x)}{\lambda}\Big|^{p(\tau)}\,\mathrm{d}x\,\mathrm{d}\tau \le 1 \Big\}$$
denotes the Luxemburg norm, cf. Diening et al. (2011), is a reflexive Banach space, see (Kaltenbach, 2021, Proposition 3.7 & Proposition 3.9) or (Kaltenbach & Růžička, 2021, Proposition 3.6 & Proposition 3.7)$^{5}$. Evidently, $E\colon U \to \mathbb{R}$ is strictly convex and continuous. In addition, for every $v \in U$, due to Poincaré's inequality applied for a.e. fixed $\tau \in P$, which is allowed since $v(\tau,\cdot) \in W^{1,p(\tau)}_0(\Omega)$ for a.e. $\tau \in P$, we have that
$$\int_P\!\int_\Omega |v(\tau,x)|^{p(\tau)}\,\mathrm{d}x\,\mathrm{d}\tau \le \int_P (2\operatorname{diam}(\Omega))^{p(\tau)} \int_\Omega |\nabla_x v(\tau,x)|^{p(\tau)}\,\mathrm{d}x\,\mathrm{d}\tau \le (1 + 2\operatorname{diam}(\Omega))^{p^+} \int_P\!\int_\Omega |\nabla_x v(\tau,x)|^{p(\tau)}\,\mathrm{d}x\,\mathrm{d}\tau , \tag{24}$$
which for every $v \in U$ and $\varepsilon \in (0, \frac{1}{p^-}]$, using for a.e. $\tau \in P$ the $\varepsilon$-Young inequality with constant $c(p(\tau),\varepsilon) := \frac{(p(\tau)\varepsilon)^{1-p'(\tau)}}{p'(\tau)}$, implies that
$$\begin{aligned}
E(v) &\ge \int_P \frac{1}{p(\tau)} \int_\Omega |\nabla_x v(\tau,\cdot)|^{p(\tau)}\,\mathrm{d}x\,\mathrm{d}\tau - \int_P\!\int_\Omega \Big( c(p(\tau),\varepsilon)\,|f(\tau,\cdot)|^{p'(\tau)} + \varepsilon\,|v(\tau,\cdot)|^{p(\tau)} \Big)\,\mathrm{d}x\,\mathrm{d}\tau \\
&\ge \Big( \frac{1}{p^+} - \varepsilon\,(1 + 2\operatorname{diam}(\Omega))^{p^+} \Big) \int_P\!\int_\Omega |\nabla_x v(\tau,\cdot)|^{p(\tau)}\,\mathrm{d}x\,\mathrm{d}\tau - \frac{(p^-\varepsilon)^{1-(p^-)'}}{(p^+)'} \int_P\!\int_\Omega |f(\tau,\cdot)|^{p'(\tau)}\,\mathrm{d}x\,\mathrm{d}\tau .
\end{aligned} \tag{25}$$
Since $\int_P\int_\Omega |v(\tau,\cdot)|^{p(\tau)} + |\nabla_x v(\tau,\cdot)|^{p(\tau)}\,\mathrm{d}x\,\mathrm{d}\tau \to \infty$ if $\|v\|_U \to \infty$ (cf. (Diening et al., 2011, Lemma 3.2.4)), from equation 24 and equation 25 for $\varepsilon \in (0, \frac{1}{p^-}]$ sufficiently small, we conclude that from $\|v\|_U \to \infty$, it follows that $E(v) \to \infty$, i.e., $E\colon U \to \mathbb{R}$ is weakly coercive, so that the direct method in the calculus of variations, cf. Dacorogna (2007), yields the existence of a unique minimizer $u^* \in U$ of $E\colon U \to \mathbb{R}$.

ad (ii). A standard calculation shows that $E\colon U \to \mathbb{R}$ is continuously Fréchet differentiable with
$$\langle DE(u), v\rangle_U = \int_P \langle DE_\tau(u(\tau,\cdot)), v(\tau,\cdot)\rangle_{W^{1,p(\tau)}_0(\Omega)}\,\mathrm{d}\tau$$
for all $u, v \in U$. Therefore, due to the minimality of $u^* \in U$, for every $v \in U$, we have that
$$0 = \langle DE(u^*), v\rangle_U = \int_P \langle DE_\tau(u^*(\tau,\cdot)), v(\tau,\cdot)\rangle_{W^{1,p(\tau)}_0(\Omega)}\,\mathrm{d}\tau . \tag{26}$$
Inasmuch as $W^{1,p^+}_0(\Omega) \hookrightarrow W^{1,p(\tau)}_0(\Omega)$ densely for a.e. $\tau \in P$ and $W^{1,p^+}_0(\Omega)$ is separable and, thus, contains a countable dense subset $(\psi_k)_{k\in\mathbb{N}} \subseteq W^{1,p^+}_0(\Omega)$, the subset $(\psi_k)_{k\in\mathbb{N}}$ is also dense in $W^{1,p(\tau)}_0(\Omega)$ for a.e. $\tau \in P$. Next, choosing $v = \varphi\psi_k \in U$ in equation 26 for arbitrary $\varphi \in C^\infty_0(P)$ and $k \in \mathbb{N}$, we further deduce that
$$\int_P \langle DE_\tau(u^*(\tau,\cdot)), \psi_k\rangle_{W^{1,p(\tau)}_0(\Omega)}\,\varphi(\tau)\,\mathrm{d}\tau = 0 ,$$
so that for each fixed $k \in \mathbb{N}$, the fundamental lemma of calculus of variations implies that for a.e. $\tau \in P$,
$$\langle DE_\tau(u^*(\tau,\cdot)), \psi_k\rangle_{W^{1,p(\tau)}_0(\Omega)} = 0 . \tag{27}$$
Thus, since the countable union of sets of zero measure still has zero measure, we deduce from equation 27 that for a.e. $\tau \in P$, it holds for all $k \in \mathbb{N}$
$$\langle DE_\tau(u^*(\tau,\cdot)), \psi_k\rangle_{W^{1,p(\tau)}_0(\Omega)} = 0 . \tag{28}$$
Since $(\psi_k)_{k\in\mathbb{N}}$ is dense in $W^{1,p(\tau)}_0(\Omega)$ for a.e. $\tau \in P$, from equation 28 we infer that for a.e. $\tau \in P$, it holds for all $v \in W^{1,p(\tau)}_0(\Omega)$
$$\langle DE_\tau(u^*(\tau,\cdot)), v\rangle_{W^{1,p(\tau)}_0(\Omega)} = 0 .$$
Eventually, since for a.e. $\tau \in P$, the $p(\tau)$-Dirichlet energy $E_\tau\colon W^{1,p(\tau)}_0(\Omega) \to \mathbb{R}$ is strictly convex, for a.e. $\tau \in P$, the slice $u^*(\tau,\cdot) \in W^{1,p(\tau)}_0(\Omega)$ is a unique minimizer of $E_\tau\colon W^{1,p(\tau)}_0(\Omega) \to \mathbb{R}$.

ad (iii) and (iv). Follow from point (ii) and Proposition 2.

1 The subsets we have in mind consist of neural network functions of a given architecture and, thus, $v \in M$ is a neural network. But any choice of $M$ is admissible.
2 To see this, we refer to equation 7 in Proposition 2.
3 This result also holds for other activation functions; we refer to the original work.
4 Here, $L^0(P \times \Omega)$ denotes the space of scalar (Lebesgue-)measurable functions on $P \times \Omega$.
5 More precisely, these references prove only the case $N = 1$, since therein $P$ represents a time interval in an unsteady fluid flow problem. However, the proofs can be generalized verbatim to the case $N > 1$, so that we refrain from proving these results again at this point.
6 Here, we exploit that there exists $K > 0$ such that $K^{-1} \le \det(D\varphi_\tau) \le K$ in $\Omega(\tau)$ for all $\tau \in P$, cf. (Nägele et al., 2017, (3.1)).

be a bounded Lipschitz domain and $P \subseteq \mathbb{R}^{d_P}$, $d_P \in \mathbb{N}$, an open set. Moreover, let $f \in L^{p'}(P \times \Omega)$, $p \in (1,\infty)$, be such that $\int_\Omega f(\tau,\cdot)\,\mathrm{d}x = 0$ for a.e. $\tau \in P$. Denote by $u^* \in L^p(P, W^{1,p}(\Omega))$ a weak solution of the parametric $p$-Laplace problem with homogeneous Neumann boundary conditions, i.e.,

); Jiao et al. (2021); Duan et al. (2021); Müller & Zeinhofer (

usually referred to as the Natural Distance, cf. Diening & Růžička (2007); Diening et al. (2007); Diening & Ettwein (2008); Kaltenbach & Růžička (2022).

Figure 1: The plot on the left shows the relative $L^2$ errors obtained from Example 1 and the plot on the right reports the errors for Example 2. Here, dashed lines represent relative $H^1$ errors and solid lines stand for relative $L^2$ errors.

Weinan, Jiequn Han, and Arnulf Jentzen. Algorithms for Solving High Dimensional PDEs: From Nonlinear Monte Carlo to Machine Learning. Nonlinearity, 35(1):278, 2021.
Jinchao Xu. The Finite Neuron Method and Convergence Analysis. Communications in Computational Physics, 28, 2020.

AUTHOR CONTRIBUTIONS

All authors contributed equally to all parts of the manuscript.

C.2 PARAMETRIC DOMAINS

We consider parametric domains, where we focus on domains depending on only one parameter, as the required function spaces are only studied in this case. More precisely, we aim to solve […]. The precise requirements are given in the following proposition.

Proposition 17 (Variable Domains). Let $\Omega \subseteq \mathbb{R}^{d_\Omega}$, $d_\Omega \in \mathbb{N}$, be a bounded Lipschitz domain and $p \in (1,\infty)$. Moreover, let $\varphi_\tau\colon \Omega \to \Omega(\tau)$, $\tau \in P := (0,T)$, $T > 0$, be the induced flow of a smooth, compactly supported vector field $v\colon \mathbb{R} \times \mathbb{R}^d \to \mathbb{R}^d$, cf. (Delfour & Zolésio, 2011, Chapter 4). For the non-cylindrical domain $Q := \bigcup_{\tau \in P} \{\tau\} \times \Omega(\tau)$, we define the variable domain Bochner-Lebesgue space […], where the gradient $\nabla_x$ for a.e. $\tau \in P$ is to be understood with respect to the variable $x \in \Omega(\tau)$ only. For fixed $f \in L^{p'}(Q)$, we define the variable domain $p$-Dirichlet energy $E\colon U \to \mathbb{R}$ for every $v \in U$ by […]. Then, the following statements apply:

(i) There exists a unique (parametric) minimizer $u^* \in U$ of $E\colon U \to \mathbb{R}$.

(ii) For a.e. $\tau \in P$, $u^*(\tau,\cdot) \in W^{1,p}_0(\Omega(\tau))$ is a unique minimizer of $E_\tau\colon W^{1,p}_0(\Omega(\tau)) \to \mathbb{R}$, for every $v \in W^{1,p}_0(\Omega(\tau))$ defined by […].

(iii) For a.e. $\tau \in P$ and $v \in W^{1,p}_0(\Omega(\tau))$, it holds […], where $c(p) > 0$ is the constant from Proposition 2.

(iv) Furthermore, for every $v \in U$, it holds […], where […].

Proof. ad (i). The space $U$ equipped with the norm […] which for any $v \in U$ and $\varepsilon \in (0,1]$, using for each $\tau \in P$ the $\varepsilon$-Young inequality with constant $c(p,\varepsilon) :=$ […]. From equation 29 and equation 30, for $\varepsilon > 0$ sufficiently small, using that, by assumption, it holds $\sup_{\tau \in P} \operatorname{diam}(\Omega(\tau)) < \infty$$^{6}$, we conclude that from $\|v\|_U \to \infty$, it follows that $E(v) \to \infty$, i.e., $E\colon U \to \mathbb{R}$ is weakly coercive, so that the direct method in the calculus of variations, cf. Dacorogna (2007), yields the existence of a unique minimizer $u^* \in U$ of $E\colon U \to \mathbb{R}$.

ad (ii).

A direct calculation shows that $E\colon U \to \mathbb{R}$ is continuously Fréchet differentiable with […] for all $u, v \in U$. Therefore, due to the minimality of $u^* \in U$, for every $v \in U$, we have that […]. Since $W^{1,p}_0(\Omega(0))$ is separable, there exists a countable dense subset $(\psi_k)_{k\in\mathbb{N}} \subseteq W^{1,p}_0(\Omega(0))$. Also, appealing to (Nägele, 2015, Lemma 2.1), for any $\tau \in P$, the pull-backs $((\varphi_\tau^{-1})^*\psi_k)_{k\in\mathbb{N}}$ are dense in $W^{1,p}_0(\Omega(\tau))$. In addition, (Nägele et al., 2017, p. 6 […]
$$\int_P \langle DE_\tau(u^*(\tau,\cdot)), (\varphi_\tau^{-1})^*\psi_k\rangle_{W^{1,p}_0(\Omega(\tau))}\,\varphi(\tau)\,\mathrm{d}\tau = 0 ,$$
so that, owing to the countability of $(\psi_k)_{k\in\mathbb{N}}$, the fundamental lemma of calculus of variations implies that for a.e. $\tau \in P$, it holds for all $k \in \mathbb{N}$
$$\langle DE_\tau(u^*(\tau,\cdot)), (\varphi_\tau^{-1})^*\psi_k\rangle_{W^{1,p}_0(\Omega(\tau))} = 0 .$$
As $((\varphi_\tau^{-1})^*\psi_k)_{k\in\mathbb{N}}$ is dense in $W^{1,p}_0(\Omega(\tau))$ for all $\tau \in P$, we find that for a.e. $\tau \in P$, it holds for all $v \in W^{1,p}_0(\Omega(\tau))$
$$\langle DE_\tau(u^*(\tau,\cdot)), v\rangle_{W^{1,p}_0(\Omega(\tau))} = 0 .$$
Eventually, since for every $\tau \in P$, the $p$-Dirichlet energy $E_\tau\colon W^{1,p}_0(\Omega(\tau)) \to \mathbb{R}$ is strictly convex, for a.e. $\tau \in P$, the slice $u^*(\tau,\cdot) \in W^{1,p}_0(\Omega(\tau))$ is a unique minimizer of $E_\tau\colon W^{1,p}_0(\Omega(\tau)) \to \mathbb{R}$.

ad (iii) and (iv). Follow from point (ii) and Proposition 2.
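The Luxemburg norm of the variable exponent Lebesgue space used in Section C.1 has no closed form when the exponent varies, but it can be approximated by bisection because the modular is decreasing in $\lambda$. The following is a minimal sketch under our own assumptions: the function is sampled on a tensor grid, row $i$ of `values` belongs to the parameter value with exponent `exponents[i]`, and every grid cell carries the same quadrature weight `weight`. These names and the discretization are illustrative and not taken from the analysis above.

```python
def modular(values, exponents, weight, lam):
    """Discrete modular: sum over the grid of weight * |v / lam|^{p(tau)}."""
    return sum(
        weight * abs(v / lam) ** p
        for row, p in zip(values, exponents)
        for v in row
    )


def luxemburg_norm(values, exponents, weight, iters=200):
    """Approximate inf{lam > 0 : modular(v / lam) <= 1} by bisection."""
    if all(v == 0 for row in values for v in row):
        return 0.0
    # Find an upper bracket with modular <= 1.
    hi = 1.0
    while modular(values, exponents, weight, hi) > 1.0:
        hi *= 2.0
    # Find a lower bracket with modular > 1.
    lo = hi
    while modular(values, exponents, weight, lo) <= 1.0:
        lo /= 2.0
    # Bisection keeps modular(hi) <= 1 < modular(lo).
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if modular(values, exponents, weight, mid) <= 1.0:
            hi = mid
        else:
            lo = mid
    return hi
```

For a constant exponent $p$, the result agrees with the usual weighted $\ell^p$ norm, which provides a simple sanity check of the sketch.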

