TOWARDS ANTISYMMETRIC NEURAL ANSATZ SEPARATION

Abstract

We study separations between two fundamental models (or Ansätze) of antisymmetric functions, that is, functions $f$ of the form $f(x_{\sigma(1)}, \ldots, x_{\sigma(N)}) = \mathrm{sign}(\sigma) f(x_1, \ldots, x_N)$, where $\sigma$ is any permutation. These arise in the context of quantum chemistry, and are the basic modeling tool for wavefunctions of Fermionic systems. Specifically, we consider two popular antisymmetric Ansätze: the Slater representation, which leverages the alternating structure of determinants, and the Jastrow ansatz, which augments Slater determinants with a product by an arbitrary symmetric function. We construct an antisymmetric function that can be more efficiently expressed in Jastrow form, yet provably cannot be approximated by Slater determinants unless there are exponentially (in $N^2$) many terms. This represents the first explicit quantitative separation between these two Ansätze.

1. INTRODUCTION

Neural networks have proven very successful in parametrizing non-linear approximation spaces in high dimensions, thanks to the ability of neural architectures to leverage the physical structure and symmetries of the problem at hand, while preserving universal approximation. The successes cover many areas of engineering and computational science, from computer vision (Krizhevsky et al., 2017) to protein folding (Jumper et al., 2021). In each case, modifying the architecture (e.g. by adding layers, adjusting the activation function, etc.) has intricate effects on the approximation, statistical and optimization errors. An important aspect of this puzzle is to first understand the approximation abilities of a given neural architecture against a class of target functions having a certain assumed symmetry (LeCun et al., 1995; Cohen et al., 2018). For instance, symmetric functions that are permutation-invariant, i.e. $f(x_{\sigma(1)}, \ldots, x_{\sigma(N)}) = f(x_1, \ldots, x_N)$ for all $x_1, \ldots, x_N$ and all permutations $\sigma : \{1, \ldots, N\} \to \{1, \ldots, N\}$, can be universally approximated by several neural architectures, e.g. DeepSets (Zaheer et al., 2017) or Set Transformers (Lee et al., 2019); their approximation properties (Zweig & Bruna, 2022) thus offer a first glimpse of their efficiency across different learning tasks.

In this work, we focus on quantum chemistry applications, namely characterizing ground states of many-body quantum systems. These are driven by the fundamental Schrödinger equation, an eigenvalue problem of the form $H\Psi = \lambda\Psi$, where $H$ is the Hamiltonian associated to a particle system defined over a product space $\Omega^{\otimes N}$, and $\Psi$ is the wavefunction, a complex-valued function $\Psi : \Omega^{\otimes N} \to \mathbb{C}$ whose squared modulus $|\Psi(x_1, \ldots, x_N)|^2$ describes the probability of encountering the system in the state $(x_1, \ldots, x_N) \in \Omega^{\otimes N}$. A particularly important task is to compute the ground state, associated with the smallest eigenvalue of $H$.
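To make the permutation-invariance property concrete, here is a minimal sketch of a DeepSets-style symmetric function of the form $\phi(\sum_i \rho(x_i))$, where the particular choices of the feature map `rho` and readout `phi` are hypothetical and purely illustrative; any such sum-pooled composition is invariant to reordering its inputs.

```python
import itertools
import math

# Per-particle feature map (illustrative choice).
def rho(x):
    return (x, x * x)

# Readout applied to the pooled (summed) features (illustrative choice).
def phi(s):
    return math.tanh(s[0]) + s[1]

# DeepSets-style symmetric function: f(x_1,...,x_N) = phi(sum_i rho(x_i)).
def f(xs):
    pooled = [sum(feat) for feat in zip(*(rho(x) for x in xs))]
    return phi(pooled)

xs = [0.3, -1.2, 2.5]
# Permutation invariance: every reordering of the inputs gives the same value.
vals = {round(f(list(p)), 12) for p in itertools.permutations(xs)}
assert len(vals) == 1
```

The invariance here comes entirely from the sum pooling, independently of how `rho` and `phi` are chosen.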
On Fermionic systems, the wavefunction satisfies an additional property, derived from Pauli's exclusion principle: the wavefunction is antisymmetric, meaning that $\Psi(x_{\sigma(1)}, \ldots, x_{\sigma(N)}) = \mathrm{sign}(\sigma) \Psi(x_1, \ldots, x_N)$. The antisymmetry constraint is an uncommon one, and therefore demands dedicated architectures to enforce it. The quintessential antisymmetric function is a Slater determinant (Szabo & Ostlund, 2012), which we now briefly describe. Given functions $f_1, \ldots, f_N : \Omega \to \mathbb{C}$, they define a rank-one tensor mapping $f_1 \otimes \cdots \otimes f_N : \Omega^{\otimes N} \to \mathbb{C}$ by $(f_1 \otimes \cdots \otimes f_N)(x_1, \ldots, x_N) := \prod_{j \le N} f_j(x_j)$. The Slater determinant is then the orthogonal projection of a rank-one tensor onto the antisymmetric space. In other words, the rank-one tensor $f_1 \otimes \cdots \otimes f_N$ is projected to
$$\mathcal{A}(f_1 \otimes \cdots \otimes f_N) := \frac{1}{N!} \sum_{\sigma \in S_N} (-1)^{\sigma} f_{\sigma(1)} \otimes \cdots \otimes f_{\sigma(N)}.$$
In coordinates, this expression becomes
$$\mathcal{A}(f_1 \otimes \cdots \otimes f_N)(x_1, \ldots, x_N) = \frac{1}{N!} \det \begin{pmatrix} f_1(x_1) & \cdots & f_1(x_N) \\ f_2(x_1) & \cdots & f_2(x_N) \\ \vdots & & \vdots \\ f_N(x_1) & \cdots & f_N(x_N) \end{pmatrix},$$
which shows that it is antisymmetric, by the alternating property of the determinant. The Slater Ansatz is then simply a linear combination of several Slater determinants, of the form $F(x) = \sum_{l \le L} \mathcal{A}(f_1^l \otimes \cdots \otimes f_N^l)$, similar to a shallow (Euclidean) neural network formed as a linear combination of simple non-linear ridge functions. While this defines a universal approximation class for antisymmetric functions (as a direct consequence of Weierstrass universal approximation theorems for polynomials), the approximation rates will generally be cursed by the dimensionality of the input space, as is also the case when studying lower bounds for standard shallow neural networks (Maiorov & Meir, 1998). In the case of particles in $\Omega = \mathbb{R}$ or $\mathbb{C}$, it is classical that every antisymmetric function can be written as the product of a symmetric function with the Vandermonde determinant (see Section 3).
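The coordinate formula above can be checked numerically. The sketch below builds a Slater determinant with the $1/N!$ normalization from the text, using monomial orbitals $f_j(x) = x^{j-1}$ as an illustrative (hypothetical) choice, and verifies that permuting the particle coordinates multiplies the value by $\mathrm{sign}(\sigma)$; with these particular orbitals the determinant is in fact the Vandermonde determinant.

```python
import itertools
import math
import numpy as np

# Slater determinant A(f_1 ⊗ ... ⊗ f_N) via the determinant formula,
# with the 1/N! normalization used in the text.
# Orbitals f_j(x) = x**(j-1) are an arbitrary illustrative choice.
def slater(xs):
    N = len(xs)
    M = np.array([[x ** j for x in xs] for j in range(N)], dtype=float)
    return float(np.linalg.det(M)) / math.factorial(N)

# Sign of a permutation via its inversion count.
def perm_sign(perm):
    inv = sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm))
              if perm[i] > perm[j])
    return -1 if inv % 2 else 1

xs = [0.5, -1.0, 2.0]
base = slater(xs)
# Antisymmetry: permuting the arguments multiplies the value by sign(sigma).
for perm in itertools.permutations(range(3)):
    assert abs(slater([xs[i] for i in perm]) - perm_sign(perm) * base) < 1e-9
```

Swapping any two columns of the matrix flips the sign of the determinant, which is exactly the alternating property the text invokes.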
This setting is generally considered much easier than settings with higher-dimensional particles, where this Vandermonde factorization no longer applies, though there are still Ansätze that mimic this formulation (Han et al., 2019b). A more powerful variant is the Jastrow Ansatz, where each Slater determinant is 'augmented' with a symmetric prefactor (Jastrow, 1955), i.e. $G = p \cdot \mathcal{A}(f_1 \otimes \cdots \otimes f_N)$, where $p$ is permutation-invariant. Clearly, $G$ is still antisymmetric, since the product of an antisymmetric function with a symmetric one is again antisymmetric, but the prefactor grants more representational power. Other parametrisations building on Jastrow are popular in the literature, e.g. backflow (Feynman & Cohen, 1956), which models particle interactions by composing the Slater determinant with a permutation-equivariant change of variables. Among practitioners, it is common knowledge that the Slater Ansatz is inefficient compared to Jastrow or other more advanced parameterizations. Yet, there is no proven separation evinced by a particular hard antisymmetric function. We note that the Jastrow ansatz is strictly generalized by backflow (see Section 3), so separations between Slater and Jastrow would have immediate consequences for separations from the stronger architectures as well. In this work, we are interested in understanding quantitative differences in approximation power between these two classes. Specifically, we wish to find antisymmetric target functions $G$ such that $G$ can be efficiently approximated with the Jastrow ansatz, i.e. approximated to $\epsilon$ error in the infinity norm, with some modest dependence on the parameters $N$ and $\epsilon$, by a single Slater determinant with a single symmetric prefactor, yet no Slater representation can approximate $G$ for reasonably small widths.
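The closure argument for Jastrow (antisymmetric times symmetric is antisymmetric) can likewise be verified numerically. In this sketch, both the symmetric prefactor $p$ (a pairwise Gaussian-style factor) and the monomial orbitals are hypothetical illustrative choices, not the construction used later in the paper.

```python
import itertools
import math
import numpy as np

# Slater determinant with 1/N! normalization; monomial orbitals f_j(x)=x**(j-1)
# are an illustrative choice.
def slater(xs):
    N = len(xs)
    M = np.array([[x ** j for x in xs] for j in range(N)], dtype=float)
    return float(np.linalg.det(M)) / math.factorial(N)

# Symmetric (permutation-invariant) prefactor p: depends only on the
# unordered pairs, so reordering the particles leaves it unchanged.
def p(xs):
    return math.exp(-sum((a - b) ** 2 for a, b in itertools.combinations(xs, 2)))

# Jastrow-style ansatz: G = p * A(f_1 ⊗ ... ⊗ f_N).
def G(xs):
    return p(xs) * slater(xs)

def perm_sign(perm):
    inv = sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm))
              if perm[i] > perm[j])
    return -1 if inv % 2 else 1

xs = [0.4, 1.1, -0.7]
base = G(xs)
# G inherits antisymmetry from the Slater factor, since p is symmetric.
for perm in itertools.permutations(range(3)):
    assert abs(G([xs[i] for i in perm]) - perm_sign(perm) * base) < 1e-9
```

The point of the separation studied in the paper is that the prefactor $p$ can absorb structure that would otherwise require exponentially many Slater determinants to express.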
This question mirrors the issue of depth separation in deep learning theory, where one seeks functions that exhibit a separation between, for example, two-layer and three-layer networks (Eldan & Shamir, 2016), as well as recent separations between classes of symmetric representations (Zweig & Bruna, 2022).

Main Contribution: We prove the first explicit separation between the two Ansätze, constructing an antisymmetric function $G$ such that:
• In some norm, $G$ cannot be approximated better than constant error by the Slater ansatz, unless there are $O(e^{N^2})$ many Slater determinants.
• $G$ can be written in the Jastrow ansatz with neural network widths bounded by $\mathrm{poly}(N)$ for specific activations, or by $N^{O(N)}$ using the complex ReLU.

