ON REPRESENTING (ANTI)SYMMETRIC FUNCTIONS

Abstract

Permutation-invariant, -equivariant, and -covariant functions and anti-symmetric functions are important in quantum physics, computer vision, and other disciplines. (Anti)symmetric neural networks have recently been developed and applied with great success. A few theoretical approximation results have been proven, but many questions are still open, especially for particles in more than one dimension and the anti-symmetric case, which this work focusses on. More concretely, we derive natural polynomial approximations in the symmetric case, and approximations based on a single generalized Slater determinant in the antisymmetric case. Unlike some previous super-exponential and discontinuous approximations, these seem a more promising basis for future tighter bounds. In the supplementary we also provide a complete and explicit universality proof of the Equivariant MultiLayer Perceptron, which implies universality of symmetric MLPs and the FermiNet.

1. INTRODUCTION

Neural Networks (NN), or more precisely, Multi-Layer Perceptrons (MLP), are universal function approximators [Pin99] in the sense that every (say) continuous function can be approximated arbitrarily well by a sufficiently large NN. The true power of NNs, though, stems from the fact that they apparently have a bias towards functions we care about and that they can be trained by local gradient descent or variations thereof.

For many problems we have additional information about the function, e.g. symmetries under which the function of interest is invariant or covariant. Here we consider functions that are covariant¹ under permutations.² Of particular interest are functions that are invariant³, equivariant⁴, or antisymmetric under permutations.

Definition 1 ((Anti)symmetric and equivariant functions) A function $\varphi : X^n \to \mathbb{R}$ in $n \in \mathbb{N}$ variables is called symmetric iff $\varphi(x_1, ..., x_n) = \varphi(x_{\pi(1)}, ..., x_{\pi(n)})$ for all $x_1, ..., x_n \in X$ and all permutations $\pi \in S_n$, where $S_n := \{\pi : \{1:n\} \to \{1:n\} \wedge \pi \text{ is a bijection}\}$ is called the symmetric group and $\{1:n\}$ is short for $\{1, ..., n\}$. Similarly, a function $\psi : X^n \to \mathbb{R}$ is called anti-symmetric (AS) iff $\psi(x_1, ..., x_n) = \sigma(\pi)\,\psi(x_{\pi(1)}, ..., x_{\pi(n)})$, where $\sigma(\pi) = \pm 1$ is the parity or sign of permutation $\pi$. A function $\phi : X^n \to X^n$ is called equivariant under permutations iff $\phi(S_\pi(x)) = S_\pi(\phi(x))$, where $x \equiv (x_1, ..., x_n)$ and $S_\pi(x_1, ..., x_n) := (x_{\pi(1)}, ..., x_{\pi(n)})$.

Of course (anti)symmetric functions are also just functions, hence an NN of sufficient capacity can also represent (anti)symmetric functions, and if trained on an (anti)symmetric target could converge to an (anti)symmetric function. But NNs that can represent only (anti)symmetric functions are desirable for multiple reasons.
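The three identities of Definition 1 can be checked by brute force for small n. The following is a minimal sketch in plain Python, not part of the paper's construction: the example functions `sym_f` (a power sum), `antisym_f` (a Vandermonde-type product) and `equiv_f` (a mean shift) are hypothetical choices made here for illustration, with X taken to be the reals and indices 0-based.

```python
from itertools import permutations
from math import isclose, prod


def sign(perm):
    """Parity sigma(pi) in {+1, -1}, computed by counting inversions."""
    n = len(perm)
    inversions = sum(perm[i] > perm[j] for i in range(n) for j in range(i + 1, n))
    return -1 if inversions % 2 else 1


def permute(perm, x):
    """S_pi(x) = (x_pi(1), ..., x_pi(n)), written with 0-based indices."""
    return tuple(x[p] for p in perm)


def sym_f(x):
    """Illustrative symmetric function (assumption): sum of squares."""
    return sum(xi ** 2 for xi in x)


def antisym_f(x):
    """Illustrative antisymmetric function (assumption): prod_{i<j} (x_j - x_i)."""
    n = len(x)
    return prod(x[j] - x[i] for i in range(n) for j in range(i + 1, n))


def equiv_f(x):
    """Illustrative equivariant function (assumption): add the mean to every entry."""
    mean = sum(x) / len(x)
    return tuple(xi + mean for xi in x)


def check_definition(x):
    """Return True iff all three identities hold at x for every pi in S_n."""
    for perm in permutations(range(len(x))):
        px = permute(perm, x)
        if not isclose(sym_f(x), sym_f(px)):
            return False
        if not isclose(antisym_f(x), sign(perm) * antisym_f(px)):
            return False
        lhs, rhs = equiv_f(px), permute(perm, equiv_f(x))
        if not all(isclose(a, b) for a, b in zip(lhs, rhs)):
            return False
    return True
```

Calling `check_definition((0.5, -1.0, 2.0, 3.5))` loops over all 4! = 24 permutations. Such a check is only a sanity test at a single point and costs n! evaluations, which is one reason architectures that are (anti)symmetric by construction are preferable.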
Equivariant MLP (EMLP) are the basis for constructing symmetric functions by simply summing the output of the last layer, and for anti-symmetric (AS) functions by multiplying with Vandermonde determinants or by computing their generalized Slater determinant (GSD) defined later. The most prominent application is in quantum physics, which represents systems of identical bosons (fermions) by symmetric (antisymmetric) wave functions [PSMF20]. Another application is classification of point clouds in computer vision, which should be invariant under permutation of points [ZKR+18].

Even if a general NN can learn the (anti)symmetry, it will only do so approximately, but some applications require exact (anti)symmetry, for instance in quantum physics to guarantee upper bounds on the true ground state energy [PSMF20]. This has spawned interest in NNs that can represent only (anti)symmetric functions [ZKR+18, HLL+19]. A natural question is whether such NNs can represent all reasonable (anti)symmetric functions, which is the focus of this paper. We will answer this question for the (symmetric) EMLP [ZKR+18] defined in Section 6 and for the (AS) FermiNet [PSMF20] defined in Sections 4&5&6.

Approximation architectures need to satisfy a number of criteria to be practically useful: (a) they can approximate a large class of functions, e.g. all continuous (anti)symmetric functions, (b) only the (anti)symmetric functions can be represented, (c) a fast algorithm exists for computing the approximation, (d) the representation itself is continuous or differentiable, (e) the architecture is suitable for learning the function from data (which we don't discuss).

Section 2 reviews existing approximation results for (anti)symmetric functions. Section 3 discusses various "naive" representations (linear, sampling, sorting) and their (dis)advantages, before introducing the "standard" solution that satisfies (a)-(e) based on algebraic composition of basis functions, symmetric polynomials, and polarized bases. For simplicity the section considers only totally symmetric functions of their n real-valued inputs (the d = 1 case), i.e. particles in one dimension. Section 4 proves the representation power of a single GSD for totally anti-symmetric (AS) functions (also d = 1). Technically we reduce the GSD to a Vandermonde determinant, and determine the loss of differentiability due to the Vandermonde determinant.

From Section 5 on we consider the general case of functions with $n \cdot d$ inputs that are (anti)symmetric when permuting their n d-dimensional input vectors. The case d = 3 is particularly relevant for particles and point clouds in 3D space. The difficulties encountered for d = 1 transfer to d > 1, while the positive results don't, or only with considerable extra effort. The universality construction and proof for the EMLP is outlined in Section 6, with a proper treatment and all details in Sections 6-8 of the supplementary, which implies universality of symmetric MLPs and of the AS FermiNet. Section 7 concludes. We took great care to unify notation from different sources. The list of notation in the appendix should be helpful to disambiguate some similar-looking but different notation.

Our main novel contributions are establishing the universality of the anti-symmetric FermiNet with a single GSD (Theorems 3&5&7) for d = 1 and d > 1 (the results are non-trivial and unexpected), and the universality of (2-hidden-layer) symmetric MLPs (Theorem 6) with a complete, explicit, and self-contained equivariant universality construction based on (smooth) polynomials. We took care to avoid relying on results with inherently asymptotic or tabulation or discontinuous character, to enable (in future work) good approximation rates for specific function classes, such as smooth functions or those with 'nice' Fourier transform [Bar93, Mak96]. The supplementary material contains the extended version of this paper with (more) details, discussion, and proofs.

¹ In full generality, a function $f : X \to Y$ is covariant under group operations $g \in G$ if $f(R^X_g(x)) = R^Y_g(f(x))$, where $R^X_g : X \to X$ and $R^Y_g : Y \to Y$ are representations of group (element) $g \in G$.
² The symmetric group $G = S_n$ is the group of all permutations, i.e. bijections $\pi : \{1, ..., n\} \to \{1, ..., n\}$.
³ $R^Y_g$ = Identity. Permutation-invariant functions are also called 'totally symmetric functions' or simply 'symmetric functions'.
⁴ General $Y$ and $X$, often $Y = X$ and $R^Y_g = R^X_g$, also called covariant.

2. RELATED WORK

The study of universal approximation properties of NN has a long history, see e.g. [Pin99] for a pre-millennium survey, and e.g. [LSYZ20] for recent results and references. For (anti)symmetric NN such investigation has only recently begun [ZKR+18, WFE+19, HLL+19, SI19].

Functions on sets are necessarily invariant under permutation, since the order of set elements is irrelevant. For countable domain, [ZKR+18] derive a general representation based on encoding domain elements as bits into the binary expansion of real numbers. They conjecture that the construction

