NEURAL IMPLICIT MANIFOLD LEARNING FOR TOPOLOGY-AWARE DENSITY ESTIMATION

Abstract

Natural data observed in R^n is often constrained to an m-dimensional manifold M, where m < n. Current probabilistic models learn this manifold by mapping an m-dimensional latent variable through a neural network f_θ : R^m → R^n. Such procedures, which we call pushforward models, incur a straightforward limitation: manifolds cannot in general be represented with a single parameterization, meaning that attempts to do so will incur either computational instability or the inability to learn probability densities within the manifold. To remedy this problem, we propose to model M as a neural implicit manifold: the set of zeros of a neural network. To learn the data distribution within M, we introduce constrained energy-based models, which use a constrained variant of Langevin dynamics to train and sample within a learned manifold. The resulting model can be manipulated with an arithmetic of manifolds which allows practitioners to take unions and intersections of model manifolds. In experiments on synthetic and natural data, we show that constrained EBMs can learn manifold-supported distributions with complex topologies more accurately than pushforward models.

1. INTRODUCTION

Here we focus on the common statistical task of estimating an unknown probability distribution P* using a set of datapoints {x_i} ⊂ R^n sampled from P*. Commonly, the distribution of interest lies on an m-dimensional Riemannian submanifold M embedded in the ambient space R^n, with m < n. For example, data from engineering or the natural sciences can be manifold-supported due to smooth physical constraints (Mardia et al., 2007; Boomsma et al., 2008; Brehmer & Cranmer, 2020). In general, the underlying submanifold M may be unknown a priori, which calls for us to design models which learn M in the process of learning P*. The typical paradigm for modelling distributions on learned manifolds is a pushforward model: a neural parameterization f_θ : R^m → R^n trained to transform an m-dimensional prior into a flexible distribution on the data manifold embedded in R^n (e.g. Arjovsky et al. (2017); Tolstikhin et al. (2018); Arbel et al. (2021)). These techniques can generate high-resolution images, but are insufficiently flexible for learning distributions in settings where the true manifold structure is of interest. Modelling a manifold as the image of a single mapping f_θ is topologically restrictive. For example, many approaches encourage an encoder g_ϕ and decoder f_θ to mutually invert each other at each datapoint (e.g. Donahue et al. (2017); Dumoulin et al. (2017); Xiao et al. (2019)), an objective we can precisely reinterpret as training f_θ to become a diffeomorphism between M and a subset of the latent space R^m. This specification conflicts with the fact that, in general, M may have a complex topology which is not diffeomorphic to any such subset, exposing f_θ to a frontier of tradeoffs between expressivity and numerical stability (Cornish et al., 2020; Behrmann et al., 2021; Salmona et al., 2022). Even when f_θ is not a diffeomorphism, its continuity dictates many topological properties of the model manifold, such as connectivity and the number of holes (Munkres, 2000).
In this paper we learn data manifolds with a much broader class of topologies using a novel approach outlined in Figure 1. We first learn a manifold implicitly as the zero set of a neural network F_θ, controlling the manifold dimension by regularizing the rank of its Jacobian. We then model the density within the manifold using a constrained energy-based model E_ψ, which uses constrained Langevin dynamics to sample points on the learned manifold. We show that constrained energy-based models on manifolds can be composed with each other akin to standard energy-based models (Hinton, 2002): manifold-defining functions F_θ along with their energies E_ψ can be combined to take unions and intersections of data manifolds in what we call manifold arithmetic. We demonstrate theoretically and empirically that the proposed model can learn manifold-supported distributions more accurately than the pushforward paradigm prevalent in the current literature.
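The manifold arithmetic described above can be illustrated with hand-written constraint functions standing in for learned networks. The following is a minimal sketch, assuming scalar-valued manifold-defining functions (F1, F2, and the point names are illustrative, not the paper's notation): stacking outputs intersects zero sets, while multiplying them takes unions.

```python
import numpy as np

# Illustrative stand-ins for learned manifold-defining functions F_theta.
# Zero sets: F1 defines the unit circle in R^2, F2 the horizontal axis.
def F1(x):
    return np.array([x[0]**2 + x[1]**2 - 1.0])

def F2(x):
    return np.array([x[1]])

# Intersection: stack the outputs. A point is in the zero set of the
# stacked function iff it lies in BOTH zero sets (circle ∩ axis).
def F_intersection(x):
    return np.concatenate([F1(x), F2(x)])

# Union: multiply the scalar outputs. The product vanishes iff at least
# one factor vanishes (circle ∪ axis).
def F_union(x):
    return F1(x) * F2(x)

p_on_circle = np.array([0.0, 1.0])   # on the circle, off the axis
p_on_both = np.array([1.0, 0.0])     # on the circle AND the axis

print(np.allclose(F_union(p_on_circle), 0.0))         # True
print(np.allclose(F_intersection(p_on_circle), 0.0))  # False
print(np.allclose(F_intersection(p_on_both), 0.0))    # True
```

With learned networks in place of F1 and F2, the same constructions yield unions and intersections of model manifolds; the paper's exact composition rules for vector-valued outputs may differ from this scalar sketch.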

2.1. MODELLING MANIFOLD-SUPPORTED DATA

Manifold structure. As above, suppose {x_i} is a set of samples drawn from a probability measure P* supported on M, an m-dimensional Riemannian submanifold of R^n. We focus on the case where m < n, so that M is "infinitely thin" in R^n, meaning P* does not admit a probability density with respect to the standard Lebesgue measure. However, we may assume it has a density p*(x) with respect to the Riemannian measure of M. We elaborate on this setup in Appendix A. Models for manifold-supported data have long been of interest in statistics, machine learning, and various applications (Diaconis et al., 2013; McInnes et al., 2018). In particular, a number of past works have explored Monte Carlo methods on manifolds (Brubaker et al., 2012; Byrne & Girolami, 2013; Zappa et al., 2018), which we put to use here. However, the problem of simultaneously learning a submanifold and an underlying density has only become of interest in tandem with recent advances in deep generative modelling (Brehmer & Cranmer, 2020). To our knowledge, all such models fall under the umbrella of pushforward models.

Density estimation with pushforward models. When manifold-supported, P* is most commonly modelled as the pushforward of some latent distribution: z ∼ p_ψ(z), x = f_θ(z).
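The pushforward sampling procedure just defined can be sketched with a fixed map in place of a trained network. This is a minimal illustration, assuming a hand-chosen f (not a learned f_θ) whose image is the unit circle, a 1-dimensional manifold embedded in R^2:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(z):
    # Maps R^1 -> R^2; the image lies on the unit circle.
    return np.stack([np.cos(z), np.sin(z)], axis=-1)

z = rng.normal(size=1000)   # sample the latent prior p(z)
x = f(z)                    # pushforward samples in ambient space

# Every sample satisfies the manifold constraint ||x|| = 1.
radii = np.linalg.norm(x, axis=-1)
print(np.allclose(radii, 1.0))   # True
```

Note that because f is continuous, its image is connected: no single such map could cover, say, two disjoint circles, which is the topological restriction discussed in the introduction.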

Figure 1: In the top row, our method is depicted on simulated circular data from a von Mises distribution. From left to right: ground truth sample of von Mises data, a manifold-defining function F_θ learned from the data, and an ambient energy E_ψ trained with constrained Langevin dynamics on the learned manifold. In the bottom row, manifold learning and density estimation results from the resulting model are juxtaposed with a pushforward baseline. From left to right: the ground truth, a pushforward energy-based model, and a constrained energy-based model (ours). By defining the manifold with a constraint F_θ(x) = 0, our method can model data with non-trivial topologies.
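The constrained Langevin sampler referenced in the caption can be sketched on the same circular example. This is a minimal sketch under stated assumptions: F and grad_E are hand-written stand-ins for the networks F_θ and E_ψ, and the tangent-space projection with a Newton-style retraction is one standard constrained-sampling scheme, not necessarily the exact update used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def F(x):        # manifold-defining function: unit circle in R^2
    return np.array([x[0]**2 + x[1]**2 - 1.0])

def J(x):        # Jacobian of F, shape (1, 2)
    return np.array([[2.0 * x[0], 2.0 * x[1]]])

def grad_E(x):   # gradient of a toy ambient energy E(x) = -x[0]
    return np.array([-1.0, 0.0])

def tangent_project(x, v):
    # Remove the component of v normal to M: v - J^T (J J^T)^{-1} J v.
    Jx = J(x)
    lam = np.linalg.solve(Jx @ Jx.T, Jx @ v)
    return v - Jx.T @ lam

def retract(x, iters=5):
    # Newton-style correction pulling x back onto {F = 0}.
    for _ in range(iters):
        Jx = J(x)
        x = x - Jx.T @ np.linalg.solve(Jx @ Jx.T, F(x))
    return x

x, eta = np.array([0.0, 1.0]), 0.01
for _ in range(500):
    noise = rng.normal(size=2)
    step = -eta * grad_E(x) + np.sqrt(2.0 * eta) * noise
    x = retract(x + tangent_project(x, step))

print(abs(F(x)[0]) < 1e-8)   # True: the chain stays on the manifold
```

Each iteration takes a noisy gradient step in the tangent space of the constraint set and then corrects back onto it, so the chain explores the circle while concentrating in low-energy regions.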

