DENSITY ESTIMATION ON LOW-DIMENSIONAL MANIFOLDS: AN INFLATION-DEFLATION APPROACH

Abstract

Normalizing Flows (NFs) are universal density estimators based on Neural Networks. However, this universality is limited: the density's support needs to be diffeomorphic to a Euclidean space. In this paper, we propose a novel method to overcome this limitation without sacrificing universality. The proposed method inflates the data manifold by adding noise in the normal space, trains an NF on this inflated manifold, and, finally, deflates the learned density. Our main result provides sufficient conditions on the manifold and the specific choice of noise under which the corresponding estimator is exact. Our method has the same computational complexity as NFs and does not require computing an inverse flow. We also show that, if the embedding dimension is much larger than the manifold dimension, noise in the normal space can be well approximated by Gaussian noise. This allows using our method for approximating arbitrary densities on non-flat manifolds, provided that the manifold dimension is known.

1. INTRODUCTION

Many modern problems involving high-dimensional data are formulated probabilistically. Key concepts, such as Bayesian classification, denoising, or anomaly detection, rely on the data-generating density p*(x). Learning this density from samples is therefore a research area of crucial importance. For the case where the corresponding random variable X ∈ R^D takes values on a manifold diffeomorphic to R^D, a Normalizing Flow (NF) can be used to learn p*(x) exactly (Huang et al., 2018). Recently, a few attempts have been made to overcome this topological constraint. However, to do so, all of these methods either need to know the manifold beforehand (Gemici et al. (2016), Rezende et al. (2020)) or sacrifice the exactness of the estimate (Cornish et al. (2019), Dupont et al. (2019)). Our goal in this paper is to overcome both of the aforementioned limitations of using NFs for density estimation on Riemannian manifolds. Given data points from a d-dimensional Riemannian manifold embedded in R^D, d < D, we first inflate the manifold by adding a specific noise in the normal-space direction of the manifold, then train an NF on this inflated manifold, and, finally, deflate the trained density by exploiting the choice of noise and the geometry of the manifold. See Figure 1 for a schematic overview of these steps. Our main theorem states sufficient conditions on the manifold and the type of noise used for the inflation step such that the deflation becomes exact. To guarantee exactness, we do need to know the manifold, as in e.g. Rezende et al. (2020), because we need to be able to sample in the manifold's normal space. However, as we will show, for the special case where D ≫ d, the usual Gaussian noise is an excellent approximation for noise in the normal space. This allows using our method for approximating arbitrary densities on Riemannian manifolds provided that the manifold dimension is known. In addition, our method is based on a single NF without the necessity to invert it.
Hence, we do not add any additional complexity to the usual training procedure of NFs.

Figure 1: Schematic overview of our method. 1. A density p*(x) with support on a d-dimensional manifold X (top left) is inflated by adding noise of magnitude σ² in the normal space (top right). 2. An NF F_θ learns this inflated density q(x) using a well-known reference measure p_U(u). 3. We deflate the learned density to obtain an estimate p(x) for p*(x). 4. Our main result provides sufficient conditions on the manifold X and the choice of noise such that p(x) = p*(x).

Notations: We denote the determinant of the Gram matrix of f as g_f(x) := |det(J_f(x)^T J_f(x))|, where J_f(x) is the Jacobian of f. We denote the Lebesgue measure in R^n as λ_n. Random variables are denoted by capital letters, say X, and their corresponding state spaces by the calligraphic version, i.e. X. Small letters correspond to vectors whose dimensionality is given by context. The letters d, D, n, and N always denote natural numbers.
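As a toy illustration of the inflation step (step 1 in Figure 1), the following sketch of our own adds isotropic Gaussian noise to samples from a 1-dimensional manifold (the unit circle) embedded in R^2. This is not the authors' implementation; per the approximation argument above, isotropic Gaussian noise stands in for noise in the normal space, an approximation that becomes accurate when D ≫ d.

```python
import numpy as np

def sample_circle(n, rng):
    """Sample uniformly from the unit circle, a 1-d manifold in R^2."""
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    return np.stack([np.cos(theta), np.sin(theta)], axis=1)

def inflate(x, sigma, rng):
    """Inflation step: add isotropic Gaussian noise of scale sigma.

    For a manifold embedded in R^D with D >> d, isotropic Gaussian
    noise is close to noise purely in the normal space."""
    return x + sigma * rng.normal(size=x.shape)

rng = np.random.default_rng(0)
x = sample_circle(10_000, rng)
x_noisy = inflate(x, sigma=0.05, rng=rng)

# The inflated samples form a thin annulus: their radii concentrate
# around 1 with a spread on the order of sigma.
radii = np.linalg.norm(x_noisy, axis=1)
print(radii.mean(), radii.std())
```

An off-the-shelf NF in the ambient space R^2 can then be trained on `x_noisy`, since the inflated support is full-dimensional.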

2. BACKGROUND AND PROBLEM STATEMENT

An NF transforms a known auxiliary random variable through bijective mappings parametrized by Neural Networks such that the given data points are samples from this transformed random variable; see Papamakarios et al. (2019). Formally, an NF is a diffeomorphism F_θ : U → X and induces a density on X through p_θ(x) = g_{F_θ}(u)^{-1/2} p_U(u), where p_U(u) is known and u = F_θ^{-1}(x). The parameters θ are updated such that the KL-divergence between p*(x) and p_θ(x),

D_KL(p*(x) || p_θ(x)) = -E_{x∼p*(x)}[log p_θ(x)] + const,   (1)

is minimized. If F_θ is expressive enough, it has been proven that, in the limit of infinitely many samples, updating θ to minimize this objective converges to a θ* such that p*(x) = p_{θ*}(x) holds P_X-almost surely, see (Huang et al., 2018). More generally, let X ∈ X ⊂ R^D be generated by an unobserved random variable Z ∈ Z ⊂ R^d with density π(z), that is, X = f(Z) for some function f : Z → X where typically d < D. In Gemici et al. (2016), f is an embedding¹, and it was shown that one can calculate probabilities such as P_X(A) for measurable A ⊂ X using a density p*(x) with respect to the volume form dV_f induced by f, that is,

P(X ∈ A) = ∫_{f^{-1}(A)} π(z) dz = ∫_A p*(x) dV_f(x)   (2)

with p*(x) = π(z) g_f(z)^{-1/2} and dV_f(x) = g_f(z)^{1/2} dz, where z = f^{-1}(x). Hence, given an explicit mapping f and samples from p*(x), we can learn the unknown density π(z) using a usual NF in R^d. However, in general, the generating function f is either unknown or not an embedding, creating numerical instabilities for training inputs close to singularity points. In Brehmer & Cranmer (2020), f and the unknown density π are learned simultaneously. The main idea is to define f as a level set of a usual flow in R^D and train it together with the flow in R^d used to learn π(z). To evaluate the density, one needs to invert f, and thus this approach may be very slow for high-dimensional data.
Besides, to guarantee that f learns the manifold, they proposed several ad hoc training strategies. We tie in with the idea of using an NF for learning p*(x) with unknown f and study the following problem.
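The change-of-variables density p_θ(x) = g_{F_θ}(u)^{-1/2} p_U(u) above can be checked numerically. The following sketch (a toy example of our own choosing, not a trained NF) uses an affine "flow" F(u) = Au + b with a standard Gaussian base p_U = N(0, I), for which the pushforward density is exactly N(b, AA^T), and computes the Gram determinant g_F = det(J_F^T J_F) directly from the Jacobian:

```python
import numpy as np

def gauss_pdf(x, mean, cov):
    """Density of N(mean, cov) at x."""
    d = x - mean
    k = x.shape[0]
    norm = np.sqrt((2.0 * np.pi) ** k * np.linalg.det(cov))
    return float(np.exp(-0.5 * d @ np.linalg.solve(cov, d)) / norm)

# Toy affine "flow" F(u) = A u + b with Gaussian base p_U = N(0, I).
A = np.array([[2.0, 0.3],
              [0.0, 0.7]])          # invertible Jacobian of F
b = np.array([1.0, -1.0])

x = np.array([0.5, 0.2])
u = np.linalg.solve(A, x - b)       # u = F^{-1}(x)

# Change of variables: p_theta(x) = g_F(u)^{-1/2} p_U(u),
# where g_F = det(J_F^T J_F) = det(A^T A) is constant for an affine map.
g_F = np.linalg.det(A.T @ A)
p_theta = g_F ** -0.5 * gauss_pdf(u, np.zeros(2), np.eye(2))

# The pushforward of N(0, I) under F is exactly N(b, A A^T).
p_exact = gauss_pdf(x, b, A @ A.T)
print(np.isclose(p_theta, p_exact))  # True
```

The same Gram-determinant formula applies to a non-square Jacobian of an embedding f : R^d → R^D, which is what appears in equation (2).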



¹ That is, a regular continuously differentiable mapping (called an immersion) which, restricted to its image, is a homeomorphism.




