ON THE LATENT SPACE OF FLOW-BASED MODELS

Anonymous authors
Paper under double-blind review

Abstract

Flow-based generative models typically define a latent space whose dimensionality is identical to that of the observation space. In many problems, however, the data do not populate the full ambient space in which they natively reside, but rather inhabit a lower-dimensional manifold. In such scenarios, flow-based models cannot represent the data structure exactly, as their density will always have support off the data manifold, potentially degrading model performance. In addition, the requirement of equal latent and data space dimensionality can unnecessarily increase the complexity of contemporary flow models. Towards addressing these problems, we propose to learn a manifold prior that benefits both sample generation and representation quality. A by-product of our approach is that we are able to identify the intrinsic dimension of the data distribution.

1. INTRODUCTION

Normalizing flows (Rezende and Mohamed, 2015; Kobyzev et al., 2020) have shown considerable potential for modelling and inferring expressive distributions through the learning of well-specified probabilistic models. Contemporary flow-based approaches define a latent space with dimensionality identical to the data space, typically by parameterizing a complex model $p_X(x \mid \theta)$ using an invertible neural network $f_\theta$. Samples drawn from an initial, simple distribution $p_Z(z)$ (e.g. Gaussian) are mapped to a complex distribution as $x = f_\theta(z)$. The process results in a tractable density that inhabits the full data space. However, contemporary flow models may be an inappropriate choice for data that reside on a lower-dimensional manifold and thus do not populate the full ambient space. In such cases, the estimated model will necessarily place mass off the data manifold, which may result in under-fitting and poor generation quality. Furthermore, principal objectives such as Maximum Likelihood Estimation (MLE) and Kullback-Leibler (KL) divergence minimization are ill-defined, bringing additional challenges for model training.

In this work, we propose a principled strategy to model a data distribution that lies on a continuous manifold, and we additionally identify the intrinsic dimension of the data manifold. Specifically, by using the connection between MLE and KL divergence minimization in $Z$ space, we address the important problem of the ill-defined KL divergence under typical flow-based assumptions.

Flow models are based on the idea of "change of variables". Assume a random variable $Z$ with distribution $P_Z$ and probability density $p_Z(z)$. We can transform $Z$ to obtain a random variable $X = f(Z)$, where $f : \mathbb{R}^D \to \mathbb{R}^D$ is an invertible function with inverse $f^{-1} = g$.
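The sampling and density-evaluation mechanics described above can be illustrated with a minimal numerical sketch. The one-dimensional affine map below (the scale `a`, shift `b`, and all function names are illustrative choices, not part of the paper) plays the role of the invertible flow $f$: sampling pushes base samples $z \sim p_Z$ through $f$, and the density of $x$ is obtained from $p_Z$ via the inverse $g$ and the Jacobian term.

```python
import numpy as np

# Illustrative 1-D affine flow f(z) = a*z + b mapping a standard Gaussian
# base density p_Z to p_X = N(b, a^2). Parameters are arbitrary choices.
a, b = 2.0, 1.0

def f(z):
    """Forward map: latent z -> data x."""
    return a * z + b

def g(x):
    """Inverse map g = f^{-1}: data x -> latent z."""
    return (x - b) / a

def log_p_Z(z):
    """Log density of the standard Gaussian base distribution."""
    return -0.5 * z**2 - 0.5 * np.log(2 * np.pi)

def log_p_X(x):
    """Change of variables: log p_X(x) = log p_Z(g(x)) + log|det dg/dx|."""
    log_det_jac = -np.log(abs(a))  # dg/dx = 1/a, so log|det| = -log|a|
    return log_p_Z(g(x)) + log_det_jac

rng = np.random.default_rng(0)
z = rng.standard_normal(5)  # z ~ p_Z
x = f(z)                    # x ~ p_X by pushing z through the flow

# Sanity check against the analytic log density of N(b, a^2).
analytic = -0.5 * ((x - b) / a) ** 2 - 0.5 * np.log(2 * np.pi) - np.log(a)
assert np.allclose(log_p_X(x), analytic)
```

In a learned flow, `f` would be an invertible neural network and the log-determinant term would depend on `x`; here it is constant only because the map is affine.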
Suppose $X$ has distribution $P_X$ and density function $p_X(x)$; then $\log p_X(x)$ takes the form
$$\log p_X(x) = \log p_Z(g(x)) + \log \left| \det \frac{\partial g}{\partial x} \right|,$$
where $\log \left| \det \frac{\partial g}{\partial x} \right|$ is the log absolute determinant of the Jacobian matrix. We call $f$ (or $g$) a volume-preserving function if this log determinant equals $0$. Flow models are typically trained via MLE. We denote by $X_d$ the random variable of the data, with distribution $P_d$ and density $p_d(x)$. In addition to the well-known connection between MLE and minimization of the KL divergence $\mathrm{KL}(p_d(x) \,\|\, p_X(x))$ in $X$ space (see Appendix A for details), MLE is also (approximately) equivalent to minimizing the KL divergence in $Z$ space, because the KL divergence is invariant under invertible transformations (Yeung, 2008; Papamakarios et al.,

