IMPLICIT NORMALIZING FLOWS

Abstract

Normalizing flows define a probability distribution by an explicit invertible transformation z = f(x). In this work, we present implicit normalizing flows (ImpFlows), which generalize normalizing flows by allowing the mapping to be implicitly defined by the roots of an equation F(z, x) = 0. ImpFlows build on residual flows (ResFlows) with a proper balance between expressiveness and tractability. Through theoretical analysis, we show that the function space of ImpFlows is strictly richer than that of ResFlows. Furthermore, for any ResFlow with a fixed number of blocks, there exists some function that the ResFlow can only approximate with non-negligible error, yet the same function is exactly representable by a single-block ImpFlow. We propose a scalable algorithm to train and draw samples from ImpFlows. Empirically, we evaluate ImpFlows on several classification and density modeling tasks, where they outperform ResFlows with a comparable number of parameters on all the benchmarks.

1. INTRODUCTION

Normalizing flows (NFs) (Rezende & Mohamed, 2015; Dinh et al., 2014) are promising methods for density modeling. NFs define a model distribution p_x(x) by specifying an invertible transformation f(x) from x to another random variable z. By the change-of-variables formula, the model density is ln p_x(x) = ln p_z(f(x)) + ln |det(J_f(x))|, where p_z(z) follows a simple distribution, such as a Gaussian. NFs are particularly attractive due to their tractability, i.e., the model density p_x(x) can be evaluated directly via this formula. To achieve such tractability, NF models should satisfy two requirements: (i) the mapping between x and z is invertible; (ii) the log-determinant of the Jacobian J_f(x) is tractable. Searching for rich model families that satisfy these tractability constraints is crucial for the advance of normalizing flow research. For the second requirement, earlier works such as inverse autoregressive flow (Kingma et al., 2016) and RealNVP (Dinh et al., 2017) restrict the model family to those with triangular Jacobian matrices. More recently, free-form Jacobian approaches have emerged, such as Residual Flows (ResFlows) (Behrmann et al., 2019; Chen et al., 2019). They relax the triangular Jacobian constraint by utilizing a stochastic estimator of the log-determinant, enriching the model family. However, the Lipschitz constant of each transformation block is constrained for invertibility. In general, this is not preferable, because mapping a simple prior distribution to a potentially complex data distribution may require a transformation with a very large Lipschitz constant (see Fig. 3 for a 2D example). Moreover, all the aforementioned methods assume that there exists an explicit forward mapping z = f(x). Bijections with an explicit forward mapping cover only a fraction of the broad class of invertible functions suggested by the first requirement, which may limit the model capacity.
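As a concrete illustration of the change-of-variables formula above, the following sketch evaluates ln p_x(x) for a hypothetical elementwise affine flow f(x) = a * x + b (an illustrative toy map, not a model from this paper); its Jacobian is diagonal, so the log-determinant reduces to the sum of ln|a_i|:

```python
import numpy as np

# Toy explicit flow f(x) = a * x + b with a standard normal base density p_z.
# The Jacobian is diag(a), so ln |det J_f(x)| = sum_i ln |a_i|.

def log_prob_x(x, a, b):
    """ln p_x(x) = ln p_z(f(x)) + ln |det J_f(x)|."""
    z = a * x + b                                         # forward map
    d = x.shape[-1]
    log_pz = -0.5 * np.sum(z**2, axis=-1) - 0.5 * d * np.log(2 * np.pi)
    log_det = np.sum(np.log(np.abs(a)))                   # diagonal Jacobian
    return log_pz + log_det

x = np.array([0.3, -1.2])
a = np.array([2.0, 0.5])
b = np.array([0.1, 0.0])
print(log_prob_x(x, a, b))
```

For this affine example the density is available in closed form; the point of the more expressive flows discussed next is to keep this evaluation tractable while allowing far richer f.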
In this paper, we propose implicit flows (ImpFlows) to generalize NFs, allowing the transformation to be implicitly defined by an equation F(z, x) = 0. Given x (or z), the other variable can be computed by an implicit root-finding procedure z = RootFind(F(·, x)). An explicit mapping z = f(x) used in prior NFs can be viewed as a special case of ImpFlows in the form F(z, x) = f(x) - z = 0. To balance expressiveness and tractability, we present a specific form of ImpFlows, where each block is the composition of a ResFlow block and the inverse of another ResFlow block. We theoretically study the model capacity of ResFlows and ImpFlows in the function space. We show that the function family of single-block ImpFlows is strictly richer than that of two-block ResFlows by relaxing the Lipschitz constraints. Furthermore, for any ResFlow with a fixed number of blocks, there exists some invertible function that the ResFlow can only approximate with non-negligible error, but that an ImpFlow can model exactly. On the practical side, we develop a scalable algorithm to estimate the probability density and its gradients, and to draw samples from ImpFlows. The algorithm leverages the implicit differentiation formula. Despite being more powerful, the gradient computation of ImpFlows is largely similar to that of ResFlows, except for some additional overhead on root finding. We test the effectiveness of ImpFlows on several classification and generative modeling tasks. ImpFlows outperform ResFlows on all the benchmarks, with comparable model sizes and computational cost.
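To make the root-finding and implicit-differentiation steps concrete, here is a minimal 1D sketch (our own illustration, under assumed residuals g_x and g_z, not the paper's architecture) of one block of the form F(z, x) = x + g_x(x) - z - g_z(z) = 0, i.e., a ResFlow block composed with the inverse of another. Since both residuals are contractive (Lipschitz constant below 1), z can be found by Banach fixed-point iteration, and dz/dx follows from the implicit function theorem rather than from backpropagating through the solver:

```python
import numpy as np

# Hypothetical contractive residuals (Lipschitz constants 0.5 and 0.25).
def g_x(u):
    return 0.5 * np.tanh(u)

def g_z(u):
    return 0.25 * np.sin(u)

def root_find(x, n_iters=100):
    """z = RootFind(F(., x)): solve z + g_z(z) = x + g_x(x) by fixed-point iteration."""
    rhs = x + g_x(x)              # the explicit (forward ResFlow) half
    z = rhs                       # initial guess
    for _ in range(n_iters):
        z = rhs - g_z(z)          # contraction mapping; converges since Lip(g_z) < 1
    return z

def dz_dx(x, z):
    """Implicit differentiation: dz/dx = -(dF/dx)/(dF/dz) = (1 + g_x'(x)) / (1 + g_z'(z))."""
    gx_prime = 0.5 * (1.0 - np.tanh(x) ** 2)
    gz_prime = 0.25 * np.cos(z)
    return (1.0 + gx_prime) / (1.0 + gz_prime)

x = 1.3
z = root_find(x)
print("residual:", x + g_x(x) - z - g_z(z))   # ~0: (z, x) satisfies F(z, x) = 0
print("dz/dx:", dz_dx(x, z))
```

Note that the gradient needs only the derivatives of g_x and g_z at the solution, not the iterates of the solver, which is the practical appeal of the implicit differentiation formula mentioned above.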

2. RELATED WORK

Expressive Normalizing Flows. There are many works focusing on improving the capacity of NFs. For example, Dinh et al. (2014; 2017); Kingma & Dhariwal (2018); Ho et al. (2019); Song et al. (2019); Hoogeboom et al. (2019); De Cao et al. (2020); Durkan et al. (2019) design dedicated model architectures with tractable Jacobians. More recently, Grathwohl et al. (2019); Behrmann et al. (2019); Chen et al. (2019) propose NFs with free-form Jacobians, which approximate the determinant with stochastic estimators. In parallel with architecture design, Chen et al. (2020); Huang et al. (2020); Cornish et al. (2020); Nielsen et al. (2020) improve the capacity of NFs by operating in a higher-dimensional space. As mentioned in the introduction, all these existing works adopt explicit forward mappings, which cover only a subset of the broad class of invertible functions. In contrast, the implicit function family we consider is richer. While we primarily discuss the implicit generalization of ResFlows (Chen et al., 2019) in this paper, the general idea of utilizing implicit invertible functions could potentially be applied to other models as well. Finally, Zhang et al. (2020) formally prove that the model capacity of ResFlows is restricted by the dimension of the residual blocks. In comparison, we study another limitation of ResFlows, namely their bounded Lipschitz constant, and compare the function families of ResFlows and ImpFlows at a comparable depth.

Continuous Time Flows. Continuous time flows (CTFs) (Chen et al., 2018b; Grathwohl et al., 2019; Chen et al., 2018a) are a flexible alternative to discrete-time flows for generative modeling. They typically treat the invertible transformation as a dynamical system, which is approximately simulated by ordinary differential equation (ODE) solvers. In contrast, the implicit function family considered in this paper does not involve differential equations and only requires fixed-point solvers. Moreover, the theoretical guarantees are different: while CTFs typically study universal approximation capacity in the continuous-time (i.e., "infinite depth") limit, we consider the model capacity of ImpFlows and ResFlows under a finite number of transformation steps. Finally, while CTFs are flexible, their learning is challenging due to instability (Liu et al., 2020; Massaroli et al., 2020) and exceedingly many ODE solver steps (Finlay et al., 2020), making their large-scale application still an open problem.

Implicit Deep Learning. Implicit functions have also been studied as building blocks of deep models; for instance, Sitzmann et al. (2020) incorporate periodic functions for representation learning. Different from these works, which consider implicit functions as a replacement for feed-forward networks, we develop invertible implicit functions for normalizing flows, discuss the conditions for the existence of such functions, and theoretically study the model capacity of our proposed ImpFlow in the function space.

3. IMPLICIT NORMALIZING FLOWS

We now present implicit normalizing flows, starting with a brief overview of existing work.

