GENERATIVE LEARNING WITH EULER PARTICLE TRANSPORT

Abstract

We propose an Euler particle transport (EPT) approach for generative learning. The proposed approach is motivated by the problem of finding the optimal transport map from a reference distribution to a target distribution, which is characterized by the Monge-Ampère equation. Interpreting the infinitesimal linearization of the Monge-Ampère equation from the perspective of gradient flows in measure spaces leads to a stochastic McKean-Vlasov equation. We use the forward Euler method to solve this equation. The resulting forward Euler map pushes forward a reference distribution to the target. This map is the composition of a sequence of simple residual maps, which are computationally stable and easy to train. The key task in training is the estimation of the density ratios or differences that determine the residual maps. We estimate the density ratios (differences) based on the Bregman divergence with a gradient penalty using deep density-ratio (difference) fitting. We show that the proposed density-ratio (difference) estimators do not suffer from the "curse of dimensionality" if the data are supported on a lower-dimensional manifold. Numerical experiments with multi-mode synthetic datasets and comparisons with existing methods on real benchmark datasets support our theoretical results and demonstrate the effectiveness of the proposed method.

1. INTRODUCTION

The ability to efficiently sample from complex distributions plays a key role in a variety of prediction and inference tasks in machine learning and statistics (Salakhutdinov, 2015). The long-standing methodology for learning an underlying distribution relies on an explicit statistical model of the data, which can be difficult to specify in many applications such as image analysis, computer vision and natural language processing. In contrast, implicit generative models do not assume a specific form of the data distribution, but rather learn a nonlinear map that transforms a reference distribution into the target distribution. This modeling approach has been shown to achieve impressive performance in many machine learning tasks (Reed et al., 2016; Zhu et al., 2017). Generative adversarial networks (GANs) (Goodfellow et al., 2014), variational auto-encoders (VAEs) (Kingma & Welling, 2014) and flow-based methods (Rezende & Mohamed, 2015) are important representatives of implicit generative models.

In this paper, we propose an Euler particle transport (EPT) approach for learning a generative model by integrating ideas from optimal transport, numerical ODEs, density-ratio estimation and deep neural networks. We formulate generative learning as the problem of finding a nonlinear transform that pushes forward a reference distribution to the target, based on the quadratic Wasserstein distance. Since it is challenging to solve the resulting Monge-Ampère equation directly, we consider the continuity equation derived from the linearization of the Monge-Ampère equation, which is a gradient flow converging to the target distribution. We solve the McKean-Vlasov equation associated with this gradient flow using the forward Euler method. The resulting EPT map that pushes forward the reference distribution to the target distribution is a composition of a sequence of simple residual maps, which are computationally stable and easy to train. The residual maps are completely determined by the density ratios between the distribution at the current iteration and the target distribution.
We estimate density ratios based on the Bregman divergence with a gradient regularizer using deep density-ratio fitting. We establish bounds on the approximation errors due to linearization of the Monge-Ampère equation, Euler discretization of the McKean-Vlasov equation, and deep density-ratio estimation. Our result on the error rate for the proposed density-ratio estimators improves the minimax rate of nonparametric estimation by exploiting the low-dimensional structure of the data, and thus circumvents the "curse of dimensionality". Experimental results on multi-mode synthetic data and comparisons with state-of-the-art GANs on benchmark data support our theoretical findings and demonstrate that EPT is computationally more stable and easier to train than GANs. Using simple ReLU ResNets without batch normalization or spectral normalization, we obtain results that are better than or comparable with those of GANs trained with such tricks.
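The density-ratio estimation step can be illustrated on a toy problem. The sketch below is not the paper's estimator (which fits deep networks under a Bregman divergence with a gradient penalty); it uses the classical probabilistic-classification view of density-ratio estimation, in which a logistic regression trained to distinguish target samples from reference samples recovers the log density ratio as its logit. All names and hyperparameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D problem: reference q = N(0,1), target p = N(1,1).
# True log density ratio: log p(x)/q(x) = x - 0.5.
n = 4000
x_q = rng.normal(0.0, 1.0, size=n)   # samples from the reference
x_p = rng.normal(1.0, 1.0, size=n)   # samples from the target

# Logistic regression: label 1 for target, 0 for reference.
# With equal sample sizes, the Bayes-optimal logit equals log p/q.
X = np.concatenate([x_p, x_q])
y = np.concatenate([np.ones(n), np.zeros(n)])
F = np.stack([X, np.ones_like(X)], axis=1)   # features: [x, 1]
w = np.zeros(2)

for _ in range(2000):                 # plain gradient descent
    prob = 1.0 / (1.0 + np.exp(-(F @ w)))
    grad = F.T @ (prob - y) / len(y)
    w -= 0.5 * grad

slope, intercept = w
print(slope, intercept)   # should approach (1.0, -0.5)
```

The estimated ratio at a point x is then exp(slope * x + intercept); the paper replaces this linear logit with a deep network and a different fitting objective.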

2. EULER PARTICLE TRANSPORT

Let X ∈ R^m be a random vector with distribution ν, and let Z be a random vector with distribution µ. We assume that µ has a known and simple form. Our goal is to construct a transformation T such that T#µ = ν, where T#µ denotes the push-forward distribution of µ by T, that is, the distribution of T(Z). We can then sample from ν by first generating Z ∼ µ and computing T(Z). In practice, ν is unknown and only a random sample {X_i}_{i=1}^n i.i.d. ∼ ν is available, so T must be constructed from the sample. There may exist multiple transports T with T#µ = ν. The optimal transport is the one that minimizes the quadratic Wasserstein distance between µ and ν, defined by
$$W_2(\mu,\nu) = \Big\{ \inf_{\gamma \in \Gamma(\mu,\nu)} \mathbb{E}_{(Z,X)\sim\gamma}\big[\|Z - X\|_2^2\big] \Big\}^{1/2}, \qquad (1)$$
where Γ(µ, ν) denotes the set of couplings of (µ, ν) (Villani, 2008; Ambrosio et al., 2008). Suppose that µ and ν have densities q and p with respect to the Lebesgue measure, respectively. Then the optimal transport map T such that T#µ = ν is characterized by the Monge-Ampère equation (Brenier, 1991; McCann, 1995; Santambrogio, 2015). Specifically, the minimization problem in (1) admits a unique solution γ = (1, T)#µ with T = ∇Ψ, µ-a.e., where 1 is the identity map and ∇Ψ is the gradient of the potential function Ψ : R^m → R. This function is convex and satisfies the Monge-Ampère equation
$$\det(\nabla^2 \Psi(z)) = \frac{q(z)}{p(\nabla\Psi(z))}, \qquad z \in \mathbb{R}^m. \qquad (2)$$
Therefore, to find the optimal transport T, it suffices to solve (2) for Ψ. However, this degenerate elliptic equation is challenging to solve due to its highly nonlinear nature. Below we describe the proposed EPT method for obtaining an approximate solution of the Monge-Ampère equation (2).
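For finite samples of equal size, the quadratic Wasserstein distance in (1) can be computed exactly by solving a linear assignment problem over the squared-distance cost matrix, since the couplings reduce to permutations. The following sketch is an illustration of the distance itself, not part of the EPT method; the function name is ours.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def w2_empirical(z, x):
    """Exact W2 between two equal-size empirical distributions.

    Minimizes the average squared distance over permutations,
    the discrete analogue of the coupling in equation (1).
    """
    cost = np.sum((z[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    rows, cols = linear_sum_assignment(cost)
    return np.sqrt(cost[rows, cols].mean())

# A 1-D sample and a copy shifted by 2: the optimal coupling is the
# monotone matching, so W2 equals the shift (up to float rounding).
z = np.linspace(0.0, 1.0, 50)[:, None]
x = z + 2.0
print(w2_empirical(z, x))   # -> 2.0
```

The O(n^3) assignment solve is only practical for small samples, which is one reason generative-learning methods work with the optimization formulation rather than with exact discrete couplings.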
It consists of the following steps: (a) linearizing (2) via residual maps; (b) determining the velocity fields governing the stochastic McKean-Vlasov equation resulting from the linearization; (c) calculating the forward Euler particle transport map; and (d) training the EPT map by estimating the velocity fields from data. Since the velocity fields are completely determined by density ratios, this step amounts to nonparametric density-ratio estimation. We also provide bounds on the errors due to linearization, discretization and estimation. Mathematical details and proofs are given in the appendix.

Linearization via residual maps. A basic approach to addressing the difficulty due to nonlinearity is linearization. We linearize ∇Ψ using the residual map
$$T_{t,\Phi_t} = \mathbf{1} + t\nabla\Phi_t, \qquad t \ge 0, \qquad (3)$$
where Φ_t : R^m → R is a function to be chosen such that the law of T_{t,Φ_t}(Z) approaches ν as t increases (Villani, 2008). We give the specific form of Φ_t below; see Theorem B.1 in the appendix for details. This linearization scheme leads to a stochastic process X_t : R^m → R^m satisfying the McKean-Vlasov equation
$$\frac{d}{dt} X_t(x) = v_t(X_t(x)), \quad t \ge 0, \quad \text{with } X_0 \sim \mu, \ \mu\text{-a.e. } x \in \mathbb{R}^m, \qquad (4)$$
where v_t is the velocity vector field of X_t. In addition, we have v_t = ∇Φ_t, so v_t also determines the residual map (3). The details of the derivation are given in Theorems B.1 and B.2 in the appendix. Therefore, estimating the residual map (3) is equivalent to estimating v_t.
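The forward Euler discretization of (4) can be illustrated on a one-dimensional toy problem. The sketch below is a simplification, not the paper's training procedure: it assumes the velocity field takes the Wasserstein-gradient-flow form v_t = ∇ log(p/q_t) (as for a KL objective), uses the known target density p, and estimates the current particle density q_t with a Gaussian kernel density estimate rather than deep density-ratio fitting. Step size, particle count and iteration count are illustrative.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Target p = N(2,1); particles start from the reference mu = N(0,1).
def grad_log_p(x):
    return -(x - 2.0)          # d/dx log p for an N(2,1) target

particles = rng.normal(0.0, 1.0, size=500)
step, eps = 0.1, 1e-3

for _ in range(100):
    # Estimate the current density q_t from the particles.
    kde = gaussian_kde(particles)
    # Central-difference gradient of log q_t.
    grad_log_q = (np.log(kde(particles + eps)) -
                  np.log(kde(particles - eps))) / (2 * eps)
    # Velocity v_t = grad log p - grad log q_t, then a forward Euler
    # step x_{k+1} = x_k + s * v_k(x_k), i.e. the residual map (3).
    v = grad_log_p(particles) - grad_log_q
    particles = particles + step * v

print(particles.mean())   # should be close to the target mean 2
```

Each Euler step is one residual map of the form (3); composing the steps yields the transport that pushes the reference particles toward the target.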




