DEEP GENERATIVE WASSERSTEIN GRADIENT FLOWS

Abstract

Deep generative modeling is a rapidly advancing field with a wealth of modeling choices developed in the past decade. Among them, Wasserstein gradient flows (WGF) are a powerful and theoretically rich class of methods. However, their applications to high-dimensional distributions remain relatively underexplored. In this paper, we present Deep Generative Wasserstein Gradient Flows (DGGF), which constructs a WGF minimizing the entropy-regularized f-divergence between two distributions. We demonstrate how to train the deep density ratio estimator required for the WGF and apply it to the task of generative modeling. Experiments demonstrate that DGGF is able to synthesize high-fidelity images at resolutions up to 128 × 128, directly in data space. We show that DGGF provides an interpretable diagnostic of sample quality by naturally estimating the KL divergence throughout the gradient flow. Finally, we demonstrate DGGF's modularity by composing it with external density ratio estimators for conditional generation, as well as for unpaired image-to-image translation, without modifications to the underlying framework.

1. INTRODUCTION

Gradient flow methods are a powerful and general class of techniques with diverse applications ranging from physics (Carrillo et al., 2019; Adams et al., 2011) and sampling (Bernton, 2018) to neural network optimization (Chizat & Bach, 2018) and reinforcement learning (Richemond & Maginnis, 2017; Zhang et al., 2018). In particular, Wasserstein gradient flow (WGF) methods are a popular specialization that model the gradient dynamics on the space of probability measures with respect to the Wasserstein metric. These methods aim to construct the optimal path between two probability measures, a source distribution q(x) and a target distribution p(x), where the notion of optimality refers to the path of steepest descent in Wasserstein space. The freedom in choosing q(x) and p(x) when constructing the WGF makes the framework a natural fit for a variety of generative modeling tasks. For data synthesis, we choose q(x) to be a simple distribution that is easy to draw samples from (e.g., a Gaussian), and p(x) to be a complex distribution that we would like to learn (e.g., the distribution of natural images). The WGF then constructs the optimal path from the simple distribution to synthesize data resembling that from the complex distribution. Alternatively, we could choose both p(x) and q(x) to be distributions from different domains of the same modality (e.g., images from separate domains); the WGF then naturally performs domain translation. However, despite this fit and the wealth of theoretical work established over the past decades (Ambrosio et al., 2005; Santambrogio, 2017), applications of WGFs to generative modeling of high-dimensional distributions remain under-explored and limited. A key difficulty is that the 2-Wasserstein distance and divergence functionals are generally intractable.
Existing works rely on complex optimization schemes with constraints that add to model complexity, such as approximating the 2-Wasserstein distance with input convex neural networks (Mokrov et al., 2021), dual variational optimization schemes based on the Fenchel conjugate (Fan et al., 2021), or particle simulation approaches that amortize sample generation to auxiliary generators (Gao et al., 2019; 2022).

In this work, we take a step towards resolving the shortcomings of WGF methods for deep generative modeling. We propose Deep Generative Wasserstein Gradient Flows (DGGF), which is formulated using the gradient flow of entropy-regularized f-divergences (Fig. 1). As this formulation involves density ratio estimation, we introduce a novel algorithm for training deep density ratio estimators and show experimentally, for the first time, that gradient flow methods can scale to image dimensions as high as 128 × 128. Our gradient flow is formulated entirely in data space, with no need for additional generator networks. The density ratio formulation allows DGGF to be composed with external density ratio estimators, which we show enables the use of pretrained external classifiers for class-conditional generation. In addition, we demonstrate that DGGF can be viewed as estimating the KL divergence of samples over the flow, providing it with an innate diagnostic for evaluating sample quality that also enhances model interpretability. We also present a simple technique of leveraging data-dependent priors to boost generative performance. Finally, by exploiting the freedom in choosing the source and target distributions, we show that DGGF can be applied to unpaired image-to-image translation with no modifications to the framework.

2. BACKGROUND

In the following, we give a brief overview of gradient flows and density ratio estimation. For a more comprehensive introduction to gradient flows, please refer to Santambrogio (2017); a thorough overview of density ratio estimation can be found in Sugiyama et al. (2012a).

Wasserstein Gradient Flows. To motivate the concept of gradient flows, we consider Euclidean space equipped with the familiar L2 distance metric, (X, ∥·∥_2). Given a function F : X → R, the curve {x(t)}_{t∈R+} that follows the direction of steepest descent is called the gradient flow of F:

x'(t) = -∇F(x(t)).    (1)

In generative modeling, we are interested in sampling from the probability distribution of a given dataset. Hence, instead of Euclidean space, we consider the space of probability measures with finite second moments equipped with the 2-Wasserstein metric, (P_2(Ω), W_2). Given a functional F : P_2(Ω) → R on the 2-Wasserstein space, the gradient flow of F is the steepest descent curve of F. We call such curves Wasserstein gradient flows (WGFs).

Density Ratio Estimation via Bregman Divergence. Let q(x) and p(x) be two distributions over X ⊆ R^d from which we have access to i.i.d. samples x_q ∼ q(x) and x_p ∼ p(x). The goal of density ratio estimation (DRE) is to estimate the true density ratio r*(x) = q(x)/p(x) based on the samples x_q and x_p.
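To make the Euclidean case in Eq. (1) concrete, the following is a minimal sketch of a forward-Euler discretization of the gradient flow, x_{k+1} = x_k - η∇F(x_k). The toy objective F(x) = ½∥x∥² and the step size η are illustrative choices, not values from the paper.

```python
import numpy as np

def gradient_flow(x0, grad_f, eta=0.1, n_steps=100):
    """Simulate the discretized gradient flow x_{k+1} = x_k - eta * grad_f(x_k).

    `eta` and `n_steps` are illustrative hyperparameters.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = x - eta * grad_f(x)
    return x

# For F(x) = 0.5 * ||x||^2 we have grad F(x) = x, and the flow
# converges toward the minimizer x* = 0.
x_final = gradient_flow([3.0, -2.0], grad_f=lambda x: x)
```

The Wasserstein case replaces the Euclidean point x(t) with a probability measure and ∇F with the Wasserstein gradient of a functional, but the steepest-descent picture is the same.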



Figure 1: Left: illustration of the generative gradient flow process using DGGF. The evolution of the gradient flow is governed by the SDE shown in the figure. We visualize intermediate samples from the LSUN Church dataset. Right: visualization of the application domains of DGGF. At its core, DGGF performs high-fidelity unconditional image generation. The unconditional model can be used for class-conditional generation via density ratio composition with external pretrained classifiers. Additionally, DGGF is able to perform unpaired image-to-image translation with no modifications to the framework. Finally, DGGF possesses an innate sample diagnostic by estimating the KL divergence over the flow, which decreases as sample quality improves.
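To illustrate how a density ratio r*(x) = q(x)/p(x) can be recovered purely from samples, the sketch below uses a standard classifier-based estimator: a logistic classifier trained to separate x_q ∼ q (label 1) from x_p ∼ p (label 0) yields the log-ratio as its logit when the two sample sets are equally sized. This is a well-known DRE technique shown for intuition only; it is not DGGF's Bregman-divergence objective, and the distributions and hyperparameters are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy 1-D setup: q = N(1, 1), p = N(-1, 1), so the true log-ratio is 2x.
x_q = rng.normal(loc=1.0, scale=1.0, size=(2000, 1))   # samples from q(x)
x_p = rng.normal(loc=-1.0, scale=1.0, size=(2000, 1))  # samples from p(x)

X = np.vstack([x_q, x_p])[:, 0]
y = np.concatenate([np.ones(len(x_q)), np.zeros(len(x_p))])

# Fit logit(x) = w*x + b by gradient descent on the logistic loss.
w, b = 0.0, 0.0
for _ in range(2000):
    probs = 1.0 / (1.0 + np.exp(-(w * X + b)))
    grad = probs - y                # d(loss)/d(logit) per sample
    w -= 0.1 * np.mean(grad * X)
    b -= 0.1 * np.mean(grad)

def log_ratio(x):
    """Estimated log q(x)/p(x); equal sample sizes, so the prior term cancels."""
    return w * x + b
```

With equally many samples from each distribution, Bayes' rule gives q(x)/p(x) = P(y=1|x)/P(y=0|x), which for a logistic classifier is exactly exp(logit).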

