WASSERSTEIN-2 GENERATIVE NETWORKS

Abstract

We propose a novel end-to-end non-minimax algorithm for training optimal transport mappings for the quadratic cost (Wasserstein-2 distance). The algorithm uses input convex neural networks and a cycle-consistency regularization to approximate the Wasserstein-2 distance. In contrast to popular entropic and quadratic regularizers, cycle-consistency does not introduce bias and scales well to high dimensions. From the theoretical side, we estimate the properties of the generative mapping fitted by our algorithm. From the practical side, we evaluate our algorithm on a wide range of tasks: image-to-image color transfer, latent space optimal transport, image-to-image style transfer, and domain adaptation.

1. INTRODUCTION

Generative learning has become widespread over the last few years, arguably starting with the introduction of generative adversarial networks (GANs) by Goodfellow et al. (2014). The framework aims to define a stochastic procedure for sampling from a given complex probability distribution Q on a space Y ⊂ R^D, e.g. a space of images. The usual generative pipeline samples from a tractable distribution P on a space X and applies a generative mapping g : X → Y that transforms P into the desired Q. For a given pair of distributions P, Q, several different generative mappings may exist. For example, the mapping in Figure 1b seems better than the one in Figure 1a and should be preferred: it is straightforward, well-structured and invertible. Most existing generative learning approaches do not focus on the structural properties of the generative mapping.



(a) An Arbitrary Mapping. (b) The Monotone Mapping.

Figure 1: Two possible generative mappings that transform distribution P to distribution Q.
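
To make the distinction in Figure 1 concrete, the following minimal sketch compares two mappings in a one-dimensional toy setting. It is an illustration only and is not part of the proposed algorithm. The setting is an assumption made for this example: P is uniform on [0, 1], Q is uniform on [0, 2], and `monotone_map` and `folded_map` are hypothetical names. Both maps transform P into Q, but only the monotone one is invertible, and it has the lower mean quadratic cost.

```python
import random


def monotone_map(x):
    """Increasing map: sends Uniform(0, 1) to Uniform(0, 2) and is invertible."""
    return 2.0 * x


def folded_map(x):
    """Non-monotone map: also sends Uniform(0, 1) to Uniform(0, 2),
    but three different inputs share each output, so it is not invertible."""
    return 2.0 * ((3.0 * x) % 1.0)


def mean_quadratic_cost(g, samples):
    """Monte Carlo estimate of E |x - g(x)|^2 for x drawn from P."""
    return sum((x - g(x)) ** 2 for x in samples) / len(samples)


def mean(values):
    return sum(values) / len(values)


# Toy assumption: P = Uniform(0, 1), sampled with a fixed seed.
rng = random.Random(0)
samples = [rng.random() for _ in range(100_000)]

# Both pushforwards should have mean close to 1, the mean of Uniform(0, 2).
mean_monotone = mean([monotone_map(x) for x in samples])
mean_folded = mean([folded_map(x) for x in samples])

# The quadratic transport cost separates the two mappings.
cost_monotone = mean_quadratic_cost(monotone_map, samples)
cost_folded = mean_quadratic_cost(folded_map, samples)

print(f"pushforward means: {mean_monotone:.3f} vs {mean_folded:.3f}")
print(f"quadratic costs:   {cost_monotone:.3f} vs {cost_folded:.3f}")
```

In this toy case the exact costs are 1/3 for the monotone map and 5/9 for the folded one, so the estimates should fall close to those values. In one dimension the increasing map is the optimal transport map for the quadratic cost, which is why it plays the role of the mapping in Figure 1b here.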

For example, GAN-based approaches, such as f-GAN (Nowozin et al., 2016; Yadav et al., 2017), W-GAN (Arjovsky et al., 2017) and others (Li et al., 2017; Mroueh & Sercu, 2017), approximate the generative mapping by a neural network with a problem-specific architecture.

