LEARNING DISCONNECTED MANIFOLDS: AVOIDING THE NO GAN'S LAND BY LATENT REJECTION

Anonymous

Abstract

Standard formulations of GANs, where a continuous function deforms a connected latent space, have been shown to be misspecified when fitting disconnected manifolds. In particular, when covering different classes of images, the generator will necessarily sample some low-quality images in between the modes. Rather than modifying the learning procedure, a line of work aims at improving the sampling quality of trained generators, and it is now common to introduce a rejection step within the generation procedure. Building on this, we propose to train an additional network and transform the latent space via an adversarial learning of importance weights. This idea has several advantages: 1) it provides a way to inject disconnectedness into any GAN architecture; 2) since the rejection happens in the latent space, it avoids going through both the generator and the discriminator, saving computation time; 3) the importance-weight formulation provides a principled way to reduce the Wasserstein distance to the target distribution. We demonstrate the effectiveness of our method on different datasets, both synthetic and high-dimensional.

1. INTRODUCTION

GANs (Goodfellow et al., 2014) are an effective way to learn complex and high-dimensional distributions, leading to state-of-the-art models for image synthesis in both unconditional (Karras et al., 2019) and conditional settings (Brock et al., 2019). However, it is well known that a single generator with a unimodal latent variable cannot recover a distribution composed of disconnected sub-manifolds (Khayatkhoei et al., 2018). This leads to a common problem for practitioners: the necessary existence of very low quality samples when covering different modes. This is formalized by Tanielian et al. (2020), who refer to this area as the no GAN's land and provide impossibility theorems on the learning of disconnected manifolds with standard formulations of GANs. Fitting a disconnected target distribution requires an additional mechanism that inserts disconnectedness into the modeled distribution. A first solution is to add expressivity to the model: Khayatkhoei et al. (2018) propose to train a mixture of generators, while Gurumurthy et al. (2017) make use of a multi-modal latent distribution. A second solution is to improve the quality of a trained generative model by avoiding its poorest samples (Tao et al., 2018; Azadi et al., 2019; Turner et al., 2019; Grover et al., 2019; Tanaka, 2019). This second line of research relies heavily on Monte-Carlo algorithms, such as Rejection Sampling or the Metropolis-Hastings algorithm. These methods aim at sampling from a target distribution while having access only to samples generated from a proposal distribution. This idea was successfully applied to GANs, using the previously learned generative distribution as the proposal distribution. However, one of the main drawbacks is that Monte-Carlo algorithms guarantee sampling from the target distribution only under strong assumptions.
First, we need access to the density ratios between the proposal and target distributions, or equivalently to a perfect discriminator (Azadi et al., 2019). Second, the support of the proposal distribution must fully cover that of the target distribution, which means no mode collapse. This is known to be very demanding in high dimension, since the intersection of the supports of the proposal and target distributions is likely to be negligible (Arjovsky and Bottou, 2017, Lemma 3). In this setting, an optimal discriminator would give null acceptance probabilities for almost every generated point, leading to lower performance. To tackle the aforementioned issues, we propose a novel method that reduces the Wasserstein distance between the previously trained generative model and the target distribution. This is done via the adversarial training of a third network that learns importance weights in the latent space. The goal is to learn the redistribution of mass of the modeled distribution that best fits the target distribution. To better understand our approach, we first consider a simple 2D motivational example where the real data lie on four disconnected manifolds. To approximate this, the generator splits the latent space into four distinct areas and maps data points located on the frontiers, shown in orange in Figure 1b, out of the true manifold (see Figure 1a). Our method consequently aims at learning latent importance weights that identify these frontiers and simply avoid them. This is highlighted in Figure 1d, where the importance weighter has identified these four frontiers. When sampling from the new latent distribution, we can now perfectly fit the mixture of four Gaussians (see Figure 1c).

Figure 1: Learning disconnected manifolds leads to the appearance of an area in the latent space that generates points outside the target manifold. With the importance weighter, one can avoid this specific area and better fit the target distribution.
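To make the latent-rejection idea concrete, the sampling step can be sketched as follows. This is an illustrative toy, not the paper's implementation: the `importance_weight` function stands in for the trained importance weighter (here it is a hand-made weighter that vanishes near the axes, mimicking the learned frontiers of Figure 1d), and the acceptance bound of 1 is an assumption. The key point is that rejection happens before the generator or discriminator is ever evaluated.

```python
import numpy as np

rng = np.random.default_rng(0)

def importance_weight(z):
    # Stand-in for the trained importance weighter: close to 0 near the
    # frontiers between latent areas (here, the coordinate axes) and
    # close to 1 elsewhere. Bounded in [0, 1) by construction.
    return 1.0 - np.exp(-4.0 * np.minimum(np.abs(z[..., 0]), np.abs(z[..., 1])))

def sample_latent_with_rejection(n, d=2):
    """Rejection sampling in the latent space: accept z with probability w(z)."""
    accepted = []
    while len(accepted) < n:
        z = rng.standard_normal((4 * n, d))   # proposal: standard Gaussian latent
        u = rng.uniform(size=4 * n)
        accepted.extend(z[u < importance_weight(z)])
    return np.array(accepted[:n])

z = sample_latent_with_rejection(1000)
# Accepted codes avoid the zero-weight frontiers, so the generator would
# only be evaluated on latent regions mapping inside the target manifold.
```

Because the weighter acts on the low-dimensional latent codes, each rejected sample costs one evaluation of a small network rather than a full generator plus discriminator pass, which is the computational saving claimed in the abstract.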
Our contributions are the following:
• We discuss works improving the sampling quality of GANs and identify their limitations.
• We propose a novel approach that directly modifies the latent space distribution. It provides a principled way to reduce the Wasserstein distance to the target distribution.
• We thoroughly compare our method with a large set of previous approaches on a variety of datasets and distributions. We empirically show that our solution significantly reduces the computational cost of inference while demonstrating equal efficiency.

Notation. Before moving to the related work section, we briefly present the notation used in the paper. The goal of the generator is to generate data points that are "similar" to samples collected from some target probability measure µ. The measure µ is defined on a potentially high-dimensional space R^D, equipped with the Euclidean norm ‖·‖. To approach µ, we use a parametric family of generative distributions, where each distribution is the push-forward measure of a latent random variable Z by a continuous function modeled by a neural network. In most practical applications, the random variable Z, defined on a low-dimensional space R^d, follows either a multivariate Gaussian or a uniform distribution. The generator is a parameterized class of functions from R^d to R^D, say G = {G_θ : θ ∈ Θ}, where Θ ⊆ R^p is the set of parameters describing the model. Each function G_θ takes input from Z and outputs "fake" observations with distribution µ_θ = G_θ♯Z. On the other hand, the discriminator is described by a family of functions from R^D to R, say D = {D_α : α ∈ Λ}, Λ ⊆ R^Q, where each D_α assigns a real-valued score to observations in R^D. Finally, for any given distribution µ, we denote by S_µ its support.
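The push-forward notation µ_θ = G_θ♯Z can be illustrated with a toy sketch (the affine-plus-tanh generator and its dimensions are hypothetical, not a trained model): sampling from the push-forward measure simply means drawing Z from the latent distribution and applying G_θ.

```python
import numpy as np

rng = np.random.default_rng(0)

d, D = 2, 5                              # latent and ambient dimensions
theta_W = rng.standard_normal((D, d))    # toy parameters theta of G_theta
theta_b = rng.standard_normal(D)

def G_theta(z):
    # A stand-in continuous generator from R^d to R^D.
    return np.tanh(z @ theta_W.T + theta_b)

z = rng.standard_normal((10_000, d))     # Z ~ N(0, I_d), the latent distribution
x = G_theta(z)                           # x ~ mu_theta = G_theta # Z

# Since G_theta is continuous and Z has connected support, the support of
# mu_theta is connected too; with tanh it lies inside the cube [-1, 1]^D.
```

This connectedness of S_µθ under a continuous G_θ is exactly why disconnected target manifolds cannot be recovered without an extra mechanism such as latent rejection.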



(a) WGAN: real samples in green and fake ones in blue.
(b) Latent space: heatmap of the distance between a generated sample and its nearest real sample.
(c) WGAN with latent rejection sampling: real samples in green and fake ones in blue.
(d) Latent space: heatmap of the learned importance weights; the blue frontiers have zero weights.

