INTERVENTION GENERATIVE ADVERSARIAL NETS

Abstract

In this paper we propose a novel approach for stabilizing the training process of Generative Adversarial Networks as well as alleviating the mode collapse problem. The main idea is to incorporate a regularization term, which we call intervention, into the objective. We refer to the resulting generative model as Intervention Generative Adversarial Networks (IVGAN). By perturbing the latent representations of real images obtained from an auxiliary encoder network with Gaussian-invariant interventions and penalizing the dissimilarity of the distributions of the resulting generated images, the intervention term provides more informative gradients for the generator, significantly improving training stability and encouraging mode-covering behaviour. We demonstrate the performance of our approach via solid theoretical analysis and thorough evaluation on standard real-world datasets as well as the stacked MNIST dataset.

1. INTRODUCTION

As one of the most important advances in generative models in recent years, Generative Adversarial Networks (GANs) (Goodfellow et al., 2014) have been attracting great attention in the machine learning community. GANs aim to train a generator network that transforms simple vectors of noise into "realistic" samples from the data distribution. In the basic training process of GANs, a discriminator and a target generator are trained in an adversarial manner: the discriminator tries to distinguish the generated fake samples from the real ones, and the generator tries to fool the discriminator into believing the generated samples to be real. Though successful, there are two major challenges in training GANs: the instability of the training process and the mode collapse problem. To deal with these problems, one class of approaches focuses on designing more informative objective functions (Salimans et al., 2016; Mao et al., 2016; Kodali et al., 2018; Arjovsky & Bottou; Arjovsky et al., 2017; Gulrajani et al., 2017; Zhou et al., 2019). For example, Mao et al. (2016) proposed Least Squares GAN (LSGAN), which uses the least squares loss to penalize outliers more harshly. Arjovsky & Bottou discussed the role of the Jensen-Shannon divergence in training GANs and proposed WGAN (Arjovsky et al., 2017) and WGAN-GP (Gulrajani et al., 2017), which use the more informative Wasserstein distance instead. Other approaches enforce proper constraints on latent-space representations to better capture the data distribution (Makhzani et al., 2015; Larsen et al., 2015; Che et al., 2016; Tran et al., 2018). A representative work is Adversarial Autoencoders (AAE) (Makhzani et al., 2015), which uses the discriminator to distinguish the latent representations produced by the encoder from Gaussian noise. Larsen et al. (2015) employed the image representation in the discriminator as the reconstruction basis of a VAE.
Their method turns the pixel-wise loss into a feature-wise one, which can capture the real distribution more easily when some form of invariance is induced. Unlike VAE-GAN, Che et al. (2016) treated the encoder as an auxiliary network that encourages GANs to pay attention to missing modes, and derived an objective function similar to that of VAE-GAN. A more detailed discussion of related work can be found in Appendix C.

In this paper we propose a novel technique for GANs that improves both training stability and the quality of generated images. The core of our approach is a regularization term based on the latent representations of real images provided by an encoder network. More specifically, we apply auxiliary intervention operations that preserve the standard Gaussian (e.g., the noise distribution) to these latent representations. The perturbed latent representations are then fed into the generator to produce intervened samples. We then introduce a classifier network to identify the right intervention operations that would have led to these intervened samples. The resulting negative cross-entropy loss, which we call the intervention loss, is added to the objective as a regularization term.

We show that the intervention loss is equivalent to the JS-divergence among multiple intervened distributions. Furthermore, these intervened distributions interpolate between the original generative distribution of the GAN and the data distribution, providing useful information for the generator that is unavailable in common GAN models (see a thorough analysis on a toy example in Example 1). We show empirically that our model can be trained efficiently by sharing parameters between the discriminator and the classifier. Models trained on the MNIST, CIFAR-10, LSUN and STL-10 datasets successfully generate diverse, visually appealing objects, outperforming state-of-the-art baseline methods such as WGAN-GP and MRGAN in terms of the Fréchet Inception Distance (FID) (proposed in (Heusel et al., 2017)). We also perform a series of experiments on the stacked MNIST dataset, and the results show that our proposed method can effectively alleviate the mode collapse problem. Moreover, an ablation study is conducted, which validates the effectiveness of the proposed intervention loss.

In summary, our work offers three major contributions. (i) We propose a novel method that improves both GAN training and generation performance. (ii) We theoretically analyze the proposed model and give insights into how it makes the gradient of the generator more informative and thus stabilizes GAN training. (iii) We evaluate our method on both standard real-world datasets and the stacked MNIST dataset through carefully designed experiments, showing that our approach stabilizes GAN training and improves the quality and diversity of generated samples.
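To make the notion of a Gaussian-invariant intervention concrete, the NumPy sketch below applies a random orthogonal transform to latent codes drawn from the standard Gaussian and checks empirically that the perturbed codes still follow N(0, I). The choice of a rotation is purely an assumed example of such an operation; the specific interventions used by the model are defined later in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 8, 200_000

# A random orthogonal matrix Q (via QR decomposition) is one example of a
# Gaussian-invariant intervention: if z ~ N(0, I), then Q z ~ N(0, I) as well.
q, _ = np.linalg.qr(rng.standard_normal((d, d)))

z = rng.standard_normal((n, d))      # latent codes z ~ N(0, I)
z_perturbed = z @ q.T                # intervened latent codes

# Empirically, the mean and covariance of the intervened codes still match N(0, I).
print(np.abs(z_perturbed.mean(axis=0)).max() < 0.02)            # True
print(np.abs(np.cov(z_perturbed.T) - np.eye(d)).max() < 0.02)   # True
```

Because such operations leave the noise prior unchanged, the generator sees intervened codes drawn from exactly the distribution it was trained on, and only the classifier's task distinguishes which intervention was applied.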

2. PRELIMINARIES

Generative adversarial nets The basic idea of GANs is to utilize a discriminator to continuously push a generator to map Gaussian noise to samples from an implicit data distribution. The objective function of the vanilla GAN takes the following form:

$$\min_G \max_D \; \mathbb{E}_{x \sim p_{\mathrm{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))],$$

where $p_z$ is a prior distribution (e.g., the standard Gaussian). It can easily be seen that when the discriminator reaches its optimum, that is, $D^*(x) = \frac{p_{\mathrm{data}}(x)}{p_{\mathrm{data}}(x) + p_G(x)}$, the objective is equivalent to the Jensen-Shannon (JS) divergence between the generated distribution $p_G$ and the data distribution $p_{\mathrm{data}}$:

$$\min_G \; 2\,\mathrm{JS}(p_{\mathrm{data}}, p_G) - 2\log 2.$$

Minimizing this JS divergence guarantees that the generated distribution converges to the data distribution given adequate model capacity.
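The identity between the GAN objective at the optimal discriminator and the JS divergence can be verified numerically. The sketch below (an illustrative check on toy discrete distributions, not part of the model) plugs $D^*(x) = p_{\mathrm{data}}(x)/(p_{\mathrm{data}}(x) + p_G(x))$ into the objective and confirms it equals $2\,\mathrm{JS}(p_{\mathrm{data}}, p_G) - 2\log 2$:

```python
import numpy as np

def entropy(p):
    """Shannon entropy (natural log) of a discrete distribution."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def js(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = 0.5 * (p + q)
    return entropy(m) - 0.5 * (entropy(p) + entropy(q))

# Toy discrete "data" and "generated" distributions over 4 outcomes.
p_data = np.array([0.5, 0.3, 0.1, 0.1])
p_g = np.array([0.25, 0.25, 0.25, 0.25])

# Optimal discriminator D*(x) = p_data(x) / (p_data(x) + p_g(x)).
d_star = p_data / (p_data + p_g)

# GAN value at the optimal discriminator:
# V = E_{p_data}[log D*] + E_{p_g}[log(1 - D*)]
v = np.sum(p_data * np.log(d_star)) + np.sum(p_g * np.log(1 - d_star))

# The identity V = 2 * JS(p_data, p_g) - 2 log 2 holds up to float error.
print(abs(v - (2 * js(p_data, p_g) - 2 * np.log(2))) < 1e-12)  # True
```

The check works because $\log D^*(x) = \log\frac{p_{\mathrm{data}}}{2m} = \log\frac{p_{\mathrm{data}}}{m} - \log 2$ with $m = \frac{1}{2}(p_{\mathrm{data}} + p_G)$, so the objective decomposes into two KL terms minus $2\log 2$.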

Multi-distribution JS divergence

The JS divergence between two distributions $p_1$ and $p_2$ can be rewritten as

$$\mathrm{JS}(p_1, p_2) = H\!\left(\frac{p_1 + p_2}{2}\right) - \frac{H(p_1) + H(p_2)}{2},$$

where $H(p)$ denotes the entropy of distribution $p$. We observe that the JS-divergence can be interpreted as the entropy of the mixture of the two distributions minus the average of the two distributions' entropies. It is therefore immediate to generalize the JS-divergence to the setting of multiple distributions. In particular, we define the JS-divergence of $p_1, p_2, \ldots, p_n$ with respect to weights $\pi_1, \pi_2, \ldots, \pi_n$ ($\sum_i \pi_i = 1$ and $\pi_i \geq 0$) as

$$\mathrm{JS}_{\pi_1, \ldots, \pi_n}(p_1, p_2, \ldots, p_n) = H\!\left(\sum_{i=1}^n \pi_i p_i\right) - \sum_{i=1}^n \pi_i H(p_i).$$

The two-distribution case described above is a special case of this multi-distribution JS divergence, with $\pi_1 = \pi_2 = \frac{1}{2}$. When $\pi_i > 0$ for all $i$, it follows immediately from Jensen's inequality that $\mathrm{JS}_{\pi_1, \ldots, \pi_n}(p_1, p_2, \ldots, p_n) \geq 0$, with equality if and only if $p_1 = p_2 = \cdots = p_n$.
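The multi-distribution JS divergence defined above is straightforward to compute for discrete distributions. The sketch below (an illustrative implementation on toy distributions, with hypothetical names) evaluates $H(\sum_i \pi_i p_i) - \sum_i \pi_i H(p_i)$ and checks that it is positive for distinct distributions and vanishes when all distributions coincide:

```python
import numpy as np

def entropy(p):
    """Shannon entropy (natural log) of a discrete distribution."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def multi_js(ps, weights):
    """JS_{pi_1..pi_n}(p_1..p_n) = H(sum_i pi_i p_i) - sum_i pi_i H(p_i)."""
    ps = np.asarray(ps, dtype=float)
    w = np.asarray(weights, dtype=float)
    mixture = w @ ps  # weighted mixture of the distributions
    return entropy(mixture) - np.sum(w * np.array([entropy(p) for p in ps]))

p1 = np.array([0.7, 0.2, 0.1])
p2 = np.array([0.1, 0.8, 0.1])
p3 = np.array([1 / 3, 1 / 3, 1 / 3])
w = np.array([0.2, 0.3, 0.5])

# Distinct distributions give a strictly positive divergence...
print(multi_js([p1, p2, p3], w) > 0)                # True

# ...while identical distributions give zero (up to float error).
print(abs(multi_js([p3, p3, p3], w)) < 1e-12)       # True
```

The zero case follows directly from the definition: when all $p_i$ are equal, the mixture equals each $p_i$, so the two entropy terms cancel exactly.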

