QRGAN: QUANTILE REGRESSION GENERATIVE ADVERSARIAL NETWORKS

Abstract

Learning high-dimensional probability distributions by competitively training a generative and a discriminative neural network is the hallmark of Generative Adversarial Networks (GANs), a prominent family of generative models for complex real-world data. Nevertheless, GAN training often suffers from non-convergence, mode collapse, and exploding or vanishing gradients. Least Squares GANs (LSGANs) and Wasserstein GANs (WGANs) are representative variants in the literature that mitigate these inherent problems by modifying the loss function. However, LSGANs often fall into local minima and suffer mode collapse, while WGANs incur inefficient computation and slow training due to the constraints used in their 1-Wasserstein distance approximation. In this paper, we propose Quantile Regression GAN (QRGAN), which adopts quantile regression to minimize the 1-Wasserstein distance between the real and generated data distributions, as a novel loss-function modification for improving GANs. To study the causes of mode collapse, we analyze the output space of the discriminator and the gradients of fake samples to see whether the discriminator guides the generator well, and we find that the discriminator's output should not be bounded to specific target values. Our proposed QRGAN shows strong robustness against mode collapse. Furthermore, QRGAN achieves a clear improvement in Frechet Inception Distance (FID), the metric used to assess generation performance, compared to existing variants of GANs.

1. INTRODUCTION

Deep learning-based data generation techniques have proved successful in many real-world applications. Thanks to the rise of generative models, the generation of audio, images, and videos, either unconditionally or conditionally, has achieved remarkable advances in recent years, and many recent studies generate text or structured data as well. Data generation techniques bring efficiency and creativity to human activities around the world. Among the most influential and successful methods for data generation, Variational Autoencoders (VAEs) Kingma & Welling (2014) and Generative Adversarial Networks (GANs) Goodfellow et al. (2014) are the fundamental representatives of generative models.

Variational Autoencoders (VAEs): VAEs regularize the encoder output to be a known distribution, and this regularization is applied to each sample: for latent variable z and input x, p(z|x), not p(z), is pushed toward the prior distribution. With the additional reconstruction loss, the two objectives may conflict with each other. Usually, mean square error (MSE) is used as the reconstruction loss; because the global minimum of the MSE loss is at the expected value of the distribution, the decoder generates blurry outputs. PixelVAE Gulrajani et al. (2016) fixes VAE's blurry output by replacing the MSE loss with a PixelCNN van den Oord et al. (2016) decoder.

Generative Adversarial Networks (GANs): GANs regularize the entire decoder input distribution to be a known distribution. In other words, GANs regularize p(z), not p(z|x). Therefore, generators can produce sharp outputs without the conflicting objectives in VAEs. A number of GAN variants have been presented in the literature to improve data generation, including Least-Squares GANs (LSGANs) Mao et al. (2019), Wasserstein GANs (WGANs) Arjovsky et al. (2017), LatentGAN Prykhodko et al. (2019), and Adversarial Autoencoders (AAEs) Makhzani et al. (2016). LSGANs and WGANs improved the objective function for better generation quality. LatentGAN generates latent variables for a given autoencoder using GAN methods. AAEs replace the KL regularization of VAEs with a discriminator that distinguishes the encoder output distribution (generated data) from a known distribution (real data).

Although GANs can obtain better generation output, they are difficult to train because of the framework architecture in which two networks compete. Mode collapse, a common failure of GAN frameworks in which the generator outputs only very similar samples, is caused by unstable training and an improper loss function. WGAN Arjovsky et al. (2017) applies a Lipschitz condition to approximate the 1-Wasserstein distance; the original WGAN implements this constraint by clipping the weights of the critic network into a range specified by hyperparameters. WGAN-GP was introduced to improve the training of WGAN: it adds a gradient penalty that pushes the critic's weight gradient norm toward 1, so the generator can more easily escape from local minima. However, it takes more time to compute gradients for random inputs.
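The 1-Wasserstein distance that WGANs approximate has a simple closed form in one dimension: it is the integral of the absolute difference between the two quantile functions, which for equal-size samples reduces to comparing sorted values. The following NumPy illustration is ours, not from the paper:

```python
import numpy as np

def wasserstein_1d(x, y):
    """Empirical 1-Wasserstein distance between two equal-size 1-D samples.

    In 1-D, W1 equals the mean absolute difference between the sorted
    samples, i.e., between the empirical quantile functions.
    """
    x, y = np.sort(x), np.sort(y)
    return np.mean(np.abs(x - y))

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=10_000)
b = a + 2.0  # the same distribution shifted by 2

print(wasserstein_1d(a, a))  # 0.0
print(wasserstein_1d(a, b))  # ~2.0: a pure shift moves all mass by exactly 2
```

In higher dimensions no such closed form exists, which is why WGANs resort to a Lipschitz-constrained critic to estimate the distance.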

Variants of GANs

Reinforcement Learning (RL): The goal of RL is to learn the optimal policy in a given environment. The agent reads an observation (state) from the environment, calculates the optimal action, and takes that action; the environment returns the next state together with the reward for the action. By trial and error, the agent learns the optimal action for each state and gradually reaches the optimal policy. For time t, state s, action a, decay factor γ, and reward R_t at time t, the Q-value is defined as Q(s, a) = E[ Σ_{t=0}^{∞} γ^t R_t | s, a ], in other words the expected cumulative sum of decayed rewards. Early Q-learning used a table to store the Q-value of every state, a method that cannot be used for complex environments with a great number of states.

Quantile Regression Deep Q Network (QR-DQN): To learn a generalized state-value function, Deep Q Network (DQN) successfully applies a deep neural network to predict the Q-value Mnih et al. (2015). With some tricks for stability, DQN beats human-level performance in 29 of 49 Atari 2600 games. Meanwhile, there are attempts to learn a distribution of the Q-value: C51 Bellemare et al. (2017) and Quantile Regression Deep Q Network (QR-DQN) Dabney et al. (2017). C51 introduces distributional reinforcement learning (distributional RL), which learns the Q-value distribution instead of its expected value and thereby improves training stability and performance. However, the KL divergence used in C51 is not mathematically guaranteed to converge. In economics, predicting only the mean is not sufficient; quantile values also matter, and Koenker (2005) proposes a method called quantile regression to predict them. QR-DQN demonstrates a solution that guarantees mathematical convergence by using quantile regression to minimize the 1-Wasserstein distance without bias.
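The quantile regression ("pinball") loss used by QR-DQN penalizes under- and over-estimates asymmetrically, so its minimizer is the τ-quantile of the target distribution rather than the mean. A small NumPy check of this property (illustrative, not the paper's code):

```python
import numpy as np

def pinball_loss(theta, samples, tau):
    """Quantile regression loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    u = samples - theta
    return np.mean(u * (tau - (u < 0.0)))

rng = np.random.default_rng(1)
samples = rng.exponential(1.0, size=20_000)

# Minimizing the pinball loss over a grid recovers the tau-quantile,
# not the mean, of the sample distribution.
grid = np.linspace(0.0, 5.0, 1001)
for tau in (0.25, 0.5, 0.9):
    losses = [pinball_loss(t, samples, tau) for t in grid]
    theta_star = grid[int(np.argmin(losses))]
    print(tau, theta_star, np.quantile(samples, tau))
```

Because the loss is an expectation of a piecewise-linear function, minimizing it converges to the true quantile without the bias that a KL-based projection (as in C51) can introduce.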
Our contributions: Adopting the above-mentioned quantile regression approach, we propose QRGAN, a GAN-based generative model that uses quantile regression to minimize the 1-Wasserstein distance between real and generated samples. We train the discriminator to predict quantile values of realisticity using quantile regression. Then, we train the generator to minimize the difference between the quantile values of real and fake samples, thereby minimizing the 1-Wasserstein distance between the two. We also analyze and compare the discriminator output space of each method to find out whether the discriminator can guide the generator well. Discriminators whose targets are fixed to specific values tend not to make ambiguous outputs; such discriminators create sharp minima, so generators may not learn from them. For example, when a generator generates fake samples near a real sample A, the discriminator lowers the probability (which can be seen as "realisticity") of samples near A. Then, the generator should generate fake samples near another real sample B. However,
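The generator objective described above can be sketched schematically. Assuming, as our reading of the description rather than the paper's actual code, that the discriminator emits N quantile estimates of realisticity per sample, the generator would minimize the difference between the per-quantile averages over real and fake batches:

```python
import numpy as np

# Schematic sketch only, under our assumptions: real_q and fake_q stand in
# for the (hypothetical) outputs of a discriminator that returns N quantile
# estimates of realisticity for each sample in a batch.

def generator_loss(real_q, fake_q):
    """Mean absolute difference between per-quantile batch averages.

    With quantile estimates at matching fractions, this approximates the
    1-Wasserstein distance between the realisticity distributions of
    real and fake samples.
    """
    return np.mean(np.abs(real_q.mean(axis=0) - fake_q.mean(axis=0)))

# Toy batches: 32 samples, N = 8 quantile estimates each.
rng = np.random.default_rng(2)
real_q = rng.normal(1.0, 0.1, size=(32, 8))
fake_q = rng.normal(0.2, 0.1, size=(32, 8))
print(generator_loss(real_q, fake_q))  # large gap, so a strong training signal
```

Note that this loss vanishes only when the fake quantile profile matches the real one, so the generator is never pushed toward a single fixed target value, which is the property the analysis above connects to robustness against mode collapse.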




Recent work aims to find the best method to train GANs with better quality and less mode collapse. The survey Wiatrak et al. (2020) covers the large number of existing studies that strive to improve GANs and explains GANs' recent successes and problems. Accordingly, several GAN variants have been introduced to stabilize GAN training, such as DCGAN Radford et al. (2016), LSGAN, WGAN-GP Gulrajani et al. (2017), and Fisher GAN Mroueh & Sercu (2017). In Least Squares Generative Adversarial Networks (LSGANs) Mao et al. (2017; 2019), the authors propose to use a mean square error (MSE) loss, which does not saturate. They also found that training the generator to make samples near the decision boundary, instead of trying to overwhelm the discriminator, results in better models. Alternatively, Wasserstein GANs Arjovsky et al. (2017) minimize an approximation of the 1-Wasserstein distance between the real and generated distributions.
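The saturation issue LSGANs address can be seen numerically: with the original log-loss, the generator's gradient vanishes for fake samples the discriminator confidently rejects, while the least-squares loss keeps a gradient proportional to the distance from the target. A small illustration of ours, with z denoting the discriminator's raw score for a fake sample:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss_grad(z):
    """d/dz of the minimax generator loss log(1 - sigmoid(z)).

    Equals -sigmoid(z), which vanishes as z -> -inf, i.e., exactly
    when the discriminator confidently rejects the fake sample.
    """
    return -sigmoid(z)

def lsgan_grad(z, target=1.0):
    """d/dz of the least-squares generator loss (z - target)^2.

    Equals 2 * (z - target): the gradient stays proportional to the
    distance from the target instead of saturating.
    """
    return 2.0 * (z - target)

for z in (-10.0, -4.0, 0.0):
    print(z, log_loss_grad(z), lsgan_grad(z))
# At z = -10 the log-loss gradient is ~ -4.5e-05 (almost no signal),
# while the least-squares gradient is -22.0.
```

This is why moving fake samples toward the decision boundary, as LSGANs do, keeps the generator learning even when the discriminator is far ahead.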


