A TEXT GAN FOR LANGUAGE GENERATION WITH NON-AUTOREGRESSIVE GENERATOR

Anonymous authors
Paper under double-blind review

Abstract

Despite the great success of Generative Adversarial Networks (GANs) in generating high-quality images, GANs for text generation still face two major challenges. First, most text GANs are unstable in training, mainly due to ineffective optimization of the generator, and they heavily rely on maximum likelihood pretraining. Second, most text GANs adopt autoregressive generators without latent variables, which largely limits their ability to learn latent representations for natural language text. In this paper, we propose a novel text GAN, named NAGAN, which incorporates a non-autoregressive generator with latent variables. The non-autoregressive generator can be effectively trained with gradient-based methods without any pretraining, and the latent variables facilitate representation learning for text generation applications. Experiments show that our model is competitive with existing text GANs in unconditional text generation, and that it outperforms existing methods on sentence manipulation in latent space and on unsupervised text decipherment.

1. INTRODUCTION

Generative Adversarial Networks (GANs) (Goodfellow et al., 2014) have achieved great success in generating continuous data, such as images with high resolution and fidelity (Brock et al., 2019). Unsurprisingly, GANs are also widely studied for text generation, but the adaptation is by no means trivial. The mainstream text GANs (Yu et al., 2017b; Guo et al., 2018) apply a different framework tailored for discrete sequence data, but several research problems remain unsolved.

One problem lies in ineffective optimization. Most text GANs resort to gradient-free reinforcement learning (RL) algorithms, mainly due to the discrete nature of text data. However, since RL methods abandon the gradient information, they suffer from unstable training (Ke et al., 2019). Though some works (Chen et al., 2018) have explored the feasibility of gradient-based methods, the optimization is still ineffective. As a result, most text GANs heavily rely on MLE pretraining, and some even report worse performance after GAN training (Caccia et al., 2020).

Another problem can be attributed to the generative model. Most text GANs adopt an autoregressive generator, which defines an explicit likelihood without any latent variable. Latent variables have empowered image GANs with various applications, such as unsupervised style transfer (Taigman et al., 2017) and image editing (Brock et al., 2017). However, most text GANs merely generate sentences from the learned distribution with autoregressive decoding, and are thus hardly applicable to text style transfer or controlled text generation, which may require latent representations.

We therefore challenge the conventional design of existing text GANs and argue that incorporating a non-autoregressive generator allows the model to benefit from both efficient gradient-based optimization and the use of latent variables.
Our proposed model, named Non-Autoregressive GAN (NAGAN), consists of a non-autoregressive generator and a regularized discriminator. The non-autoregressive generator naturally translates latent variables into tokens in parallel, and gradient-based optimization on our feed-forward structure is significantly more effective than the same method applied to an autoregressive generator. The discriminator is regularized by Max Gradient Penalty (Zhou et al., 2019), which is another key to effective optimization.

Our contributions are summarized as follows:

• We propose the non-autoregressive GAN, which characterizes itself by employing a non-autoregressive generator and latent variables, and by training efficiently from scratch with gradient-based methods. To our knowledge, NAGAN is the first text GAN that learns latent representations from scratch.
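To make the parallel generation concrete, the following is a minimal toy sketch in NumPy, not the paper's actual architecture: a single random linear layer stands in for the generator network (an assumption for illustration), mapping a latent vector z to soft token distributions over every position of the sequence in one feed-forward pass, so discriminator gradients could flow back through the continuous outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, SEQ_LEN, LATENT = 20, 6, 8

# Hypothetical stand-in for the generator network: one linear layer
# producing logits for all SEQ_LEN positions at once (the real NAGAN
# generator is a deeper non-autoregressive network).
W = rng.normal(scale=0.1, size=(LATENT, SEQ_LEN * VOCAB))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def generate(z):
    """Map latent z -> soft token distributions for every position in parallel."""
    logits = (z @ W).reshape(-1, SEQ_LEN, VOCAB)
    return softmax(logits)            # shape: (batch, SEQ_LEN, VOCAB)

z = rng.normal(size=(4, LATENT))      # z ~ p(z), the latent prior
probs = generate(z)                   # all positions produced simultaneously
tokens = probs.argmax(-1)             # discretize only at the very end
```

Note that discretization happens only after generation; during training the soft distributions themselves would be passed to the discriminator, which is what keeps the pipeline differentiable.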

2. MOTIVATION OF NON-AUTOREGRESSIVE GENERATION IN TEXT GANS

Non-autoregressive (NAR) generators have been widely used in image GANs (Goodfellow et al., 2014) and have achieved great success in generating high-quality images (Brock et al., 2019). For text generation, however, text GANs apply a different framework in which autoregressive (AR) generators are used. In this section, we discuss the differences between image and text GANs and show why non-autoregressive generators are needed for text generation.

2.1. GENERATIVE MODELS

Image and text GANs differ substantially in their generative models. An image GAN is an implicit generative model, where a sample $x$ is generated in two steps:
$$z \sim p(z), \quad x = G(z), \tag{1}$$
where $z$ is a latent variable and $p(z)$ is the prior distribution. $G$ is a deterministic function from the latent space to the data space, usually parameterized by a NAR generator, where all pixels of $x$ are generated simultaneously.

A text GAN is usually an explicit generative model. A discrete sequence sample $x = [x_1, x_2, \cdots, x_L]$ is sampled by a stochastic process from the distribution $P_G(x)$, where
$$P_G(x) = \prod_{i=1}^{L} P_G(x_i \mid x_1, \cdots, x_{i-1}). \tag{2}$$
Eq (2) shows that $G$ is an autoregressive model, where tokens are sampled sequentially, each conditioned on the previously generated prefix. Most text GANs do not adopt latent variables, partially because Eq (2) is convenient for MLE pretraining. There are also some explorations (Chen et al., 2018) of equipping text GANs with latent variables, but elaborate VAE-like pretraining is required.

However, directly injecting latent variables into the explicit generative model of text GANs can be problematic. For instance, when defining $P_G(x) = \mathbb{E}_{z \sim p(z)} \prod_{i=1}^{L} P_G(x_i \mid x_{<i}, z)$, we may face two problems: (1) Solely optimizing $P_G$ does not make the model learn latent representations. Without further constraints [1], the model can degenerate and simply ignore $z$, becoming almost a vanilla language model, even if the generation distribution $P_G$ perfectly fits the real distribution. (2) The representations in AR text GANs can hardly be applied to downstream tasks. In image GANs, $x$ is fully determined by the sampled $z$, so we can control the generated images by manipulating the latent variable. Moreover, the mapping from the latent space to the data space is continuous and differentiable, thereby facilitating applications such as image editing (Brock et al., 2017). In text
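The sequential nature of Eq (2) can be illustrated with a toy sampler (a hypothetical conditional model for illustration only, not any specific text GAN): each token is drawn from a distribution conditioned on the generated prefix, so the L sampling steps cannot be parallelized and each step is a discrete, non-differentiable draw.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, SEQ_LEN = 5, 4

def conditional(prefix):
    """Hypothetical stand-in for P_G(x_i | x_1..x_{i-1}): returns a
    distribution over the vocabulary given the prefix so far."""
    logits = np.arange(VOCAB, dtype=float) * 0.1 + 0.2 * len(prefix)
    e = np.exp(logits - logits.max())
    return e / e.sum()

# L sequential steps: step i cannot start before step i-1 finishes,
# and each np.random draw is discrete, so no gradient flows through it.
x = []
for i in range(SEQ_LEN):
    p = conditional(x)
    x.append(int(rng.choice(VOCAB, p=p)))
```

This is exactly the contrast with the implicit model of Eq (1): there, one differentiable pass maps $z$ to the whole sample, whereas here every position requires a separate discrete sampling step.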



[1] For example, the reconstruction loss in VAEs alleviates degeneration. More discussed in Appendix A.1.1.

FMGAN uses explicit generative models in pretraining and implicit ones in GAN training.



Table: Differences between various GANs. Autoregressive: using autoregressive generators. Explicit: using explicit generative models. Latent: equipped with latent variables. Pretrain: whether pretraining is required. The mainstream text GANs include Yu et al. (2017b); Guo et al. (2018); Shi et al. (2018); Che et al. (2017); Lin et al. (2017); Fedus et al. (2018).

• We conduct experiments on synthetic and real data, and show that NAGAN without MLE pretraining is competitive in unconditional text generation compared with pretrained text GANs.

• By taking advantage of the latent variables and the non-autoregressive generator, our model can be applied to sentence manipulation in latent space and unsupervised text decipherment, where NAGAN significantly outperforms previous methods.

