IMPROVING SEQUENCE GENERATIVE ADVERSARIAL NETWORKS WITH FEATURE STATISTICS ALIGNMENT

Abstract

Generative Adversarial Networks (GANs) face great challenges in synthesizing sequences of discrete elements, such as mode dropping and unstable training. The binary classifier in the discriminator may limit the capacity of the learning signals and thus hinder the progress of adversarial training. To address these issues, apart from the binary classification feedback, we harness a Feature Statistics Alignment (FSA) paradigm to deliver fine-grained signals in the latent high-dimensional representation space. Specifically, FSA forces the mean statistics of the fake data distribution to approach those of the real data as closely as possible in a finite-dimensional feature space. Experiments on synthetic and real benchmark datasets show superior performance in quantitative evaluation and demonstrate the effectiveness of our approach to discrete sequence generation. To the best of our knowledge, the proposed architecture is the first to employ feature alignment regularization in a Gumbel-Softmax based GAN framework for sequence generation.
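The mean-statistics alignment described above can be sketched as a simple batch-level loss. This is a minimal illustration, not the paper's implementation: the function name `fsa_loss` and the assumption that features are plain `(batch, dim)` arrays taken from some discriminator encoder are ours.

```python
import numpy as np

def fsa_loss(real_feats, fake_feats):
    """Squared L2 distance between the mean feature vectors of a real
    and a generated batch (the mean-matching idea behind FSA).

    real_feats, fake_feats: (batch, dim) arrays of encoded features.
    """
    mu_real = real_feats.mean(axis=0)
    mu_fake = fake_feats.mean(axis=0)
    return float(np.sum((mu_real - mu_fake) ** 2))
```

The loss is zero when the two batches share the same feature mean, and grows smoothly as the generated distribution drifts away, which is what makes it a fine-grained signal compared with a single real/fake bit.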

1. INTRODUCTION

Unsupervised sequence generation is the cornerstone of a plethora of applications, such as machine translation (Wu et al., 2016), image captioning (Anderson et al., 2018), and dialogue generation (Li et al., 2017). The most common approach to autoregressive sequence modeling is maximizing the likelihood of each token in the sequence given the previous partial observation. However, using maximum likelihood estimation (MLE) for sequence modeling is inherently prone to the exposure bias problem (Bengio et al., 2015), which results from the discrepancy between the training and inference stages: during training the generator predicts the next token conditioned on ground-truth prefix tokens, whereas during inference it conditions on its own previously generated tokens, yielding a mismatch that accumulates as the generated sequence grows longer. Generative Adversarial Networks (GANs) (Goodfellow et al., 2014) can serve as an alternative to models trained by MLE and have achieved promising results in generating sequences of discrete elements, in particular language sequences (Kusner & Hernández-Lobato, 2016; Yu et al., 2017; Lin et al., 2017; Guo et al., 2018; Fedus et al., 2018; Nie et al., 2019; de Masson d'Autume et al., 2019; Zhou et al., 2020; Scialom et al., 2020). GANs consist of two competing networks: a discriminator that is trained to distinguish generated samples from real data, and a generator that aims to generate high-quality samples that fool the discriminator. Although they succeed in avoiding exposure bias, GANs still suffer from intrinsic problems such as mode dropping, reward sparsity, and training instability.
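The train/inference discrepancy behind exposure bias can be made concrete with two decoding loops. This is a toy sketch under our own assumptions: `model` is any hypothetical callable mapping a token prefix to a next-token probability list, and greedy decoding stands in for sampling.

```python
import math

def train_step_teacher_forcing(model, tokens):
    """Training: each next-token prediction is conditioned on the
    GROUND-TRUTH prefix tokens[:t] (teacher forcing / MLE)."""
    losses = []
    for t in range(1, len(tokens)):
        probs = model(tokens[:t])             # ground-truth prefix
        losses.append(-math.log(probs[tokens[t]]))
    return sum(losses) / len(losses)

def generate(model, start_token, length):
    """Inference: each prediction is conditioned on the model's OWN
    previous outputs, so early mistakes compound (exposure bias)."""
    seq = [start_token]
    for _ in range(length):
        probs = model(seq)
        seq.append(probs.index(max(probs)))   # greedy decoding
    return seq
```

The mismatch is visible in the two loops: `train_step_teacher_forcing` never feeds the model its own predictions, while `generate` feeds it nothing else.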
To enrich the informativeness of the discriminator's training signal, several approaches measure latent features rather than relying on binary classification alone, including feature distribution matching (Zhang et al., 2017; Chen et al., 2018) and comparative discriminators such as rankers and relativistic discriminators (Lin et al., 2017; Zhou et al., 2020). Lin et al. (2017) maintained that binary classification in the discriminator network limits the learning capacity because diversity and richness are circumscribed by the degenerated distribution; their RankGAN therefore replaced the binary classifier with a ranking model. Zhang et al. (2017) and Chen et al. (2018) leveraged a feature matching mechanism by minimizing a kernel-based moment-matching metric, such as Maximum Mean Discrepancy or Earth-Mover's Distance, between encoded features. However, merely adopting feature matching in lieu of the original learning signal may lack guiding feedback at the initial stage of training.
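As context for the kernel-based moment matching mentioned above, the (biased) squared Maximum Mean Discrepancy between two feature batches can be sketched as follows. This is a generic MMD estimator with an RBF kernel, not the specific objective of Zhang et al. (2017) or Chen et al. (2018); the bandwidth `sigma` is an assumption.

```python
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    """RBF kernel matrix between rows of x (n, d) and y (m, d)."""
    d2 = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def mmd2(X, Y, sigma=1.0):
    """Biased estimate of squared Maximum Mean Discrepancy:
    E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]."""
    return (rbf_kernel(X, X, sigma).mean()
            + rbf_kernel(Y, Y, sigma).mean()
            - 2.0 * rbf_kernel(X, Y, sigma).mean())
```

Unlike a binary real/fake decision, `mmd2` compares all moments of the two feature distributions implicitly through the kernel, which is why such metrics yield richer gradients for the generator.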

