TRAINING GANS WITH STRONGER AUGMENTATIONS VIA CONTRASTIVE DISCRIMINATOR

Abstract

Recent work on Generative Adversarial Networks (GANs) has actively revisited various data augmentation techniques as an effective way to prevent discriminator overfitting. It is still unclear, however, which augmentations actually improve GANs, and in particular, how to apply a wider range of augmentations during training. In this paper, we propose a novel way to address these questions by incorporating a recent contrastive representation learning scheme into the GAN discriminator, coined ContraD. This "fusion" enables the discriminator to work with much stronger augmentations without increasing its training instability, thereby preventing the discriminator overfitting issue in GANs more effectively. Even better, we observe that contrastive learning itself also benefits from our GAN training, i.e., by maintaining discriminative features between real and fake samples, suggesting a strong coherence between the two worlds: good contrastive representations are also good for GAN discriminators, and vice versa. Our experimental results show that GANs with ContraD consistently improve FID and IS compared to other recent techniques incorporating data augmentations, while maintaining highly discriminative features in the discriminator in terms of linear evaluation. Finally, as a byproduct, we also show that our GANs trained in an unsupervised manner (without labels) can induce many conditional generative models via simple latent sampling, leveraging the learned features of ContraD.

1. INTRODUCTION

Generative adversarial networks (GANs) (Goodfellow et al., 2014) have become one of the most prominent approaches for generative modeling with a wide range of applications (Ho & Ermon, 2016; Zhu et al., 2017; Karras et al., 2019; Rott Shaham et al., 2019). In general, a GAN is defined by a minimax game between two neural networks: a generator network that maps a random vector into the data domain, and a discriminator network that classifies whether a given sample is real (from the training dataset) or fake (from the generator). Provided that the generator and discriminator alternately attain their optima on each minimax objective, it is theoretically guaranteed that the generator converges to implicitly model the data-generating distribution (Goodfellow et al., 2014). Due to the non-convex, non-stationary nature of the minimax game, however, training GANs in practice is often very unstable, with extreme sensitivity to many hyperparameters (Salimans et al., 2016; Lucic et al., 2018; Kurach et al., 2019). Stabilizing GAN dynamics has been extensively studied in the literature (Arjovsky et al., 2017; Gulrajani et al., 2017; Miyato et al., 2018; Wei et al., 2018; Jolicoeur-Martineau, 2019; Chen et al., 2019; Schonfeld et al., 2020), and the idea of incorporating data augmentation techniques has recently gained particular attention in this line of research: more specifically, Zhang et al. (2020) have shown that consistency regularization between discriminator outputs on clean and augmented samples can greatly stabilize GAN training, and Zhao et al. (2020c) further improved this idea. The question of which augmentations are good for GANs has been investigated very recently in several works (Zhao et al., 2020d; Tran et al., 2021; Karras et al., 2020a; Zhao et al., 2020a), though they unanimously conclude that only a limited range of augmentations (e.g., flipping and spatial translation) are actually helpful for the current form of GAN training.
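To make the consistency-regularization idea concrete, the following is a minimal numpy sketch, not the actual implementation of Zhang et al. (2020) or of this paper: the discriminator `D` is a placeholder linear map, `augment` is stand-in noise (in place of flips or translations), and the weight `lambda_cr` is an assumed hyperparameter. It shows the standard discriminator loss plus a penalty on the discrepancy between `D`'s outputs on clean and augmented real samples.

```python
import numpy as np

rng = np.random.default_rng(0)

def bce(logits, labels):
    # Numerically stable binary cross-entropy on raw logits.
    return np.mean(np.maximum(logits, 0) - logits * labels
                   + np.log1p(np.exp(-np.abs(logits))))

# Placeholder "discriminator": a fixed linear map standing in for a network.
W = rng.normal(size=(8,))
def D(x):
    return x @ W  # logits, shape (batch,)

def augment(x):
    # Stand-in augmentation: small additive noise (in practice: flips, shifts, etc.).
    return x + 0.01 * rng.normal(size=x.shape)

x_real = rng.normal(size=(16, 8))
x_fake = rng.normal(size=(16, 8))  # would come from the generator G(z)

# Standard GAN discriminator loss: real samples labeled 1, fake samples labeled 0.
d_loss = bce(D(x_real), np.ones(16)) + bce(D(x_fake), np.zeros(16))

# Consistency-regularization term: penalize the discriminator for producing
# different outputs on a clean sample and its augmented view.
lambda_cr = 10.0  # assumed weight; the actual value is a tuned hyperparameter
cr_loss = np.mean((D(x_real) - D(augment(x_real))) ** 2)

total = d_loss + lambda_cr * cr_loss
```

The key point is that the augmentation only enters through the regularizer, so `D` is encouraged to be invariant to it without the augmented samples changing the real/fake classification targets, which is why stronger, semantics-altering augmentations can destabilize this scheme and motivate the contrastive alternative proposed here.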

