ASSISTING THE ADVERSARY TO IMPROVE GAN TRAINING

Abstract

Some of the most popular methods for improving the stability and performance of GANs involve constraining or regularizing the discriminator. In this paper we consider a largely overlooked regularization technique which we refer to as the Adversary's Assistant (AdvAs). We motivate this from a perspective different from that of prior work. Specifically, we consider a common mismatch between theoretical analysis and practice: analysis often assumes that the discriminator reaches its optimum on each iteration. In practice, this is essentially never true, often leading to poor gradient estimates for the generator. To address this, AdvAs is a theoretically motivated penalty imposed on the generator based on the norm of the gradients used to train the discriminator. This encourages the generator to move towards points where the discriminator is optimal. We demonstrate the effect of applying AdvAs to several GAN objectives, datasets and network architectures. The results indicate a reduction in the mismatch between theory and practice, and show that AdvAs can improve GAN training, as measured by FID scores.

1. INTRODUCTION

The generative adversarial network (GAN) framework (Goodfellow et al., 2014) trains a neural network known as a generator, which maps from a random vector to an output such as an image. Key to training is another neural network, the adversary (sometimes called a discriminator or critic), which is trained to distinguish between "true" and generated data. This is done by maximizing one of the many objectives proposed in the literature; see for instance Goodfellow et al. (2014); Arjovsky et al. (2017); Nowozin et al. (2016). The generator directly competes against the adversary: it is trained to minimize the same objective, which it does by making the generated data more similar to the true data. GANs are efficient to sample from, requiring a single pass through a deep network, and highly flexible, as they do not require an explicit likelihood. They are especially suited to producing photo-realistic images (Zhou et al., 2019) compared to competing methods such as normalizing flows, which impose strict requirements on the neural network architecture (Kobyzev et al., 2020; Rezende & Mohamed, 2015), and VAEs (Kingma & Welling, 2014; Razavi et al., 2019; Vahdat & Kautz, 2020).

Counterbalancing these appealing properties, GANs can have unstable training dynamics (Kurach et al., 2019; Goodfellow, 2017; Kodali et al., 2017; Mescheder et al., 2018). Substantial research effort has been directed towards improving the training of GANs. These endeavors can generally be divided into two camps, albeit with significant overlap. The first develops better learning objectives for the generator/adversary to minimize/maximize, designed to have properties which improve training (Arjovsky et al., 2017; Li et al., 2017; Nowozin et al., 2016). The other camp develops techniques to regularize the adversary and improve its training dynamics (Kodali et al., 2017; Roth et al., 2017; Miyato et al., 2018).
The adversary can then provide a better learning signal for the generator. Despite these contributions, stabilizing the training of GANs remains unsolved and continues to be an active research area. An overlooked approach is to train the generator in a way that accounts for the adversary not being trained to convergence. One such approach was introduced by Mescheder et al. (2017) and later built on by Nagarajan & Kolter (2017). The proposed method is a regularization term based on the norm of the gradients used to train the adversary, motivated as a means to improve the convergence properties of the minimax game. The purpose of this paper is to provide a new perspective on why this regularizer is appropriate. Our perspective differs in that we view it as promoting updates that lead to a solution satisfying a sufficient condition for the adversary to be optimal. To be precise, it encourages the generator to move towards points where the adversary's current parameters are optimal. Informally, this regularizer "assists" the adversary, and for this reason we refer to it as the Adversary's Assistant (AdvAs). We additionally propose a version of AdvAs which is hyperparameter-free. Furthermore, we release a library which makes it simple to integrate into existing code. We demonstrate its application to a standard architecture with the WGAN-GP objective (Arjovsky et al., 2017; Gulrajani et al., 2017).
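The core idea can be sketched numerically. Below is a minimal NumPy illustration on a toy one-parameter model (1-D Gaussian "data", a shift generator g_θ(z) = z + θ, and a logistic adversary a_φ(x) = sigmoid(φx)); the model, the function names, and the finite-difference gradient are all illustrative assumptions for this sketch, not the paper's released library:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def h(theta, phi, x_real, z):
    """Monte Carlo estimate of the original GAN objective h(p_theta, a_phi)
    for the toy model g_theta(z) = z + theta, a_phi(x) = sigmoid(phi * x)."""
    x_fake = z + theta
    return (np.mean(np.log(sigmoid(phi * x_real) + 1e-12))
            + np.mean(np.log(1.0 - sigmoid(phi * x_fake) + 1e-12)))

def grad_phi_adv(theta, phi, x_real, z, eps=1e-4):
    """Finite-difference gradient of the adversary's loss L_adv = -h w.r.t. phi."""
    return -(h(theta, phi + eps, x_real, z)
             - h(theta, phi - eps, x_real, z)) / (2 * eps)

def advas_generator_loss(theta, phi, x_real, z, lam=1.0):
    """Generator loss plus an AdvAs-style penalty: the squared norm of the
    gradient used to train the adversary. The penalty vanishes exactly when
    phi is a stationary (e.g. optimal) point of the adversary's objective."""
    penalty = grad_phi_adv(theta, phi, x_real, z) ** 2
    return h(theta, phi, x_real, z) + lam * penalty

rng = np.random.default_rng(0)
x_real = rng.normal(loc=1.0, size=10_000)   # "true" data ~ N(1, 1)
z = rng.normal(size=10_000)                 # generator noise ~ N(0, 1)
loss = advas_generator_loss(theta=0.0, phi=0.5, x_real=x_real, z=z)
```

Because the penalty is non-negative and zero only at stationary points of the adversary's objective, minimizing it pushes the generator towards points where the adversary's current parameters are already optimal.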

2. BACKGROUND

A generator is a neural network g : Z → X ⊆ R^{d_x} which maps a random vector z ∈ Z to an output x ∈ X (e.g., an image). Due to the distribution over z, g induces a distribution over its output x = g(z). If g were invertible and differentiable, the probability density function (PDF) over x could be computed by this "change of variables." This is not necessary for training GANs, so no such restrictions need to be placed on the neural network g. We denote the induced distribution p_θ(x), where θ ∈ Θ ⊆ R^{d_g} denotes the generator's parameters. The GAN is trained on a dataset x_1, . . . , x_N, where each x_i is in X. We assume that this is sampled i.i.d. from a data-generating distribution p_true. The aim of training is then to learn θ so that p_θ is as close as possible to p_true; Section 2.1 will make precise what is meant by "close."

The adversary a_φ : X → A has parameters φ ∈ Φ ⊆ R^{d_a}, which are typically trained alternately with the generator. It receives as input either the data or the generator's outputs. The set that it maps to, A, depends on the GAN type. For example, Goodfellow et al. (2014) define an adversary which maps from x ∈ X to the probability that x is a "real" data point from the dataset, as opposed to a "fake" from the generator. They therefore choose A = [0, 1] and train the adversary by maximizing the associated log-likelihood objective,

h(p_θ, a_φ) = E_{x∼p_true}[log a_φ(x)] + E_{x∼p_θ}[log(1 − a_φ(x))].   (1)

Using the intuition that the generator should generate samples that seem real and therefore "fool" the adversary, the generator is trained to minimize h(p_θ, a_φ). Since we find θ to minimize this objective while fitting φ to maximize it, training a GAN is equivalent to solving the minimax game

min_θ max_φ h(p_θ, a_φ).   (2)

Eq. (1) gives the original form for h(p_θ, a_φ) used by Goodfellow et al. (2014), but this form varies between different GANs, as we will discuss in Section 2.1.
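As a concrete illustration, Eq. (1) can be estimated by Monte Carlo. The toy instantiation below (1-D Gaussian data, a shift generator, a logistic adversary) is an assumption of this sketch, not a model from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Toy instantiation: p_true = N(1, 1); generator g_theta(z) = z + theta with
# z ~ N(0, 1), so p_theta = N(theta, 1); adversary a_phi(x) = sigmoid(phi * x),
# which maps into A = [0, 1].
theta, phi = 0.0, 0.5
x_real = rng.normal(loc=1.0, size=100_000)
x_fake = rng.normal(size=100_000) + theta

# Monte Carlo estimate of Eq. (1):
# h(p_theta, a_phi) = E_{x~p_true}[log a_phi(x)] + E_{x~p_theta}[log(1 - a_phi(x))]
h_estimate = (np.mean(np.log(sigmoid(phi * x_real)))
              + np.mean(np.log(1.0 - sigmoid(phi * x_fake))))
```

Both expectations are logs of probabilities, so each term (and hence the estimate) is negative; the adversary raises h by assigning high a_φ(x) to real samples and low a_φ(x) to generated ones.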
The minimization and maximization in Eq. (2) are performed with gradient descent in practice. To be precise, we define L_gen(θ, φ) = h(p_θ, a_φ) and L_adv(θ, φ) = −h(p_θ, a_φ). These are treated as losses for the generator and adversary respectively, and both are minimized. In other words, we turn the maximization of h(p_θ, a_φ) w.r.t. φ into a minimization of L_adv(θ, φ). Then on each iteration, θ and φ are updated one after the other using gradient descent steps along their respective gradients:

∇_θ L_gen(θ, φ) = ∇_θ h(p_θ, a_φ),   (3)
∇_φ L_adv(θ, φ) = −∇_φ h(p_θ, a_φ).   (4)
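This alternating scheme can be sketched on a toy one-parameter model (all assumptions of this sketch: 1-D Gaussian data, a shift generator, a logistic adversary, and central finite differences standing in for backpropagation):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def h(theta, phi, x_real, z):
    # h(p_theta, a_phi) for g_theta(z) = z + theta, a_phi(x) = sigmoid(phi * x)
    x_fake = z + theta
    return (np.mean(np.log(sigmoid(phi * x_real) + 1e-12))
            + np.mean(np.log(1.0 - sigmoid(phi * x_fake) + 1e-12)))

def fd(f, x, eps=1e-4):
    # central finite difference; stands in for backprop in this sketch
    return (f(x + eps) - f(x - eps)) / (2 * eps)

theta, phi, lr = -2.0, 0.0, 0.05
x_real = rng.normal(loc=1.0, size=5_000)   # data mean is 1.0
for _ in range(500):
    z = rng.normal(size=5_000)
    # Eq. (4): adversary descends L_adv = -h (i.e. ascends h)
    phi -= lr * fd(lambda p: -h(theta, p, x_real, z), phi)
    # Eq. (3): generator descends L_gen = h
    theta -= lr * fd(lambda t: h(t, phi, x_real, z), theta)
# theta drifts from -2.0 towards the data mean 1.0
```

Note that each iteration takes a single adversary step, so φ is never at its maximizing value when θ is updated; this is precisely the gap between practice and the optimal adversary assumption discussed next.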

2.1. GANS MINIMIZE DIVERGENCES

A common theme in the GAN literature is analysis based on what we call the optimal adversary assumption. This is the assumption that, before each generator update, we have found the adversary a_φ which maximizes h(p_θ, a_φ) given the current value of θ. To be precise, we define a class of permissible adversary functions F. This is often simply the space of all functions mapping X → A.
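For the objective in Eq. (1), this assumption has a well-known closed form: the pointwise maximizer is a*(x) = p_true(x) / (p_true(x) + p_θ(x)) (Goodfellow et al., 2014). The snippet below numerically checks this pointwise optimality at one illustrative pair of density values (the numbers are assumptions of the sketch):

```python
import numpy as np

def integrand(a, p, q):
    # Pointwise term of Eq. (1) at a point x, with p = p_true(x) and
    # q = p_theta(x): p * log a(x) + q * log(1 - a(x)), maximized over
    # the adversary's output a(x) in (0, 1).
    return p * np.log(a) + q * np.log(1.0 - a)

p, q = 0.7, 0.2          # example density values at some x (assumed)
a_star = p / (p + q)     # the optimal adversary's output at this x

# No other response value in (0, 1) does better than a_star:
grid = np.linspace(0.01, 0.99, 99)
best_on_grid = max(integrand(a, p, q) for a in grid)
```

Since the integrand is concave in a, setting its derivative p/a − q/(1 − a) to zero recovers a* = p/(p + q), confirming the grid search.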

We also apply AdvAs to the state-of-the-art StyleGAN2 architecture and objective introduced by Karras et al. (2020), and to the AutoGAN architecture and objective introduced by Gong et al. (2019). We test these on the MNIST (LeCun et al., 1998), CelebA (Liu et al., 2015), and CIFAR-10 (Krizhevsky et al., 2009) datasets respectively. We show that AdvAs improves training on all datasets, as measured by the Fréchet Inception Distance (FID) (Heusel et al., 2017) and, where applicable, the Inception Score (Salimans et al., 2016).

