TAMING GANS WITH LOOKAHEAD-MINMAX

Abstract

Generative Adversarial Networks are notoriously challenging to train. The underlying minmax optimization is highly susceptible to the variance of the stochastic gradient and to the rotational component of the associated game vector field. To tackle these challenges, we propose extending the Lookahead algorithm, originally developed for single-objective minimization only, to minmax optimization. The backtracking step of our Lookahead-minmax naturally handles the rotational game dynamics, a property identified as key for enabling gradient descent ascent methods to converge on challenging examples often analyzed in the literature. Moreover, it implicitly handles high variance without using large mini-batches, which are known to be essential for reaching state-of-the-art performance. Experimental results on MNIST, SVHN, CIFAR-10, and ImageNet demonstrate a clear advantage of combining Lookahead-minmax with Adam or extragradient, in terms of performance and improved stability, at negligible memory and computational cost. Using 30-fold fewer parameters and 16-fold smaller mini-batches, we outperform the reported performance of the class-dependent BigGAN on CIFAR-10 by obtaining an FID of 12.19 without using the class labels, bringing state-of-the-art GAN training within reach of common computational resources.

1. INTRODUCTION

Gradient-based methods are the workhorse of machine learning. These methods optimize the parameters of a model with respect to a single objective f : X → R. However, interest in multi-objective optimization is growing in various domains, such as mathematics, economics, and multiagent reinforcement learning (Omidshafiei et al., 2017), where several agents aim at optimizing their own cost functions f_i : X_1 × ⋯ × X_N → R simultaneously. A particularly successful class of algorithms of this kind is the Generative Adversarial Networks (GANs; Goodfellow et al., 2014), which consist of two players referred to as a generator and a discriminator. GANs were originally formulated as a minmax optimization problem over f : X × Y → R (Von Neumann & Morgenstern, 1944), where the generator and the discriminator aim at minimizing and maximizing the same value function, respectively; see § 2. A natural generalization of gradient descent to minmax problems is the gradient descent ascent algorithm (GDA), which alternates between a gradient descent step for the min-player and a gradient ascent step for the max-player. This minmax training aims at finding a Nash equilibrium, where no player has an incentive to change its parameters. Despite the impressive quality of the samples generated by GANs, relative to classical maximum likelihood-based generative models, these models remain notoriously difficult to train. In particular, poor performance (sometimes manifesting as "mode collapse"), brittle dependency on hyperparameters, or divergence are often reported. Consequently, obtaining state-of-the-art performance was shown to require large computational resources (Brock et al., 2019), making well-performing models unavailable for common computational budgets.
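To make the alternating GDA update concrete, the following is a minimal sketch on a toy bilinear game f(x, y) = x · y, whose unique equilibrium is (0, 0). The toy objective, the function names, and the values of lr, k, and alpha are illustrative assumptions and are not taken from the paper. The second routine adds a Lookahead-style averaging step, which is one plausible reading of the "backtracking step" mentioned in the abstract: run k GDA steps from a slow copy of the parameters, then move that copy only a fraction alpha towards the result. It is not intended as the paper's exact algorithm.

```python
import math


def gda_step(x, y, lr):
    """One alternating GDA step on the toy objective f(x, y) = x * y."""
    x = x - lr * y  # descent step for the min-player (df/dx = y)
    y = y + lr * x  # ascent step for the max-player (df/dy = x), using the new x
    return x, y


def run_gda(x, y, lr=0.1, steps=1000):
    """Plain alternating GDA for a fixed number of steps."""
    for _ in range(steps):
        x, y = gda_step(x, y, lr)
    return x, y


def run_lookahead_gda(x, y, lr=0.1, k=5, alpha=0.5, outer_steps=200):
    """GDA wrapped in a Lookahead-style averaging step (illustrative assumption).

    (x, y) are the slow parameters; (fx, fy) are the fast parameters, which
    take k GDA steps before the slow parameters move a fraction alpha of the
    way towards them. The fast parameters then restart from the slow ones.
    """
    for _ in range(outer_steps):
        fx, fy = x, y
        for _ in range(k):
            fx, fy = gda_step(fx, fy, lr)
        x = x + alpha * (fx - x)  # backtrack towards the previous slow point
        y = y + alpha * (fy - y)
    return x, y


# Both runs use the same budget of 1000 gradient steps per player.
gda_dist = math.hypot(*run_gda(1.0, 1.0))
la_dist = math.hypot(*run_lookahead_gda(1.0, 1.0))
print(f"distance to equilibrium, GDA:           {gda_dist:.4f}")
print(f"distance to equilibrium, Lookahead-GDA: {la_dist:.4f}")
```

On this toy game, plain alternating GDA keeps circling the equilibrium without approaching it, whereas the averaged variant contracts towards (0, 0). This illustrates the rotational dynamics only on a two-parameter example and says nothing about behaviour on actual GAN objectives.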

availability

https://github.com/Chavdarova/

