TWO STEPS AT A TIME - TAKING GAN TRAINING IN STRIDE WITH TSENG'S METHOD

Abstract

Motivated by the training of Generative Adversarial Networks (GANs), we study methods for solving minimax problems with additional nonsmooth regularizers. We do so by employing monotone operator theory, in particular the Forward-Backward-Forward (FBF) method, which avoids the known issue of limit cycling by correcting each update by a second gradient evaluation. Furthermore, we propose a seemingly new scheme which recycles old gradients to mitigate the additional computational cost. In doing so we rediscover a known method, related to Optimistic Gradient Descent Ascent (OGDA). For both schemes we prove novel convergence rates for convex-concave minimax problems via a unifying approach. The derived error bounds are in terms of the gap function for the ergodic iterates. For the deterministic and the stochastic problem we show a convergence rate of O(1/k) and O(1/√k), respectively. We complement our theoretical results with empirical improvements in the training of Wasserstein GANs on the CIFAR10 dataset.

1. INTRODUCTION

Generative Adversarial Networks (GANs) (Goodfellow et al., 2014) have proven to be a powerful class of generative models, capable of producing, for example, realistic previously unseen images. Two neural networks, called the generator and the discriminator, compete against each other in a game. In the special case of a zero-sum game this task can be formulated as a minimax (aka saddle point) problem. Conventionally, GANs are trained using variants of (stochastic) Gradient Descent Ascent (GDA), which are known to exhibit oscillatory behavior and thus fail to converge even for simple bilinear saddle point problems, see Goodfellow (2016). We therefore propose the use of methods with provable convergence guarantees for (stochastic) convex-concave minimax problems, even though GANs are well known to not satisfy these assumptions. Along similar considerations, an adaptation of the Extragradient method (EG) (Korpelevich, 1976) for the training of GANs was suggested in Gidel et al. (2019), whereas Daskalakis et al. (2018); Daskalakis & Panageas (2018); Liang & Stokes (2019) studied Optimistic Gradient Descent Ascent (OGDA) based on optimistic mirror descent (Rakhlin & Sridharan, 2013a;b). We instead investigate the Forward-Backward-Forward (FBF) method (Tseng, 1991) from monotone operator theory, which, similar to EG, uses two gradient evaluations per update in order to circumvent the aforementioned issues. Instead of trying to improve GAN performance via new architectures, loss functions, etc., we contribute to the theoretical foundation of their training from the point of view of optimization.

Contribution. Establishing the connection between GAN training and monotone inclusions motivates the use of the FBF method, which was originally designed to solve this type of problem. This approach allows us to naturally extend the constrained setting to a regularized one by making use of the proximal operator. We also propose a variant of FBF that reuses previous gradients to reduce the computational cost per iteration, which turns out to be a known method, related to OGDA. By developing a unifying scheme that captures FBF and a generalization of OGDA, we reveal a hitherto unknown connection. Using this approach we prove novel non-asymptotic convergence statements in terms of the minimax gap for both methods in the context of saddle point problems. In the deterministic and stochastic setting we obtain rates of O(1/k) and O(1/√k), respectively. Concluding, we highlight the relevance of these methods by empirical improvements in the training of Wasserstein GANs.
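To make the contrast concrete, the following minimal sketch (our own illustration, not the paper's implementation; the bilinear toy problem f(x, y) = xy and the step size tau are assumptions for exposition) compares plain GDA, the FBF update without a regularizer (where the proximal step is the identity and FBF coincides with EG), and the recycled-gradient variant related to OGDA:

```python
import numpy as np

def F(z):
    # Saddle operator of the bilinear toy problem f(x, y) = x * y:
    # F(x, y) = (grad_x f, -grad_y f) = (y, -x).
    x, y = z
    return np.array([y, -x])

def gda_step(z, tau):
    # Plain Gradient Descent Ascent: a single forward step.
    return z - tau * F(z)

def fbf_step(z, tau):
    # Tseng's FBF without a regularizer (backward/proximal step = identity):
    # a forward step followed by a correction using a second gradient
    # evaluation at the extrapolated point.
    z_bar = z - tau * F(z)                   # forward step
    return z_bar + tau * (F(z) - F(z_bar))   # correction step

def ogda_step(z, z_prev, tau):
    # Recycled-gradient variant (related to OGDA): instead of a second
    # fresh evaluation, reuse the gradient from the previous iterate.
    return z - 2 * tau * F(z) + tau * F(z_prev)

tau = 0.5
z_gda = np.array([1.0, 1.0])
z_fbf = np.array([1.0, 1.0])
z_ogda = z_ogda_prev = np.array([1.0, 1.0])

for _ in range(200):
    z_gda = gda_step(z_gda, tau)
    z_fbf = fbf_step(z_fbf, tau)
    z_ogda, z_ogda_prev = ogda_step(z_ogda, z_ogda_prev, tau), z_ogda

print(np.linalg.norm(z_gda))   # grows: GDA spirals away from the saddle point
print(np.linalg.norm(z_fbf))   # shrinks toward the saddle point (0, 0)
print(np.linalg.norm(z_ogda))  # also shrinks, at one gradient call per step
```

On this bilinear problem the GDA iterates expand by a factor of sqrt(1 + tau^2) per step, illustrating the cycling/divergence mentioned above, while both the two-evaluation FBF step and the single-evaluation recycled-gradient step contract toward the saddle point.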

