Adaptive Extra-Gradient Methods for Min-Max Optimization and Games

Abstract

We present a new family of min-max optimization algorithms that automatically exploit the geometry of the gradient data observed at earlier iterations to perform more informative extra-gradient steps in later ones. Thanks to this adaptation mechanism, the proposed method automatically detects whether the problem is smooth or not, without requiring any prior tuning by the optimizer. As a result, the algorithm simultaneously achieves order-optimal convergence rates, i.e., it converges to an ε-optimal solution within O(1/ε) iterations in smooth problems, and within O(1/ε²) iterations in non-smooth ones. Importantly, these guarantees do not require any of the standard boundedness or Lipschitz continuity conditions that are typically assumed in the literature; in particular, they apply even to problems with singularities (such as resource allocation problems and the like). This adaptation is achieved through the use of a geometric apparatus based on Finsler metrics and a suitably chosen mirror-prox template that allows us to derive sharp convergence rates for the methods at hand.

1. Introduction

The surge of recent breakthroughs in generative adversarial networks (GANs) [20], robust reinforcement learning [41], and other adversarial learning models [27] has sparked renewed interest in the theory of min-max optimization problems and games. In this broad setting, it has become empirically clear that, ceteris paribus, the simultaneous training of two (or more) antagonistic models faces drastically new challenges relative to the training of a single one. Perhaps the most prominent of these challenges is the appearance of cycles and recurrent (or even chaotic) behavior in min-max games. This has been studied extensively in the context of learning in bilinear games, in both continuous [16, 31, 40] and discrete time [12, 18, 19, 32], and the methods proposed to overcome recurrence typically focus on mitigating the rotational component of min-max games. The method with the richest history in this context is the extra-gradient (EG) algorithm of Korpelevich [25] and its variants. The EG algorithm exploits the Lipschitz smoothness of the problem and, if coupled with a Polyak-Ruppert averaging scheme, it achieves an O(1/T) rate of convergence in smooth, convex-concave min-max problems [35]. This rate is known to be tight [34, 39] but, in order to achieve it, the original method requires the problem's Lipschitz constant to be known in advance. If the problem is not Lipschitz smooth (or the algorithm is run with a vanishing step-size schedule), the method's rate of convergence drops to O(1/√T).

Our contributions. Our aim in this paper is to provide an algorithm that automatically adapts to smooth/non-smooth min-max problems and games, and achieves order-optimal rates in both classes without requiring any prior tuning by the optimizer. In this regard, we propose a flexible algorithmic scheme, which we call AdaProx, which exploits gradient data observed at earlier iterations to perform more informative extra-gradient steps in later ones.
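To make the extra-gradient template concrete, the following is a minimal NumPy sketch of the classical EG method of Korpelevich with Polyak-Ruppert averaging, applied to a bilinear saddle problem min_x max_y xᵀAy. This illustrates the baseline method discussed above, not the paper's adaptive AdaProx scheme; the step size gamma and horizon T are illustrative choices, not values prescribed by the paper.

```python
import numpy as np

def extra_gradient(A, x0, y0, gamma=0.1, T=2000):
    """Classical extra-gradient (EG) on min_x max_y x^T A y,
    with Polyak-Ruppert (ergodic) averaging of the iterates.

    A, gamma, T are illustrative; EG converges for bilinear
    problems when gamma is small relative to the norm of A.
    """
    x, y = x0.astype(float), y0.astype(float)
    x_sum, y_sum = np.zeros_like(x), np.zeros_like(y)
    for _ in range(T):
        # Leading (exploration) step: a plain gradient step
        # from the current state.
        x_half = x - gamma * (A @ y)     # grad_x of x^T A y is A y
        y_half = y + gamma * (A.T @ x)   # grad_y of x^T A y is A^T x
        # Extra-gradient update: step from the *original* state,
        # but using the gradient evaluated at the leading state.
        x = x - gamma * (A @ y_half)
        y = y + gamma * (A.T @ x_half)
        x_sum += x
        y_sum += y
    # Ergodic averages: these enjoy the O(1/T) guarantee in
    # smooth convex-concave problems.
    return x_sum / T, y_sum / T

# Toy instance: f(x, y) = x * y, whose unique saddle point is (0, 0).
# Plain simultaneous gradient descent-ascent cycles on this problem,
# whereas the extra-gradient iterates spiral inward.
A = np.array([[1.0]])
x_bar, y_bar = extra_gradient(A, np.array([1.0]), np.array([1.0]))
```

The second gradient evaluation at the leading state is what damps the rotational component responsible for cycling in bilinear games: for f(x, y) = xy, simultaneous gradient steps orbit the saddle point, while the EG iterates contract toward it.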
Thanks to this mechanism, and to the best of our knowledge, AdaProx is the first algorithm that simultaneously achieves the following:

