LINEAR LAST-ITERATE CONVERGENCE IN CONSTRAINED SADDLE-POINT OPTIMIZATION

Abstract

Optimistic Gradient Descent Ascent (OGDA) and Optimistic Multiplicative Weights Update (OMWU) for saddle-point optimization have received growing attention due to their favorable last-iterate convergence. However, their behaviors for simple bilinear games over the probability simplex are still not fully understood: previous analyses lack explicit convergence rates, apply only to an exponentially small learning rate, or require additional assumptions such as the uniqueness of the optimal solution. In this work, we significantly expand the understanding of last-iterate convergence for OGDA and OMWU in the constrained setting. Specifically, for OMWU in bilinear games over the simplex, we show that when the equilibrium is unique, linear last-iterate convergence is achieved with a learning rate whose value is set to a universal constant, improving the result of (Daskalakis & Panageas, 2019b) under the same assumption. We then significantly extend the results to more general objectives and feasible sets for the projected OGDA algorithm, by introducing a sufficient condition under which OGDA exhibits concrete last-iterate convergence rates with a constant learning rate whose value depends only on the smoothness of the objective function. We show that bilinear games over any polytope satisfy this condition and that OGDA converges exponentially fast even without the unique-equilibrium assumption. Our condition also holds for strongly-convex-strongly-concave functions, recovering the result of (Hsieh et al., 2019). Finally, we provide experimental results to further support our theory.

1. INTRODUCTION

Saddle-point optimization in the form of min_x max_y f(x, y) dates back to (Neumann, 1928), where the celebrated minimax theorem was discovered. Due to advances in Generative Adversarial Networks (GANs) (Goodfellow et al., 2014), whose training is itself a saddle-point problem, the question of how to find a good approximation of the saddle point, especially via an efficient iterative algorithm, has recently gained significant research interest. Simple algorithms such as Gradient Descent Ascent (GDA) and Multiplicative Weights Update (MWU) are known to cycle and fail to converge even in simple bilinear cases (see e.g., (Bailey & Piliouras, 2018) and (Cheung & Piliouras, 2019)). Many recent works consider resolving this issue via simple modifications of standard algorithms, usually in the form of some extra gradient descent/ascent steps. These include Extra-Gradient methods (EG) (Liang & Stokes, 2019; Mokhtari et al., 2020b), Optimistic Gradient Descent Ascent (OGDA) (Daskalakis et al., 2018; Gidel et al., 2019; Mertikopoulos et al., 2019), Optimistic Multiplicative Weights Update (OMWU) (Daskalakis & Panageas, 2019b; Lei et al., 2021), and others. In particular, OGDA and OMWU are suitable for the repeated game setting where two players repeatedly propose x_t and y_t and receive only ∇_x f(x_t, y_t) and ∇_y f(x_t, y_t) respectively as feedback, with the goal of converging to a saddle point, or equivalently a Nash equilibrium in game-theoretic terminology. One notable benefit of OGDA and OMWU is that they are also no-regret algorithms with important applications in online learning, especially when playing against adversarial opponents (Chiang et al., 2012; Rakhlin & Sridharan, 2013). Despite considerable progress, especially in the unconstrained setting, the behavior of these algorithms in the constrained setting, where x and y are restricted to closed convex sets X and Y respectively, is still not fully understood.
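The contrast between GDA's cycling and OGDA's last-iterate convergence can be seen on a toy example. The sketch below (not from the paper; the objective f(x, y) = x * y, the learning rate eta = 0.1, and the iteration count are illustrative choices) runs both updates on an unconstrained bilinear game whose unique saddle point is (0, 0):

```python
def gda(x, y, eta, steps):
    """Simultaneous Gradient Descent Ascent on f(x, y) = x * y."""
    for _ in range(steps):
        # grad_x f = y (descent for x), grad_y f = x (ascent for y)
        x, y = x - eta * y, y + eta * x
    return x, y

def ogda(x, y, eta, steps):
    """Optimistic GDA: reuse the previous gradient as a prediction."""
    px, py = x, y  # previous iterate, so the previous gradients are (py, px)
    for _ in range(steps):
        # x_{t+1} = x_t - 2*eta*grad_x(t) + eta*grad_x(t-1), symmetrically for y
        nx = x - 2 * eta * y + eta * py
        ny = y + 2 * eta * x - eta * px
        px, py, x, y = x, y, nx, ny
    return x, y

if __name__ == "__main__":
    print(gda(1.0, 1.0, 0.1, 3000))   # GDA spirals outward and diverges
    print(ogda(1.0, 1.0, 0.1, 3000))  # OGDA's last iterate approaches (0, 0)
```

Note that OGDA uses exactly one new gradient per iteration, which is what makes it compatible with the repeated-game feedback model described above.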
This is true even when f is a bilinear function and X and Y are simplices, which corresponds to classic two-player zero-sum games in normal form, or simply matrix games. Indeed, existing convergence results on the last iterate of OGDA or OMWU for matrix games are unsatisfactory: they lack explicit convergence rates (Popov, 1980; Mertikopoulos et al., 2019), apply only to an exponentially small learning rate and thus do not reflect the behavior of the algorithms in practice (Daskalakis & Panageas, 2019b), or require additional conditions such as uniqueness of the equilibrium or a good initialization (Daskalakis & Panageas, 2019b). Motivated by this fact, in this work we first improve the last-iterate convergence result of OMWU for matrix games. Under the same unique-equilibrium assumption made by Daskalakis & Panageas (2019b), we show linear convergence with a concrete rate in terms of the Kullback-Leibler divergence between the last iterate and the equilibrium, using a learning rate whose value is set to a universal constant. We then significantly extend our results and consider OGDA for general constrained and smooth convex-concave saddle-point problems, without the uniqueness assumption. Specifically, we start by proving an average duality-gap convergence of OGDA at the rate of O(1/√T) after T iterations. Then, to obtain a more favorable last-iterate convergence in terms of the distance to the set of equilibria, we propose a general sufficient condition on X, Y, and f, called Saddle-Point Metric Subregularity (SP-MS), under which we prove concrete last-iterate convergence rates, all with a constant learning rate and without further assumptions. Our last-iterate convergence results for OGDA greatly generalize those of (Hsieh et al., 2019, Theorem 2), which itself is a consolidated version of results from several earlier works.
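To make the OMWU dynamics concrete, here is a hedged sketch on a matrix game with a unique equilibrium. The game (matching pennies), the starting points, and eta = 0.1 are illustrative choices, not the constants from the analysis; the min player updates x over the simplex multiplicatively, double-weighting the current gradient and subtracting the previous one:

```python
import math

# Matching pennies: unique equilibrium is uniform (1/2, 1/2) for both players.
A = [[1.0, -1.0], [-1.0, 1.0]]

def matvec(M, v):   # M @ v, the min player's gradient A y
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def vecmat(v, M):   # v @ M, the max player's gradient A^T x
    return [sum(v[i] * M[i][j] for i in range(len(M))) for j in range(len(M[0]))]

def omwu(x, y, eta, steps):
    gx_prev, gy_prev = matvec(A, y), vecmat(x, A)
    for _ in range(steps):
        gx, gy = matvec(A, y), vecmat(x, A)  # simultaneous gradient feedback
        # optimistic multiplicative update: weight today's gradient twice,
        # subtract yesterday's, then renormalize back onto the simplex
        x = [xi * math.exp(-eta * (2 * g - gp)) for xi, g, gp in zip(x, gx, gx_prev)]
        y = [yi * math.exp(eta * (2 * g - gp)) for yi, g, gp in zip(y, gy, gy_prev)]
        sx, sy = sum(x), sum(y)
        x, y = [xi / sx for xi in x], [yi / sy for yi in y]
        gx_prev, gy_prev = gx, gy
    return x, y

if __name__ == "__main__":
    x, y = omwu([0.7, 0.3], [0.4, 0.6], eta=0.1, steps=5000)
    print(x, y)  # both last iterates approach the equilibrium (0.5, 0.5)
```

Plain MWU (dropping the `- gp` correction and the factor 2) would cycle around the equilibrium on this game; the optimistic correction is what pulls the last iterate in.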
The key implication of our new results is that, by showing that matrix games satisfy our SP-MS condition, we provide by far the most general last-iterate guarantee with linear convergence for this problem using OGDA. Compared to that of OMWU, the convergence result of OGDA holds more generally, even when there are multiple equilibria. More generally, the same linear last-iterate convergence holds for any bilinear game over polytopes, since such games also satisfy the SP-MS condition as we show. To complement this result, we construct an example of a bilinear game with a non-polytope feasible set where OGDA provably does not ensure linear convergence, indicating that the shape of the feasible set matters. Finally, we also provide experimental results to support our theory. In particular, we observe that OGDA generally converges faster than OMWU for matrix games, despite the fact that both provably converge exponentially fast and that OMWU is often considered more favorable than OGDA when the feasible set is the simplex.
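For the constrained case, projected OGDA is commonly written with an auxiliary "hat" sequence: the hat iterate takes a projected gradient step, and the played iterate takes one more projected step from it using the same gradient. The sketch below (an illustrative rendering, not the paper's exact pseudocode; the game, eta = 0.1, and starting points are arbitrary) runs this on a matrix game over simplices, using the standard sort-based Euclidean projection:

```python
A = [[1.0, -1.0], [-1.0, 1.0]]  # matching pennies; equilibrium is uniform

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def vecmat(v, M):
    return [sum(v[i] * M[i][j] for i in range(len(M))) for j in range(len(M[0]))]

def proj_simplex(v):
    """Euclidean projection onto the probability simplex (sort-based)."""
    u = sorted(v, reverse=True)
    css, tau = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        css += uj
        t = (css - 1.0) / j
        if uj - t > 0:
            tau = t  # threshold from the largest feasible support
    return [max(vi - tau, 0.0) for vi in v]

def ogda_simplex(x, y, eta, steps):
    hx, hy = x[:], y[:]  # auxiliary ("hat") sequences
    for _ in range(steps):
        gx, gy = matvec(A, y), vecmat(x, A)  # gradients at the played iterates
        hx = proj_simplex([h - eta * g for h, g in zip(hx, gx)])
        hy = proj_simplex([h + eta * g for h, g in zip(hy, gy)])
        # played iterate: one more projected step with the same gradient
        x = proj_simplex([h - eta * g for h, g in zip(hx, gx)])
        y = proj_simplex([h + eta * g for h, g in zip(hy, gy)])
    return x, y

if __name__ == "__main__":
    x, y = ogda_simplex([0.7, 0.3], [0.4, 0.6], eta=0.1, steps=5000)
    print(x, y)  # last iterates approach the equilibrium (0.5, 0.5)
```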

2. RELATED WORK

Average-iterate convergence. While showing last-iterate convergence has been a challenging task, it is well known that the average iterate of many standard algorithms such as GDA and MWU enjoys a converging duality gap at the rate of O(1/√T) (Freund & Schapire, 1999). A line of work shows that the rate can be improved to O(1/T) using the "optimistic" version of these algorithms, such as OGDA and OMWU (Rakhlin & Sridharan, 2013; Daskalakis et al., 2015; Syrgkanis et al., 2015). For tasks such as training GANs, however, average-iterate convergence is unsatisfactory since averaging large neural networks is usually prohibitive.

Extra-Gradient (EG) algorithms. The saddle-point problem fits into the more general variational inequality framework (Harker & Pang, 1990). A classic algorithm for variational inequalities is EG, first introduced in (Korpelevich, 1976). Tseng (1995) was the first to show last-iterate convergence for EG in various settings such as bilinear or strongly-convex-strongly-concave problems. Recent works significantly expand the understanding of EG and its variants for unconstrained bilinear problems (Liang & Stokes, 2019), unconstrained strongly-convex-strongly-concave problems (Mokhtari et al., 2020b), and more (Zhang et al., 2019; Lin et al., 2020; Golowich et al., 2020b). The original EG is not applicable to a repeated game setting where only one gradient evaluation is possible in each iteration. Moreover, unlike OGDA and OMWU, EG is shown to have linear regret against adversarial opponents, and thus it is not a no-regret learning algorithm (Bowling, 2005;
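The structural difference from OGDA can be seen directly in the update: EG evaluates the gradient twice per iteration, once to extrapolate and once at the extrapolated point. A minimal sketch on the same toy bilinear objective f(x, y) = x * y (the objective and eta = 0.1 are illustrative choices):

```python
def extragradient(x, y, eta, steps):
    """EG on f(x, y) = x * y: two gradient evaluations per iteration."""
    for _ in range(steps):
        # first evaluation: extrapolate to a look-ahead point
        mx, my = x - eta * y, y + eta * x
        # second evaluation, at the look-ahead point, gives the actual step
        x, y = x - eta * my, y + eta * mx
    return x, y

if __name__ == "__main__":
    print(extragradient(1.0, 1.0, 0.1, 3000))  # contracts to the saddle point (0, 0)
```

The need for the second, mid-iteration gradient is exactly why EG does not fit the one-feedback-per-round repeated game setting, whereas OGDA substitutes the stale gradient from the previous round instead.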

