STAY-ON-THE-RIDGE: GUARANTEED CONVERGENCE TO LOCAL MINIMAX EQUILIBRIUM IN NONCONVEX-NONCONCAVE GAMES

Abstract

Min-max optimization problems involving nonconvex-nonconcave objectives have found important applications in adversarial training and other multi-agent learning settings. Yet, no known gradient descent-based method is guaranteed to converge to (even local notions of) min-max equilibrium in the nonconvex-nonconcave setting. For all known methods, there exist relatively simple objectives for which they cycle or exhibit other undesirable behavior different from converging to a point, let alone to some game-theoretically meaningful one Vlatakis-Gkaragkounis et al. (2019); Hsieh et al. (2021). The only known convergence guarantees hold under the strong assumption that the initialization is very close to a local min-max equilibrium Wang et al. (2019). Moreover, the aforementioned challenges are not just theoretical curiosities: all known methods are unstable in practice, even in simple settings. We propose the first method that is guaranteed to converge to a local min-max equilibrium for smooth nonconvex-nonconcave objectives. Our method is second-order and provably escapes limit cycles as long as it is initialized at an easy-to-find initial point. Both the definition of our method and its convergence analysis are motivated by the topological nature of the problem. In particular, our method is not designed to decrease some potential function, such as the distance of its iterate from the set of local min-max equilibria or the projected gradient of the objective, but is instead designed to satisfy a topological property that guarantees the avoidance of cycles and implies its convergence.

1. INTRODUCTION

Min-max optimization lies at the foundations of Game Theory von Neumann (1928), Convex Optimization Dantzig (1951a); Adler (2013), and Online Learning Blackwell (1956); Hannan (1957); Cesa-Bianchi & Lugosi (2006), and has found many applications in theoretical and applied fields including, more recently, adversarial training and other multi-agent learning problems Goodfellow et al. (2014); Madry et al. (2018); Zhang et al. (2019). In its general form, it can be written as

min_{θ∈Θ} max_{ω∈Ω} f(θ, ω), (1)

where Θ and Ω are convex subsets of Euclidean space, and f is continuous. Equation (1) can be viewed as a model of a sequential-move game wherein a player who is interested in minimizing f chooses θ first, and then a player who is interested in maximizing f chooses ω after seeing θ. Solving (1) corresponds to computing an equilibrium of this sequential-move game. We may also study the simultaneous-move game with the same objective f, wherein the minimizing player and the maximizing player choose θ and ω simultaneously. A Nash equilibrium of the simultaneous-move game, also called a min-max equilibrium, is a pair (θ*, ω*) ∈ Θ × Ω such that

f(θ*, ω*) ≤ f(θ, ω*), for all θ ∈ Θ, and f(θ*, ω*) ≥ f(θ*, ω), for all ω ∈ Ω. (2)

It is easy to see that a Nash equilibrium of the simultaneous-move game also constitutes a Nash equilibrium of the sequential-move game, but the converse need not be true Jin et al. (2019). Here, we focus on solving the (harder) simultaneous-move game. In particular, we study the existence of dynamics which converge to solutions of the simultaneous-move game, namely the existence of methods that make incremental updates to a pair (θ_t, ω_t) so that the sequence (θ_t, ω_t) converges, as t → ∞, to some (θ*, ω*) satisfying equation 2 or some relaxation of it. This problem has been extensively studied in the special case where Θ and Ω are convex and compact and f is convex-concave, i.e. convex in θ for all ω and concave in ω for all θ.
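The equilibrium conditions in equation 2 can be checked numerically for a simple objective. The sketch below is purely illustrative: the bilinear objective f(θ, ω) = θω on Θ = Ω = [-1, 1], the grid discretization, and the helper `is_minmax_equilibrium` are our own choices, not part of the paper. It confirms that (0, 0) satisfies both inequalities of equation 2 while a generic interior point does not.

```python
import numpy as np

def is_minmax_equilibrium(f, theta_star, omega_star, Theta, Omega):
    """Check equation 2 over finite grids of Θ and Ω:
    f(θ*, ω*) ≤ f(θ, ω*) for all θ, and f(θ*, ω*) ≥ f(θ*, ω) for all ω."""
    v = f(theta_star, omega_star)
    min_ok = all(v <= f(theta, omega_star) for theta in Theta)
    max_ok = all(v >= f(theta_star, omega) for omega in Omega)
    return min_ok and max_ok

f = lambda theta, omega: theta * omega      # bilinear, hence convex-concave
grid = np.linspace(-1.0, 1.0, 201)          # discretization of [-1, 1]

print(is_minmax_equilibrium(f, 0.0, 0.0, grid, grid))   # True: (0, 0) is a min-max equilibrium
print(is_minmax_equilibrium(f, 0.5, 0.5, grid, grid))   # False: the min player can deviate
```

The same grid check also illustrates the nonexistence phenomenon discussed later: for f(θ, ω) = (θ - ω)² on [0, 1]², no grid point passes both tests.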
In this case, the set of Nash equilibria of the simultaneous-move game is equal to the set of Nash equilibria of the sequential-move game, and these sets are non-empty and convex von Neumann (1928). Even in this simple setting, however, many natural dynamics surprisingly fail to converge: gradient descent-ascent, as well as various continuous-time versions of follow-the-regularized-leader, not only fail to converge to a min-max equilibrium, even for very simple objectives, but may even exhibit chaotic behavior Mertikopoulos et al. (2018); Vlatakis-Gkaragkounis et al. (2019); Hsieh et al. (2021). In order to circumvent these negative results, an extensive line of work has introduced other algorithms, such as extragradient Korpelevich (1976) and optimistic gradient descent Popov (1980), which exhibit last-iterate convergence to the set of min-max equilibria in this setting.

Our focus in this paper is on the more general case where f is not convex-concave, i.e. it may fail to be convex in θ for all ω, may fail to be concave in ω for all θ, or both. We call this general setting, where neither convexity with respect to θ nor concavity with respect to ω is assumed, the nonconvex-nonconcave setting. This setting presents substantial challenges. First, min-max equilibria are not guaranteed to exist, i.e. for general objectives there may be no (θ*, ω*) satisfying equation 2; this happens even in very simple cases, e.g. when Θ = Ω = [0, 1] and f(θ, ω) = (θ - ω)². Second, it is NP-hard to determine whether a min-max equilibrium exists Daskalakis et al. (2021) and, as is easy to see, it is also NP-hard to compute Nash equilibria of the sequential-move game (which do exist under compactness of the constraint sets). For these reasons, the optimization literature has targeted the computation of local and/or approximate solutions in this setting Daskalakis & Panageas (2018); Mazumdar & Ratliff (2018); Jin et al. (2019); Wang et al. (2019); Daskalakis et al. (2021); Mangoubi & Vishnoi (2021). This is the approach we also take in this paper, targeting the computation of (ε, δ)-local min-max equilibria, which were proposed in Daskalakis et al. (2021). These are approximate and local Nash equilibria of the simultaneous-move game, defined as feasible points (θ*, ω*) which satisfy a relaxed and local version of equation 2, namely:

f(θ*, ω*) < f(θ, ω*) + ε, for all θ ∈ Θ such that ‖θ - θ*‖ ≤ δ; (3)
f(θ*, ω*) > f(θ*, ω) - ε, for all ω ∈ Ω such that ‖ω - ω*‖ ≤ δ. (4)

Besides being a natural concept of local, approximate min-max equilibrium, an attractive feature of (ε, δ)-local min-max equilibria is that they are guaranteed to exist when f is Λ-smooth and the locality parameter, δ, is chosen small enough in terms of the smoothness, Λ, and the approximation parameter, ε, namely whenever δ ≤ √(2ε/Λ). Indeed, in this regime of parameters the (ε, δ)-local min-max equilibria are in correspondence with the approximate fixed points of the Projected Gradient Descent/Ascent dynamics. Thus, the existence of the former can be established by invoking Brouwer's fixed point theorem to establish the existence of the latter (Theorem 5.1 of Daskalakis et al. (2020)).

There are a number of existing approaches which it would be natural to use to find a solution (θ*, ω*) satisfying equation 3 and equation 4, but all run into significant obstacles. First, the idea of averaging, which can be leveraged in the convex-concave setting to obtain provable guarantees for otherwise chaotic algorithms, such as online gradient descent, no longer works, as it critically uses Jensen's inequality, which needs convexity/concavity. On the other hand, negative results abound for last-iterate convergence: Hsieh et al. (2021) show that a variety of zeroth-, first-, and second-order methods may converge to a limit cycle, even in simple settings. Vlatakis-Gkaragkounis et al. (2019) study a
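The failure of gradient descent-ascent even on simple convex-concave objectives can be seen in a few lines. The sketch below is our own illustration (the bilinear objective f(θ, ω) = θω, the step size, and the starting point are not taken from the paper): simultaneous GDA on f = θω rotates the iterate around the equilibrium (0, 0), and with any positive step size the distance from the equilibrium grows, so the iterates spiral outward rather than converge.

```python
import numpy as np

# Simultaneous gradient descent-ascent on f(θ, ω) = θ·ω.
# ∇_θ f = ω and ∇_ω f = θ, so each update is a rotation of (θ, ω)
# around (0, 0) combined with an inflation of its norm by √(1 + η²).
eta = 0.1                       # step size (illustrative choice)
theta, omega = 1.0, 0.0         # starting point (illustrative choice)
radii = []
for t in range(200):
    g_theta, g_omega = omega, theta                          # gradients of f = θω
    theta, omega = theta - eta * g_theta, omega + eta * g_omega
    radii.append(np.hypot(theta, omega))                     # distance from (0, 0)

print(radii[0], radii[-1])      # the distance from the equilibrium increases
```

Running this shows the final iterate farther from (0, 0) than the first, the simplest instance of the non-convergence the negative results above formalize; extragradient and optimistic gradient descent were introduced precisely to damp this rotation in the convex-concave case.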

