ACCELERATED SINGLE-CALL METHODS FOR CONSTRAINED MIN-MAX OPTIMIZATION

Abstract

We study first-order methods for constrained min-max optimization. Existing methods either require two gradient calls or two projections in each iteration, which may be costly in some applications. In this paper, we first show that a variant of the Optimistic Gradient (OG) method, a single-call single-projection algorithm, has O(1/√T) best-iterate convergence rate for inclusion problems with operators that satisfy the weak Minty variational inequality (MVI). Our second result is the first single-call single-projection algorithm, the Accelerated Reflected Gradient (ARG) method, that achieves the optimal O(1/T) last-iterate convergence rate for inclusion problems that satisfy negative comonotonicity. Both the weak MVI and negative comonotonicity are well-studied assumptions and capture a rich set of non-convex non-concave min-max optimization problems. Finally, we show that the Reflected Gradient (RG) method, another single-call single-projection algorithm, has O(1/√T) last-iterate convergence rate for constrained convex-concave min-max optimization, answering an open problem of Hsieh et al. (2019). Our convergence rates hold for standard measures such as the tangent residual and the natural residual.

1. INTRODUCTION

Various machine learning applications, from generative adversarial networks (GANs) (e.g., Goodfellow et al., 2014; Arjovsky et al., 2017), adversarial examples (e.g., Madry et al., 2017), and robust optimization (e.g., Ben-Tal et al., 2009), to reinforcement learning (e.g., Du et al., 2017; Dai et al., 2018), can be captured by constrained min-max optimization. Unlike the well-behaved convex-concave setting, these modern ML applications often require solving non-convex non-concave min-max optimization problems in high-dimensional spaces. Unfortunately, the general non-convex non-concave setting is intractable even for computing a local solution (Hirsch et al., 1989; Papadimitriou, 1994; Daskalakis et al., 2021). Motivated by this intractability, researchers have turned their attention to non-convex non-concave settings with structure. Significant progress has been made for several interesting structured non-convex non-concave settings, such as the ones that satisfy the weak Minty variational inequality (MVI) (Definition 2) (Diakonikolas et al., 2021; Pethick et al., 2022) and the ones that satisfy the stricter negative comonotonicity condition (Definition 3) (Lee & Kim, 2021a; Cai et al., 2022a). These algorithms are variations of the celebrated extragradient (EG) method (Korpelevich, 1976), an iterative first-order method. Like the extragradient method, these algorithms all require two oracle calls per iteration, which may be costly in practice. We investigate the following important question in this paper:

Can we design efficient single-call first-order methods for structured non-convex non-concave min-max optimization? (*)

We provide an affirmative answer to this question. We first show that a single-call method known as the Optimistic Gradient (OG) method (Hsieh et al., 2019) is applicable to all non-convex non-concave settings that satisfy the weak MVI.
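To make the single-call property concrete, the following is a minimal NumPy sketch of one common optimistic-gradient update, z_{t+1} = P(z_t − η(2F(z_t) − F(z_{t−1}))), applied to a toy bilinear saddle problem over a box. The toy problem, step size, and exact variant are illustrative only and may differ from the OG variant analyzed in the paper; the point is that each iteration needs a single fresh evaluation of F (the previous one is cached) and a single projection.

```python
import numpy as np

def F(z):
    # Toy bilinear saddle problem f(x, y) = x * y, so F(z) = (y, -x).
    x, y = z
    return np.array([y, -x])

def proj_box(z, lo=-1.0, hi=1.0):
    # Projection onto the box [lo, hi]^2 (the feasible set in this toy example).
    return np.clip(z, lo, hi)

def optimistic_gradient(z0, eta=0.1, T=2000):
    # Single-call, single-projection iteration:
    #   z_{t+1} = P( z_t - eta * (2 F(z_t) - F(z_{t-1})) )
    # F(z_{t-1}) is cached, so each loop iteration makes ONE fresh call to F.
    z = np.asarray(z0, dtype=float)
    F_prev = F(z)  # initialization: F(z_{-1}) := F(z_0)
    for _ in range(T):
        F_cur = F(z)                              # the single fresh call
        z = proj_box(z - eta * (2 * F_cur - F_prev))  # the single projection
        F_prev = F_cur
    return z

z_T = optimistic_gradient([0.5, 0.5])
```

On this bilinear (hence monotone) toy instance the iterates spiral into the unique saddle point (0, 0); plain projected gradient descent-ascent would instead spiral outward, which is what the optimistic correction term fixes.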
We then provide the Accelerated Reflected Gradient (ARG) method, which achieves the optimal convergence rate in all non-convex non-concave settings that satisfy the negative comonotonicity condition. Single-call methods have been studied in the convex-concave setting (Hsieh et al., 2019) but not in the more general non-convex non-concave settings. See Table 1 for comparisons between our algorithms and other algorithms from the literature.

| Algorithm | Single-call? | Constraints? | Weak MVI | Negatively comonotone |
| EG+ (Diakonikolas et al., 2021) | ✗ | ✗ | O(1/√T) | O(1/√T) |
| CEG+ (Pethick et al., 2022) | ✗ | ✓ | O(1/√T) | O(1/√T) |
| OGDA (Böhm, 2022; Bot et al., 2022) | ✓ | ✗ | O(1/√T) | O(1/√T) |
| OG [this paper] | ✓ | ✓ | O(1/√T) | O(1/√T) |
| FEG (Lee & Kim, 2021b), accelerated | ✗ | ✗ | — | O(1/T) |
| AS (Cai et al., 2022a), accelerated | ✗ | ✓ | — | O(1/T) |
| ARG [this paper], accelerated | ✓ | ✓ | — | O(1/T) |

Table 1: Existing results for min-max optimization problems with non-monotone operators. A ✓ in "Constraints?" means the algorithm works in the constrained setting. The convergence rate is in terms of the operator norm (in the unconstrained setting) and the residual (in the constrained setting).
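For reference, the two conditions organizing the comparison above are usually stated as follows; this is a hedged restatement of the standard forms from the literature (the paper's own Definitions 2 and 3 are authoritative), with ρ ≥ 0 denoting the usual weak-MVI/comonotonicity modulus.

```latex
% Weak Minty variational inequality (weak MVI): there exist \rho \ge 0 and a
% solution z^\ast such that, for all z,
\langle F(z),\, z - z^\ast \rangle \;\ge\; -\tfrac{\rho}{2}\,\|F(z)\|^2 .

% Negative comonotonicity: for all z, z' and all u \in E(z),\; v \in E(z'),
\langle u - v,\, z - z' \rangle \;\ge\; -\rho\,\|u - v\|^2 .
```

Setting ρ = 0 recovers the MVI and monotonicity, respectively; negative comonotonicity implies the weak MVI, which is why the first condition covers a strictly larger class of problems.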

1.1. OUR CONTRIBUTIONS

Throughout the paper, we adopt the more general and abstract framework of inclusion problems, which includes constrained min-max optimization as a special case. More specifically, we consider the following problem.

Inclusion Problem. Given E = F + A, where F : R^n → R^n is a single-valued (possibly non-monotone) operator and A : R^n ⇒ R^n is a set-valued maximally monotone operator, the inclusion problem is defined as follows:

find z* such that 0 ∈ E(z*) = F(z*) + A(z*). (IP)

As shown in the following example, we can interpret a min-max optimization problem as an inclusion problem.

Example 1 (Min-Max Optimization). The following structured min-max optimization problem captures a wide range of applications in machine learning such as GANs, adversarial examples, robust optimization, and reinforcement learning:

min_{x ∈ R^{n_x}} max_{y ∈ R^{n_y}} f(x, y) + g(x) − h(y), (1)

where f(·, ·) is possibly non-convex in x and non-concave in y. Regularized and constrained min-max problems are covered by appropriate choices of lower semi-continuous and convex functions g and h. Examples include the ℓ1-norm, the ℓ2-norm, and the indicator function of a closed convex feasible set. Let z = (x, y). If we define F(z) = (∇_x f(x, y), −∇_y f(x, y)) and A(z) = (∂g(x), ∂h(y)), where A is maximally monotone, then the first-order optimality condition of (1) has the form of an inclusion problem.

Daskalakis et al. (2021) show that without any assumption on the operator E = F + A, the problem is intractable.¹ The most well-understood setting is when E is monotone, i.e., ⟨u − v, z − z′⟩ ≥ 0 for all z, z′ and u ∈ E(z), v ∈ E(z′), which captures convex-concave min-max optimization. Motivated by non-convex non-concave min-max optimization, we consider the two most widely studied families of non-monotone operators: (i) negatively
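As a concrete illustration of how the set-valued part A enters an algorithm, here is a hedged NumPy sketch: when A = ∂ι_C is the subdifferential of the indicator of a closed convex set C, the resolvent (I + ηA)^{-1} reduces to the Euclidean projection onto C, so a forward-backward step z_{t+1} = P_C(z_t − ηF(z_t)) drives the natural residual ||z − P_C(z − ηF(z))|| to zero. The quadratic toy operator, step size, and box constraint below are illustrative assumptions, not from the paper (the forward-backward step needs the strong monotonicity of this toy F; it is not the method the paper analyzes).

```python
import numpy as np

ETA = 0.2  # illustrative step size

def F(z):
    # F(z) = (grad_x f, -grad_y f) for the toy f(x, y) = x^2 + x*y - y^2,
    # a strongly monotone operator with unique zero at the origin.
    x, y = z
    return np.array([2 * x + y, 2 * y - x])

def proj_box(z, lo=-1.0, hi=1.0):
    # Resolvent of A = subdifferential of the indicator of the box:
    # (I + eta * A)^{-1} is the Euclidean projection, for any eta > 0.
    return np.clip(z, lo, hi)

def forward_backward(z0, eta=ETA, T=200):
    # Forward (explicit) step on F, backward (resolvent) step on A.
    z = np.asarray(z0, dtype=float)
    for _ in range(T):
        z = proj_box(z - eta * F(z))
    return z

z = forward_backward([0.9, -0.7])
# Natural residual: zero exactly at solutions of 0 in F(z) + A(z).
res = np.linalg.norm(z - proj_box(z - ETA * F(z)))
```

The same resolvent viewpoint covers the other choices of g and h mentioned above: for g = ℓ1-norm the resolvent is soft-thresholding rather than a projection, but the inclusion-problem template (IP) is unchanged.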



¹ Indeed, even if A is maximally monotone, Daskalakis et al. (2021) implies that the problem is still intractable without further assumptions on F.

