Solving stochastic weak Minty variational inequalities without increasing batch size

Abstract

This paper introduces a family of stochastic extragradient-type algorithms for a class of nonconvex-nonconcave problems characterized by the weak Minty variational inequality (MVI). Unlike existing results on extragradient methods in the monotone setting, employing diminishing stepsizes is no longer possible in the weak MVI setting. This has led to approaches such as increasing batch sizes per iteration which, however, can be prohibitively expensive. In contrast, our proposed method involves two stepsizes and only requires one additional oracle evaluation per iteration. We show that it is possible to keep the first stepsize fixed while only the second stepsize is taken to be diminishing, making the scheme interesting even in the monotone setting. Almost sure convergence is established, and we provide a unified analysis for this family of schemes, which contains a nonlinear generalization of the celebrated primal-dual hybrid gradient algorithm.

1. Introduction

Stochastic first-order methods have been at the core of the current success in deep learning applications. These methods are by now mostly well understood for minimization problems. This is even the case in the nonconvex setting, where there exist matching upper and lower bounds on the complexity of finding an approximately stable point (Arjevani et al., 2019). The picture becomes less clear when moving beyond minimization into nonconvex-nonconcave minimax problems, or more generally nonmonotone variational inequalities. Even in the deterministic case, finding a stationary point is in general intractable (Daskalakis et al., 2021; Hirsch & Vavasis, 1987). This is in stark contrast with minimization, where it is only global optimality that is NP-hard.

An interesting nonmonotone class for which we do have efficient algorithms is characterized by the so-called weak Minty variational inequality (MVI) (Diakonikolas et al., 2021). This problem class captures nontrivial structures such as attracting limit cycles and is governed by a parameter ρ, with more negative values of ρ corresponding to a higher degree of nonmonotonicity. It turns out that the stepsize γ of the exploration step in extragradient-type schemes lower bounds the problem class through ρ > −γ/2 (Pethick et al., 2022). In other words, it seems that we need to take γ large to guarantee convergence for a large class. This reliance on a large stepsize is at the core of why the community has struggled to provide stochastic variants for weak MVIs. The only known results effectively increase the batch size at every iteration (Diakonikolas et al., 2021, Thm. 4.5), a strategy that would be prohibitively expensive in most machine learning applications. Pethick et al. (2022) proposed SEG+, which attempts to tackle the noise by only diminishing the second stepsize. This suffices in the special case of unconstrained quadratic games but can fail even in the monotone case, as illustrated in Figure 1. This naturally raises the following research question:

Can stochastic weak Minty variational inequalities be solved without increasing the batch size?

We resolve this open problem in the affirmative when the stochastic oracles are Lipschitz in mean, with a modification of stochastic extragradient called bias-corrected stochastic extragradient (BC-SEG+). The scheme only requires one additional first-order oracle call, while crucially maintaining the fixed stepsize. Specifically, we make the following contributions:

(i) We show that it is possible to converge for weak MVIs without increasing the batch size, by introducing a bias-correction term. The scheme introduces no additional hyperparameters and recovers the maximal range ρ ∈ (−γ/2, ∞) of explicit deterministic schemes. The rate we establish is interesting already in the star-monotone case, where only asymptotic convergence of the norm of the operator was known when refraining from increasing the batch size (Hsieh et al., 2020, Thm. 1). Our result additionally carries over to another class of problems treated in Appendix G, which we call negative weak MVIs.

(ii) We generalize the result to a whole family of schemes that can treat constrained and regularized settings. First and foremost, the class includes a generalization of the forward-backward-forward (FBF) algorithm of Tseng (2000) to stochastic weak MVIs. The class also contains a stochastic nonlinear extension of the celebrated primal-dual hybrid gradient (PDHG) algorithm (Chambolle & Pock, 2011).
Both methods are obtained as instantiations of the same template scheme, thus providing a unified analysis and revealing an interesting requirement on the update under weak MVI when only stochastic feedback is available.

(iii) We prove almost sure convergence under the classical Robbins-Monro schedule for the second stepsize. This provides a guarantee on the last iterate, which is especially important in the nonmonotone case, where average guarantees cannot be converted into a single candidate solution. Almost sure convergence is challenging already in the monotone case, where even stochastic extragradient may not converge (Hsieh et al., 2020, Fig. 1).
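For reference, the weak MVI condition discussed above can be stated as follows in its unconstrained form (cf. Diakonikolas et al., 2021; Pethick et al., 2022); the formal assumptions used in our analysis are collected in Section 3.

```latex
% Weak Minty variational inequality (unconstrained form):
% there exists a solution z* and some rho (possibly negative) such that
\begin{equation*}
  \langle F(z),\, z - z^\star \rangle \;\geq\; \rho\, \| F(z) \|^2
  \qquad \text{for all } z \in \mathbb{R}^d .
\end{equation*}
% Star-monotone problems correspond to rho >= 0, while rho < 0 allows
% nonmonotone structure; extragradient-type schemes with exploration
% stepsize gamma require rho > -gamma/2 (Pethick et al., 2022).
```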

2. Related work

Weak MVI. Diakonikolas et al. (2021) were the first to observe that an extragradient-like scheme called extragradient+ (EG+) converges globally for weak MVIs with ρ ∈ (−1/(8L_F), ∞). This result was later tightened to ρ ∈ (−1/(2L_F), ∞) and extended to constrained and regularized settings in Pethick et al. (2022). A single-call variant was analyzed in Böhm (2022). Weak MVI is a star variant of cohypomonotonicity, for which an inexact proximal point method was originally studied in Combettes & Pennanen (2004). Later, a tight characterization was carried out by Bauschke et al. (2021) for the exact case. It was shown that acceleration is achievable for an extragradient-type scheme even for cohypomonotone problems (Lee & Kim, 2021). Despite this array of positive results, the stochastic case is largely untreated for weak MVIs. The only known result (Diakonikolas et al., 2021, Thm. 4.5) requires the batch size to be increasing. Similarly, the accelerated method in Lee & Kim (2021, Thm. 6.1) requires the variance of the stochastic oracle to decrease as O(1/k).

Stochastic & monotone. When more structure is present the story is different, since diminishing stepsizes become permissible. In the monotone case, rates for the gap function were obtained for stochastic Mirror-Prox in Juditsky et al. (2011) under a bounded domain assumption, which was later relaxed for the extragradient method under additional assumptions (Mishchenko et al., 2020). The norm of the operator was shown to asymptotically converge for unconstrained MVIs in Hsieh et al. (2020) with a double stepsize policy. There exists a multitude of extensions for monotone problems: single-call stochastic methods are covered in detail by Hsieh et al. (2019), variance reduction was applied to Halpern-type iterations (Cai et al., 2022), and cocoercivity was used in Beznosikov et al. (2022).

Variance reduction. The assumptions we make about the stochastic oracle in Section 3 are similar to what is found in the variance reduction literature (see for instance Alacaoglu & Malitsky (2021, Assumption 1) or Arjevani et al. (2019)). However, our use of the assumption is different in a crucial way. Whereas the variance reduction literature uses the stepsize γ ∝ 1/L̄_F (see e.g. Alacaoglu & Malitsky (2021, Thm. 2.5)), we aim at using the much larger γ ∝ 1/L_F. For instance, in the special case of a finite-sum problem of size N, the mean-square smoothness constant L̄_F from Assumption III can be √N times larger than L_F (see Appendix I for details). This would lead to a prohibitively strict requirement on the degree of allowed nonmonotonicity through the relationship ρ > −γ/2.
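To make the last point concrete, the following self-contained Python sketch constructs a finite sum in which the mean-square smoothness constant L̄_F exceeds the Lipschitz constant L_F of the averaged operator by exactly √N. The construction (a single component carrying all the mass) is our own illustration and is not claimed to be the example analyzed in Appendix I.

```python
import numpy as np

# Finite sum F(z) = (1/N) * sum_i F_i(z) with F_1(z) = N*L*z and F_i(z) = 0
# for i > 1. The average is F(z) = L*z, so L_F = L, while
#   E_i ||F_i(u) - F_i(v)||^2 = (1/N) * (N*L)^2 * ||u - v||^2,
# giving Lbar_F = sqrt(N) * L.
N, L = 100, 1.0

def F_i(i, z):
    return N * L * z if i == 0 else 0.0 * z

def F(z):
    return sum(F_i(i, z) for i in range(N)) / N

u, v = np.array([1.0]), np.array([0.0])
# For this linear example a single pair of points realizes both constants.
L_F = np.linalg.norm(F(u) - F(v)) / np.linalg.norm(u - v)
ms = np.mean([np.linalg.norm(F_i(i, u) - F_i(i, v)) ** 2 for i in range(N)])
Lbar_F = np.sqrt(ms) / np.linalg.norm(u - v)

print(f"L_F = {L_F:.1f}, Lbar_F = {Lbar_F:.1f}")  # L_F = 1.0, Lbar_F = 10.0
```

A stepsize γ ∝ 1/L̄_F would here be ten times smaller than γ ∝ 1/L_F, and the admissible range ρ ∈ (−γ/2, ∞) would shrink by the same factor.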

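Before the formal development, the two-stepsize idea can also be illustrated numerically. The Python sketch below runs a generic SEG+-style iteration (fixed exploration stepsize γ, diminishing second stepsize α_k) on a noisy bilinear toy problem of our own choosing. It is a schematic illustration only; in particular it omits the bias-correction term that distinguishes BC-SEG+, whose exact update is specified in Section 3.

```python
import numpy as np

rng = np.random.default_rng(0)

def F(z, sigma=0.1):
    # Stochastic oracle for the monotone bilinear game min_x max_y x*y:
    # F(x, y) = (y, -x) plus Gaussian noise of level sigma.
    x, y = z
    return np.array([y, -x]) + sigma * rng.normal(size=2)

z = np.array([1.0, 1.0])
gamma = 0.5                        # fixed exploration stepsize
for k in range(1, 10_001):
    alpha_k = k ** -0.75           # Robbins-Monro: sum = inf, sum of squares < inf
    z_bar = z - gamma * F(z)               # exploration step (fixed gamma)
    z = z - alpha_k * gamma * F(z_bar)     # update step (vanishing alpha_k * gamma)

print(z)  # approaches the solution (0, 0) up to residual noise
```

On this monotone example the fixed-γ exploration combined with a vanishing second stepsize drives the iterates toward the solution despite the persistent oracle noise, which is precisely the behavior the schemes of this paper extend to the weak MVI setting.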
