SOLVING CONSTRAINED VARIATIONAL INEQUALITIES VIA A FIRST-ORDER INTERIOR POINT-BASED METHOD

Abstract

We develop an interior-point approach to solve constrained variational inequality (cVI) problems. Inspired by the efficacy of the alternating direction method of multipliers (ADMM) in the single-objective context, we generalize ADMM to derive a first-order method for cVIs, which we refer to as the ADMM-based interior-point method for constrained VIs (ACVI). We provide convergence guarantees for ACVI for two general classes of problems: (i) when the operator is ξ-monotone, and (ii) when it is monotone, some constraints are active, and the game is not purely rotational. When, in the latter case, the operator is additionally L-Lipschitz, we match the known lower bounds on the rates for the gap function of O(1/√K) and O(1/K) for the last and average iterate, respectively. To the best of our knowledge, this is the first first-order interior-point method for the general cVI problem with a global convergence guarantee. Moreover, unlike previous work in this setting, ACVI provides a means to solve cVIs with nontrivial constraints. Empirical analyses demonstrate clear advantages of ACVI over common first-order methods. In particular, (i) cyclical behavior is notably reduced, as our method approaches the solution from the analytic center, and (ii) unlike projection-based methods, which zigzag when near a constraint, ACVI handles the constraints efficiently.

1. INTRODUCTION

We are interested in the constrained variational inequality problem (Stampacchia, 1964): find x⋆ ∈ X s.t. ⟨x − x⋆, F(x⋆)⟩ ≥ 0, ∀x ∈ X, (cVI) where X is a subset of the n-dimensional Euclidean space R^n, and F : X → R^n is a continuous map. Finding (an element of) the solution set S⋆_{X,F} of cVI is a key problem in multiple fields, such as economics and game theory. More pertinent to machine learning, cVIs generalize standard single-objective optimization, complementarity problems (Cottle & Dantzig, 1968), zero-sum games (von Neumann & Morgenstern, 1947; Rockafellar, 1970), and multi-player games. For example, solving a cVI is the optimization problem underlying reinforcement learning (e.g., Omidshafiei et al., 2017) and generative adversarial networks (Goodfellow et al., 2014). Moreover, even when training one set of parameters with one loss f, that is, F(x) ≡ ∇_x f(x), a natural way to improve the model's robustness in some regard is to introduce an adversary that perturbs the objective or the input, or to consider the worst-case sample distribution of the empirical objective. As has been noted in many problem domains, including robust classification (Mazuelas et al., 2020), adversarial training (Szegedy et al., 2014), causal inference (Christiansen et al., 2020), and robust objectives (e.g., Rothenhäusler et al., 2018), this leads to a min-max structure, which is an instance of the cVI problem. To see this, consider two sets of parameters (agents), x1 ∈ X1 and x2 ∈ X2, that share a loss/utility function f : X1 × X2 → R, which the first agent aims to minimize and the second agent aims to maximize.
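As a concrete sanity check (not part of the paper), the cVI condition ⟨x − x⋆, F(x⋆)⟩ ≥ 0 can be verified numerically on a hypothetical one-dimensional instance: take X = [1, 2] and F(x) = 2x (the gradient of f(x) = x²), whose solution is the boundary point x⋆ = 1.

```python
import numpy as np

# Hypothetical 1-D instance of the cVI problem: X = [1, 2], F(x) = 2x.
# The solution is the boundary point x_star = 1, since
# <x - 1, F(1)> = 2*(x - 1) >= 0 for all x in X.
F = lambda x: 2.0 * x

def is_vi_solution(x_star, F, xs, tol=1e-12):
    """Check <x - x_star, F(x_star)> >= 0 over a sample xs of X."""
    return all((x - x_star) * F(x_star) >= -tol for x in xs)

xs = np.linspace(1.0, 2.0, 101)        # samples of X = [1, 2]
assert is_vi_solution(1.0, F, xs)      # the boundary solution passes
assert not is_vi_solution(1.5, F, xs)  # an interior non-solution fails
```

Note that the solution lies on the boundary of X, where F(x⋆) ≠ 0; this is precisely the situation a projection-free or interior-point treatment of the constraints must handle.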
Then the problem is to find a saddle point of f, i.e., a point (x1⋆, x2⋆) such that f(x1⋆, x2) ≤ f(x1⋆, x2⋆) ≤ f(x1, x2⋆) for all x1 ∈ X1, x2 ∈ X2. This corresponds to a cVI with F(x) ≡ [∇_{x1} f(x1, x2), −∇_{x2} f(x1, x2)]⊺. Solving cVIs is significantly more challenging than solving single-objective optimization problems, because F is a general vector field, leading to "rotational" trajectories in parameter space (App. A). In response, the development of efficient algorithms with provable convergence has recently been a focus of interest in machine learning and optimization, particularly in the unconstrained setting, where X ≡ R^n (e.g., Tseng, 1995; Daskalakis et al., 2018; Mokhtari et al., 2019; 2020; Golowich et al., 2020b; Azizian et al., 2020; Chavdarova et al., 2021a; Gorbunov et al., 2022; Bot et al., 2022). In many applications, however, we have constraints on (part of) the decision variable x; that is, X is often a strict subset of R^n. As an example, let us revisit the aforementioned distributionally robust prediction problem: consider a linear setting (cf. Eq. 1 in Rothenhäusler et al., 2018) and the class of parametrized distributions △ ≡ {w ∈ R^d | w ≥ 0, e⊺w = 1}, where e ∈ R^d is the vector of all ones. The robust problem is then: min_{x∈R^n} max_{w∈R^d} w⊺(y − Dx), subject to w ≥ 0, e⊺w = 1, where D ∈ R^{d×n} contains d samples of an n-dimensional covariate vector and y ∈ R^d is the vector of target variables (the constraint w ≤ 1 is implied). This illustrates that the robustification of a standard minimization problem immediately yields an instance of the cVI problem; see further examples in § 5. Additional example applications include (i) machine learning applications in business, finance, and economics, where the sum of the decision variables (representing, for example, resources) often cannot exceed a specific value, (ii) contract theory (e.g.,
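The "rotational" difficulty can be seen on the classic bilinear saddle f(x1, x2) = x1·x2, for which F(x) = (x2, −x1) is a pure rotation field with unique equilibrium (0, 0). A minimal sketch (a standard illustration, not taken from the paper) shows that naive simultaneous gradient descent-ascent (GDA) spirals away from the equilibrium:

```python
import numpy as np

# Bilinear saddle f(x1, x2) = x1 * x2, so
# F(x) = (df/dx1, -df/dx2) = (x2, -x1): a purely rotational field
# whose unique equilibrium is (0, 0).
F = lambda x: np.array([x[1], -x[0]])

def gda(x0, gamma=0.1, steps=100):
    """Simultaneous gradient descent-ascent: x <- x - gamma * F(x)."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = x - gamma * F(x)
    return x

x_final = gda([1.0, 0.0])
# Each GDA step multiplies the distance to the equilibrium by
# sqrt(1 + gamma^2) > 1, so the iterates spiral outward.
assert np.linalg.norm(x_final) > 1.0
```

This divergence on even the simplest game is what motivates methods such as extragradient, and it has no analogue in single-objective gradient descent.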
§2.3.2 in Bates et al., 2022, where one player is the parameters of a probability distribution, as above), and (iii) numerically solving optimal control problems, among others. Significantly fewer works address the convergence of first-order optimization methods in the constrained setting; see § 2 for an overview. Recently, Cai et al. (2022) established a convergence rate for the projected extragradient method (Korpelevich, 1976) when F is monotone and Lipschitz (see § 3 for definitions). However, (i) the proof the authors present is computer-assisted, which makes it hard to interpret and of limited usefulness for inspiring novel (e.g., accelerated) methods, and (ii) the considered setting assumes the projection is fast to compute and thus excludes the projection from the rate. The latter assumption holds only in the rare cases where the constraints are simple enough that operations such as clipping suffice. When the inequality and/or equality constraints have a general form, however, each EG update requires two projections (see App. A.4), and each projection requires solving a separate constrained optimization problem, which for general constraints implies the need for a second-order method, as explained next. Interior-point (IP) methods are the de facto family of iterative algorithms for constrained optimization. These methods enjoy well-established guarantees and theoretical understanding in the context of single-objective optimization [see, e.g., Boyd & Vandenberghe (2004), Ch. 11; Megiddo (1989); Wright (1997)], and have been extended to a wide range of problem settings (e.g., Tseng, 1993; Nesterov & Nemirovski, 1994; Nesterov & Todd, 1998; Renegar, 2001; Wright, 2001). They build on the natural idea of solving a simplified homotopic problem that makes it possible to "smoothly" transition to the original, more complex problem; see § 3.1.
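To make the "two projections per EG update" point concrete, here is a minimal sketch of projected extragradient on the toy game of Figure 1 (min_{x1≥0} max_{x2≥0} 0.05·x1² + x1·x2 − 0.05·x2²). In this special case the projection onto the nonnegative orthant is just clipping; for general inequality/equality constraints each of the two projections would itself be a constrained optimization problem.

```python
import numpy as np

# Toy game of Figure 1: min_{x1>=0} max_{x2>=0}
#   0.05*x1^2 + x1*x2 - 0.05*x2^2,
# with F(x) = (0.1*x1 + x2, -x1 + 0.1*x2) and solution x_star = (0, 0).
F = lambda x: np.array([0.1 * x[0] + x[1], -x[0] + 0.1 * x[1]])
proj = lambda x: np.maximum(x, 0.0)  # projection onto R_+^2 (simple clipping)

def extragradient(x0, gamma=0.2, steps=1000):
    """Projected EG: two F evaluations and two projections per iteration."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x_half = proj(x - gamma * F(x))    # extrapolation step (projection 1)
        x = proj(x - gamma * F(x_half))    # update step (projection 2)
    return x

x_final = extragradient([0.5, 0.5])
assert np.linalg.norm(x_final) < 1e-3  # converges to the solution (0, 0)
```

The sketch works here only because proj is a one-line clip; replacing R_+^2 with, say, a general polyhedral set would turn each proj call into a quadratic program, which is the cost that ACVI is designed to avoid.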
Several works extend IP methods to cVIs by applying the second-order Newton method to a modified Karush-Kuhn-Tucker (KKT) system appropriate for the cVI (Ralph & Wright, 2000; Qi & Sun, 2002; Fan & Yan, 2010; Monteiro & Pang, 1996; Chen et al., 1998). Many of these approaches, however, rely on strong assumptions; see § 2. Moreover, although these methods enjoy fast convergence in terms of the number of iterations, each iteration involves computing the Jacobian of F (or the Hessian when F ≡ ∇f(x)), which quickly becomes prohibitive for large dimension n of x. Hence, first-order methods are preferred in practice.
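The homotopy idea underlying IP methods can be illustrated with a standard single-objective example (a textbook sketch, not the cVI algorithm of this paper): to solve min x² s.t. x ≥ 1, one minimizes the barrier objective x² − μ·log(x − 1) for a decreasing sequence of μ > 0; its minimizers trace the central path, which stays strictly feasible and converges to the constrained solution x = 1 as μ → 0.

```python
import numpy as np

# Barrier problem for min x^2 s.t. x >= 1:
#   min_x  x^2 - mu * log(x - 1).
# Setting the derivative 2x - mu/(x - 1) to zero gives 2x^2 - 2x - mu = 0,
# i.e., the closed-form central-path point x(mu) = (1 + sqrt(1 + 2*mu)) / 2.
def central_path_point(mu):
    return (1.0 + np.sqrt(1.0 + 2.0 * mu)) / 2.0

# The central path is strictly feasible for every mu > 0 ...
for mu in [1.0, 0.1, 0.01]:
    assert central_path_point(mu) > 1.0
# ... and converges to the constrained solution x = 1 as mu -> 0.
assert abs(central_path_point(1e-8) - 1.0) < 1e-4
```

Classical IP methods follow this path with Newton steps, which is exactly where the per-iteration Jacobian/Hessian cost arises; a first-order method such as ACVI aims to retain the homotopy while avoiding that cost.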



Figure 1: ACVI (Algorithm 1) and EG iterates (depicted in red and green, resp.) on the game min_{x1∈R+} max_{x2∈R+} 0.05·x1² + x1·x2 − 0.05·x2². The constraints are depicted with dashed lines and the iterates with circles. ACVI gets close to the Nash equilibrium (⋆) in a single step, whereas EG zigzags when hitting a constraint. The remaining commonly used methods (GDA, OGDA, and LA-GDA) perform similarly to EG; see App. E.

