LINEARLY CONSTRAINED BILEVEL OPTIMIZATION: A SMOOTHED IMPLICIT GRADIENT APPROACH

Abstract

This work develops an analysis and algorithms for solving a class of bilevel optimization problems where the lower-level (LL) problems have linear constraints. Most of the existing approaches for constrained bilevel problems rely on value function based approximate reformulations, which suffer from issues such as nonconvex and non-differentiable constraints. In contrast, in this work, we develop an implicit gradient-based approach, which is easy to implement and suitable for machine learning applications. We first provide an in-depth understanding of the problem by showing that the implicit objective for such problems is in general non-differentiable. However, if we add a small (linear) perturbation to the LL objective, the resulting implicit objective becomes differentiable almost surely. This key observation opens the door for developing (deterministic and stochastic) gradient-based algorithms similar to the state-of-the-art ones for unconstrained bilevel problems. We show that when the implicit function is assumed to be strongly convex, convex, or weakly convex, the resulting algorithms converge with guaranteed rates. Finally, we experimentally corroborate the theoretical findings and evaluate the performance of the proposed framework on numerical and adversarial learning problems. To our knowledge, this is the first time that (implicit) gradient-based methods have been developed and analyzed for the considered class of bilevel problems.

1. INTRODUCTION

Bilevel optimization problems (Colson et al., 2005; Dempe & Zemkoho, 2020) can be used to model an important class of hierarchical optimization tasks with two levels of hierarchy, the upper-level (UL) and the lower-level (LL). The key characteristics of bilevel problems are: 1) the solution of the UL problem requires access to the solution of the LL problem, and 2) the LL problem is parametrized by the UL variable. Bilevel optimization problems arise in a wide range of machine learning applications, such as meta-learning (Rajeswaran et al., 2019; Franceschi et al., 2018), data hypercleaning (Shaban et al., 2019), hyperparameter optimization (Sinha et al., 2020; Franceschi et al., 2018; 2017; Pedregosa, 2016), and adversarial learning (Li et al., 2019; Liu et al., 2021a; Zhang et al., 2021), as well as in other application domains such as network optimization (Migdalas, 1995), economics (Cecchini et al., 2013), and transport research (Didi-Biha et al., 2006; Kalashnikov et al., 2010).

In this work, we focus on a special class of stochastic bilevel optimization problems, where the LL problem involves the minimization of a strongly convex objective over a set of linear inequality constraints. More precisely, we consider the following formulation:

$$\min_{x \in \mathcal{X}} \; G(x) := f\big(x, y^*(x)\big) := \mathbb{E}_{\xi}\big[f\big(x, y^*(x); \xi\big)\big], \tag{1a}$$

$$\text{s.t.} \quad y^*(x) \in \arg\min_{y \in \mathbb{R}^{d_\ell}} \big\{\, h(x, y) \;:\; Ay \le b \,\big\}, \tag{1b}$$

where $\xi \sim \mathcal{D}$ represents a stochastic sample of the objective $f(\cdot, \cdot)$, $\mathcal{X} \subseteq \mathbb{R}^{d_u}$ is a convex and closed set, $f : \mathcal{X} \times \mathbb{R}^{d_\ell} \to \mathbb{R}$ is the UL objective, $h : \mathcal{X} \times \mathbb{R}^{d_\ell} \to \mathbb{R}$ is the LL objective, and $f, h$ are smooth functions. We focus on problems where $h(x, y)$ is strongly convex with respect to $y$. The matrix $A \in \mathbb{R}^{k \times d_\ell}$ and the vector $b \in \mathbb{R}^{k}$ define the linear constraints. In the following, we refer to (1a) as the UL problem, and to (1b) as the LL one.
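To make the formulation concrete, the following is a minimal sketch of an instance of (1b) for a hypothetical choice of LL objective, $h(x, y) = \tfrac{1}{2}\|y - x\|^2$ (strongly convex in $y$), so that $y^*(x)$ is the Euclidean projection of $x$ onto the polyhedron $\{y : Ay \le b\}$. The specific $A$, $b$, and solver choice below are illustrative assumptions, not part of the paper's setup:

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative linear constraints Ay <= b with k = 1, d_ell = 2:
# the feasible set is the halfspace {y : y_1 + y_2 <= 1}.
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

def lower_level_solution(x):
    """Solve the LL problem (1b) for h(x, y) = 0.5 * ||y - x||^2,
    i.e., y*(x) = argmin_y { 0.5*||y - x||^2 : A y <= b }."""
    h = lambda y: 0.5 * np.sum((y - x) ** 2)
    # SLSQP expects inequality constraints in the form g(y) >= 0,
    # so Ay <= b is encoded as b - Ay >= 0.
    cons = {"type": "ineq", "fun": lambda y: b - A @ y}
    res = minimize(h, x0=np.zeros_like(x), constraints=cons, method="SLSQP")
    return res.x

x = np.array([2.0, 2.0])          # an upper-level iterate
y_star = lower_level_solution(x)  # projection of x onto the halfspace
print(y_star)                      # approximately [0.5, 0.5]
```

For this quadratic $h$, the solution map $y^*(x)$ is the projection operator, which is piecewise linear in $x$; this already hints at why the implicit objective $G(x) = f(x, y^*(x))$ can fail to be differentiable at points where the active constraint set changes.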

