MIN-MAX MULTI-OBJECTIVE BILEVEL OPTIMIZATION WITH APPLICATIONS IN ROBUST MACHINE LEARNING

Abstract

We consider a generic min-max multi-objective bilevel optimization problem with applications in robust machine learning such as representation learning and hyperparameter optimization. We design MORBiT, a novel single-loop gradient descent-ascent bilevel optimization algorithm, to solve the generic problem, and present a novel analysis showing that MORBiT converges to a first-order stationary point at a rate of O(n^{1/2} K^{-2/5}) for a class of weakly convex problems with n objectives after K iterations of the algorithm. Our analysis utilizes novel results to handle the non-smooth min-max multi-objective setup and to obtain a sublinear dependence on the number of objectives n. Experimental results on robust representation learning and robust hyperparameter optimization showcase (i) the advantages of considering the min-max multi-objective setup, and (ii) the convergence properties of the proposed MORBiT. Our code is at https://github.com/minimario/MORBiT.

1. INTRODUCTION

We begin by examining the classic bilevel optimization (BLO) problem:

min_{x ∈ X ⊆ R^{d_x}} f(x, y⋆(x)) subject to y⋆(x) ∈ arg min_{y ∈ Y = R^{d_y}} g(x, y), (1)

where f : X × Y → R is the upper-level (UL) objective function and g : X × Y → R is the lower-level (LL) objective function. X and Y, respectively, denote the domains of the UL and LL optimization variables x and y, incorporating any respective constraints. Equation 1 is called BLO because the UL objective f depends on both x and the solution y⋆(x) of the LL objective g. BLO is well studied in the optimization literature (Bard, 2013; Dempe, 2002). Recently, stochastic BLO has found various applications in machine learning (Liu et al., 2021; Chen et al., 2022a), such as hyperparameter optimization (Franceschi et al., 2018).

In this work, we focus on a robust generalization of equation 1 to the multi-objective setting, where there are n different objective function pairs (f_i, g_i). Let [n] ≜ {1, 2, ..., n}, and let f_i : X × Y_i → R and g_i : X × Y_i → R denote the i-th UL and LL objectives, respectively. We study the following problem:

min_{x ∈ X ⊆ R^{d_x}} max_{i ∈ [n]} f_i(x, y⋆_i(x)) subject to y⋆_i(x) ∈ arg min_{y_i ∈ Y_i = R^{d_{y_i}}} g_i(x, y_i), ∀ i ∈ [n]. (2)

Here, the optimization variable x is shared across all objectives f_i, g_i, i ∈ [n], while each variable y_i is involved only in its corresponding objectives f_i, g_i. The goal is to find a robust solution x ∈ X such that the worst case across all objectives is minimized. This generic problem reduces to equation 1 when there is a single objective pair, that is, n = 1. Such a robust optimization problem is useful in various applications, and is especially necessary in safety-critical ones. For example, in decision optimization, the different objective pairs (f_i, g_i) can correspond to different "scenarios" (such as plans for different scenarios), with x being the shared decision variable and the y_i's being scenario-specific decision variables.
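To make the structure of equation 2 concrete, the following is a minimal single-loop sketch on a toy problem. It is not the paper's MORBiT algorithm: the quadratic objectives, the data (a, c), the step sizes, and the use of exponentiated-gradient weights as a smooth surrogate for the hard max over i are all illustrative assumptions chosen so that each y⋆_i(x) and hypergradient is available in closed form.

```python
import numpy as np

# Toy instance of equation 2 with n objective pairs sharing the UL variable x:
#   LL: g_i(x, y_i) = 0.5 * (y_i - c_i * x)^2   =>  y_i*(x) = c_i * x
#   UL: f_i(x, y_i) = (x - a_i)^2 + y_i^2
# The constants a_i, c_i below are hypothetical problem data.
n = 3
a = np.array([1.0, -0.5, 2.0])
c = np.array([0.5, 1.0, 0.2])

x = 0.0
y = np.zeros(n)           # one LL variable per objective pair
w = np.ones(n) / n        # simplex weights: smooth surrogate for max_i
alpha, beta, gamma = 0.02, 0.5, 0.5   # UL / LL / weight step sizes

for _ in range(2000):
    # One LL gradient-descent step per objective on g_i
    y = y - beta * (y - c * x)
    # UL values and hypergradients; dy_i*/dx = c_i is exact for this
    # quadratic LL, so the total derivative in x is the partial term
    # plus the implicit term through y_i.
    f_vals = (x - a) ** 2 + y ** 2
    grad_x = 2.0 * (x - a) + 2.0 * y * c
    # Ascent step on the simplex weights (exponentiated gradient):
    # objectives with larger f_i receive larger weight.
    w = w * np.exp(gamma * f_vals)
    w = w / w.sum()
    # Descent step on the shared variable using the weighted hypergradient
    x = x - alpha * np.dot(w, grad_x)

worst = float(np.max((x - a) ** 2 + (c * x) ** 2))
print(f"x = {x:.3f}, worst-case UL value = {worst:.3f}")
```

At convergence, x balances the two dominant objectives so that neither can be improved without worsening the other, which is exactly the min-max trade-off that a single-objective BLO (n = 1) cannot capture.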
The goal of equation 2 is to find a shared decision x that provides robust performance across all n considered scenarios, so that such a robust assignment of decision variables will generalize well to other scenarios. In machine

