MIN-MAX MULTI-OBJECTIVE BILEVEL OPTIMIZATION WITH APPLICATIONS IN ROBUST MACHINE LEARNING

Abstract

We consider a generic min-max multi-objective bilevel optimization problem with applications in robust machine learning such as representation learning and hyperparameter optimization. We design MORBiT, a novel single-loop gradient descent-ascent bilevel optimization algorithm, to solve the generic problem, and present a novel analysis showing that MORBiT converges to a first-order stationary point at a rate of O(n^{1/2} K^{-2/5}) for a class of weakly convex problems with n objectives upon K iterations of the algorithm. Our analysis utilizes novel results to handle the non-smooth min-max multi-objective setup and to obtain a sublinear dependence on the number of objectives n. Experimental results on robust representation learning and robust hyperparameter optimization showcase (i) the advantages of considering the min-max multi-objective setup, and (ii) the convergence properties of the proposed MORBiT. Our code is at https://github.com/minimario/MORBiT.

1. INTRODUCTION

We begin by examining the classic bilevel optimization (BLO) problem:

min_{x ∈ X ⊆ R^{d_x}} f(x, y*(x))  subject to  y*(x) ∈ arg min_{y ∈ Y = R^{d_y}} g(x, y),   (1)

where f : X × Y → R is the upper-level (UL) objective function and g : X × Y → R is the lower-level (LL) objective function. X and Y, respectively, denote the domains of the UL and LL optimization variables x and y, incorporating any respective constraints. Equation 1 is called BLO because the UL objective f depends on both x and the solution y*(x) of the LL objective g. BLO is well-studied in the optimization literature (Bard, 2013; Dempe, 2002). Recently, stochastic BLO has found various applications in machine learning (Liu et al., 2021; Chen et al., 2022a), such as hyperparameter optimization (Franceschi et al., 2018), reinforcement learning or RL (Hong et al., 2020), multi-task representation learning (Arora et al., 2020), model compression (Zhang et al., 2022), adversarial attack generation (Zhao et al., 2022) and invariant risk minimization (Zhang et al., 2023).

In this work, we focus on a robust generalization of equation 1 to the multi-objective setting with n different objective function pairs (f_i, g_i). Let [n] ≜ {1, 2, …, n}, and let f_i : X × Y_i → R and g_i : X × Y_i → R denote the i-th UL and LL objectives, respectively. We study the following problem:

min_{x ∈ X ⊆ R^{d_x}} max_{i ∈ [n]} f_i(x, y_i*(x))  subject to  y_i*(x) ∈ arg min_{y_i ∈ Y_i = R^{d_{y_i}}} g_i(x, y_i),  ∀ i ∈ [n].   (2)

Here, the optimization variable x is shared across all objectives f_i, g_i, i ∈ [n], while each variable y_i, i ∈ [n], is only involved in its corresponding objectives f_i, g_i. The goal is to find a robust solution x ∈ X such that the worst case across all objectives is minimized. This generic problem reduces to equation 1 when there is a single objective pair, that is, n = 1.
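To make the structure of problem (2) concrete, the following is a minimal toy instance in Python. Here the LL objectives are g_i(x, y) = 0.5·||y − A_i x||², so each y_i*(x) = A_i x is available in closed form, and the UL objectives are f_i(x, y) = 0.5·||y − b_i||². The matrices A_i, vectors b_i, and dimensions are illustrative choices, not from the paper.

```python
import numpy as np

# Hypothetical toy instance of problem (2) with n quadratic objective pairs:
#   g_i(x, y) = 0.5*||y - A_i x||^2  (strongly convex in y, so y_i*(x) = A_i x)
#   f_i(x, y) = 0.5*||y - b_i||^2
rng = np.random.default_rng(0)
n, dx, dy = 3, 4, 5
A = [rng.standard_normal((dy, dx)) for _ in range(n)]
b = [rng.standard_normal(dy) for _ in range(n)]

def lower_level_solution(i, x):
    # arg min_y g_i(x, y), in closed form for this toy g_i
    return A[i] @ x

def upper_level_value(i, x):
    # f_i evaluated at the lower-level solution y_i*(x)
    y_star = lower_level_solution(i, x)
    return 0.5 * np.sum((y_star - b[i]) ** 2)

x = rng.standard_normal(dx)
values = [upper_level_value(i, x) for i in range(n)]
worst_case = max(values)  # problem (2) minimizes this over x in X
```

The min-max formulation targets `worst_case = max_i f_i(x, y_i*(x))` rather than the average of the `values`, which is what distinguishes it from a sum-of-objectives formulation.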
Such a robust optimization problem is useful in various applications, and is especially necessary in safety-critical ones. For example, in decision optimization, the different objective pairs (f_i, g_i) can correspond to different "scenarios" (such as plans for different scenarios), with x the shared decision variable and the y_i's scenario-specific decision variables. The goal of equation 2 is then to find the shared decision x that provides robust performance across all n considered scenarios, so that this robust assignment of decision variables generalizes well to other scenarios. In machine learning, robust representation learning is important in object recognition and facial recognition, where we desire robust worst-case performance across different groups of objects or different population demographics. In RL applications with multiple agents (Busoniu et al., 2006; Li et al., 2019; Gronauer & Diepold, 2022), our robust formulation in equation 2 would generate a shared model of the world (the UL variable x) such that the worst-case utility, max_i f_i(x, y_i*(x)), of the agent-specific optimal action (the LL variable y_i*(x)) is optimized, ensuring robust performance across all agents.

An additional technical advantage of the general multi-objective problem in equation 2 is that it allows the objective-specific variables y_i ∈ Y_i to come from different domains, that is, Y_i ≠ Y_j for i, j ∈ [n]; as stated in equation 2, this implies that the dimensionality d_{y_i} of the per-objective y_i need not be the same across objectives. This allows for a larger class of problems where each objective can have a different number of objective-specific variables while we still require a robust shared variable x. For example, in multi-agent RL, different agents can have different action spaces because they need to operate in different mediums (land, water, air, etc.).
Focusing on stochastic objectives common in ML, the main contributions of this work are as follows:
▶ (New algorithm design) We present a single-loop Multi-Objective Robust Bilevel Two-timescale optimization algorithm, MORBiT, which uses (i) SGD for the unconstrained strongly convex LL problem, and (ii) projected SGD for the constrained weakly convex UL problem.
▶ (Theoretical convergence guarantees) We demonstrate that, under standard smoothness and regularity conditions, MORBiT with n objectives converges to an O(n^{1/2} K^{-2/5})-stationary point in K iterations, matching the best convergence rate for single-loop single-objective (n = 1) BLO algorithms with a constrained UL problem and vanilla SGD for the LL problem, while providing a sublinear n^{1/2} dependence on the number of objective pairs n.
▶ (Two sets of applications) We present two applications involving min-max multi-objective bilevel problems, robust representation learning and robust hyperparameter optimization (HPO), and demonstrate the effectiveness of our proposed algorithm MORBiT.

Paper Outline. In section 2, we further discuss the different aspects of the problem in equation 2 and compare it to the problems and solutions considered in the existing literature. We present our novel algorithm, MORBiT, and analyse its convergence properties in section 3, and empirically evaluate it in section 4. We conclude with future directions in section 5.
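The single-loop structure described above (one LL step, one max-player step, one projected UL step per iteration) can be sketched on the toy quadratic instance below. This is a schematic sketch in the spirit of MORBiT, not the paper's exact algorithm: the simplex weights lam are a standard way to handle the max over objectives, all constants (A_i, b_i, the ball radius R, step sizes) are illustrative, and the implicit part of the hypergradient is trivial here because y_i*(x) = A_i x is linear in x.

```python
import numpy as np

# Toy instance: g_i(x, y) = 0.5*||y - A_i x||^2, f_i(x, y) = 0.5*||y - b_i||^2,
# with X = Euclidean ball of radius R (a simple constrained UL domain).
rng = np.random.default_rng(1)
n, dx, dy, R = 3, 4, 5, 2.0
A = [rng.standard_normal((dy, dx)) for _ in range(n)]
b = [rng.standard_normal(dy) for _ in range(n)]

def project_ball(x, radius):
    # Euclidean projection onto the ball of given radius
    nrm = np.linalg.norm(x)
    return x if nrm <= radius else x * (radius / nrm)

def project_simplex(v):
    # Euclidean projection onto the probability simplex (sort-based)
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1.0) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1.0)
    return np.maximum(v + theta, 0.0)

x = np.zeros(dx)
ys = [np.zeros(dy) for _ in range(n)]
lam = np.full(n, 1.0 / n)            # weights over the n objectives
alpha, beta, gamma = 0.05, 0.2, 0.1  # UL, LL, and simplex step sizes

for _ in range(500):
    # (i) one gradient step per lower-level problem (SGD in the stochastic case)
    for i in range(n):
        ys[i] = ys[i] - beta * (ys[i] - A[i] @ x)
    # (ii) projected ascent on the simplex weights (the max player)
    f_vals = np.array([0.5 * np.sum((ys[i] - b[i]) ** 2) for i in range(n)])
    lam = project_simplex(lam + gamma * f_vals)
    # (iii) projected gradient step on the shared upper-level variable,
    #       using the current ys[i] as a surrogate for y_i*(x)
    grad_x = sum(lam[i] * (A[i].T @ (ys[i] - b[i])) for i in range(n))
    x = project_ball(x - alpha * grad_x, R)
```

Note that the weights lam concentrate on the currently worst objectives, which is how the weighted sum tracks max_i f_i as the iterations proceed.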

2. PROBLEM AND RELATED WORK

We first discuss the different aspects of the robust multi-objective BLO problem with a constrained UL in equation 2. While BLO is used in machine learning (Liu et al., 2021; Chen et al., 2022a), multi-objective BLO has not received much attention. In multi-task learning (MTL), the optimization problem is multi-objective in nature, but it is usually solved by summing the objectives and using a single-objective solver, that is, by optimizing the objective ∑_i f_i. Robust min-max extensions of MTL (Mehta et al., 2012; Collins et al., 2020) and RL (Li et al., 2019) have been shown to improve generalization performance, supporting the need for a more complex multi-objective optimization problem that replaces the objective ∑_i f_i with the objective max_i f_i.

For SGD-based solutions to stochastic BLO, one critical aspect is whether the algorithm is single-loop (a single update for both x and y in each iteration) or double-loop (multiple updates of the LL variable y between consecutive updates of the UL variable x). Double-loop algorithms can have faster empirical convergence, but they are more computationally intensive, and their performance is extremely sensitive to the step sizes and termination criterion of the LL updates. Double-loop algorithms are also not applicable when the (stochastic) gradients of the LL and UL problems are only provided sequentially, such as in logistics, motion planning and RL problems. Hence, we develop and analyse a single-loop algorithm.

A final aspect of BLO is the constrained UL problem. When the UL variable x corresponds to a decision variable in a decision optimization problem or a hyperparameter in HPO, we must consider a constrained form, x ∈ X ⊂ R^{d_x}. To capture a more general form of the bilevel problem, we focus on the constrained UL setup. In the remainder of this section, we review existing literature on single-objective and multi-objective BLO and robust optimization, especially in the context of machine learning.
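The single-loop vs. double-loop distinction above can be made concrete as a control-flow sketch. The toy bilevel instance and the placeholder update rules below are purely illustrative: g(x, y) = 0.5·(y − x)², so y*(x) = x, and f(x, y) = 0.5·(y − 1)², whose bilevel solution is x = 1.

```python
# Illustrative one-dimensional toy bilevel problem (not the paper's setup):
#   g(x, y) = 0.5*(y - x)^2  =>  y*(x) = x
#   f(x, y) = 0.5*(y - 1)^2  =>  bilevel solution x = 1

def ll_step(x, y):
    return y - 0.5 * (y - x)    # one gradient step on g in y

def ul_step(x, y):
    return x - 0.1 * (y - 1.0)  # one gradient step on f, using y as y*(x)

def double_loop(x, y, K, T):
    # T inner LL updates between every UL update
    for _ in range(K):
        for _ in range(T):
            y = ll_step(x, y)
        x = ul_step(x, y)
    return x, y

def single_loop(x, y, K):
    # exactly one LL update and one UL update per iteration
    for _ in range(K):
        y = ll_step(x, y)
        x = ul_step(x, y)
    return x, y

xd, yd = double_loop(0.0, 5.0, K=50, T=10)
xs, ys = single_loop(0.0, 5.0, K=200)
# both drive x toward the bilevel solution x = 1
```

The double loop spends T gradient evaluations on the LL problem per UL update (and needs a rule for choosing T), while the single loop interleaves the two updates at different step sizes, which is the structure MORBiT follows.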
Table 1 provides a snapshot of the properties of the problems and algorithms (with rigorous convergence analysis) studied in recent machine learning literature.

