PROMETHEUS: ENDOWING LOW SAMPLE AND COMMUNICATION COMPLEXITIES TO CONSTRAINED DECENTRALIZED STOCHASTIC BILEVEL LEARNING

Abstract

In recent years, constrained decentralized stochastic bilevel optimization has become increasingly important due to its versatility in modeling a wide range of multi-agent learning problems, such as multi-agent reinforcement learning and multi-agent meta-learning with safety constraints. However, one under-explored and fundamental challenge in constrained decentralized stochastic bilevel optimization is how to achieve low sample and communication complexities, which, if not addressed appropriately, could affect the long-term prospects of many emerging multi-agent learning paradigms that use decentralized bilevel optimization as a bedrock. In this paper, we investigate a class of constrained decentralized bilevel optimization problems, where multiple agents collectively solve a nonconvex-strongly-convex bilevel problem with constraints on the upper-level variables. Such problems arise naturally in many multi-agent reinforcement learning and meta-learning problems. We propose an algorithm called Prometheus (proximal tracked stochastic recursive estimator) that achieves the first O(ε^{-1}) results in both sample and communication complexities for constrained decentralized bilevel optimization, where ε > 0 is a desired stationarity error. Collectively, the results in this work contribute to a theoretical foundation for low sample- and communication-complexity constrained decentralized bilevel learning.

1. INTRODUCTION

In recent years, the problem of constrained decentralized bilevel optimization has attracted increasing attention due to its foundational role in many emerging multi-agent learning paradigms with safety or regularization constraints. Such applications include, but are not limited to, safety-constrained multi-agent reinforcement learning for autonomous driving (Bennajeh et al., 2019), sparsity-regularized multi-agent meta-learning (Poon & Peyré, 2021), and rank-constrained decentralized matrix completion for recommender systems (Pochmann & Von Zuben, 2022). As its name suggests, a defining feature of constrained decentralized bilevel optimization is that it is "decentralized," meaning that the problem must be solved over a network without any coordination from a centralized server. As a result, all agents must rely on communication to reach a consensus on an optimal solution. Due to potentially unreliable network connections and the limited computational capability of each agent, such network-consensus approaches for constrained decentralized bilevel optimization typically call for low sample and communication complexities. To date, however, none of the existing works on sample- and communication-efficient decentralized bilevel optimization in the literature has considered domain constraints (e.g., Gao et al. (2022); Yang et al. (2022); Lu et al. (2022); Chen et al. (2022b); see Section 2 for detailed discussions). In light of the growing importance of constrained decentralized bilevel optimization, our goal in this paper is to fill this gap by developing sample- and communication-efficient consensus-based algorithms that can effectively handle domain constraints.
Specifically, this paper focuses on a class of constrained decentralized multi-task bilevel optimization problems, where we aim to solve a decentralized nonconvex-strongly-convex bilevel optimization problem with i) multiple lower-level problems and ii) consensus and domain constraints on the upper level. Such problems naturally arise in security-constrained bilevel models for integrated natural gas and electricity systems (Li et al., 2017), multi-agent actor-critic reinforcement learning (Zhang et al., 2020), and constrained meta-learning (Liu et al., 2019). In the optimization literature, a natural approach for handling domain constraints is the proximal operator. However, as will be shown later, proximal algorithm design and theoretical analysis for constrained decentralized bilevel optimization are much more complicated than for their unconstrained counterparts, and existing results are very limited. In fact, the proximal operator for constrained bilevel optimization has been under-explored even in the single-agent setting, let alone in the more complex multi-agent settings. The most related works in terms of handling domain constraints are (Hong et al., 2020; Chen et al., 2022a; Ghadimi & Wang, 2018), which rely on direct projected (stochastic) gradient descent to solve the constrained bilevel problem. In contrast, our work considers general domain constraints that require evaluation of proximal operators in each iteration. Moreover, these works only considered the single-agent setting, and hence their techniques are not implementable over networks. Indeed, prior to this work, it was unclear how to design proximal algorithms that handle domain constraints for decentralized bilevel optimization.
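To make the proximal machinery concrete, the following is a minimal sketch of a single proximal gradient step with an ℓ1 regularizer, whose proximal operator is the well-known elementwise soft-thresholding map. This is only an illustrative single-agent example; the function names, step size, and toy objective are ours and are not part of the Prometheus algorithm itself:

```python
import numpy as np

def prox_l1(v, lam):
    """Proximal operator of lam * ||.||_1: elementwise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def proximal_gradient_step(x, grad, step, lam):
    """One proximal gradient step: x+ = prox_{step*lam*||.||_1}(x - step*grad)."""
    return prox_l1(x - step * grad, step * lam)

# Toy example: minimize 0.5*||x - b||^2 + lam*||x||_1 by iterating the step.
b = np.array([1.0, -0.2, 0.05])
lam, step = 0.1, 0.5
x = np.zeros_like(b)
for _ in range(200):
    x = proximal_gradient_step(x, x - b, step, lam)  # grad of 0.5||x-b||^2 is x-b
# x converges to the soft-threshold of b: [0.9, -0.1, 0.0]
```

Note how the prox step both enforces the (regularization-induced) structure and replaces the simple projection used in the projected-gradient works cited above; for a general constraint set, `prox_l1` would be replaced by the proximal operator of the corresponding regularizer or indicator function.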
Moreover, it is worth noting that existing methods for hyper-gradient approximation in both single- and multi-agent bilevel optimization are based on first-order Taylor-type approximations (Khanduri et al., 2021; Ghadimi & Wang, 2018; Hong et al., 2020), implicit differentiation (Ghadimi & Wang, 2018; Gould et al., 2016; Ji et al., 2021), or iterative differentiation (Franceschi et al., 2017; Maclaurin et al., 2015; Ji et al., 2021), all of which suffer from high communication and sample complexities that are problematic in decentralized settings over networks. The main contribution of this paper is a series of new proximal-type algorithmic techniques that overcome the challenges mentioned above and achieve low sample and communication complexities for constrained decentralized bilevel optimization. The main technical contributions of this work are summarized below:

• We propose a decentralized optimization approach called Prometheus (proximal tracked stochastic recursive estimator), a carefully designed hybrid algorithm that integrates proximal operations, recursive variance reduction, lower-level gradient tracking, and upper-level consensus techniques. We show that, to reach an ε-stationary point, Prometheus enjoys a convergence rate of O(1/T), where T is the maximum number of iterations. This implies an O(ε^{-1}) communication complexity and an O(√nK ε^{-1} + n) sample complexity per agent.

• We propose a new hyper-gradient estimator for the upper-level function, which yields a far more accurate stochastic estimate than the conventional stochastic estimators used in (Khanduri et al., 2021; Ghadimi & Wang, 2018; Hong et al., 2020; Liu et al., 2022). We show that our new hyper-gradient stochastic estimator has a smaller variance and outperforms existing estimators both theoretically and experimentally. We note that our proposed estimator could be of independent interest for other bilevel optimization problems.
• We reveal an interesting insight that the variance reduction in Prometheus is not only sufficient but also necessary in the following sense: a "non-variance-reduced" special version of Prometheus can only achieve a much slower O(1/√T) convergence to a constant-size error ball, rather than an ε-stationary point with arbitrarily small tolerance ε. This insight advances our understanding of, and the state of the art in, algorithm design for constrained decentralized bilevel optimization.

The rest of the paper is organized as follows. In Section 2, we review the related literature. In Section 3, we provide preliminaries on the decentralized bilevel optimization problem. In Section 4, we present our proposed Prometheus algorithm, together with its convergence rate, communication complexity, and sample complexity results. Section 5 provides numerical results to verify our theoretical findings, and Section 6 concludes the paper.
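To give intuition for the recursive variance reduction referenced in the contributions above, the sketch below shows a generic STORM-style recursive gradient estimator (in the spirit of Cutkosky & Orabona, 2019) on a toy quadratic objective. The key point is that when the current and previous gradients are evaluated on the same sample, the shared noise largely cancels, so the residual estimation error scales with the momentum parameter β rather than with the raw noise level. The toy objective, function names, and numbers are illustrative assumptions, not the Prometheus update itself:

```python
import numpy as np

def true_grad(x):
    """Exact gradient of the toy objective f(x) = 0.5 * ||x||^2."""
    return x

def storm_estimate(v_prev, x_curr, x_prev, beta, noise):
    """One recursive (STORM-style) update of the gradient estimate:
        v_t = g(x_t) + (1 - beta) * (v_{t-1} - g(x_{t-1})),
    where g(.) is a stochastic gradient and both evaluations share the
    same sample, modeled here by the shared `noise` term."""
    g_curr = true_grad(x_curr) + noise
    g_prev = true_grad(x_prev) + noise
    return g_curr + (1.0 - beta) * (v_prev - g_prev)

# If v_prev is exact at x_prev, the shared noise largely cancels:
x_prev, x_curr = np.array([1.0]), np.array([0.9])
v_prev = true_grad(x_prev)          # exact previous estimate
noise, beta = np.array([0.5]), 0.2
v = storm_estimate(v_prev, x_curr, x_prev, beta, noise)
# Estimation error |v - true_grad(x_curr)| is beta * noise = 0.1, versus
# 0.5 for a plain stochastic gradient evaluated with the same noise.
```

This β-scaled error contraction is what allows variance-reduced methods to reach an ε-stationary point, whereas a plain stochastic estimator, as noted in the last contribution above, stalls at a constant-size error ball.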

2. RELATED WORK

In this section, we first provide a quick overview of the state of the art in single-agent constrained bilevel optimization as well as decentralized bilevel optimization. 1) Constrained Bilevel Optimization in the Single-Agent Setting: As mentioned in Section 1, various techniques have been proposed to solve single-agent bilevel optimization, such as full-gradient-based techniques (e.g., AID-based methods (Rajeswaran et al., 2019; Franceschi et al., 2018; Ji et al., 2021), ITD-based methods (Pedregosa, 2016; Maclaurin et al., 2015; Ji et al., 2021)), stochastic gradient-based techniques (Ghadimi & Wang, 2018; Khanduri et al., 2021; Guo & Yang, 2021), STORM-based techniques (Cutkosky & Orabona, 2019), and VR-based techniques (Yang

