PERSONALIZED DECENTRALIZED BILEVEL OPTIMIZATION OVER STOCHASTIC AND DIRECTED NETWORKS

Abstract

While personalization in distributed learning has been extensively studied, existing approaches employ dedicated algorithms to optimize their specific type of parameters (e.g., client clusters or model interpolation weights), making it difficult to simultaneously optimize different types of parameters for better performance. Moreover, their algorithms require centralized or static undirected communication networks, which can be vulnerable to center-point failures or deadlocks. This study proposes optimizing various types of parameters with a single algorithm that runs in more practical communication environments. First, we propose a gradient-based bilevel optimization that reduces most personalization approaches to the optimization of client-wise hyperparameters. Second, we propose a decentralized algorithm that estimates gradients with respect to the hyperparameters and can run even on stochastic and directed communication networks. Our empirical results demonstrate that the gradient-based bilevel optimization enables combining existing personalization approaches, achieving state-of-the-art performance, and confirm that the algorithm operates on multiple simulated communication environments, including a stochastic and directed network.

1. INTRODUCTION

In distributed learning, providing personally tuned models to clients, or personalization, has been shown to be effective when the clients' data are heterogeneously distributed (Tan et al., 2022). While various approaches have been proposed, each is dedicated to optimizing a specific type of parameter for personalization. A typical example is clustering-based personalization (Sattler et al., 2020), which employs similarity-based clustering specifically for seeking client clusters. Another approach, model interpolation (Mansour et al., 2020; Deng et al., 2020), likewise specializes in optimizing interpolation weights between local and global models. These dedicated algorithms prevent developers from combining different personalization methods to achieve better performance.

Another limitation of previous personalization algorithms is that they can run only on centralized or static undirected networks. Most approaches for federated learning (Smith et al., 2017; Sattler et al., 2020; Jiang et al., 2019) require centralized settings in which a host server can communicate with any client. Although a few studies (Lu et al., 2022; Marfoq et al., 2021) consider fully-decentralized settings, they assume that the communication edge between any pair of clients is static and undirected (i.e., synchronized). These communication networks are known to be vulnerable to practical issues, such as bottlenecks or central-point failures on the host servers (Assran et al., 2019), or failing nodes and deadlocks on static undirected networks (Tsianos et al., 2012).

This study proposes optimizing various parameters for personalization using a single algorithm while allowing more practical communication environments. First, we propose a gradient-based Personalized Decentralized Bilevel Optimization (PDBO), which reduces many personalization approaches to the optimization of hyperparameters possessed by each client.
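To make the bilevel reduction concrete, the following toy sketch (not the paper's algorithm; all variable names and values are illustrative) casts model interpolation as the optimization of a single client-wise hyperparameter: the interpolation weight between a local and a global linear model, updated by gradient descent on the client's validation loss.

```python
# Toy sketch: personalization as optimization of a client-wise hyperparameter.
# The hyperparameter is an interpolation weight lam in [0, 1] between a local
# and a global model; it is tuned by gradient steps on the validation loss.
import numpy as np

rng = np.random.default_rng(0)

w_local = np.array([2.0, -1.0])    # model fitted on the client's own data
w_global = np.array([0.5, 0.5])    # model averaged over all clients
X_val = rng.normal(size=(20, 2))   # client's held-out validation data
y_val = X_val @ np.array([1.5, -0.5])

def val_loss(lam):
    w = lam * w_local + (1.0 - lam) * w_global  # interpolated model
    r = X_val @ w - y_val
    return 0.5 * np.mean(r ** 2)

def val_grad(lam):
    # Chain rule: d(loss)/d(lam) uses dw/dlam = w_local - w_global.
    w = lam * w_local + (1.0 - lam) * w_global
    r = X_val @ w - y_val
    return np.mean(r * (X_val @ (w_local - w_global)))

lam = 0.5                            # start halfway between the two models
for _ in range(200):                 # outer (hyperparameter) loop
    lam -= 0.05 * val_grad(lam)      # gradient step on the hyperparameter
    lam = min(max(lam, 0.0), 1.0)    # keep the weight in [0, 1]
```

In a realistic setting the inner problem (training the local and global models) is itself an optimization, so the gradient with respect to the hyperparameter must be estimated rather than written in closed form; this sketch only illustrates why a single gradient-based outer loop can subsume different personalization schemes.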
Second, we propose Hyper-gradient Push (HGP), which allows any client to solve PDBO by estimating the gradient with respect to its hyperparameters (the hyper-gradient) via stochastic and directed communication, which is immune to the practical problems of centralized or static undirected communication (Assran et al., 2019). We also introduce a variance-reduced HGP to reduce estimation variance, which is particularly effective when communication is stochastic, and provide its theoretical error bound.
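The communication primitive that makes gossip feasible on directed graphs is push-sum style averaging, where each node tracks a correction weight alongside its value so that the ratio converges to the network-wide average even without symmetric links. The sketch below is a generic push-sum averaging loop, not HGP itself; the graph and values are made up for illustration.

```python
# Generic push-sum averaging on a directed, strongly connected graph.
# Each node keeps a value x and a correction weight y; both are pushed in
# equal shares along out-edges. The ratio x / y converges to the average of
# the initial values even though the mixing matrix is only column-stochastic.
import numpy as np

out_neighbors = {0: [1], 1: [2], 2: [0, 1]}  # directed edges i -> j
x = np.array([1.0, 5.0, 9.0])  # each node's local value (average is 5.0)
y = np.ones(3)                 # push-sum correction weights

for _ in range(100):
    new_x, new_y = np.zeros(3), np.zeros(3)
    for i, outs in out_neighbors.items():
        dests = [i] + outs                  # keep a share, push the rest out
        share = 1.0 / len(dests)
        for j in dests:
            new_x[j] += share * x[i]
            new_y[j] += share * y[i]
    x, y = new_x, new_y

estimates = x / y   # every node's estimate of the network-wide average
```

Because shares are split over out-edges only, no node needs to know its in-degree or wait for a synchronized reply, which is what makes this primitive robust to directed and time-varying (stochastic) links.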

