FIND YOUR FRIENDS: PERSONALIZED FEDERATED LEARNING WITH THE RIGHT COLLABORATORS

Abstract

In the traditional federated learning setting, a central server coordinates a network of clients to train one global model. However, the global model may serve many clients poorly due to data heterogeneity. Moreover, there may not exist a trusted central party that can coordinate the clients to ensure that each of them benefits from the collaboration. To address these concerns, we present a novel decentralized framework, FedeRiCo, where each client learns as much or as little from other clients as is optimal for its local data distribution. Based on expectation-maximization, FedeRiCo estimates the utility of other participants' models on each client's data so that every client can select the right collaborators for learning. As a result, our algorithm outperforms other federated, personalized, and/or decentralized approaches on several benchmark datasets, and is the only approach that consistently outperforms training on local data alone.

1. INTRODUCTION

Federated learning (FL) (McMahan et al., 2017) offers a framework in which a single server-side model is collaboratively trained across decentralized datasets held by clients. It has been successfully deployed in practice for developing machine learning models without direct access to user data, which is essential in highly regulated industries such as banking and healthcare (Long et al., 2020; Sadilek et al., 2021). For example, several hospitals that each collect patient data may want to merge their datasets for increased diversity and dataset size but are prohibited from doing so by privacy regulations.

Traditional FL methods like Federated Averaging (FedAvg) (McMahan et al., 2017) can achieve noticeable improvement over local training when the participating clients' data are homogeneous. In practice, however, each client's data is likely to follow a different distribution from the others (Zhao et al., 2018; Adnan et al., 2022). Such differences make it much more challenging to learn a global model that works well for all participants. As an illustrative example, consider a simple scenario where each client seeks to fit a linear model to limited data on an interval of the sine curve, as shown in Fig. 1. This is analogous to the FL setting where several participating clients would like to collaborate, but each client only has access to data from its own distribution. Clearly, no single linear model can adequately describe the entire joint dataset, so a global model learned by FedAvg can perform poorly, as shown by the dotted line. Ideally, each client should benefit from collaboration by increasing the effective size and diversity of its data, but in practice, forcing everyone to use the same global model without proper personalization can hurt performance on their own data distributions (Kulkarni et al., 2020; Tan et al., 2022).

To address this, we propose Federating with the Right Collaborators (FedeRiCo), a novel framework that enables every client to find other participants with similar data distributions to collaborate with. Returning to our illustration in Fig. 1, FedeRiCo enables each client to choose the right collaborators, as shown in the plots on the right-hand side: each client correctly leverages information from neighboring clients when it is beneficial to do so. The resulting personalized models serve the local distributions well, as demonstrated in the left plot. More specifically, FedeRiCo assumes that each client has an underlying data distribution and exploits the hidden relationships among the clients' data. By selecting the most relevant clients, each client can collaborate as much or as little as it needs and learn a personalized mixture model that fits its local data. Additionally, FedeRiCo achieves this in a fully decentralized manner that is not beholden to any central authority (Li et al., 2021a; Huang et al., 2021; Kalra et al., 2021).
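The failure mode in this example is easy to reproduce. The following is a minimal sketch of the toy setup, where the client count, interval boundaries, and noise level are our own illustrative assumptions rather than the exact values behind Fig. 1, and a single least-squares fit on the pooled data stands in for the FedAvg solution:

```python
# Toy illustration of the sine-curve example (a minimal sketch; the exact
# setup in Fig. 1 may differ).
import numpy as np

rng = np.random.default_rng(0)
num_clients, n_per_client, noise = 8, 20, 0.05

# Each client draws noisy samples from its own interval of the sine curve.
intervals = np.linspace(0, 2 * np.pi, num_clients + 1)
client_data = []
for c in range(num_clients):
    x = rng.uniform(intervals[c], intervals[c + 1], n_per_client)
    y = np.sin(x) + noise * rng.standard_normal(n_per_client)
    client_data.append((x, y))

def fit_linear(x, y):
    """Least-squares fit of y ~ a*x + b."""
    A = np.stack([x, np.ones_like(x)], axis=1)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def mse(coef, x, y):
    return np.mean((coef[0] * x + coef[1] - y) ** 2)

# One global linear model on the pooled data (the FedAvg-like baseline)
# versus one local linear model per client.
x_all = np.concatenate([x for x, _ in client_data])
y_all = np.concatenate([y for _, y in client_data])
global_coef = fit_linear(x_all, y_all)

for c, (x, y) in enumerate(client_data):
    local_coef = fit_linear(x, y)
    print(f"client {c}: local MSE {mse(local_coef, x, y):.4f}, "
          f"global MSE {mse(global_coef, x, y):.4f}")
```

Running this sketch shows the pooled model incurring a far larger error on each client's interval than that client's own fit, which is precisely the gap that collaborating with the right partners, rather than with everyone, aims to close.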
Our contributions We propose FedeRiCo, a novel decentralized and personalized FL framework derived from expectation-maximization (EM). Within this framework, we propose a communication-efficient protocol suitable for fully decentralized learning. Through extensive experiments on several benchmark datasets, we demonstrate that our approach finds good client collaborations and outperforms other methods under non-i.i.d. data distributions.

Paper outline The rest of the paper is organized as follows. In Section 2 we discuss related approaches to decentralized federated learning and personalization. Section 3 describes our algorithm, its relationship to expectation-maximization, and an efficient protocol for updating clients. We provide experimental results in Section 4 and conclude in Section 5.
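Before the formal treatment in Section 3, the snippet below gives a rough sense of the collaborator-selection mechanism described above: each client scores every participant's model on its own data and converts those scores into mixture weights with an E-step-style update. The function name and the exact form of the update are our own schematic choices, not FedeRiCo's precise update rule.

```python
# Schematic of EM-style collaborator weighting (a sketch only; see Section 3
# for FedeRiCo's actual update). All names here are hypothetical.
import numpy as np

def collaborator_weights(log_liks, prior):
    """E-step-style responsibilities.

    log_liks: shape (num_clients,), log-likelihood of each participant's
        model evaluated on this client's local data.
    prior: shape (num_clients,), current mixture weights over participants.
    """
    logits = np.log(prior) + log_liks
    logits -= logits.max()          # subtract max for numerical stability
    w = np.exp(logits)
    return w / w.sum()

# Example: models 0 and 1 fit this client's data well, model 2 does not,
# so the weight concentrates on the first two participants.
log_liks = np.array([-0.4, -0.5, -3.0])
prior = np.ones(3) / 3
print(collaborator_weights(log_liks, prior))
```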

2. RELATED WORK FOR PERSONALIZED FL

Meta-learning Federated learning can be interpreted as a meta-learning problem, where the goal is to extract a global meta-model based on data from several clients. This meta-model can be learned using, for instance, the well-known Federated Averaging (FedAvg) algorithm (McMahan et al., 2017) , and personalization can then be achieved by locally fine-tuning the meta-model (Jiang et al., 2019) . Later studies explored methods to learn improved meta-models. 
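As a concrete (if simplified) picture of this fine-tuning recipe, the sketch below personalizes a shared meta-model with a few local SGD steps; the linear model, squared loss, and all names are our own stand-ins for a client's actual network and optimizer.

```python
# Minimal sketch of meta-model personalization by local fine-tuning
# (in the spirit of Jiang et al., 2019); hypothetical setup, not the
# cited papers' exact procedure.
import numpy as np

def fine_tune(global_w, x, y, lr=0.1, steps=5):
    """Run a few local SGD steps on squared loss, starting from global_w."""
    w = global_w.copy()
    for _ in range(steps):
        grad = 2 * x.T @ (x @ w - y) / len(y)   # gradient of mean (xw - y)^2
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 3))
y = x @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=32)
w_personalized = fine_tune(np.zeros(3), x, y)  # client-specific model
```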



Figure 1: Left: Noisy data points generated for each client along a sine curve (solid magenta line), where the x-axis and y-axis correspond to input and output, respectively. The corresponding model learned by FedAvg (dotted line) fails to adapt to the local data seen by each client, in contrast to the models learned by each client using our FedeRiCo (dashed lines). Right: The weights used by FedeRiCo to average participant outputs for each client. As the client index increases, the data is generated from successive intervals of the sine curve, and the collaborator weights change accordingly.

Khodak et al. (2019) proposed ARUBA, a meta-learning algorithm based on online convex optimization, and demonstrated that it can improve upon FedAvg's performance. Per-FedAvg (Fallah et al., 2020) uses the Model-Agnostic Meta-Learning (MAML) framework to build the initial meta-model. However, MAML requires computing or approximating a Hessian term and can therefore be computationally prohibitive. Acar et al. (2021) adopted gradient-correction methods to explicitly de-bias the meta-model from the statistical heterogeneity of client data and achieved sample-efficient customization of the meta-model.

Model regularization / interpolation Several works improve personalization performance by regularizing the divergence between the global and local models (Hanzely & Richtárik, 2020; Li et al., 2021b; Huang et al., 2021). Similarly, pFedMe (T Dinh et al., 2020) formulates personalization as a proximal regularization problem using Moreau envelopes. FML (Shen et al., 2020) adopts knowledge distillation to regularize the predictions between local and global models and to handle model heterogeneity. In recent work, SFL (Chen et al., 2022) also formulates personalization as a bi-level optimization problem with an additional regularization term on the distance between each local model and its neighbors' models according to a connection graph. Specifically, SFL adopts a graph convolutional network (GCN) to represent the connection graph and learns the graph as part of the optimization to encourage useful client collaborations. Introduced by Mansour et al. (2020) as one of three methods for achieving personalization in FL, model interpolation involves mixing a client's local model with a jointly trained global model.
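To make the two ideas in this subsection concrete, a proximal-regularization objective in the style of pFedMe and an interpolation rule in the style of Mansour et al. (2020) can be written as follows; the notation here is ours and schematic, not lifted from the cited papers:

```latex
% Proximal regularization (in the style of pFedMe, T Dinh et al., 2020):
% each client's personalized model \theta_i is kept close to a shared model w.
F_i(w) \;=\; \min_{\theta_i} \Big\{ f_i(\theta_i)
        \;+\; \tfrac{\lambda}{2}\,\lVert \theta_i - w \rVert^2 \Big\},
\qquad
\min_{w}\; \tfrac{1}{N} \textstyle\sum_{i=1}^{N} F_i(w)

% Model interpolation (in the style of Mansour et al., 2020):
% client i mixes its local model h_i with the jointly trained global model h_g.
\hat{h}_i \;=\; \alpha_i\, h_i \;+\; (1 - \alpha_i)\, h_g,
\qquad \alpha_i \in [0, 1]
```

Here $f_i$ is client $i$'s local loss, and $\lambda$ (respectively $\alpha_i$) controls how strongly the personalized model is pulled toward, or mixed with, the jointly trained one.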

