PERSONALIZED FEDERATED LEARNING WITH FIRST ORDER MODEL OPTIMIZATION

Abstract

While federated learning traditionally aims to train a single global model across decentralized local datasets, one model may not always be ideal for all participating clients. Here we propose an alternative, where each client federates only with other relevant clients to obtain a stronger model for its client-specific objectives. To achieve this personalization, rather than computing a single model average with constant weights for the entire federation as in traditional FL, we efficiently calculate optimal weighted model combinations for each client, based on estimating how much a client can benefit from another's model. We do not assume knowledge of any underlying data distributions or client similarities, and we allow each client to optimize for arbitrary target distributions of interest, enabling greater flexibility for personalization. We evaluate and characterize our method on a variety of federated settings, datasets, and degrees of local data heterogeneity. Our method outperforms existing alternatives, while also enabling new features for personalized FL, such as transfer outside of local data distributions.

1. INTRODUCTION

Federated learning (FL) has shown great promise in recent years for training a single global model over decentralized data. While FL was seminally motivated by effective inference on a general test set similar in distribution to the decentralized data in aggregate (McMahan et al., 2016; Bonawitz et al., 2019), here we focus on federated learning from a client-centric or personalized perspective. We aim to enable stronger performance on personalized target distributions for each participating client. Such settings can be motivated by cross-silo FL, where clients are autonomous data vendors (e.g. hospitals managing patient data, or corporations carrying customer information) that wish to collaborate without sharing private data (Kairouz et al., 2019). Instead of merely being a source of data and model training for the global server, clients can then take on a more active role: their federated participation may be contingent on satisfying client-specific target tasks and distributions. A strong FL framework in practice would then flexibly accommodate these objectives, allowing clients to optimize for arbitrary distributions simultaneously in a single federation.

In this setting, FL's realistic lack of an independent and identically distributed (IID) data assumption across clients may be both a burden and a blessing. Learning a single global model across non-IID data batches can pose challenges such as non-guaranteed convergence and model parameter divergence (Hsieh et al., 2019; Zhao et al., 2018; Li et al., 2020), and fine-tuning these global models may result in poor adaptation to local client test sets (Jiang et al., 2019). However, the non-IID nature of each client's local data can also provide useful signal for distinguishing their underlying local data distributions, without sharing any data. We leverage this signal to propose a new framework for personalized FL. Instead of giving all clients the same global model average weighted by local training set size as in prior work (McMahan et al., 2016), for each client we compute a weighted combination of the available models, with weights optimized for how much each model benefits that client's target distribution.
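
To make the contrast concrete, the following is a minimal sketch comparing standard data-size-weighted averaging (as in FedAvg) with per-client weighted model combinations. It assumes models are represented as dictionaries of NumPy parameter arrays; the function names and the per-client weight matrix are illustrative assumptions for exposition, not the paper's actual implementation.

    import numpy as np

    def fedavg(models, num_samples):
        """Standard FL: one global model, each client's contribution
        weighted by its local training set size."""
        total = sum(num_samples)
        weights = [n / total for n in num_samples]
        return {
            name: sum(w * m[name] for w, m in zip(weights, models))
            for name in models[0]
        }

    def personalized_combination(models, client_weights):
        """Personalized FL: each client receives its own weighted
        combination of the available models.

        client_weights[i][j] reflects how much client i is estimated to
        benefit from client j's model (e.g. measured on client i's target
        distribution); rows need not be uniform across clients or
        proportional to data size.
        """
        return [
            {
                name: sum(w * m[name] for w, m in zip(row, models))
                for name in models[0]
            }
            for row in client_weights
        ]

    # Example: three clients, each weight row favoring the models
    # estimated to be most useful for that client's target distribution.
    models = [{"w": np.ones(2) * i} for i in range(3)]
    global_model = fedavg(models, num_samples=[100, 200, 100])
    personal_models = personalized_combination(
        models,
        [[0.7, 0.3, 0.0], [0.2, 0.6, 0.2], [0.0, 0.4, 0.6]],
    )

Under FedAvg every client receives the same global parameters, whereas the per-client combination yields a distinct model per client whose mixing weights can be tuned to that client's objective.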

