ADAPTIVE PERSONALIZED FEDERATED LEARNING

Abstract

Investigation of the degree of personalization in federated learning algorithms has shown that focusing solely on maximizing the performance of the global model confines the capacity of the local models to personalize. In this paper, we propose an adaptive personalized federated learning (APFL) algorithm, in which each client trains its local model while contributing to the global model. We derive the generalization bound of the mixture of local and global models, and find the optimal mixing parameter. We also propose a communication-efficient optimization method to collaboratively learn the personalized models and analyze its convergence in both smooth strongly convex and nonconvex settings. Extensive experiments demonstrate the effectiveness of our personalization scheme, as well as the correctness of the established generalization theory.

1. INTRODUCTION

With the massive amount of data generated by the proliferation of mobile devices and the internet of things (IoT), coupled with concerns over sharing private information, collaborative machine learning and the use of federated optimization (FO) are often crucial for the deployment of large-scale machine learning (McMahan et al., 2017; Kairouz et al., 2019; Li et al., 2020b). In FO, the ultimate goal is to learn a global model that achieves uniformly good performance over almost all participating clients without sharing raw data. To achieve this goal, most existing methods follow this procedure to learn a global model: (i) at each round, a subset of clients participating in the training is chosen and receives the current copy of the global model; (ii) each chosen client updates the local version of the global model using its own local data; (iii) the server aggregates the obtained local models to update the global model, and this process continues until convergence (McMahan et al., 2017; Mohri et al., 2019; Karimireddy et al., 2019; Pillutla et al., 2019). Most notably, FedAvg by McMahan et al. (2017) uses averaging as its aggregation method over local models. Due to inherent diversity among local data shards and the highly non-IID distribution of the data across clients, FedAvg is hugely sensitive to its hyperparameters and, as a result, does not benefit from a favorable convergence guarantee (Li et al., 2020c). In Karimireddy et al. (2019), the authors argue that if these hyperparameters are not carefully tuned, FedAvg will diverge, as local models may drift significantly from each other. Therefore, in the presence of statistical data heterogeneity, the global model might not generalize well on the local data of each client individually (Jiang et al., 2019).
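The three-step procedure above can be illustrated with a minimal sketch of one FedAvg round. This is not the paper's implementation; the helper names (`local_update`, `fedavg_round`) and the least-squares local objective are illustrative assumptions, chosen only to make the broadcast / local-update / weighted-averaging pattern concrete.

```python
# Minimal sketch of one FedAvg round (hypothetical helper names).
# Local objective here is an assumed least-squares loss, not the
# paper's setting; only the communication pattern is the point.
import numpy as np

def local_update(w, X, y, lr=0.1, steps=5):
    """Step (ii): a few gradient steps on the client's local data."""
    w = w.copy()
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def fedavg_round(w_global, client_data):
    """Steps (i) and (iii): broadcast the global model, then
    average the returned local models weighted by sample count."""
    local_models, sizes = [], []
    for X, y in client_data:
        local_models.append(local_update(w_global, X, y))
        sizes.append(len(y))
    weights = np.array(sizes, dtype=float) / sum(sizes)
    return sum(a * w for a, w in zip(weights, local_models))

# Toy IID setup: all clients share one ground-truth linear model.
rng = np.random.default_rng(0)
d = 3
w_true = rng.normal(size=d)
clients = [(X := rng.normal(size=(20, d)), X @ w_true) for _ in range(4)]

w = np.zeros(d)
for _ in range(50):
    w = fedavg_round(w, clients)
```

In this IID toy case the averaged model recovers `w_true`; the heterogeneity issues discussed above arise precisely when the clients' data distributions differ, so their local minimizers drift apart before averaging.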
This is even more crucial in fairness-critical systems such as medical diagnosis (Li & Wang, 2019), where poor performance on local clients could have damaging consequences. This problem is exacerbated even further as the diversity among the local data of different clients grows. To better illustrate this fact, we ran a simple experiment on the MNIST dataset where each client's local training data is sampled from a subset of classes to simulate heterogeneity. Obviously, when each client has samples from fewer classes, the heterogeneity among clients will be high, and if each of them has samples from all classes, the heterogeneity will be low.

Figure 1: Comparing generalization and training losses of our proposed personalized model with the global models of FedAvg and SCAFFOLD by increasing the diversity among the data of clients on MNIST dataset with a logistic regression model.

