PFEDKT: PERSONALIZED FEDERATED LEARNING VIA KNOWLEDGE TRANSFER

Abstract

Federated learning (FL) has been widely studied as a new paradigm that achieves multi-party collaborative modelling on decentralized data with privacy protection. Unfortunately, traditional horizontal FL suffers from Non-IID data distributions, under which clients' private models after FL may even be inferior to models trained standalone. To tackle this challenge, most existing approaches focus on personalized federated learning (PFL) to improve personalized private models, but they yield only limited accuracy improvements. To this end, we design pFedKT, a novel personalized federated learning framework with private and global knowledge transfer, to boost the performance of personalized private models on Non-IID data. It involves two types of knowledge transfer: a) transferring historical private knowledge to new private models via local hypernetworks; b) transferring the global model's knowledge to private models through contrastive learning. After absorbing the historical private knowledge and the latest global knowledge, both the personalization and the generalization of private models are enhanced. Besides, we derive pFedKT's generalization bound and prove its convergence theoretically. Extensive experiments verify that pFedKT achieves 1.38%-1.62% accuracy improvements for private models over the state-of-the-art baseline.

1. INTRODUCTION

With frequent privacy leakages, directly collecting data and modelling on it would violate privacy-protection regulations such as GDPR (Kairouz et al., 2021). To implement collaborative modelling while protecting data privacy, horizontal federated learning (FL) came into being (McMahan et al., 2017). As shown in Fig. 1(a), FL consists of a central server and multiple clients. In each communication round, the server broadcasts the global model (abbr. GM) to selected clients; the clients train it locally on their local datasets and upload the trained private models (abbr. PMs) to the server; finally, the server aggregates the received private models to update the global model. The whole procedure repeats until the global model converges. In short, FL fulfils collaborative modelling by letting clients communicate only model updates with the server, while data always stays local. However, FL still faces several challenges, such as communication efficiency, robustness to attacks, and model accuracy; the last is the focus of this work.

The motivation for clients to participate in FL is to improve the quality of their local models. However, the decentralized data held by clients are often not independent and identically distributed (Non-IID) (Kairouz et al., 2021), and the global model aggregated by the typical FL algorithm FedAvg (McMahan et al., 2017) on Non-IID data may perform worse than clients' solely trained models (Zhao et al., 2018).

To further improve personalized private models on Non-IID data, we propose a novel personalized FL framework named pFedKT with two types of transferred knowledge: 1) private knowledge: we deploy a local hypernetwork on each client to transfer historical PMs' knowledge to new PMs; 2) global knowledge: we exploit contrastive learning to enable PMs to absorb the GM's knowledge. We analyze pFedKT's generalization and prove its convergence theoretically. We also conduct extensive experiments verifying that pFedKT achieves state-of-the-art accuracy for private models.
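The FedAvg round described above can be sketched as follows. This is a minimal illustrative NumPy version over flat parameter vectors, with `local_train` as a placeholder for the clients' actual training routine; it is not the paper's implementation.

```python
import numpy as np

def fedavg_round(global_model, clients, local_train, lr=0.1, local_epochs=1):
    """One communication round of FedAvg (McMahan et al., 2017).

    global_model : 1-D array of model parameters (a stand-in for real weights)
    clients      : list of (X, y) local datasets
    local_train  : callable (params, X, y, lr, epochs) -> trained params
    """
    updates, sizes = [], []
    for X, y in clients:
        # Each client initializes its PM from the received GM, trains locally,
        # and uploads the trained parameters.
        pm = local_train(global_model.copy(), X, y, lr, local_epochs)
        updates.append(pm)
        sizes.append(len(X))
    # The server aggregates the uploaded PMs, weighted by local dataset size.
    weights = np.array(sizes, dtype=float) / sum(sizes)
    return sum(w * pm for w, pm in zip(weights, updates))
```

Weighting by local dataset size is the standard FedAvg aggregation rule; on Non-IID data this weighted average can drift away from what any single client would prefer, which is exactly the degradation that motivates personalization.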

Contributions.

Our main contributions are summarized as follows: a) We devise two types of knowledge transfer to simultaneously enhance the generalization and personalization of private models. b) We analyze pFedKT's generalization and convergence in theory. c) Extensive experiments verify the superiority of pFedKT in the accuracy of personalized private models.
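For intuition on the global knowledge transfer in contribution (a), a model-contrastive loss in the style of MOON (Li et al., 2021) can be sketched as below. This is an illustrative stand-in, not pFedKT's exact objective; the representations `z`, `z_glob`, and `z_prev` are assumed to come from the current PM, the GM, and the previous-round PM, respectively.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two representation vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def model_contrastive_loss(z, z_glob, z_prev, tau=0.5):
    """MOON-style model-contrastive loss: pull the PM's representation z
    toward the GM's representation z_glob (positive pair) and away from the
    previous PM's representation z_prev (negative pair). tau is the
    temperature hyperparameter."""
    pos = np.exp(cosine(z, z_glob) / tau)
    neg = np.exp(cosine(z, z_prev) / tau)
    return float(-np.log(pos / (pos + neg)))
```

The loss is small when the PM's representation aligns with the GM's and large when it stays close to its own previous round, which is one concrete way a PM can "absorb" global knowledge.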

2. RELATED WORK

Recent personalized federated learning (PFL) approaches include: a) Fine-tuning: in FL's last round, clients fine-tune the received GM on local data to obtain PMs (Wang et al., 2019; Mansour et al., 2020). b) Federated meta-learning: some methods apply meta-learning in FL, such as MAML-based distributed variants (Li et al., 2017; Fallah et al., 2020a;b). c) Federated multi-task learning: each client is treated as a learning task, e.g., MOCHA (Smith et al., 2017), FedU (Dinh et al., 2021). d) Model mixup: the PM's parameters are split into two parts; only one part is shared through the server while the other is trained locally, as in FedPer (Arivazhagan et al., 2019), FedFu (Yao et al., 2019), FLDA (Peterson et al., 2019), LG-FEDAVG (Liang et al., 2020), MAPPER (Mansour et al., 2020), FedRep (Collins et al., 2021), pFedGP (Achituve et al., 2021), and (Sun et al., 2021). e) Aggregation delay: RADFed (Xue et al., 2021) proposes redistribution rounds that delay aggregation to alleviate the negative impact of Non-IID data on model performance. f) Federated clustering: the server clusters PMs with similar parameter distributions and aggregates within clusters, e.g., HYPCLUSTER (Mansour et al., 2020), ClusterFL (Ouyang et al., 2021), CFL (Agrawal et al., 2021). g) Local aggregation: instead of aggregating within server-side clusters, FedFOMO (Zhang et al., 2021) makes each client pull other clients' PMs and select the more beneficial ones for local aggregation to update its own PM. h) Knowledge distillation-based: FedPHP (Li et al., 2021) linearly accumulates historical and newly trained PMs to teach the received GM through knowledge distillation in each round of FL; FML (Shen et al., 2020) makes each client's PM interact with the GM through mutual learning; KT-pFL (Zhang et al., 2021) allocates a public dataset to each client, and only logits computed on the public dataset are shared through the server.
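To make the model-mixup family (item d above) concrete: the PM's parameters are partitioned into a shared base, aggregated on the server, and a private part kept on the client. A minimal sketch over flat parameter vectors follows; real methods such as FedPer split by layers (e.g., shared feature extractor vs. private classifier head), not by a flat index.

```python
import numpy as np

def split_params(params, n_shared):
    """Partition a flat parameter vector: the first n_shared entries form the
    shared base (communicated to the server); the rest stay private."""
    return params[:n_shared].copy(), params[n_shared:].copy()

def aggregate_shared(bases):
    """The server averages only the shared bases; private parts never leave
    the clients."""
    return np.mean(bases, axis=0)

def rebuild(shared_base, private_part):
    """Each client rebuilds its PM from the aggregated base and its own
    private part."""
    return np.concatenate([shared_base, private_part])
```

The private part is never transmitted, so each client retains a personalized component while still benefiting from collaboratively trained shared parameters.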
i) Contrastive learning-based: MOON (Li et al., 2021) utilizes contrastive learning to pull PMs close to the GM, towards obtaining a better GM. j) Hypernetwork-based: e.g., pFedHN (Shamsian et al., 2021).

3.1. UTILITY OF PRIVATE MODELS

Following the FedAvg workflow shown in Fig. 1(a), we abbreviate the private model as PM and the global model as GM; the detailed definition of FL is given in Appendix A. It is worth noting that in FedAvg: 1) clients no longer store PMs after uploading them to the server; 2) in the next round, clients regard the received GM as their PM and train it on local datasets. That is, the trained PMs act only as "temporary models" for aggregation, and their utility is not fully exploited. To explore the utility of PMs, we train a CNN model on the naturally Non-IID FEMNIST dataset in an FL system with 20 clients. From Fig. 1(b), we observe that in every round some PMs perform better than the GM (their pixels are lighter than the GM's), so we can further exploit PMs' self-utility during FL to boost the accuracy of personalized private models.
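The Fig. 1(b) observation can be summarized with a helper like the one below. The array layout is illustrative (the actual evaluation protocol is described in the figure caption): `pm_accs[t, i]` is client i's PM accuracy at recorded round t, and `gm_accs[t]` is the GM's mean accuracy over all clients' test sets at round t.

```python
import numpy as np

def pm_vs_gm_summary(pm_accs, gm_accs):
    """For each recorded round, count how many PMs beat the GM.

    pm_accs : (rounds, clients) array of per-client PM test accuracies
    gm_accs : (rounds,) array of the GM's mean test accuracy per round
    """
    pm = np.asarray(pm_accs, dtype=float)
    gm = np.asarray(gm_accs, dtype=float)
    # Broadcast the per-round GM accuracy against every client's PM accuracy.
    return (pm > gm[:, None]).sum(axis=1)
```

A consistently positive count, as observed in Fig. 1(b), indicates that some PMs outperform the GM in every round, motivating the development of PMs' self-utility.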



Zhao et al. (2018) verified experimentally that a global model aggregated from skewed local models trained on Non-IID data deviates from the optimum (the model trained on all local data). To alleviate the accuracy degradation caused by Non-IID data, personalized FL (PFL) methods (Shamsian et al., 2021) have been widely studied to improve clients' personalized model quality. Existing research implements PFL via fine-tuning (Mansour et al., 2020; Wang et al., 2019), model mixup (Arivazhagan et al., 2019; Collins et al., 2021), etc., but achieves only limited improvements in the accuracy of private models.

Figure 1: (a) Workflow of FedAvg in the t-th round. (b) The test accuracies of the GM and the 20 PMs, recorded every 10 rounds. Since the server holds no data, we evaluate the GM on each client's test dataset and take the mean test accuracy as the GM's accuracy. A PM's accuracy is its test accuracy, after local training, on its own client's local test data.

pFedHN (Shamsian et al., 2021) deploys a global hypernetwork on the server to learn PMs' parameter distributions and generate personalized parameters for PMs. The latest work, Fed-RoD (Chen & Chao, 2022), trains private personalized headers with parameters generated by local hypernetworks. It improves both the GM and the PMs, but extra communication cost is incurred by communicating the hypernetworks.
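The hypernetwork idea underlying these methods can be sketched as a small network that maps a client embedding to a flat parameter vector for the PM. This is an illustrative stand-in only: the embedding dimension, the MLP shape, and the flat output are assumptions for the sketch, and the actual hypernetwork architectures and training objectives are defined in the respective papers.

```python
import numpy as np

rng = np.random.default_rng(0)

class LocalHypernetwork:
    """Minimal hypernetwork sketch: an MLP mapping a client embedding to a
    flat parameter vector for that client's private model. pFedHN keeps such
    a network on the server (global); a local variant keeps one per client."""

    def __init__(self, embed_dim, hidden_dim, target_dim):
        # Small random initialization for the two weight matrices.
        self.W1 = rng.normal(0.0, 0.1, (hidden_dim, embed_dim))
        self.W2 = rng.normal(0.0, 0.1, (target_dim, hidden_dim))

    def generate(self, client_embedding):
        h = np.tanh(self.W1 @ client_embedding)  # hidden representation
        return self.W2 @ h                        # generated PM parameters
```

Because the PM's parameters are a function of a compact embedding, updating the hypernetwork lets past training signal (e.g., historical PMs) shape the parameters generated in later rounds.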

