FEDPSE: PERSONALIZED SPARSIFICATION WITH ELEMENT-WISE AGGREGATION FOR FEDERATED LEARNING

Abstract

Federated learning (FL) is a popular distributed machine learning framework in which clients aggregate model parameters instead of sharing their individual data. In FL, clients communicate with the server frequently under limited network bandwidth, which gives rise to a communication challenge. To address this challenge, multiple compression methods have been proposed to reduce the volume of transmitted parameters. However, the performance of these techniques degrades significantly on Non-IID (non-identically and independently distributed) datasets. To address this issue, we propose an effective method, called FedPSE, which tackles the efficiency challenge of FL with heterogeneous data. FedPSE compresses the local updates on clients using Top-K sparsification and aggregates these updates on the server by element-wise averaging. Clients then download personalized sparse updates from the server to update their individual local models. We theoretically analyze the convergence of FedPSE in the non-convex setting. Moreover, extensive experiments on four benchmark tasks demonstrate that FedPSE outperforms state-of-the-art methods on Non-IID datasets in terms of both efficiency and accuracy.

1. INTRODUCTION

Federated learning (FL) is a prevailing distributed framework that prevents clients' sensitive data from being disclosed (Kairouz et al., 2021; McMahan et al., 2017b). The naive FL procedure consists of three steps: uploading clients' models to the server after local training, aggregating them globally, and downloading the aggregated model from the server. In practice, the weight updates ∆W = W_new − W_old can be communicated instead of the model weights W (Asad et al., 2021; Li et al., 2021a). Recently, FL has been increasingly applied to tasks such as computer vision, recommender systems, and medical diagnosis (Bibikar et al., 2021; Kairouz et al., 2021; Qayyum et al., 2020; Xu et al., 2021).
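The three-step round above can be sketched in a few lines of NumPy. This is a minimal toy illustration, not the paper's setup: the linear model, learning rate, and synthetic client data are all stand-ins chosen only to show how weight updates ∆W, rather than full weights, flow between clients and server.

```python
import numpy as np

def local_update(w_global, X, y, lr=0.1, steps=5):
    """Toy local training (least-squares SGD); returns the weight update
    delta_W = W_new - W_old instead of the weights themselves."""
    w = w_global.copy()
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)  # gradient of 0.5*||Xw - y||^2 / n
        w -= lr * grad
    return w - w_global

# One naive FL round: (1) clients train locally and upload their updates,
# (2) the server averages the updates, (3) clients download the result.
rng = np.random.default_rng(0)
w_global = np.zeros(3)
clients = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(4)]
deltas = [local_update(w_global, X, y) for X, y in clients]
w_global = w_global + np.mean(deltas, axis=0)
```

Communicating ∆W rather than W leaves the protocol unchanged but makes the transmitted tensors amenable to compression, since updates are typically closer to zero than raw weights.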

1.1. EXISTING PROBLEM

Despite the aforementioned advantage, the communication cost of FL is burdensome because the server and clients frequently exchange massive numbers of parameters (Asad et al., 2021; Kairouz et al., 2021). Furthermore, the upstream/downstream bandwidth between the server and clients is usually limited, e.g., wireless connections in cross-device (ToC) FL and dedicated networks in the cross-silo (ToB) setting, which further decreases communication efficiency (Li et al., 2021a; Sattler et al., 2019). FL is therefore much more time-consuming than traditional centralized machine learning, especially when the model parameters are massive, as in cross-silo FL scenarios (Qayyum et al., 2020; Shi et al., 2020). It is thus necessary to optimize the bidirectional communication cost to minimize the training time of FL (Bernstein et al., 2018; Philippenko & Dieuleveut, 2021; Sattler et al., 2019; Wen et al., 2017). To resolve this challenge, various methods have been proposed, such as matrix decomposition (Li et al., 2021c; McMahan et al., 2017b), quantization (Li et al., 2021a; Sattler et al., 2019), and sparsification (Gao et al., 2021; Mostafa & Wang, 2019; Wu et al., 2020; Yang et al., 2021b). Although these algorithms can significantly reduce the quantity of communicated information, most of them only work well under the ideal condition of IID (identically and independently distributed) datasets (Li et al., 2021c; Sattler et al., 2019; Wen et al., 2017).

1.2. SOLUTION

To bridge this gap, we propose Personalized Sparsification with Element-wise aggregation for cross-silo federated learning (FedPSE), as shown in Figure 1. In the first step of FedPSE, with both efficiency and personalization in mind, clients train their models on local datasets and upload sparse updates to the server, as shown in Figure 1(a). The indices kept by these compressed updates likely differ from client to client due to the heterogeneity of clients' datasets. Secondly, we leverage element-wise averaging to aggregate the collected sparse updates on the server, which relieves the bias of the traditional aggregation method, as shown in Figure 1(b). Lastly, the server sparsifies the downstream parameters for each client in a personalized manner, as shown in Figure 1(c). In particular, the downstream updates transferred from the server to each client also keep an individual set of k elements, preserving the overall compression ratio; see Section 4 for more details. In this way, FedPSE compresses both upstream and downstream communication while personalizing each client's model.
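The upstream and aggregation steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's exact algorithm: we assume "element-wise averaging" divides each coordinate's sum by the number of clients that kept that coordinate, and the personalized downstream rule shown (returning each client the aggregated values at its own kept indices) is a hypothetical stand-in for the selection mechanism described in Section 4.

```python
import numpy as np

def topk_sparsify(delta, k):
    """Keep the k largest-magnitude entries of a flat update; zero the rest."""
    out = np.zeros_like(delta)
    idx = np.argsort(np.abs(delta))[-k:]
    out[idx] = delta[idx]
    return out

def elementwise_average(sparse_deltas):
    """Average each coordinate over only the clients that kept it, rather
    than dividing every coordinate by the total number of clients."""
    stacked = np.stack(sparse_deltas)
    counts = np.count_nonzero(stacked, axis=0)
    sums = stacked.sum(axis=0)
    return np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)

# Three clients keep different indices after Top-K (k = 2), so the server
# averages each coordinate over its contributors only.
uploads = [topk_sparsify(np.array([4.0, 0.1, -0.2, 1.5]), 2),
           topk_sparsify(np.array([0.3, 6.0, -2.0, 0.1]), 2),
           topk_sparsify(np.array([2.0, 0.2, -0.1, 3.0]), 2)]
aggregated = elementwise_average(uploads)
# Hypothetical personalized downstream step: each client receives the
# aggregated values at its own kept indices (a stand-in for Section 4's rule).
downstream = [np.where(u != 0, aggregated, 0.0) for u in uploads]
```

Note the bias being relieved: dividing every coordinate by all three clients would shrink entries that only one or two clients kept, whereas the element-wise count preserves their magnitude.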

1.3. CONTRIBUTION

We summarize our main contributions as follows:
• We propose a novel personalized sparsification with element-wise aggregation framework for FL, which resolves the bidirectional communication challenge on Non-IID datasets.
• We propose an element-wise aggregation method, which improves the performance of FL with sparse aggregated matrices.
• We propose a downstream selection mechanism to personalize clients' models, which adapts to various distributions and significantly improves performance in the Non-IID setting.
• We provide a convergence analysis of our method as well as extensive experiments on four benchmark datasets; the results demonstrate that our proposed FedPSE outperforms existing state-of-the-art FL frameworks on Non-IID datasets in terms of both efficiency and accuracy.

2. RELATED WORK

In this section, we briefly review optimization methods that focus on the core challenges in FL.

2.1. COMMUNICATION EFFICIENCY

Although FedAVG (McMahan et al., 2017a), the naive federated algorithm, can decrease the communication cost by allowing multiple local steps, the massive number of parameters transmitted in each communication round is still a critical bottleneck. In general, there are three kinds of compression



Figure 1: The proposed framework of FedPSE.

In fact, the isolated datasets on clients are usually heterogeneous, since each dataset belongs to a particular client with a specific geographic location and time window of data collection (Kairouz et al., 2021; Kulkarni et al., 2020; Xu & Huang, 2022; Yang et al., 2021a). Hence, current compression techniques, which ignore the personalization of clients, suffer significant performance degradation on Non-IID datasets (Liu et al., 2022; Sattler et al., 2019; Wu et al., 2020).
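To make the heterogeneity concrete: label-skewed Non-IID client data is often simulated by drawing each client's label proportions from a Dirichlet distribution. The sketch below is a common recipe from the FL literature, not a procedure taken from this paper; the client count, class count, and concentration parameter are illustrative.

```python
import numpy as np

# Illustrative Non-IID simulation: each client's label distribution is a
# Dirichlet draw, so a small alpha yields strongly skewed label histograms.
rng = np.random.default_rng(0)
num_clients, num_classes, samples_per_client = 4, 10, 100
proportions = rng.dirichlet(alpha=[0.5] * num_classes, size=num_clients)
client_labels = [rng.choice(num_classes, size=samples_per_client, p=p)
                 for p in proportions]
```

Under such skew, the Top-K indices selected by different clients naturally diverge, which is exactly the regime where a one-size-fits-all aggregated model degrades.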

