DELTA: DIVERSE CLIENT SAMPLING FOR FASTER FEDERATED LEARNING

Abstract

Partial client participation has been widely adopted in Federated Learning (FL) to efficiently reduce the communication burden. However, an improper client sampling scheme will select unrepresentative subsets, which causes large variance in the model updates and slows down convergence. Existing sampling methods are either biased or can be further improved to accelerate convergence. In this paper, we propose an unbiased sampling scheme, termed DELTA, to alleviate this problem. In particular, DELTA characterizes the impact of client diversity and local variance, and samples representative clients that carry valuable information for global model updates. Moreover, DELTA is a provably optimal unbiased sampling scheme that minimizes the variance caused by partial client participation and achieves better convergence than other unbiased sampling schemes. We corroborate our results with experiments on both synthetic and real-world datasets.

1. INTRODUCTION

Federated Learning (FL) has recently emerged as a critical distributed learning paradigm in which a number of clients collaborate with a central server to train a model. Edge clients perform their updates locally without any data sharing, thus preserving client privacy. Communication can become the primary bottleneck of FL, since edge devices have limited bandwidth and connection availability (Wang et al., 2021). To reduce the communication burden, only a portion of clients is chosen for training in practice. However, an improper client sampling strategy, such as the uniform client sampling adopted in FedAvg (McMahan et al., 2017), might exacerbate the data heterogeneity issue in FL, as randomly selected unrepresentative subsets increase the variance introduced by client sampling and directly slow down convergence.

Existing sampling strategies can usually be categorized into two classes: biased and unbiased. For the crucial class of unbiased client sampling, which preserves the optimization objective in expectation, only a few strategies have been proposed, e.g., multinomial distribution (MD) sampling and cluster sampling, including clustering based on sample size and clustering based on similarity. However, these sampling methods usually suffer from slow convergence due to large variance, as well as computation overhead (Balakrishnan et al., 2021; Fraboni et al., 2021b). To accelerate the convergence of FL with partial client participation, Importance Sampling (IS), another unbiased sampling strategy, has been proposed in recent literature (Chen et al., 2020; Rizk et al., 2020). IS selects clients with large gradient norms, as shown in Figure 1(a). As another sampling method in Figure 1(a), cluster-based IS first clusters the clients according to the gradient norm and then uses IS to select the clients with large gradient norms within each cluster.
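To make the IS mechanism above concrete, the sketch below samples clients with probability proportional to their gradient norms and reweights the sampled updates so the aggregate remains unbiased. This is an illustrative sketch under simplifying assumptions (sampling with replacement, norms known exactly); the function name and reweighting convention are our own, not the DELTA scheme introduced later.

```python
import numpy as np

def importance_sample(grad_norms, m, rng=None):
    """Illustrative IS sketch (hypothetical helper, not DELTA itself).

    Samples m client indices with probability proportional to gradient
    norm, and returns weights 1 / (m * n * p_i) so that the weighted sum
    of the sampled gradients is an unbiased estimate of the average
    gradient over all n clients.
    """
    rng = np.random.default_rng() if rng is None else rng
    norms = np.asarray(grad_norms, dtype=float)
    n = len(norms)
    p = norms / norms.sum()                 # sampling distribution
    idx = rng.choice(n, size=m, replace=True, p=p)
    weights = 1.0 / (m * n * p[idx])        # unbiased reweighting
    return idx, weights
```

With these weights, the expectation of the weighted sum of sampled gradients equals the full-participation average gradient, which is what keeps IS unbiased despite favoring large-norm clients.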
Though IS and cluster-based IS have their advantages, 1) IS suffers from learning inefficiency due to the transmission of important yet similar updates from clients to the server. This problem has been pointed out in recent works (Fraboni et al., 2021a; Shen et al., 2022), and efforts have been made to solve it. One of them is cluster-based IS, which avoids redundantly sampling similar clients by first clustering them into groups. Though the clustering operation can alleviate this problem to some extent, 2) vanilla cluster-based IS does not work well because the high-dimensional gradient is too complicated to serve as a good clustering feature and can lead to poor clustering results, as pointed out by Shen et al. (2022). In addition, clustering is known to be susceptible to biased performance if the samples are chosen from a group that is clustered based on a biased opinion, as shown in Sharma (2017); Thompson (1990). From the above discussion, we know though IS and cluster-based IS have their own advantages in

