COMMUNICATION-EFFICIENT AND DRIFT-ROBUST FEDERATED LEARNING VIA ELASTIC NET

Abstract

Federated learning (FL) is a distributed method to train a global model over a set of local clients while keeping data localized. It reduces privacy and security risks but faces important challenges, including expensive communication costs and client drift. To address these issues, we propose FedElasticNet, a communication-efficient and drift-robust FL framework leveraging the elastic net. It repurposes the two regularizers of the elastic net (i.e., the ℓ1 and ℓ2 penalties on the local model updates): (1) the ℓ1-norm regularizer sparsifies the local updates to reduce communication costs, and (2) the ℓ2-norm regularizer resolves the client drift problem by limiting the impact of drifting local updates due to data heterogeneity. FedElasticNet is a general framework for FL; hence, without additional costs, it can be integrated into prior FL techniques, e.g., FedAvg, FedProx, SCAFFOLD, and FedDyn. We show that our framework effectively resolves the communication cost and client drift problems simultaneously.

1. INTRODUCTION

Federated learning (FL) is a collaborative method that allows many clients to contribute individually to training a global model by sharing local models rather than private data. Each client has a local training dataset, which it does not want to share with the global server. Instead, each client computes an update to the current global model maintained by the server, and only this update is communicated. FL significantly reduces privacy and security risks (McMahan et al., 2017; Li et al., 2020a), but it faces crucial challenges that make federated settings distinct from other classical problems (Li et al., 2020a), such as expensive communication costs and client drift due to heterogeneous local training datasets and heterogeneous systems (McMahan et al., 2017; Li et al., 2020a; Konečnỳ et al., 2016a;b). Communicating models is a critical bottleneck in FL, in particular when the federated network comprises a massive number of devices (Bonawitz et al., 2019; Li et al., 2020a; Konečnỳ et al., 2016b). In such a scenario, communication in the federated network may take many orders of magnitude longer than local computation because of limited communication bandwidth and device power (Li et al., 2020a). To reduce this communication cost, several strategies have been proposed (Konečnỳ et al., 2016b; Li et al., 2020a). In particular, Konečnỳ et al. (2016b) proposed several methods to form structured local updates and to approximate them, e.g., subsampling and quantization. Reisizadeh et al. (2020) and Xu et al. (2020) also proposed efficient quantization methods for FL to reduce the communication cost. In addition, because the datasets that local clients own are heterogeneous, models trained on each client's local data are inconsistent with the global model that minimizes the global empirical loss (Karimireddy et al., 2020; Malinovskiy et al., 2020; Acar et al., 2021). This issue is referred to as the client drift problem.
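To make the structured-update idea concrete, the following sketch shows two of the standard ingredients mentioned above, top-k sparsification and uniform quantization of a client's update vector. This is our minimal illustration, not any specific scheme from the cited works; all function names and numbers are ours.

```python
import numpy as np

def sparsify_topk(update, k):
    """Keep only the k largest-magnitude entries of a dense local update.

    Transmitting (indices, values) instead of the full vector reduces
    the uplink payload from len(update) floats to k index/value pairs.
    """
    idx = np.argsort(np.abs(update))[-k:]
    return idx, update[idx]

def quantize_uniform(values, num_bits=8):
    """Uniformly quantize values to 2**num_bits levels over their range."""
    lo, hi = float(values.min()), float(values.max())
    if hi == lo:
        return values.copy()
    levels = 2 ** num_bits - 1
    q = np.round((values - lo) / (hi - lo) * levels)
    return lo + q / levels * (hi - lo)

# A client's dense model update (toy numbers).
update = np.array([0.01, -2.0, 0.003, 1.5, -0.02, 0.5])
idx, vals = sparsify_topk(update, k=3)
vals_q = quantize_uniform(vals)

# The server reconstructs a sparse estimate of the update.
recovered = np.zeros_like(update)
recovered[idx] = vals_q
```

Here the client sends 3 index/value pairs (with 8-bit values) instead of 6 floats; the server's reconstruction is exact on the untransmitted zeros and accurate to within one quantization step on the transmitted entries.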
In order to resolve the client drift problem, FedProx (Li et al., 2020b) added a proximal term, i.e., an ℓ2-norm regularizer, to the local objective functions to penalize local updates that are far from the server's model and thus to limit the impact of variable local updates (Li et al., 2020b).

Zou & Hastie (2005) proposed the elastic net to encourage the grouping effect, in other words, to encourage strongly correlated covariates to enter or leave the model description together (Hu et al., 2018). Originally, the elastic net was proposed to overcome the limitations of the Lasso (Tibshirani, 1996), which imposes only an ℓ1-norm penalty on the model parameters. For a linear least squares problem, the elastic net estimate is

β̂ = argmin_β ‖y − Xβ‖₂² + λ₂‖β‖₂² + λ₁‖β‖₁,

where λ₁ ≥ 0 and λ₂ ≥ 0 control the strengths of the ℓ1-norm and ℓ2-norm penalties, respectively.

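As a quick illustration of the elastic net objective just described, the sketch below minimizes it by proximal gradient descent (ISTA) on a toy problem. The solver and the 1/2 factor on the loss (which only rescales the penalties) are our choices for clarity, not part of the paper.

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of tau * ||x||_1 (elementwise soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def elastic_net(X, y, lam1, lam2, lr=0.5, steps=200):
    """Minimize 0.5 * ||y - X b||_2^2 + lam2 * ||b||_2^2 + lam1 * ||b||_1
    by proximal gradient descent (ISTA)."""
    beta = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (X @ beta - y) + 2.0 * lam2 * beta   # smooth part
        beta = soft_threshold(beta - lr * grad, lr * lam1)  # l1 prox step
    return beta

# Toy problem with orthonormal X; here the minimizer has the closed form
# soft_threshold(X.T @ y, lam1) / (1 + 2 * lam2).
X = np.eye(3)
y = np.array([3.0, 0.05, -2.0])
beta = elastic_net(X, y, lam1=0.1, lam2=0.5)
# The small coefficient is driven exactly to zero by the l1 penalty, and
# the large ones are shrunk by the l2 penalty: beta ≈ [1.45, 0.0, -0.95].
```

The two penalties are visible in the result: the ℓ1 term produces exact zeros (sparsity), while the ℓ2 term shrinks the surviving coefficients; these are precisely the two behaviors FedElasticNet repurposes for local model updates.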
Karimireddy et al. (2020) proposed the SCAFFOLD algorithm, which transfers both model updates and control variates to resolve the client drift problem. FedDyn (Acar et al., 2021) dynamically regularizes the local objective functions to the same end. Unlike most prior works, which focus on either the communication cost problem or the client drift problem, we propose a technique that effectively resolves both problems simultaneously.

Contributions. In this paper, we propose FedElasticNet, a new framework for communication-efficient and drift-robust FL. It repurposes the ℓ1-norm and ℓ2-norm regularizers of the elastic net (Zou & Hastie, 2005), by which it successfully improves (i) communication efficiency, by adopting the ℓ1-norm regularizer, and (ii) robustness to heterogeneous local datasets, by adopting the ℓ2-norm regularizer. FedElasticNet is a general framework; hence, it can be integrated with prior FL algorithms such as FedAvg (McMahan et al., 2017), FedProx (Li et al., 2020b), SCAFFOLD (Karimireddy et al., 2020), and FedDyn (Acar et al., 2021) so as to resolve the client drift problem as well as the communication cost problem. Further, it incurs no additional costs in training. Empirically, we show that FedElasticNet enhances communication efficiency while maintaining classification accuracy even for heterogeneous datasets, i.e., the client drift problem is resolved. Theoretically, we characterize the impact of the regularizer terms. Table 1 compares the prior methods with the proposed FedElasticNet when integrated with FedDyn (Algorithm 3).

FedAvg (McMahan et al., 2017) is one of the most commonly used FL methods. It tackles the communication bottleneck by performing multiple local updates before communicating with the server. It works well for homogeneous datasets across clients (McMahan et al., 2017; Karimireddy et al., 2020), but it is known that FedAvg may diverge when local datasets are heterogeneous (Zhao et al., 2018; Li et al., 2020a). Although FedProx is more robust to heterogeneous datasets than FedAvg, its regularizer does not align the global and local stationary points (Acar et al., 2021); also, FedProx does not improve communication efficiency over FedAvg. SCAFFOLD (Karimireddy et al., 2020) defines client drift as the inconsistency, caused by heterogeneous local datasets, between the model obtained by aggregating local models and the optimal global model. SCAFFOLD communicates both the trained local models and the clients' control variates to resolve client drift; hence, it requires twice the communication cost of other FL algorithms. FedDyn (Acar et al., 2021) dynamically updates its local regularizers at each round to ensure that the local clients' optima are asymptotically consistent with stationary points of the global empirical loss. Unlike SCAFFOLD, FedDyn resolves the client drift problem without incurring additional communication costs; however, its communication cost is no better than that of FedAvg and FedProx.
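To make the role of the two repurposed regularizers concrete, here is a minimal sketch of a local training step with elastic-net penalties on the model *update*. The objective form, function names, and hyperparameter values are our illustrative assumptions, not the paper's exact algorithm (which integrates these penalties into methods like FedAvg, FedProx, SCAFFOLD, and FedDyn).

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of tau * ||x||_1 (elementwise soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def local_update(theta_server, grad_fn, lam1=0.1, lam2=0.1, lr=0.1, steps=50):
    """Proximal-gradient local training with elastic-net penalties on the
    model update (theta - theta_server):

        f(theta) + lam1 * ||theta - theta_server||_1
                 + (lam2 / 2) * ||theta - theta_server||_2^2

    The l2 term pulls theta back toward the server model (drift control);
    the l1 proximal step zeroes small coordinates of the update, so the
    client only needs to transmit a sparse difference.
    """
    theta = theta_server.copy()
    for _ in range(steps):
        g = grad_fn(theta) + lam2 * (theta - theta_server)  # smooth part
        theta = theta - lr * g
        # Apply the l1 prox to the *update*, sparsifying what gets sent.
        theta = theta_server + soft_threshold(theta - theta_server, lr * lam1)
    return theta

# Toy local loss f(theta) = 0.5 * ||theta - target||^2, whose unregularized
# minimizer (the client's "local optimum") differs from the server model.
target = np.array([1.0, 0.0, 0.02, -0.8])
theta_server = np.zeros(4)
theta = local_update(theta_server, lambda th: th - target)
update = theta - theta_server  # coordinates 1 and 2 come out exactly zero
```

In this toy run, the ℓ1 step suppresses the two near-zero coordinates of the update entirely (communication savings), while the ℓ2 term shrinks the large coordinates toward the server model rather than letting them drift all the way to the local optimum.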

