FEDREP: A BYZANTINE-ROBUST, COMMUNICATION-EFFICIENT AND PRIVACY-PRESERVING FRAMEWORK FOR FEDERATED LEARNING

Anonymous authors
Paper under double-blind review

Abstract

Federated learning (FL) has recently become a hot research topic, in which Byzantine robustness, communication efficiency and privacy preservation are three important aspects. However, the tension among these three aspects makes it hard to take all of them into account simultaneously. In view of this challenge, we theoretically analyze the conditions that a communication compression method should satisfy to be compatible with existing Byzantine-robust methods and privacy-preserving methods. Motivated by the analysis results, we propose a novel communication compression method called consensus sparsification (ConSpar). To the best of our knowledge, ConSpar is the first communication compression method that is designed to be compatible with both Byzantine-robust methods and privacy-preserving methods. Based on ConSpar, we further propose a novel FL framework called FedREP, which is Byzantine-robust, communication-efficient and privacy-preserving. We theoretically prove the Byzantine robustness and the convergence of FedREP. Empirical results show that FedREP significantly outperforms communication-efficient privacy-preserving baselines. Furthermore, compared with Byzantine-robust communication-efficient baselines, FedREP achieves comparable accuracy with the extra advantage of privacy preservation.

1. INTRODUCTION

Federated learning (FL), in which participants (also called clients) collaborate to train a learning model while keeping data privately owned, has recently become a hot research topic (Konečný et al., 2016; McMahan & Ramage, 2017). Compared to traditional data-center based distributed learning (Haddadpour et al., 2019; Jaggi et al., 2014; Lee et al., 2017; Lian et al., 2017; Shamir et al., 2014; Sun et al., 2018; Yu et al., 2019a; Zhang & Kwok, 2014; Zhao et al., 2017; 2018; Zhou et al., 2018; Zinkevich et al., 2010), in FL applications service providers have less control over clients, and the network is usually less stable and has smaller bandwidth. Furthermore, participants risk privacy leakage in FL if privacy-preserving methods are not used. Consequently, Byzantine robustness, communication efficiency and privacy preservation have become three important aspects of FL methods (Kairouz et al., 2021) and have attracted much attention in recent years.

Byzantine robustness. In FL applications, failures in clients or network transmission may not be discovered and resolved in time (Kairouz et al., 2021). Moreover, some clients may be attacked by an adversarial party and purposely send incorrect or even harmful information. Clients in failure or under attack are called Byzantine clients. There are mainly three ways to obtain robustness against Byzantine clients: redundant computation, server validation and robust aggregation. Redundant computation methods (Chen et al., 2018; Konstantinidis & Ramamoorthy, 2021; Rajput et al., 2019) require different clients to compute gradients for the same training instances. These methods are mostly designed for traditional data-center based distributed learning and are inapplicable in FL due to the privacy principle. In server validation methods (Xie et al., 2019b; 2020b), the server validates clients' updates based on a public dataset.
However, the performance of server validation methods depends on the quantity and quality of the validation instances, and in many scenarios it is hard to obtain a large-scale, high-quality public dataset. The third way is to replace the mean aggregation on the server with robust aggregation (Alistarh et al., 2018; Bernstein et al., 2019; Blanchard et al., 2017; Chen et al., 2017; Ghosh et al., 2020; Karimireddy et al., 2021; Li et al.).

Communication efficiency. In many FL applications, the server and clients are connected by a wide area network (WAN), which is usually less stable and has smaller bandwidth than the network in traditional data-center based distributed machine learning. Therefore, communication cost should also be taken into consideration. The local updating technique (Konečný et al., 2016; McMahan et al., 2017; Yu et al., 2019b; Zhao et al., 2017; 2018), where clients locally update models for several iterations before global aggregation, is widely used in FL methods. Communication cost can also be reduced by communication compression techniques, which mainly include quantization (Alistarh et al., 2017; Faghri et al., 2020; Gandikota et al., 2021; Safaryan & Richtárik, 2021; Seide et al., 2014; Wen et al., 2017), sparsification (Aji & Heafield, 2017; Chen et al., 2020; Stich et al., 2018; Wangni et al., 2018) and sketching¹ (Rothchild et al., 2020). The error compensation (also known as error feedback) technique (Gorbunov et al., 2020; Wu et al., 2018; Xie et al., 2020c) has been proposed to alleviate the accuracy decrease caused by communication compression. Moreover, different techniques can be combined to further reduce communication cost (Basu et al., 2020; Lin et al., 2018).

Privacy preservation. Most existing FL methods send gradients or model parameters during the training process while keeping data decentralized due to the privacy principle.
However, sending gradients or model parameters may also cause privacy leakage (Kairouz et al., 2021; Zhu et al., 2019). Random noise is used to hide the true input values in some privacy-preserving techniques such as differential privacy (DP) (Abadi et al., 2016; Jayaraman et al., 2018; McMahan et al., 2018) and sketching (Liu et al., 2019; Zhang & Wang, 2021). Secure aggregation (SecAgg) (Bonawitz et al., 2017; Choi et al., 2020) has been proposed to ensure the privacy of computation. Based on secure multiparty computation (MPC) and Shamir's t-out-of-n secret sharing (Shamir, 1979), SecAgg allows the server to obtain only the average value for global model updating, without knowing any individual client's local model parameters (or gradients). Since noise can simply be added to stochastic gradients in most existing FL methods to provide input privacy, in this work we mainly focus on how to combine SecAgg with Byzantine-robust and communication-efficient methods.

There are also some methods that consider two of the three aspects (Byzantine robustness, communication efficiency and privacy preservation), including RCGD (Ghosh et al., 2021), F²ed-Learning (Wang et al., 2020), SHARE (Velicheti et al., 2021) and SparseSecAgg (Ergun et al., 2021), which we summarize in Table 1. However, the tension among these three aspects makes it hard to take all of them into account simultaneously. In view of this challenge, we theoretically analyze the tension among Byzantine robustness, communication efficiency and privacy preservation, and propose a novel FL framework called FedREP. The main contributions are listed as follows:

• We theoretically analyze the conditions that a communication compression method should satisfy to be compatible with Byzantine-robust methods and privacy-preserving methods. Motivated by the analysis results, we propose a novel communication compression method called consensus sparsification (ConSpar).
To the best of our knowledge, ConSpar is the first communication compression method that is designed to be compatible with both Byzantine-robust methods and privacy-preserving methods.

• Based on ConSpar, we further propose a novel FL framework called FedREP, which is Byzantine-robust, communication-efficient and privacy-preserving.
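To make the SecAgg idea described above concrete, the following is a minimal sketch of its core mechanism: pairwise random masks that cancel in the sum, so the server learns only the aggregate. This is an illustration only, not the full protocol of Bonawitz et al. (2017), which additionally derives masks from key agreement and secret-shares the seeds to tolerate client dropout; all function names here are hypothetical.

```python
import numpy as np

def pairwise_masks(n_clients, dim, seed=0):
    """Each pair (i, j) with i < j shares one random mask vector.
    Client i will add it and client j will subtract it."""
    rng = np.random.default_rng(seed)
    return {(i, j): rng.normal(size=dim)
            for i in range(n_clients) for j in range(i + 1, n_clients)}

def masked_update(client_id, update, masks, n_clients):
    """Hide a client's update behind its pairwise masks."""
    out = update.copy()
    for j in range(n_clients):
        if client_id < j:
            out += masks[(client_id, j)]
        elif j < client_id:
            out -= masks[(j, client_id)]
    return out

# Demo: the server sees only masked vectors, yet all masks cancel,
# so the sum of masked updates equals the sum of true updates.
n, d = 4, 6
rng = np.random.default_rng(1)
updates = [rng.normal(size=d) for _ in range(n)]
masks = pairwise_masks(n, d)
masked = [masked_update(i, updates[i], masks, n) for i in range(n)]
assert np.allclose(np.sum(masked, axis=0), np.sum(updates, axis=0))
```

Note that the server can only recover the sum of all participating clients' updates; any single masked vector is statistically independent of the underlying update, which is exactly the property that constrains which compression methods can be combined with SecAgg.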



¹ The sketching technique can be used in different ways, either to reduce communication cost or to protect privacy. Thus, sketching appears in both communication-efficient methods and privacy-preserving methods.
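As an illustration of the sketching technique mentioned in the footnote, the following is a minimal count-sketch gradient compressor in the spirit of FetchSGD (Rothchild et al., 2020); the helper names are hypothetical and many practical details (momentum, heavy-hitter extraction) are omitted.

```python
import numpy as np

def make_hashes(n_rows, n_buckets, dim, seed=42):
    """Fixed random bucket assignments and signs shared by all parties."""
    rng = np.random.default_rng(seed)
    buckets = rng.integers(0, n_buckets, size=(n_rows, dim))
    signs = rng.choice([-1.0, 1.0], size=(n_rows, dim))
    return buckets, signs

def sketch(g, buckets, signs, n_buckets):
    """Compress a length-d gradient into an r x c count-sketch table."""
    table = np.zeros((buckets.shape[0], n_buckets))
    for k in range(buckets.shape[0]):
        np.add.at(table[k], buckets[k], signs[k] * g)
    return table

def unsketch(table, buckets, signs, dim):
    """Estimate each coordinate as the median of its signed bucket values."""
    r = table.shape[0]
    est = np.array([[signs[k, i] * table[k, buckets[k, i]] for k in range(r)]
                    for i in range(dim)])
    return np.median(est, axis=1)

# Demo: a sparse gradient with one heavy coordinate.
d, r, c = 1000, 5, 64
buckets, signs = make_hashes(r, c, d)
g = np.zeros(d)
g[7] = 10.0
est = unsketch(sketch(g, buckets, signs, c), buckets, signs, d)
# Coordinate 7 is recovered exactly here, since all other coordinates are zero.
```

Because the sketch is a linear map, sketches from different clients can be summed (or securely aggregated) before decoding, which is what makes sketching relevant to both the communication and the privacy columns of Table 1.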



Table 1: Comparison among different methods in terms of the three aspects of federated learning.

Compared to redundant computation and server validation, robust aggregation (Sohn et al., 2020; Yin et al., 2018; 2019) usually has a wider scope of application, and many Byzantine-robust FL methods (Wang et al., 2020; Xie et al., 2019a) take this approach.
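The robust-aggregation approach can be illustrated with the coordinate-wise median in the style of Yin et al. (2018); this is a generic background sketch, not FedREP's own aggregation rule.

```python
import numpy as np

def coordinate_wise_median(updates):
    """Robust aggregation: take the median of each coordinate across clients,
    so a minority of Byzantine clients cannot drag the aggregate arbitrarily."""
    return np.median(np.stack(updates), axis=0)

# Demo: 5 honest clients report noisy copies of the true gradient, while
# 2 Byzantine clients send huge values. The mean is destroyed; the
# coordinate-wise median stays close to the true gradient.
rng = np.random.default_rng(0)
true_grad = np.array([1.0, -2.0, 0.5])
honest = [true_grad + 0.01 * rng.normal(size=3) for _ in range(5)]
byzantine = [np.full(3, 1e6), np.full(3, 1e6)]
updates = honest + byzantine
mean_agg = np.mean(np.stack(updates), axis=0)
median_agg = coordinate_wise_median(updates)
```

With 7 clients and 2 Byzantine ones, each coordinate's sorted value list has the outliers at the ends, so the median lands among the honest values; this breakdown-point argument is what the formal robustness analyses of such aggregators make precise.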

