D²P-FED: DIFFERENTIALLY PRIVATE FEDERATED LEARNING WITH EFFICIENT COMMUNICATION

Abstract

In this paper, we propose discrete Gaussian based differentially private federated learning (D²P-FED), a unified scheme to achieve both differential privacy (DP) and communication efficiency in federated learning (FL). In particular, compared with the only prior work that addresses both aspects, D²P-FED provides a stronger privacy guarantee, better composability, and smaller communication cost. The key idea is to apply discrete Gaussian noise to the private data transmission. We provide a complete analysis of the privacy guarantee, communication cost, and convergence rate of D²P-FED. We evaluated D²P-FED on INFIMNIST and CIFAR10; the results show that D²P-FED outperforms the state of the art by 4.7% to 13.0% in terms of model accuracy while saving one third of the communication cost.

1. INTRODUCTION

Federated learning (FL) is a popular machine learning paradigm that allows a central server to train models over decentralized data sources. In FL, each client trains locally on its own data source and sends only the model updates to the server, which then updates the global model based on the aggregated local updates. Since the data stays local, FL can provide better privacy protection than traditional centralized learning. However, FL faces two main challenges: (1) FL lacks a rigorous privacy guarantee (e.g., differential privacy (DP)) and, indeed, has been shown to be vulnerable to various inference attacks (Nasr et al., 2019; Pustozerova & Mayer; Xie et al., 2019); (2) FL incurs considerable communication costs. In many potential applications of FL, such as mobile devices, these two challenges are present simultaneously. However, privacy and communication efficiency have mostly been studied independently in the past.

As regards privacy, existing work has applied a gold-standard privacy notion, differential privacy (DP), to FL; DP ensures that the server can hardly determine the participation of any client by observing their updates (Geyer et al., 2017). To achieve DP, each client needs to inject noise into its local updates, and as a side effect, the performance of the trained model inevitably degrades. To improve model utility, secure multiparty computation (SMC) has been used in tandem with DP to reduce the noise (Jayaraman et al., 2018; Truex et al., 2019). The key idea is to prevent the server from observing individual updates, making only the aggregate accessible, and thus move from local DP to central DP. However, SMC introduces extra communication overhead for each client. On the other hand, there has been extensive research on improving the communication efficiency of FL while ignoring the privacy aspect (Tsitsiklis & Luo, 1987; Balcan et al., 2012; Zhang et al., 2013; Arjevani & Shamir, 2015; Chen et al., 2016).
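To illustrate why hiding individual updates lets each client add less noise, the following sketch (using continuous Gaussian noise for simplicity, and illustrative variable names not taken from the paper) checks that when each of n clients contributes noise of variance σ²/n, the aggregate the server sees still carries total noise variance σ²:

```python
import numpy as np

# Sketch: with secure aggregation the server observes only the sum of the
# n client contributions, so each client can add noise with standard
# deviation sigma / sqrt(n) and the aggregate still carries total noise
# of standard deviation sigma.
rng = np.random.default_rng(0)
n, sigma, trials = 50, 8.0, 20000
per_client = rng.normal(0.0, sigma / np.sqrt(n), size=(trials, n))
aggregate_noise = per_client.sum(axis=1)   # what the server observes
print(aggregate_noise.std())               # ≈ sigma, up to sampling error
```

The same variance accounting carries over when the continuous noise is replaced by the discrete Gaussian noise used in D²P-FED, since sums of independent noise terms add in variance either way.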
However, these communication reduction methods are either incompatible with existing DP mechanisms or break the DP guarantees when combined with SMC. The only existing work that tries to reconcile DP and communication efficiency in FL is cpSGD (Agarwal et al., 2018). The authors leveraged the Binomial mechanism, which adds Binomial noise to local updates to ensure differential privacy; the discrete nature of Binomial noise allows it to be transmitted efficiently. However, cpSGD faces several limitations in real-world applications. First, with Binomial noise, the output of a learning algorithm has different supports on different input datasets; as a result, Binomial noise can only guarantee approximate DP, where the participation of a client can be completely exposed with nonzero probability. Second, there is no tight composition theorem for DP with Binomial noise, so the privacy budget skyrockets in a multi-round FL protocol; hence, the Binomial mechanism cannot produce a useful model with a reasonable privacy budget on complex tasks. Last but not least, the Binomial mechanism involves several mutually constrained hyper-parameters and an extremely complicated privacy formula, which makes hyper-parameter tuning difficult.

In this paper, we propose discrete Gaussian based differentially private federated learning (D²P-FED), an alternative technique to reduce communication costs while maintaining differential privacy in FL. Our key idea is to leverage the discrete Gaussian mechanism in FL, which adds discrete Gaussian noise to client updates. We show that the discrete Gaussian mechanism satisfies Rényi DP, which provides better composability. We employ secure aggregation along with the discrete Gaussian mechanism to lower the noise and establish the privacy guarantee of this hybrid privacy protection approach.
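For intuition, the discrete Gaussian over the integers places mass proportional to exp(-x²/(2σ²)) on each integer x. Below is a minimal truncated-support sampler, an illustrative approximation rather than the exact sampler a production mechanism would use (the function name and `tail` cutoff are our own choices):

```python
import numpy as np

def discrete_gaussian(sigma, size, tail=12, rng=None):
    """Sample (approximately) from the discrete Gaussian N_Z(0, sigma^2),
    i.e. P[X = x] proportional to exp(-x^2 / (2 sigma^2)) for integer x,
    by truncating the support to [-tail*sigma, tail*sigma]; the omitted
    tail mass is negligible for this illustration."""
    rng = np.random.default_rng() if rng is None else rng
    bound = int(np.ceil(tail * sigma))
    support = np.arange(-bound, bound + 1)
    pmf = np.exp(-support.astype(float) ** 2 / (2.0 * sigma ** 2))
    pmf /= pmf.sum()
    return rng.choice(support, size=size, p=pmf)

# Each client perturbs its (integer-valued, quantized) update with
# integer noise, so the noisy update stays in a discrete domain:
rng = np.random.default_rng(0)
update = np.array([3, -1, 0, 2, 5])        # a quantized local update
noisy_update = update + discrete_gaussian(4.0, size=update.shape[0], rng=rng)
```

Because the noise is integer-valued, the perturbed update can be encoded with a fixed number of bits per coordinate, which is what makes the mechanism compatible with both secure aggregation and compressed transmission.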
To save communication cost, we integrate stochastic quantization and random rotation into the protocol. We then cast FL as a general distributed mean estimation problem and analyze the utility of the overall protocol. Our theoretical analysis sheds light on the superiority of D²P-FED over cpSGD. Our experiments show that D²P-FED achieves state-of-the-art performance in managing the trade-off among privacy, utility, and communication.

2. RELATED WORK

It is well studied how to improve the communication cost in traditional distributed learning settings (Tsitsiklis & Luo (1987); Balcan et al. (2012); Zhang et al. (2013); Arjevani & Shamir (2015); Chen et al. (2016)). However, most of these approaches either require communication between the workers or are designed for specific learning tasks, so they cannot be applied directly to general-purpose FL. The most relevant work is Suresh et al. (2017), which proposed to use stochastic quantization to save communication cost and random rotation to lower the mean squared error of the estimated mean. We follow their approach to improve the communication efficiency and model utility of D²P-FED. Nevertheless, our work differs from theirs in that we also study how to ensure DP for rotated and quantized data transmission and prove a convergence result for the learning algorithm with both communication cost reduction and privacy protection steps in place.

On the other hand, differentially private FL has undergone rapid development during the past few years (Geyer et al. (2017); McMahan et al. (2017); Jayaraman et al. (2018)). However, these methods mainly focus on improving utility under a small privacy budget and ignore the issue of communication cost. In particular, we adopt a hybrid approach similar to Truex et al. (2019), which combines SMC with DP to reduce the noise. SMC ensures that the central server sees only the aggregated update, not the individual ones from clients; as a result, the noise added by each client can be reduced by a factor of the number of clients participating in one round. Our work differs from theirs in that we inject discrete Gaussian noise into local updates instead of continuous Gaussian noise. This allows us to use secure aggregation (Bonawitz et al., 2017), which is much cheaper than the threshold homomorphic encryption used by Truex et al. (2019). We further study the interaction between discrete Gaussian noise and secure aggregation, as well as their effects on learning convergence.

We identify cpSGD (Agarwal et al. (2018)) as the work most comparable to D²P-FED. Just like D²P-FED, cpSGD aims to improve both the communication cost and the utility under a rigorous privacy guarantee. However, cpSGD suffers from the three main defects discussed in Section 1. This paper proposes the discrete Gaussian mechanism to mitigate these issues.

3. BACKGROUND AND NOTATION

In this section, we provide an overview of FL and DP and establish the notation. We use bold lower-case letters (e.g., a, b, c) to denote vectors and bold upper-case letters (e.g., A, B, C) for matrices. We denote 1, ..., n by [n].

FL Overview. In an FL system, there are one server and n clients C_i, i ∈ [n]. The server holds a global model of dimension d. Each client holds (IID or non-IID) samples drawn from some unknown distribution D. The goal is to learn the global model w ∈ R^d that minimizes some loss function L(w, D). To achieve this, the system runs a T-round FL protocol. The server initializes


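The stochastic quantization step used to compress client updates can be sketched as follows. This is a minimal unbiased k-level quantizer in the spirit of Suresh et al. (2017); the function name and grid construction are illustrative, not the paper's exact scheme:

```python
import numpy as np

def stochastic_quantize(x, k, rng=None):
    """Unbiased stochastic quantization to k levels: each coordinate is
    snapped to one of k evenly spaced levels between min(x) and max(x),
    rounding up with exactly the probability that makes the quantized
    vector an unbiased estimate of x."""
    rng = np.random.default_rng() if rng is None else rng
    lo, hi = x.min(), x.max()
    if hi == lo:
        return x.copy()                     # all coordinates equal: nothing to do
    step = (hi - lo) / (k - 1)
    scaled = (x - lo) / step                # position measured in levels
    floor = np.floor(scaled)
    prob_up = scaled - floor                # E[level] = scaled, so E[q] = x
    levels = floor + (rng.random(x.shape) < prob_up)
    return lo + levels * step

x = np.array([0.0, 0.3, 0.7, 1.0])
q = stochastic_quantize(x, k=5)             # values land on {0, .25, .5, .75, 1}
```

Each coordinate then needs only log2(k) bits plus the two endpoints, and because the quantizer is unbiased, the error it introduces averages out across clients in the aggregated update.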