VOTING-BASED APPROACHES FOR DIFFERENTIALLY PRIVATE FEDERATED LEARNING

Abstract

While federated learning (FL) enables distributed agents to collaboratively train a centralized model without sharing data with each other, it fails to protect users against inference attacks that mine private information from the centralized model. Thus, equipping federated learning methods with differential privacy (DPFL) becomes attractive. Existing algorithms based on privately aggregating clipped gradients require many rounds of communication, may fail to converge, and cannot scale up to large-capacity models due to the explicit dimension dependence of the added noise. In this paper, we adopt the knowledge-transfer model of private learning pioneered by Papernot et al. (2017; 2018) and extend their algorithm PATE, as well as the recent alternative Private-kNN (Zhu et al., 2020), to the federated learning setting. The key difference is that our method privately aggregates the labels from the agents in a voting scheme, instead of aggregating the gradients, hence avoiding the dimension dependence and achieving significant savings in communication cost. Theoretically, we show that when the margins of the voting scores are large, the agents enjoy exponentially higher accuracy and stronger (data-dependent) differential privacy guarantees at both the agent level and the instance level. Extensive experiments show that our approach significantly improves the privacy-utility trade-off over the current state-of-the-art in DPFL.

1. INTRODUCTION

With increasing ethical and legal concerns about leveraging private data, federated learning (FL) (McMahan et al., 2017) has emerged as a paradigm that allows agents to collaboratively train a centralized model without sharing local data. In this work, we consider two typical settings of federated learning: (1) many local agents, e.g., learning user behavior over a large number of mobile devices (Hard et al., 2018); (2) a small number of local agents, each with sufficient instances, e.g., learning a health-related model across multiple hospitals without sharing patients' data (Huang et al., 2019). When implemented using secure multi-party computation (SMC) (Bonawitz et al., 2017), federated learning eliminates the need for any agent to share its local data. However, it does not protect the agents or their users from inference attacks that combine the learned model with side information. Extensive studies have established that such attacks can lead to blatant reconstruction of the proprietary datasets (Dinur & Nissim, 2003) and identification of individuals, a legal liability for the participating agents (Shokri et al., 2017). Motivated by this challenge, there have been a number of recent efforts (Truex et al., 2019b; Geyer et al., 2017; McMahan et al., 2018) to develop federated learning methods with differential privacy (DP), a well-established definition of privacy that provably prevents such attacks. Among these efforts, DP-FedAvg (Geyer et al., 2017; McMahan et al., 2018) extends the NoisySGD method (Song et al., 2013; Abadi et al., 2016) to the federated learning setting by adding Gaussian noise to the clipped accumulated gradient. The recent state-of-the-art DP-FedSGD (Truex et al., 2019b) follows the same framework but with per-sample gradient clipping.
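For concreteness, the clip-and-noise aggregation step underlying DP-FedAvg can be sketched as follows. This is a minimal illustration, not the exact algorithm: the function name and the `noise_multiplier` parameter are our own choices, and a real implementation calibrates the noise to a target privacy budget via a privacy accountant.

```python
import numpy as np

def dp_aggregate_updates(updates, clip_norm, noise_multiplier, rng=None):
    """Clip each agent's model update to L2 norm `clip_norm`, average,
    and add Gaussian noise scaled to the clipping bound (DP-FedAvg style)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    clipped = []
    for u in updates:
        norm = np.linalg.norm(u)
        scale = min(1.0, clip_norm / (norm + 1e-12))  # shrink only if norm > clip_norm
        clipped.append(u * scale)
    avg = np.mean(clipped, axis=0)
    # Noise std grows with clip_norm and shrinks with the number of agents,
    # but is added to *every* coordinate of the d-dimensional model --
    # the explicit dimension dependence discussed in the text.
    sigma = noise_multiplier * clip_norm / len(updates)
    return avg + rng.normal(0.0, sigma, size=avg.shape)
```

The bias-variance tension is visible here: a small `clip_norm` distorts large honest updates, while a large one inflates `sigma` on all d coordinates.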
A notable limitation of these gradient-based methods is that they require clipping the gradient magnitude to a threshold τ and adding noise proportional to τ to every coordinate of the shared global model with d parameters. The clipping and perturbation steps introduce either large bias (when τ is small) or large variance (when τ is large), which interferes with SGD convergence and makes it hard to scale up to large-capacity models. In Sec. 3, we concretely demonstrate these limitations with examples and theory. In particular, we show that FedAvg with gradient clipping may fail to decrease the loss function, and that DP-FedAvg requires many outer-loop iterations (i.e., many rounds of communication to synchronize model parameters) to converge under differential privacy. To avoid gradient clipping, we propose to conduct the aggregation over the label space, which has been shown to be effective in standard (non-federated) learning settings via voting-based, model-agnostic approaches (Papernot et al., 2017; 2018; Zhu et al., 2020). To achieve this, we relax the traditional federated learning setting to allow unlabeled public data at the server side. We also consider a more complete scenario for federated learning, covering both a large number and a limited number of local agents. The agent-level privacy introduced in DP-FedAvg works seamlessly in our setting with many agents. However, when there are few agents, hiding all the data belonging to one specific agent becomes burdensome or unnecessary. To this end, we provide a more complete privacy notion with two granularities: agent level and instance level. Under each setting, we theoretically and empirically show that the proposed label-aggregation method effectively removes the sensitivity issue caused by gradient clipping and noise addition, and achieves a favorable privacy-utility trade-off compared to other DPFL algorithms. Our contributions are summarized as follows:

1. We propose two voting-based DPFL algorithms via label aggregation (PATE-FL and Private-kNN-FL) and demonstrate their clear advantages over gradient-aggregation-based DPFL methods (e.g., DP-FedAvg) in terms of communication cost and scalability to high-capacity models.

2. We provide provable differential privacy guarantees under two levels of granularity: agent-level DP and instance-level DP. Each is natural in a particular regime of FL, depending on the number of agents and the size of their data.

3. Extensive evaluation demonstrates that our method improves the privacy-utility trade-off over randomized gradient-based approaches in both the agent-level and the instance-level case.

A remark on our novelty. Though PATE-FL and Private-kNN-FL are algorithmically similar to the original PATE (Papernot et al., 2018) and Private-kNN (Zhu et al., 2020), they are not the same: we are adapting them to a new problem, federated learning. The adaptation itself is nontrivial and requires substantial technical innovations. We highlight three challenges below.

• Several key DP techniques that contributed to the success of PATE and Private-kNN in the standard setting are no longer applicable (e.g., privacy amplification by sampling and noisy screening). This is partly because in standard private learning the attacker only sees the final model, whereas in FL the attacker can eavesdrop on all network traffic.

• Moreover, PATE and Private-kNN only provide instance-level DP. We show that PATE-FL and Private-kNN-FL also satisfy the stronger agent-level DP. PATE-FL's agent-level DP parameter is, surprisingly, a factor of 2 better than its instance-level DP parameter, and Private-kNN-FL in addition enjoys a factor-of-k amplification for instance-level DP.

• A key challenge of FL is the data heterogeneity of individual agents, whereas PATE randomly splits the dataset so that each teacher is identically distributed. The heterogeneity does not affect our privacy analysis, but it does make it unclear whether PATE would work. We are the first to report strong empirical evidence that PATE-style DP algorithms remain highly effective in the non-iid case.
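To make the contrast with gradient aggregation concrete, the following is a minimal sketch of PATE-style noisy label voting for a single query; the function name, the `sigma` parameter, and the Gaussian-noise choice are illustrative assumptions of ours, and the paper's analysis ties the noise scale to the privacy budget. The key point is that each agent contributes one vote regardless of model size, so the sensitivity is independent of the dimension d, and when the margin between the top two vote counts is large, the noise rarely flips the winning label.

```python
import numpy as np

def private_vote(agent_labels, num_classes, sigma, rng=None):
    """Aggregate per-agent label predictions for one unlabeled public
    query via noisy argmax (PATE-style voting)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    # Histogram of votes: changing one agent's label moves two counts by 1.
    votes = np.bincount(agent_labels, minlength=num_classes).astype(float)
    # Gaussian noise on the (num_classes)-dim histogram, not on d parameters.
    noisy = votes + rng.normal(0.0, sigma, size=num_classes)
    return int(np.argmax(noisy))
```

Communication per query is a single class index from each agent, rather than a d-dimensional model update.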

2. PRELIMINARY

In this section, we start by introducing the typical notation of federated learning and differential privacy. Then, two randomized gradient-based baselines, DP-FedAvg and DP-FedSGD, are introduced as DPFL background.

2.1 FEDERATED LEARNING

Federated learning (McMahan et al., 2017; Bonawitz et al., 2017; Mohassel & Zhang, 2017; Smith et al., 2017) is a distributed machine learning framework that allows clients to collaboratively train a global model without sharing local data. We consider N agents, where each agent i has n_i data points kept locally

