PRACTICAL LOCALLY PRIVATE FEDERATED LEARNING WITH COMMUNICATION EFFICIENCY

Anonymous

Abstract

Federated learning (FL) is a technique for training machine learning models from decentralized data sources. We study FL under local differential privacy constraints, which provide strong protection against sensitive data disclosure by obfuscating the data before it leaves the client. We identify two major concerns in designing practical privacy-preserving FL algorithms: communication efficiency and high-dimensional compatibility. We then develop a gradient-based learning algorithm called sqSGD (selective quantized stochastic gradient descent) that addresses both concerns. The proposed algorithm is based on a novel privacy-preserving quantization scheme that uses a constant number of bits per dimension per client. We improve the base algorithm in two ways: first, we apply a gradient subsampling strategy that simultaneously offers better training performance and smaller communication costs under a fixed privacy budget; second, we use randomized rotation as a preprocessing step to reduce quantization error. We also initiate a discussion of the roles of quantization and perturbation in designing FL algorithms under privacy and communication constraints. Finally, the practicality of the proposed framework is demonstrated on benchmark datasets. Experimental results show that sqSGD successfully learns large models such as LeNet and ResNet under local privacy constraints. In addition, at a fixed privacy and communication level, sqSGD significantly outperforms baseline algorithms.

1. INTRODUCTION

1.1 BACKGROUND

Federated learning (FL) (Kairouz et al., 2019; Konečnỳ et al., 2016) is a rapidly evolving application of distributed optimization to large-scale learning or estimation scenarios in which multiple entities, called clients, collaborate in solving a machine learning problem under the coordination of a central server. Each client's raw data is stored locally and never exchanged or transferred; to achieve the learning objective, the server collects only minimal information from the clients for immediate aggregation. FL is particularly suitable for mobile and edge-device applications, since (sensitive) individual data never directly leaves the device, and it has seen industrial deployments (Hard et al., 2019; Leroy et al., 2019). While FL offers significant practical privacy improvements over centralizing all the training data, it lacks a formal privacy guarantee. As discussed in Melis et al. (2018), even if only model updates (i.e., gradient updates) are transmitted, it is easy to compromise the privacy of individual clients. Differential privacy (DP) (Dwork et al., 2014) is the state-of-the-art approach to addressing such information disclosure: differentially private algorithms mask the participation of any individual by injecting algorithm-specific random noise. In the FL setting, DP is suitable for protecting against external adversaries, i.e., a malicious analyst who tries to infer individual data by observing final or intermediate model results. However, DP paradigms typically assume a trusted curator, which corresponds to the server in the FL setting. This assumption is often not satisfied in practice, where the users acting as clients may not trust the service provider acting as the server. Local differential privacy (LDP) (Kasiviswanathan et al., 2011; Dwork et al., 2014) provides privacy protection at the individual level by applying randomized mechanisms that obfuscate the data before it leaves the client.
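To make the LDP idea concrete, the following is a minimal sketch (not the sqSGD mechanism proposed in this paper) of the classic randomized-response mechanism: each client perturbs a single private bit before reporting it, so that the ratio of output likelihoods for any two inputs is bounded by e^epsilon, and the server debiases the aggregate of the noisy reports. All function names here are illustrative.

```python
import math
import random

def randomized_response(bit, epsilon):
    """Report the true bit with probability e^eps / (e^eps + 1),
    otherwise its flip; this satisfies epsilon-LDP for a single bit."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return bit if random.random() < p_truth else 1 - bit

def estimate_mean(reports, epsilon):
    """Server-side debiasing: E[report] = (1 - p) + mean * (2p - 1),
    so invert that affine map to recover the population mean."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    raw = sum(reports) / len(reports)
    return (raw - (1.0 - p)) / (2.0 * p - 1.0)

# Example: 1000 clients, 70% hold bit 1; the server never sees raw bits,
# yet the debiased aggregate concentrates around the true mean 0.7.
random.seed(0)
bits = [1] * 700 + [0] * 300
reports = [randomized_response(b, epsilon=1.0) for b in bits]
print(estimate_mean(reports, epsilon=1.0))  # close to 0.7
```

Note the privacy/utility trade-off visible in the debiasing step: as epsilon shrinks, 2p - 1 approaches 0 and the estimator's variance blows up, which is exactly why high-dimensional gradients under LDP require the careful bit budgeting discussed in this paper.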

