COMMUNICATION-COMPUTATION EFFICIENT SECURE AGGREGATION FOR FEDERATED LEARNING

Anonymous

Abstract

Federated learning has been spotlighted as a way to train neural network models using data distributed over multiple clients without the need to share private data. However, it has been shown that data privacy cannot be fully guaranteed, as adversaries may be able to extract certain information on local data from the model parameters transmitted during federated learning. A recent solution based on the secure aggregation primitive enables privacy-preserving federated learning, but at the expense of significant extra communication/computational resources. In this paper, we propose communication-computation efficient secure aggregation, which reduces the amount of communication/computational resources by a factor of at least √(n/log n) relative to the existing secure solution without sacrificing data privacy, where n is the number of clients. The key idea behind the suggested scheme is to design the topology of the secret-sharing nodes (denoted by the assignment graph G) as a sparse random graph instead of the complete graph used in the existing solution. We first obtain a sufficient condition on G to guarantee reliable and private federated learning. Afterwards, we suggest using the Erdős-Rényi graph as G, and provide theoretical guarantees on the reliability/privacy of the proposed scheme. Through extensive real-world experiments, we demonstrate that our scheme, using only 50% of the resources required by the conventional scheme, maintains virtually the same levels of reliability and data privacy in practical federated learning systems.

1. INTRODUCTION

Federated learning (McMahan et al., 2017) has been considered a promising framework for training models in a decentralized manner without explicitly sharing local private data. This framework is especially useful for predictive models that learn from private distributed data, e.g., healthcare services based on medical data distributed over multiple organizations (Brisimi et al., 2018; Xu & Wang, 2019) and text prediction based on the messages of distributed clients (Yang et al., 2018; Ramaswamy et al., 2019). In the federated learning (FL) setup, each device contributes to the global model update by transmitting its local model only; the private data is not shared across the network, which makes FL highly attractive (Kairouz et al., 2019; Yang et al., 2019). Unfortunately, FL remains vulnerable to adversarial attacks targeting data leakage. Specifically, the local model transmitted from a device contains extensive information on the training data, and an eavesdropper can estimate the data owned by the target device (Fredrikson et al., 2015; Shokri et al., 2017; Melis et al., 2019). Motivated by this issue, the authors of (Bonawitz et al., 2017) suggested secure aggregation (SA), which integrates cryptographic primitives into the FL framework to protect data privacy. However, SA requires significant additional communication and computational resources to guarantee privacy. In particular, the communication and computation burden of SA grows quadratically with the number of clients, which limits the scalability of SA.

Contributions We propose communication-computation efficient secure aggregation (CCESA), which maintains reliability and data privacy in federated learning while using fewer communication and computational resources than conventional SA. Our basic idea is illustrated in Fig. 1 for n = 5 clients: compared to the existing scheme (Bonawitz et al., 2017), which applies secret sharing to all client pairs, we suggest sharing secrets only for a subset of pairs, chosen so that data privacy is preserved.

[Figure 1: In existing SA, client 1 masks its model as θ̃_1 = θ_1 + PRG(b_1) + PRG(s_1,2) + PRG(s_1,3) + PRG(s_1,4) + PRG(s_1,5), sharing a pairwise secret with every other client. In the suggested algorithm (CCESA), client 1 masks its model as θ̃_1 = θ_1 + PRG(b_1) + PRG(s_1,3) + PRG(s_1,4), sharing secrets only with its neighbors in the assignment graph G, shown for n = 5 clients.]

Using theoretical analysis, we provide a sufficient condition on the graph topology for private and reliable federated learning. As summarized in Table 1, the proposed CCESA algorithm maintains both reliability and privacy against an eavesdropper who can access any information transmitted between every client and the server, while the required amount of resources is reduced by a factor of at least O(√(n/log n)) compared to SA. Notably, the reduction factor grows with n, suggesting that the proposed scheme is a scalable solution for privacy-preserving federated learning. Our mathematical results are also confirmed by experiments on two real datasets, the AT&T face database and CIFAR-10. In particular, under the model inversion attack on the face dataset, the results in Fig. 2 show that the suggested scheme achieves perfect data privacy using fewer resources than SA, while federated averaging without security measures (McMahan et al., 2017) significantly compromises the privacy of the data.

Related work Focusing on the collaborative learning setup with multiple clients and a server, previous works have suggested solutions to prevent information leakage on the communication links between the server and clients. One major approach utilizes the concept of differential privacy (DP) (Dwork et al., 2014) by adding artificial random noise to the transmitted models (Wei et al., 2020; Geyer et al., 2017; Truex et al., 2020) or gradients (Shokri & Shmatikov, 2015; Abadi et al., 2016; Balcan et al., 2012).
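The pairwise additive masking underlying SA and CCESA can be sketched in a minimal simulation. Following the convention of Bonawitz et al. (2017), the lower-indexed endpoint of each secret-sharing pair adds the pairwise mask and the higher-indexed endpoint subtracts it, so the masks cancel in the server's sum; key agreement, Shamir secret sharing, and dropout handling are abstracted away here, and all seed values and graph parameters are illustrative, not from the paper.

```python
import math
import numpy as np

def prg(seed, dim):
    """Deterministic pseudorandom mask expanded from a shared seed (stand-in for a real PRG)."""
    return np.random.default_rng(seed).integers(0, 1 << 16, size=dim)

n, dim = 5, 4
rng = np.random.default_rng(0)
models = [rng.integers(0, 100, size=dim) for _ in range(n)]

# Assignment graph G sampled as an Erdos-Renyi graph; p scales like log(n)/n,
# so each client shares secrets with only a subset of the others.
p = 2 * math.log(n) / n
edges = [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]

edge_seed = {e: hash(e) % (1 << 32) for e in edges}  # stand-in for pairwise key agreement
self_seed = [1000 + i for i in range(n)]

masked = []
for i in range(n):
    y = models[i] + prg(self_seed[i], dim)           # self-mask PRG(b_i)
    for (a, b) in edges:                             # pairwise masks PRG(s_{a,b})
        if i == a:
            y = y + prg(edge_seed[(a, b)], dim)      # lower-indexed endpoint adds
        elif i == b:
            y = y - prg(edge_seed[(a, b)], dim)      # higher-indexed endpoint subtracts
    masked.append(y)

# Server side: pairwise masks cancel in the sum. (In the real protocol the
# self-masks b_i are recovered via secret shares; here we subtract them directly.)
aggregate = sum(masked) - sum(prg(s, dim) for s in self_seed)
assert np.array_equal(aggregate, sum(models))
```

The cancellation holds for any edge set, which is why sparsifying G reduces cost without changing the aggregate; the privacy argument, developed later in the paper, is what constrains how sparse G may be.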
Depending on the noise distribution, DP-based collaborative learning generally exhibits a trade-off between the privacy level and the convergence of the global model. Another popular approach deploys secure multi-party computation (MPC) (Ben-Or et al., 1988; Damgård et al., 2012; Aono et al., 2017; Lindell et al., 2015; Bonawitz et al., 2017; Zhang et al., 2018; Tjell & Wisniewski, 2019; Shen et al., 2020; So et al., 2020) based on cryptographic primitives including secret sharing and homomorphic encryption (Leontiadis et al., 2014; 2015; Shi et al., 2011; Halevi et al., 2011). Although these schemes guarantee privacy, they suffer from high communication burdens when reconstructing the secret information distributed over multiple clients. A notable work (Bonawitz et al., 2017) suggested secure aggregation (SA), which tolerates multiple client failures by applying pairwise additive masking (Ács & Castelluccia, 2011; Elahi et al., 2014; Jansen & Johnson, 2016; Goryczka & Xiong, 2015). A recent work (So et al., 2020) suggested Turbo-aggregate, which partitions n computing devices into L groups and updates the global model using a circular aggregation topology. However, each client in Turbo-aggregate requires a communication cost of at least 4mnR/L bits, which is much larger than that of our scheme (CCESA), requiring √(n log n)(2a_K + 5a_S) + mR bits¹. For example,



¹ Here, m is the number of model parameters, each represented in R bits; a_K is the number of bits required for exchanging public keys, and a_S is the number of bits in a secret share.
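To make the two cost expressions concrete, the sketch below plugs assumed parameter values into the formulas quoted above; the specific constants (m, R, a_K, a_S, n, L) are placeholders chosen only for illustration and are not values reported in the paper.

```python
import math

# Assumed, illustrative parameter values (not taken from the paper)
m = 100_000   # number of model parameters
R = 32        # bits per parameter
a_K = 256     # bits to exchange a public key
a_S = 256     # bits in a secret share
n = 1_000     # number of clients
L = 10        # number of Turbo-aggregate groups

# Per-client communication cost in bits, per the formulas quoted above
turbo_aggregate = 4 * m * n * R / L
ccesa = math.sqrt(n * math.log(n)) * (2 * a_K + 5 * a_S) + m * R

print(f"Turbo-aggregate: {turbo_aggregate:.2e} bits per client")
print(f"CCESA:           {ccesa:.2e} bits per client")
```

Under these values the CCESA cost is dominated by the mR bits of the model itself, with the secret-sharing overhead growing only as √(n log n), whereas the Turbo-aggregate cost grows linearly in n.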



Figure 2: A training image and the images reconstructed via model inversion attacks under (a) the proposed scheme (CCESA), (b) existing secure aggregation (SA) (Bonawitz et al., 2017), and (c) federated averaging with no security measures (McMahan et al., 2017). Federated averaging leaks private data from the transmitted model, while SA and the proposed CCESA do not. Note that CCESA requires only 40% of the communication/computational resources of SA. Additional examples are given in the Supplementary Materials.

