COMMUNICATION-COMPUTATION EFFICIENT SECURE AGGREGATION FOR FEDERATED LEARNING

Anonymous

Abstract

Federated learning has been spotlighted as a way to train neural network models using data distributed over multiple clients, without the need to share private data. Unfortunately, however, it has been shown that data privacy cannot be fully guaranteed, as adversaries may be able to extract certain information on local data from the model parameters transmitted during federated learning. A recent solution based on the secure aggregation primitive enables privacy-preserving federated learning, but at the expense of significant extra communication/computational resources. In this paper, we propose communication-computation efficient secure aggregation, which reduces the amount of communication/computational resources by a factor of at least √(n/log n) relative to the existing secure solution without sacrificing data privacy, where n is the number of clients. The key idea behind the suggested scheme is to design the topology of the secret-sharing nodes (denoted by the assignment graph G) as a sparse random graph instead of the complete graph used by the existing solution. We first obtain a sufficient condition on G to guarantee reliable and private federated learning. Afterwards, we suggest using the Erdős-Rényi graph as G and provide theoretical guarantees on the reliability/privacy of the proposed scheme. Through extensive real-world experiments, we demonstrate that our scheme, using only 50% of the resources required by the conventional scheme, maintains virtually the same levels of reliability and data privacy in practical federated learning systems.

1. INTRODUCTION

Federated learning (McMahan et al., 2017) has been considered a promising framework for training models in a decentralized manner without explicitly sharing local private data. This framework is especially useful for predictive models that learn from private distributed data, e.g., healthcare services based on medical data distributed over multiple organizations (Brisimi et al., 2018; Xu & Wang, 2019) and text prediction based on the messages of distributed clients (Yang et al., 2018; Ramaswamy et al., 2019). In the federated learning (FL) setup, each device contributes to the global model update by transmitting its local model only; the private data is not shared across the network, which makes FL highly attractive (Kairouz et al., 2019; Yang et al., 2019). Unfortunately, however, FL is still vulnerable to adversarial attacks targeting data leakage. Specifically, the local model transmitted from a device contains extensive information on the training data, and an eavesdropper can estimate the data owned by the target device (Fredrikson et al., 2015; Shokri et al., 2017; Melis et al., 2019). Motivated by this issue, the authors of (Bonawitz et al., 2017) suggested secure aggregation (SA), which integrates cryptographic primitives into the FL framework to protect data privacy. However, SA requires significant additional communication and computation resources to guarantee privacy. In particular, the communication and computation burden of SA increases as a quadratic function of the number of clients, which limits the scalability of SA.

Contributions. We propose communication-computation efficient secure aggregation (CCESA), which maintains the reliability and data privacy of federated learning with reduced communication and computation resources compared to conventional SA. Our basic idea is illustrated in Fig.
1 for n = 5 clients. Compared to the existing scheme (Bonawitz et al., 2017), which applies secret sharing to all client pairs, we suggest sharing secrets for only a subset of pairs, chosen so that data privacy is preserved. Using theoretical analysis, we provide a sufficient condition on the graph topology for private and reliable federated learning.

Figure 1: Conventional secure aggregation (SA) (Bonawitz et al., 2017) versus the suggested communication-computation efficient secure aggregation (CCESA): (a) secure aggregation (SA); (b) the suggested algorithm (CCESA). Via selective secret sharing across only a subset of client pairs, the proposed algorithm reduces the communication cost (for exchanging public keys and secret shares among clients) and the computational cost (for generating secret shares and pseudo-random values, and performing key agreements) compared to the existing fully-shared method. CCESA still maintains virtually the same levels of reliability and privacy, as proven by the theoretical analysis of Section 4.

Figure 2: A training image and the reconstructed images under model inversion attacks with (a) the proposed scheme (CCESA), (b) existing secure aggregation (SA) (Bonawitz et al., 2017), and (c) federated averaging with no security measures (McMahan et al., 2017). The federated averaging scheme leaks private data from the transmitted model, while SA and the proposed CCESA do not. Note that the required communication/computational resources of CCESA are only 40% of those of SA. Additional examples are given in Supplementary Materials.
As summarized in Table 1, the proposed CCESA algorithm maintains both reliability and privacy against an eavesdropper who can access any information transmitted between every client and the server, while the required amount of resources is reduced by a factor of at least √(n/log n) compared to SA. Notably, the reduction factor grows with increasing n, suggesting that the proposed scheme is a scalable solution for privacy-preserving federated learning. Our mathematical results are also confirmed in experiments on two real datasets, the AT&T face database and CIFAR-10. In particular, under the model inversion attack on the face dataset, the results in Fig. 2 show that the suggested scheme achieves perfect data privacy using fewer resources than SA, while federated averaging without security measures (McMahan et al., 2017) significantly compromises the privacy of the data.

Related work

Focusing on the collaborative learning setup with multiple clients and a server, previous works have suggested solutions to prevent information leakage on the communication links between the server and clients. One major approach utilizes the concept of differential privacy (DP) (Dwork et al., 2014) by adding artificial random noise to the transmitted models (Wei et al., 2020; Geyer et al., 2017; Truex et al., 2020) or gradients (Shokri & Shmatikov, 2015; Abadi et al., 2016; Balcan et al., 2012). Depending on the noise distribution, DP-based collaborative learning generally exhibits a trade-off between the privacy level and the convergence of the global model. Another popular approach is deploying secure multiparty computation (MPC) (Ben-Or et al., 1988; Damgård et al., 2012; Aono et al., 2017; Lindell et al., 2015; Bonawitz et al., 2017; Zhang et al., 2018; Tjell & Wisniewski, 2019; Shen et al., 2020; So et al., 2020) based on cryptographic primitives including secret sharing and homomorphic encryption (Leontiadis et al., 2014; 2015; Shi et al., 2011; Halevi et al., 2011). Although these schemes guarantee privacy, they suffer from high communication burdens when reconstructing the secret information distributed over multiple clients. A notable work (Bonawitz et al., 2017) suggested secure aggregation (SA), which tolerates multiple client failures by applying pairwise additive masking (Ács & Castelluccia, 2011; Elahi et al., 2014; Jansen & Johnson, 2016; Goryczka & Xiong, 2015). A recent work (So et al., 2020) suggested Turbo-aggregate, which partitions n computing devices into L groups and updates the global model by utilizing a circular aggregation topology. However, each client in Turbo-aggregate requires a communication cost of at least 4mnR/L bits, which is much larger than that of our scheme (CCESA), requiring √(n log n)(2a_K + 5a_S) + mR bits.
For example, in a practical scenario with m = 10^6, R = 32, n = 100, L = 10 and a_K = a_S = 256, our scheme requires only 3% of the communication bandwidth used in Turbo-aggregate.

Table 1: Communication and computation cost of the proposed CCESA algorithm, secure aggregation (SA) (Bonawitz et al., 2017), and federated averaging (FedAvg) (McMahan et al., 2017). Detailed expressions and derivations are in Section 4 and Supplementary Materials, respectively.

                           CCESA                        SA                  FedAvg
Communication (client)     O(√(n log n) + m)            O(n + m)            O(m)
Communication (server)     O(n√(n log n) + mn)          O(n^2 + mn)         O(mn)
Computation (client)       O(n log n + m√(n log n))     O(n^2 + mn)         0
Computation (server)       O(mn log n)                  O(mn^2)             O(mn)
Reliability                ≥ 1 − O(n e^(−√(n log n)))   ≥ 1 − O(n e^(−n))   1
Privacy                    ≥ 1 − O(n^(−C)) for C > 0    1                   0

The idea of replacing the complete graph with a low-degree graph for communication efficiency has been studied in the areas of distributed learning (Charles et al., 2017; Sohn et al., 2020) and multi-party computation (Fitzi et al., 2007; Harnik et al., 2007). A very recent paper (Bell et al., 2020) proposed new protocols using k-regular graphs for communication in the secure aggregation (Bonawitz et al., 2017) framework, which are robust against semi-honest and semi-malicious threat models, respectively. However, the results in (Bell et al., 2020) are based on a strong assumption on the number of clients dropping out of the protocol, while our work makes no assumption on the number of dropouts.

2. BACKGROUND

Federated learning Consider a scenario with one server and n clients. Each client i has its local training dataset D_i = {(x_{i,k}, y_{i,k})}_{k=1}^{N_i}, where x_{i,k} and y_{i,k} are the feature vector and the label of the k-th training sample, respectively. For each round t, the server first selects a set S_t of cn clients (0 < c ≤ 1) and sends the current global model θ_t to the selected clients; each selected client updates the model using its local data and returns the result to the server for aggregation.

Cryptographic primitives for preserving privacy Here we review the three cryptographic tools used in SA (Bonawitz et al., 2017). First, t-out-of-n secret sharing (Shamir, 1979) splits a secret s into n shares such that any t shares can reconstruct s, while any t − 1 shares provide absolutely no information on s. We denote t-out-of-n secret sharing by s --(t,n)--> (s_k)_{k∈[n]}, where s_k indicates the k-th share of secret s and [n] represents the index set {1, 2, ..., n}. Second, the Diffie-Hellman key agreement is used to generate a secret s_{i,j} shared only by two target clients i, j ∈ [n]. The key agreement scheme designs public-private key pairs (s_u^{PK}, s_u^{SK}) for clients u ∈ [n] such that s_{i,j} = f(s_i^{PK}, s_j^{SK}) = f(s_j^{PK}, s_i^{SK}) holds for all i, j ∈ [n], where f is a key agreement function. The secret s_{i,j} remains unknown when neither s_i^{SK} nor s_j^{SK} is provided. Third, symmetric authenticated encryption encrypts/decrypts a message m using a key k shared by two target clients, which guarantees the integrity of the messages exchanged by the two clients.

Secure aggregation (Bonawitz et al., 2017) Let V_0 = V = [n] be the initial set of clients. The objective of the server is to obtain the sum of models Σ_i θ_i without learning any other information on the private local models. In Step 0, client i ∈ V_0 generates key pairs (s_i^{PK}, s_i^{SK}) and (c_i^{PK}, c_i^{SK}) using a key agreement scheme. Then, client i advertises its public keys (s_i^{PK}, c_i^{PK}) to the server.
The server collects the public keys from a client set V_1 ⊂ V_0 and broadcasts {(i, s_i^{PK}, c_i^{PK})}_{i∈V_1} to all clients in V_1. In Step 1, client i generates a random element b_i and applies t-out-of-n secret sharing to generate n shares of b_i and s_i^{SK}, i.e., b_i --(t,n)--> (b_{i,j})_{j∈[n]} and s_i^{SK} --(t,n)--> (s_{i,j}^{SK})_{j∈[n]}. Using the symmetric authenticated encryption, client i computes the ciphertext e_{i,j} for all j ∈ V_1\{i}, taking b_{i,j} and s_{i,j}^{SK} as messages and c_{i,j} = f(c_j^{PK}, c_i^{SK}) as a key, where f is the key agreement function. Finally, client i sends {(i, j, e_{i,j})}_{j∈V_1\{i}} to the server. The server collects the messages from at least t clients (denote this set of clients as V_2 ⊂ V_1) and sends {(i, j, e_{i,j})}_{i∈V_2} to each client j ∈ V_2. In Step 2, client i computes the shared secret s_{i,j} = f(s_j^{PK}, s_i^{SK}) for all j ∈ V_2\{i}. Then, client i computes the masked private vector

θ̃_i = θ_i + PRG(b_i) + Σ_{j∈V_2; i<j} PRG(s_{i,j}) − Σ_{j∈V_2; i>j} PRG(s_{i,j}),

and sends θ̃_i to the server, where PRG(x) indicates a pseudorandom generator with seed x, outputting a vector of the same dimension as θ_i. Note that the masked vector θ̃_i gives no information on the private vector θ_i unless both s_i^{SK} and b_i are revealed. The server collects θ̃_i from at least t clients (denote this set as V_3 ⊂ V_2) and sends V_3 to each client i ∈ V_3. In Step 3, client j decrypts the ciphertexts {e_{i,j}}_{i∈V_2\{j}} using the key c_{i,j} = f(c_i^{PK}, c_j^{SK}) to obtain {b_{i,j}}_{i∈V_2\{j}} and {s_{i,j}^{SK}}_{i∈V_2\{j}}. Each client j then sends the set of shares {b_{i,j}}_{i∈V_3} ∪ {s_{i,j}^{SK}}_{i∈V_2\V_3} to the server. The server collects the responses from at least t clients (denote this set of clients as V_4 ⊂ V_3). For each client i ∈ V_3, the server reconstructs b_i from {b_{i,j}}_{j∈V_4} and computes PRG(b_i). Similarly, for each client i ∈ V_2\V_3, the server reconstructs s_i^{SK} from {s_{i,j}^{SK}}_{j∈V_4} and computes PRG(s_{i,j}) for all j ∈ V_2.
Using this information, the server obtains the sum of the private local models by computing

Σ_{i∈V_3} θ_i = Σ_{i∈V_3} θ̃_i − Σ_{i∈V_3} PRG(b_i) − Σ_{j∈V_2, i∈V_2\V_3; i<j} PRG(s_{i,j}) + Σ_{j∈V_2, i∈V_2\V_3; i>j} PRG(s_{i,j}).   (2)
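As a concrete illustration, the t-out-of-n secret sharing primitive used in Steps 1 and 3 can be sketched in a few lines. This is a minimal Shamir-style construction over a prime field, written purely for exposition; the prime and the helper names are our own choices, not the hardened primitive a real SA deployment would use.

```python
import random

P = 2**61 - 1  # a Mersenne prime; secrets and shares live in GF(P)

def share(secret, t, n):
    """Split `secret` into n shares so that any t of them reconstruct it:
    evaluate a random degree-(t-1) polynomial with constant term `secret`."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    def poly(x):
        return sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P
    return [(x, poly(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Recover the secret by Lagrange interpolation at x = 0 over GF(P)."""
    secret = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = num * (-xm) % P
                den = den * (xj - xm) % P
        # pow(den, P-2, P) is the modular inverse of den (Fermat's little theorem)
        secret = (secret + yj * num * pow(den, P - 2, P)) % P
    return secret

shares = share(123456789, t=3, n=5)
assert reconstruct(shares[:3]) == 123456789   # any 3 shares suffice
assert reconstruct(shares[1:4]) == 123456789  # a different triple works too
```

Fewer than t shares leave the constant term information-theoretically undetermined, which is exactly the property SA relies on when distributing shares of b_i and s_i^{SK}.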

3. SUGGESTED ALGORITHM

In secure aggregation (Bonawitz et al., 2017), the public keys (c_i^{PK}, s_i^{PK}) and the shares of secrets (s_i^{SK}, b_i) are transmitted between clients and the server, which requires additional communication/computational resources compared with vanilla federated learning. Specifically, since each client i needs to receive information from all other clients j ≠ i, the required amount of resources increases as a quadratic function of the number of clients n. In this paper, we suggest a variant of secure aggregation, dubbed communication-computation efficient secure aggregation (CCESA), which provides a more scalable solution for privacy-preserving federated learning by improving communication/computational efficiency. The basic idea of the proposed algorithm is to let each client share its public keys and secret shares with a subset of the other clients, instead of with all other clients. By doing so, compared with SA, the suggested scheme achieves two advantages in resource efficiency without losing the reliability of the learning algorithm or data privacy. The first advantage is a reduction of the communication cost, since each node shares its public keys and secrets with fewer clients. The second advantage is a reduction of the computational cost of each client, since a smaller number of masks is used in computing its masked private vector. The proposed algorithm is specified by the assignment graph, which represents how public keys and secret shares are assigned to the other clients. Given n clients, the assignment graph G = (V, E) consists of n vertices, where the vertex and edge sets of G are denoted by V and E, respectively. We set V = [n], where each index i ∈ V represents client i, and the edge {i, j} ∈ E connecting vertices i and j indicates that clients i and j exchange their public keys and secret shares.
For vertex i ∈ [n], we define Adj(i) := {j : {i, j} ∈ E} as the index set of vertices adjacent to vertex i. In our algorithm, the public keys and secrets of client i are shared with clients j ∈ Adj(i). Now, using the assignment graph notation, we formally define the suggested algorithm. Due to the space limitation, we put the full algorithm in Supplementary Materials A; here we only describe what differs from SA. In Step 0, instead of broadcasting the public keys (c_j^{PK}, s_j^{PK}) of client j to all other clients, the server sends the public keys only to the clients i satisfying j ∈ Adj(i) ∩ V_1. In Step 1, each client i ∈ V_1 uses a t_i-out-of-(|Adj(i)|+1) secret sharing scheme to generate shares of s_i^{SK} and b_i, i.e., s_i^{SK} --(t_i, |Adj(i)|+1)--> (s_{i,j}^{SK})_{j∈Adj(i)∪{i}} and b_i --(t_i, |Adj(i)|+1)--> (b_{i,j})_{j∈Adj(i)∪{i}}, and sends the encrypted s_{i,j}^{SK} and b_{i,j} to client j through the server. In Step 2, client i computes the masked private model

θ̃_i = θ_i + PRG(b_i) + Σ_{j∈V_2∩Adj(i); i<j} PRG(s_{i,j}) − Σ_{j∈V_2∩Adj(i); i>j} PRG(s_{i,j}),

and transmits θ̃_i to the server. In Step 3, client i sends b_{j,i} to the server for all j ∈ V_3 ∩ Adj(i), and sends s_{j,i}^{SK} to the server for all j ∈ (V_2\V_3) ∩ Adj(i). After reconstructing the secrets from the shares, the server obtains the sum of the local models θ_i as

Σ_{i∈V_3} θ_i = Σ_{i∈V_3} θ̃_i − Σ_{i∈V_3} PRG(b_i) − Σ_{i∈V_2\V_3, j∈Adj(i)∩V_3; i>j} PRG(s_{i,j}) + Σ_{i∈V_2\V_3, j∈Adj(i)∩V_3; i<j} PRG(s_{i,j}).   (4)

Note that the suggested protocol with the complete assignment graph on n vertices reduces to SA. Here we define several notations representing the evolution of the assignment graph G as some of the nodes may drop out of the system in each step. Recall that V_0 = V and V_{i+1} is defined as the set of nodes surviving Step i ∈ {0, ..., 3}. Let us define G_i as the induced subgraph of G whose vertex set is V_i, i.e., G_i := G − (V\V_i). Then, G_{i+1} represents how the nodes that survived until Step i are connected.
We define the evolution of the assignment graph during the protocol as G = (G_0, G_1, ..., G_4). In Fig. 1, we illustrate an example of the suggested algorithm with n = 5 clients. Fig. 1a corresponds to SA (Bonawitz et al., 2017), while Fig. 1b depicts the proposed scheme. Here, we focus on the required communication/computational resources of client 1. Note that each client exchanges public keys and secret shares with its adjacent clients. For example, client 1 exchanges data with four other clients in the conventional scheme, while client 1 exchanges data only with clients 3 and 5 in the suggested scheme. Thus, the proposed CCESA requires only half of the bandwidth of the conventional scheme. In addition, CCESA requires fewer computational resources than the conventional scheme, since each client generates fewer secret shares and pseudo-random values, and performs fewer key agreements.
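The mask-cancellation mechanism behind the CCESA aggregation rule can be made concrete with a toy, dropout-free sketch: each client adds signed pads only on the edges of a small assignment graph, and the pads cancel pairwise in the aggregate. The graph, dimensions, and modulus below are arbitrary illustrative choices, and the self masks b_i are omitted since without dropouts the server would simply subtract them.

```python
import random

def prg(seed, dim, M=2**16):
    """Deterministic pseudo-random pad of length `dim` over Z_M (stand-in for PRG)."""
    rng = random.Random(seed)
    return [rng.randrange(M) for _ in range(dim)]

def ccesa_masked(theta, i, adj, s, M=2**16):
    """Mask client i's model with pairwise pads only for neighbors j in Adj(i)."""
    out = list(theta)
    for j in adj[i]:
        sign = 1 if i < j else -1  # +pad at the lower-index endpoint, -pad at the other
        pad = prg(s[min(i, j), max(i, j)], len(theta))
        out = [(x + sign * y) % M for x, y in zip(out, pad)]
    return out

n, dim, M = 5, 2, 2**16
edges = {(0, 2), (0, 4), (1, 2), (1, 3), (2, 4), (3, 4)}  # a sparse assignment graph
adj = {i: [j for e in edges for j in e if i in e and j != i] for i in range(n)}
s = {e: random.randrange(M) for e in edges}               # one pairwise seed per edge
models = {i: [random.randrange(M) for _ in range(dim)] for i in range(n)}
masked = {i: ccesa_masked(models[i], i, adj, s) for i in range(n)}

# Each edge contributes +pad to one endpoint and -pad to the other, so the sum of
# masked models equals the sum of true models, while only |E| pads are generated
# instead of the n(n-1)/2 pads of SA's complete graph.
total = [sum(masked[i][k] for i in range(n)) % M for k in range(dim)]
assert total == [sum(models[i][k] for i in range(n)) % M for k in range(dim)]
```

No individual masked vector reveals its model, yet the server recovers the exact sum; the full protocol adds the self masks and secret sharing to handle dropouts.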

4.1. PERFORMANCE METRICS

The proposed CCESA algorithm aims at providing a private, reliable, and resource-efficient solution for federated learning. Here, we define key performance metrics for federated learning systems including federated averaging (McMahan et al., 2017), SA (Bonawitz et al., 2017), and CCESA. Recall that the server receives (masked) model parameters θ̃_i from clients i ∈ V_3 and wants to update the global model as the sum of the unmasked model parameters, i.e., θ_global ← Σ_{i∈V_3} θ_i. The condition for a successful (or reliable) global model update is stated as follows. Definition 1. A system is called reliable if the server successfully obtains the sum of the model parameters Σ_{i∈V_3} θ_i aggregated over the distributed clients. Now, we define private federated learning. In our analysis, we focus on a passive eavesdropper who can access any information transmitted between any client and the server throughout the execution of the CCESA algorithm, namely, the public keys of clients, secret shares, masked local models, and the indices of surviving clients V_3. Once an eavesdropper obtains the local model θ_i of client i, it can reconstruct the private data (e.g., face images or medical records) of the client, or can identify the client holding the target data. In general, if an eavesdropper obtains the sum of the local models of a subset T of clients, a similar privacy attack is possible for that subset of clients. Thus, to preserve data privacy, the information on any partial sum of model parameters must be protected against the eavesdropper; we formalize this below. Definition 2. A system is called private if H(Σ_{i∈T} θ_i) = H(Σ_{i∈T} θ_i | E) holds for all T satisfying T ⊂ V_3 and T ∉ {∅, V_3}. Here, H is the entropy function and E is the information accessible to the eavesdropper. When both the reliability and privacy conditions hold, the server successfully updates the global model, while an eavesdropper cannot extract data in the information-theoretic sense.
We define P_e^{(r)} and P_e^{(p)} as the probabilities that the reliability and privacy conditions do not hold, respectively.

4.2. RESULTS FOR GENERAL ASSIGNMENT GRAPH G

Recall that our proposed scheme is specified by the assignment graph G. Here we provide a mathematical analysis of the reliability and privacy metrics in terms of the graph G. To be specific, the theorems below provide necessary and sufficient conditions on the assignment graph G to enable reliable/private federated learning, where reliability and privacy are defined in Definitions 1 and 2. Before going into the details, we first define informative nodes. Definition 3. A node i ∈ V_0 is informative if |(Adj(i) ∪ {i}) ∩ V_4| ≥ t_i holds. Note that node i is called informative when the server can reconstruct the secrets (b_i or s_i^{SK}) of node i in Step 3 of the algorithm. Using this definition, we state the condition on the graph G for a reliable system as follows. Theorem 1. The system is reliable if and only if node i is informative for all i ∈ V_3^+, where V_3^+ = V_3 ∪ {i ∈ V_2 : Adj(i) ∩ V_3 ≠ ∅} is the union of V_3 and the neighborhood of V_3 within V_2. Proof. The full proof is given in Supplementary Materials; here we provide a proof sketch. Recall that the server receives the sum of masked models Σ_{i∈V_3} θ̃_i, while the system is said to be reliable if the server obtains the sum of unmasked models Σ_{i∈V_3} θ_i. Thus, the reliability condition holds if and only if the server can cancel out the random terms in (4), which is possible when either s_i^{SK} or b_i is recovered for all i ∈ V_3^+. Since a secret is recovered if and only if at least t_i shares are gathered from adjacent nodes, we need |(Adj(i) ∪ {i}) ∩ V_4| ≥ t_i, which completes the proof.
Now, before moving on to the next theorem, we define some sets of graph evolutions:

G_C = {G = (G_0, G_1, ..., G_4) : G_3 is connected},
G_D = {G = (G_0, G_1, ..., G_4) : G_3 is not connected},
G_NI = {G ∈ G_D : ∀l ∈ [κ], ∃i ∈ C_l^+ such that node i is not informative}.

Here, when G_3 is a disconnected graph with κ ≥ 2 components, C_l is defined as the vertex set of the l-th component, and C_l^+ := C_l ∪ {i ∈ V_2 : Adj(i) ∩ C_l ≠ ∅}. Using this definition, we state a sufficient condition on the assignment graph for private federated learning. Lemma 1. The system is private if G ∈ G_C. Proof. Again, we provide only a proof sketch here; the full proof is in Supplementary Materials. Note that G_3 is the induced subgraph of G whose vertex set is V_3. Suppose an eavesdropper has access to the masked local models {θ̃_i}_{i∈T} of a subset T ⊂ V_3 of nodes. The question is whether this eavesdropper can recover the sum of the unmasked models Σ_{i∈T} θ_i. If G_3 is connected, there exists an edge e = {p, q} such that p ∈ T and q ∈ V_3\T. Note that Σ_{i∈T} θ̃_i contains the PRG(s_{p,q}) term, while s_{p,q} is not accessible to the eavesdropper since p, q ∈ V_3. Thus, from (4), the eavesdropper cannot obtain Σ_{i∈T} θ_i, which completes the proof. Based on the lemma above, we state the necessary and sufficient condition for a private system, the proof of which is given in Supplementary Materials. Theorem 2. The system is private if and only if G ∈ G_C ∪ G_NI. The theorems above provide guidelines on how to construct the assignment graph G to enable reliable and private federated learning. These guidelines can be further specified when the Erdős-Rényi graph is used as the assignment graph G. In the next section, we explore how the Erdős-Rényi graph can be used for reliable and private federated learning.
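The combinatorial core of Lemma 1 can be checked directly on a toy graph: if G_3 is connected, every nonempty proper subset T of its vertices has an edge crossing to its complement, so the partial sum Σ_{i∈T} θ̃_i always retains an uncancelled pad. A brute-force check on a 5-cycle (an arbitrary connected example of our choosing):

```python
import itertools

def has_crossing_edge(T, edges):
    """True if some edge has exactly one endpoint inside T.
    For a connected graph this holds for every nonempty proper subset T,
    which is why a connected G_3 leaves an uncancelled PRG pad in any partial sum."""
    return any((i in T) != (j in T) for (i, j) in edges)

nodes = range(5)
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4)]  # a connected 5-cycle

# Exhaustively verify the claim for all nonempty proper subsets of the vertex set.
for r in range(1, 5):
    for T in itertools.combinations(nodes, r):
        assert has_crossing_edge(set(T), edges)
```

The same check fails on a disconnected graph (take T to be one component), which is exactly the case Theorem 2 handles via the non-informative-node condition of G_NI.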

4.3. RESULTS FOR ERDŐS-RÉNYI ASSIGNMENT GRAPH G

The Erdős-Rényi graph G ∈ G(n, p) is a random graph on n nodes where each of the possible edges between two nodes is present independently with probability p. Define CCESA(n, p) as the proposed scheme using an assignment graph G ∈ G(n, p). According to the analysis provided in this section, CCESA(n, p) almost surely satisfies both the reliability and privacy conditions, provided that the connection probability p is chosen appropriately. Throughout the analysis below, we assume that each client independently drops out with probability q at each step (from Step 0 to Step 3), and that the secret sharing parameter t_i is set to t for all i ∈ [n].
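Sampling an Erdős-Rényi assignment graph and checking its connectivity needs only the standard library. The sampler and BFS check below are generic; the choice p = 2 log(n)/n is merely an illustrative value above the classical connectivity threshold log(n)/n, not the paper's threshold p'.

```python
import math
import random
from collections import deque

def erdos_renyi(n, p, rng):
    """Sample G(n, p): each of the n(n-1)/2 possible edges appears independently w.p. p."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def is_connected(adj):
    """BFS from node 0; the graph is connected iff every node is reached."""
    seen, queue = {0}, deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u] - seen:
            seen.add(v)
            queue.append(v)
    return len(seen) == len(adj)

rng = random.Random(0)
n = 200
p = 2 * math.log(n) / n  # above the classical connectivity threshold log(n)/n
trials = [is_connected(erdos_renyi(n, p, rng)) for _ in range(50)]
assert sum(trials) >= 45  # connected in the vast majority of sampled graphs
```

Above the threshold, disconnection probability decays roughly as 1/n per sample, which is the behavior our privacy analysis leans on via the connectivity of G_3.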

4.3.1. FOR ASYMPTOTICALLY LARGE n

We start with the analysis of CCESA(n, p) when n is asymptotically large. The following two theorems provide lower bounds on p that satisfy the reliability/privacy conditions. The proofs are provided in Supplementary Materials.

Theorem 3. CCESA(n, p) is asymptotically almost surely reliable if p > (3√((n−1) log(n−1)) − 1) / ((n−1)(2(1−q)^4 − 1)).

Theorem 4. CCESA(n, p) is asymptotically almost surely private if p > log(n(1−q)^3 − √(n log n)) / (n(1−q)^3 − √(n log n)).

From these theorems, the condition for achieving both reliability and privacy is obtained as follows.

Remark 1. Let p' = max{ log(n(1−q)^3 − √(n log n)) / (n(1−q)^3 − √(n log n)), (3√((n−1) log(n−1)) − 1) / ((n−1)(2(1−q)^4 − 1)) }.   (5)

If p > p', then CCESA(n, p) is asymptotically almost surely (a.a.s.) reliable and private. Note that the threshold connection probability p' is a decreasing function of n. Thus, the proposed algorithm becomes more resource-efficient than SA as n grows, improving the scalability of the system. In the remarks below, we compare SA and the proposed CCESA in terms of the amount of communication/computational resources required to achieve both reliability and privacy. These results are summarized in Table 1. Remark 2. Let B be the amount of additional communication bandwidth used at each client, compared to that of federated averaging (McMahan et al., 2017). Since the bandwidth is proportional to np, we have B_CCESA(n,p) ~ O(√(n log n)) and B_SA ~ O(n). Thus, the suggested CCESA protocol utilizes a much smaller bandwidth than SA (Bonawitz et al., 2017). The detailed comparison is given in Section D.1 of the Supplementary Materials. Remark 3. Compared to SA, the proposed CCESA algorithm generates a smaller number of secret shares and pseudo-random values, and performs fewer key agreements. Thus, the computational burden at the server and the clients is reduced by a factor of at least O(√(n/log n)). The detailed comparison is given in Section D.2 of the Supplementary Materials.
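As a quick numerical sanity check on the scaling claims, the privacy threshold of Theorem 4 (under our reading of the bound) can be evaluated directly; the dropout rate q = 0.05 is an arbitrary illustrative value, not one from the paper's experiments.

```python
import math

def privacy_threshold(n, q):
    """Theorem 4 threshold, as we read it: p > log(N)/N,
    where N = n(1-q)^3 - sqrt(n log n) plays the role of the surviving-client count."""
    N = n * (1 - q) ** 3 - math.sqrt(n * math.log(n))
    return math.log(N) / N

thresholds = [privacy_threshold(n, q=0.05) for n in (100, 1000, 10000)]

# The threshold decays with n, so the per-client overhead n*p grows far more
# slowly than the O(n) overhead of SA's complete graph.
assert thresholds[0] > thresholds[1] > thresholds[2]
assert 10000 * thresholds[2] < 100
```

This matches Remark 1's observation that p' decreases in n, which is precisely what makes CCESA increasingly resource-efficient at scale.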

4.3.2. FOR FINITE n

We now discuss the performance of the suggested scheme for finite n. Let P_e^{(p)} be the probability that CCESA(n, p) does not satisfy the privacy condition, and let P_e^{(r)} be the probability that CCESA(n, p) is not reliable. Below we provide upper bounds on P_e^{(p)} and P_e^{(r)}.

Theorem 5. For arbitrary n, p, q and t, the error probability for reliability P_e^{(r)} is bounded by P_e^{(r)} ≤ n e^{−(n−1) D_KL((t−1)/(n−1) || p(1−q)^4)}, where D_KL is the Kullback-Leibler (KL) divergence.

Theorem 6. For arbitrary n, p and q, the error probability for privacy P_e^{(p)} is bounded by P_e^{(p)} ≤ Σ_{m=0}^{n} C(n, m) (1−q)^{3m} (1−(1−q)^3)^{n−m} Σ_{k=1}^{⌊m/2⌋} C(m, k) (1−p)^{k(m−k)}.

Fig. 3 plots the upper bounds obtained in Theorems 5 and 6 when p = p', where p' is the threshold connection probability for achieving both reliability and privacy as in (5). Here, q_total := 1 − (1−q)^4 is defined as the dropout probability over the entire protocol (from Step 0 to Step 3). Note that the upper bounds in Theorems 5 and 6 are decreasing functions of p; therefore, the plotted values in Fig. 3 are also upper bounds on the error probabilities for arbitrary p > p'. It is shown that a system running the suggested algorithm is private and reliable with high probability for an arbitrarily chosen p > p'. The error probability for privacy P_e^{(p)} is below 10^{−40}, which is negligible even for small n. The error probability for reliability P_e^{(r)} is below 10^{−2}, which means that in at most one round out of 100 federated learning rounds, the (masked) models {θ̃_i}_{i∈V_3} received by the server cannot be converted to the sum of (unmasked) local models Σ_{i∈V_3} θ_i. Even in such a round, the server is aware that the current round is not reliable and may simply maintain the global model used in the previous round. This does not harm the accuracy of our scheme, as shown in the experimental results of Section 5.
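The bound of Theorem 5 is easy to evaluate numerically. The sketch below implements it under our reading, with the convention that the bound is reported as trivial when (t−1)/(n−1) is not below the expected surviving-neighbor fraction p(1−q)^4; the parameter values are illustrative only.

```python
import math

def kl(a, b):
    """KL divergence D_KL(Bernoulli(a) || Bernoulli(b)), for 0 < a, b < 1."""
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))

def reliability_bound(n, p, q, t):
    """Theorem 5 upper bound on P_e^(r):  n * exp(-(n-1) * D_KL((t-1)/(n-1) || p(1-q)^4))."""
    a, b = (t - 1) / (n - 1), p * (1 - q) ** 4
    if a >= b:
        return 1.0  # the large-deviation bound is vacuous in this regime
    return min(1.0, n * math.exp(-(n - 1) * kl(a, b)))

# With t well below the expected number of surviving neighbors, the bound is tiny.
bound = reliability_bound(n=100, p=0.5, q=0.05, t=15)
assert bound < 1e-3
```

This is the usual Chernoff-type tail bound on the binomial count of surviving adjacent shares, which is why the bound tightens as either p grows or t shrinks.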

5. EXPERIMENTS

Here we provide experimental results on the proposed CCESA algorithm. We compare CCESA and secure aggregation (SA) of (Bonawitz et al., 2017) in terms of time complexity (running time), reliability, and privacy. We tested both schemes on two real datasets: the AT&T Laboratories Cambridge database of faces (https://www.kaggle.com/kasikrit/att-database-of-faces) and CIFAR-10. For the AT&T face dataset, which contains images of 40 individuals, we considered a federated learning setup where each of n = 40 clients uses its own images for local training. All algorithms are implemented in Python and PyTorch (Paszke et al., 2017). Code will be made available to the public.

5.1. RUNNING TIME

In Table 2, we measured the running time of our CCESA and existing SA for various n and q_total. Similar to the setup used in (Bonawitz et al., 2017), we assumed that each node has a local model θ of dimension m = 10000, where each element of the model is chosen from the field F_{2^16}. Here, t is selected by following the guideline in Supplementary Materials, and p is chosen on the order of the threshold p' ~ O(√(log n/n)) defined in (5), which provably meets both the reliability and privacy conditions. For every (n, q_total) setup, the proposed CCESA(n, p) reduces the running time of conventional SA (Bonawitz et al., 2017) by approximately a factor of p. This is because each client generates only a fraction p of the secret shares and pseudo-random values, and performs a fraction p of the key agreements. This result is consistent with our analysis of the computational complexity in Table 1.

5.2. RELIABILITY

Recall that a system is reliable if the server obtains the sum of the local models Σ_{i∈V_3} θ_i. Fig. 4 shows the reliability of CCESA on the CIFAR-10 dataset. We plot the test accuracies of SA and the suggested CCESA(n, p) for various p, including p = p' = 0.3106, where p' is the provably minimum connection probability for achieving both reliability and privacy according to Remark 1. One can confirm that CCESA with p = p' matches the performance of SA in both the i.i.d. and non-i.i.d. data settings, coinciding with our theoretical result in Theorem 3. Moreover, in both settings, selecting p = 0.25 is sufficient to achieve the test accuracy of SA when the system is trained for 200 rounds. Thus, the communication/computational resources required to guarantee reliability, which are proportional to np, can be reduced to 50% of those of the conventional scheme. Similar behaviors are observed in the experiments on the AT&T Face dataset, as shown in Fig. B.1 of the Supplementary Materials.

5.3. PRIVACY

We first consider a privacy threat called the model inversion attack (Fredrikson et al., 2015). The basic setup is as follows: the attacker eavesdrops on the masked model θ̃_i sent from client i to the server and reconstructs the face image of a target client. Under this setting, we compared how much information on the raw data the eavesdropped model reveals under the various schemes, as shown in Fig. 2.

Step 0. Advertise Keys
Client i: Sends (c^PK_i, s^PK_i) to the server
Server: Collects the messages from clients (denote this set of clients as V_1)
  Sends {(i, c^PK_i, s^PK_i)}_{i∈Adj(j)∩V_1} to all clients j ∈ V_1

Step 1. Share Keys
Client i: Generates a random element b_i
  Applies t_i-out-of-(|Adj(i)|+1) secret sharing to b_i and s^SK_i:
    b_i → (b_{i,j})_{j∈Adj(i)∪{i}},  s^SK_i → (s^SK_{i,j})_{j∈Adj(i)∪{i}}
  Encrypts [b_{i,j}, s^SK_{i,j}] into [b̃_{i,j}, s̃^SK_{i,j}] using the authenticated encryption with key f(c^PK_j, c^SK_i)
  Sends {(i, j, b̃_{i,j}, s̃^SK_{i,j})}_{j∈Adj(i)∩V_1} to the server
Server: Collects the messages from clients (denote this set of clients as V_2)
  Sends {(i, j, b̃_{i,j}, s̃^SK_{i,j})}_{i∈Adj(j)∩V_2} to all clients j ∈ V_2

Step 2. Masked Input Collection
Client i: Computes s_{i,j} = f(s^PK_j, s^SK_i) and
    θ̃_i = θ_i + PRG(b_i) + Σ_{j∈V_2∩Adj(i); i<j} PRG(s_{i,j}) − Σ_{j∈V_2∩Adj(i); i>j} PRG(s_{i,j})
  Sends (i, θ̃_i) to the server
Server: Collects the messages from clients (denote this set of clients as V_3)
  Sends V_3 to all clients j ∈ V_3

Step 3. Unmasking
Client i: Decrypts b̃_{i,j} with key f(c^PK_j, c^SK_i) to obtain b_{i,j} for all j ∈ Adj(i) ∩ V_3
  Decrypts s̃^SK_{i,j} with key f(c^PK_j, c^SK_i) to obtain s^SK_{i,j} for all j ∈ Adj(i) ∩ (V_2\V_3)
  Sends {b_{i,j}}_{j∈Adj(i)∩V_3} and {s^SK_{i,j}}_{j∈Adj(i)∩(V_2\V_3)} to the server
Server: Collects the messages from clients
  Reconstructs b_i from {b_{i,j}}_{j∈Adj(i)∩V_3} for all i ∈ V_3
  Reconstructs s^SK_i from {s^SK_{i,j}}_{j∈Adj(i)∩(V_2\V_3)} for all i ∈ V_2\V_3
  Computes s_{i,j} = f(s^PK_j, s^SK_i) for all j ∈ Adj(i) ∩ V_3
  Computes the aggregated sum of local models
    Σ_{i∈V_3} θ_i = Σ_{i∈V_3} θ̃_i − Σ_{i∈V_3} PRG(b_i) − Σ_{i∈V_2\V_3, j∈Adj(i)∩V_3; i>j} PRG(s_{i,j}) + Σ_{i∈V_2\V_3, j∈Adj(i)∩V_3; i<j} PRG(s_{i,j})

Table B.1: Precision of the membership inference attack on local models trained on CIFAR-10. The scheme with a higher attack precision is more vulnerable to the inference attack. For the proposed CCESA, the attacker is no better than the random guess with precision = 50%, showing the privacy-preserving ability of CCESA.
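The masking in Step 2 and the unmasking in Step 3 can be sketched numerically. The following is a minimal sketch (our code, not the paper's implementation): it emulates PRG with numpy's seeded generator instead of a cryptographic PRG, assumes no dropouts (V_2 = V_3 = all clients), and uses a 5-client cycle as the assignment graph; all names are ours.

```python
import numpy as np

DIM = 4  # model dimension (hypothetical)

def prg(seed):
    # stand-in for a cryptographic PRG: expands a seed into a mask vector
    return np.random.default_rng(int(seed)).integers(0, 2**16, size=DIM).astype(np.int64)

def masked_model(theta, i, b, s, adj):
    # theta_i + PRG(b_i) + sum_{j>i} PRG(s_ij) - sum_{j<i} PRG(s_ij)
    out = theta + prg(b[i])
    for j in adj[i]:
        key = s[(min(i, j), max(i, j))]          # s_ij = s_ji
        out = out + prg(key) if i < j else out - prg(key)
    return out

rng = np.random.default_rng(0)
n = 5
adj = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}   # cycle assignment graph
thetas = [rng.integers(0, 100, size=DIM).astype(np.int64) for _ in range(n)]
b = {i: rng.integers(0, 2**31) for i in range(n)}
s = {(i, j): rng.integers(0, 2**31) for i in range(n) for j in adj[i] if i < j}

agg = sum(masked_model(thetas[i], i, b, s, adj) for i in range(n))
# pairwise masks cancel on every edge; only the b-masks remain
recovered = agg - sum(prg(b[i]) for i in range(n))
assert np.array_equal(recovered, sum(thetas))
```

Each edge contributes the same mask with sign + from its smaller endpoint and − from its larger one, so the pairwise terms vanish in the aggregate and the server only needs the reconstructed b_i values.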

B.1 RELIABILITY

In Fig. 4 of the main paper, we provided experimental results on the reliability of CCESA on the CIFAR-10 dataset. Similarly, Fig. B.1 shows the reliability of CCESA on the AT&T Face dataset, where the model is trained over n = 40 clients. We plotted the test accuracies of SA and the suggested CCESA(n, p) for various p. In both settings of q_total, selecting p = 0.7 is sufficient to achieve the test accuracy of SA when the system is trained for 50 rounds. Thus, the communication/computational resources required to guarantee reliability, which are proportional to np, can be reduced to 70% of those of the conventional scheme.

B.2 PRIVACY

In Section 5.3 and Fig. 2 of the main paper, we provided experimental results on the AT&T Face dataset under the model inversion attack. In Fig. B.2, we provide additional experimental results on the same dataset for different participants. Similar to the result in Fig. 2, the model inversion attack successfully reveals the individual's identity in federated averaging (McMahan et al., 2017), while the attack is not effective against either SA or the suggested CCESA. In the main manuscript, we have also considered another type of privacy threat called the membership inference attack (Shokri et al., 2017), where the attacker observes the masked local model θ̃_i sent from client i to the server and guesses whether a particular data point is a member of the training set. We measured three performance metrics of the attacker: accuracy (the fraction of records whose membership is correctly estimated), precision (the fraction of responses inferred as members of the training dataset that are indeed members), and recall (the fraction of the training data that the attacker correctly infers as members). Table 3 in the main manuscript summarizes the attack accuracy, while Table B.1 shows the attack precision for the CIFAR-10 dataset. We also observed that recall is close to 1 for all schemes. Similar to the results on the attack accuracy, Table B.1 shows that the attack precision of federated averaging reaches nearly 70%, while that of SA and CCESA remains around the baseline performance of the random guess. This shows that both SA and CCESA do not reveal any clue on the training set.

When T = C_l for some l ∈ [κ], there exists i′ ∈ C_l^+ such that node i′ is not informative, according to the definition of G_NI. Thus, the server (as well as any eavesdropper) cannot reconstruct both b_{i′} and s^SK_{i′}.
Note that the sum of masked models is
Σ_{i∈T} θ̃_i = Σ_{i∈T} θ_i + Σ_{i∈T} PRG(b_i) + Σ_{j∈T} Σ_{i∈V_2∩Adj(j)} (−1)^{1_{j>i}} PRG(s_{j,i}),   (8)
where 1_A is the indicator function whose value is 1 when the statement A is true and 0 otherwise. When i′ ∈ C_l = T, we cannot unmask PRG(b_{i′}) in this equation. When i′ ∈ C_l^+\C_l, there exists j ∈ C_l such that {i′, j} ∈ E. Note that the eavesdropper needs to know either s^SK_j or s^SK_{i′} in order to compute PRG(s_{j,i′}). Since i′ is not informative, the eavesdropper cannot obtain s^SK_{i′}. Moreover, since the server has already requested the shares of b_j, the eavesdropper cannot access s^SK_j. Thus, the eavesdropper cannot unmask PRG(s_{j,i′}) in (8). All in all, the eavesdropper cannot unmask at least one pseudorandom term in Σ_{i∈T} θ̃_i, proving (7). When T ≠ C_l for all l ∈ [κ], there exists an edge e = {p, q} such that p ∈ T and q ∈ V_3\T. Thus, we cannot unmask PRG(s_{p,q}) from Σ_{i∈T} θ̃_i. Following the steps in Section C.2, we have (7).

Now, we prove the converse by contrapositive: if G = (G_0, G_1, ..., G_4) ∈ G_D ∩ G_NI^c, then the system is not private. In other words, we need to prove the following statement: if G_3 is disconnected and there exists a component C_l such that all nodes in C_l are informative, then the system is not private. Let T = C_l. Then, the eavesdropper obtains
Σ_{i∈T} θ̃_i = Σ_{i∈T} θ_i + Σ_{i∈T} PRG(b_i) + z, where z = Σ_{i∈T} Σ_{j∈V_2∩Adj(i)} (−1)^{1_{i>j}} PRG(s_{i,j}) (a)= Σ_{i∈T} Σ_{j∈T∩Adj(i)} (−1)^{1_{i>j}} PRG(s_{i,j}) (b)= 0.
Note that (a) holds since T is a connected component (so every neighbor of i ∈ T within V_2 also lies in T), and (b) holds from s_{i,j} = s_{j,i}. Moreover, the eavesdropper can reconstruct b_i for all i ∈ T in Step 3 of the algorithm. Thus, the eavesdropper can successfully unmask the random terms in Σ_{i∈T} θ̃_i and obtain Σ_{i∈T} θ_i. This completes the proof.
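The contrapositive argument can be illustrated numerically. In the toy sketch below (our construction, with numpy's seeded generator standing in for a cryptographic PRG), the pairwise masks inside a connected component cancel among themselves, so an eavesdropper holding the b_i values of that component recovers its partial sum.

```python
import numpy as np

DIM = 3  # model dimension (hypothetical)

def prg(seed):
    return np.random.default_rng(int(seed)).integers(0, 2**16, size=DIM).astype(np.int64)

rng = np.random.default_rng(1)
# disconnected G3 with two components: T = {0, 1} and {2, 3}
adj = {0: [1], 1: [0], 2: [3], 3: [2]}
thetas = [rng.integers(0, 100, size=DIM).astype(np.int64) for _ in range(4)]
b = {i: rng.integers(0, 2**31) for i in range(4)}
s = {(0, 1): rng.integers(0, 2**31), (2, 3): rng.integers(0, 2**31)}

def masked(i):
    out = thetas[i] + prg(b[i])
    for j in adj[i]:
        key = s[(min(i, j), max(i, j))]
        out = out + prg(key) if i < j else out - prg(key)
    return out

T = [0, 1]
# pairwise masks inside T cancel; removing the b-masks exposes the partial sum
partial = sum(masked(i) for i in T) - sum(prg(b[i]) for i in T)
assert np.array_equal(partial, thetas[0] + thetas[1])
```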

C.4 PROOF OF THEOREM 3

Proof. Consider an Erdős–Rényi assignment graph G ∈ G(n, p). Let N_i := |Adj(i)| be the degree of node i, and let X_i := |Adj(i) ∩ V_4| be the number of clients (other than client i) that successfully send the shares of client i to the server in Step 3. Then, N_i and X_i follow the binomial distributions N_i ∼ B(n−1, p) and X_i ∼ B(N_i, (1−q)^4) = B(n−1, p(1−q)^4), respectively. By applying Hoeffding's inequality to the random variable X_i, we obtain
P(X_i < (n−1)p(1−q)^4 − √((n−1) log(n−1))) ≤ 1/(n−1)^2.
Let E be the event that the system is not reliable, i.e., that the sum of local models Σ_{i∈V_3} θ_i is not reconstructed by the server, and let E_i be the event {|(Adj(i) ∪ {i}) ∩ V_4| < t}, i.e., that a secret of client i is not reconstructed by the server. For a given p > (t + √((n−1) log(n−1)))/((n−1)(1−q)^4), we obtain
P(E) (a)= P(∪_{i∈V_3^+} E_i) ≤ P(∪_{i∈V_3^+} {X_i < t}) ≤ Σ_{i∈V_3^+} P(X_i < t) ≤ Σ_{i∈[n]} P(X_i < t) = nP(X_1 < t) ≤ nP(X_1 < (n−1)p(1−q)^4 − √((n−1) log(n−1))) ≤ n/(n−1)^2 → 0 as n → ∞,
where (a) comes from Theorem 1. Therefore, we conclude that CCESA(n, p) is asymptotically almost surely (a.a.s.) reliable if p > (t + √((n−1) log(n−1)))/((n−1)(1−q)^4). Furthermore, based on the parameter selection rule for t in Section F, we obtain a lower bound on p as
p > (t + √((n−1) log(n−1)))/((n−1)(1−q)^4) ≥ [((n−1)p + √((n−1) log(n−1)) + 1)/2 + √((n−1) log(n−1)) − 1] / ((n−1)(1−q)^4).
Rearranging the above inequality with respect to p yields
p > (3√((n−1) log(n−1)) − 1) / ((n−1)(2(1−q)^4 − 1)).
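The Hoeffding step above can be sanity-checked against an exact binomial tail computation; the parameters below are illustrative choices of ours, not values from the paper.

```python
import math

# For X ~ B(n-1, p(1-q)^4), check numerically that
# P(X < (n-1)p(1-q)^4 - sqrt((n-1)log(n-1))) <= 1/(n-1)^2.

def binom_cdf(k, n, p):
    # exact P(X <= k) for X ~ B(n, p)
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n, p, q = 200, 0.5, 0.1
mean = (n - 1) * p * (1 - q)**4
dev = math.sqrt((n - 1) * math.log(n - 1))
threshold = mean - dev
tail = binom_cdf(math.floor(threshold), n - 1, p * (1 - q)**4)
assert tail <= 1 / (n - 1)**2       # the Hoeffding bound holds with room to spare
```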

C.5 PROOF OF THEOREM 4

Proof. Let X := |V_3| be the number of clients that send their masked local models in Step 2. Then, X follows the binomial distribution B(n, (1−q)^3). Given the assignment graph G of CCESA(n, p), note that the induced subgraph G_3 = G − (V\V_3) is an Erdős–Rényi graph G(X, p). First, we prove
P(G_3 is connected | |X − n(1−q)^3| ≤ √(n ln n)) → 1 as n → ∞,   (9)
if p > p′ = (1 + ε) ln(n(1−q)^3 − √(n ln n)) / (n(1−q)^3 − √(n ln n)). The left-hand side of (9) can be rewritten as
P(G_3 is connected | |X − n(1−q)^3| ≤ √(n ln n)) = [Σ_{l∈[n(1−q)^3−√(n ln n), n(1−q)^3+√(n ln n)]} P(X = l) P(G(l, p) is connected)] / [Σ_{l∈[n(1−q)^3−√(n ln n), n(1−q)^3+√(n ln n)]} P(X = l)].
Here, we use a well-known property of the Erdős–Rényi graph: G(l, p) is asymptotically almost surely (a.a.s.) connected if p > (1 + ε) ln l / l for some ε > 0. Since ln l / l is a decreasing function, G(l, p) is a.a.s. connected for all l ∈ [n(1−q)^3 − √(n ln n), n(1−q)^3 + √(n ln n)] when p > (1 + ε) ln(n(1−q)^3 − √(n ln n)) / (n(1−q)^3 − √(n ln n)). Thus, for a given p > p′, we can conclude that (9) holds. Now, we prove that CCESA(n, p) is a.a.s. private when p > p′. The probability that CCESA(n, p) is private is lower bounded by
P(CCESA(n, p) is private) (a)≥ P(G_3 is connected) = P(|X − n(1−q)^3| ≤ √(n ln n)) P(G_3 is connected | |X − n(1−q)^3| ≤ √(n ln n)) + P(|X − n(1−q)^3| > √(n ln n)) P(G_3 is connected | |X − n(1−q)^3| > √(n ln n)) (b)≥ (1 − 2/n^2) P(G_3 is connected | |X − n(1−q)^3| ≤ √(n ln n)) → 1 as n → ∞,
where (a) comes from Lemma 1 and (b) comes from Hoeffding's inequality P(|X − n(1−q)^3| ≤ √(n ln n)) ≥ 1 − 2/n^2, which completes the proof.

C.6 PROOF OF THEOREM 5

Proof. Consider an Erdős–Rényi assignment graph G ∈ G(n, p). Let N_i := |Adj(i)| be the degree of node i, and let X_i := |Adj(i) ∩ V_4| be the number of clients (other than client i) that successfully send the shares of client i to the server in Step 3. Then, N_i and X_i follow the binomial distributions N_i ∼ B(n−1, p) and X_i ∼ B(N_i, (1−q)^4) = B(n−1, p(1−q)^4), respectively. Let E_i be the event {|(Adj(i) ∪ {i}) ∩ V_4| < t}, i.e., that a secret of client i is not reconstructed by the server. We obtain an upper bound on P(E_i) as
P(E_i) ≤ P(X_i < t) = Σ_{i=0}^{t−1} C(n−1, i) (p(1−q)^4)^i (1 − p(1−q)^4)^{n−1−i} (a)≤ e^{−(n−1) D((t−1)/(n−1) || p(1−q)^4)},
where (a) comes from the Chernoff bound on the binomial distribution. Thus, P_e^(r) is upper bounded by
P_e^(r) (b)= P(∪_{i∈V_3^+} E_i) ≤ P(∪_{i∈V_3^+} {X_i < t}) ≤ Σ_{i∈V_3^+} P(X_i < t) ≤ Σ_{i∈[n]} P(X_i < t) = nP(X_1 < t) ≤ ne^{−(n−1) D((t−1)/(n−1) || p(1−q)^4)},
where (b) comes from Theorem 1.

C.7 PROOF OF THEOREM 6

Proof. Let P_dc(n, p) be the probability of the event that the Erdős–Rényi graph G ∈ G(n, p) is disconnected. Then, P_dc(n, p) is upper bounded as follows:
P_dc(n, p) = P(G(n, p) is disconnected) = P(∪_{k=1}^{⌊n/2⌋} {there exists a subset of k nodes that is disconnected from the rest}) ≤ Σ_{k=1}^{⌊n/2⌋} C(n, k) P(a specific subset of k nodes is disconnected from the rest) = Σ_{k=1}^{⌊n/2⌋} C(n, k) (1−p)^{k(n−k)}.
Therefore, P_e^(p) is upper bounded by
P_e^(p) (a)≤ P(G_3 = G − (V\V_3) is disconnected) = Σ_{m=0}^{n} P(G_3 has m vertices) P_dc(m, p) = Σ_{m=0}^{n} C(n, m) (1−q)^{3m} (1 − (1−q)^3)^{n−m} P_dc(m, p) = Σ_{m=0}^{n} C(n, m) (1−q)^{3m} (1 − (1−q)^3)^{n−m} Σ_{k=1}^{⌊m/2⌋} C(m, k) (1−p)^{k(m−k)},
where (a) comes from Lemma 1.

D.2 COMPUTATIONAL COST

We consider the worst-case scenario with the maximum additional bandwidth, in which no client fails during the operation. We now evaluate the computational cost of CCESA. Here we do not count the cost of computing signatures, since it is negligible. First, we derive the computational cost of each client. Given the number of model parameters m and the number of clients n, the computational cost of client i is composed of three parts:
If we choose p = (1 + ε)p′ for a small ε > 0, the total computational cost per client is O(n log n + m√(n log n)), while the total computational cost of the server is O(mn log n). The computational cost of SA can be obtained in a similar manner by setting |Adj(i)| = n − 1: each client requires O(n^2 + mn) time, while the server requires O(mn^2) time. These results are summarized in Table 1 in the main manuscript.

E RELIABILITY AND PRIVACY OF CCESA

Here, we analyze the asymptotic behavior of the probability that a system is reliable/private. In our analysis, we assume that the connection probability is set to p′ and that the parameter t used in the secret sharing is selected by the rule in Section F. First, we prove that a system is reliable with probability ≥ 1 − O(ne^{−√(n log n)}). Using Theorem 5, the probability that a system is reliable can be directly derived as
P(a system is reliable) = 1 − P_e^(r) ≥ 1 − ne^{−(n−1) D_KL((t−1)/(n−1) || p′(1−q)^4)}.
Using the fact that the Kullback–Leibler divergence term satisfies
D_KL((t−1)/(n−1) || p′(1−q)^4) = ((t−1)/(n−1)) log[((t−1)/(n−1)) / (p′(1−q)^4)] + (1 − (t−1)/(n−1)) log[(1 − (t−1)/(n−1)) / (1 − p′(1−q)^4)] = Θ(√(log n / n)),
we conclude that CCESA(n, p′) is reliable with probability ≥ 1 − O(ne^{−√(n log n)}).
Now we prove that a system is private with probability ≥ 1 − O(n^{−C}) for an arbitrary C > 0. Using Theorem 6, the probability that a system is private can be written as P(a system is private) = 1 − P_e^(p). Note that a_m = P(X = m) holds for the binomial random variable X ∼ B(n, (1−q)^3) (with a_m, b_m, λ, k′, and m_th as defined in Lemmas 2 and 3). By utilizing Hoeffding's inequality, we have
Σ_{m=0}^{m_th} a_m = P(X ≤ m_th) ≤ e^{−2(n(1−q)^3 − m_th)^2/n} ≤ e^{−n(1−q)^6/2},
where m_th = ⌊n(1−q)^3/2⌋. Moreover, for sufficiently large λ, we have em^{−λ/2}/(1 − λ^{−1}(C+2)) ≤ δ for some δ < 1, so that d_m is upper bounded by d_m ≤ Σ_{k=k′+1}^{∞} δ^k = δ^{k′+1}/(1−δ) = O(δ^{mC′}). Combining Lemmas 2 and 3, we conclude that CCESA(n, p′) is private with probability ≥ 1 − O(n^{−C}) for arbitrary C > 0. These results on the reliability and the privacy are summarized in Table 1 of the main manuscript.
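The privacy guarantee ultimately rests on the connectivity of G_3, which is why p′ sits just above the Erdős–Rényi connectivity threshold ln(n)/n. This can be checked empirically; the sketch below is our code with illustrative parameters, testing connectivity with a union-find structure.

```python
import math, random

def connected_er(n, p, rng):
    # sample G(n, p) and test connectivity via union-find with path halving
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(n)}) == 1

rng = random.Random(0)
n = 300
p = 4.0 * math.log(n) / n        # comfortably above the threshold ln(n)/n
trials = 20
hits = sum(connected_er(n, p, rng) for _ in range(trials))
assert hits == trials            # every sampled graph is connected
```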

F DESIGNING THE PARAMETER t FOR THE SECRET SHARING

Here we provide a rule for selecting the parameter t used in the secret sharing. In general, setting t to a smaller number is better for tolerating dropout scenarios. However, when t is excessively small, the system is vulnerable to the unmasking attack of an adversarial server: the server may simultaneously request shares of b_i and of s^SK_i from disjoint sets of remaining clients, which reveals the local model θ_i to the server. The following proposition provides a rule for designing the parameter t to avoid such an unmasking attack.

Proposition 1 (Lower bound on t). For CCESA(n, p), let t > ((n−1)p + √((n−1) log(n−1)) + 1)/2 be given. Then, the system is asymptotically almost surely secure against the unmasking attack.

Proof. Let E be the event that at least one of the local models is revealed to the server, and let E_i be the event that the i-th local model θ_i is revealed to the server. Note that θ_i is revealed to the server if t clients send the shares of b_i and another t clients send the shares of s^SK_i in Step 3.

As stated above, setting t to a smaller number is better for tolerating the dropout of multiple clients. Thus, as in the following remark, we set t to the minimum value avoiding the unmasking attack.



Here, m is the number of model parameters, where each parameter is represented in R bits; a_K and a_S are the number of bits required for exchanging public keys and the number of bits in a secret share, respectively. Our threat model is equivalent to the "server-only" honest-but-curious adversary, which is weaker than the "client–server collusion" adversary considered in secure aggregation (Bonawitz et al., 2017).



selected clients. Then, each client i ∈ S_t updates the received model using its local data D_i and sends the updated model θ_i^{(t+1)} to the server. Finally, the server updates the global model by aggregating the local updates from the selected clients, i.e., θ^{(t+1)} = Σ_{i∈S_t} (N_i/N) θ_i^{(t+1)}, where N = Σ_{i∈S_t} N_i.
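The aggregation rule above can be sketched as follows; the function name and the numbers are illustrative choices of ours.

```python
import numpy as np

def aggregate(local_models, data_sizes):
    # server-side federated-averaging update: weight each local model
    # by its share of the total training data
    N = sum(data_sizes)
    return sum((N_i / N) * theta_i for theta_i, N_i in zip(local_models, data_sizes))

models = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
sizes = [10, 30]                       # the second client holds 3x more data
theta_next = aggregate(models, sizes)
assert np.allclose(theta_next, [2.5, 3.5])
```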

Fig. 3 illustrates the upper bounds on P_e^(p) and P_e^(r).

Figure 3: Upper bounds on the error probabilities P_e^(r) and P_e^(p).


Schemes \ Number of training data (n_train)      | 5000   | 10000  | 15000  | 50000
Federated Averaging (McMahan et al., 2017)       | 70.41% | 65.82% | 65.89% | 60.62%
Secure Aggregation (SA) (Bonawitz et al., 2017)  | 49.78% | 49.97% | 49.91% | 49.10%
CCESA (Suggested)                                | 49.48% | 50.07% | 49.16% | 50.00%

Figure B.2: The result of the model inversion attack on three schemes, (b) the suggested scheme (CCESA), (c) SA (Bonawitz et al., 2017), and (d) federated averaging (McMahan et al., 2017), for the AT&T Face dataset. The original training images in (a) can be successfully reconstructed by the attack only in the federated averaging setup, i.e., both SA and CCESA achieve the same level of privacy.

D REQUIRED RESOURCES OF CCESA

D.1 COMMUNICATION COST

Here, we derive the additional communication bandwidth B_CCESA used at each client for running CCESA, compared to the bandwidth used for running federated averaging (McMahan et al., 2017).

(1) computing 2|Adj(i)| key agreements, which takes O(|Adj(i)|) time; (2) generating t_i-out-of-(|Adj(i)|+1) secret shares of s^SK_i and b_i, which takes O(|Adj(i)|^2) time; and (3) generating the masked local model θ̃_i, which requires O(m|Adj(i)|) time. Thus, the total computational cost of each client is O(|Adj(i)|^2 + m|Adj(i)|). Second, the server's computational cost is composed of two parts: (1) reconstructing t_i-out-of-(|Adj(i)|+1) secrets from the shares for all clients i ∈ [n], which requires O(|Adj(i)|^2) time, and (2) removing the masks from the masked sum of local models Σ_{i=1}^{n} θ̃_i, which requires O(m|Adj(i)|^2) time in the worst case. As a result, the total computational cost of the server is O(m|Adj(i)|^2).

Thus, P_e^(p) is upper bounded by Σ_{m=0}^{n} a_m b_m, where a_m = C(n, m) (1−q)^{3m} (1 − (1−q)^3)^{n−m} and b_m = Σ_{k=1}^{⌊m/2⌋} C(m, k) (1−p)^{k(m−k)}. Note that the summation Σ_{m=0}^{n} a_m b_m can be broken up into two parts, Σ_{m=0}^{m_th} a_m b_m and Σ_{m=m_th+1}^{n} a_m b_m, where m_th = ⌊n(1−q)^3/2⌋. In the rest of the proof, we prove two lemmas showing that Σ_{m=0}^{m_th} a_m b_m = O(e^{−n}) and Σ_{m=m_th+1}^{n} a_m b_m = O(n^{−C}), respectively.

Lemma 2. Σ_{m=0}^{m_th} a_m b_m = O(e^{−n}).

Proof. Since b_m ≤ 1 for all m, we have Σ_{m=0}^{m_th} a_m b_m ≤ Σ_{m=0}^{m_th} a_m. As shown in Section E, Hoeffding's inequality yields Σ_{m=0}^{m_th} a_m ≤ e^{−n(1−q)^6/2}, so that Σ_{m=0}^{m_th} a_m b_m = O(e^{−n}).

Lemma 3. Σ_{m=m_th+1}^{n} a_m b_m = O(n^{−C}).

Proof. Since a_m ≤ 1 for all m, we have Σ_{m=m_th+1}^{n} a_m b_m ≤ Σ_{m=m_th+1}^{n} b_m. Let C > 0 be given. Then, an upper bound on b_m can be obtained as
b_m = Σ_{k=1}^{⌊m/2⌋} C(m, k) (1−p)^{k(m−k)} ≤ Σ_{k=1}^{⌊m/2⌋} C(m, k) m^{−λk(m−k)/m} = c_m + d_m,
where λ = pn/log n, c_m = Σ_{k=1}^{k′} C(m, k) m^{−λk(m−k)/m}, d_m = Σ_{k=k′+1}^{⌊m/2⌋} C(m, k) m^{−λk(m−k)/m}, and k′ = ⌊m(1 − (C+2)/λ)⌋. The first part of the summation is upper bounded by
c_m ≤ Σ_{k=1}^{k′} m^{−k(λ(m−k′)/m − 1)} ≤ m^{−(λ(m−k′)/m − 1)} / (1 − m^{−(λ(m−k′)/m − 1)}) = m^{−(C+1)} / (1 − m^{−(C+1)}).
For the second part of the summation, we use the bound C(n, k) ≤ (en/k)^k. Using this bound, d_m is upper bounded by
d_m ≤ Σ_{k=k′+1}^{⌊m/2⌋} (em^{−λ/2} / (1 − λ^{−1}(C+2)))^k = O(δ^{mC′}),
where C′ = 1 − λ^{−1}(C+2) > 0 and δ < 1, as shown in Section E. Combining the upper bounds on c_m and d_m, we obtain b_m = O(m^{−(C+1)}). Since b_m is a decreasing function of m,
Σ_{m=m_th+1}^{n} b_m ≤ Σ_{m=m_th+1}^{n} b_{m_th+1} = (n − m_th) b_{m_th+1} (a)= O(n^{−C}),
where (a) comes from m_th = ⌊n(1−q)^3/2⌋.

Therefore,
P(E_i) ≤ P(|(Adj(i) ∪ {i}) ∩ V_4| ≥ 2t) ≤ P(|Adj(i) ∪ {i}| ≥ 2t) = P(|Adj(i)| ≥ 2t − 1) ≤ P(|Adj(i)| > (n−1)p + √((n−1) log(n−1))) ≤ 1/(n−1)^2,
where the last inequality comes from Hoeffding's inequality for a binomial random variable. As a result, we obtain
P(E) = P(∪_{i∈[n]} E_i) ≤ Σ_{i∈[n]} P(E_i) = nP(E_1) ≤ n/(n−1)^2 → 0 as n → ∞,
which completes the proof.

For privacy-preserving federated learning, SA has been proposed based on the cryptographic primitives of Shamir's secret sharing, key agreement, and symmetric authenticated encryption. The protocol consists of four steps: Step 0 (Advertise Keys), Step 1 (Share Keys), Step 2 (Masked Input Collection), and Step 3 (Unmasking). Consider a server with n clients, where client i ∈ [n] has its private local model θ_i. Denote the client index set as V_0.
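The pairwise seeds satisfy s_{i,j} = s_{j,i} because Diffie–Hellman key agreement is symmetric: f(s^PK_j, s^SK_i) = f(s^PK_i, s^SK_j). The toy modular-arithmetic sketch below is ours for illustration only; it is not the ECDH-over-NIST-curve construction the paper uses, and its parameters are not secure.

```python
# Toy Diffie-Hellman key agreement illustrating the symmetry s_ij = s_ji.
P = 2**127 - 1          # toy prime modulus (Mersenne prime; not a secure group choice)
g = 5                   # generator (illustrative)

sk_i, sk_j = 123456789, 987654321                 # secret keys of clients i and j
pk_i, pk_j = pow(g, sk_i, P), pow(g, sk_j, P)     # advertised public keys

s_ij = pow(pk_j, sk_i, P)   # client i's view of the shared secret
s_ji = pow(pk_i, sk_j, P)   # client j's view of the shared secret
assert s_ij == s_ji          # identical pairwise seed on both sides
```

Both sides compute g^(sk_i · sk_j) mod P, which is why either endpoint's secret key suffices to regenerate the pairwise mask.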

Running time (unit: ms) of SA (Bonawitz et al., 2017) and the suggested CCESA.

A DETAILED DESCRIPTION OF THE CCESA PROTOCOL

Algorithm 1: Communication-Computation Efficient Secure Aggregation (CCESA) Protocol
Input: Number of clients n, assignment graph G, privacy thresholds t_i of all clients i ∈ [n], local models θ_i of all clients i ∈ [n], Diffie–Hellman key pairs (c^PK_i, c^SK_i) and (s^PK_i, s^SK_i) of all clients i ∈ [n]

The required communication bandwidth of each client is composed of four parts. First, in Step 0, each client i sends two public keys to the server and receives 2|Adj(i)| public keys of other clients. Second, in Step 1, each client i sends 2|Adj(i)| encrypted shares to other nodes and receives 2|Adj(i)| shares from other nodes through the server. Third, in Step 2, each client i sends a masked model θ̃_i of mR bits. Here, m is the dimension of the model parameters, where each parameter is represented in R bits. Fourth, in Step 3, each client i sends |Adj(i)| + 1 shares to the server. Therefore, the total communication bandwidth of client i can be expressed as
(total communication bandwidth) = 2(|Adj(i)| + 1)a_K + (5|Adj(i)| + 1)a_S + mR,
where a_K and a_S are the number of bits required for exchanging public keys and the number of bits in a secret share, respectively. Since each client requires mR bits to send the private vector θ_i in federated averaging (McMahan et al., 2017), we have
B_CCESA = 2(|Adj(i)| + 1)a_K + (5|Adj(i)| + 1)a_S.
If we choose the connection probability p = (1 + ε)p′ for a small ε > 0, we have B_CCESA = O(√(n log n)), where p′ is defined in (5). Note that the additional bandwidth B_SA required for SA can be similarly obtained as B_SA = 2na_K + (5n − 4)a_S, i.e., B_SA = O(n). Thus, B_CCESA/B_SA → 0 as n increases, showing the scalability of CCESA. These results are summarized in Table 1 in the main manuscript.
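Plugging illustrative numbers into the bandwidth expressions above makes the gap concrete; the key/share sizes a_K = a_S = 256 bits below are our assumptions, not values from the paper.

```python
import math

def b_ccesa(deg, a_K=256, a_S=256):
    # additional per-client bandwidth of CCESA with degree |Adj(i)| = deg
    return 2 * (deg + 1) * a_K + (5 * deg + 1) * a_S

def b_sa(n, a_K=256, a_S=256):
    # additional per-client bandwidth of SA, where every client has n-1 neighbors
    return 2 * n * a_K + (5 * n - 4) * a_S

n = 1000
deg = math.ceil(math.sqrt(n * math.log(n)))   # expected degree scale under p = (1+eps)p'
assert b_ccesa(deg) < b_sa(n)                  # CCESA uses strictly less bandwidth
ratio = b_ccesa(deg) / b_sa(n)
assert ratio < 0.2                             # well under the SA cost at n = 1000
```

The ratio shrinks as n grows, reflecting B_CCESA/B_SA = O(√(log n / n)).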


Schemes \ Number of training data (n_train)      | 5000   | 10000  | 15000  | 50000
Federated Averaging (McMahan et al., 2017)       | 72.49% | 70.72% | 72.80% | 66.47%
Secure Aggregation (SA) (Bonawitz et al., 2017)  | 49.67% | 49.96% | 49.85% | 49.33%
CCESA (Suggested)                                | 49.29% | 50.14% | 49.02% | 50.00%

Table 3: Accuracy of the membership inference attack on local models trained on CIFAR-10. The scheme with a higher attack accuracy is more vulnerable to the inference attack. In order to maximize the uncertainty of the membership inference, the test set for the attack model consists of 5000 members (training data points) and 5000 non-members (evaluation data points). For the proposed CCESA, the attacker is no better than the random guess with accuracy = 50%, showing the privacy-preserving ability of CCESA.

As shown in Fig. 2 and Fig. B.2 in the Supplementary Materials, the vanilla federated averaging (McMahan et al., 2017) with no privacy-preserving techniques reveals the characteristics of an individual's face, compromising the privacy of clients. On the other hand, both SA and CCESA do not allow any clue on the client's face; these schemes are resilient to the model inversion attack and preserve the privacy of clients. This observation is consistent with our theoretical result that the proposed CCESA guarantees data privacy. We also considered another privacy threat called the membership inference attack (Shokri et al., 2017). Here, the attacker eavesdrops on the masked model θ̃_i and guesses whether a target data point is a member of the training dataset. Table 3 summarizes the accuracy of the inference attack for the CIFAR-10 dataset, under the federated learning setup where n_train training data points are equally distributed over n = 10 clients. The attack accuracy reaches nearly 70% for federated averaging, while SA and CCESA have an attack accuracy near 50%, similar to the performance of the random guess. This shows that both SA and CCESA do not reveal any clue on the local training set, which secures data privacy.

6. CONCLUSION

We devised communication-computation efficient secure aggregation (CCESA), which preserves the data privacy of federated learning in a highly resource-efficient manner. Based on a graph-theoretic analysis, we showed that O(n log n) resources are sufficient to guarantee the reliability and privacy of the proposed system with n clients, which is much smaller than the O(n^2) resources used in the conventional scheme. Our experiments on real datasets, measuring the test accuracy and the privacy leakage, show that CCESA requires only 50% of the resources of the conventional scheme to achieve the same level of reliability and privacy.

C PROOFS

C.1 PROOF OF THEOREM 1

Proof. Note that the sum of masked local models obtained by the server is expressed as
Σ_{i∈V_3} θ̃_i = Σ_{i∈V_3} θ_i + Σ_{i∈V_3} PRG(b_i) + z.
Here, z can be rewritten as
z = Σ_{i∈V_3} Σ_{j∈V_2∩Adj(i)} (−1)^{1_{i>j}} PRG(s_{i,j}) (a)= Σ_{i∈V_3} Σ_{j∈(V_2\V_3)∩Adj(i)} (−1)^{1_{i>j}} PRG(s_{i,j}),
where (a) comes from s_{i,j} = s_{j,i}: the terms with both endpoints in V_3 cancel out in pairs. In order to obtain the sum of unmasked local models Σ_{i∈V_3} θ_i from the sum of masked local models Σ_{i∈V_3} θ̃_i, the server should cancel out all the random terms in Σ_{i∈V_3} PRG(b_i) + z. In other words, the server should reconstruct b_i for all i ∈ V_3 and s^SK_j for all j ∈ V_3^+\V_3. Since the server can reconstruct each of these secrets from t shares, |(Adj(i) ∪ {i}) ∩ V_4| ≥ t for all i ∈ V_3^+ is a sufficient condition for reliability. Now we prove the converse part by contrapositive. Suppose there exists i ∈ V_3^+ with |(Adj(i) ∪ {i}) ∩ V_4| < t. In this case, note that the server cannot reconstruct either s^SK_i or b_i from the shares. If i ∈ V_3, the server cannot subtract PRG(b_i) from Σ_{i∈V_3} θ̃_i. As a result, the server cannot obtain Σ_{i∈V_3} θ_i. If i ∈ V_3^+\V_3, the server cannot subtract PRG(s_{i,j}) for all j ∈ V_3, since the server has no knowledge of either s^SK_i or s^SK_j. Therefore, the server cannot compute Σ_{i∈V_3} θ_i, which completes the proof.

C.2 PROOF OF LEMMA 1

Proof. Let T ⊂ V_3 be an arbitrary set of clients satisfying T ∉ {∅, V_3}. It is sufficient to prove the following statement: given a connected graph G_3, an eavesdropper cannot obtain the partial sum of local models Σ_{i∈T} θ_i from the sum of masked models Σ_{i∈T} θ̃_i. More formally, we need to prove
H(Σ_{i∈T} θ_i | Σ_{i∈T} θ̃_i) = H(Σ_{i∈T} θ_i).
Note that the sum of masked local models Σ_{i∈T} θ̃_i accessible to the eavesdropper is expressed as
Σ_{i∈T} θ̃_i = Σ_{i∈T} θ_i + Σ_{i∈T} PRG(b_i) + z, where z = Σ_{i∈T} Σ_{j∈V_2∩Adj(i)} (−1)^{1_{i>j}} PRG(s_{i,j}),   (6)
which follows from s_{i,j} = s_{j,i}. If G_3 = (V_3, E_3) is connected, there exists an edge e = {p, q} such that p ∈ T and q ∈ V_3\T. As a consequence, the pseudorandom term PRG(s_{p,q}) is included in z, and its coefficient c_{p,q} is determined as 1 (if p < q) or −1 (if p > q). Note that equation (6) can be rewritten as
Σ_{i∈T} θ̃_i = Σ_{i∈T} θ_i + c_{p,q} PRG(s_{p,q}) + r,
where r is the sum of the pseudorandom terms which do not include PRG(s_{p,q}). In order to unmask PRG(s_{p,q}), the eavesdropper needs to know at least one of the secret keys of clients p and q. However, the eavesdropper cannot obtain any shares of these secret keys, since the server does not request the shares of s^SK_p and s^SK_q in Step 3. Therefore, due to the randomness of the pseudorandom generator, H(Σ_{i∈T} θ_i | Σ_{i∈T} θ̃_i) = H(Σ_{i∈T} θ_i) holds, which completes the proof.

C.3 PROOF OF THEOREM 2

Proof. We first prove that the system is private if G ∈ G_C ∪ G_NI. When G ∈ G_C, the statement holds directly from Lemma 1. Thus, below we only prove the case of G ∈ G_NI. Note that it is sufficient to prove the following statement: given a graph evolution G = (G_0, G_1, ..., G_4) ∈ G_NI, an eavesdropper cannot obtain the partial sum of local models Σ_{i∈T} θ_i from the sum of masked models Σ_{i∈T} θ̃_i for every T ⊂ V_3 satisfying T ∉ {V_3, ∅}. More formally, we need to prove
H(Σ_{i∈T} θ_i | Σ_{i∈T} θ̃_i) = H(Σ_{i∈T} θ_i).   (7)

Remark 4 (Design rule for t). Throughout the paper, we set t = ⌈((n−1)p + √((n−1) log(n−1)) + 1)/2⌉ for CCESA(n, p), in order to secure the system against the unmasking attack and provide the maximum tolerance against dropout scenarios.
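The rule of Remark 4 is straightforward to implement; the helper below uses our own naming, with illustrative parameters.

```python
import math

def secret_sharing_threshold(n, p):
    # smallest integer t satisfying the bound of Proposition 1 / Remark 4
    return math.ceil(((n - 1) * p + math.sqrt((n - 1) * math.log(n - 1)) + 1) / 2)

n, p = 1000, 0.31
t = secret_sharing_threshold(n, p)
# t exceeds half the typical degree, so two disjoint sets of t share-holders
# cannot both be drawn from a typical neighborhood of ~ (n-1)p clients
assert 2 * t > (n - 1) * p + math.sqrt((n - 1) * math.log(n - 1))
```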

G.1 AT&T FACE DATASET

AT&T Face dataset contains images of 40 members. We allocated the data to n = 40 clients participating in the federated learning, where each client contains the images of a specific member. This experimental setup is suitable for the practical federated learning scenario where each client has its own images and the central server aggregates the local models for face recognition, following the previous work (Fredrikson et al., 2015).

G.2 CIFAR-10 DATASET

We ran experiments under the federated learning setup where 50000 training images are allocated to n = 1000 clients. Here, we considered two scenarios for data allocation: one partitions the data in an i.i.d. manner (i.e., each client randomly obtains 50 images), while the other is a non-i.i.d. allocation scenario. For the non-i.i.d. scenario, we followed the procedure of (McMahan et al., 2017). Specifically, the data is first sorted by its category, and then the sorted data is divided into 2000 shards. Each client randomly chooses 2 shards for its local training data. Since each client has access to at most 2 classes, the test accuracy is degraded compared with the i.i.d. setting.
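The shard-based non-i.i.d. allocation described above can be sketched as follows, scaled down from the paper's 50000 images / 1000 clients / 2000 shards for brevity; the function name and sizes are ours.

```python
import numpy as np

def shard_partition(labels, n_clients, shards_per_client, rng):
    # sort sample indices by class, split into contiguous shards,
    # and hand each client a few randomly chosen shards
    n_shards = n_clients * shards_per_client
    order = np.argsort(labels, kind="stable")
    shards = np.array_split(order, n_shards)
    shard_ids = rng.permutation(n_shards)
    return [np.concatenate([shards[s] for s in
            shard_ids[c * shards_per_client:(c + 1) * shards_per_client]])
            for c in range(n_clients)]

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(10), 50)      # 500 samples, 10 balanced classes
parts = shard_partition(labels, n_clients=25, shards_per_client=2, rng=rng)

# the shards form an exact partition, and each client sees at most 2 classes
assert sorted(np.concatenate(parts).tolist()) == list(range(500))
assert all(len(np.unique(labels[part])) <= 2 for part in parts)
```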

G.2.2 PRIVACY EXPERIMENTS IN TABLE 3 AND TABLE B.1

We conducted experiments under the federated learning setup where n_train training images are assigned to n = 10 clients. We considered the i.i.d. data allocation setup where each client randomly obtains n_train/10 training images. The network architecture, the optimizer, and the number of local training epochs are set to the options used in Sec. G.2.1.

G.3 CONNECTION PROBABILITY SETUP IN FIG. 3

In Fig. 3, we select different connection probabilities p = p′(n, q_total) for various n and q_total, where p′ is defined in (5). The detailed values of the connection probability p are provided in Table G.

We implemented the CCESA algorithm in Python. For symmetric authenticated encryption, we use AES-GCM with 128-bit keys in the Crypto.Cipher package. For the pseudorandom generator, we use the randint function (input: a random seed; output: a random integer in the field of size 2^16) in the numpy.random package. For key agreement, we use Elliptic-Curve Diffie–Hellman over the NIST SP800-56 curve composed with a SHA-256 hash function. For secret sharing, we use the standard t-out-of-n secret sharing (Shamir, 1979).
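Since the scheme relies on standard t-out-of-n Shamir sharing (Shamir, 1979), a minimal self-contained sketch over a toy prime field may help; this is our illustration, not the library implementation used in the experiments.

```python
import random

P = 2**61 - 1  # toy prime field modulus (illustrative size)

def make_shares(secret, t, n, rng):
    # random degree-(t-1) polynomial with constant term = secret,
    # evaluated at x = 1..n to produce n shares
    coeffs = [secret] + [rng.randrange(P) for _ in range(t - 1)]
    return [(x, sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P)
            for x in range(1, n + 1)]

def reconstruct(shares):
    # Lagrange interpolation at x = 0 recovers the constant term
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

rng = random.Random(0)
shares = make_shares(secret=424242, t=3, n=5, rng=rng)
assert reconstruct(shares[:3]) == 424242   # any t shares suffice
assert reconstruct(shares[2:]) == 424242   # a different subset works too
```

With fewer than t shares the polynomial is underdetermined, which is what lets CCESA tolerate dropouts while keeping individual secrets hidden.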

