RECONCILING SECURITY AND COMMUNICATION EFFICIENCY IN FEDERATED LEARNING

Abstract

Cross-device Federated Learning is an increasingly popular machine learning setting to train a model by leveraging a large population of client devices with high privacy and security guarantees. However, communication efficiency remains a major bottleneck when scaling federated learning to production environments, particularly due to bandwidth constraints during uplink communication. In this paper, we formalize and address the problem of compressing client-to-server model updates under the Secure Aggregation primitive, a core component of Federated Learning pipelines that allows the server to aggregate the client updates without accessing them individually. In particular, we adapt standard scalar quantization and pruning methods to Secure Aggregation and propose Secure Indexing, a variant of Secure Aggregation that supports quantization for extreme compression. We establish state-of-the-art results on LEAF benchmarks in a secure Federated Learning setup with up to 40× compression in uplink communication with no meaningful loss in utility compared to uncompressed baselines.

1. INTRODUCTION

Federated Learning (FL) is a distributed machine learning (ML) paradigm that trains a model across a number of participating entities holding local data samples. In this work, we focus on cross-device FL, which harnesses a large number (hundreds of millions) of edge devices with disparate characteristics such as availability, compute, memory, or connectivity resources (Kairouz et al., 2019). Two challenges to the success of cross-device FL are privacy and scalability.

FL was originally motivated by privacy, since data points remain on client devices. However, as with other forms of ML, information about training data can be extracted via membership inference or reconstruction attacks on a trained model (Carlini et al., 2021a;b; Watson et al., 2022), or leaked through local updates (Melis et al., 2019; Geiping et al., 2020). Consequently, Secure Aggregation (SECAGG) protocols were introduced to prevent the server from directly observing individual client updates, which is a major vector for information leakage (Bonawitz et al., 2019; Huba et al., 2022). Additional mitigations such as Differential Privacy (DP) may be required to offer further protection against attacks (Dwork et al., 2006; Abadi et al., 2016), as discussed in Section 6.

Ensuring scalability to populations of heterogeneous clients is the second challenge for FL. Indeed, wall-clock training times are highly correlated with increasing model and batch sizes (Huba et al., 2022), even with recent efforts such as FedBuff (Nguyen et al., 2022), and communication overhead between the server and clients dominates model convergence time. Consequently, compression techniques are used to reduce communication bandwidth while maintaining model accuracy. However, a fundamental problem has been largely overlooked in the literature: in their native form, standard compression methods such as scalar quantization and pruning are not compatible with SECAGG.
This makes it challenging to ensure both security and communication efficiency. In this paper, we address this gap by adapting compression techniques to make them compatible with SECAGG. We focus on compressing uplink updates from clients to the server for three reasons. First, uplink communication is more sensitive and so is subject to a high security bar, whereas downlink updates broadcast by the server are deemed public. Second, upload bandwidth is generally more restricted than download bandwidth: for instance, according to the most recent FCC report, the ratio of download to upload speeds for DSL and cable providers in the US ranges between 3× and 20× (FCC, 2021).

Figure 1: Summary of the proposed approach for one FL round, where we omit the round dependency and Differential Privacy (DP) for clarity. Blue boxes denote standard steps and red boxes denote additional steps for uplink compression. Client i computes local model update g_i, compresses it with the compression operator q, and encrypts it by adding a random mask m_i in the compressed domain, hence reducing the uplink bandwidth (steps 2-4). The server recovers the aggregate in the compressed domain by leveraging any SECAGG protocol (steps 7-13, with a TEE-based SECAGG, see Section 3.1). Since the decompression operator d is linear, the server can convert the aggregate back to the non-compressed domain, up to compression error (step 12). As with the model weights θ, the compression operator q is also periodically updated and broadcast by the server (step 14). In Section 4, we apply the proposed method to scalar quantization and pruning without impacting SECAGG, and propose Secure Indexing, a variant of SECAGG for extreme uplink compression with product quantization. See Section 3.1 for details about SECAGG and Section 6 for a discussion on DP.
Finally, efficient uplink communication brings several benefits beyond speeding up convergence: lowering communication cost reduces selection bias due to under-sampling clients with limited connectivity, improving fairness and inclusiveness. It also shrinks the carbon footprint of FL, of which the fraction attributable to communication can reach 95% (Qiu et al., 2021). In summary, we present the following contributions in this paper:

• We highlight the fundamental mismatch between two critical components of the FL stack: SECAGG protocols and uplink compression mechanisms.
• We formulate solutions by imposing a linearity constraint on the decompression operator, as illustrated in Figure 1 in the case of TEE-based SECAGG.
• We adapt the popular scalar quantization and (random) pruning compression methods for compatibility with the FL stack, requiring no changes to the SECAGG protocol.
• For extreme uplink compression without compromising security, we propose Secure Indexing (SECIND), a variant of SECAGG that supports product quantization.

2. RELATED WORK

Communication is identified as a primary efficiency bottleneck in FL, especially in the cross-device setting (Kairouz et al., 2019).

Efficient Distributed Optimization. There is a large body of literature on reducing the communication cost of distributed training. Seide et al. (2014) propose quantizing gradients to one bit while carrying the quantization error forward across mini-batches with error feedback. Similarly, Wen et al. (2017) propose layer-wise ternary gradients and Bernstein et al. (2018) suggest using only the sign of the gradients. Gradient sparsity is another extensively studied area (Wangni et al., 2018; Aji & Heafield, 2017; Lin et al., 2018; Renggli et al., 2019; Parcollet et al., 2022). For instance, Chen et al. (2018) and Han et al. (2020) explore adapting the degree of sparsity to the distribution of local client data. Another method, QSGD, tunes the quantization level to trade possibly higher-variance gradients for reduced communication bandwidth (Alistarh et al., 2017). Researchers have also studied structured and sketched model updates (Konečný et al., 2016). For example, Wang et al. (2018) propose expressing gradients as a linear combination of basis vectors common to all workers, and Wang et al. (2022) propose clustering the gradients and implementing error correction on the client side. Besides gradient compression, other methods such as Vepakomma et al. (2018); Hu et al. (2019) reduce the communication cost by partitioning the model such that each client learns a portion of it, while He et al. (2020) propose training small models and periodically distilling them into a larger central model. However, as detailed in Section 3 and below, most of these methods are not readily compatible with SECAGG and cannot be used in secure FL.

Bi-directional Compression. In addition to uplink gradient compression, a line of work also focuses on downlink model compression. In a non-distributed setup, Zhou et al. (2016); Courbariaux et al. (2015) demonstrate that it is possible to meaningfully train with low bit-width models and gradients. In FL, Jiang et al. (2019) propose adapting the model size to the device to reduce both communication and computation overhead. Since the local models are perturbed due to compression, researchers have proposed adapting the optimization algorithm for better convergence (Liu et al., 2020; Sattler et al., 2020; Tang et al., 2019; Zheng et al., 2019; Amiri et al., 2020; Philippenko & Dieuleveut, 2021). Finally, pre-conditioning models during FL training can allow for quantized on-device inference, as demonstrated for non-distributed training by Gupta et al. (2015); Krishnamoorthi (2018). As stated in Section 1, we do not focus on downlink model compression since uplink bandwidth is the main communication bottleneck and since SECAGG only involves uplink communication.

Aggregation in the Compressed Domain. In the distributed setting, Yu et al. (2018) propose to leverage both gradient compression and parallel aggregation by performing the ring all-reduce operation in the compressed domain and decompressing the aggregate. To do so, the authors exploit temporal correlations of the gradients to design a linear compression operator. Another method, PowerSGD (Vogels et al., 2019), leverages a fast low-rank gradient compressor. However, neither method is evaluated in the FL setup, and neither mentions SECAGG: both focus on decentralized communication between workers by leveraging the all-reduce operation. Moreover, PowerSGD incorporates (stateful) error feedback on all distributed nodes, which is not readily adaptable to cross-device FL, in which clients generally participate in a few (not necessarily consecutive) rounds. Finally, Rothchild et al. (2020) propose FetchSGD, a compression method relying on a CountSketch, which is compatible with SECAGG.

3. BACKGROUND

In this section, we first recall the SECAGG protocol, then the compression methods that we wish to adapt to it, namely scalar quantization, pruning, and product quantization.

3.1. SECURE AGGREGATION

SECAGG refers to a class of protocols that allow the server to aggregate client updates without accessing them individually. While SECAGG alone does not entirely prevent client data leakage, it is a powerful and widely used component of current at-scale cross-device FL implementations (Kairouz et al., 2019). Two main approaches exist in practice: software-based protocols relying on Multiparty Computation (MPC) (Bonawitz et al., 2019; Bell et al., 2020; Yang et al., 2022), and those that leverage hardware implementations of Trusted Execution Environments (TEEs) (Huba et al., 2022). SECAGG relies on additive masking, where clients protect their model updates g_i by adding a uniform random mask m_i, guaranteeing that each client's masked update is statistically indistinguishable from any other value. At aggregation time, the protocol ensures that all the masks cancel out. For instance, in an MPC-based SECAGG, the pairwise masks cancel out within the aggregation itself: for every pair of users i and j, after they agree on a matched pair of input perturbations, the masks m_{i,j} and m_{j,i} are constructed so that m_{i,j} = -m_{j,i}. Similarly, and as illustrated in Fig. 1, in a TEE-based SECAGG, the server receives h_i = g_i + m_i from each client as well as the sum of the masks Σ_i m_i from the TEE, and recovers the sum of the updates as Σ_i g_i = Σ_i h_i - Σ_i m_i. We defer the discussion of DP noise addition by SECAGG protocols to Section 6.

Finite Group. SECAGG requires that the plaintexts (client model updates) be elements of a finite group, while the inputs are real-valued vectors represented with floating-point types. This requirement is usually addressed by converting client updates to fixed-point integers and operating in a finite domain (modulo 2^p), where p is typically set in prior literature to 32 bits. The choice of SECAGG bit-width p must balance communication costs with the accuracy loss due to rounding and overflows.

Minimal Complexity. TEE-based protocols offer greater flexibility in how individual client updates can be processed; however, the code executed inside the TEE is part of the trusted computing base (TCB) for all clients. In particular, this code must be stable, auditable, and defect- and side-channel-free, which severely limits its complexity. Hence, in practice, we prefer compression techniques that are either oblivious to SECAGG's implementation or require minimal changes to the TCB.
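To make the mask-and-cancel mechanics concrete, the following is a minimal pure-Python sketch of TEE-style additive masking in the finite group modulo 2^p. The function names, the toy integer updates, and the use of seeded `random.Random` streams as shared masks are illustrative assumptions, not part of any real SECAGG implementation:

```python
import random

P = 32                      # SECAGG bit-width p
MOD = 1 << P                # finite group size 2^p

def mask_update(update, seed):
    """Client side: add a uniform random mask to each entry, mod 2^p."""
    rng = random.Random(seed)
    return [(u + rng.randrange(MOD)) % MOD for u in update]

def sum_masks(seeds, length):
    """TEE side: reconstruct and sum all client masks from their seeds."""
    total = [0] * length
    for seed in seeds:
        rng = random.Random(seed)
        for j in range(length):
            total[j] = (total[j] + rng.randrange(MOD)) % MOD
    return total

def aggregate(masked_updates, mask_sum):
    """Server side: sum masked updates, subtract the summed masks."""
    agg = [0] * len(mask_sum)
    for h in masked_updates:
        for j, v in enumerate(h):
            agg[j] = (agg[j] + v) % MOD
    return [(a - m) % MOD for a, m in zip(agg, mask_sum)]

# Example: three clients with fixed-point integer updates
updates = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
seeds = [101, 202, 303]
masked = [mask_update(u, s) for u, s in zip(updates, seeds)]
recovered = aggregate(masked, sum_masks(seeds, 3))
assert recovered == [12, 15, 18]    # masks cancel, sum of updates survives
```

The server only ever sees the masked vectors and the sum of the masks, never an individual unmasked update.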

3.2. COMPRESSION METHODS

In this subsection, we consider a matrix W ∈ R^{C_in × C_out} representing the weights of a linear layer, and use it to discuss three major compression methods with distinct compression/accuracy tradeoffs, identifying the challenges each poses for compatibility with SECAGG.

3.2.1. SCALAR QUANTIZATION

Uniform scalar quantization maps a floating-point weight w to one of 2^b evenly spaced bins, where b is the number of bits. Given a floating-point scale s > 0 and an integer shift parameter z called the zero-point, we map any floating-point parameter w to its nearest bin indexed by {0, ..., 2^b - 1}:

w → clamp(round(w/s) + z, 0, 2^b - 1).

The tuple (s, z) is often referred to as the quantization parameters (qparams). With b = 8, we recover the popular int8 quantization scheme (Jacob et al., 2018), while setting b = 1 yields the extreme case of binarization (Courbariaux et al., 2015). The quantization parameters s and z are usually calibrated after training a model with floating-point weights, using the minimum and maximum values of each layer. The compressed representation of the weights W consists of the qparams and the integer representation matrix W_q, where each entry is stored in b bits. Decompressing any integer entry w_q of W_q back to floating point is performed by applying the (linear) operator w_q → s × (w_q - z).

Challenge. The discrete domain of quantized values and the finite group required by SECAGG are not natively compatible because of the overflows that may occur at aggregation time. For instance, consider the extreme case of binary quantization, where each value is replaced by a bit. We can represent these bits in SECAGG with p = 1, but the aggregation will inevitably result in overflows.
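The mapping above can be written out directly. The sketch below uses illustrative int8-style parameters (s, z and the helper names are our choices, not prescribed by the paper):

```python
def quantize(w, s, z, b):
    """Map a float to its nearest bin in {0, ..., 2^b - 1}."""
    q = round(w / s) + z
    return max(0, min(2 ** b - 1, q))   # clamp to the representable range

def dequantize(wq, s, z):
    """Linear decompression operator: w_q -> s * (w_q - z)."""
    return s * (wq - z)

# Example: values in [-1, 1], scale s = 2 / 255, zero-point z = 128, b = 8
s, z, b = 2 / 255, 128, 8
w = 0.5
wq = quantize(w, s, z, b)
# Round-trip error is bounded by half a quantization bin
assert abs(dequantize(wq, s, z) - w) <= s / 2
```

Note that dequantization is affine in w_q, which is precisely the linearity exploited in Section 4.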

3.2.2. PRUNING

Pruning is a class of methods that remove parts of a model, such as connections or neurons, according to some pruning criterion, e.g., weight magnitude (Le Cun et al. (1989); Hassibi & Stork (1992); see Blalock et al. (2020) for a survey). Konečný et al. (2016) demonstrate client update compression with random sparsity for federated learning. Motivated by previous work and the fact that random masks do not leak information about the data on client devices, we leverage random pruning of client updates in the remainder of this paper. A standard way to store a sparse matrix is the coordinate list (COO) format, where only the non-zero entries are stored (in floating point or lower precision), along with their integer coordinates in the matrix. This format is compact only for a large enough compression ratio, as additional coordinate values are stored for each non-zero entry. Decompression is performed by re-instantiating the uncompressed matrix from the stored entries, with zeros elsewhere.

Challenge. Pruning model updates on the client side is an effective compression approach, as investigated in previous work. However, the underlying assumption is that clients have different masks, either due to their seeds or to a dependency on client update parameters (e.g., weight magnitudes). This is a challenge for SECAGG, as aggregation assumes a dense compressed tensor, which is not possible to construct when the coordinates of non-zero entries differ across clients.
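A minimal sketch of the COO storage format described above, using plain Python lists (the helper names are ours):

```python
def to_coo(matrix):
    """Store only non-zero entries, each with its (row, col) coordinates."""
    return [(i, j, v) for i, row in enumerate(matrix)
            for j, v in enumerate(row) if v != 0.0]

def from_coo(coo, shape):
    """Decompress: re-instantiate the dense matrix, zeros elsewhere."""
    rows, cols = shape
    dense = [[0.0] * cols for _ in range(rows)]
    for i, j, v in coo:
        dense[i][j] = v
    return dense

m = [[0.0, 1.5, 0.0], [0.0, 0.0, -2.0]]
coo = to_coo(m)
assert coo == [(0, 1, 1.5), (1, 2, -2.0)]   # 3 stored values per kept entry
assert from_coo(coo, (2, 3)) == m
```

The per-entry coordinate overhead is why COO only pays off beyond a certain sparsity level.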

3.2.3. PRODUCT QUANTIZATION

Product quantization (PQ) is a compression technique developed for nearest-neighbor search (Jégou et al., 2011) that can be applied to model compression (Stock et al., 2020). Here, we show how to re-formulate PQ to represent model updates. We focus on linear layers and refer the reader to Stock et al. (2020) for the adaptation to convolutions. Let the block size be d (say, 8), the number of codewords be k (say, 256), and assume that the number of input channels, C_in, is a multiple of d. To compress W with PQ, we evenly split its columns into subvectors or blocks of size d × 1 and learn a codebook via k-means to select the k codewords used to represent the C_in × C_out / d blocks of W. PQ with block size d = 1 amounts to non-uniform scalar quantization with log2(k) bits per weight. The PQ-compressed matrix W is represented by the tuple (C, A), where C is the codebook of size k × d and A contains the assignments of size C_in × C_out / d. Assignments are integers in [0, k - 1] denoting the codeword each subvector was assigned to. To decompress the matrix (up to reshaping), we index the codebook with the assignments, written in PyTorch-like notation as W = C[A].

Challenge. There are several obstacles to making PQ compatible with SECAGG. First, each client may have a different codebook, and direct access to these codebooks is needed to decode each client's message. Even if all clients share a (public) codebook, the operation taking assignments to an (aggregated) update is not linear, and so cannot be directly wrapped inside SECAGG.
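The assignment and decompression steps can be sketched as follows. For brevity we assume a tiny hand-written codebook rather than one learned via k-means, and the helper names are illustrative:

```python
def assign_blocks(blocks, codebook):
    """Assign each block to its nearest codeword (squared L2 distance)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(codebook)), key=lambda r: dist(blk, codebook[r]))
            for blk in blocks]

def decompress(assignments, codebook):
    """W = C[A]: index the codebook with the assignments."""
    return [codebook[r] for r in assignments]

# Toy example: block size d = 2, k = 2 codewords
codebook = [[0.0, 0.0], [1.0, 1.0]]
blocks = [[0.1, -0.1], [0.9, 1.2], [1.1, 0.8]]
A = assign_blocks(blocks, codebook)
assert A == [0, 1, 1]
assert decompress(A, codebook) == [[0.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
```

The lookup `C[A]` is an indexing operation, not a linear map of A, which is the crux of the incompatibility with plain SECAGG.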

4. METHOD

In this section, we propose solutions to reconcile security (SECAGG) and communication efficiency. Our approach is to modify compression techniques to share some hyperparameters globally across all clients so that aggregation can be done by uniformly combining each client's response, while still ensuring that there is scope to achieve accurate compressed representations. As detailed below, each of the proposed methods offers the same level of security as standard SECAGG without compression.

4.1. SECURE AGGREGATION AND COMPRESSION

We propose to compress the uplink model updates through a compression operator q whose parameters are round-dependent but identical for all clients participating in the same round. Each client then adds a random mask m_i to its quantized update q(g_i) in the compressed domain, effectively reducing uplink bandwidth while ensuring that h_i = q(g_i) + m_i is statistically indistinguishable from any other representable value in the finite group (see Section 3.1). In this setting, SECAGG allows the server to recover the aggregate of the client model updates in the compressed domain: Σ_i q(g_i). If the decompression operator d is linear, the server is able to recover the aggregate in the non-compressed domain, up to quantization error, as illustrated in Figure 1:

d(Σ_i h_i - Σ_i m_i) = d(Σ_i q(g_i)) = Σ_i d(q(g_i)) ≈ Σ_i g_i.

The server periodically updates the compression and decompression operator parameters, either from the aggregated model update, which is deemed public, or by emulating a client update on some similarly distributed public data. Once these parameters are updated, the server broadcasts them to the clients for the next round. This adds overhead to the downlink communication payload; however, the overhead is negligible compared to the downlink model size to transmit. For instance, for scalar quantization, q is entirely characterized by one fp32 scale and one int32 zero-point per layer, the latter of which is unnecessary in the case of a symmetric quantization scheme. Finally, this approach is compatible with both synchronous FL methods such as FedAvg (McMahan et al., 2017) and asynchronous methods such as FedBuff (Nguyen et al., 2022), as long as SECAGG maintains the mapping between the successive versions of the quantization parameters and the corresponding client updates.
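Putting the pieces together, the following sketch simulates one round with a shared symmetric scale quantizer (zero-point 0), masking in the compressed domain, and a single linear decompression of the unmasked aggregate. All parameter values, seeds, and helper names are illustrative:

```python
import random

P, B = 16, 8              # SECAGG bit-width p, quantization bit-width b (p > b)
MOD = 1 << P              # finite group size 2^p
s = 0.01                  # shared per-round scale (symmetric scheme)

def q(g):
    """Shared compression operator: scalar-quantize an update to B bits."""
    return [max(0, min(2 ** B - 1, round(x / s))) for x in g]

def d(agg):
    """Linear decompression operator applied once to the aggregate."""
    return [s * x for x in agg]

updates = [[0.11, 0.42], [0.23, 0.34], [0.05, 0.61]]
seeds = [7, 8, 9]

# Clients: compress, then add a random mask in the compressed domain
masked = []
for g, seed in zip(updates, seeds):
    rng = random.Random(seed)
    masked.append([(x + rng.randrange(MOD)) % MOD for x in q(g)])

# TEE: sum of all client masks, recomputed from the seeds
mask_sum = [0] * 2
for seed in seeds:
    rng = random.Random(seed)
    for j in range(2):
        mask_sum[j] = (mask_sum[j] + rng.randrange(MOD)) % MOD

# Server: aggregate masked updates, unmask, then decompress once
agg = [0] * 2
for h in masked:
    for j in range(2):
        agg[j] = (agg[j] + h[j]) % MOD
agg = [(a - m) % MOD for a, m in zip(agg, mask_sum)]
recovered = d(agg)

expected = [sum(u[j] for u in updates) for j in range(2)]
assert all(abs(r - e) <= 3 * s / 2 for r, e in zip(recovered, expected))
```

Because d is linear, decompressing the sum equals the sum of decompressions, so the server never needs any individual q(g_i).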

4.2. APPLICATION

Next, we show how we adapt scalar quantization and random pruning with no changes required to SECAGG. We illustrate our point with TEE-based SECAGG, although these adapted uplink compression mechanisms are agnostic to the underlying SECAGG mechanism. Finally, we show how to obtain extreme uplink compression by proposing a variant of SECAGG, which we call SECIND. This variant supports product quantization and is provably secure.

4.2.1. SCALAR QUANTIZATION AND SECURE AGGREGATION

As detailed in Section 3.2.1, a model update matrix g_i compressed with scalar quantization is given by an integer representation in the range [0, 2^b - 1] and by the quantization parameters scale (s) and zero-point (z). A sufficient condition for the decompression operator to be linear is to broadcast common quantization parameters per layer to every client. Denote by q(g_i) the integer representation of the quantized model update of a particular layer for client 1 ≤ i ≤ N. Set the scale of the decompression operator to s and its zero-point to Nz. Then, the server is able to decompress as follows (where the per-client decompression operator is defined in Section 3.2.1):

d(Σ_i q(g_i)) = s(Σ_i q(g_i) - Nz) = Σ_i s(q(g_i) - z) ≈ Σ_i g_i.

Recall that all operations are performed in a finite group. Therefore, to avoid overflows at aggregation time, we quantize with a bit-width b but take a SECAGG bit-width p > b, thus creating a margin for potential overflows (see Section 5.3). This approach is related to the fixed-point aggregation described in Bonawitz et al. (2019); Huba et al. (2022), but unlike these related approaches, we calibrate the quantization parameters per layer and periodically.

Privacy, Security and Bandwidth. Scales and zero-points are determined from public data on the server. The downlink overhead is negligible: the server broadcasts the per-layer quantization parameters. The upload bandwidth is p bits per weight, where p is the SECAGG finite group bit-width (Section 3.1). Since the masks m_i are chosen uniformly in the integer range [0, 2^p - 1], any masked integer representation taken modulo 2^p is statistically indistinguishable from any other vector.
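A small numeric check that decompressing the summed integer representations with zero-point N·z recovers the sum of the updates, using illustrative values of N, b, s and z:

```python
N, b, s, z = 3, 8, 0.02, 128        # clients, bit-width, shared scale / zero-point

def q(w):
    """Shared per-round quantization operator (as in Section 3.2.1)."""
    return max(0, min(2 ** b - 1, round(w / s) + z))

def d_agg(total):
    """Decompression of the aggregate: scale s, zero-point N * z."""
    return s * (total - N * z)

ws = [-0.5, 0.1, 0.3]               # one scalar weight update per client
total = sum(q(w) for w in ws)
# Decompressing the sum recovers the sum of updates, up to quantization error
assert abs(d_agg(total) - sum(ws)) <= N * s / 2
```

Each client's zero-point offset z appears N times in the sum, which is why the aggregate decompression subtracts N·z rather than z.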

4.2.2. PRUNING AND SECURE AGGREGATION

To enable linear decompression with random pruning, all clients share a common pruning mask for each round. This mask can be communicated compactly before each round as a seed for a pseudorandom function. This pruning mask seed is different from the SECAGG mask seed introduced in Section 3.1 and has a distinct role. Each client uses the pruning seed to reconstruct the pruning mask, prunes its model update g_i, and only needs to encrypt and transmit the unpruned parameters. The trade-off is that some parameters are completely unobserved in a given round, as opposed to traditional pruning. SECAGG operates as usual, and the server receives the sum of the tensors of unpruned parameters computed by the clients participating in the round, which it can expand using the mask seed. We denote by ϕ the pruning operator applied to the original model update g_i, and by d the decompression operator applied to a compressed tensor ϕ(g_i). Decompression is an expansion operation equivalent to multiplication with a sparse matrix P_i whose entries depend on the i-th client's mask seed. Crucially, when all clients share the same mask seed within a round, we have P_i = P for all i, and linearity of decompression is maintained:

d(Σ_i ϕ(g_i)) = P Σ_i ϕ(g_i) = Σ_i P ϕ(g_i) = Σ_i d(ϕ(g_i)) ≈ Σ_i g_i.

Privacy, Security and Bandwidth. Since the mask is random, no information leaks from the pruning mask. The downlink overhead (the server broadcasts one integer mask seed) is negligible. The upload bandwidth is simply the size of the sparse client model updates. Finally, there is no loss in security, since each client uses the standard SECAGG mechanism on the non-pruned entries.
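The shared-seed mechanism can be sketched as follows; `pruning_mask`, the toy updates, and the use of `random.Random(seed).sample` to derive the shared mask are illustrative assumptions:

```python
import random

def pruning_mask(seed, size, keep):
    """Shared per-round mask: indices of the coordinates that are kept."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(size), keep))

def prune(g, mask):
    """Client side: transmit only the unpruned entries."""
    return [g[j] for j in mask]

def expand(compressed, mask, size):
    """Server side: linear expansion back to the dense shape."""
    out = [0.0] * size
    for v, j in zip(compressed, mask):
        out[j] = v
    return out

seed, size, keep = 42, 6, 3
mask = pruning_mask(seed, size, keep)
updates = [[0.1, 0.2, 0.3, 0.4, 0.5, 0.6], [0.6, 0.5, 0.4, 0.3, 0.2, 0.1]]

# Aggregate in the compressed domain, then expand once (linearity of d)
agg = [sum(col) for col in zip(*(prune(g, mask) for g in updates))]
expanded = expand(agg, mask, size)
for j in range(size):
    if j in mask:
        assert abs(expanded[j] - (updates[0][j] + updates[1][j])) < 1e-9
    else:
        assert expanded[j] == 0.0    # pruned coordinates are unobserved
```

Because both clients prune the same coordinates, their compressed tensors are aligned entry-by-entry and can be summed directly.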

4.2.3. PRODUCT QUANTIZATION AND SECURE INDEXING

We next describe the Secure Indexing (SECIND) primitive and discuss how to instantiate it. Recall that with PQ, each layer has its own codebook C, as explained in Section 3.2.3. Let us fix one particular layer compressed with a codebook C containing k codewords. We assume that C is common to all clients participating in the round. Consider the assignment matrix (A^i)_{m,n} of this layer for client i. From these, we seek to build the assignment histograms H_{m,n} ∈ R^k that satisfy

H_{m,n}[r] = Σ_i 1[A^i_{m,n} = r],

where the indicator function satisfies 1[A^i_{m,n} = r] = 1 if A^i_{m,n} = r and 0 otherwise. A Secure Indexing primitive produces H_{m,n} while ensuring that no other information about client assignments or partial aggregations is revealed. The server receives the assignment histograms from SECIND and is able to recover the aggregated update for each block indexed by (m, n) as Σ_r H_{m,n}[r] · C[r].

We describe how SECIND can be implemented with a TEE in Algorithm 1. Each client encrypts its assignment matrix, for instance with additive masking as described in Section 3.1, and sends it to the TEE via the server. Hence, the server does not have access to the plaintext client-specific assignments. The TEE decrypts each assignment matrix and, for each block indexed by (m, n), produces the assignment histogram, which can then be mapped to an update via the (public) codebook.

Algorithm 1 Secure Indexing (SECIND)
1: procedure SecureIndexing(C)                 ▷ This happens inside the TEE
2:   Receive common codebook C from the server ▷ C is periodically updated by the server
3:   Initialize histograms H_{m,n} to 0        ▷ Each histogram for block (m, n) has size k
4:   for each client i do
5:     Receive and decrypt assignment matrix A^i
6:     for each block index (m, n) do
7:       Increment H_{m,n}[A^i_{m,n}]
8:   Send back histograms H_{m,n} to the server
Compared to SECAGG, where the TEE receives an encrypted seed per client (a few bytes per client) and sends back the sum of the masks Σ_i m_i (the same size as the considered model), SECIND receives the (masked) assignment matrices and sends back the aggregated update for the round. SECIND implementation feasibility is briefly discussed in Appendix A.3.

Privacy, Security and Bandwidth. Codebooks are computed from public data, while individual assignments are never revealed to the server. The downlink overhead of sending the codebooks is negligible, as demonstrated in Section 5. The upload bandwidth in the TEE implementation is the assignment size: log2(k) bits per block, where k is the number of codewords. For instance, with a block size d = 8 and k = 32 codewords, assignment storage costs 5 bits per 8 weights, i.e., 0.625 bits per weight. The tradeoff compared to non-secure PQ is the restriction to a global codebook for all clients (instead of one tailored to each client), and the need to instantiate SECIND instead of SECAGG. Since the assignments are encrypted before being sent to the TEE, there is no loss in security. Here, any encryption mechanism (not necessarily relying on additive masking) would work.
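The histogram aggregation and codebook lookup at the heart of SECIND can be sketched in a few lines (outside any TEE, purely to illustrate the arithmetic; names and toy values are ours):

```python
def aggregate_histograms(assignments, k):
    """TEE side: one histogram over the k codewords per block position."""
    n_blocks = len(assignments[0])
    hists = [[0] * k for _ in range(n_blocks)]
    for A in assignments:              # one assignment vector per client
        for pos, r in enumerate(A):
            hists[pos][r] += 1
    return hists

def recover_update(hists, codebook):
    """Server side: sum_r H[r] * C[r] for each block position."""
    d = len(codebook[0])
    return [[sum(h[r] * codebook[r][j] for r in range(len(codebook)))
             for j in range(d)] for h in hists]

codebook = [[0.0, 0.0], [1.0, 2.0]]    # k = 2 codewords, block size d = 2
assignments = [[0, 1], [1, 1], [1, 0]] # three clients, two blocks each
hists = aggregate_histograms(assignments, 2)
assert hists == [[1, 2], [1, 2]]
agg = recover_update(hists, codebook)
assert agg == [[2.0, 4.0], [2.0, 4.0]]
```

The histograms reveal only how many clients picked each codeword per block, not which client picked what, matching the guarantee stated above.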

5. EXPERIMENTS

In this section, we numerically evaluate the performance of the proposed approaches when adapted to SECAGG protocols. We study the relationship between uplink compression and model accuracy for the LEAF benchmark tasks. In addition, for scalar and product quantization we also analyze the impact of refresh rate for compression parameters on overall model performance.

5.1. EXPERIMENTAL SETUP

We closely follow the setup of Nguyen et al. (2022) and use the FLSim library for our experiments. All experiments are run on a single V100 GPU 16 GB (except for Sent140, where we use one V100 32 GB) and typically take a few hours to run. More experiment details can be found in Appendix A.1.

Figure 2: We adapt scalar quantization (SQ) and pruning to the SECAGG protocol to enable efficient and secure uplink communications. We also present results for product quantization (PQ) under the proposed novel SECIND protocol. The x axis is log-scale and represents the uplink message size. Baseline refers to a SECAGG FL run without any uplink compression, displayed as a horizontal line for easier comparison. Model size is indicated in the plot titles. Uncompressed client updates are as large as the models when p = 32 (see Section 3.1, represented as stars). We refer to Appendix A.2.1 for the matching tables, where we additionally report the standard deviation of each data point.

Tasks. We run experiments on three datasets from the LEAF benchmark (Caldas et al., 2018): CelebA (Liu et al., 2015), Sent140 (Go et al., 2009) and FEMNIST (LeCun & Cortes, 2010). For CelebA, we train the same convolutional classifier as Nguyen et al. (2022), with BatchNorm layers replaced by GroupNorm layers, and 9,343 clients. For Sent140, we train an LSTM classifier for binary sentiment analysis with 59,400 clients. Finally, for FEMNIST, we train a GroupNorm version of ResNet18 (He et al., 2016) for digit classification with 3,550 clients. For all compression methods, we do not compress biases and norm layers due to their small overhead.

Baselines. We focus here on the (synchronous) FedAvg approach although, as explained in Section 4, the proposed compression methods can be readily adapted to asynchronous FL aggregation protocols. As done in the literature, we keep the number of clients per round to at most 100 (see Appendix A.1), a small fraction of the total considered population size (Chen et al., 2019; Charles et al., 2021). We report the average and standard deviation of accuracy over three independent runs for all tasks at different uplink byte sizes corresponding to various configurations of the compression operator.

Implementation Details. We refer the reader to Appendix A.1. The downlink overhead of sending the per-layer codebooks for product quantization is negligible, as shown in Appendix A.2.4. Finally, the convergence time in terms of rounds is similar for PQ runs and the non-compressed baseline, as illustrated in Appendix A.2.5. Note that outside a simulated environment, the wall-clock convergence time for PQ runs would be lower due to uplink communication savings.

5.2. RESULTS AND COMPARISON WITH PRIOR WORK

Results for efficient and secure uplink communication are displayed in Figure 2. We observe that PQ yields a consistently better trade-off curve between model update size and accuracy. For instance, on CelebA, PQ achieves 30× compression with respect to the non-compressed baseline at iso-accuracy. The iso-accuracy compression rate is 32× on Sent140 and 40× on FEMNIST (see Appendix A.2.1 for detailed tables). Scalar quantization accuracy degrades significantly at larger compression rates due to overflows at aggregation, as detailed in Appendix A.2.2. The line of work developing FL compression techniques mainly includes FetchSGD (Rothchild et al., 2020), as detailed in Section 2, although the authors do not mention SECAGG. Their results are not directly comparable to ours due to non-matching experimental setups (e.g., datasets and architectures). However, Figure 6 in the appendix of Rothchild et al. (2020) reports upload compression rates at iso-accuracy that are weaker than those obtained here with product quantization.

5.3. ABLATION STUDIES

We investigate the influence of the update frequency of the compression operator q for scalar quantization and pruning, and study the influence of the SECAGG bit-width p on the number of overflows for scalar quantization.

Update frequency of the compression operators. In Figure 3, we show that for scalar quantization, the update periodicity only plays a role for low SECAGG bit-width values p compared to the quantization bit-width b. For product quantization, the update periodicity plays an important role in aggressive compression setups corresponding to large block sizes d or to a smaller number of codewords k. For pruning, we measure the impact of masks that are refreshed periodically. We observe that refreshing the compression operator more frequently reduces staleness, leading to accuracy improvements. We present our findings in Appendix A.2.6.

Overflows for scalar quantization. As discussed in Section 4.2.1, we choose the SECAGG bit-width p to be greater than the quantization bit-width b in order to avoid aggregation overflows. While it suffices to set p to be ⌈log2(n_c)⌉ more than b, where n_c is the number of clients participating in the round, reducing p is desirable to reduce uplink size. We study the impact of p on the percentage of parameters that suffer overflows and present our findings in Appendix A.2.2.
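The margin rule above amounts to a one-line computation; a quick sketch with illustrative values (the helper name is ours):

```python
import math

def secagg_bitwidth(b, n_clients):
    """Overflow-free SECAGG bit-width: b plus a ceil(log2 n_c) margin."""
    return b + math.ceil(math.log2(n_clients))

# With b = 8 and 100 clients per round, p = 15 suffices:
assert secagg_bitwidth(8, 100) == 15
assert (2 ** 8 - 1) * 100 < 2 ** 15   # worst-case sum of 100 b-bit values fits
```

Choosing a p below this bound trades occasional overflows for smaller uploads, which is exactly the trade-off studied in Appendix A.2.2.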

6. LIMITATIONS AND FUTURE WORK

Compatibility with DP. As mentioned in Section 1, we may want both SECAGG and Differential Privacy (Abadi et al., 2016) to realize the full promise of FL as a privacy-enhancing technology. While our primary focus is on enabling efficient and secure uplink communication, we emphasize that the proposed approaches are compatible with user-level DP. For instance, at the cost of increasing the complexity of the trusted computing base, DP noise can be added natively by the TEE with our modified random pruning or scalar quantization approaches. For PQ and SECIND, we can have the TEE add noise in the assignment space (outputting a noisy histogram), or map the histogram to the codeword space and add noise there. Each option offers a different trade-off between privacy, trust, and accuracy; we leave a detailed evaluation to future work.

7. CONCLUSION

In this paper, we reconcile efficiency and security for uplink communication in Federated Learning. We propose to adapt existing compression mechanisms such as scalar quantization and pruning to the Secure Aggregation protocol by imposing a linearity constraint on the decompression operator. Our experiments demonstrate that both quantization and pruning can be adapted to obtain a high degree of uplink compression with minimal degradation in performance while maintaining strong security guarantees. To achieve the highest compression rates, we introduce SECIND, a variant of SECAGG well-suited to TEE-based implementation that supports product quantization while maintaining a high security bar. We plan to extend our work to other federated learning scenarios, such as asynchronous FL, and to further investigate the interaction of compression and privacy.

A APPENDIX

A.1 EXPERIMENTAL DETAILS

In this section, we provide further details of the experimental setup described in Section 5.1 and the hyper-parameters used for all the runs in Table 1. For all tasks, we use a mini-batch SGD optimizer for local training at the client and the FEDAVG optimizer for the global model update on the server. The LEAF benchmark is released under the BSD 2-Clause License.

Baselines. We run hyper-parameter sweeps to tune the client and server learning rates for the uncompressed baselines. Then, we keep the same hyper-parameters in all runs involving uplink compression. We observed that tuning the hyper-parameters for each compression factor does not provide significantly different results from reusing those of the uncompressed baselines, while incurring a high additional model training cost.

Compression details. For scalar quantization, we use per-tensor quantization with MinMax observers and the symmetric quantization scheme over the integer range [-2^(b-1), 2^(b-1) - 1]. For pruning, we compute the random mask separately for each tensor, ensuring all pruned layers have the same target sparsity in their individual updates. For product quantization, we explore various configurations by choosing the number of codewords per layer k in {8, 16, 32, 64} and the block size d in {4, 9, 18}. We automatically adapt the block size for each layer to be the largest allowed one that divides C_in (in the fully connected case).

A.2 ADDITIONAL RESULTS

We provide various additional experimental results that are referred to in the main paper.
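The scalar quantization scheme described above can be sketched as follows. This is a simplified stand-in for a MinMax observer, assuming the scale is derived from the maximum absolute value of the tensor; it is not the exact production implementation.

```python
import numpy as np

def symmetric_quantize(w: np.ndarray, b: int):
    """Per-tensor symmetric quantization to the signed range
    [-2^(b-1), 2^(b-1)-1], with the scale taken from the largest
    absolute value observed in the tensor (MinMax-style observer)."""
    qmin, qmax = -(1 << (b - 1)), (1 << (b - 1)) - 1
    scale = max(float(np.abs(w).max()), 1e-12) / qmax   # avoid div-by-zero
    q = np.clip(np.round(w / scale), qmin, qmax).astype(np.int32)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Linear decompression: this is what makes the scheme SECAGG-compatible,
    # since sums of quantized values dequantize to sums of updates.
    return q.astype(np.float32) * scale

w = np.array([-0.5, 0.0, 0.1, 0.5], dtype=np.float32)
q, s = symmetric_quantize(w, b=8)
print(q.tolist())  # [-127, 0, 25, 127]
```

Note that because dequantization is a single scalar multiply, the server can apply it after secure aggregation, which is the linearity constraint discussed in the main text.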

A.2.1 TABLES CORRESPONDING TO FIGURE 2

We provide the detailed results corresponding to Figure 2 along with standard deviation over 3 runs in Tables 4, 5 , and 6.

A.2.2 AGGREGATION OVERFLOWS WITH SCALAR QUANTIZATION

We discussed the challenge of aggregation overflows of quantized values under a restricted SECAGG finite group size in Section 3.2.1, and noted in Section 4.2.1 that it suffices for the SECAGG bit-width p to exceed the quantization bit-width b by ⌈log₂ N⌉, where N is the number of clients participating in a given round. However, this overflow margin increases the client update size by p − b bits per weight. To optimize this further, we explore the impact of p on aggregation overflows and accuracy, and present the results in Table 2. As expected, we observe a decrease in the percentage of weights that overflow during aggregation as the overflow margin grows. However, while there is some benefit to a non-zero overflow margin, there is no strong correlation between the overflow margin size and accuracy, indicating the potential to achieve good utility even in the presence of overflows.
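The sufficiency of a ⌈log₂ N⌉-bit margin, and the overflows that appear below it, can be illustrated with a toy simulation. The update model here is hypothetical (uniform quantized values), not the empirical distribution from our runs.

```python
import math, random

def overflow_fraction(n_clients: int, b: int, p: int, dim: int = 10_000) -> float:
    """Fraction of coordinates whose sum of n_clients signed b-bit quantized
    values falls outside the signed p-bit range (toy uniform-update model)."""
    rng = random.Random(0)
    lo, hi = -(1 << (b - 1)), (1 << (b - 1)) - 1      # b-bit symmetric range
    overflows = 0
    for _ in range(dim):
        s = sum(rng.randint(lo, hi) for _ in range(n_clients))
        if not (-(1 << (p - 1)) <= s <= (1 << (p - 1)) - 1):
            overflows += 1
    return overflows / dim

# With overflow margin p - b = ceil(log2(n_clients)), overflows cannot occur:
print(overflow_fraction(16, b=8, p=8 + math.ceil(math.log2(16))))  # 0.0
# With a margin of only one bit, a sizable fraction of coordinates overflows:
print(overflow_fraction(16, b=8, p=9))
```

Such a simulation makes the trade-off explicit: shaving margin bits shrinks the uplink but admits a measurable overflow rate, mirroring the behavior reported in Table 2.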

A.2.3 WEIGHTED AGGREGATION AND SCALAR QUANTIZATION

Following the setup of Nguyen et al. (2022), we weight each client update by the number of samples the client trained on. Denoting the weight associated with client i by ω_i and following the same notation as in Section 4.1, the weighted update is obtained as h_i = (q(g_i) × ω_i) + m_i. Since this is a linear function of the quantized update, it remains compatible with SECAGG: the masks m_i cancel at aggregation and the server recovers the weighted sum.
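A minimal sketch of this weighted scheme follows: a two-client toy with a single cancelling mask pair and an illustrative modulus, not the full SECAGG protocol.

```python
# Toy SECAGG-compatible weighted aggregation: each client sends
# h_i = q(g_i) * w_i + m_i with integer sample count w_i; the pairwise
# masks cancel in the modular sum, leaving the weighted sum of updates.
MOD = 1 << 16  # illustrative SECAGG group size

def client_message(q_update, weight, mask):
    return [(q * weight + m) % MOD for q, m in zip(q_update, mask)]

q1, w1 = [3, -2, 5], 10      # client 1: quantized update, sample count
q2, w2 = [1, 4, -1], 30      # client 2
mask = [123, 456, 789]       # shared pair mask: +mask for client 1, -mask for 2
h1 = client_message(q1, w1, mask)
h2 = client_message(q2, w2, [-m for m in mask])

agg = [(a + b) % MOD for a, b in zip(h1, h2)]
# Reinterpret as signed integers, then normalize by the total weight:
signed = [a - MOD if a >= MOD // 2 else a for a in agg]
avg = [s / (w1 + w2) for s in signed]
print(signed)  # [60, 100, 20] == q1*w1 + q2*w2 elementwise
```

Multiplying quantized integers by integer sample counts keeps everything in the group, so no extra mechanism is needed beyond a slightly larger overflow margin.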

A.2.5 CONVERGENCE CURVES

We also provide convergence curves for PQ-compressed and baseline runs in Figure 4, demonstrating that a similar number of rounds is needed for convergence.

A.2.6 PERFORMANCE IMPACT OF SPARSITY MASK REFRESH

In addition to the scalar and product quantization ablations described in Section 5.3, we also conduct experiments varying the interval at which pruning masks are refreshed. We consider two sparsity levels, 50% and 99%, and run our experiments on the CelebA dataset. We present our results in Figure 5. Overall, we find that model accuracy is robust to the update periodicity except at very high sparsities, where accuracy decreases as the mask refresh period increases. This is important for future directions such as asynchronous FL, where clients have to maintain the same mask across successive global updates.

A.3 SECIND IMPLEMENTATIONS

SECIND can be extended to other settings, such as multi-party computation (using two or more servers operating on shares of the input), where each client sends evaluations of distributed point functions to encode each assignment (Boyle et al., 2016). These encodings are compact, but may require longer codewords to amortize the protocol overheads. We leave the study of such software implementations of SECIND to future work.
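For intuition, the server-side decoding that any SECIND implementation must perform can be sketched as follows. This is a toy with illustrative codebook values; the secure aggregation of the histogram itself (by TEE or MPC) is abstracted away.

```python
import numpy as np

# SECIND-style decoding for one weight block (hypothetical toy): the server
# never sees individual codeword assignments, only an aggregated histogram H
# over the k codewords. The aggregate block update is the histogram-weighted
# sum of codewords, identical to summing the clients' decompressed blocks.
k, d = 4, 2                                # codewords per codebook, block size
codebook = np.array([[0.0, 0.0],
                     [0.1, -0.1],
                     [0.2, 0.3],
                     [-0.4, 0.5]])         # shape (k, d)

assignments = [0, 2, 2, 3, 1]              # one codeword index per client
H = np.bincount(assignments, minlength=k)  # securely aggregated histogram
agg_block = H @ codebook                   # server-side decode
print(np.round(agg_block, 6).tolist())     # [0.1, 1.0]
```

Because the decode is a single matrix product per block, the histogram is all the server needs; this is the linearity that lets PQ coexist with secure aggregation.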






(Fragment of the SECIND aggregation algorithm: recover the assignment r of client i for block (m, n), then update the global count H_{m,n}[r] ← H_{m,n}[r] + 1 for codeword index r.)

Figure 3: Impact of the refresh rate of the compression operator by the server on the CelebA dataset. Left: for scalar quantization (quantization parameters), where we fix the quantization bit-width b = 8 (p denotes the SECAGG bit-width). Right: for product quantization (codebooks), where k denotes the number of codewords and d the block size.

Privacy. A separate line of work aims to combine communication efficiency and privacy. For instance, Triastcyn et al. (2021) develop a method that unifies compressed communication and DP (where integration with SECAGG is left as an open problem), while Chaudhuri et al. (2022) design a privacy-aware scalar compression mechanism within the local differential privacy model.

Figure 5: Impact of pruning mask refresh intervals on model performance for the CelebA dataset. Note that the effect of refreshing the pruning masks is more apparent at higher sparsity levels, and generalization performance decreases when masks are stale for longer during training.

This has led to significant interest in reducing FL's communication requirements. In what follows, we might refer to any local model update in a distributed training procedure as a gradient, including model updates computed following multiple local training steps.

Table 1: Hyper-parameters used for all the experiments, including baselines. η is the learning rate. Columns: Dataset | Users per round | Client epochs | Max. server epochs | η_SGD | η_FedAvg.

Table 2: Percentage of aggregation overflows (among all model parameters) for the CelebA dataset over various SQ configurations. b is the quantization bit-width, p the SECAGG bit-width, and p − b the overflow margin size in bits.

The cost of broadcasting codebooks (for downlink communication) is negligible compared to model sizes. Recall that k denotes the number of codewords and d the block size.

Figure 4: Number of rounds to convergence is similar for PQ-compressed runs compared to the non-compressed baseline (on CelebA). Note that outside a simulated environment, the wall-clock convergence time for PQ runs would be lower than the baseline since uplink communication would be faster.

Results of client update compression with SECAGG-compatible scalar quantization on LEAF datasets over three runs. We fix p across runs, as this defines the uplink size, but not b. We pick the run with the best accuracy and report the corresponding b.

Results of client update compression with SECAGG-compatible random mask pruning on LEAF datasets. Columns: Dataset | Sparsity | Uplink size (in KB) | Compression factor | Accuracy.

Results of scalar quantization on LEAF datasets with unweighted client update aggregation over three runs. We fix p across runs as this defines the uplink size, but not b. We pick the run with the best accuracy and report the corresponding b.

