RECONCILING SECURITY AND COMMUNICATION EFFICIENCY IN FEDERATED LEARNING

Abstract

Cross-device Federated Learning is an increasingly popular machine learning setting in which a model is trained over a large population of client devices with high privacy and security guarantees. However, communication efficiency remains a major bottleneck when scaling federated learning to production environments, particularly due to bandwidth constraints during uplink communication. In this paper, we formalize and address the problem of compressing client-to-server model updates under the Secure Aggregation primitive, a core component of Federated Learning pipelines that allows the server to aggregate client updates without accessing them individually. In particular, we adapt standard scalar quantization and pruning methods to Secure Aggregation and propose Secure Indexing, a variant of Secure Aggregation that supports product quantization for extreme compression. We establish state-of-the-art results on LEAF benchmarks in a secure Federated Learning setup with up to 40× compression in uplink communication and no meaningful loss in utility compared to uncompressed baselines.

1. INTRODUCTION

Federated Learning (FL) is a distributed machine learning (ML) paradigm that trains a model across a number of participating entities holding local data samples. In this work, we focus on cross-device FL, which harnesses a large number (hundreds of millions) of edge devices with disparate characteristics such as availability, compute, memory, or connectivity resources (Kairouz et al., 2019). Two challenges to the success of cross-device FL are privacy and scalability. FL was originally motivated by privacy, since data points remain on client devices. However, as with other forms of ML, information about training data can be extracted via membership inference or reconstruction attacks on a trained model (Carlini et al., 2021a;b; Watson et al., 2022), or leaked through local updates (Melis et al., 2019; Geiping et al., 2020). Consequently, Secure Aggregation (SECAGG) protocols were introduced to prevent the server from directly observing individual client updates, which are a major vector for information leakage (Bonawitz et al., 2019; Huba et al., 2022). Additional mitigations such as Differential Privacy (DP) may be required to offer further protection against attacks (Dwork et al., 2006; Abadi et al., 2016), as discussed in Section 6.

Ensuring scalability to populations of heterogeneous clients is the second challenge for FL. Wall-clock training time grows with model and batch sizes (Huba et al., 2022), even with recent efforts such as FedBuff (Nguyen et al., 2022), and communication overhead between the server and clients dominates model convergence time. Consequently, compression techniques have been used to reduce communication bandwidth while maintaining model accuracy. However, a fundamental problem has been largely overlooked in the literature: in their native form, standard compression methods such as scalar quantization and pruning are not compatible with SECAGG. This makes it challenging to ensure both security and communication efficiency.

In this paper, we address this gap by adapting compression techniques to make them compatible with SECAGG. We focus on compressing uplink updates from clients to the server for three reasons. First, uplink communication is more sensitive and is therefore subject to a higher security bar, whereas downlink updates broadcast by the server are deemed public. Second, upload bandwidth is generally more restricted than download bandwidth: according to the most recent FCC¹ report, the ratio of download to upload speeds for DSL and cable providers² in the US ranges between 3× and 20× (FCC, 2021). Finally, efficient uplink communication brings several benefits beyond speeding up convergence: lowering communication cost reduces selection bias due to under-sampling clients with limited connectivity, improving fairness and inclusiveness, and it shrinks the carbon footprint of FL, of which the fraction attributable to communication can reach 95% (Qiu et al., 2021).

Figure 1: Summary of the proposed approach for one FL round, where we omit the round dependency and Differential Privacy (DP) for clarity. Blue boxes denote standard steps and red boxes denote additional steps for uplink compression. Client i computes its local model update g_i, compresses it with the compression operator q, and encrypts it by adding a random mask m_i in the compressed domain, hence reducing the uplink bandwidth (steps 2-4). The server recovers the aggregate in the compressed domain by leveraging any SECAGG protocol (steps 7-13, here with a TEE-based SECAGG; see Section 3.1). Since the decompression operator d is linear, the server can convert the aggregate back to the non-compressed domain, up to compression error (step 12). As with the model weights θ, the compression operator q is also periodically updated and broadcast by the server (step 14). In Section 4, we apply the proposed method to scalar quantization and pruning without impacting SECAGG and propose Secure Indexing, a variant of SECAGG for extreme uplink compression with product quantization. See Section 3.1 for details about SECAGG and Section 6 for a discussion on DP.

In summary, we present the following contributions in this paper:

• We highlight the fundamental mismatch between two critical components of the FL stack: SECAGG protocols and uplink compression mechanisms.
• We formulate solutions by imposing a linearity constraint on the decompression operator, as illustrated in Figure 1 in the case of a TEE-based SECAGG (see the illustrative sketch following this list).
• We adapt the popular scalar quantization and (random) pruning compression methods for compatibility with the FL stack, requiring no changes to the SECAGG protocol.
• For extreme uplink compression without compromising security, we propose Secure Indexing (SECIND), a variant of SECAGG that supports product quantization.
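To make the linearity requirement concrete, the following is a minimal NumPy sketch of aggregation in the compressed domain. The additive masks stand in for any SECAGG scheme whose per-client masks cancel in the sum modulo a ring size; the ring size, quantization scale, and variable names are illustrative assumptions rather than the exact protocol parameters.

```python
# Toy illustration: scalar quantization q with a linear decompression d,
# combined with additive masking that cancels in the aggregate.
import numpy as np

p = 2 ** 16        # ring size for modular masking (assumed for illustration)
scale = 0.01       # shared quantization scale, broadcast by the server

def q(g):
    """Compress a float update to integers in the ring Z_p."""
    return np.round(g / scale).astype(np.int64) % p

def d(agg):
    """Linear decompression: map the aggregated integers back to floats."""
    centered = np.where(agg > p // 2, agg - p, agg)  # undo the modular wrap
    return centered * scale

rng = np.random.default_rng(0)
updates = [rng.normal(size=8) for _ in range(3)]     # local updates g_i

# Masks that sum to zero mod p, standing in for a full SECAGG protocol.
masks = [rng.integers(0, p, size=8) for _ in range(2)]
masks.append((-sum(masks)) % p)

# Each client uploads q(g_i) + m_i; the server only sees the masked sum.
masked = [(q(g) + m) % p for g, m in zip(updates, masks)]
aggregate = d(sum(masked) % p)

# Because d is linear, decompressing the masked aggregate recovers the sum
# of the updates up to quantization error.
print(np.allclose(aggregate, sum(updates), atol=scale * len(updates)))
```

Because d is linear, applying it once to the masked aggregate is equivalent, up to compression error, to summing the decompressed client updates, which is the property the server relies on in steps 7-13 of Figure 1.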



¹ US Federal Communications Commission.
² FL is typically restricted to using unmetered connections, usually over Wi-Fi (Huba et al., 2022).



2. RELATED WORK

Communication is identified as a primary efficiency bottleneck in FL, especially in the cross-device FL setting (Kairouz et al., 2019). This has led to significant interest in reducing FL's communication requirements. In what follows, we may refer to any local model update in a distributed training procedure as a gradient, including model updates computed after multiple local training steps.

Efficient Distributed Optimization. There is a large body of literature on reducing the communication cost of distributed training. Seide et al. (2014) proposes quantizing gradients to one bit while carrying the quantization error forward across mini-batches with error feedback. Similarly, Wen et al. (2017) proposes layer-wise ternary gradients and Bernstein et al. (2018) suggests using only the sign
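To illustrate the error-feedback mechanism referenced above, here is a minimal sketch in the spirit of Seide et al. (2014): gradients are compressed to their sign, and the quantization error is carried forward to the next step instead of being discarded. The scaling choice and function names are illustrative assumptions, not the original algorithm's exact details.

```python
# Sketch of 1-bit (sign) gradient compression with error feedback.
import numpy as np

def compress_with_error_feedback(grad, residual):
    """Quantize (grad + residual) to its sign and carry the error forward."""
    corrected = grad + residual
    scale = np.abs(corrected).mean()          # preserve the average magnitude
    compressed = scale * np.sign(corrected)   # one bit per coordinate plus one scale
    new_residual = corrected - compressed     # error re-injected at the next step
    return compressed, new_residual

rng = np.random.default_rng(0)
residual = np.zeros(4)
for step in range(3):
    grad = rng.normal(size=4)
    compressed, residual = compress_with_error_feedback(grad, residual)
    print(step, compressed, residual)
```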

