β-STOCHASTIC SIGN SGD: A BYZANTINE RESILIENT AND DIFFERENTIALLY PRIVATE GRADIENT COMPRESSOR FOR FEDERATED LEARNING

Abstract

Federated Learning (FL) is a nascent privacy-preserving learning framework under which the local data of participating clients is kept locally throughout model training. Scarce communication resources and data heterogeneity are two defining characteristics of FL. Moreover, an FL system is often implemented in a harsh environment, leaving the clients vulnerable to Byzantine attacks. To the best of our knowledge, no gradient compressors simultaneously achieve quantitative Byzantine resilience and privacy preservation. In this paper, we fill this gap by revisiting the stochastic sign SGD of Jin et al. (2020). We propose β-stochastic sign SGD, which contains a gradient compressor that encodes a client's gradient information in sign bits subject to the privacy budget β > 0. We show that β-stochastic sign SGD converges in the presence of partial client participation and mobile, static, and adaptive Byzantine faults, and that it achieves quantifiable Byzantine resilience and differential privacy simultaneously even with non-IID local data. We show that our compressor works for both bounded and unbounded stochastic gradients, i.e., both light-tailed and heavy-tailed distributions. As a byproduct, we show that when the clients report sign messages, the popular information aggregation rules simple mean, trimmed mean, median, and majority vote are identical in terms of the output signs. Our theories are corroborated by experiments on the MNIST and CIFAR-10 datasets.
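As a rough illustration of the kind of sign-based compressor discussed above, the following is a minimal sketch of a two-level stochastic sign quantizer in the spirit of Jin et al. (2020). The function name sto_sign and the clipping threshold B are our own notation for illustration; the β-compressor proposed in this paper modifies the output probability so as to meet the privacy budget β (the exact form is given later in the paper).

```python
import random

def sto_sign(x, B):
    """Two-level stochastic sign quantizer (illustrative sketch).

    For x clipped to [-B, B], output +1 with probability (B + x) / (2B)
    and -1 otherwise, so that E[output] = x / B, i.e., the compressor is
    unbiased up to the scaling factor 1/B.
    """
    x = max(-B, min(B, x))  # clip the coordinate to [-B, B]
    return 1 if random.random() < (B + x) / (2 * B) else -1
```

Each client applies such a compressor coordinate-wise, so every coordinate of the reported gradient is a single sign bit; the randomness is what makes the aggregate unbiased under non-IID data and, with a β-dependent probability, what yields differential privacy.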



Introduction

Federated Learning (FL) is a nascent learning framework that enables privacy-sensitive clients to collectively train a model without disclosing their raw data McMahan et al. (2017); Kairouz et al. (2021). Expensive communication overhead and non-IID local data are two defining characteristics of FL. A variety of communication-saving techniques have been introduced, including periodic averaging McMahan et al. (2017), large mini-batch sizes Lin et al. (2020), and gradient compressors Xu et al. (2020); Alistarh et al. (2017); Bernstein et al. (2018; 2019); Jin et al. (2020); Safaryan et al. (2021); Wang et al. ( ). However, challenges remain. An FL system is often massive in scale and is implemented in a harsh environment, leaving the clients vulnerable to unstructured faults such as Byzantine faults Lynch (1996). Moreover, FL clients are privacy-sensitive. Although clients' privacy is partially preserved by denying access to raw data, quantitative privacy preservation is still desirable. Observing this, Bernstein et al. (2019) proposed signSGD with majority vote, which is provably resilient to Byzantine faults. However, even in the absence of Byzantine faults, signSGD fails to converge in the presence of non-IID data Safaryan & Richtárik (2021); Chen et al. (2020), and is not differentially private. To handle non-IID data, Jin et al. (2020) proposed stochastic sign SGD and its differentially private (DP) variant, whose gradient compressors are simple yet elegant. Unfortunately, their DP variant does not converge 1, and their standard stochastic sign SGD is not differentially private (shown in our Theorem 1). We will discuss the relations between Jin et al. (2020) and our work in the related work.

1 Their Theorem 6 analysis contains major flaws.
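The abstract's byproduct claim (that simple mean, trimmed mean, median, and majority vote agree in sign when clients report sign messages) can be checked concretely on ±1 inputs. The helper implementations below are our own illustrative code, not from the paper; ties are broken toward +1 in every rule.

```python
import statistics

def sign(x):
    """Sign with ties broken toward +1."""
    return 1 if x >= 0 else -1

def majority_vote(bits):
    """Majority vote over a list of ±1 sign messages."""
    return 1 if sum(bits) >= 0 else -1

def trimmed_mean(bits, k):
    """Mean after discarding the k smallest and k largest values."""
    s = sorted(bits)
    trimmed = s[k:len(s) - k] if k > 0 else s
    return sum(trimmed) / len(trimmed)

# Example: 7 clients report sign bits; all four rules agree in sign.
bits = [1, 1, -1, 1, -1, 1, 1]
assert sign(sum(bits) / len(bits)) == majority_vote(bits)    # simple mean
assert sign(statistics.median(bits)) == majority_vote(bits)  # median
assert sign(trimmed_mean(bits, 2)) == majority_vote(bits)    # trimmed mean
```

Intuitively, on ±1 inputs all four rules reduce to asking whether more than half of the reported bits are +1, which is why the server can use any of them interchangeably for sign aggregation.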

