SECURE BYZANTINE-ROBUST MACHINE LEARNING

Abstract

Increasingly, machine learning systems are being deployed to edge servers and devices (e.g. mobile phones) and trained in a collaborative manner. Such distributed/federated/decentralized training raises a number of concerns about the robustness, privacy, and security of the procedure. While extensive work has been done on tackling robustness, privacy, or security individually, their combination has rarely been studied. In this paper, we propose a secure two-server protocol that offers both input privacy and Byzantine-robustness. In addition, this protocol is communication-efficient, fault-tolerant, and enjoys local differential privacy.

1. INTRODUCTION

Recent years have witnessed fast growth of successful machine learning applications based on data collected from decentralized user devices. Unfortunately, most machine learning models of societal importance currently do not have their utility, control, and privacy aligned with the data ownership of the participants. This issue can be partially attributed to a fundamental conflict between the two leading paradigms: traditional centralized training of models on one hand, and decentralized/collaborative training schemes on the other. While centralized training violates the privacy rights of participating users, existing alternative training schemes are typically not robust: malicious participants can sabotage the training system by intentionally feeding it wrong data, known as data poisoning. In this paper, we tackle this problem and propose a novel distributed training framework which offers both privacy and robustness. When applied to datasets containing personal data, the use of privacy-preserving techniques is currently required under regulations such as the General Data Protection Regulation (GDPR) or the Health Insurance Portability and Accountability Act (HIPAA).

The idea of training models on decentralized datasets and incrementally aggregating model updates via a central server motivates the federated learning paradigm (McMahan et al., 2016). However, the averaging in federated learning, when viewed as a multi-party computation (MPC), does not preserve input privacy because the server observes the models directly. Input privacy requires that each party learn nothing more than the output of the computation, which in this paradigm means the aggregated model updates. To solve this problem, the secure aggregation rules proposed by Bonawitz et al. (2017) achieve guaranteed input privacy. Such secure aggregation rules have found wider industry adoption recently, e.g.
by Google on Android phones (Bonawitz et al., 2019; Ramage & Mazzocchi, 2020), where input privacy guarantees can offer e.g. efficiency and exactness benefits compared to differential privacy (both can also be combined).

The concept of Byzantine robustness has received considerable attention in the past few years for practical applications, as a way to make the training process robust to malicious actors. A Byzantine participant or worker can behave in an arbitrarily malicious way, e.g. by sending arbitrary updates to the server. This poses a great challenge to the most widely used aggregation rules, e.g. the simple average, since a single Byzantine worker can compromise the result of the aggregation. A number of Byzantine-robust aggregation rules have been proposed recently (Blanchard et al., 2017; Muñoz-González et al., 2017; Alistarh et al., 2018; Mhamdi et al., 2018; Yin et al., 2018; Muñoz-González et al., 2019) and can be used as a building block for our proposed technique.

Achieving both input privacy and Byzantine robustness, however, has so far remained elusive, with Bagdasaryan et al. (2020) stating that robust rules "...are incompatible with secure aggregation". We prove here that this is not the case. Closest to our approach is (Pillutla et al., 2019), which tolerates data poisoning but does not offer Byzantine robustness. Prio (Corrigan-Gibbs & Boneh, 2017) is a private and robust aggregation system relying on secret-shared non-interactive proofs (SNIP). While their setting is similar to ours, the robustness they offer is limited to checking the range of the input. Besides, the encoding for SNIP has to be affine-aggregable and is expensive for clients to compute. In this paper, we propose a secure aggregation framework with the help of two non-colluding honest-but-curious servers. This framework also tolerates server-worker collusion.
In addition, we combine robustness and privacy at the cost of leaking only worker similarity information, which is marginal for high-dimensional neural networks. Note that our focus is not to develop new defenses against state-of-the-art attacks, e.g. (Baruch et al., 2019; Xie et al., 2019b). Instead, we focus on making arbitrary current and future distance-based robust aggregation rules (e.g. Krum by Blanchard et al. (2017), RFA by Pillutla et al. (2019)) compatible with secure aggregation.

Main contributions. We propose a novel distributed training framework which is:
• Privacy-preserving: our method keeps the input data of each user secure against any other user, and against our honest-but-curious servers.
• Byzantine robust: our method offers Byzantine robustness and allows the incorporation of existing robust aggregation rules, e.g. (Blanchard et al., 2017; Alistarh et al., 2018). The results are exact, i.e. identical to those of the non-private robust methods.
• Fault tolerant and easy to use: our method natively supports workers dropping out of or newly joining the training process. It is also easy to implement and to understand for users.
• Efficient and scalable: the computation and communication overhead of our method is negligible (less than a factor of 2) compared to non-private methods. Scalability in terms of cost, including setup and communication, is linear in the number of workers.
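For intuition about the distance-based robust aggregation rules mentioned above, the following is a minimal NumPy sketch of the Krum selection rule (Blanchard et al., 2017) in its plain, non-private form; the function name, toy data, and threshold in the final check are our own illustrative choices:

```python
import numpy as np

def krum(updates, n_byz):
    """Krum (Blanchard et al., 2017): score each update by the sum of squared
    distances to its n - n_byz - 2 nearest neighbours; return the update
    with the lowest score."""
    n = len(updates)
    k = n - n_byz - 2  # number of neighbours taken into account
    dists = np.array([[np.sum((u - v) ** 2) for v in updates] for u in updates])
    # sort each row and skip index 0 (the distance of an update to itself)
    scores = [np.sum(np.sort(row)[1:k + 1]) for row in dists]
    return updates[int(np.argmin(scores))]

rng = np.random.default_rng(0)
honest = [rng.normal(0.0, 0.1, size=5) for _ in range(6)]
byz = [np.full(5, 100.0)]  # one Byzantine worker sends a huge bogus update
chosen = krum(honest + byz, n_byz=1)
assert np.linalg.norm(chosen) < 1.0  # an honest update is selected
```

Because the scores depend only on pairwise distances between updates, such rules are exactly the ones our protocol can evaluate on secret-shared inputs.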

2. PROBLEM SETUP, PRIVACY, AND ROBUSTNESS

We consider the distributed setup of n user devices, which we call workers, aided by two additional servers. Each worker i has its own private part of the training dataset. The workers want to collaboratively train a public model benefitting from the joint training data of all participants. In every training step, each worker computes its own private model update (e.g. a gradient based on its own data), denoted by the vector x_i. The aggregation protocol aims to compute the sum z = ∑_{i=1}^n x_i (or a robust version of this aggregation), which is then used to update a public model. While the result z is public in all cases, the protocol must keep each x_i private from any adversary or other workers.

Security model. We consider honest-but-curious servers which do not collude with each other but may collude with malicious workers. An honest-but-curious server follows the protocol but may try to inspect all messages. We also assume that all communication channels are secure. We guarantee the strong notion of input privacy, which means the servers and workers learn nothing more about each other than what can be inferred from the public output of the aggregation z.

Byzantine robustness model. We adopt the standard Byzantine worker model, which assumes that workers can send arbitrary adversarial messages trying to compromise the process. We assume that a fraction of up to α (< 0.5) of the workers is Byzantine, i.e. consists of malicious workers that do not follow the protocol.

Additive secret sharing. Secret sharing is a way to split a secret into multiple parts such that no single part leaks the secret. Formally, suppose a scalar a is a secret which its holder shares with k parties through the secret-shared value ⟨a⟩. In this paper, we only consider additive secret sharing, where ⟨a⟩ denotes the set {a_p}_{p=1}^k satisfying a = ∑_{p=1}^k a_p, with the share a_p held by party p. Crucially, it must not be possible to reconstruct a from any individual share a_p.
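The additive scheme above can be illustrated with a short Python sketch; the modulus Q and the helper names are illustrative choices on our part, not part of the protocol specification:

```python
import secrets

Q = 2**61 - 1  # public modulus of the ring in which shares live (illustrative)

def share(a, k):
    """Split secret a into k additive shares with a = sum(shares) mod Q."""
    shares = [secrets.randbelow(Q) for _ in range(k - 1)]
    shares.append((a - sum(shares)) % Q)
    return shares

def reconstruct(shares):
    """Recover the secret; requires ALL shares -- any proper subset
    of them is distributed uniformly at random and reveals nothing."""
    return sum(shares) % Q

assert reconstruct(share(42, 3)) == 42

# Additive shares are linearly homomorphic: summing each party's shares of
# x_1 and x_2 yields valid shares of z = x_1 + x_2, which is what lets the
# servers aggregate updates without ever seeing an individual x_i.
x1, x2 = share(10, 2), share(32, 2)
z = [(x1[p] + x2[p]) % Q for p in range(2)]
assert reconstruct(z) == 42
```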
For vectors like x, the secret-shared value ⟨x⟩ simply consists of component-wise scalar secret shares.

Two-server setting. We assume there are two non-colluding servers: the model server (S1) and the worker server (S2). S1 holds the output of each aggregation and thus also the machine learning model, which is public to all workers. S2 holds intermediate values used to perform the Byzantine-robust aggregation. Another key assumption is that the servers have no incentive to collude with workers, perhaps enforced via a potentially huge penalty if exposed. It is realistic to assume that the communication link between the two servers S1 and S2 is faster than the individual links to the workers. To perform robust aggregation, the servers need access to a sufficient number of Beaver's triples. These are data-independent values required to implement secure multiplication in MPC on both servers, and they can be precomputed beforehand. For completeness, the classic multiplication algorithm is given in Appendix B.1.
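For intuition, Beaver-triple multiplication can be sketched for two parties and scalar secrets. This is a simplified illustration of the standard technique, not the exact algorithm of Appendix B.1; the modulus and variable names are ours. Given shares of x and y plus a precomputed triple ⟨a⟩, ⟨b⟩, ⟨c⟩ with c = a·b, the parties open e = x − a and f = y − b (which leak nothing, since a and b are uniformly random masks) and then compute shares of x·y locally:

```python
import secrets

Q = 2**61 - 1  # public modulus (illustrative)

def share2(v):
    """Split v into two additive shares modulo Q."""
    r = secrets.randbelow(Q)
    return r, (v - r) % Q

def beaver_mul(x_sh, y_sh, triple_sh):
    """Two-party Beaver multiplication: returns shares of x*y mod Q."""
    (a0, b0, c0), (a1, b1, c1) = triple_sh  # shares of a, b, c with c = a*b
    x0, x1 = x_sh
    y0, y1 = y_sh
    # each party opens its share of e = x - a and f = y - b
    e = (x0 - a0 + x1 - a1) % Q
    f = (y0 - b0 + y1 - b1) % Q
    # local evaluation of x*y = c + e*b + f*a + e*f; e*f added by party 0 only
    z0 = (c0 + e * b0 + f * a0 + e * f) % Q
    z1 = (c1 + e * b1 + f * a1) % Q
    return z0, z1

# the triple itself is data-independent and precomputed offline
a, b = secrets.randbelow(Q), secrets.randbelow(Q)
c = a * b % Q
triple = tuple(zip(share2(a), share2(b), share2(c)))
z0, z1 = beaver_mul(share2(6), share2(7), triple)
assert (z0 + z1) % Q == 42
```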

