BAFFLE: TOWARDS RESOLVING FEDERATED LEARNING'S DILEMMA - THWARTING BACKDOOR AND INFERENCE ATTACKS

Abstract

Recently, federated learning (FL) has been subject to both security and privacy attacks, posing a dilemmatic challenge for the underlying algorithmic designs: on the one hand, FL is shown to be vulnerable to backdoor attacks that stealthily manipulate the global model's output using malicious model updates, and on the other hand, FL is shown to be vulnerable to inference attacks by a malicious aggregator inferring information about the clients' data from their model updates. Unfortunately, existing defenses against these attacks are insufficient, and mitigating both attacks at the same time is highly challenging: while defeating backdoor attacks requires the analysis of model updates, protection against inference attacks prohibits access to the model updates to avoid information leakage. In this work, we introduce BAFFLE, a novel in-depth defense for FL that tackles this challenge. To mitigate backdoor attacks, it applies a multilayered defense, using a Model Filtering layer to detect and reject malicious model updates and a Poison Elimination layer to eliminate any effect of a remaining undetected weak manipulation. To impede inference attacks, we build private BAFFLE, which securely evaluates the BAFFLE algorithm under encryption using sophisticated secure computation techniques. We extensively evaluate BAFFLE against state-of-the-art backdoor attacks on several datasets and applications, including image classification, word prediction, and IoT intrusion detection. We show that BAFFLE can entirely remove backdoors with a negligible effect on accuracy and that private BAFFLE is practical.

1. INTRODUCTION

Federated learning (FL) is an emerging collaborative machine learning trend with many applications such as next-word prediction for mobile keyboards (McMahan & Ramage, 2017), medical imaging (Sheller et al., 2018a), and intrusion detection for IoT (Nguyen et al., 2019). In FL, clients locally train model updates using private data and provide these to a central aggregator, who combines them into a global model that is sent back to the clients for the next training iteration. FL offers efficiency and scalability, as the training is distributed among many clients and executed in parallel (Bonawitz et al., 2019). In particular, FL improves privacy by enabling clients to keep their training data locally (McMahan et al., 2017). This is not only relevant for compliance with legal obligations such as the GDPR (2018), but also in general when processing personal and sensitive data.

Despite its benefits, FL is vulnerable to backdoor attacks (Bagdasaryan et al., 2020; Nguyen et al., 2020; Xie et al., 2020) and inference attacks (Pyrgelis et al., 2018; Shokri et al., 2017; Ganju et al., 2018). In the former, the adversary stealthily manipulates the global model so that attacker-chosen inputs result in wrong predictions chosen by the adversary. Existing backdoor defenses, e.g., (Shen et al., 2016; Blanchard et al., 2017), fail to effectively protect against state-of-the-art backdoor attacks, e.g., constrain-and-scale (Bagdasaryan et al., 2020) and DBA (Xie et al., 2020). In inference attacks, the adversary aims at learning information about the clients' local data by analyzing their model updates. Mitigating both attack types at the same time is highly challenging due to a dilemma: backdoor defenses require access to the clients' model updates, whereas inference mitigation strategies prohibit this to avoid information leakage. No solution currently exists that defends against both attacks at the same time (§6).

Our Goals and Contributions.
In this paper, we provide the following contributions:
1. BAFFLE, a novel generic FL defense system that simultaneously protects both the security and the data privacy of FL by effectively preventing backdoor and inference attacks. To the best of our knowledge, this is the first work that discusses and tackles this dilemma, i.e., no existing defense against backdoor attacks preserves the privacy of the clients' data (§4).
2. To the best of our knowledge, we are the first to point out that combining clustering, clipping, and noising can prevent the adversary from trading off attack impact against attack stealthiness. However, a naïve combination of these two classes of defenses is not effective against sophisticated backdoor attacks. Therefore, we introduce a novel backdoor defense (cf. Alg. 1) with three novel elements: (1) a two-layer defense, (2) a new dynamic clustering approach (§3.1), and (3) a new adaptive threshold tuning scheme for clipping and noising (§3.2). The clustering component filters out malicious model updates with high attack impact, while adaptive smoothing, clipping, and noising eliminate potentially remaining malicious model contributions. Moreover, BAFFLE is able to mitigate more complex attack scenarios, such as the simultaneous injection of different backdoors by several adversaries, which cannot be handled by existing defenses (§3).
3. We design tailored efficient secure (two-party) computation protocols for BAFFLE, resulting in private BAFFLE, the first privacy-preserving backdoor defense that also inhibits inference attacks (§4). To the best of our knowledge, no existing defense against backdoor attacks preserves the privacy of the clients' data (§6).
4. We demonstrate BAFFLE's effectiveness against backdoor attacks through an extensive evaluation on various datasets and applications (§5).
Beyond mitigating state-of-the-art backdoor attacks, we also show that BAFFLE succeeds in thwarting adaptive attacks that optimize the attack strategy to circumvent BAFFLE (§5.1).
5. We evaluate the overhead of applying secure two-party computation to demonstrate the efficiency of private BAFFLE. A training iteration of private BAFFLE for a neural network with 2.7 million parameters and 50 clients on CIFAR-10 takes less than 13 minutes (§5.3).

2. BACKGROUND AND PROBLEM SETTING

Federated learning (FL) is a concept for distributed machine learning where K clients and an aggregator A collaboratively build a global model G (McMahan et al., 2017). In training round t ∈ [1, T], each client i ∈ [1, K] locally trains a local model W_i (with p parameters/weights w_i^1, ..., w_i^p) based on the previous global model G_{t-1} using its local data D_i and sends W_i to A. Then, A aggregates the received models W_i into the new global model G_t by averaging the local models, weighted by the number of training samples used to train them: G_t = (1/n) Σ_{i=1}^{K} n_i W_i, where n_i = |D_i| and n = Σ_{i=1}^{K} n_i (cf. Alg. 2 and Alg. 3 in §A for details). In practice, previous works employ equal weights (n_i = n/K) for the contributions of all clients (Bagdasaryan et al., 2020; Xie et al., 2020). We adopt this approach, i.e., we set G_t = Σ_{i=1}^{K} W_i / K.

Adversary model: In typical FL settings, there are two adversaries: malicious clients that try to inject backdoors into the global model, and honest-but-curious (a.k.a. semi-honest) aggregators that correctly compute and follow the training protocols but aim at (passively) gaining information about the training data of the clients through inference attacks (Bonawitz et al., 2017). The former type of adversary, A_c, has full control over K' (K' < K/2) clients and their training data, processes, and parameters (Bagdasaryan et al., 2020). A_c also has full knowledge of the aggregator's operations, including potentially applied backdooring defenses, and can arbitrarily adapt its attack strategy at any time during the training, e.g., by simultaneously injecting none, one, or several backdoors. However, A_c has no control over any processes executed at the aggregator, nor over the honest clients. The second adversary type, the honest-but-curious aggregator A_s, has access to all local model updates W_i and can thus perform model inference attacks on each local model W_i to extract information about the corresponding participant's data D_i used for training W_i.

Backdoor attacks: The goals of A_c are two-fold: (1) Impact: A_c aims at manipulating the global model G_t such that the modified model G'_t provides incorrect predictions G'_t(x) = c ≠ G_t(x), ∀x ∈ I_{A_c}, where I_{A_c} is a trigger set of specific adversary-chosen inputs. (2) Stealthiness: In addition, A_c seeks to make poisoned models and benign models indistinguishable to avoid detection. Model G'_t should therefore perform normally on all other inputs that are not in the trigger set, i.e., G'_t(x) = G_t(x), ∀x ∉ I_{A_c}, and the dissimilarity (e.g., Euclidean distance) between a poisoned model W' and a benign model W must be smaller than a threshold ε: ‖W' − W‖ < ε.

Inference attacks: The honest-but-curious aggregator A_s attempts to infer sensitive information about clients' data D_i from their model updates W_i (Pyrgelis et al., 2018; Shokri et al., 2017; Ganju et al., 2018).
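The aggregation step described above can be sketched in a few lines of NumPy, treating each local model as a flattened parameter vector. This is a minimal illustration, not the paper's Alg. 2/3; the helper name `aggregate` is ours.

```python
import numpy as np

def aggregate(local_models, sample_counts=None):
    """Combine K local models W_i into the global model G_t.

    With sample_counts, computes the weighted average
    G_t = (1/n) * sum_i n_i * W_i with n = sum_i n_i; without it,
    all clients are weighted equally (n_i = n/K), i.e.
    G_t = sum_i W_i / K, as adopted in the paper.
    """
    W = np.stack(local_models)                 # shape (K, p)
    if sample_counts is None:
        return W.mean(axis=0)                  # equal-weight average
    n = np.asarray(sample_counts, dtype=float)
    return (n[:, None] * W).sum(axis=0) / n.sum()
```

For example, two clients with updates [1, 2] and [3, 4] and sample counts 1 and 3 yield (1·[1, 2] + 3·[3, 4]) / 4 = [2.5, 3.5], while equal weighting yields [2, 3].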


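The two-layer defense outlined in the contributions (a Model Filtering layer followed by clipping and noising in the Poison Elimination layer) can be illustrated at a high level. The following is a simplified sketch, not the paper's Alg. 1: a crude median-based cosine-distance filter stands in for BAFFLE's dynamic clustering, and the function name, threshold rule, and default parameters are our own assumptions.

```python
import numpy as np

def defend_and_aggregate(local_models, global_model,
                         clip_bound=1.0, noise_sigma=0.01, rng=None):
    """Sketch of a two-layer defense: filter suspicious updates,
    then clip and noise the aggregate before updating the global model."""
    if rng is None:
        rng = np.random.default_rng(0)
    updates = np.stack([w - global_model for w in local_models])  # (K, p)

    # Layer 1 (Model Filtering): reject updates whose cosine distance to
    # the coordinate-wise median update is anomalously large. The 3x-median
    # threshold is an illustrative stand-in for dynamic clustering.
    ref = np.median(updates, axis=0)
    def cos_dist(u):
        return 1.0 - (u @ ref) / (np.linalg.norm(u) * np.linalg.norm(ref) + 1e-12)
    dists = np.array([cos_dist(u) for u in updates])
    accepted = updates[dists <= 3.0 * np.median(dists) + 1e-6]

    # Layer 2 (Poison Elimination): clip each accepted update to clip_bound
    # in L2 norm, average, and add Gaussian noise to the aggregate.
    norms = np.linalg.norm(accepted, axis=1, keepdims=True)
    clipped = accepted * np.minimum(1.0, clip_bound / (norms + 1e-12))
    agg = clipped.mean(axis=0)
    if noise_sigma > 0:
        agg = agg + rng.normal(0.0, noise_sigma, size=agg.shape)
    return global_model + agg
```

The filtering layer removes updates with high attack impact (large angular deviation), while clipping bounds the norm of whatever a weak, undetected manipulation can contribute and the added noise smooths out its residual effect, mirroring the division of labor between the two layers described above.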