FLGAME: A GAME-THEORETIC DEFENSE AGAINST BACKDOOR ATTACKS IN FEDERATED LEARNING

Abstract

Federated learning enables a distributed training paradigm in which multiple local clients jointly train a global model without needing to share their local training data. However, recent studies have shown that federated learning provides an additional attack surface for backdoor attacks. For instance, an attacker can compromise a subset of clients and thus corrupt the global model to incorrectly predict an attacker-chosen target class given any input embedded with the backdoor trigger. Existing defenses for federated learning against backdoor attacks usually detect and exclude the corrupted information from the compromised clients based on a static attacker model. Such defenses, however, are less effective when faced with dynamic attackers who can strategically adapt their attack strategies. In this work, we model the strategic interaction between the (global) defender and attacker as a minimax game. Based on the analysis of our model, we design an interactive defense mechanism that we call FLGAME. Theoretically, we prove that under mild assumptions, the global model trained with FLGAME under backdoor attacks is close to that trained without attacks. Empirically, we perform extensive evaluations on benchmark datasets and compare FLGAME with multiple state-of-the-art baselines. Our experimental results show that FLGAME can effectively defend against strategic attackers and achieves significantly higher robustness than baselines.

1. INTRODUCTION

Federated learning (FL) (McMahan et al., 2017a) aims to train machine learning models (called global models) over training data that is distributed across multiple clients (e.g., mobile phones, IoT devices). FL has been widely used in many real-world applications such as finance (Long et al., 2020) and healthcare (Long et al., 2022). FL trains a global model in an iterative manner. In each communication round, a cloud server shares its global model with selected clients; each selected client uses the global model to initialize its local model, trains the local model on its local training dataset, and sends the local model update to the server; the server then uses an aggregation rule to aggregate the local model updates from clients and update its global model.

Due to the distributed nature of FL, many recent studies (Bhagoji et al., 2019; Bagdasaryan et al., 2020; Baruch et al., 2019; Wang et al., 2020; Kairouz et al., 2021) have shown that it is vulnerable to backdoor attacks. For instance, an attacker can compromise a subset of clients and manipulate their local training datasets to corrupt the global model such that it predicts an attacker-chosen target class for any input embedded with a backdoor trigger (Bagdasaryan et al., 2020).

To defend against backdoor attacks, many defenses (Sun et al., 2019; Cao et al., 2021a) have been proposed. For example, Sun et al. (2019) proposed to clip the local model update from each client such that its L2-norm is no larger than a defender-chosen threshold. Cao et al. (2021a) proposed FLTrust, in which the server computes its own local model update and uses its similarity to each client's update as a trust score that is leveraged when updating the global model. However, all of those defenses consider a static attack model in which the attacker does not adapt its attack strategies. As a result, they are less effective under adaptive attacks; e.g., Wang et al. (2020) showed that the defenses proposed in (Sun et al., 2019; Blanchard et al., 2017) can be bypassed by appropriately designed attacks.

Our contribution: In this work, we propose FLGAME, a game-theoretic defense against backdoor attacks to FL. Specifically, we formulate FLGAME as a minimax game between the server (defender) and the attacker, which enables both to strategically adapt their defense and attack strategies. In the rest of the paper, we use the term benign client to denote an uncompromised client and genuine score to quantify the extent to which a client is benign. Our key idea is that, in each communication round, the server computes a genuine score for each client whose value is large if the client is benign and small if it is compromised. The genuine score serves as the weight of the client's local model update when the server updates the global model. The goal of the defender is to minimize the genuine scores of compromised clients and maximize those of benign ones. To solve the resulting minimax game for the defender, we follow a three-step process: (1) build an auxiliary global model, (2) exploit it to reverse engineer a backdoor trigger and target class, and (3) inspect whether the local model of a client predicts an input embedded with the reverse-engineered trigger as the target class, which determines the client's genuine score. Given the deployed defense, the goal of the attacker is to optimize its attack strategy by maximizing the effectiveness of the backdoor attack. Our key observation is that the attack effectiveness is determined by two factors: the genuine score and the local model of the client. We optimize the attack strategy with respect to those two factors to maximize the effectiveness of backdoor attacks against our defense. We perform both theoretical analysis and empirical evaluations for FLGAME.
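As a rough illustration of the genuine-score idea, the following sketch weights each client's local model update by its genuine score during aggregation. The function names, the scoring rule, and the normalization scheme are our own illustrative choices, not FLGAME's exact formulation.

```python
import numpy as np

def score_client(trigger_target_rate):
    # Illustrative scoring rule: a client whose local model frequently
    # predicts the reverse-engineered target class on inputs stamped with
    # the reverse-engineered trigger is likely compromised, so its genuine
    # score should be small.
    return 1.0 - trigger_target_rate

def weighted_aggregate(global_model, updates, genuine_scores, lr=1.0):
    # Normalize genuine scores into aggregation weights (a small epsilon
    # avoids division by zero when all scores are zero).
    scores = np.asarray(genuine_scores, dtype=float)
    weights = scores / (scores.sum() + 1e-12)
    # Weighted average of local model updates, then a global model step.
    agg = sum(w * u for w, u in zip(weights, updates))
    return global_model + lr * agg
```

Compared with plain FedAvg, which weights all clients equally, the weighting above suppresses the contribution of clients whose updates exhibit backdoor behavior.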
Theoretically, we prove that the global model trained with our defense under backdoor attacks is close to that trained without attacks (measured by the L2-norm of the difference in global model parameters). Empirically, we evaluate FLGAME on benchmark datasets to demonstrate its effectiveness under state-of-the-art backdoor attacks, and we compare it with state-of-the-art baselines. Our results indicate that FLGAME outperforms them by a significant margin. Our key contributions can be summarized as follows:

• We propose FLGAME, a game-theoretic defense that we formulate as a minimax game between the defender and attacker, which enables both to strategically optimize their defense and attack strategies.

• We theoretically analyze the robustness of FLGAME. In particular, we show that the global model trained with FLGAME under backdoor attacks is close to that trained without attacks.

• We perform a systematic evaluation of FLGAME on benchmark datasets and demonstrate that it significantly outperforms state-of-the-art baselines.
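The trigger reverse-engineering step in the defender's pipeline can be sketched on a toy linear softmax classifier: we optimize an additive trigger so that triggered inputs are classified as the target class, with an L1 penalty keeping the trigger small. The linear model, all names, and the hyperparameters here are illustrative simplifications; FLGAME's actual procedure operates on the auxiliary global model.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def reverse_engineer_trigger(W, x_clean, target, steps=200, lr=0.5, lam=0.01):
    """Toy trigger reverse engineering for a linear classifier f(x) = softmax(W @ x).

    Gradient descent on the cross-entropy loss toward `target` over clean
    inputs, plus an L1 penalty (via its subgradient) on the trigger `delta`.
    """
    delta = np.zeros(W.shape[1])
    onehot = np.eye(W.shape[0])[target]
    for _ in range(steps):
        grad = np.zeros_like(delta)
        for x in x_clean:
            p = softmax(W @ (x + delta))
            # d/d(delta) of cross-entropy toward `target` for a linear model
            grad += W.T @ (p - onehot)
        grad /= len(x_clean)
        delta -= lr * (grad + lam * np.sign(delta))
    return delta
```

A small trigger that flips many clean inputs to the target class is evidence that the model carries a backdoor; the recovered trigger and target class are then used to score clients.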

2. RELATED WORK

Backdoor attacks on federated learning: In backdoor attacks to FL (Bhagoji et al., 2019; Bagdasaryan et al., 2020; Baruch et al., 2019; Wang et al., 2020; Zhang et al., 2022b), an attacker aims to make a global model predict a target class for any input embedded with a backdoor trigger via compromised clients. For instance, Bagdasaryan et al. (2020) proposed the scaling attack, in which an attacker uses a mix of backdoored and clean training examples to train its local model and then scales the local model update by a factor before sending it to the server. Xie et al. (2019) proposed the distributed backdoor attack to FL. Roughly speaking, the idea is to decompose a backdoor trigger into different sub-triggers and embed each of them into the local training datasets of different compromised clients. In our work, we will leverage those attacks to perform strategic backdoor attacks against our defense.

Defenses for federated learning against backdoor attacks: Many defenses (Sun et al., 2019; Cao et al., 2021a; Ozdayi et al., 2021; Wu et al., 2020; Rieger et al., 2022; Nguyen et al., 2022) were proposed to mitigate backdoor attacks to FL. For instance, Sun et al. (2019) proposed norm-clipping, which clips the local model update of a client such that its norm is no larger than a threshold. They also extended differential privacy (Dwork et al., 2006; Abadi et al., 2016; McMahan et al., 2017b) to mitigate backdoor attacks to federated learning; the idea is to clip the local model update and add Gaussian noise to it. Cao et al. (2021a) proposed FLTrust, which leverages the similarity of the local model update of a client with that computed by the server itself on its clean dataset. Other defenses include Byzantine-robust FL methods such as Krum (Blanchard et al., 2017), Trimmed Mean (Yin et al., 2018), and Median (Yin et al., 2018). However, all of those defenses consider a static attacker model. As a result, they become less effective against dynamic attackers who strategically adapt their attack strategies.
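The norm-clipping defense with optional Gaussian noise (Sun et al., 2019) can be sketched as follows; the parameter names `clip_norm` and `sigma` are our own illustrative choices for the defender-chosen threshold and noise scale.

```python
import numpy as np

def clip_and_noise(update, clip_norm=1.0, sigma=0.0, rng=None):
    """Clip a local model update to an L2-norm threshold, then optionally
    add Gaussian noise (the differential-privacy-style variant)."""
    update = np.asarray(update, dtype=float)
    norm = np.linalg.norm(update)  # L2-norm of the local model update
    if norm > clip_norm:
        # Scale the update down so its norm equals the threshold.
        update = update * (clip_norm / norm)
    if sigma > 0:
        rng = rng or np.random.default_rng()
        update = update + rng.normal(0.0, sigma, size=update.shape)
    return update
```

The clipping step bounds how much any single (possibly scaled) malicious update can move the global model, which is exactly why the scaling attack described above targets it.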

