FLGAME: A GAME-THEORETIC DEFENSE AGAINST BACKDOOR ATTACKS IN FEDERATED LEARNING

Abstract

Federated learning is a distributed training paradigm in which multiple local clients jointly train a global model without sharing their local training data. However, recent studies have shown that federated learning provides an additional attack surface for backdoor attacks. For instance, an attacker can compromise a subset of clients and thereby corrupt the global model so that it incorrectly predicts an attacker-chosen target class for any input embedded with the backdoor trigger. Existing defenses for federated learning against backdoor attacks usually detect and exclude the corrupted information from the compromised clients based on a static attacker model. Such defenses, however, are less effective against dynamic attackers who strategically adapt their attack strategies. In this work, we model the strategic interaction between the (global) defender and attacker as a minimax game. Based on the analysis of our model, we design an interactive defense mechanism that we call FLGAME. Theoretically, we prove that under mild assumptions, the global model trained with FLGAME under backdoor attacks is close to that trained without attacks. Empirically, we perform extensive evaluations on benchmark datasets and compare FLGAME with multiple state-of-the-art baselines. Our experimental results show that FLGAME can effectively defend against strategic attackers and achieves significantly higher robustness than the baselines.

1. INTRODUCTION

Federated learning (FL) (McMahan et al., 2017a) aims to train machine learning models (called global models) over training data that is distributed across multiple clients (e.g., mobile phones, IoT devices). FL has been widely used in many real-world applications such as finance (Long et al., 2020) and healthcare (Long et al., 2022). FL trains a global model in an iterative manner. In each communication round, a cloud server shares its global model with selected clients; each selected client initializes its local model with the global model, trains the local model on its local training dataset, and sends the resulting local model update to the server; the server then applies an aggregation rule to the local model updates to update its global model.

Due to the distributed nature of FL, many recent studies (Bhagoji et al., 2019; Bagdasaryan et al., 2020; Baruch et al., 2019; Wang et al., 2020; Kairouz et al., 2021) have shown that it is vulnerable to backdoor attacks. For instance, an attacker can compromise a subset of clients and manipulate their local training datasets to corrupt the global model such that it predicts an attacker-chosen target class for any input embedded with a backdoor trigger (Bagdasaryan et al., 2020).

To defend against backdoor attacks, many defenses (Sun et al., 2019; Cao et al., 2021a) have been proposed. For example, Sun et al. (2019) proposed to clip each client's local model update so that its L2-norm is no larger than a defender-chosen threshold. Cao et al. (2021a) proposed FLTrust, in which the server computes a local model update itself and uses its similarity to each client's update as a trust score that weights that client's contribution when updating the global model. However, all of these defenses assume a static attack model in which the attacker does not adapt its attack strategy. As a result, they are less effective under adaptive attacks; e.g., Wang et al. (2020) showed that the defenses proposed in (Sun et al., 2019; Blanchard et al., 2017) can be bypassed by appropriately designed attacks.

Our contribution: In this work, we propose FLGAME, a game-theoretic defense against backdoor attacks to FL. Specifically, we formulate FLGAME as a minimax game between the server (defender) and the attacker.
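As a simplified illustration of the two baseline defenses discussed above, the sketch below combines norm clipping (Sun et al., 2019) with an FLTrust-style trust score (Cao et al., 2021a) in a single aggregation step. All function names are illustrative, not from the papers' implementations, and the sketch omits details of FLTrust such as normalizing the magnitude of each client update to match the server's own update.

```python
import numpy as np

def clip_update(update, threshold):
    # Norm clipping: rescale the update so its L2-norm does not
    # exceed a defender-chosen threshold.
    norm = np.linalg.norm(update)
    if norm > threshold:
        update = update * (threshold / norm)
    return update

def trust_score(server_update, client_update):
    # FLTrust-style score: clipped (ReLU) cosine similarity between
    # the server's own local model update and the client's update.
    cos = np.dot(server_update, client_update) / (
        np.linalg.norm(server_update) * np.linalg.norm(client_update) + 1e-12)
    return max(cos, 0.0)

def aggregate(global_model, client_updates, server_update, threshold):
    # One communication round: clip each client's update, weight it by
    # its trust score, and apply the weighted average to the global model.
    clipped = [clip_update(u, threshold) for u in client_updates]
    scores = np.array([trust_score(server_update, u) for u in clipped])
    if scores.sum() == 0.0:
        return global_model  # no client deemed trustworthy this round
    weights = scores / scores.sum()
    agg = sum(w * u for w, u in zip(weights, clipped))
    return global_model + agg
```

Note how a malicious update pointing away from the server's own update receives a trust score of zero, so it contributes nothing to the aggregate even before clipping bounds its magnitude.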

