THINKING TWO MOVES AHEAD: ANTICIPATING OTHER USERS IMPROVES BACKDOOR ATTACKS IN FEDERATED LEARNING

Abstract

Federated learning is particularly susceptible to model poisoning and backdoor attacks because individual users have direct control over the training data and model updates. At the same time, the attack power of an individual user is limited because their updates are quickly drowned out by those of many other users. Existing attacks do not account for future behaviors of other users, and thus require many sequential updates, whose effects are quickly erased. We propose an attack that anticipates and accounts for the entire federated learning pipeline, including behaviors of other clients, and ensures that backdoors are effective quickly and persist even after multiple rounds of community updates. We show that this new attack is effective in realistic scenarios where the attacker only contributes to a small fraction of randomly sampled rounds, and demonstrate this attack on image classification, next-word prediction, and sentiment analysis.

1. INTRODUCTION

When training models on private information, it is desirable to choose a learning paradigm that does not require stockpiling user data in a central location. Federated learning (Konečný et al., 2015; McMahan et al., 2017b) provides such a paradigm: models are trained locally on each user's device, and only model updates are aggregated by a central server. Unfortunately, by placing responsibility for model updates in the hands of many anonymous users, federated learning also opens up model training to a range of malicious attacks (Bagdasaryan et al., 2019; Kairouz et al., 2021). In model poisoning attacks (Biggio & Roli, 2018; Bhagoji et al., 2019), a user sends malicious updates to the central server to alter the behavior of the model. For example, in language modeling, backdoor attacks could modify the behavior of the final model to misrepresent specific facts, attach negative sentiment to certain groups, or change behavior in edge cases, but also attach false advertising and spam to certain key phrases. In practical applications, however, the real threat posed by such attacks is debated (Sun et al., 2019b; Wang et al., 2020; Shejwalkar et al., 2021). Usually only a small fraction of users are presumed to be malicious, and their impact on the final model can be small, especially when the contributions of each user are limited by norm-bounding (Sun et al., 2019b). Attacks as described in Bagdasaryan & Shmatikov (2021) further require successive attacks over numerous sequential rounds of training. This is not realistic in normal cross-device applications (Bonawitz et al., 2019; Hard et al., 2019), where users are randomly selected in each round from a larger pool, making it exceedingly unlikely that any attacker, or even group of attackers, will be able to contribute to more than a fraction of the total rounds of training. Model updates that are limited in this way are immediately less effective, as even strong backdoor attacks can be wiped away and replaced by subsequent updates from many benign users (Sun et al., 2019b; Shejwalkar et al., 2021).
In this work we set out to discover whether strong attacks are possible in these more realistic scenarios. We make the key observation that previous attack algorithms, such as those described in Bagdasaryan et al. (2019; 2021), only consider the immediate effects of a model update and ignore the downstream impacts of updates from benign users. We show that, by modeling these future updates, a savvy attacker can update model parameters in a way that is unlikely to be over-written or undone by benign users. By backpropagating through simulated future updates, our proposed attack directly optimizes a malicious update to maximize its permanence. Using both vision and language tasks, and under a realistic threat model where attack opportunities are rare, we see that these novel attacks become operational after fewer attack opportunities than baseline methods, and remain active for much longer after the attack has passed, as shown in Figure 1.
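The core idea can be illustrated with a deliberately simplified scalar sketch (this is not the paper's full algorithm): the attacker optimizes its malicious contribution through k simulated future rounds, rather than for the immediate post-attack model. All constants here are illustrative, the benign dynamics are a hypothetical linear pull toward a benign consensus, and norm-bounding is ignored for brevity.

```python
# Toy scalar model of the "anticipate" idea: benign rounds pull the global
# parameter back toward theta_benign; the attacker injects one update a and
# optimizes it *through* k simulated future rounds, so that the backdoor
# objective (theta near theta_bd) still holds after benign training.
n, alpha, k = 10, 0.3, 5           # users per round, benign pull rate, lookahead
theta_benign, theta_bd = 0.0, 1.0  # benign consensus and backdoor target

def rollout(a):
    """Attack round (averaged with n-1 benign updates), then k benign rounds."""
    theta = ((n - 1) * theta_benign + a) / n
    for _ in range(k):
        theta = (1 - alpha) * theta + alpha * theta_benign
    return theta

# d rollout / d a, by the chain rule through all simulated rounds.
d_rollout_da = (1 - alpha) ** k / n

a = 0.0
for _ in range(200):  # gradient descent on the post-rollout backdoor loss
    a -= 1000.0 * 2 * (rollout(a) - theta_bd) * d_rollout_da

# A naive attacker who ignores future rounds sends theta_bd itself; the
# anticipating attacker over-corrects, so the backdoor persists far longer.
naive = rollout(theta_bd)
```

Because these simulated dynamics are linear, the optimal update could be found in closed form; the gradient-descent loop stands in for backpropagating through unrolled SGD steps of real models, which has no closed form.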

2. BACKGROUND

Federated learning systems have been described in a series of studies and a variety of protocols. In this work, we focus mainly on federated averaging (fedAVG) as proposed in McMahan et al. (2017b) and implemented in a range of recent system designs (Bonawitz et al., 2019; Paulik et al., 2021; Dimitriadis et al., 2022), but the attack we describe can be extended to other algorithms. In fedAVG, the server sends the current state of the model θ_i to all users selected for the next round of training. Each user then computes an updated local model through several iterations, for example via local SGD. The u-th user holds data D^u, which is partitioned into batches D^u_j, and, starting from the global model, their local model is updated for m steps based on the training objective L: θ^u_{i,j+1} = θ^u_{i,j} − τ∇L(D^u_j, θ^u_{i,j}), for j = 0, …, m−1. The updated models θ^u_{i,m} from each user are returned to the server, which computes a new central state by averaging: θ_{i+1} = (1/n) Σ_{u=1}^n θ^u_{i,m}. We will later summarize this procedure, which depends on the group of users U_i in the i-th round, as θ_{i+1} = F_avg(U_i, θ_i). Optionally, the average can be reweighted based on the amount of data controlled by each user (Bonawitz et al., 2017); however, this is unsafe without further precautions, as an attacker could overweight their own contributions, so we only consider unweighted averages in this work. Federated averaging is further safeguarded against malicious users by the use of norm-bounding: each updated model θ^u_{i,m} is projected onto the ball ‖θ^u_{i,m}‖_p ≤ C, for some clip value C, so that no user update can dominate the average. Norm-bounding is necessary to defend against the model replacement attacks described in Bagdasaryan et al. (2019) and Bhagoji et al. (2019), which send malicious updates with extreme magnitudes that overpower updates from benign users.
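The round structure above can be sketched in a few lines. This is a minimal illustration, not a production implementation: the loss gradient is a hypothetical least-squares stand-in, and the norm-bounding shown clips the L2 norm of each user's update (a common variant of the projection described above).

```python
import numpy as np

def grad_fn(theta, batch):
    # Hypothetical least-squares gradient: each "batch" is a target vector.
    return theta - batch

def local_sgd(theta, batches, tau=0.1, m=4):
    """One user's local update: m steps of SGD from the global model theta_i."""
    theta = theta.copy()
    for j in range(m):
        theta -= tau * grad_fn(theta, batches[j % len(batches)])
    return theta

def clip_update(theta_global, theta_user, C=1.0):
    """Norm-bounding: project the user's update onto an L2 ball of radius C."""
    delta = theta_user - theta_global
    norm = np.linalg.norm(delta)
    if norm > C:
        delta *= C / norm
    return theta_global + delta

def fed_avg_round(theta, user_batches, C=1.0):
    """F_avg: unweighted average of the users' clipped local models."""
    updated = [clip_update(theta, local_sgd(theta, b), C) for b in user_batches]
    return np.mean(updated, axis=0)

# One round with two users holding toy data.
users = [[np.ones(3)], [2 * np.ones(3)]]
theta = fed_avg_round(np.zeros(3), users)
```

Note that the unweighted mean in `fed_avg_round` matches the choice discussed above: reweighting by data volume would let an attacker inflate their own contribution.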
Once norm-bounding is in place as a defense, however, the potential threat posed by malicious attacks remains debated. We summarize a few related areas of research before returning to this question:



Figure 1: Our method, Anticipate, reaches 100% backdoor accuracy faster than the baseline in the setting of 100 random attacks in the first 500 rounds. Moreover, after the window of attack passes, the attack decays much more slowly than the baseline. At the end of federated training, our attack still has a backdoor accuracy of 60%, while the baseline maintains just 20%. Overall, only 100 out of a total of 20k contributions are malicious.


