INVARIANT AGGREGATOR FOR DEFENDING AGAINST FEDERATED BACKDOOR ATTACKS

Abstract

Federated learning is gaining popularity as it enables training high-utility models across several clients without directly sharing their private data. As a downside, the federated setting makes the model vulnerable to various adversarial attacks in the presence of malicious clients. Specifically, an adversary can perform backdoor attacks to control model predictions by poisoning the training dataset with a trigger. In this work, we propose a mitigation for backdoor attacks in a federated learning setup. Our solution forces the model optimization trajectory to focus on the invariant directions that are generally useful for utility and to avoid directions that favor a few, possibly malicious, clients. Concretely, we treat the sign consistency of the pseudo-gradient (the client update) as an estimate of invariance. Following this, our approach performs dimension-wise filtering to remove pseudo-gradient elements with low sign consistency. Then, a robust mean estimator eliminates outliers among the remaining dimensions. Our theoretical analysis further shows the necessity of combining both defenses and illustrates how our proposed solution protects the federated learning model. Empirical results on three datasets with different modalities and varying numbers of clients show that our approach mitigates backdoor attacks at a negligible cost to model utility.
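The two-stage aggregation described above can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the sign-consistency threshold, the use of a coordinate-wise trimmed mean as the robust estimator, and all function and parameter names are assumptions for exposition.

```python
import numpy as np

def invariant_aggregate(updates, sign_threshold=0.8, trim_frac=0.2):
    """Hypothetical sketch of the two-stage defense.

    updates: (n_clients, dim) array of client pseudo-gradients.
    Stage 1: keep only dimensions where most clients agree on the sign.
    Stage 2: aggregate surviving dimensions with a robust (trimmed) mean.
    """
    updates = np.asarray(updates, dtype=float)
    n = updates.shape[0]

    # Sign consistency per dimension: fraction of clients agreeing
    # with the majority sign (1.0 = unanimous, near 0 = split).
    consistency = np.abs(np.sign(updates).sum(axis=0)) / n
    keep = consistency >= sign_threshold  # "invariant" dimensions

    # Coordinate-wise trimmed mean as a stand-in robust estimator:
    # drop the k smallest and k largest values in each dimension.
    k = int(trim_frac * n)
    if n > 2 * k:
        robust_mean = np.sort(updates, axis=0)[k:n - k].mean(axis=0)
    else:
        robust_mean = updates.mean(axis=0)

    # Zero out low-consistency dimensions; aggregate the rest robustly.
    return np.where(keep, robust_mean, 0.0)
```

A dimension that a few malicious clients push strongly while benign clients disagree on its sign fails the consistency test and is dropped; a dimension on which clients agree still has its extreme values trimmed before averaging.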

1. INTRODUCTION

Federated learning enables multiple distrusting clients to jointly train a machine learning model without sharing their private data directly. However, a rising concern in this setting is the ability of potentially malicious clients to perpetrate backdoor attacks. To this end, it has been argued that conducting backdoor attacks in a federated learning setup is practical (Shejwalkar et al., 2022) and can be effective (Wang et al., 2020). For instance, the adversary can connect to a federated learning system as a legitimate user and conduct a backdoor attack that forces the model to mispredict. The impact of such attacks is quite severe in many mission-critical federated learning applications. For example, anomaly detection is a common federated learning task where multiple parties (e.g., banks or email users) collaboratively train a model that detects fraud or phishing emails. Backdoor attacks allow the adversary to circumvent these detection methods. The most common backdoor attack embeds triggers in the data samples and forces the model to make an adversary-specified prediction when the trigger is observed (Liu et al., 2018; Bagdasaryan et al., 2020). Thus, an adversary can conduct a backdoor attack by generating a trigger that statistically correlates with a particular label. Once the adversary injects these trigger-embedded backdoor samples into the training data, the model learns the trigger-label correlation and predicts as the adversary specifies. Meanwhile, the backdoor attack often does not degrade predictive accuracy on benign samples, making backdoor detection difficult in practice (Wang et al., 2020). In federated learning, the server aggregates only the client-level updates (a.k.a. pseudo-gradients, or gradients for short) without control over the training procedure or any data samples.
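The trigger-embedding step described above can be illustrated with the classic patch-style poisoning used in the cited attacks. This is a generic sketch, not any specific attack from the literature; the patch location, value, and function names are illustrative assumptions.

```python
import numpy as np

def embed_trigger(image, target_label, patch_value=1.0, size=3):
    """Hypothetical sketch of patch-style backdoor poisoning.

    Stamps a small constant patch (the trigger) into the corner of an
    image and relabels the sample with the adversary's target class.
    Training on enough such samples correlates patch -> target_label.
    """
    poisoned = image.copy()
    poisoned[-size:, -size:] = patch_value  # bottom-right square trigger
    return poisoned, target_label
```

A malicious client would mix a fraction of such poisoned samples into its local training set; the resulting pseudo-gradient then carries the trigger-label correlation into the global model while leaving benign-sample accuracy largely intact.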
Such limited visibility of the federated learning server into client-side training makes defending against backdoor attacks challenging. Common defenses against backdoor attacks aim at identifying the backdoor data samples or poisoned model parameters and usually require access to at least a subset of the training data (Tran et al., 2018; Li et al., 2021a), which is prohibitive for a federated learning server. Other defense methods against untargeted poisoning attacks that degrade the model utility

