FLIP: A PROVABLE DEFENSE FRAMEWORK FOR BACKDOOR MITIGATION IN FEDERATED LEARNING

Abstract

Federated Learning (FL) is a distributed learning paradigm that enables different parties to jointly train a model with high quality and strong privacy protection. In this scenario, individual participants may be compromised and perform backdoor attacks by poisoning the data (or gradients). Existing work on robust aggregation and certified FL robustness does not study how hardening benign clients can affect the global model (and the malicious clients). In this work, we theoretically analyze the connection among cross-entropy loss, attack success rate, and clean accuracy in this setting. Moreover, we propose a defense based on trigger reverse engineering and show that our method achieves a guaranteed robustness improvement (i.e., a reduced attack success rate) without affecting benign accuracy. We conduct comprehensive experiments across different datasets and attack settings. Our comparison against nine competing SOTA defense methods shows the empirical superiority of our method under both single-shot and continuous FL backdoor attacks. Code is available at https://github.com/KaiyuanZh/FLIP.

1. INTRODUCTION

Federated Learning (FL) is a distributed learning paradigm with many applications, such as next word prediction (McMahan et al., 2017), credit prediction (Cheng et al., 2021a), and IoT device aggregation (Samarakoon et al., 2018). FL promises scalability and privacy because its training is distributed across many clients. Due to the decentralized nature of FL, recent studies demonstrate that individual participants may be compromised, leaving FL susceptible to backdoor attacks (Bagdasaryan et al., 2020; Bhagoji et al., 2019; Xie et al., 2019; Wang et al., 2020a; Sun et al., 2019). Backdoor attacks aim to make any input stamped with a specific pattern be misclassified as a target label. Backdoors are hence becoming a prominent security threat to the real-world deployment of federated learning.

Deficiencies of Existing Defense. Existing FL defense methods mainly fall into two categories: robust aggregation (Fung et al., 2020; Pillutla et al., 2022; Blanchard et al., 2017; El Mhamdi et al., 2018; Chen et al., 2017), which detects and rejects malicious weights, and certified defense (Cohen et al., 2019; Xiang et al., 2021; Levine & Feizi, 2020; Panda et al., 2022; Cao et al., 2021), which provides robustness certification in the presence of backdoors with limited magnitude. Some of them need a large number of clean samples on the global server (Lin et al., 2020b; Li et al., 2020a), which violates the essence of FL. Others require inspecting model weights (Aramoon et al., 2021), which may leak information about local clients. Existing model inversion techniques (Fredrikson et al., 2015; Ganju et al., 2018; An et al., 2022) have shown the feasibility of exploiting model weights to infer private information. Besides, existing defense methods based on weight clustering (Blanchard et al., 2017; Nguyen et al., 2021) either reject benign weights, degrading model training performance, or accept malicious weights, leaving the backdoor effective.
According to our results in the experiment section, the majority of existing methods only work in the single-shot attack setting, where only a small set of adversaries participate in a few rounds, and fall short in the stronger and stealthier continuous attack setting, where the attackers participate continuously throughout the entire FL training.

FLIP. In this paper, we propose a Federated LearnIng Provable defense framework (FLIP) that provides theoretical guarantees. For each benign local client, FLIP adversarially trains the local

