BACKDOORS STUCK AT THE FRONTDOOR: MULTI-AGENT BACKDOOR ATTACKS THAT BACKFIRE

Abstract

Malicious agents in collaborative learning and outsourced data collection threaten the training of clean models. Backdoor attacks, where an attacker poisons a model during training to achieve targeted misclassification at test time, are a major concern for train-time robustness. In this paper, we investigate a multi-agent backdoor attack scenario, in which multiple attackers attempt to backdoor a victim model simultaneously. We observe a consistent backfiring phenomenon across a wide range of games, in which the attackers collectively suffer a low attack success rate. We examine different backdoor attack configurations, including non-cooperative and cooperative behaviour, joint distribution shifts, and game setups, and find that the equilibrium attack success rate sits at the lower bound. These results motivate the re-evaluation of backdoor defense research for practical environments.

1. INTRODUCTION

Beyond training algorithms, the scale-up of model training depends strongly on the trust between agents. In collaborative learning and outsourced data collection training regimes, backdoor attacks and defenses (Gao et al., 2020; Li et al., 2021) are studied to mitigate a single malicious agent that perturbs train-time images to cause targeted test-time misclassifications. Outsourced data collection is common amongst industry practitioners: Taigman et al. (2014); Papernot (2018) find a strong reliance on web-scraped data or third-party sourcing, and Kumar et al. (2020) find that data poisoning is viewed by practitioners as their most serious threat. It is plausible in practice for more than one attacker to be present, for example through the poisoning of crowdsourced datasets on Google Images (hence afflicting subsequently scraped datasets), of agent-driven financial market data, or through human-in-the-loop learning on mobile devices or social network platforms. In this paper, we investigate an under-represented aspect of backdoor attacks, namely agent dynamics: what happens when multiple backdoor attackers are present? We simulate different configurations to study how the payoff landscape changes for attackers with respect to standard attack/defense configurations, cooperative/non-cooperative behaviour, and joint distribution shifts. Our key contributions are:

• We explore the novel scenario of the multi-agent backdoor attack. Our findings on the backfiring effect and a low equilibrium attack success rate indicate a stable, natural defense against backdoor attacks, and motivate us to propose the multi-agent setting as a baseline in future research.

• We introduce a set of cooperative dynamics between multiple attackers, extending existing backdoor attack procedures with respect to trigger pattern generation and trigger label selection.
• We vary the sources of distribution shift, from multiple backdoor perturbations alone to the inclusion of adversarial and stylized perturbations, to investigate changes to a wider scope of attack success.

2. RELATED WORK

Backdoor Attacks. We refer the reader to Gao et al. (2020); Li et al. (2021) for detailed backdoor literature. In poisoning attacks (Alfeld et al., 2016; Biggio et al., 2012; Jagielski et al., 2021; Koh & Liang, 2017; Xiao et al., 2015), the attack objective is to reduce the accuracy of a model on clean samples. In backdoor attacks (Gu et al., 2019a), the attack objective is to maximize the attack success rate in the presence of the trigger while retaining the accuracy of the model on clean samples. To achieve this objective, there are different attack vectors, such as code poisoning (Bagdasaryan & Shmatikov, 2021; Xiao et al., 2018), pre-trained model tampering (Yao et al., 2019; Ji et al., 2018; Rakin et al., 2020), and outsourced data collection (Gu et al., 2019a; Chen et al., 2017; Shafahi et al., 2018b; Zhu et al., 2019b; Saha et al., 2020; Lovisotto et al., 2020; Datta & Shadbolt, 2022b). We specifically evaluate backdoor attacks manifesting through outsourced data collection. Though the attack vectors and corresponding attack methods vary, the principle is consistent: model weights are modified such that they achieve the backdoor attack objective.

Multi-Agent Attacks. Backdoor attacks (Suresh et al., 2019; Wang et al., 2020; Bagdasaryan et al., 2020; Huang, 2020) and poisoning attacks (Hayes & Ohrimenko, 2018; Mahloujifar et al., 2018; 2019; Chen et al., 2021; Fang et al., 2020) against federated learning systems and multi-party learning models have been demonstrated, but with a single attacker intending to compromise multiple victims (i.e. single attacker vs multiple defenders); for example, a single attacker controlling multiple participant nodes in the federated learning setup (Bagdasaryan et al., 2020), or a backdoor trigger pattern decomposed into multiple distributed small patterns injected by multiple participant nodes controlled by a single attacker (Xie et al., 2020).
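To make the outsourced data collection attack vector concrete, the following is a minimal sketch of a single attacker's BadNets-style poisoning step: a small pixel-patch trigger is stamped on a fraction of the training images and their labels are flipped to the attacker's target class. The function names, the 3x3 white corner patch, and the 10% poisoning rate are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def apply_trigger(image, patch_size=3, value=255):
    """Stamp a small white patch in the bottom-right corner (BadNets-style)."""
    poisoned = image.copy()
    poisoned[-patch_size:, -patch_size:] = value
    return poisoned

def poison_dataset(images, labels, target_label, poison_rate=0.1, seed=0):
    """Poison a fraction of the training set: add the trigger and flip
    the label to the attacker's target class."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n_poison = int(len(images) * poison_rate)
    idx = rng.choice(len(images), size=n_poison, replace=False)
    for i in idx:
        images[i] = apply_trigger(images[i])
        labels[i] = target_label
    return images, labels

# Toy example: 100 grayscale 28x28 "images", all of clean class 0.
X = np.zeros((100, 28, 28), dtype=np.uint8)
y = np.zeros(100, dtype=np.int64)
Xp, yp = poison_dataset(X, y, target_label=7, poison_rate=0.1)
assert (yp == 7).sum() == 10  # 10% of labels flipped to the target class
```

A model trained on `(Xp, yp)` would learn to associate the patch with class 7 while behaving normally on clean inputs, which is the backdoor attack objective described above.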
Our multi-agent backdoor attack could be extended to federated learning, where multiple attackers control distinctly different nodes to backdoor the joint model.

Though not a multi-agent attack, Xue et al. (2020); Nguyen & Tran (2020); Salem et al. (2021) make use of multiple trigger patterns in their single-agent backdoor attacks. Xue et al. (2020) proposed a 1-to-N attack, where an attacker triggers multiple backdoor inputs by varying the intensity of the same backdoor trigger, and an N-to-1 attack, where the backdoor is triggered only when all N backdoor (sub-)triggers are present. Though its use of multiple triggers serves to maximize a single-agent payoff, we reference its insights in evaluating a low-distance-triggers, cooperative attack in (E4). Our work is unique because: (i) prior work evaluates a single attacker against multiple victims, while our work evaluates multiple attackers against each other and a defender; (ii) our attack objective is strict and individualized for each attacker (i.e. in a poisoning attack, each attacker can have a generalized, attacker-agnostic objective of reducing the standard model accuracy, but in a backdoor attack, each attacker has an individualized objective with respect to their own trigger pattern and target label). Our work is amongst the first to investigate this conflict between the attack objectives of multiple attackers; hence the resulting backfiring effect does not manifest in existing multi-agent attack work.
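The individualized attack objectives can be sketched as follows: each attacker independently poisons its own share of the outsourced data with its own trigger pattern and target label, so the joint training set contains conflicting trigger-to-label mappings. The two-attacker configuration, the corner-patch triggers, and the even data split below are hypothetical choices for illustration, not the paper's experimental setup.

```python
import numpy as np

def corner_trigger(image, corner, value=255, size=3):
    """Stamp a patch in one corner; each attacker uses a distinct corner."""
    img = image.copy()
    rows = slice(None, size) if corner in ("tl", "tr") else slice(-size, None)
    cols = slice(None, size) if corner in ("tl", "bl") else slice(-size, None)
    img[rows, cols] = value
    return img

# Hypothetical configuration: two attackers, each with its own trigger
# pattern (corner) and its own target label.
attackers = [
    {"corner": "tl", "target_label": 1},
    {"corner": "br", "target_label": 7},
]

# Toy outsourced dataset: 200 grayscale 28x28 "images".
X = np.zeros((200, 28, 28), dtype=np.uint8)
y = np.zeros(200, dtype=np.int64)

# Each attacker poisons only the share of the data it contributes.
shares = np.array_split(np.arange(len(X)), len(attackers))
for attacker, idx in zip(attackers, shares):
    for i in idx:
        X[i] = corner_trigger(X[i], attacker["corner"])
        y[i] = attacker["target_label"]
```

The victim model trained on this pooled data faces competing backdoor objectives, one per attacker, which is the source of the conflict (and the resulting backfiring effect) that distinguishes this setting from single-attacker multi-trigger attacks.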

