BACKDOORS STUCK AT THE FRONTDOOR: MULTI-AGENT BACKDOOR ATTACKS THAT BACKFIRE

Abstract

Malicious agents in collaborative learning and outsourced data collection threaten the training of clean models. Backdoor attacks, in which an attacker poisons a model during training to achieve targeted misclassification at test time, are a major concern for train-time robustness. In this paper, we investigate a multi-agent backdoor attack scenario, in which multiple attackers attempt to backdoor a victim model simultaneously. We observe a consistent backfiring phenomenon across a wide range of games: the agents collectively suffer a low attack success rate. We examine different backdoor attack configurations, non-cooperative and cooperative behaviour, joint distribution shifts, and game setups, and find that the equilibrium attack success rate sits at the lower bound. These results motivate a re-evaluation of backdoor defense research for practical environments.

1. INTRODUCTION

Beyond training algorithms, the scale-up of model training depends strongly on trust between agents. In collaborative learning and outsourced data collection training regimes, backdoor attacks and defenses (Gao et al., 2020; Li et al., 2021) are studied to mitigate a single malicious agent that perturbs train-time images to cause targeted test-time misclassifications. Outsourced data collection is common among industry practitioners: Taigman et al. (2014) and Papernot (2018) find a strong reliance on web-scraped or third-party-sourced data, and Kumar et al. (2020) find that practitioners view data poisoning as their most serious threat. It is plausible in practice for more than one attacker to be present, for example through the poisoning of crowdsourced datasets on Google Images (hence afflicting subsequently scraped datasets), of agent-driven financial market data, or through human-in-the-loop learning on mobile devices and social network platforms. In this paper, we investigate an under-represented aspect of backdoor attacks, namely agent dynamics: what happens when multiple backdoor attackers are present? We simulate different configurations to study how the payoff landscape changes for attackers with respect to standard attack/defense configurations, cooperative/non-cooperative behaviour, and joint distribution shifts. Our key contributions are:

• We explore the novel scenario of the multi-agent backdoor attack. Our findings on the backfiring effect and a low equilibrium attack success rate indicate a stable, natural defense against backdoor attacks, and motivate proposing the multi-agent setting as a baseline in future research.
• We introduce a set of cooperative dynamics between multiple attackers, extending existing backdoor attack procedures with respect to trigger pattern generation and trigger label selection.
• We vary the sources of distribution shift, from multiple backdoor perturbations alone to the inclusion of adversarial and stylized perturbations, to investigate attack success under a wider scope of shifts.
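To make the threat model concrete, a minimal sketch of multi-agent backdoor poisoning, where each attacker stamps its own trigger pattern and target label onto a disjoint fraction of a shared training set, might look as follows (the function name, the 2x2 square triggers, and all parameters are hypothetical illustrations, not the paper's exact procedure):

```python
import numpy as np

def multi_agent_poison(images, labels, attackers, rate=0.1, seed=0):
    """Sketch of multiple backdoor attackers poisoning one training set.

    Each attacker is a (corner, value, target_label) tuple and stamps a
    2x2 trigger patch at its chosen corner onto its own disjoint share
    of the images, relabeling them to its target class.
    """
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    pool = rng.permutation(len(images))          # shuffled sample indices
    per_attacker = max(1, int(rate * len(images)))
    poisoned = {}
    for i, (corner, value, target) in enumerate(attackers):
        # Give each attacker a disjoint slice of the shuffled pool.
        idx = pool[i * per_attacker:(i + 1) * per_attacker]
        r, c = corner
        images[idx, r:r + 2, c:c + 2] = value    # stamp the trigger patch
        labels[idx] = target                     # flip to the target label
        poisoned[i] = idx
    return images, labels, poisoned
```

Because the attackers' trigger distributions overlap in input space but pull toward different target labels, the victim model's fit to any single trigger degrades as more attackers join, which is the setting in which the backfiring effect is measured.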

