EFFICIENT CERTIFIED DEFENSES AGAINST PATCH ATTACKS ON IMAGE CLASSIFIERS

Abstract

Adversarial patches pose a realistic threat model for physical world attacks on autonomous systems via their perception component. Autonomous systems in safety-critical domains such as automated driving should thus contain a fail-safe fallback component that combines certifiable robustness against patches with efficient inference while maintaining high performance on clean inputs. We propose BAGCERT, a novel combination of model architecture and certification procedure that allows efficient certification. We derive a loss that enables end-to-end optimization of certified robustness against patches of different sizes and locations. On CIFAR10, BAGCERT certifies 10,000 examples in 43 seconds on a single GPU and obtains 86% clean and 60% certified accuracy against 5 × 5 patches.

1. INTRODUCTION

Adversarial patches (Brown et al., 2017) are one of the most relevant threat models for attacks on autonomous systems such as highly automated cars or robots. In this threat model, an attacker can freely control a small subregion of the input (the "patch") but needs to leave the rest of the input unchanged. This threat model is relevant because it corresponds to a physically realizable attack (Lee & Kolter, 2019): an attacker can print the adversarial patch pattern, place it in the physical world, and it will become part of the input of any system whose field of view overlaps with the physical patch. Moreover, once an attacker has generated a successful patch pattern, this pattern can be easily shared, will be effective against all systems using the same perception component, and an attack can be conducted without requiring access to the individual system. This makes it feasible, for instance, to attack an entire fleet of cars from the same vendor. While several empirical defenses have been proposed (Hayes, 2018; Naseer et al., 2019; Selvaraju et al., 2019; Wu et al., 2020), these only offer robustness against known attacks but not necessarily against more effective attacks that may be developed in the future (Chiang et al., 2020). In contrast, certified defenses for the patch threat model (Chiang et al., 2020; Levine & Feizi, 2020; Zhang et al., 2020; Xiang et al., 2020) guarantee robustness against all possible attacks within the given threat model. Ideally, a certified defense should combine high certified robustness with efficient inference while maintaining strong performance on clean inputs. Moreover, the training objective should be based on the certification problem to avoid post-hoc calibration of the model for certification. Existing defenses do not satisfy all of these conditions: Chiang et al. (2020) proposed an approach that extends interval-bound propagation (Gowal et al., 2019) to the patch threat model.
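The patch threat model described above can be made concrete with a minimal sketch: the attacker overwrites a small, contiguous subregion of the input with arbitrary values and leaves everything else untouched. The function below is illustrative only (not part of any of the cited defenses); the array shapes assume a CIFAR10-style H × W × C image.

```python
import numpy as np

def apply_patch(image: np.ndarray, patch: np.ndarray, row: int, col: int) -> np.ndarray:
    """Sketch of the patch threat model: the attacker fully controls the
    pixels inside the patch region and must leave the rest unchanged."""
    h, w = patch.shape[:2]
    attacked = image.copy()
    attacked[row:row + h, col:col + w] = patch  # arbitrary attacker-chosen content
    return attacked

# Place a 5 x 5 patch on a 32 x 32 CIFAR10-sized input.
clean = np.zeros((32, 32, 3), dtype=np.float32)
patch = np.random.rand(5, 5, 3).astype(np.float32)
adv = apply_patch(clean, patch, row=10, col=10)

assert np.array_equal(adv[10:15, 10:15], patch)  # patch region fully controlled
assert np.array_equal(adv[:10], clean[:10])      # remainder of the input unchanged
```

A certified defense must then guarantee a prediction that is correct for every possible patch content and, depending on the threat model, every possible patch location.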
In this approach, there is a clear connection between training objective and certification problem. However, certified accuracy is relatively low and clean performance severely affected (below 50% on CIFAR10). Moreover, inference requires separate forward passes for all possible patch positions and is thus computationally very expensive. Derandomized smoothing (Levine & Feizi, 2020) achieves much higher certified and clean performance on CIFAR10 and even scales to ImageNet. However, inference is computationally expensive since it is based on separately propagating many differently ablated versions of a single input. Moreover, training and certification are disconnected, and a separate tuning of the post-hoc certification procedure's parameters on hold-out data is required, a drawback also shared by Clipped BagNet (Zhang et al., 2020) and PatchGuard (Xiang et al., 2020).
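To illustrate why derandomized-smoothing inference is expensive, the simplified sketch below follows the column-ablation scheme from Levine & Feizi (2020): one forward pass per band position, a majority vote over the resulting predictions, and certification when the vote margin exceeds the number of bands a patch can intersect. The base classifier `classify` is a hypothetical stand-in, and the implementation is a loose sketch rather than the authors' exact procedure (e.g., their abstention threshold and tie-breaking rules are omitted).

```python
import numpy as np

def certify_column_smoothing(image, classify, band_width, patch_width, num_classes):
    """Simplified sketch of derandomized (column) smoothing certification.

    `classify` maps an ablated image to a class index, or None to abstain.
    Each ablation keeps only a vertical band of `band_width` columns; a patch
    of width `patch_width` can intersect at most
    (patch_width + band_width - 1) bands.
    """
    H, W, _ = image.shape
    counts = np.zeros(num_classes, dtype=int)
    for start in range(W):                 # one forward pass per band position
        ablated = np.zeros_like(image)
        cols = [(start + i) % W for i in range(band_width)]
        ablated[:, cols] = image[:, cols]  # keep only this vertical band
        pred = classify(ablated)
        if pred is not None:
            counts[pred] += 1
    order = np.argsort(counts)
    top, runner_up = order[-1], order[-2]
    affected = patch_width + band_width - 1   # bands a patch can corrupt
    certified = counts[top] - counts[runner_up] > 2 * affected
    return top, bool(certified)
```

The loop makes W forward passes through the base classifier for a single input, which is exactly the inference cost that BAGCERT's combined architecture and certification procedure is designed to avoid.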

