CERTIFIED DEFENCES AGAINST ADVERSARIAL PATCH ATTACKS ON SEMANTIC SEGMENTATION

Abstract

Adversarial patch attacks are an emerging security threat for real-world deep learning applications. We present DEMASKED SMOOTHING, the first approach (to the best of our knowledge) to certify the robustness of semantic segmentation models against this threat model. Previous work on certifiably defending against patch attacks has mostly focused on the image classification task and often requires changes to the model architecture and additional training, which is undesirable and computationally expensive. In DEMASKED SMOOTHING, any segmentation model can be applied without particular training, fine-tuning, or restriction of the architecture. Using different masking strategies, DEMASKED SMOOTHING can be applied both for certified detection and certified recovery. In extensive experiments we show that DEMASKED SMOOTHING can on average certify 63% of the pixel predictions against a 1% patch in the detection task and 46% against a 0.5% patch in the recovery task on the ADE20K dataset.

1. INTRODUCTION

Physically realizable adversarial attacks are a threat for safety-critical (semi-)autonomous systems such as self-driving cars or robots. Adversarial patches (Brown et al., 2017; Karmon et al., 2018) are the most prominent example of such an attack. Their realizability has been demonstrated repeatedly, for instance by Lee & Kolter (2019): an attacker places a printed version of an adversarial patch in the physical world to fool a deep learning system. While empirical defences (Hayes, 2018; Naseer et al., 2019; Selvaraju et al., 2019; Wu et al., 2020) may offer robustness against known attacks, they do not provide any guarantees against unknown future attacks (Chiang et al., 2020). Thus, certified defences for the patch threat model, which guarantee robustness against all possible attacks within the given threat model, are crucial for safety-critical applications.

Research on certifiable defences against adversarial patches can be broadly categorized into certified recovery and certified detection. Certified recovery (Chiang et al., 2020; Levine & Feizi, 2020; Zhang et al., 2020; Xiang et al., 2021; Metzen & Yatsura, 2021; Lin et al., 2021; Xiang et al., 2022a; Salman et al., 2021; Chen et al., 2022) has the objective of making a correct prediction on an input even in the presence of an adversarial patch. In contrast, certified detection (McCoyd et al., 2020; Xiang & Mittal, 2021b; Han et al., 2021; Huang & Li, 2021) provides a weaker guarantee by only aiming at detecting inputs containing adversarial patches. While certified recovery is more desirable in principle, it typically comes at a high cost in clean performance. In practice, certified detection might therefore be preferable because it allows maintaining high clean performance.
Most existing certifiable defenses against patches are focused on image classification, with the exception of DetectorGuard (Xiang & Mittal, 2021a) and ObjectSeeker (Xiang et al., 2022b) that certifiably defend against patch hiding attacks on object detectors. Moreover, existing defences are not easily applicable to arbitrary downstream models, because they assume either that the downstream model is trained explicitly for being certifiably robust (Levine & Feizi, 2020; Metzen & Yatsura, 2021) , or that the model has a certain network architecture such as BagNet (Zhang et al., 2020; Metzen & Yatsura, 2021; Xiang et al., 2021) or a vision transformer (Salman et al., 2021; Huang & Li, 2021) . A notable exception is PatchCleanser (Xiang et al., 2022a) , which can be combined with arbitrary downstream models but is restricted to image classification. et al., 2021; Bousselham et al., 2021) . Their output may become more vulnerable to adversarial patches if they manage to manipulate the global self-attention (Lovisotto et al., 2022) . We demonstrate how significant parts of the segmentation output may be affected by a small patch for Swin tranfromer Liu et al. (2021) in Figure 1a . Full details on the attack are available in Appendix D. We point out that preventive certified defences are important because newly developed attacks can immediately be used to compromise safety-critical applications unless they are properly defended. In this work, we propose the novel framework DEMASKED SMOOTHING (Figure 1c ) to obtain the first (to the best of our knowledge) certified defences against patch attacks on semantic segmentation models. Similarly to previous work (Levine & Feizi, 2020), we mask different parts of the input (Figure 1b ) and provide guarantees with respect to every possible patch that is not larger than a certain pre-defined size. 
While prior work required the classification model to deal with such masked inputs, we leverage recent progress in image inpainting (Dong et al., 2022) to reconstruct the input before passing it to the downstream model. This decoupling of image demasking from the segmentation task allows us to support arbitrary downstream models. Moreover, we can leverage state-of-the-art methods for image inpainting. We also propose different masking schemes tailored to the segmentation task that provide sufficiently dense input for the demasking model to understand the scene while still satisfying the guarantees with respect to the adversarial patch. We summarize our contributions as follows:
• We propose DEMASKED SMOOTHING, which is, to the best of our knowledge, the first certified-recovery and certified-detection defence against adversarial patch attacks on semantic segmentation models (Section 4).
• DEMASKED SMOOTHING can perform certified detection and recovery with any off-the-shelf segmentation model without requiring fine-tuning or any other adaptation.
• We implement DEMASKED SMOOTHING and evaluate it for different certification objectives and masking schemes (Section 5). We can certify 63% of all pixels in certified detection for a 1% patch and 46% in certified recovery for a 0.5% patch for the BEiT-B (Bao et al., 2022) segmentation model on the ADE20K (Zhou et al., 2017) dataset.
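To make the pixelwise aggregation idea concrete, the following is a simplified sketch (not the paper's exact procedure, which is given in Section 4) of majority-vote aggregation with a certification margin, in the spirit of derandomized smoothing (Levine & Feizi, 2020). We assume the masking scheme guarantees that a patch can influence at most `t` of the `k` masked-and-inpainted segmentations; `t`, `certified_majority`, and the array shapes are our own illustrative choices.

```python
import numpy as np

def certified_majority(segmentations, t):
    """Pixelwise majority vote over k segmentations with a robustness margin.

    segmentations: integer class labels of shape (k, H, W), one map per
      masked-and-inpainted image (hypothetical input format).
    t: maximum number of the k segmentations an adversarial patch can
      influence under the masking scheme (assumed given).

    Returns (prediction, certified): the per-pixel majority class, and a
    boolean map of pixels whose prediction provably cannot change when an
    adversary controls at most t of the k votes. Worst case, the adversary
    moves t votes from the winner to the runner-up, shifting the margin by
    2t, so we require margin > 2t.
    """
    k, H, W = segmentations.shape
    num_classes = int(segmentations.max()) + 1
    # Per-pixel vote counts for each class.
    counts = np.zeros((num_classes, H, W), dtype=int)
    for c in range(num_classes):
        counts[c] = (segmentations == c).sum(axis=0)
    order = np.argsort(-counts, axis=0)           # classes sorted by votes
    prediction = order[0]
    top_count = np.take_along_axis(counts, order[0][None], axis=0)[0]
    second_count = np.take_along_axis(counts, order[1][None], axis=0)[0]
    certified = (top_count - second_count) > 2 * t
    return prediction, certified
```

For example, with k = 5 and t = 1, a pixel on which four segmentations agree is certified (margin 3 > 2), while a 3-vs-2 split is not.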



Figure 1: (a) A simple patch attack on the Swin transformer (Liu et al., 2021) manages to switch the prediction for a large part of the image. (b) Masking the patch. (c) A sketch of DEMASKED SMOOTHING for certified image segmentation. First, we generate a set of masked versions of the image such that each possible patch can only affect a certain number of masked images. Then we use image inpainting to partially recover the information lost during masking and apply an arbitrary segmentation method. The output is obtained by aggregating the segmentations pixelwise. The masking strategy and aggregation method depend on the certification mode (detection or recovery).
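The requirement that every possible patch is fully hidden in at least one masked image (as needed for certified detection) can be illustrated with a one-dimensional sliding-band construction. This is our own simplified sketch, not the paper's masking scheme: `band_mask_starts` and its parameters are hypothetical, and we only consider horizontal patch placement with vertical hidden bands.

```python
def band_mask_starts(image_w, band_w, patch_w):
    """Choose start columns of hidden vertical bands so that every patch of
    width <= patch_w lies entirely inside at least one band.

    A band starting at s hides columns [s, s + band_w). If consecutive
    starts differ by at most band_w - patch_w + 1, then for any patch
    start x the largest band start s <= x satisfies
    x + patch_w - 1 <= s + band_w - 1, i.e. the band covers the patch.
    """
    stride = band_w - patch_w + 1
    assert stride >= 1, "hidden band must be at least as wide as the patch"
    starts = list(range(0, image_w - band_w + 1, stride))
    if starts[-1] != image_w - band_w:
        starts.append(image_w - band_w)  # ensure the right edge is covered
    return starts
```

The number of masked images then grows roughly like `image_w / (band_w - patch_w + 1)`, which makes the trade-off in the figure explicit: wider hidden bands mean fewer masked images to segment, but more content for the inpainting model to reconstruct.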

