RELEVANCE ATTACK ON DETECTORS

Abstract

This paper focuses on highly transferable adversarial attacks on detectors, which are hard to attack in a black-box manner because of their multiple-output characteristic and the diversity across architectures. To pursue high attack transferability, one plausible way is to find a property common across detectors, which facilitates the discovery of common weaknesses. We are the first to suggest that the relevance map is such a property for detectors. Based on it, we design a Relevance Attack on Detectors (RAD), which achieves state-of-the-art transferability, exceeding existing results by above 20%. On MS COCO, the detection mAPs of all 8 black-box architectures are more than halved and the segmentation mAPs are also significantly affected. Given the great transferability of RAD, we generate the first adversarial dataset for object detection, i.e., Adversarial Objects in COntext (AOCO), which helps to quickly evaluate and improve the robustness of detectors.

1. INTRODUCTION

Adversarial attacks (Szegedy et al. (2014); Goodfellow et al. (2015); Carlini & Wagner (2017); Mądry et al. (2017); Baluja & Fischer (2017); Su et al. (2019)) have revealed the fragility of Deep Neural Networks (DNNs) by fooling them with elaborately crafted imperceptible perturbations. Among them, the black-box attack, i.e., attacking without knowledge of the victim's inner structure and weights, is much harder, more aggressive and closer to real-world scenarios. For classifiers, there exist some promising black-box attacks (Papernot et al. (2016); Brendel et al. (2018); Dong et al. (2018); Xie et al. (2019); Lin et al. (2020); Chen et al. (2020)). Attacking object detection (Zhang & Wang (2019)) in a black-box manner, e.g., hiding certain objects from unknown detectors (Thys et al. (2019)), is also a severe threat. Thereby, life-critical systems based on detection, such as autonomous driving and security surveillance, would be greatly influenced. To the best of our knowledge, no existing attack is specifically designed for black-box transferability across detectors, because detectors have multiple outputs and a high diversity across architectures. In such situations, adversarial samples do not transfer well (Su et al. (2018)), and most attacks only decrease the mAP of black-box detectors by 5 to 10% (Xie et al. (2017); Li et al. (2018c; b)). To overcome this, we propose one plausible way: finding properties common across detectors, which facilitates the discovery of common weaknesses. Based on them, the designed attack can threaten various victims. In this paper, we adopt the relevance map as such a common property, on which different detectors produce similar interpretable results, as shown in Fig. 1. Based on relevance maps, we design a Relevance Attack on Detectors (RAD). RAD focuses on suppressing the relevance map rather than directly attacking the prediction as in existing works (Xie et al. (2017); Li et al. (2018c; a)).
Because the relevance maps are quite similar across models, those of black-box models are also influenced and misled during the attack, leading to the great transferability. Although some works have adopted the relevance map as an indicator or reference of successful attacks (Dong et al. (2019); Zhang & Zhu (2019); Chen et al. (2020); Wu et al. (2020a)), to the best of our knowledge there is no work that directly attacks the relevance maps of detectors. In our comprehensive evaluation, RAD achieves state-of-the-art transferability on 8 black-box models on the COCO dataset (Lin et al. (2014)), nearly halving the detection mAP. Interestingly, the adversarial samples of RAD also greatly influence the performance of instance segmentation, even though only detectors are attacked. Given the high transferability of RAD, we create Adversarial Objects in COntext (AOCO), the first adversarial dataset for object detection. AOCO contains 10K samples that significantly decrease the performance of black-box models for detection and segmentation. AOCO may serve as a benchmark to test the robustness of a DNN or to improve it by adversarial training.

Figure 1: Relevance maps for models with different architectures. The three models not only predict the "stop sign" correctly, but also share similar relevance maps.

CONTRIBUTIONS

• We propose a novel attack framework on relevance maps for detectors. We extend network visualization methods to detectors, find the most suitable nodes to attack via relevance maps, and explore the best update techniques to increase transferability.

• We evaluate RAD comprehensively and find its state-of-the-art transferability, which exceeds existing results by above 20%. Detection mAPs are more than halved, invalidating state-of-the-art detectors to a large extent.

• With RAD, we create the first adversarial dataset for object detection, i.e., AOCO. As a potential benchmark, AOCO is generated from COCO and contains 10K highly transferable samples. AOCO helps to quickly evaluate and improve the robustness of detectors.

2. RELATED WORK

Since (Szegedy et al. (2014)), there have been many promising adversarial attacks (Goodfellow et al. (2015); Carlini & Wagner (2017); Mądry et al. (2017)). Generally, they fix the network weights and change the input slightly to optimize the attack loss. The network then predicts incorrectly on adversarial samples with high confidence. (Papernot et al. (2016; 2017)) find that adversarial samples crafted by attacking a white-box surrogate model may transfer to other black-box models as well. Input modification (Xie et al. (2019); Dong et al. (2019); Lin et al. (2020)) and other optimization methods (Dong et al. (2018); Lin et al. (2020)) are validated to be effective in enhancing the transferability. (Xie et al. (2017)) extends adversarial attacks to detectors, proposing to attack densely generated bounding boxes. After that, losses on localization and classification are designed for attacking detectors (Li et al. (2018c)). (Lu et al. (2017)) and (Li et al. (2018a)) propose to attack detectors in a restricted area. Existing works achieve good results in white-box scenarios, but are not specifically designed for transferability. The adversarial impact on black-box models is quite limited, i.e., a 5 to 10% decrease from the original mAP, even when two models only differ in backbone (Xie et al. (2017); Li et al. (2018c; b)). (Wang et al. (2020)) discusses black-box attacks on detectors based on queries rather than on transferability as we do. Its performance is satisfactory, but it requires over 30K queries, which is easily discovered by the model owner. Besides, physical attacks on white-box detectors are also feasible (Huang et al. (2020)). RAD differs from (Ghorbani et al. (2019); Zhang et al. (2020)) in the goal. RAD misleads detectors by suppressing relevance maps. In contrast, (Ghorbani et al. (2019)) misleads the relevance maps while keeping the prediction unchanged. (Zhang et al. (2020)) also misleads DNNs, but it keeps the relevance maps unchanged.

3. RELEVANCE ATTACK ON DETECTORS

We propose an attack specifically designed for black-box transferability, named Relevance Attack on Detectors (RAD). RAD suppresses the multi-node relevance map for several bounding boxes. Since the relevance map is commonly shared by different detectors, as shown in Fig. 1, attacking it in the white-box surrogate model achieves high transferability towards black-box models. In this section, we first provide a high-level overview of RAD and analyze the potential reasons for its transferability. Then we thoroughly discuss three crucial concrete issues in RAD:

• In Section 3.3, we specify the calculation of relevance maps for detectors, where current visualization methods are not applicable.
• In Section 3.4, we introduce the proper nodes for RAD to attack.
• In Section 3.5, we explore the suitable techniques to update samples in RAD.

3.1. WHAT IS RAD?

We present the framework of RAD in Fig. 2. Initialized with the original sample x_0, the adversarial sample x_k in the k-th iteration is forward-propagated through the surrogate model, yielding the prediction f(x_k). Existing attacks generally suppress the prediction values of all attacked output nodes in T. In contrast, RAD suppresses the corresponding relevance map h(x_k, T). To do so, the gradients of h(x_k, T) are back-propagated to x_k, which is then modified to x_{k+1}.

Figure 2: Framework of RAD. x_k is the sample in iteration k and f(x_k) is the network prediction for it. h(x_k, T) stands for the relevance map of all attacked nodes in T. RAD works by repeating the processes denoted by the "black", "red" and "blue" arrows in turn.

It is notable that RAD is a complete framework for attacking detectors, and each of its components requires a special design. Besides the calculation of relevance maps of detectors, the other components of RAD, e.g., the attacked nodes or the update techniques, also need a customized analysis. The reason is that no existing work directly attacks the relevance of detectors, and the experience in attacking predictions is not fully applicable here. For example, (Zhang & Wang (2019)) weights the classification loss and localization loss equally, but the former is validated to be significantly better for attacking the relevance in Section 3.4.
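The iterative loop described above can be sketched as follows. This is a minimal, self-contained sketch, not the authors' implementation: `relevance_grad` is a hypothetical stand-in for the gradient of the relevance loss with respect to the input, and the normalized (non-sign) update and ℓ∞ projection follow the update rule discussed in Section 3.5.

```python
import numpy as np

def rad_attack(x0, relevance_grad, alpha=2.0, eps=16.0, iters=10):
    """Minimal sketch of the RAD loop: repeatedly step against the
    gradient of the relevance-map loss and project back into the
    eps-ball. `relevance_grad(x)` is a placeholder for d h(x, T)/dx."""
    x = x0.astype(np.float64).copy()
    n = x.size  # normalization factor N (number of tensor components)
    for _ in range(iters):
        g = relevance_grad(x)
        # normalized (non-sign) update, dividing the l1-norm by N
        x = x - alpha * g / (np.abs(g).sum() / n)
        # project back into the l_inf ball of radius eps around x0
        x = np.clip(x, x0 - eps, x0 + eps)
        x = np.clip(x, 0.0, 255.0)
    return x
```

With a constant-gradient stand-in, ten steps of length 2 saturate the ε = 16 budget, matching the paper's attack setting.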

3.2. WHY RAD TRANSFERS?

RAD's transferability comes from its attack goal: changing a common property, i.e., the relevance map. As shown in Fig. 3, the relevance maps are clear and structured for the original sample in both detectors. After RAD, the relevance maps are induced to be meaningless without a correct focus, leading to wrong predictions, i.e., no or false detections. Because relevance maps transfer well across models, those of black-box detectors are also significantly influenced, causing a great performance drop, which is illustrated visually in Section 4.2. RAD also attacks quite "precisely", i.e., the perturbation pattern is focused on distinct areas and has a clear structure, as shown in Fig. 4. That is to say, RAD accurately locates the most discriminating parts of a sample and concentrates the perturbation on them, leading to great transferability when the perturbations are equally bounded. Having analyzed the potential of RAD above, below we make it feasible by addressing three crucial issues.

3.3. HOW TO CALCULATE RELEVANCE MAPS?

To conduct the relevance attack, we first need to know the relevance maps for detectors. There are many methods to calculate relevance maps for classifiers, as described in Section 2, but none of them is directly applicable to detectors. We take SGLRP (Iwana et al. (2019)) as an example to explain this and then modify it, because it excels in discriminating the target node against irrelevant regions. SGLRP visualizes how the input contributes to one output node in a pixel-wise way by back-propagating the relevance from the output to the input based on Deep Taylor Decomposition (DTD), as illustrated in Appendix A. $R^{(L)}$ is the initial relevance in the output layer $L$ and its $n$-th component is calculated as

$$R^{(L)}_n = \begin{cases} y_n (1 - y_n), & n = t, \\ -y_n \, y_t, & n \neq t, \end{cases} \tag{1}$$

where $y_n$ is the predicted probability of class $n$, and $y_t$ is that of the single-node target $t$.
The pixel-wise relevance map h(x, t) for the single-node target t is calculated by back-propagating the relevance R from the final layer to the input, following the rules specified in (Iwana et al. (2019)). In detectors, we need the pixel-wise contributions from the input to m bounding boxes. This multi-node relevance map cannot be directly calculated by (1), so we naturally modify SGLRP as

$$R^{(L)}_n = \begin{cases} y_n (1 - y_n), & n \in T, \\ -\dfrac{1}{m} \, y_n \sum_{i=1}^{m} y_{t_i}, & n \notin T, \end{cases} \tag{2}$$

where $y_{t_i}$ is the predicted probability of one target node $t_i$, and $T$ is the set containing all target nodes $\{t_1, t_2, \ldots, t_m\}$. With the iNNvestigate library (Alber et al. (2019)) to implement Multi-Node SGLRP and deep learning platforms supporting auto-differentiation, the gradients from the RAD loss $\mathcal{L}_{RAD}(x) = h(x, T)$ to the sample $x$ can be obtained according to the calculation rules of relevance maps in Appendix A. We illustrate the difference between SGLRP and our Multi-Node SGLRP in Fig. 5. SGLRP only displays the relevance map for one bounding box, e.g., "TV", "chair" or "bottle". Multi-Node SGLRP, in contrast, visualizes the overall relevance.

Figure 5: Difference between relevance maps from SGLRP and Multi-Node SGLRP. The relevance maps are for YOLOv3 (Redmon & Farhadi (2018)).
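The modified initial relevance above can be sketched in a few lines. This is an illustrative sketch under our own naming (`initial_relevance_multinode` and its arguments are not from the paper): target nodes in T get $y_n(1-y_n)$, and all other nodes get the averaged negative term.

```python
import numpy as np

def initial_relevance_multinode(y, target_idx):
    """Sketch of the Multi-Node SGLRP initial relevance R^(L).
    y: predicted class probabilities; target_idx: attacked node set T."""
    y = np.asarray(y, dtype=np.float64)
    m = len(target_idx)
    mask = np.zeros_like(y, dtype=bool)
    mask[list(target_idx)] = True
    r = np.empty_like(y)
    r[mask] = y[mask] * (1.0 - y[mask])                 # n in T
    # n not in T: -(1/m) * y_n * sum of target probabilities
    r[~mask] = -(1.0 / m) * y[~mask] * y[list(target_idx)].sum()
    return r
```

With a single target the formula degenerates to the original SGLRP initialization, which is a quick sanity check for the modification.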

3.4. WHERE TO ATTACK?

Besides the calculation of relevance maps, it is also important to choose a proper node set T to attack. Specifically, we need to select certain bounding boxes and the corresponding output nodes for RAD. Heuristically, the most "obvious" bounding boxes should be eliminated, so we select the bounding boxes with the highest confidence, following (Xie et al. (2017)). Concretely, it is feasible to statically choose m bounding boxes to attack in each iteration, or to dynamically attack all bounding boxes whose confidence exceeds a threshold. In our evaluation, the two strategies differ little in performance and are not sensitive to hyper-parameters, as demonstrated in Appendix C. This shows that RAD does not require sophisticated parameter tuning, which is user-friendly. In our following experiments, we statically attack m = 20 nodes. After selecting bounding boxes, we could attack their size, leading them to shrink; their localization, leading them to shift; or their confidence, leading them to be misclassified. To find the best strategy, we conduct a toy experiment by attacking YOLOv3 (Redmon & Farhadi (2018)), denoted as M2 (the other models are specified in Appendix B). Given the results in Table 1, the classification loss induces better black-box transferability. This may be because detectors generally include a pre-trained classifier as the feature extractor, and relevance maps are believed to be an indicator of successful attacks (Dong et al. (2019); Zhang & Zhu (2019)).
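The static selection strategy above amounts to picking the top-m confidence scores. A minimal sketch (the function name and interface are ours, not the paper's):

```python
import numpy as np

def select_attack_nodes(confidences, m=20):
    """Static strategy sketch: indices of the m bounding boxes with
    the highest confidence, forming the attacked node set T."""
    conf = np.asarray(confidences)
    m = min(m, conf.size)
    # argpartition finds the top-m indices without a full sort
    top = np.argpartition(-conf, m - 1)[:m]
    return top[np.argsort(-conf[top])]  # sorted by descending confidence
```

The dynamic variant would instead be a single threshold comparison, e.g. `np.flatnonzero(conf > tau)`.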

3.5. HOW TO UPDATE?

With the relevance map h(x, T) for the attacked nodes T, we are able to attack, i.e., update the original sample to become adversarial under the guidance of the attack gradient

$$g(x) = \frac{\partial \mathcal{L}_{RAD}(x)}{\partial x} = \frac{\partial h(x, T)}{\partial x}. \tag{3}$$

Some update techniques are validated to be effective in enhancing the transferability in classification. For example, the Scale-Invariant (SI) attack (Lin et al. (2020)) averages the attack gradients over scaled copies of the sample as

$$g_{si}(x) = \frac{1}{k} \sum_{i=0}^{k-1} g(x / 2^i). \tag{4}$$

Besides SI, Diverse Input (DI) (Xie et al. (2019)) and Translation-Invariant (TI) (Dong et al. (2019)) attacks are also promising in classification. We are curious whether they also work well in object detection. To explore this, we adopt these techniques in RAD with the settings suggested by their designers (see Appendix E). From the results in Table 2, we discover that SI is quite effective, further decreasing the mAP from the baseline to a large extent. Accordingly, we adopt (4) to update the sample in RAD. With the calculated gradient, we update the sample as

$$x_{k+1} = \mathrm{clip}_{\varepsilon}\left( x_k - \alpha \, \frac{g_{si}(x_k)}{\| g_{si}(x_k) \|_1 / N} \right), \tag{5}$$

where α stands for the step length. Division by N is necessary because the ℓ1-norm sums over all components of the gradient tensor, which would otherwise be too large a normalization factor. We do not adopt the mainstream sign method because it is not suitable for generating small perturbations, as shown in other attacks on detectors (Xie et al. (2017)).
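The SI averaging described above can be sketched as follows, assuming a `grad_fn` callable that stands in for the relevance-loss gradient and k power-of-2 scaled copies (names and defaults are ours):

```python
import numpy as np

def si_gradient(grad_fn, x, k=4):
    """Scale-Invariant gradient averaging (Lin et al., 2020) as used
    in RAD: average gradients over k power-of-2 scaled copies."""
    g = np.zeros_like(x, dtype=np.float64)
    for i in range(k):
        g += grad_fn(x / (2.0 ** i))
    return g / k
```

For a linear gradient function g(x) = x and k = 4, the average is (1 + 1/2 + 1/4 + 1/8)/4 = 0.46875 times x, a handy check that the scaling loop is wired correctly.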

4. EXPERIMENTS

In this section, we evaluate the performance of RAD, especially its transferability. The results are presented numerically and visually. In our comprehensive evaluation, RAD achieves great transferability across models and even across tasks.

4.1. SETUP

Our experiments are based on Keras (Chollet et al. (2015)), TensorFlow (Abadi et al. (2015)) and PyTorch (Paszke et al. (2019)), running on 4 NVIDIA GeForce RTX 2080Ti GPUs. The iNNvestigate library (Alber et al. (2019)) is used to implement Multi-Node SGLRP. We conduct experiments on the MS COCO 2017 dataset (Lin et al. (2014)), a large-scale benchmark for object detection, instance segmentation and image captioning. For a fair evaluation, we generate adversarial samples from all 5K samples in its validation set and test several black-box models on their mAP, a standard criterion in many works (He et al. (2017); Chen et al. (2019a)). All attacks are conducted with the step length α = 2 for 10 iterations, and the perturbation is ℓ∞-bounded by ε = 16 to guarantee imperceptibility, as in (Dong et al. (2019)). To validate that the mAP drop comes from the attack instead of resizing or perturbation, we add large Gaussian noise (σ = 9) to the resized images and report the result as "Ablation". We choose 8 typical detectors, ranging from the first end-to-end detector to recent ones, for attack and test. The variety of models guarantees the validity of the results. We specify their information in Appendix B and the corresponding pre-processing in Appendix E.

4.2. VISUAL RESULTS OF RAD

We visualize several predictions on the same adversarial sample by black-box models in Fig. 6 to intuitively illustrate the transferability of RAD. The objects in the image, e.g., the laptop and keyboard, are quite large and obvious to detect. However, with a small perturbation from RAD, all 5 black-box models fail to detect the laptop, keyboard and mouse. Surprisingly, 4 of them even detect a non-existent "bed", which is neither relevant nor similar to anything in the image. The attack process of RAD is analyzed in Appendix F. We present the results in Table 3. Among the classification attacks and the detection ones, the cross-domain attack (Naseer et al. (2019)) is effective, but RAD is more aggressive. RAD enjoys state-of-the-art transferability towards most black-box models, outperforming other methods by above 20%. The detection mAPs are more than halved, making state-of-the-art detectors perform worse than the early single-shot detector (SSD, Liu et al. (2016), M1). The influence of ε on the detection mAP in RAD is displayed in Fig. 7. As the ℓ∞ bound increases, the resulting mAP greatly decreases for all black-box models, especially for ε from 8 to 12. In Table 4, we find that RAD also greatly hurts the performance of instance segmentation, leading to a drop in mAP of over 70%. This suggests that instance segmentation can be attacked indirectly through detectors.

5. ADVERSARIAL OBJECTS IN CONTEXT

Given the great transferability of RAD, we create Adversarial Objects in COntext (AOCO), the first adversarial dataset for object detection. The AOCO dataset serves as a potential benchmark to evaluate the robustness of detectors, which will be beneficial to network designers. It will also be useful for adversarial training, the most effective practice to improve the robustness of DNNs (Zhang et al. (2019); Tramèr et al. (2018)). Notice that there is no other adversarial dataset for detection at all. This is not because such a dataset would be useless, but due to the low transferability of existing attack methods, such that the examples are detector-dependent. Now that we have achieved high transferability, we can make such an adversarial dataset publicly available. AOCO is generated from the full COCO 2017 validation set (Lin et al. (2014)) with 5K samples. It contains 5K adversarial samples for evaluating object detection (AOCO detection) and 5K for instance segmentation (AOCO segmentation). All 10K samples in AOCO are crafted by RAD. The surrogate model we attack is YOLOv3 for AOCO detection and Mask R-CNN for AOCO segmentation, given the results in Table 3 and Table 4. We measure the perturbation ∆x in AOCO by the Root Mean Squared Error (RMSE) as in (Xie et al. (2017); Liu et al. (2017)), calculated pixel-wise as $\sqrt{\sum_i (\Delta x_i)^2 / N}$, where N is the size of the image. The performance on AOCO is reported in Table 5. The RMSE of AOCO is lower than that in (Wu et al. (2019)), and the perturbation is quite imperceptible. Details and samples of AOCO are presented in Appendix G.
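The RMSE measure above is straightforward to compute; a minimal sketch (function name is ours):

```python
import numpy as np

def perturbation_rmse(x_adv, x_orig):
    """Pixel-wise RMSE of the perturbation, as used to measure AOCO:
    sqrt(sum_i (dx_i)^2 / N), with N the size of the image."""
    d = x_adv.astype(np.float64) - x_orig.astype(np.float64)
    return float(np.sqrt((d ** 2).sum() / d.size))
```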

6. CONCLUSION

To pursue a high transferability, this paper proposes the Relevance Attack on Detectors (RAD), which works by suppressing the multi-node relevance, a property common across detectors and calculated by our Multi-Node SGLRP. We also thoroughly discuss where to attack and how to update in attacking relevance maps. RAD achieves state-of-the-art transferability towards 8 diverse black-box models, exceeding existing results by above 20%, and also significantly hurts instance segmentation. Given the great transferability of RAD, we generate the first adversarial dataset for object detection, i.e., Adversarial Objects in COntext (AOCO), which helps to quickly evaluate and improve the robustness of detectors. Attacking other common properties is also promising for good transferability.

A RELEVANCE BACK-PROPAGATION RULES (DTD)

DTD-based network visualization methods, such as LRP, CLRP and SGLRP, back-propagate the relevance from the output layer to the input layer according to the rules specified in this section. Their only difference is the relevance in the initial output layer, $R^{(L)}_n$. For each layer $l$ in a DNN with $L$ layers in total, suppose layer $l$ has $N$ nodes and layer $l+1$ has $M$ nodes. The relevance $R^{(l)}_n$ at node $n$ in layer $l$ is defined recursively by

$$R^{(l)}_n = \sum_m \frac{a^{(l)}_n w^{+(l)}_{n,m}}{\sum_{n'} a^{(l)}_{n'} w^{+(l)}_{n',m}} R^{(l+1)}_m$$

for nodes with definitely positive values (such as after ReLU), and

$$R^{(l)}_n = \sum_m \frac{z^{(l)}_n w^{(l)}_{n,m} - b^{(l)}_n w^{+(l)}_{n,m} - h^{(l)}_n w^{-(l)}_{n,m}}{\sum_{n'} \left( z^{(l)}_{n'} w^{(l)}_{n',m} - b^{(l)}_{n'} w^{+(l)}_{n',m} - h^{(l)}_{n'} w^{-(l)}_{n',m} \right)} R^{(l+1)}_m$$

for nodes that may have negative values. In the formulas above, $a^{(l)}_n$ is the post-activation output of node $n$ in layer $l$ and $z^{(l)}_n$ is the pre-activation one. According to the propagation rules above, as mentioned in (Iwana et al. (2019)), we can naturally obtain the attack gradients as

$$\frac{\partial R^{(l+1)}_m}{\partial R^{(l)}_n} = \begin{cases} \left( \sum_m \dfrac{a^{(l)}_n w^{+(l)}_{n,m}}{\sum_{n'} a^{(l)}_{n'} w^{+(l)}_{n',m}} \right)^{-1}, & \text{for nodes with definitely positive values,} \\ \left( \sum_m \dfrac{z^{(l)}_n w^{(l)}_{n,m} - b^{(l)}_n w^{+(l)}_{n,m} - h^{(l)}_n w^{-(l)}_{n,m}}{\sum_{n'} \left( z^{(l)}_{n'} w^{(l)}_{n',m} - b^{(l)}_{n'} w^{+(l)}_{n',m} - h^{(l)}_{n'} w^{-(l)}_{n',m} \right)} \right)^{-1}, & \text{otherwise.} \end{cases}$$
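The first rule above (for non-negative activations, often called the z+ rule in the LRP literature) can be sketched for a single dense layer as follows. This is an illustrative sketch with our own names, not the iNNvestigate implementation; the small `eps` guards against division by zero.

```python
import numpy as np

def lrp_zplus_backward(a, w, r_next, eps=1e-12):
    """One-layer relevance back-propagation with the z+ rule, valid
    for layers with non-negative activations (e.g. after ReLU).
    a: (N,) activations of layer l; w: (N, M) weights; r_next: (M,)
    relevance R^(l+1)."""
    w_plus = np.maximum(w, 0.0)
    z = a @ w_plus            # denominators: sum_n' a_n' w+_{n',m}
    s = r_next / (z + eps)    # relevance distributed per output node
    return a * (w_plus @ s)   # R^(l)_n = a_n * sum_m w+_{n,m} * s_m
```

A useful property to check is conservation: the total relevance in layer l equals the total relevance passed down from layer l+1.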

B MODEL INFORMATION

Table 6 presents the information of the models in our evaluation, including those from MMDetection (Chen et al. (2019b)).

D RAD ON MORE SURROGATES

The results of attacking more surrogates with RAD are reported in Table 8.

E ATTACK SETTINGS

DI (Xie et al. (2019)) transforms the image 4 times with probability p (p = 1 for better transferability, as suggested) and averages the gradients. The transformation resizes the image to 0.9× its size and randomly pads the outer area with white pixels. SI (Lin et al. (2020)) divides the sample numerically by powers of 2 for 4 times and averages the 4 obtained gradients. TI (Dong et al. (2019)) translates the image to calculate augmented gradients; to implement it efficiently, it adopts a kernel to simulate the averaging of gradients, and we choose kernel size 15 as suggested. MI (Dong et al. (2018)) uses momentum optimization (parameter µ = 1 as suggested) for better transferability and a faster attack. The cross-domain attack (Naseer et al. (2019)) uses extra datasets (paintings, denoted as CD-paintings, and comics, denoted as CD-comics) to train a perturbation generator with the relative loss. The adopted surrogate model is InceptionV3 for consistency. All perturbations are resized to fit the sample size.
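The DI transformation described above (0.9× resize plus random white padding) can be sketched as follows. This is our own dependency-free sketch: nearest-neighbor resizing stands in for a proper image resize, and all names and defaults are ours.

```python
import numpy as np

def diverse_input(x, p=1.0, scale=0.9, pad_value=255, rng=None):
    """Sketch of the DI transform: with probability p, shrink the
    image to `scale` of its size and randomly pad the border with
    white pixels back to the original shape."""
    rng = rng or np.random.default_rng()
    if rng.random() > p:
        return x
    h, w = x.shape[:2]
    nh, nw = int(h * scale), int(w * scale)
    # nearest-neighbor resize without external libraries
    rows = np.arange(nh) * h // nh
    cols = np.arange(nw) * w // nw
    small = x[rows][:, cols]
    out = np.full_like(x, pad_value)
    top = rng.integers(0, h - nh + 1)
    left = rng.integers(0, w - nw + 1)
    out[top:top + nh, left:left + nw] = small
    return out
```

In practice the transformed copies are fed to the gradient computation and the resulting gradients are averaged, as described above.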

E.3 DETECTION ATTACKS

For DAG (Xie et al. (2017)), we follow its setting of generating dense proposals: the classification probabilities of the 3000 bounding boxes with the highest confidence are attacked. However, we alter its optimization to (5) because its original update produces quite small perturbations, leading to poor transferability, which would be unfair for comparison. Dfool (Lu et al. (2017)) suppresses the classification confidence of the original bounding boxes, and we do the same in our experiment. The localization loss is shown to be useful in (Zhang & Wang (2019)), and here we suppress the width and height of the original bounding boxes.

F VISUAL RESULTS OF RAD PROCESS

With RAD, the relevance map is attacked to become meaningless and lose its focus. In Fig. 8, the initial prediction is correct and the relevance map is clear. RAD constantly misleads the relevance map to become unstructured, without the outline of objects. Finally, all bounding boxes vanish.

G MORE ABOUT AOCO

We report the mAP50 and mAP75 of AOCO in Table 9 and Table 10. We show the visual comparative results in Fig. 9. For COCO, both networks predict correctly. For the AOCO segmentation results, the top image contains two big masks for "chair" and "potted plant"; the second image contains one false mask for "sports ball"; the bottom image contains "dog" in green, "car" in purple and "elephant" in red.



Figure 3: RAD's transferability originates from the change of relevance maps. The image contains a person and a skateboard. By attacking the relevance maps, both surrogate models make extremely confusing predictions.

Figure 4: The original image and the adversarial perturbations (×5 in magnitude for demonstration) generated by Dfool (Lu et al. (2017)), DAG (Xie et al. (2017)), and RAD (from left to right)

Figure 6: RAD has a great transferability. The same adversarial sample generated by attacking Mask R-CNN fools all 5 black-box detectors.

Figure 7: The influence of ε on detection mAP in RAD

Here, $z^{(l)}_n$ is the pre-activation output of node $n$. The range $[b^{(l)}_n, h^{(l)}_n]$ stands for the minimum and maximum of $z^{(l)}_n$. Finally, $w^{+(l)}_{n,m} = \max\left(w^{(l)}_{n,m}, 0\right)$ and $w^{-(l)}_{n,m} = \min\left(w^{(l)}_{n,m}, 0\right)$.

Figure 8: Transition of prediction and relevance map in RAD (from top to bottom and left to right).

Figure 9: Detection and segmentation results in COCO and AOCO by YOLOv3 and Mask R-CNN.

Table 1: Detection mAP in RAD with different attacked nodes

Table 2: Detection mAP in RAD with different update techniques





Table 5: Detection mAP and segmentation mAP on COCO and AOCO

The performance of RAD is not sensitive to hyper-parameters, no matter whether the strategy to select bounding boxes is dynamic or static, as shown in Table 7, so attackers need not tune them carefully. The parameter of the dynamic strategy refers to the pre-softmax confidence threshold to select a bounding box; the parameter of the static strategy refers to the fixed number of selected bounding boxes in each iteration.

Table 7: Detection mAP with different hyper-parameters in RAD

To pre-process, we resize the image with its long side as 416 for YOLOv3 or RetinaNet and 448 for Mask R-CNN, and then zero-pad it to a square. The resolution is kept roughly the same for a fair evaluation. Images are normalized to [0, 1] for YOLOv3 or subtracted by the mean of the COCO training set for RetinaNet and Mask R-CNN. Accordingly, samples in AOCO detection have a long side of 416 and those in AOCO segmentation have a long side of 448.
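The pre-processing described above (resize by the long side, zero-pad to a square, normalize to [0, 1] YOLOv3-style) can be sketched as follows. Nearest-neighbor resizing keeps the sketch dependency-free; the function name and defaults are ours.

```python
import numpy as np

def preprocess(img, long_side=416):
    """Sketch of the pre-processing: resize so the long side equals
    `long_side` (416 for YOLOv3/RetinaNet, 448 for Mask R-CNN),
    zero-pad to a square, and normalize to [0, 1]."""
    h, w = img.shape[:2]
    scale = long_side / max(h, w)
    nh, nw = int(round(h * scale)), int(round(w * scale))
    # nearest-neighbor resize without external libraries
    rows = np.arange(nh) * h // nh
    cols = np.arange(nw) * w // nw
    resized = img[rows][:, cols]
    canvas = np.zeros((long_side, long_side) + img.shape[2:], dtype=np.float64)
    canvas[:nh, :nw] = resized  # zero-pad the remaining area
    return canvas / 255.0
```

For the mean-subtraction variant used with RetinaNet and Mask R-CNN, the final line would subtract the per-channel COCO training-set mean instead of dividing by 255.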



