RELEVANCE ATTACK ON DETECTORS

Abstract

This paper studies highly transferable adversarial attacks on detectors, which are hard to attack in a black-box manner because of their multiple outputs and the diversity across architectures. One plausible way to pursue high attack transferability is to find a property common across detectors, which facilitates the discovery of common weaknesses. We are the first to suggest that the relevance map is such a property for detectors. Based on it, we design a Relevance Attack on Detectors (RAD), which achieves state-of-the-art transferability, exceeding existing results by over 20%. On MS COCO, the detection mAPs of all 8 black-box architectures are more than halved, and the segmentation mAPs are also significantly affected. Given the great transferability of RAD, we generate the first adversarial dataset for object detection, i.e., Adversarial Objects in COntext (AOCO), which helps to quickly evaluate and improve the robustness of detectors.



Such attacks would greatly influence life-critical systems built on detection, such as autonomous driving and security surveillance. To the best of our knowledge, no existing attack is specifically designed for black-box transferability on detectors, because detectors have multiple outputs and a high diversity across architectures. In such situations, adversarial samples do not transfer well (Su et al. (2018)), and most attacks only decrease the mAP of black-box detectors by 5 to 10% (Xie et al. (2017); Li et al. (2018c;b)). To overcome this, one plausible way is to find properties common across detectors, which facilitates the discovery of common weaknesses; an attack designed on such properties can then threaten diverse victims. In this paper, we adopt the relevance map as such a common property, on which different detectors produce similar interpretable results, as shown in Fig. 1. Based on relevance maps, we design a Relevance Attack on Detectors (RAD). RAD focuses on suppressing the relevance map rather than directly attacking the predictions as in existing works (Xie et al. (2017); Li et al. (2018c;a)). In our comprehensive evaluation, RAD achieves state-of-the-art transferability on 8 black-box models on the MS COCO dataset (Lin et al. (2014)), nearly halving the detection mAP. Interestingly, the adversarial samples of RAD also greatly degrade instance segmentation, even though only detectors are attacked. Given the high transferability of RAD, we create Adversarial Objects in COntext (AOCO), the first adversarial dataset for object detection. AOCO contains 10K samples that significantly decrease the performance of black-box models on detection and segmentation. AOCO may serve as a benchmark to test the robustness of a DNN or to improve it by adversarial training.

Figure 1: Relevance maps for models with different architectures. The three models not only correctly predict the "stop sign", but also share similar relevance maps.
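The core idea of suppressing the surrogate's relevance map, instead of attacking the detection outputs directly, can be illustrated with a minimal sketch. The code below is our own didactic illustration, not RAD itself: it uses a toy linear "detector" whose relevance map is the elementwise gradient × input, and runs a sign-gradient loop under an L-infinity budget. All names and hyperparameters here are illustrative assumptions.

```python
import numpy as np

def relevance_map(w, x):
    """Gradient x input relevance for a toy linear score s = w @ x."""
    return w * x  # elementwise contribution of each input dimension

def rad_sketch(w, x, eps=0.1, step=0.02, iters=20):
    """PGD-style loop that suppresses the relevance map's energy.

    Minimizes sum(relevance**2) instead of the detection score itself,
    while keeping the perturbation within an L-infinity ball of radius eps.
    """
    x_adv = x.copy()
    for _ in range(iters):
        # d/dx sum((w*x)**2) = 2 * w**2 * x (analytic for the toy model)
        grad = 2.0 * w**2 * x_adv
        x_adv = x_adv - step * np.sign(grad)       # descend on relevance energy
        x_adv = x + np.clip(x_adv - x, -eps, eps)  # project back into the budget
    return x_adv
```

On a real detector, `relevance_map` would be replaced by a visualization method evaluated at suitably chosen output nodes, and the gradient would come from backpropagation rather than a closed form.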

CONTRIBUTIONS

• We propose a novel attack framework on relevance maps for detectors. We extend network visualization methods to detectors, identify the most suitable nodes to attack via relevance maps, and explore the best update techniques to increase transferability.
• We evaluate RAD comprehensively and find it achieves state-of-the-art transferability, exceeding existing results by over 20%. Detection mAPs are more than halved, invalidating state-of-the-art detectors to a large extent.
• With RAD, we create the first adversarial dataset for object detection, i.e., AOCO. As a potential benchmark, AOCO is generated from COCO and contains 10K highly transferable samples. AOCO helps to quickly evaluate and improve the robustness of detectors.
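One well-known family of update techniques for improving transferability is momentum-based optimization, in the spirit of MI-FGSM (Dong et al. (2018)). The sketch below is a generic illustration of that technique on an arbitrary differentiable loss, not the specific update rule chosen in this paper; `grad_fn`, `mu` and the budget values are our own assumptions.

```python
import numpy as np

def momentum_attack_sketch(grad_fn, x, eps=0.1, iters=10, mu=1.0):
    """Momentum iterative sign-gradient update (MI-FGSM style).

    Accumulates L1-normalized gradients into a momentum buffer g, then
    takes sign steps; momentum stabilizes update directions across
    iterations, which is known to help black-box transferability.
    """
    alpha = eps / iters           # per-step budget
    g = np.zeros_like(x)          # momentum buffer
    x_adv = x.copy()
    for _ in range(iters):
        grad = grad_fn(x_adv)
        g = mu * g + grad / (np.sum(np.abs(grad)) + 1e-12)
        x_adv = x_adv + alpha * np.sign(g)         # ascend on the attack loss
        x_adv = x + np.clip(x_adv - x, -eps, eps)  # stay in the L-inf ball
    return x_adv
```

For a toy linear score s = w @ x, passing `grad_fn = lambda z: -w` drives the score down, mimicking an attack that hides a detection.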



Adversarial attacks (Szegedy et al. (2014); Goodfellow et al. (2015); Carlini & Wagner (2017); Mądry et al. (2017); Baluja & Fischer (2017); Su et al. (2019)) have revealed the fragility of Deep Neural Networks (DNNs) by fooling them with elaborately crafted imperceptible perturbations. Among such attacks, the black-box attack, i.e., attacking without knowledge of the victim's inner structure and weights, is much harder, more aggressive, and closer to real-world scenarios. For classifiers, there exist some promising black-box attacks (Papernot et al. (2016); Brendel et al. (2018); Dong et al. (2018); Xie et al. (2019); Lin et al. (2020); Chen et al. (2020)). Attacking object detection (Zhang & Wang (2019)) in a black-box manner is also a severe threat, e.g., hiding certain objects from unknown detectors (Thys et al. (2019)).
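One of the classifier transfer attacks cited above, Xie et al. (2019), improves transferability by feeding a randomly transformed copy of the image to the surrogate at every iteration. The following is a rough sketch of that resize-and-pad transform, with our own simplifications (nearest-neighbor resizing, a single-channel image, and illustrative size bounds):

```python
import numpy as np

def input_diversity(x, out_size, rng):
    """Random resize-and-pad transform in the spirit of Xie et al. (2019).

    Each attack iteration sees a randomly resized and zero-padded copy of
    the image, which reduces overfitting to the surrogate architecture.
    """
    h, w = x.shape
    new = rng.integers(int(0.8 * out_size), out_size + 1)  # random scale
    # nearest-neighbor resize (didactic; real attacks use bilinear)
    rows = (np.arange(new) * h / new).astype(int)
    cols = (np.arange(new) * w / new).astype(int)
    resized = x[np.ix_(rows, cols)]
    # random zero-padding back to out_size x out_size
    top = rng.integers(0, out_size - new + 1)
    left = rng.integers(0, out_size - new + 1)
    out = np.zeros((out_size, out_size), dtype=x.dtype)
    out[top:top + new, left:left + new] = resized
    return out
```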

Because relevance maps are quite similar across models, those of black-box models are influenced and misled as well during the attack, leading to great transferability. Although some works have adopted the relevance map as an indicator of or reference for successful attacks (Dong et al. (2019); Zhang & Zhu (2019); Chen et al. (2020); Wu et al. (2020a)), to the best of our knowledge no existing work directly attacks the relevance maps of detectors.

Since the discovery of adversarial samples (Szegedy et al. (2014)), there have been many promising adversarial attacks (Goodfellow et al. (2015); Carlini & Wagner (2017); Mądry et al. (2017)). Generally, they fix the network weights and slightly change the input to optimize an attack loss; the network then predicts incorrectly on the adversarial samples with high confidence. Papernot et al. (2016; 2017) find that adversarial samples crafted by attacking a white-box surrogate model may transfer to other black-box models as well. Input modification (Xie et al. (2019); Dong et al. (2019); Lin et al. (2020)) and alternative optimization schemes (Dong et al. (2018); Lin et al. (2020)) have been validated as effective in enhancing transferability.

Xie et al. (2017) extend adversarial attacks to detectors, proposing to attack densely generated bounding boxes. After that, losses on localization and classification are designed for attacking detectors (Li et al. (2018c)). Lu et al. (2017) and Li et al. (2018a) propose to attack detectors in a restricted area. Existing works achieve good results in white-box scenarios, but are not specifically designed for transferability: the adversarial impact on black-box models is quite limited, i.e., a 5 to 10% decrease from the original mAP, even when two models differ only in the backbone (Xie et al. (2017); Li et al. (2018c;b)). Wang et al. (2020) discuss black-box attacks on detectors based on queries rather than on transferability as we do; the performance is satisfactory, but it requires over 30K queries, which is easy for the model owner to discover. Besides, physical attacks on white-box detectors are also feasible (Huang et al. (2020); Wu et al. (2020b); Xu et al. (2020)).

To pursue great transferability, we propose to attack relevance maps, which are calculated by network visualization methods (Zeiler & Fergus (2014); Selvaraju et al. (2017); Shrikumar et al. (2017)).
Network visualization methods were originally developed to interpret how DNNs predict and to help users gain trust in them. Specifically, they display, in a pixel-wise manner, how the input contributes to a certain node's output. Typical works include Layer-wise Relevance Propagation (LRP) (Bach et al. (2015)), Contrastive LRP (Gu et al. (2018)) and Softmax Gradient LRP (SGLRP) (Iwana et al. (2019)). These methods encourage the use of relevance maps as a reference in attacks (Dong et al. (2019); Zhang & Zhu (2019); Chen et al. (2020); Wu et al. (2020a)), and also inspire us. However, none of them attack the relevance maps of detectors. RAD also differs from Ghorbani et al. (2019) and Zhang et al. (2020) in its goal: RAD misleads detectors by suppressing relevance maps; in contrast, Ghorbani et al. (2019) mislead the relevance maps while keeping the prediction unchanged, and Zhang et al. (2020) mislead DNNs while keeping the relevance maps unchanged.
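As a concrete reference point for the LRP family mentioned above, the following is a compact sketch of the LRP-epsilon rule (Bach et al. (2015)) for a bias-free ReLU MLP. It is a didactic simplification (one epsilon stabilizer, no bias handling), not necessarily the variant a real detector attack would use; the network shape and function name are our own.

```python
import numpy as np

def lrp_epsilon(ws, x, eps=1e-6):
    """LRP-epsilon for a small bias-free ReLU MLP.

    ws: list of weight matrices; hidden layers use ReLU, the last layer is
    linear. Returns the pixel-wise relevance of the input for the winning
    output logit. Relevance is approximately conserved layer to layer.
    """
    # forward pass, storing activations
    acts = [x]
    a = x
    for i, w in enumerate(ws):
        a = a @ w
        if i < len(ws) - 1:
            a = np.maximum(a, 0.0)  # ReLU on hidden layers
        acts.append(a)
    # start from the winning logit's activation
    r = np.zeros_like(acts[-1])
    k = np.argmax(acts[-1])
    r[k] = acts[-1][k]
    # backward relevance propagation (epsilon rule)
    for w, a in zip(reversed(ws), reversed(acts[:-1])):
        z = a @ w
        z = z + eps * np.where(z >= 0, 1.0, -1.0)  # stabilized denominators
        s = r / z
        r = a * (s @ w.T)                          # redistribute downward
    return r
```

Because the epsilon term is tiny and there are no biases, the summed input relevance stays close to the top logit, which is the conservation property that makes relevance maps a meaningful attack target.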

