RELEVANCE ATTACK ON DETECTORS

Abstract

This paper studies highly transferable adversarial attacks on detectors, which are hard to attack in a black-box manner because of their multiple outputs and the diversity across architectures. To pursue high attack transferability, one plausible way is to find a property shared across detectors, which facilitates the discovery of common weaknesses. We are the first to suggest that the relevance map is such a property for detectors. Based on it, we design a Relevance Attack on Detectors (RAD), which achieves state-of-the-art transferability, exceeding existing results by over 20%. On MS COCO, the detection mAPs of all 8 black-box architectures are more than halved, and the segmentation mAPs are also significantly degraded. Given the great transferability of RAD, we generate the first adversarial dataset for object detection, i.e., Adversarial Objects in COntext (AOCO), which helps to quickly evaluate and improve the robustness of detectors.



Introduction

Adversarial attacks (Szegedy et al. (2014); Goodfellow et al. (2015); Carlini & Wagner (2017); Mądry et al. (2017); Baluja & Fischer (2017); Su et al. (2019)) have revealed the fragility of Deep Neural Networks (DNNs) by fooling them with elaborately crafted imperceptible perturbations. Among them, the black-box attack, i.e., attacking without knowledge of the model's inner structure and weights, is much harder, more aggressive, and closer to real-world scenarios. For classifiers, there exist some promising black-box attacks (Papernot et al. (2016); Brendel et al. (2018); Dong et al. (2018); Xie et al. (2019); Lin et al. (2020); Chen et al. (2020)). Attacking object detection (Zhang & Wang (2019)) in a black-box manner is also a severe threat, e.g., hiding certain objects from unknown detectors (Thys et al. (2019)). Thereby, safety-critical systems based on detection, such as autonomous driving and security surveillance, would be greatly affected.

To the best of our knowledge, no existing attack is specifically designed for black-box transferability across detectors, which have multiple outputs and a high diversity across architectures. In such situations, adversarial samples do not transfer well (Su et al. (2018)), and most attacks only decrease the mAP of black-box detectors by 5% to 10% (Xie et al. (2017); Li et al. (2018c;b)). To overcome this, one plausible way is to find common properties across detectors, which facilitates the discovery of common weaknesses; an attack built on such properties can threaten various victims. In this paper, we adopt the relevance map as such a common property, on which different detectors produce similar interpretable results, as shown in Fig. 1. Based on relevance maps, we design a Relevance Attack on Detectors (RAD). RAD focuses on suppressing the relevance map rather than directly attacking the prediction as in existing works (Xie et al. (2017); Li et al. (2018c;a)). Because relevance maps are quite similar across models, those of black-box models are influenced and misled as well during the attack, leading to great transferability. Although some works have adopted the relevance map as an indicator or reference for successful attacks (Dong et al. (2019); Zhang & Zhu (2019); Chen et al. (2020); Wu et al. (2020a)), to the best of our knowledge no work directly attacks the relevance maps of detectors.

In our comprehensive evaluation, RAD achieves state-of-the-art transferability on 8 black-box models on the MS COCO dataset (Lin et al. (2014)), nearly halving the detection mAP. Interestingly, the adversarial samples of RAD also greatly degrade the performance of instance segmentation, even though only detectors are attacked. Given the high transferability of RAD, we create Adversarial Objects in COntext (AOCO), the first adversarial dataset for object detection. AOCO contains 10K samples that significantly decrease the performance of black-box models for both detection and segmentation. AOCO may serve as a benchmark to test the robustness of a DNN or to improve it via adversarial training.
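The core idea of suppressing a relevance map within an imperceptibility budget can be illustrated with a minimal toy sketch. The snippet below is NOT the paper's actual method: it stands in a linear "detector" for a real one, uses the elementwise contribution w * x as an LRP-style relevance map, and runs a PGD-style loop (signed gradient steps projected into an L-infinity ball) to drive positive relevance toward zero. The names `relevance_map` and `rad_sketch` are hypothetical.

```python
import numpy as np

def relevance_map(x, w):
    """LRP-style relevance of a toy linear 'detector': elementwise contribution w * x."""
    return w * x

def rad_sketch(x, w, eps=0.1, step=0.02, iters=50):
    """PGD-style loop that suppresses positive relevance within an L-inf ball of radius eps.

    This is a hypothetical sketch of the relevance-suppression idea,
    not the RAD algorithm from the paper.
    """
    x_adv = x.copy()
    for _ in range(iters):
        r = relevance_map(x_adv, w)
        # gradient of sum(relu(r)) w.r.t. x_adv: w wherever relevance is positive, else 0
        grad = w * (r > 0)
        x_adv = x_adv - step * np.sign(grad)       # signed step to shrink relevance
        x_adv = np.clip(x_adv, x - eps, x + eps)   # project back into the L-inf ball
    return x_adv

# Toy usage: positive relevance shrinks while the perturbation stays bounded.
rng = np.random.default_rng(0)
x, w = rng.normal(size=16), rng.normal(size=16)
x_adv = rad_sketch(x, w)
before = np.maximum(relevance_map(x, w), 0).sum()
after = np.maximum(relevance_map(x_adv, w), 0).sum()
```

Against a real detector, the analytic gradient above would be replaced by backpropagation through the network to the input, but the structure of the loop (compute relevance, step against it, project) is the same.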

