

Abstract

Humans rely heavily on shape information to recognize objects. In contrast, convolutional neural networks (CNNs) are biased more towards texture. This difference is perhaps the main reason why CNNs are susceptible to adversarial examples. Here, we explore how shape bias can be incorporated into CNNs to improve their robustness. We propose two algorithms, based on the observation that edges are invariant to moderate, imperceptible perturbations. In the first one, a classifier is adversarially trained on images with the edge map as an additional channel. At inference time, the edge map is recomputed and concatenated to the image. In the second algorithm, a conditional GAN is trained to translate edge maps, from clean and/or perturbed images, into clean images. Inference is then done over the generated image corresponding to the input's edge map. Extensive experiments on more than ten datasets demonstrate the effectiveness of the proposed algorithms against FGSM, ℓ∞ PGD-40, Carlini-Wagner, Boundary, and adaptive attacks. Further, we show that edge information can a) benefit other adversarial training methods, b) be even more effective in conjunction with background subtraction, c) be used to defend against poisoning attacks, and d) make CNNs more robust than CNNs trained solely on RGB images against natural image corruptions such as motion blur, impulse noise, and JPEG compression. From a broader perspective, our study suggests that CNNs do not adequately account for image structures and operations that are crucial for robustness. The code is available at: https://github.com/[masked].

1. INTRODUCTION

Deep neural networks (LeCun et al., 2015) remain the state of the art across many areas and are employed in a wide range of applications. They also provide the leading model of biological neural networks, especially in visual processing (Kriegeskorte, 2015). Despite this unprecedented success, however, they can be easily fooled by adding carefully-crafted imperceptible noise to normal inputs (Szegedy et al., 2014; Goodfellow et al., 2015). This poses serious threats to their use in safety- and security-critical domains. Intensive efforts are ongoing to remedy this problem. Our primary goal here is to learn robust models for visual recognition, inspired by two observations. First, object shape remains largely invariant to imperceptible adversarial perturbations (Fig. 1). Shape is a defining attribute of an object and plays a vital role in recognition (Biederman, 1987). Humans rely heavily on edges and object boundaries, whereas CNNs rely more on texture (Geirhos et al., 2018). Second, unlike CNNs, we recognize objects one at a time through attention and background subtraction (e.g., Itti & Koch (2001)). These observations may explain why adversarial examples are so effective. The convolution operation in CNNs is biased towards capturing texture, since the number of pixels constituting texture far exceeds the number of pixels that fall on object boundaries. This in turn provides ample opportunity for adversarial image manipulation. Some attempts have been made to emphasize edges, for example by utilizing normalization layers (e.g., contrast and divisive normalization (Krizhevsky et al., 2012)). Such attempts, however, have not been fully investigated for adversarial defense. Overall, how shape and texture should be reconciled in CNNs remains an open question. Here we propose two solutions that can be easily implemented and integrated into existing defenses. We also investigate possible adaptive attacks against them.
Extensive experiments across ten datasets, over which shape and texture have different relative importance, demonstrate the effectiveness of our solutions against strong attacks. Our first method performs adversarial training on edge-augmented inputs. The second method uses a conditional GAN (Isola et al., 2017) to translate edge maps to clean images, essentially learning a perturbation-invariant transformation; it requires no adversarial training and hence less computation. Further, and perhaps less surprisingly, we find that incorporating edges also makes CNNs more robust to natural image corruptions and backdoor attacks. The versatility and effectiveness of these approaches, achieved without significant parameter tuning, is very promising. Ultimately, our study shows that shape is key to building robust models and opens a new direction for future research on adversarial robustness.
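As a concrete illustration of the first method's input representation, the sketch below appends an edge channel to an RGB image. The detector here is simple gradient-magnitude thresholding, a lightweight stand-in for the Canny/Sobel detectors used in the paper; the function names and threshold are ours, not the paper's.

```python
import numpy as np

def edge_map(gray, thresh=0.2):
    """Binary edge map from normalized gradient magnitude.

    A stand-in for the Canny/Sobel detectors in the paper; the
    threshold value is illustrative, not taken from the paper.
    """
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)
    mag = mag / (mag.max() + 1e-8)
    return (mag > thresh).astype(np.float32)

def edge_augment(image):
    """Concatenate the edge map as a fourth channel (sketch of method 1).

    image: HxWx3 float array in [0, 1]. At inference time the edge map
    is recomputed from the (possibly perturbed) input in the same way.
    """
    gray = image.mean(axis=2)  # simple luminance proxy
    edges = edge_map(gray)
    return np.concatenate([image, edges[..., None]], axis=2)  # HxWx4
```

An adversarially trained classifier would then consume the 4-channel tensor; because the edge channel is recomputed at test time and edges are largely invariant to small ℓ∞ perturbations, the extra channel changes little under attack.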

2. RELATED WORK

Here, we provide a brief overview of closely related research, with an emphasis on adversarial defenses. For a detailed survey of this topic, please refer to Akhtar & Mian (2018). Adversarial attacks. The goal of the adversary is to craft an adversarial input x′ ∈ R^d by adding an imperceptible perturbation δ to the legitimate input x ∈ R^d (here with pixel values in [0, 1]), i.e., x′ = x + δ. We consider two attacks that bound the ℓ∞-norm of δ (i.e., ‖δ‖∞ ≤ ε): the Fast Gradient Sign Method (FGSM) (Goodfellow et al., 2015) and the Projected Gradient Descent (PGD) method (Madry et al., 2017). Xiao et al. (2019), in parallel to our work, have also proposed methods that utilize shape for adversarial defense. They perform classification on the edge map rather than the image itself; this is a baseline method against which we compare our algorithms. Similar to us, they also use GANs to purify the input image.
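The two ℓ∞ attacks above can be sketched in a few lines. The loss gradient is abstracted as a callable so the sketch stays framework-agnostic; the step sizes below are illustrative defaults, not the paper's settings.

```python
import numpy as np

def fgsm(x, grad, eps):
    """Fast Gradient Sign Method: one step along the loss-gradient sign.

    x' = clip(x + eps * sign(grad), 0, 1), keeping pixels in [0, 1].
    """
    return np.clip(x + eps * np.sign(grad), 0.0, 1.0)

def pgd(x, grad_fn, eps, alpha, steps):
    """l_inf PGD: iterated sign steps, projected back into the eps-ball.

    grad_fn(x_adv) returns the loss gradient w.r.t. the current input.
    """
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_fn(x_adv))
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project onto the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)          # stay a valid image
    return x_adv
```

PGD-40 as used in the experiments corresponds to `steps=40`; FGSM is the special case of a single, full-budget step.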



Figure 1: Adversarial attacks against ResNet152 on the giant panda image using the FGSM (Goodfellow et al., 2015), PGD-40 (Madry et al., 2017) (α=8/255), DeepFool (Moosavi-Dezfooli et al., 2016), and Carlini-Wagner (Carlini & Wagner, 2017) attacks. The second column in each panel shows the L2 difference between the original image (not shown) and the adversarial one (values shifted by 128 and clamped). The edge map (computed with the Canny edge detector) remains almost intact under small perturbations. Notice that edges are better preserved for PGD-40. See Appx. A for a more detailed version of this figure, including the same comparison using the Sobel method.
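The caption's observation, that edge maps remain almost intact under small perturbations, can be checked numerically. The sketch below uses a simple gradient-magnitude edge detector as a stand-in for Canny (the threshold and ε are illustrative) and measures edge overlap via intersection-over-union.

```python
import numpy as np

def edge_map(gray, thresh=0.2):
    # normalized gradient-magnitude edges; a stand-in for Canny
    # (the threshold is ours, not the paper's)
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)
    return mag / (mag.max() + 1e-8) > thresh

def edge_iou(img_a, img_b):
    """Intersection-over-union of the binary edge maps of two images."""
    ea, eb = edge_map(img_a), edge_map(img_b)
    union = np.logical_or(ea, eb).sum()
    return np.logical_and(ea, eb).sum() / max(union, 1)

# toy check: a step edge survives a random l_inf perturbation of eps = 8/255
rng = np.random.default_rng(0)
clean = np.zeros((16, 16)); clean[:, 8:] = 1.0
noisy = np.clip(clean + rng.uniform(-8 / 255, 8 / 255, clean.shape), 0.0, 1.0)
```

Here `edge_iou(clean, noisy)` stays near 1: the step edge dominates the gradient magnitude, so the small perturbation never crosses the detection threshold.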

Adversarial defenses. Recently, there has been a surge of methods to mitigate the threat of adversarial attacks, either by making models robust to perturbations or by detecting and rejecting malicious inputs. A popular defense is adversarial training, in which a network is trained on adversarial examples (Szegedy et al., 2014; Goodfellow et al., 2015). In particular, adversarial training with a PGD adversary remains empirically robust to this day (Athalye et al., 2018). Drawbacks of adversarial training include degraded clean performance, high computational cost, and overfitting to the attacks it is trained on. Some defenses, such as Feature Squeezing (Xu et al., 2017), Feature Denoising (Xie et al., 2019), PixelDefend (Song et al., 2017), JPEG Compression (Dziugaite et al., 2016), and Input Transformation (Guo et al., 2017), attempt to purify maliciously perturbed images by transforming them back towards the distribution seen during training. MagNet (Meng & Chen, 2017) trains a reformer network (one or multiple auto-encoders) to move the adversarial image closer to the manifold of legitimate images. Likewise, Defense-GAN (Samangouei et al., 2018) uses GANs (Goodfellow et al., 2014) to project samples onto the manifold of the generator before classifying them. A similar approach based on Variational AutoEncoders (VAEs) is proposed in Li & Ji (2019). Unlike these works, which operate on texture (and hence are fragile (Athalye et al., 2018)), our GAN-based defense is built upon edge maps. Some defenses are inspired by biology (e.g., Dapello et al. (2020), Li et al. (2019), Strisciuglio et al. (2020), Reddy et al. (2020)).

Shape vs. texture. Geirhos et al. (2018) discovered that CNNs routinely latch on to object texture, whereas humans pay more attention to shape. When presented with stimuli with conflicting cues (e.g., a cat shape with elephant skin texture; Appx. A), human subjects correctly labeled them based on their shape. In sharp contrast, predictions made by CNNs were mostly based on texture (see also Hermann & Kornblith (2019)). Similar results are reported by Baker et al. (2018). Hermann et al. (2020) studied the factors that produce texture bias in CNNs and found that data augmentation plays a significant role in mitigating it.

