

Abstract

Humans rely heavily on shape information to recognize objects. Conversely, convolutional neural networks (CNNs) are biased more towards texture. This fact is perhaps the main reason why CNNs are susceptible to adversarial examples. Here, we explore how shape bias can be incorporated into CNNs to improve their robustness. Two algorithms are proposed, based on the observation that edges are invariant to moderate imperceptible perturbations. In the first one, a classifier is adversarially trained on images with the edge map as an additional channel. At inference time, the edge map is recomputed and concatenated to the image. In the second algorithm, a conditional GAN is trained to translate the edge maps, from clean and/or perturbed images, into clean images. Inference is performed on the generated image corresponding to the input's edge map. A large number of experiments on 10 datasets demonstrate the effectiveness of the proposed algorithms against FGSM, $\ell_\infty$ PGD-40, Carlini-Wagner, Boundary, and adaptive attacks. Further, we show that edge information can a) benefit other adversarial training methods, b) be even more effective in conjunction with background subtraction, c) be used to defend against poisoning attacks, and d) make CNNs more robust against natural image corruptions such as motion blur, impulse noise, and JPEG compression, than CNNs trained solely on RGB images. From a broader perspective, our study suggests that CNNs do not adequately account for image structures and operations that are crucial for robustness. The code is available at: https://github.com/[masked].

1. INTRODUCTION

Deep neural networks (LeCun et al., 2015) remain the state of the art across many areas and are employed in a wide range of applications. They also provide the leading model of biological neural networks, especially in visual processing (Kriegeskorte, 2015). Despite this unprecedented success, however, they can be easily fooled by adding carefully crafted imperceptible noise to normal inputs (Szegedy et al., 2014; Goodfellow et al., 2015). This poses serious threats to their use in safety- and security-critical domains. Intensive efforts are ongoing to remedy this problem.

Our primary goal here is to learn robust models for visual recognition, inspired by two observations. First, object shape remains largely invariant to imperceptible adversarial perturbations (Fig. 1). Shape is a sign of an object and plays a vital role in recognition (Biederman, 1987). We rely heavily on edges and object boundaries, whereas CNNs place more emphasis on texture (Geirhos et al., 2018). Second, unlike CNNs, we recognize objects one at a time through attention and background subtraction (e.g., Itti & Koch (2001)). These may explain why adversarial examples are perplexing. The convolution operation in CNNs is biased towards capturing texture, since the number of pixels constituting texture far exceeds the number of pixels that fall on the object boundary. This in turn provides a big opportunity for adversarial image manipulation. Some attempts have been made to place more emphasis on edges, for example by utilizing normalization layers (e.g., contrast and divisive normalization (Krizhevsky et al., 2012)). Such attempts, however, have not been fully investigated for adversarial defense. Overall, how shape and texture should be reconciled in CNNs continues to be an open question. Here we propose two solutions that can be easily implemented and integrated into existing defenses. We also investigate possible adaptive attacks against them.

Extensive experiments across ten datasets, over which shape and texture have different relative importance, demonstrate the effectiveness of our solutions against strong attacks. Our first method performs adversarial training on edge-augmented inputs. The second method uses a conditional GAN (Isola et al., 2017) to translate edge maps to clean images, essentially finding a perturbation-invariant transformation. There is no need for adversarial training (and hence less computation) in this method. Further, and perhaps less surprisingly, we find that incorporating edges also makes CNNs more robust to natural image corruptions and backdoor attacks. The versatility and effectiveness of these approaches, without significant parameter tuning, is very promising. Ultimately, our study shows that shape is key to building robust models and opens a new direction for future research in adversarial robustness.

Figure 1: Adversarial attacks against ResNet152 over the giant panda image using FGSM (Goodfellow et al., 2015), PGD-40 (Madry et al., 2017) (α=8/255), DeepFool (Moosavi-Dezfooli et al., 2016) and Carlini-Wagner (Carlini & Wagner, 2017) attacks. The second columns in panels show the difference (L2) between the original image (not shown) and the adversarial one (values shifted by 128 and clamped). The edge map (using the Canny edge detector) remains almost intact at small perturbations. Notice that edges are better preserved for PGD-40. See Appx. A for a more detailed version of this figure, and also the same using the Sobel method.

2. RELATED WORK

Here, we provide a brief overview of closely related research, with an emphasis on adversarial defenses. For detailed comments on this topic, please refer to Akhtar & Mian (2018).

Adversarial attacks. The goal of the adversary is to craft an adversarial input $\tilde{x} \in \mathbb{R}^d$ by adding an imperceptible perturbation $\epsilon$ to the (legitimate) input $x \in \mathbb{R}^d$ (here in the range [0,1]), i.e., $\tilde{x} = x + \epsilon$. Here, we consider two attacks based on the $\ell_\infty$-norm of $\epsilon$: the Fast Gradient Sign Method (FGSM) (Goodfellow et al., 2015) and the Projected Gradient Descent (PGD) method (Madry et al., 2017). Both white-box and black-box attacks in the untargeted condition are considered. Deep models are also susceptible to image transformations other than adversarial attacks (e.g., noise, blur), as shown in Hendrycks & Dietterich (2019) and Azulay & Weiss (2018).

Adversarial defenses. Recently, there has been a surge of methods to mitigate the threat from adversarial attacks, either by making models robust to perturbations or by detecting and rejecting malicious inputs. A popular defense is adversarial training, in which a network is trained on adversarial examples (Szegedy et al., 2014; Goodfellow et al., 2015). In particular, adversarial training with a PGD adversary remains empirically robust to this day (Athalye et al., 2018). Drawbacks of adversarial training include reduced clean performance, high computational cost, and overfitting to the attacks it is trained on. Some defenses, such as Feature Squeezing (Xu et al., 2017), Feature Denoising (Xie et al., 2019), PixelDefend (Song et al., 2017), JPEG Compression (Dziugaite et al., 2016) and Input Transformation (Guo et al., 2017), attempt to purify the maliciously perturbed images by transforming them back towards the distribution seen during training. MagNet (Meng & Chen, 2017) trains a reformer network (one or multiple auto-encoders) to move the adversarial image closer to the manifold of legitimate images. Likewise, Defense-GAN (Samangouei et al., 2018) uses GANs (Goodfellow et al., 2014) to project samples onto the manifold of the generator before classifying them. A similar approach based on Variational AutoEncoders (VAE) is proposed in Li & Ji (2019). Unlike these works, which are based on texture (and hence are fragile (Athalye et al., 2018)), our GAN-based defense is built upon edge maps. Some defenses are inspired by biology (e.g., Dapello et al. (2020), Li et al. (2019), Strisciuglio et al. (2020), Reddy et al. (2020)).

Shape vs. texture. Geirhos et al. (2018) discovered that CNNs routinely latch on to the object texture, whereas humans pay more attention to shape. When presented with stimuli with conflicting cues (e.g., a cat shape with elephant skin texture; Appx. A), human subjects correctly labeled them based on their shape. In sharp contrast, predictions made by CNNs were mostly based on the texture (see also Hermann & Kornblith (2019)). Similar results are also reported by Baker et al. (2018). Hermann et al. (2020) studied the factors that produce texture bias in CNNs and found that data augmentation plays a significant role in mitigating texture bias. Xiao et al. (2019), in parallel to our work, have also proposed methods to utilize shape for adversarial defense. They perform classification on the edge map rather than the image itself. This is a baseline method against which we compare our algorithms. Similar to us, they also use GANs to purify the input image.

    for $t = 1 \dots T$ do
        for $i = 1 \dots M$ do
            // launch adversarial attack (here FGSM and PGD attacks)
            $\tilde{x}_i = \mathrm{clip}(x_i + \epsilon \, \mathrm{sign}(\nabla_x \ell(f_\theta(x_i), y_i)))$
            if $\beta$ == imgedge & redetect_train then
                $\tilde{x}_i = \mathrm{detect\_edge}(\tilde{x}_i)$ // recompute and replace the edge map
            end if
            $\ell = \alpha \, \ell(f_\theta(x_i), y_i) + (1 - \alpha) \, \ell(f_\theta(\tilde{x}_i), y_i)$ // here $\alpha = 0.5$
            $\theta = \theta - \nabla_\theta \ell$
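The FGSM step and the weighted clean/adversarial loss in the listing above can be illustrated on a toy model. The following is a minimal sketch, not the authors' implementation: it assumes a two-parameter logistic-regression classifier in place of a CNN (so the input gradient has a closed form), and all function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b, x, y):
    """Binary cross-entropy of a logistic model on one example (y in {0, 1})."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def grad_x(w, b, x, y):
    """Gradient of the loss with respect to the input x, which is (p - y) * w."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    """One-step attack: x_adv = clip(x + eps * sign(grad_x loss)) to [0, 1]."""
    g = grad_x(w, b, x, y)
    return [min(1.0, max(0.0, xi + eps * sign(gi))) for xi, gi in zip(x, g)]

def mixed_loss(w, b, x, y, eps, alpha=0.5):
    """alpha * clean loss + (1 - alpha) * adversarial loss."""
    x_adv = fgsm(w, b, x, y, eps)
    return alpha * loss(w, b, x, y) + (1 - alpha) * loss(w, b, x_adv, y)
```

With `alpha=1` the objective reduces to natural training, and with `alpha=0` the model sees only adversarial examples; the listing uses `alpha = 0.5`.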

Edge-guided Adversarial Training (EAT).

The intuition here is that the edge map retains the structure in the image and helps disambiguate the classification (See Fig. 1 ). In its simplest form (Fig. 7 (A) in Appx. A; Alg. 1), adversarial training is performed over the 2D (Gray+Edge) or 4D (RGB+Edge) input (i.e., number of channels; denoted as Img+Edge). In a slightly more complicated form (Fig. 7 (B)), first, for each input (clean or adversarial), the old edge map is replaced with the newly extracted one. The edge map can be computed from the average of only image channels or all available channels (i.e., image plus edge). The latter can sometimes improve the results, since the old edge map (although perturbed; Fig. 10 and Appx. B) still contains unaltered shape structures. Then, adversarial training is performed over the new input. The reason behind adversarial training with redetected edges is to expose the network to possible image structure damage. The loss for training is a weighted combination of loss over clean images and loss over adversarial images. At inference time, first, the edge map is computed and then classification is done over the edge-augmented input. As a baseline model, we also consider first detecting the input's edge map and then feeding it to the model trained on the edges for classification. We refer to this model as Img2Edge.
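The input construction described above (append an edge channel, then redetect it before classification) can be sketched as follows. This is an illustrative example under stated assumptions: a thresholded Sobel gradient magnitude stands in for the Canny detector, images are plain nested lists with values in [0, 1], and the function names and threshold are hypothetical.

```python
def sobel_edges(gray, thresh=0.5):
    """Binary edge map from a 2-D grayscale image (list of rows, values in [0, 1])."""
    h, w = len(gray), len(gray[0])

    def px(r, c):
        # Replicate border pixels so the 3x3 filter is defined everywhere.
        return gray[min(max(r, 0), h - 1)][min(max(c, 0), w - 1)]

    edges = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            gx = (px(r - 1, c + 1) + 2 * px(r, c + 1) + px(r + 1, c + 1)
                  - px(r - 1, c - 1) - 2 * px(r, c - 1) - px(r + 1, c - 1))
            gy = (px(r + 1, c - 1) + 2 * px(r + 1, c) + px(r + 1, c + 1)
                  - px(r - 1, c - 1) - 2 * px(r - 1, c) - px(r - 1, c + 1))
            edges[r][c] = 1.0 if (gx * gx + gy * gy) ** 0.5 > thresh else 0.0
    return edges

def augment(gray, thresh=0.5):
    """Build the 2-channel (Gray+Edge) input: [gray, edge]."""
    return [gray, sobel_edges(gray, thresh)]

def redetect(x, thresh=0.5):
    """Discard the (possibly attacked) edge channel and recompute it from the image channel."""
    return [x[0], sobel_edges(x[0], thresh)]
```

In this sketch a small perturbation of the gray channel leaves the thresholded edge map unchanged, and any tampering with the edge channel itself is discarded by `redetect`, which mirrors the intuition behind redetection at inference time.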

GAN-based Shape Defense (GSD).

Here, first, a conditional GAN is trained to map the edge image, from clean or adversarial images, to its corresponding clean image (Alg. 2). Any image translation method (here pix2pix by Isola et al. (2017), using a publicly available implementation) can be employed for this purpose. Next, a CNN is trained over the generated images. At inference time, first, the edge map is computed and then classification is done over the image generated from this edge map. The intuition is that the edge map remains nearly the same over small perturbation budgets (see Appx. A). Notice that the conditional GAN can also be trained on perturbed images (similar to Samangouei et al. (2018) and Li & Ji (2019)) or on edge-augmented perturbed images (similar to above).
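The inference path of GSD is a composition of three stages, which the following sketch makes explicit. The three callables are placeholders: in the paper they would be an edge detector, the trained pix2pix generator, and the CNN trained on generated images, whereas the toy stand-ins below (a binarizer, an identity map, and a pixel counter) are assumptions chosen only so the example runs.

```python
def gsd_predict(image, detect_edges, generator, classifier):
    """Classify the image generated from the input's edge map, never the raw input."""
    edge_map = detect_edges(image)    # representation that changes little under small perturbations
    generated = generator(edge_map)   # conditional GAN: edge map -> clean-looking image
    return classifier(generated)

# Toy stand-ins (hypothetical, for illustration only).
def toy_edges(image, thresh=0.5):
    return [[1.0 if v > thresh else 0.0 for v in row] for row in image]

def toy_generator(edge_map):
    return edge_map

def toy_classifier(image):
    return int(sum(map(sum, image)) >= 2)
```

Because the classifier only ever sees the generator's output, a perturbation changes the prediction only if it first changes the edge map.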

4.1. DATASETS AND MODELS

Experiments are spread across 10 datasets covering a variety of stimulus types. Sample images from the datasets are given in Fig. 2. Models are trained with cross-entropy loss and the Adam optimizer (Kingma & Ba, 2014) with a batch size of 100, for 20 epochs over MNIST and FashionMNIST, 30 over DogVsCat, and 10 over the remaining datasets. The Canny method (Canny, 1986) is used for edge detection over all datasets, except DogBreeds, for which Sobel is used. Edge detection parameters are separately adjusted for each dataset. We did not carry out an exhaustive hyperparameter search, since we are interested in the additional benefits edges may bring rather than training the best possible models. The first two datasets are MNIST (LeCun et al., 1998) and FashionMNIST (Xiao et al., 2017). A CNN with 2 convolution, 2 pooling, and 2 fc layers is trained. Over the remaining datasets, we finetune a pre-trained ResNet18 (He et al., 2016), trained over ImageNet (Deng et al., 2009), and normalize images using the ImageNet mean and standard deviation. The fourth dataset, CIFAR-10 (Krizhevsky, 2009), contains 50K training and 10K test images with a resolution of 32×32, which are resized here to 64×64 for better edge detection. The fifth dataset is DogBreeds (see footnote). It contains 1,421 training and 356 test images at resolution 224×224 over 16 classes. The sixth dataset is GTSRB (Stallkamp et al., 2012) and includes 39,209 and 12,631 training and test images, respectively, over 43 classes (resolution 64×64 pixels). The seventh dataset, Icons-50, includes 6,975 training and 3,025 test images over 50 classes (Hendrycks & Dietterich, 2019). The original image size is 120×120, which is resized to 64×64. The eighth dataset, Sketch, contains 14K training and 6K test images over 250 classes. Images have size 1111×1111 and are resized to 64×64 in experiments (Eitz et al., 2012). The ninth and tenth datasets are derived from ImageNet.
The Imagenette2-160 dataset has 3,925 training and 9,469 test images (resolution 160×160) over 10 classes (tench, English springer, cassette player, chain saw, church, French horn, garbage truck, gas pump, golf ball, and parachute). The Tiny Imagenet dataset has 100K training images (resolution 64 × 64) and 10K validation images (used here as the test set) over 200 classes. For attacks, we use https://github.com/Harry24k/adversarial-attacks-pytorch, except Boundary attack for which we use https://github.com/bethgelab/foolbox.

4.2.1. EDGE-GUIDED ADVERSARIAL TRAINING

Results over MNIST and CIFAR-10 are shown in Tables 1 and 2, respectively. In these experiments, edge maps are computed only from the gray-level image (in turn computed from the image channels). Please refer to Appx. B for results over the remaining datasets. Over MNIST and FashionMNIST, robust models trained using edges outperform models trained on gray-level images (the last column). The naturally trained models, however, perform better using gray-level images than edge maps (Orig. model column). Adversarial training with augmented inputs improves the robustness significantly over both datasets, except for the FGSM attack on FashionMNIST. Over CIFAR-10, incorporating the edges improves the robustness by a large margin against the PGD-40 attack. At $\epsilon = 32/255$, the performance of the robust model over clean and perturbed images is raised from (0.316, 0.056) to (0.776, 0.392). On average, the robust model shows 64% improvement over the RGB model (last column in Table 2). Results when using the Sobel edge detector instead of Canny do not show a significant difference (Table 7 in Appx. B). Over the TinyImageNet dataset, as in CIFAR-10, classification using edge maps is poor, perhaps due to background clutter. Nevertheless, incorporating edges improves the results. We expect even better results with more accurate edge detection algorithms (e.g., supervised deep edge detectors). Over these 4 datasets, the final model (i.e., adversarial training using image + redetected edge, and edge redetection at inference time) leads to the best accuracy. The improvement over the image is more pronounced at larger perturbations, in particular against the PGD-40 attack (as expected; Fig. 1). Over the DogVsCat dataset, as in FashionMNIST, the model trained on the edge map is much more robust than the image-only model (Table 8 in Appx. B). Over the DogBreeds dataset, utilizing edges does not improve the results significantly (compared to the image model). The reason could be that texture is more important than shape in this fine-grained recognition task (Table 9 in Appx. B). Over the GTSRB, Icons-50, and Sketch datasets, the image+edge model results in higher robustness than the image-only model, but leads to relatively less improvement compared to the edge-only model. Please see Tables 11, 13, and 15. Over the Imagenette2-160 dataset (Table 17), classification using images does better than edges since texture is very important on this dataset.

Average results over the 10 datasets are presented in Fig. 3 (left panel). Combining shape and texture (full model) leads to a substantial improvement in robustness over texture alone (5.24% imp. against FGSM and 28.76% imp. against PGD-40). Also, the image+edge model is slightly more robust than the image-only model. Computing the edge map from all image channels improves the results on some datasets (e.g., GTSRB and Sketch) but hurts on others (e.g., CIFAR-10), as shown in Appx. B. The right two panels in Fig. 3 show a comparison of natural (Orig. model column in tables; solid lines) vs. adversarial training. Natural training with image+edge and redetection at inference time leads to enhanced robustness with little to no harm to standard accuracy. Although the Edge model is trained only on edges from clean images, the Img2Edge model does better than other naturally trained models against attacks. The best performance, however, belongs to models trained adversarially. Notice that our results set a new record on adversarial robustness on some of these datasets, even without an exhaustive parameter search.

Robustness against Carlini-Wagner (CW) and Boundary attacks. Performance of our method against the $\ell_2$ CW attack on the MNIST dataset is shown in Appx. J. To make experiments tractable, we set the number of attack iterations to 10. Even with 10 iterations, the original Edge and Img models are severely degraded. Img2Edge and Img+(Edge Redetect) models, however, remain robust. Adversarial training with the CW attack results in robust models in all cases. Results against the decision-based Boundary attack (Brendel et al., 2017) are shown in Appx. K over the MNIST and FashionMNIST datasets. Edge, Img, and Img+Edge models perform close to zero over adversarial images. The Img+(Edge Redetect) model remains robust since the Canny edge map does not change much after the attack, as illustrated in Fig. 29.

Robustness against substitute model attacks. Following Papernot et al. (2016), we trained substitute models to mimic the robust models (with the same architecture but with RGB channels) using the cross-entropy loss over the logits of the two networks, for 5 epochs. The adversarial examples crafted for the substitute networks were then fed to the robust networks. Results are shown in italics in Tables 1, 2, 4 and 5 (performed only against the edge-redetect models). We find that this attack is not able to defeat the robust models. Surprisingly, it even improves the accuracy in some cases. Please refer to Appx. E for more details.

Robustness against adaptive attacks. So far we have been using the Canny edge detector, which is non-differentiable. What if the adversary builds a differentiable edge detector to approximate the Canny edge detector and then utilizes it to craft adversarial examples? To study this, we run two experiments. In the first one, we build the following pipeline using the HED deep edge detector (Xie & Tu, 2015): Img → HED → Classifier_HED. A CNN classifier (as above) is trained over the HED edges on the Imagenette2-160 dataset (see Appx. L). Attacking this classifier with FGSM and PGD-5 ($\epsilon = 8/255$) completely fools the network. The original classifier (Img2Edge here) trained on Canny edges, however, is still largely robust to the attacks (i.e., Img_adv-HED → Canny → Classifier_Canny), as shown in Table 29. Notice that the HED edge maps are continuous in the range [0,1], whereas Canny edge maps are binary, which may explain why it is easy to fool the HED classifier (see Fig. 30).

Above, we used an off-the-shelf deep edge detector trained on natural scenes. As can be seen in Appx. L, its generated edge maps differ significantly from Canny edges. What if the adversary trains a model with the (input, output) pair as (input image, Canny edge map) to better approximate the Canny edge detector? In experiment two, we investigate this possibility. We build a pipeline consisting of a convolutional autoencoder followed by a CNN on MNIST. Details regarding the architecture and training procedure are given in Appx. M. As the results in Fig. 33 reveal, FGSM and PGD-40 attacks against the pipeline are very effective. Passing the adversarial images through Canny and then a classifier trained (naturally or adversarially) on Canny edges (i.e., Img2Edge) still leads to high accuracy, which means that the transfer was not successful. We attribute this to the binary output of Canny. Two important points deserve attention. First, here we used the Img2Edge model, which, as shown above, is less robust than the full model (i.e., img+edge and redetection). Thus, adaptive attacks may be even less effective against the full model. Second, the proposed methods perform better when the edge map is less disturbed. For example, as shown in Fig. 33 (bottom), the adaptive attack is less effective with PGD, since edges are better preserved.

Analysis of parameter α. By setting α = 0, the network will be exposed only to adversarial examples (Alg. 1), which is computationally more efficient. However, it results in lower accuracy and robustness compared to α = 0.5, which means that exposing the network to both clean and adversarial images is important (see Table 19; Appx. D). Nevertheless, here again incorporating edges improves the robustness significantly compared to the image-only case.

Why is this method working? The main reason is that the edge map acts as a checksum, and the network learns (through adversarial training) to rely more on the redetected edges when other channels are misleading (see Table 23). This aligns with prior observations such as shortcut learning in CNNs (Geirhos et al., 2020). Also, our approach resembles adversarial patch or backdoor/trojan attacks, where the goal is to fool a classifier by forcing it to rely on irrelevant cues. Conversely, here we use this trick to make a model more robust. Also, the Img2Edge model can purify the input before classifying it. Any adaptive attack against the EAT defense has to alter the edges, which most likely will result in perceptible structural damage. See also

When we trained pix2pix over the edge maps from the perturbed images, the new CNN models became even more robust (stars in Fig. 4; top panels). We expect even better results with training over edge maps from both intact and perturbed images.

Figure 4: Results of GSD method.

Over the CIFAR-10 and Icons-50 datasets, generated images are poor. Consequently, GSD underperforms the original model over the original clean images. Over the adversarial inputs, however, GSD wins, especially at high perturbation budgets and against the PGD-40 attack. With better edge detection and image generation methods (e.g., using perceptual loss), even better results are expected.

Why is this method working? The main reason is that the cGAN learns a function f that is invariant to adversarial perturbations. Since the edge map is not completely invariant to (especially large) perturbations, one has to train the cGAN on the augmented dataset composed of clean and perturbed images. One advantage of this approach is its computational efficiency, since there is no need for adversarial training.
Any adaptive attack against this defense has to fool the cGAN which is perhaps not feasible since it will be noticed from the generated images (i.e., cGAN will fail to generate decent images). Compared to other adversarial defenses that utilize GANs (e.g., Samangouei et al. (2018) ; Li & Ji (2019) ), our approach relies less on texture. It can be integrated with these defenses.

5. FAST & FREE ADVERSARIAL TRAINING WITH SHAPE DEFENSE

Here, we examine whether incorporating shape bias can empower other defenses, in particular, a 3). Over FashionMNIST, the improvement is even more pronounced (from 69.6% to 85.5% at $\epsilon = 0.1$ and from 0% to 82.3% at $\epsilon = 0.3$). Over clean images, our full model outperforms other models in most cases. Over the CIFAR-10 dataset, the shape-based extension of the defenses results in high accuracy over both clean and perturbed images (using the PGD-10 attack), compared to the image-only model. We expect similar improvements with classic PGD adversarial training. Overall, our analyses in this section suggest that exploiting edges is not specific to the particular way we perform adversarial training (Algorithms 1 & 2), and can be extended to other defense methods (e.g., the TRADES algorithm by Zhang et al. (2019)).

6. BACKGROUND SUBTRACTION

Background subtraction (a.k.a. foreground detection) is an important mechanism by which humans process scenes and recognize objects. It interacts with other mechanisms such as edge and boundary detection. How useful is it for adversarial robustness? In other words, how robust will the model be, assuming that the attacker only has access to the foreground object? To find out, we perform an experiment over MNIST and FashionMNIST, for which it is easy to derive the foreground masks. We compare the Img and Edge models (from Section 4.2.1) over the original and noisy (digits placed on a white noise background) data, with and without background subtraction and edge detection, against the FGSM attack. Results are shown in Fig. 5(A). First, both models perform poorly over noisy images, with the Edge model doing better. Second, after background subtraction, models are much more robust. Third, applying the Edge model to the foreground region leads to almost perfect robustness over MNIST. Even without perfect edge detection, the Edge model does very well over FashionMNIST. This analysis provides an upper bound on the potential benefit of background subtraction for model robustness, assuming that foreground objects can be reliably detected.
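The assumption that the attacker can only affect the foreground can be written as a masking step applied to an adversarial image. The sketch below is one possible reading, not the authors' code: the intensity-threshold mask is a hypothetical choice that only makes sense for MNIST-like images with a dark background, and both function names are illustrative.

```python
def foreground_mask(image, thresh=0.1):
    """Hypothetical mask: a pixel counts as foreground if it is brighter than thresh."""
    return [[1 if v > thresh else 0 for v in row] for row in image]

def restrict_to_foreground(clean, adversarial, mask):
    """Keep adversarial values where mask is 1; restore the clean pixel elsewhere."""
    return [
        [a if m else c for c, a, m in zip(c_row, a_row, m_row)]
        for c_row, a_row, m_row in zip(clean, adversarial, mask)
    ]
```

Applying `restrict_to_foreground` with a mask computed from the clean image discards every background perturbation, which is why the result serves as an upper bound on what reliable background subtraction could deliver.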

7. HARNESSING BACKDOOR ATTACKS

The proposed mechanisms can also withstand invisible and visible backdoor attacks (Brown et al., 2017; Liu et al., 2017). Over MNIST, we planted an invisible C-like patch in half of the 8s and relabeled them as 9. We then trained the Img model on this new dataset. On a test set where all 8s are contaminated (with the patch), the Img model classifies almost all of them as 9 (top-left panel in Fig. 5.B). The Edge model, however, correctly classifies them as 8, since edge detection removes the pattern (top-right panel). Thanks to the edge detection, it is also not possible to train the Edge model on the poisoned dataset. A similar experiment on FashionMNIST, using a different patch, shows similar results (bottom panels in Fig. 5.B). In the presence of visible patches, the model would not be affected if the correct region is identified (via background subtraction) during training or testing (Appx. I).

8. ROBUSTNESS AGAINST NATURAL IMAGE DISTORTIONS

Previous work has shown that ImageNet-trained CNNs generalize poorly over a wide range of image distortions (e.g., Azulay & Weiss (2018); Dodge & Karam (2017)). Our objective in this section is to study whether increasing shape bias improves robustness against common image distortions, just as it did over adversarial examples. Following Hendrycks & Dietterich (2019), we systematically test how model accuracies degrade if images are corrupted by 15 different types of distortions, including brightness, contrast, defocus blur, elastic transform, fog, frost, Gaussian noise, glass blur, impulse noise, JPEG compression, motion blur, pixelation, shot noise, snow, and zoom blur, at 5 levels of severity. Fig. 19 (Appx. H) shows sample images along with their distortions. We test the original models (trained naturally on clean training images) as well as the robust models (trained adversarially using Algorithm 1) over the corrupted versions of the test sets on three datasets. Results are visualized in Fig. 6. See Appx. H for breakdown results on each dataset and distortion. Two conclusions are drawn. First, incorporating edge information in original models (and hence increasing shape bias) improves robustness against common image distortions (solid curves in Fig. 6; RGB+Edge > RGB or Edge). The improvement is more noticeable at larger distortions and over datasets with less background clutter (e.g., Icons-50). This is in line with Geirhos et al. (2018), who showed that ResNet-50 trained on the Stylized-ImageNet dataset performs better than the vanilla ResNet-50 on both clean and distorted images. Second, adversarially trained models (in particular those trained on Img + Edge) are more robust to image distortions than the original models. In summary, incorporating edges and adversarial images leads to improved robustness against natural image distortions, even though the models are not trained on any of the distortions.
This in turn suggests that the proposed algorithms indeed rely more on shape than texture.

9. DISCUSSION AND OUTLOOK

Two algorithms are proposed to use shape bias and background subtraction to strengthen CNNs and defend against adversarial attacks and backdoor attacks. To fool these defenses, one has to perturb the image such that the new edge map is significantly different from the old one while preserving image shape and geometry, which does not seem to be trivial at low perturbation budgets. Even though we did not perform an exhaustive parameter search (model architecture, epochs, edge detection, cGAN training, etc.), our results are better than or on par with the state of the art in some cases (e.g., over the MNIST and CIFAR datasets). The proposed mechanisms are computationally efficient and excel with higher-resolution images and low background clutter. They are also more effective against stronger attacks than weaker ones, since strong attacks perturb the image less while being more destructive (e.g., PGD vs. FGSM; Fig. 1). Shape defense can also be combined with other defenses to produce robust models without a significant slowdown. Future work should assess shape defense against other adversarial attacks, such as gradient-free attacks, decision-based attacks, sparse attacks (e.g., the one pixel attack (Su et al., 2019)), attacks that perturb only the edge pixels, attacks that manipulate the image structure (Xiao et al., 2018), ad-hoc adaptive attacks, and backdoor attacks (Chen et al., 2017), as well as other $\ell_p$ norms and datasets. There might also be other ways to incorporate shape bias in CNNs, such as 1) augmenting a dataset with edge maps or negative images, 2) overlaying texture from some objects onto others as in Geirhos et al. (2018), and 3) designing normalization layers (Carandini & Heeger, 2012). Lastly, the interpretation of the shape defense, as in Zhang & Zhu (2019), is another research direction.

A ILLUSTRATION OF SHAPE IMPORTANCE IN ADVERSARIAL ROBUSTNESS

Figure 7 : Edge-guided adversarial training (EAT). In its simplest form, adversarial training is performed over the 2D (Gray+Edge) or 4D (RGB+Edge) input (i.e., number of channels; denoted as Img+Edge). In a slightly more complicated form (B), first for each input (clean or adversarial), the old edge map is replaced with the newly extracted one. The edge map can be computed from the average of only image channels or all available channels (i.e., image plus edge). Figure 14 : An example visual illusion simultaneously depicting a portrait of a young lady or an old lady. While fooling humans takes a lot of effort and special skills are needed, deep models are much easier to be fooled. In this example, the artist has carefully added features to make the portrait look like an old lady while the new additions will not negatively impact the look of the young lady too much. For example, the right eyebrow of the old lady (marked in red below) does not distort the ear of the young lady too much. See https://medium.com/@jonathan_hui/ adversarial-attacks-b58318bb497b for more details. Analysis of making either the image channel or the edge channel in models (i.e., making them zero). Over the original edge augmented model (Img+Edge), the image channel is more important since masking it hurts the model more (compared to the making the edge channel). Conversely, over the robust and robust redetect models, masking the edge channel hurts more. This indicates that robust models rely more on shape than texture. Models used here are adversarially trained against each attack. For example, Img+Edge Robust model is trained separately for = 8/255. This is the same setup as in the main text and tables. The FGSM attack is used here. We find that background subtraction together with edge detection improves robustness. Table 26 : Performance of the models (naturally trained and adversarially-trained) against the images with only the foreground being impacted/perturbed. 
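Compared with the results in Table 1, applying the models to the foreground regions improves the accuracy by a large margin.

$\delta = \mathrm{Uniform}(-\epsilon, \epsilon)$
$\delta = \delta + \alpha \cdot \mathrm{sign}(\nabla_\delta \ell(f_\theta(x_i + \delta), y_i))$
$\delta = \max(\min(\delta, \epsilon), -\epsilon)$
$\tilde{x}_i = x_i + \delta$
if redetect_train & β == imgedge then
    $\tilde{x}_i = \mathrm{detect\_edge}(\tilde{x}_i)$ //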

M ROBUSTNESS AGAINST ADAPTIVE ATTACKS OVER MNIST DATASET

Here, we attempt to explicitly approximate the Canny edge detector using a differentiable convolutional autoencoder. In our pipeline, a classifier (CNN) is stacked after the convolutional autoencoder (with sigmoid output neurons). We first freeze the classifier and train the autoencoder using the MSE loss, with the (input, output) pair being (image, Canny edge map). We then freeze the autoencoder and train the classifier using the cross-entropy loss. After training the network, we craft adversarial examples for it and feed them to a classifier trained on Canny edges (the original or robust models mentioned in the main text). Fig. 31 shows the pipeline and some sample approximated edge maps. Fig. 32 shows the architecture details in PyTorch.

The top panel in Fig. 33 shows results using the FGSM and PGD-40 attacks against the pipeline itself, and also against the Img2Edge model (trained over clean edges or adversarial ones). As can be seen, both attacks are very successful against the pipeline, but they do not perform well against the Canny edge map classifier (i.e., adversarial examples crafted for the pipeline do not transfer well to the Img2Edge model trained over Canny edge maps; img → Canny → class label). Notice that here we only used the model trained on edge maps. Using the img+edge+redetect model would likely yield even better robustness against the adaptive attacks.

The bottom panel in Fig. 33 shows sample adversarial digits (constructed using the adaptive attack) and their edge maps under the FGSM and PGD-40 attacks. Notice how the PGD-40 attack preserves the edges better than FGSM does. This is because it needs less perturbation to fool the classifier. Also notice that the perturbations shown are perceptible, which results in noisy edge maps. If we limit ourselves to imperceptible perturbations, the edge maps will not change much compared to the original edge maps on clean images.
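The last observation, that edge maps barely change under small perturbations, can be checked on a toy example. The sketch below is only an illustration under stated assumptions: it uses a thresholded Sobel magnitude rather than Canny, a synthetic square instead of a digit, random-sign noise of magnitude ε = 8/255 as a stand-in for an FGSM perturbation, and a hand-picked threshold. `sobel_edges` is a hypothetical helper, not part of the released code.

```python
import numpy as np

def sobel_edges(img, thresh=2.2):
    """Binary edge map from the Sobel gradient magnitude (valid region only)."""
    # Horizontal derivative: weighted right column minus weighted left column.
    gx = (img[:-2, 2:] + 2 * img[1:-1, 2:] + img[2:, 2:]) \
       - (img[:-2, :-2] + 2 * img[1:-1, :-2] + img[2:, :-2])
    # Vertical derivative: weighted bottom row minus weighted top row.
    gy = (img[2:, :-2] + 2 * img[2:, 1:-1] + img[2:, 2:]) \
       - (img[:-2, :-2] + 2 * img[:-2, 1:-1] + img[:-2, 2:])
    return np.hypot(gx, gy) > thresh

rng = np.random.default_rng(0)
clean = np.zeros((32, 32))
clean[8:24, 8:24] = 1.0  # a bright square on a dark background

eps = 8 / 255  # assumed "imperceptible" budget
noise = eps * rng.choice([-1.0, 1.0], size=clean.shape)  # random signs, not a true gradient
perturbed = np.clip(clean + noise, 0.0, 1.0)

clean_edges = sobel_edges(clean)
perturbed_edges = sobel_edges(perturbed)
changed = int(np.sum(clean_edges != perturbed_edges))
print("edge pixels:", int(clean_edges.sum()), "changed:", changed)
```

In this toy setting each Sobel response can change by at most 8ε ≈ 0.25, which is smaller than the gap between the threshold and the clean responses, so the two edge maps coincide. Larger, perceptible perturbations carry no such guarantee.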



1 https://github.com/mrzhu-cool/pix2pix-pytorch
2 www.kaggle.com/c/dogs-vs-cats-redux-kernels-edition & www.kaggle.com/c/dog-breed-identification
3 https://github.com/fastai/imagenette & https://tiny-imagenet.herokuapp.com
4 cf. Zhang et al. (2019); the best robust accuracy on CIFAR-10 against PGD attacks is under 60%.
5 Similarly, the edge map classifier used in the Img2Edge model in the previous section (EAT defense) can be trained on edge maps from both clean and adversarial examples to improve performance.
6 https://github.com/sniklaus/pytorch-hed
7 Here we used the model adversarially trained at eps=8/255 and tested it against other perturbations, unlike the main text, where we trained robust models separately for each epsilon.



Figure 2: Sample images from the datasets. Numbers in parentheses denote the number of classes.

Figure 3: Left) Average results of the EAT defense on all datasets (last cols. in tables). Middle and Right) Comparison of natural (Orig. model column; solid lines) vs. adversarial training averaged over all datasets.

Figs. 10 & 14 in Appx. A.

4.2.2 GAN-BASED SHAPE DEFENSE

We trained the pix2pix model for 10 epochs over MNIST and FashionMNIST, and for 100 epochs over CIFAR-10 and Icons-50 datasets. Sample generated images are shown in Fig. 18 (Appx. F). A CNN (same architecture as before) was trained for 10 epochs to classify the generated images. Results are shown in Fig. 4. The model trained over the images generated by pix2pix (solid lines in the figure) is compared to the model trained over the original clean training set (denoted by the dashed lines). Both models are tested over the clean and perturbed versions of the original test sets of the four datasets. Over MNIST and FashionMNIST datasets, GSD performs on par with the original model on clean test images. It is, however, much more robust than the original model against the attacks.

) fast adversarial training by Wong et al. (2020), dubbed FastAT, and free adversarial training by Shafahi et al. (2019), dubbed FreeAT. Wong et al. trained robust models using a much weaker and cheaper adversary to lower the cost of adversarial training. They showed that adversarial training with the FGSM adversary is as effective as PGD-based training. The key idea in Shafahi et al.'s work is to simultaneously update both the model parameters and image perturbations in one backward pass, rather than using separate gradient computations at each update step. Please see also Appx. G.

The same CNN architectures as in Wong et al. are employed here. For FastAT, we trained three models over MNIST (for 10 epochs), FashionMNIST (for 3 epochs), and CIFAR-10 (for 10 epochs & early-stopping) datasets. For FreeAT, we trained models only over CIFAR-10 for 10 epochs.

Results are shown in Table 3. Using shape-based FastAT and over MNIST, robust accuracy against PGD-50 grows from 95.5% (image-only model) to 98.4% (our full model) at ε = 0.1 and from

Figure 5: A) Background subtraction together with edge detection improves robustness (here against the FGSM attack). Noisy data is created by overlaying a digit over white noise (noise×(1-mask)+digit). B) Defending against backdoor attacks. An almost invisible pattern (with intensity 10/255 of the digit intensity) is added to half of the samples from one class, which are then relabeled as another class. Notice that the Edge model is not confused over the edge maps (right panels), since edge detection removes the pattern. In the case of a visible backdoor attack, background subtraction can help discard the irrelevant region. See Appx. I for more details.
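The two data manipulations named in this caption can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names are hypothetical, the mask is assumed to be the digit's nonzero support, "10/255 of the digit intensity" is read as a fraction of each sample's peak value, and the location and size of the backdoor patch are arbitrary choices.

```python
import numpy as np

def overlay_on_noise(digit, mask, rng):
    """Compose noise * (1 - mask) + digit, as in the caption."""
    noise = rng.uniform(0.0, 1.0, size=digit.shape)
    return noise * (1.0 - mask) + digit

def subtract_background(image, mask):
    """Keep only the foreground region selected by the mask."""
    return image * mask

def poison(images, labels, source, target, rng, frac=0.5):
    """Add a faint patch to a fraction of one class and relabel those samples."""
    images, labels = images.copy(), labels.copy()
    idx = np.flatnonzero(labels == source)
    chosen = rng.choice(idx, size=int(len(idx) * frac), replace=False)
    # Pattern intensity: 10/255 of each chosen sample's peak intensity (assumed reading).
    peak = images[chosen].max(axis=(1, 2), keepdims=True)
    images[chosen, :3, :3] += (10 / 255) * peak  # top-left 3x3 patch is an assumption
    labels[chosen] = target
    return images, labels, chosen

rng = np.random.default_rng(0)
digit = np.zeros((28, 28))
digit[10:18, 12:16] = 1.0           # toy "digit"
mask = (digit > 0).astype(float)    # foreground mask

noisy = overlay_on_noise(digit, mask, rng)
recovered = subtract_background(noisy, mask)

images = np.stack([digit] * 4)
labels = np.array([3, 3, 3, 3])
p_images, p_labels, chosen = poison(images, labels, source=3, target=8, rng=rng)
```

With a perfect mask, background subtraction returns the original digit exactly; in practice the mask would have to be estimated.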

Figure 6: Classification accuracy over naturally distorted images.

Figure 8: Adversarial attacks against ResNet152 over the giant panda image using 4 prominent attack types: FGSM (Goodfellow et al., 2015) and PGD-40 (Madry et al., 2017) (α=8/255) for different perturbation budgets ε ∈ {8, 16, 32, 64}, as well as DeepFool (Moosavi-Dezfooli et al., 2016) and Carlini-Wagner (Carlini & Wagner, 2017). The second column in each panel shows the difference (L2) between the original image (not shown) and the adversarial one (values shifted by 128 and clamped). For DF and CW, values are magnified 20x and then shifted. The edge map (using the Canny edge detector) remains almost intact at small perturbations. Notice that edges are better preserved for the PGD-40 attack. See Appx. A for results using the Sobel method.

Figure 9: Same as Fig. 1 in the main text, but using the Sobel edge detector. As can be seen, edge maps are almost invariant to adversarial perturbation.

Figure 10: Illustration of adversarial perturbation over the image as well as its edge map. The first row in each panel shows the clean or adversarial image (under the FGSM attack). The second row shows the perturbed edge map (i.e., the edge channel of the 2D or 4D adversarial input). The third row shows the redetected edge map from the attacked gray or RGB image (i.e., calculated only from the image channels and excluding the edge map itself).

Figure 11: Sample images from the Sketch and Icons-50 datasets, perturbed with FGSM (ε = 8/255), and their corresponding edge maps using Canny edge detection.

Figure 12: Top) Adversarial example generated for the giant panda image using the FGSM attack (Goodfellow et al., 2015). Bottom) Adversarial examples generated for AlexNet from Szegedy et al. (2014). (Left) a correctly predicted sample, (center) the difference between the correct image and the incorrectly predicted image, magnified 10x (values shifted by 128 and clamped), (right) the adversarial example (i.e., left image + middle image). Even though the left and right images appear visually the same to humans, the left images are correctly classified by a DNN classifier while the right images are misclassified as "ostrich, Struthio camelus". Notice that in all of these images the overall image structure and edges are preserved.

Figure 13: A) Classification of a standard ResNet-50 of (a) a texture image (elephant skin: only texture cues); (b) a normal image of a cat (with both shape and texture cues), and (c) an image with a texture-shape cue conflict, generated by style transfer between the first two images, B) Accuracy and example stimuli for five different experiments without cue conflict, and C) Sample images from the Stylized-ImageNet (SIN) dataset created by applying AdaIN style transfer to an ImageNet image (left). Figure compiled from Geirhos et al. (2018).

Figure 15: Classification results based on shape vs. texture. The left-most column shows the image presented to a model. The second column in each row names the object from which the shape was sampled. The third column names the object from which the textured silhouette was obtained. Probabilities assigned to the object name in columns 2 and 3 are shown as percents below the object label. The remaining five columns show the probabilities (as percents) produced by the network for its top five classifications, ordered left to right in terms of probability. Correct shape classifications in the top five are shaded in blue and correct texture classifications are shaded in orange. Figure from Baker et al. (2018).

Figure 16: Comparison of natural (the Orig. column in the tables; solid curves) vs. adversarial training (blue dash-dot curves). The accuracy at ε = 0 (for adversarial training) is averaged over different robust models (three over MNIST and two over others; corresponding to the clean columns in the tables). Left column) Average over MNIST and Fashion MNIST datasets, Right column) Average over all datasets. Results show a clear advantage of using edges. Over MNIST and FashionMNIST, the model trained on edges alone leads to a trade-off between accuracy and robustness. The Img+Edge model does worse than the Image model, but its performance is recovered after adversarial training. The Img2Edge model wins over models using natural training. Please see also the tables in the main text and Appx. B, and the explanation in the main text. Overall, incorporating edge and image together, with redetection at inference time, leads to higher accuracy and robustness.

Figure 17: Breakdown of natural training (the Orig. row in Tables) over datasets.

Figure 18: Top) GSD with a classifier trained on images generated (by pix2pix) only from the edge maps of the clean images, Bottom) GSD with edge maps derived from adversarial examples. Columns from left to right: adversarial images by the FGSM attack, their edge maps, and generated images by pix2pix.

Figure 19: Sample images alongside their corruptions with 5 severity levels.

Figure 20: Edges for images in Fig. 19.

Figure 21: Performance of models against natural image corruptions over the TinyImageNet dataset. Robust models are trained against FGSM attack.

Figure 22: Performance of models against natural image corruptions over the GTSRB dataset.

Figure 23: Performance of models against natural image corruptions over the Icons-50 dataset.

Figure 27: Similar to Fig. 5 in the main text with the difference that here the noise model is trained over the noisy data. Removing the perturbations on the image background (via background subtraction) improves the robustness.

Orig. model | 0/clean | adv. (FGSM) | adv. (PGD-5)
Img2Edge (Img → HED → Classifier_HED) | 0.793 | 0.052 | 0.003
Img2Edge (Img_adv-HED → Canny → Classifier_Canny) |

Figure 30: Two sample adversarial images (FGSM) along with their edge maps using HED and Canny edge detection methods.

Figure 31: Top: our pipeline to approximate the Canny edge detector and our approach for crafting adversarial examples, Bottom: Sample digits and their generated edge maps.

Algorithm 1 Edge-guided adversarial training (EAT) for T epochs, perturbation budget ε, and loss balance ratio α, over a dataset of size M for a network f_θ (performed in minibatches in practice). β ∈ {edge, img, imgedge} indicates network type and redetect_train means edge redetection during training.


Each of these datasets contains 60K training images (resolution 28×28) and 10K test images over 10 classes. The third dataset, DogVsCat (footnote 2), contains 18,085 training and 8,204 test images. Images in this dataset are of varying dimensions. They are resized here to 150×150 pixels to save computation. A CNN with 4 convolution, 4 pooling, and 2 fc layers is trained from scratch.

Results (Top-1 acc.) over MNIST. The best accuracy in each column is highlighted in bold. In italics are the results of the substitute attack. Epsilon values are over 255. We used the ℓ∞ variants of FGSM and PGD. Img2Edge means applying the Edge model (first row) to the edge map of the image.

Results over the CIFAR-10 dataset.

Performance of edge-augmented FastAT and FreeAT adversarial defenses over clean and perturbed images (see Appx. G for extended algorithms). FastAT is trained with the FGSM adversary (ε = 0.1 or ε = 0.3 over MNIST and FashionMNIST datasets, and ε = 8/255 over CIFAR-10). FreeAT is trained over CIFAR-10 with ε = 8/255 and 8 minibatch replays. CIFAR-10 results are averaged over 3 runs (Appx. G). PGD attacks use 10 random restarts. The remaining settings and parameters are as in Wong et al. (2020).

Results over the Fashion MNIST dataset (*)

Results over the TinyImageNet dataset (*)

Results on CIFAR-10 dataset [edge map computed from 4 channels]

Results on CIFAR dataset using Sobel edge detection [edge map computed from 4 channels]

Results on DogVsCat dataset [edge map computed from 4 channels] (*)

Results on DogBreeds dataset using Sobel edge detection [edge map computed from 4 channels] (*)

Results on DogBreeds dataset using Sobel edge detection [edge map computed from 3 channels]

Results on GTSRB dataset [edge map computed from 4 channels] (*)

Results on GTSRB dataset [edge map computed from 3 channels]

Results on Icons-50 dataset [edge map computed from 4 channels] (*)

Results on Icons-50 dataset [edge map computed from 3 channels]

Results on Sketch dataset [edge map computed from 2 channels] (*)

Results on Sketch dataset [edge map computed from 1 channel]

Results on Imagenette2-160 dataset [edge map computed from 4 channels] (*)

Results on Imagenette2-160 dataset [edge map computed from 3 channels]

Results (Top-1 acc.) over MNIST corresponding to α = 0 (i.e., only adversarial examples take part in the loss function during adversarial training). See also Table 1 in the main text.

Results (Top-1 acc.) over Fashion MNIST corresponding to α = 0 (i.e., only adversarial examples take part in the loss function during adversarial training). See also Table 4 in the main text.
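One way to read the loss balance ratio α is as a convex weight between the clean and adversarial losses. The form below is an assumption, chosen only to be consistent with the statement that adversarial examples alone enter the loss at α = 0; here $\tilde{x}_i$ denotes the adversarial counterpart of $x_i$ and $\ell$ the classification loss.

```latex
\mathcal{L}_i(\theta) \;=\; \alpha \,\ell\big(f_\theta(x_i),\, y_i\big) \;+\; (1-\alpha)\,\ell\big(f_\theta(\tilde{x}_i),\, y_i\big)
```

Under this reading, α = 0 reduces to training purely on adversarial examples, and α = 1 to natural training.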

Masking channels over MNIST dataset

Masking channels over Fashion MNIST dataset
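The channel-masking analysis behind these tables amounts to zeroing one group of input channels and re-measuring accuracy. The sketch below assumes an (N, C, H, W) batch whose last channel is the edge map; `mask_channels`, `accuracy`, and the toy `predict` callable are hypothetical stand-ins, not the models evaluated in the paper.

```python
import numpy as np

def mask_channels(batch, which):
    """Zero the image channels or the edge channel of an (N, C, H, W) batch.

    Assumes the edge map is stored as the last channel.
    """
    out = batch.copy()
    if which == "image":
        out[:, :-1] = 0.0
    elif which == "edge":
        out[:, -1] = 0.0
    else:
        raise ValueError("which must be 'image' or 'edge'")
    return out

def accuracy(predict, batch, labels):
    """Top-1 accuracy of a callable that maps a batch to predicted labels."""
    return float(np.mean(predict(batch) == labels))

# Toy usage: a stand-in "model" that only looks at the edge channel.
rng = np.random.default_rng(0)
batch = rng.uniform(0.1, 1.0, size=(6, 2, 4, 4))  # Gray+Edge inputs
labels = np.ones(6, dtype=int)
predict = lambda x: (x[:, -1].sum(axis=(1, 2)) > 0).astype(int)

acc_full = accuracy(predict, batch, labels)
acc_no_image = accuracy(predict, mask_channels(batch, "image"), labels)
acc_no_edge = accuracy(predict, mask_channels(batch, "edge"), labels)
```

A model that relies on shape, like this toy predictor, loses accuracy only when the edge channel is masked, which is the pattern reported for the robust models.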

recompute and replace the edge map
end if
$\theta = \theta - \nabla_\theta \ell(f_\theta(\tilde{x}_i), y_i)$ // Update model weights with some optimizer, e.g. SGD
end for
end for

Performance of the Fast Adversarial Training (FastAT) method over three runs.

Performance of the Free Adversarial Training (FreeAT) method over three runs.
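The update whose final lines appear above (random start, one signed-gradient step, projection onto the ε-ball, optional edge redetection, weight update) can be sketched for a single sample. Everything model-specific here is an assumption made to keep the example self-contained: a logistic-regression "network" with an analytic input gradient replaces the CNN, a thresholded finite-difference gradient replaces Canny in `detect_edge`, and the values of `eps`, `alpha`, and `lr` are arbitrary.

```python
import numpy as np

def detect_edge(gray, thresh=0.25):
    """Hypothetical stand-in for Canny: thresholded gradient magnitude."""
    gy, gx = np.gradient(gray)
    return (np.hypot(gx, gy) > thresh).astype(gray.dtype)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fast_at_step(w, b, x, y, eps, alpha, lr, rng, redetect=True):
    """One shape-based FastAT-style update on a (2, H, W) gray+edge input."""
    # Random start inside the eps-ball.
    delta = rng.uniform(-eps, eps, size=x.shape)
    # Input gradient of the logistic loss (analytic for this toy model).
    p = sigmoid(np.sum(w * (x + delta)) + b)
    grad_x = (p - y) * w
    # Single signed-gradient step, then projection back onto the eps-ball.
    delta = np.clip(delta + alpha * np.sign(grad_x), -eps, eps)
    x_adv = x + delta
    if redetect:
        # Recompute the edge channel from the perturbed image channel and replace it.
        x_adv[1] = detect_edge(x_adv[0])
    # Plain gradient step on the parameters, using the adversarial input.
    p = sigmoid(np.sum(w * x_adv) + b)
    w_new = w - lr * (p - y) * x_adv
    b_new = b - lr * (p - y)
    return w_new, b_new, x_adv, delta

rng = np.random.default_rng(0)
gray = np.zeros((8, 8))
gray[2:6, 2:6] = 1.0
x = np.stack([gray, detect_edge(gray)])  # channel 0: image, channel 1: edge map
w0 = rng.normal(scale=0.1, size=x.shape)
eps = 0.1
w1, b1, x_adv, delta = fast_at_step(w0, 0.0, x, y=1.0, eps=eps,
                                    alpha=0.125, lr=0.01, rng=rng)
```

With `redetect=True` the attacker's changes to the edge channel are discarded, so the classifier only ever sees an edge map computed from the (perturbed) image channel.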

We use the PyTorch implementation (footnote 6) of the HED edge detector proposed by Xie & Tu (2015). Here, a classifier is first trained on top of the edge maps from HED. Then, the entire pipeline (Img → HED → Classifier_HED) is attacked to generate an adversarial image. The performance of this classifier is measured on both clean and adversarial images. The adversarial image is also fed to the classifier trained on Canny edge maps (Img_adv-HED → Canny → Classifier_Canny). Results are shown in the table below. As can be seen, adversarial examples crafted for HED fail to completely fool the model trained on Canny edges (i.e., they do not transfer).

Results over 500 images from the Imagenette2-160 dataset against the FGSM and PGD-5 (ε = 8/255) attacks.

E RESULTS OF THE SUBSTITUTE MODEL ATTACK

The FGSM attack is used here. We find that background subtraction together with edge detection improves robustness.

J ROBUSTNESS AGAINST THE CW ATTACK OVER MNIST DATASET

Performance of the EAT defense against the ℓ2 Carlini-Wagner attack (Carlini & Wagner, 2017) with the following parameters: 

