

Abstract

Robustness of convolutional neural networks (CNNs) has gained importance on account of adversarial examples, i.e., inputs to which well-designed perturbations, imperceptible to humans, are added so as to cause the model to predict incorrectly. Recent research suggests that the noise in adversarial examples breaks the textural structure of the input, which eventually leads to wrong predictions. To mitigate the threat of such attacks, we propose defective convolutional networks, which make predictions rely less on textural information and more on shape information by properly integrating defective convolutional layers into standard CNNs. A defective convolutional layer contains defective neurons whose activations are fixed to a constant. Since defective neurons carry no information and differ sharply from the standard neurons in their spatial neighborhoods, textural features cannot be accurately extracted, so the model must seek other features, such as shape, for classification. We present extensive evidence to justify our proposal and demonstrate that defective CNNs defend against black-box attacks better than standard CNNs. In particular, they achieve state-of-the-art performance against transfer-based attacks without any adversarial training being applied.

1. INTRODUCTION

Deep learning (LeCun et al., 1998; 2015), especially the deep convolutional neural network (CNN) (Krizhevsky et al., 2012), has led to state-of-the-art results across many machine learning fields (Girshick, 2015; Chen et al., 2018; Luo et al., 2020). Despite this great success in numerous applications, recent studies show that deep CNNs are vulnerable to well-designed input samples known as adversarial examples (Szegedy et al., 2013; Biggio et al., 2013). Take image classification as an example: for almost every commonly used, well-performing CNN, attackers can construct a small perturbation of an input image that is almost imperceptible to humans but causes the model to give a wrong prediction. The problem is serious because some well-designed adversarial examples transfer across different CNN architectures (Papernot et al., 2016b). As a result, a machine learning system can be attacked even if the attacker has no access to the model parameters, which seriously affects its use in practical applications.

There is a rapidly growing body of work on obtaining robust CNNs, mainly based on adversarial training (Szegedy et al., 2013; Goodfellow et al., 2015; Madry et al., 2017; Buckman et al., 2018; Mao et al., 2019). However, these methods require substantial extra computation to generate adversarial examples at each training step and tend to overfit the attack method used during training (Buckman et al., 2018). In this paper, we tackle the problem from a perspective different from most existing methods. In particular, we explore the possibility of designing new CNN architectures that can be trained with standard optimization methods on standard benchmark datasets and enjoy robustness by themselves, without appealing to other techniques. Recent studies (Geirhos et al., 2017; 2018; Baker et al., 2018; Brendel & Bethge, 2019) show that the predictions of standard CNNs depend mainly on the texture of objects.
However, textural information is highly redundant and may be easily injected with adversarial noise (Yang et al., 2019; Hosseini et al., 2019). Also, Cao et al. (2020) and Das et al. (2020) find that adversarial attack methods may perturb local patches so that they contain textural features of incorrect classes. This literature suggests that the wrong predictions of CNNs on adversarial examples come mainly from changes in textural information: the small perturbation of an adversarial example alters the textures and eventually affects the features extracted by the CNN. Therefore, a natural way to avoid adversarial examples is to let the CNN make predictions that rely less on textures and more on other information, such as shape, which cannot be severely distorted by small perturbations.

In practice, a camera may have mechanical failures that leave the output image with many defective pixels (pixels that are always black in every image). Nonetheless, humans can still recognize objects in an image with defective pixels, since we can classify objects even in the absence of local textural information. Motivated by this, we introduce the concept of defectiveness into convolutional neural networks: we call a neuron a defective neuron if its output is fixed to zero no matter what input signal it receives; similarly, a convolutional layer is a defective convolutional layer if it contains defective neurons. Before training, we replace the standard convolutional layers of a standard CNN with their defective versions and then train the network in the standard way. As the defective neurons of a defective convolutional layer carry no information and differ sharply from their spatial neighbors, textural information cannot be accurately propagated from the bottom defective layers to the top layers.
Therefore, we destroy local textural information to a certain extent and prompt the neural network to rely more on other information for classification. We call an architecture deployed with defective convolutional layers a defective convolutional network. We find that applying defective convolutional layers to the bottom layers of the network and introducing varied patterns of defective-neuron arrangement across channels are both critical. In summary, our main contributions are:

• We propose defective CNNs and present four pieces of empirical evidence that, compared to standard CNNs, defective ones rely less on textures and more on shapes of the inputs when making predictions.

• Experiments show that defective CNNs have superior defense performance to standard CNNs against transfer-based attacks, decision-based attacks, and additive Gaussian noise.

• Using the standard training method, defective CNNs achieve state-of-the-art results against two transfer-based black-box attacks while maintaining high accuracy on clean test data.

• Through a proper implementation, defective CNNs can save substantial computation and storage costs, and may thus lead to a practical solution in the real world.
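The core operation described above, zeroing a fixed subset of a convolutional layer's output neurons, can be sketched in a few lines. The NumPy snippet below is an illustrative sketch, not the paper's implementation: the function names, the keep probability, and the use of a randomly sampled binary mask are our assumptions. The key property is that the mask is sampled once and then frozen, so defective neurons output zero for every input (unlike dropout, which resamples its mask on each forward pass).

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(x, kernel):
    """Naive single-channel 'valid' 2D convolution (cross-correlation)."""
    kh, kw = kernel.shape
    h, w = x.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def defective_conv2d(x, kernel, keep_prob=0.7, seed=0):
    """Convolution whose output passes through a fixed binary mask.

    The mask is determined by a fixed seed, i.e., chosen once before
    training and frozen: masked (defective) neurons output 0 for every
    input, whereas dropout would resample the mask at each forward pass.
    """
    out = conv2d(x, kernel)
    mask_rng = np.random.default_rng(seed)  # fixed seed -> fixed mask
    mask = (mask_rng.random(out.shape) < keep_prob).astype(out.dtype)
    return out * mask, mask

x = rng.random((8, 8))
k = rng.random((3, 3))
y, mask = defective_conv2d(x, k)
# Defective neurons are exactly zero regardless of the input.
assert np.all(y[mask == 0] == 0)
```

In a multi-channel layer, each channel could carry its own mask (its own `seed`), which matches the idea of varied defective-neuron patterns across channels.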

2. RELATED WORK

Various methods have been proposed to defend against adversarial examples. One line of research derives a meaningful optimization objective and optimizes the model by adversarial training (Szegedy et al., 2013; Goodfellow et al., 2015; Huang et al., 2015; Madry et al., 2017; Buckman et al., 2018; Mao et al., 2019). The high-level idea of these works is that if we can predict the potential attack on the model during optimization, then we can assign the attacked sample its correct label and use it during training. Another line of research adjusts the input image before passing it through the deep neural network (Liao et al., 2017; Song et al., 2017; Samangouei et al., 2018; Sun et al., 2018; Xie et al., 2019; Yuan & He, 2020). The basic intuition is that if the adversarial perturbation can be removed to a certain extent, then such attacks can be defended against. Although these methods achieve some success, a major difficulty is that they incur a large extra cost in collecting adversarial examples and are hard to apply to large-scale datasets.

Several studies (Geirhos et al., 2017; 2018; Baker et al., 2018; Brendel & Bethge, 2019) show that the predictions of CNNs come mainly from the texture of objects rather than their shape. Also, Cao et al. (2020) and Das et al. (2020) found that adversarial examples usually perturb a patch of the original image so that it contains textural features of an incorrect class. For example, the adversarial example of a panda image is misclassified as a monkey because a patch of the panda's skin is perturbed adversarially so that it alone looks like the face of a monkey (see Figure 11 in Cao et al. (2020)). All the works above suggest that CNNs learn textural information more than shape and that adversarial attacks may come from textural-level perturbations. This is also related to robust features (Tsipras et al., 2018; Ilyas et al., 2019; Hosseini et al., 2019; Yang et al., 2019), which have attracted growing interest recently. Pixels that encode textural information contain high redundancy and may easily be shifted toward the distribution of incorrect classes. In contrast, shape information is more compact and may therefore serve as a more robust feature for prediction.
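As a concrete example of how such adversarial perturbations are constructed, the Fast Gradient Sign Method (FGSM) of Goodfellow et al. (2015), cited above, perturbs an input in the direction of the sign of the loss gradient with respect to that input. The sketch below applies it to a toy logistic-regression "surrogate" model in NumPy; the model, dimensions, and epsilon are illustrative assumptions, not from the paper, and the input gradient is written in its closed form for this model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy surrogate model: logistic regression on a flattened "image".
d = 16
w = rng.normal(size=d)
b = 0.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, eps=0.1):
    """Fast Gradient Sign Method (Goodfellow et al., 2015).

    Moves x by eps * sign(d loss / d x). For logistic regression with
    cross-entropy loss, the input gradient is (p - y) * w in closed form.
    """
    p = sigmoid(w @ x + b)
    grad_x = (p - y) * w                       # d(cross-entropy)/dx
    return np.clip(x + eps * np.sign(grad_x), 0.0, 1.0)  # stay in pixel range

x = rng.random(d)       # "pixels" in [0, 1]
y = 1.0                 # true label
x_adv = fgsm(x, y)
# The perturbation is bounded in L-infinity norm by eps,
# i.e., "almost imperceptible" for small eps.
assert np.max(np.abs(x_adv - x)) <= 0.1 + 1e-12
```

Such an example, crafted on a surrogate model, can then be fed to a different target model; its success there is exactly the transferability phenomenon discussed above.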



Footnote: In this paper, a bottom layer is a layer close to the input, and a top layer is a layer close to the output prediction.

