

Abstract

Robustness of convolutional neural networks (CNNs) has gained in importance on account of adversarial examples, i.e., inputs carrying well-designed perturbations that are imperceptible to humans but can cause the model to predict incorrectly. Recent research suggests that the noise in adversarial examples breaks the textural structure, which eventually leads to wrong predictions. To mitigate the threat of such adversarial attacks, we propose defective convolutional networks, which make predictions rely less on textural information and more on shape information by properly integrating defective convolutional layers into standard CNNs. The defective convolutional layers contain defective neurons whose activations are set to be a constant function. As defective neurons contain no information and differ greatly from the standard neurons in their spatial neighborhood, textural features cannot be accurately extracted, so the model has to seek other features for classification, such as shape. We show extensive evidence to justify our proposal and demonstrate that defective CNNs can defend against black-box attacks better than standard CNNs. In particular, they achieve state-of-the-art performance against transfer-based attacks without any adversarial training being applied.

1. INTRODUCTION

Deep learning (LeCun et al., 1998; 2015), especially deep Convolutional Neural Networks (CNNs) (Krizhevsky et al., 2012), has led to state-of-the-art results spanning many machine learning fields (Girshick, 2015; Chen et al., 2018; Luo et al., 2020). Despite this great success in numerous applications, recent studies show that deep CNNs are vulnerable to certain well-designed input samples called adversarial examples (Szegedy et al., 2013; Biggio et al., 2013). Taking image classification as an example: for almost every commonly used, well-performing CNN, attackers are able to construct a small perturbation of an input image that is almost imperceptible to humans but makes the model give a wrong prediction. The problem is serious because some well-designed adversarial examples transfer among different kinds of CNN architectures (Papernot et al., 2016b). As a result, a machine learning system can be attacked even if the attacker has no access to the model parameters, which seriously affects its use in practical applications. There is a rapidly growing body of work on how to obtain a robust CNN, mainly based on adversarial training (Szegedy et al., 2013; Goodfellow et al., 2015; Madry et al., 2017; Buckman et al., 2018; Mao et al., 2019). However, those methods need a large amount of extra computation to obtain adversarial examples at each training step and tend to overfit the attack method used in training (Buckman et al., 2018). In this paper, we tackle the problem from a perspective different from most existing methods. In particular, we explore the possibility of designing new CNN architectures that can be trained with standard optimization methods on standard benchmark datasets and are robust by themselves, without appealing to other techniques. Recent studies (Geirhos et al., 2017; 2018; Baker et al., 2018; Brendel & Bethge, 2019) show that the predictions of standard CNNs mainly depend on the texture of objects.
However, textural information has a high degree of redundancy and can easily be injected with adversarial noise (Yang et al., 2019; Hosseini et al., 2019). Also, Cao et al. (2020) and Das et al. (2020) find that adversarial attack methods may perturb local patches so that they contain textural features of incorrect classes. All of this literature suggests that the wrong predictions of CNNs on adversarial examples mainly come from changes in textural information: the small perturbation of an adversarial example changes the textures and eventually affects the features extracted by the CNN. Therefore, a natural way to avoid adversarial examples is to let the CNN make predictions that rely less on textures and more on other information, such as shape, which cannot be severely distorted by small perturbations.
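The following is an added illustration (not code from the paper) of the defective convolutional layer described in the abstract: a single-channel valid-padding convolution whose output is combined with a fixed binary defect mask, so that defective neurons output a constant no matter what the input is. It assumes the constant is 0, a defect rate of 0.5, and a seeded random mask that stays fixed for the life of the layer; the image, perturbation and Sobel-style edge kernel are constructed here for demonstration.

```python
import random

def conv2d_valid(image, kernel):
    """Plain 2-D cross-correlation, 'valid' padding, one channel, no bias."""
    h, w, kh, kw = len(image), len(image[0]), len(kernel), len(kernel[0])
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(w - kw + 1)] for i in range(h - kh + 1)]

def make_defect_mask(out_h, out_w, defect_rate, seed):
    """Fixed binary mask: 0 marks a defective neuron, 1 a standard one.
    Seeding makes the defect pattern a fixed property of the layer (assumption)."""
    rng = random.Random(seed)
    return [[0 if rng.random() < defect_rate else 1 for _ in range(out_w)]
            for _ in range(out_h)]

def defective_conv(image, kernel, mask, constant=0.0):
    """Defective convolutional layer: standard neurons keep their activation,
    defective neurons output `constant` regardless of the input."""
    act = conv2d_valid(image, kernel)
    return [[act[i][j] if mask[i][j] else constant
             for j in range(len(act[0]))] for i in range(len(act))]

# Constructed 6x6 image and a small column-striped perturbation of it.
IMAGE = [[float((i * 7 + j * 3) % 5) for j in range(6)] for i in range(6)]
PERTURBED = [[v + 0.1 * (j % 3 - 1) for j, v in enumerate(row)] for row in IMAGE]
EDGE_KERNEL = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # Sobel-x: an edge/texture detector
MASK = make_defect_mask(4, 4, defect_rate=0.5, seed=0)

def defective_positions_are_constant(mask=MASK, constant=0.0):
    """True if every defective neuron emits `constant` for clean and perturbed input."""
    a = defective_conv(IMAGE, EDGE_KERNEL, mask, constant)
    b = defective_conv(PERTURBED, EDGE_KERNEL, mask, constant)
    return all(a[i][j] == constant and b[i][j] == constant
               for i in range(len(mask)) for j in range(len(mask[0])) if not mask[i][j])

def fraction_of_changed_neurons(mask=MASK):
    """Share of output neurons whose activation the perturbation altered."""
    a = defective_conv(IMAGE, EDGE_KERNEL, mask)
    b = defective_conv(PERTURBED, EDGE_KERNEL, mask)
    cells = [(i, j) for i in range(len(mask)) for j in range(len(mask[0]))]
    return sum(abs(a[i][j] - b[i][j]) > 1e-9 for i, j in cells) / len(cells)
```

With an all-ones mask every Sobel output shifts under the perturbation, whereas with the defect mask only the standard neurons do, which is the mechanism by which the textural signal reaching later layers is reduced.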

