TOWARDS NATURAL ROBUSTNESS AGAINST ADVERSARIAL EXAMPLES

Abstract

Recent studies have shown that deep neural networks are vulnerable to adversarial examples, but most of the methods proposed to defend against adversarial examples cannot solve this problem fundamentally. In this paper, we theoretically prove that there is an upper bound for neural networks with identity mappings that constrains the error caused by adversarial noise. However, in actual computations, this kind of neural network no longer holds any upper bound and is therefore susceptible to adversarial examples. Following a similar procedure, we explain why adversarial examples can fool other deep neural networks with skip connections. Furthermore, we demonstrate that a new family of deep neural networks called Neural ODEs (Chen et al., 2018) holds a weaker upper bound. This weaker upper bound prevents the amount of change in the result from becoming too large. Thus, Neural ODEs have natural robustness against adversarial examples. We evaluate the performance of Neural ODEs compared with ResNet under three white-box adversarial attacks (FGSM, PGD, DI²-FGSM) and one black-box adversarial attack (Boundary Attack). Finally, we show that the natural robustness of Neural ODEs is even better than the robustness of neural networks trained with adversarial training methods, such as TRADES and YOPO.

1. INTRODUCTION

Deep neural networks have made great progress in numerous domains of machine learning, especially in computer vision. However, Szegedy et al. (2013) found that most state-of-the-art neural networks are easily fooled by adversarial examples generated by adding only very small perturbations to the input images. Since realizing the instability of deep neural networks, researchers have proposed various methods to defend against adversarial examples, such as adversarial training (Goodfellow et al., 2014), data compression (Dziugaite et al., 2016), and distillation defense (Papernot et al., 2016). But each of these methods is a remedy for the original problem, and none of them solves it fundamentally. For example, Moosavi-Dezfooli et al. (2016) showed that no matter how many adversarial examples are added to the training set, new adversarial examples can still successfully attack the adversarially trained network. So avoiding adversarial examples technically cannot answer the most essential question: why can such subtle changes in adversarial examples beat deep neural networks? It also leads to a more important question: how can deep neural networks gain natural robustness so that they are free from malicious adversarial examples?

Early explanations for adversarial examples held that a smoothness prior, typically valid for kernel methods, should also apply to images: imperceptibly tiny perturbations of a given image do not normally change the underlying class. This smoothness assumption does not hold for deep neural networks because of their high non-linearity (Szegedy et al., 2013). This analysis applies to plain deep neural networks such as AlexNet (Krizhevsky et al., 2012). Later, however, Goodfellow et al. (2014) claimed that adversarial examples are a result of models being too linear rather than too non-linear, and that they can be explained as a property of high-dimensional dot products. Unfortunately, both explanations seem to imply that adversarial examples are inevitable for deep neural networks. On the other hand, we notice that skip connections have been widely used in deep neural networks since the appearance of Highway Networks (Srivastava et al., 2015) and ResNet (He et al., 2016). It turns out that the identity mapping in ResNet is formally equivalent to one step of Euler's method for solving ordinary differential equations (Weinan, 2017). More than that, other kinds of skip connections used by different network architectures can be regarded as different numerical methods for solving ordinary differential equations. This link between numerical ordinary differential equations and deep neural networks offers a whole new perspective for explaining adversarial examples through numerical stability analysis.

In this paper, we attempt to exploit this natural property of neural networks to defend against adversarial examples. We first analyze how adversarial examples affect the output of neural networks with identity mappings, obtain an upper bound for this kind of network, and find that this upper bound is impractical in actual computations. In the same way, we figure out why adversarial examples can fool commonly used deep neural networks with skip connections. Then, we demonstrate that Neural ODEs hold a weaker upper bound and verify the natural robustness of Neural ODEs under four types of perturbations. Finally, we compare Neural ODEs with three types of adversarial training methods to show that the natural robustness of Neural ODEs is better than the robustness of networks trained with adversarial training. The main contributions of our work are as follows:

• We introduce and formalize numerical stability analysis for deep neural networks with identity mappings, and prove that there is an upper bound for such networks that constrains the error caused by adversarial noise.

• We provide a new reason why commonly used deep neural networks with skip connections cannot resist adversarial examples.

• We demonstrate that Neural ODEs hold a weaker upper bound which prevents the amount of change in the result from becoming too large. Comparing with ResNet and three types of adversarial training methods, we show the natural robustness of Neural ODEs.

2. RELATED WORK

Madry et al. (2017) treat adversarial training as a saddle point optimization problem. Zhang et al. (2019a) cast adversarial training as a discrete-time differential game. Adversarial training can be seen as a form of data augmentation that particularly enhances robustness to white-box attacks (Tramèr et al., 2017). Zantedeschi et al. (2017) augmented training sets with examples perturbed by Gaussian noise, which can also enhance robustness to black-box attacks. Lee et al. (2017) proposed a novel adversarial training method using a generative adversarial network framework. Besides, Finlay et al. (2018) augmented adversarial training with worst-case adversarial training, which improves adversarial robustness in the ℓ2 norm on CIFAR-10. In addition to augmenting datasets or modifying the original neural networks, there exist adversarial defense methods that rely on external models and on detecting adversarial examples. Akhtar et al. (2018) presented the Perturbation Rectifying Network (PRN) as 'pre-input' layers to a targeted model; if a perturbation is detected, the output of the PRN is used for label prediction instead of the actual image. Xu et al. (2017) proposed a strategy called feature squeezing that reduces the search space available to an adversary by coalescing samples that correspond to many different feature vectors in the original space into a single sample.
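The correspondence between identity mappings and Euler's method that underlies our analysis can be written out explicitly. The following is a standard derivation; the step size h = 1 and the notation are our illustrative choices, not necessarily those used elsewhere in the paper:

```latex
\begin{align}
  x_{t+1} &= x_t + f(x_t, \theta_t)
    && \text{(ResNet block with identity mapping)} \\
  x_{t+1} &= x_t + h\, f(x_t, \theta_t), \quad h = 1
    && \text{(one explicit Euler step)} \\
  \frac{\mathrm{d}x(t)}{\mathrm{d}t} &= f(x(t), \theta)
    && \text{(Neural ODE dynamics)} \\
  x(T) &= x(0) + \int_0^T f(x(t), \theta)\,\mathrm{d}t
    && \text{(Neural ODE output)}
\end{align}
```

In other words, a stack of residual blocks is a fixed coarse discretization of the continuous dynamics, while a Neural ODE integrates the same dynamics with a numerical solver whose step size is not tied to the network depth.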

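To make the white-box attacks concrete, FGSM (Goodfellow et al., 2014) takes a single step in the direction of the sign of the loss gradient with respect to the input. Below is a minimal, self-contained sketch on a toy logistic-regression classifier; the model, data, and ε are illustrative choices and not the paper's experimental setup:

```python
import numpy as np

def fgsm(x, y, w, b, eps):
    """One-step FGSM: x_adv = x + eps * sign(grad_x loss).

    Uses the analytic input gradient of the cross-entropy loss for a
    linear classifier p = sigmoid(w.x + b), with label y in {0, 1}.
    """
    p = 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))  # predicted probability
    grad_x = (p - y) * w                            # d(loss)/dx for logistic loss
    return x + eps * np.sign(grad_x)

# Toy example: a point correctly classified as class 1 (score w.x + b = 0.8 > 0).
w = np.array([2.0, -1.0])
b = 0.0
x = np.array([0.5, 0.2])
x_adv = fgsm(x, y=1, w=w, b=b, eps=0.5)
```

Here the single sign-gradient step with ε = 0.5 moves the input to (0.0, 0.7), so the score w·x_adv + b changes sign and the predicted class flips, mirroring how FGSM exploits models that behave near-linearly around the input.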

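The intuition that a coarse discretization (ResNet-like, step size 1) can amplify an input perturbation while a fine ODE solve keeps it damped can be illustrated numerically. The sketch below is our own toy demonstration of classical Euler stability, not the paper's proof; the dynamics f(x) = -3x, horizon, and step counts are arbitrary illustrative choices:

```python
import numpy as np

LAM = 3.0  # decay rate of the toy dynamics (illustrative choice)

def f(x):
    """Stable linear dynamics dx/dt = -LAM * x; the exact flow damps all inputs."""
    return -LAM * x

def integrate(x0, t_end=2.0, n_steps=2):
    """Explicit Euler with n_steps steps over [0, t_end].

    n_steps=2 gives step size h = 1, mimicking two ResNet-style residual
    blocks; a large n_steps mimics a fine Neural-ODE-style solve.
    """
    h = t_end / n_steps
    x = x0
    for _ in range(n_steps):
        x = x + h * f(x)
    return x

x0, delta = 1.0, 0.01  # clean input and a small adversarial-style perturbation

# Coarse solve: h = 1 lies outside Euler's stability region |1 - h*LAM| <= 1,
# so the perturbation is amplified by |1 - 3|^2 = 4.
err_coarse = abs(integrate(x0 + delta, n_steps=2) - integrate(x0, n_steps=2))

# Fine solve: h = 0.01 lies inside the stability region, so the perturbation
# is damped toward the exact factor e^(-LAM * t_end).
err_fine = abs(integrate(x0 + delta, n_steps=200) - integrate(x0, n_steps=200))
```

With these numbers, err_coarse = 0.04 (four times the input perturbation) while err_fine is on the order of 1e-5, matching the paper's theme: the same dynamics are stable or unstable depending on how they are discretized.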