TOWARDS NATURAL ROBUSTNESS AGAINST ADVERSARIAL EXAMPLES

Abstract

Recent studies have shown that deep neural networks are vulnerable to adversarial examples, and most of the methods proposed to defend against adversarial examples cannot solve this problem fundamentally. In this paper, we theoretically prove that there is an upper bound for neural networks with identity mappings that constrains the error caused by adversarial noise. However, in actual computation, this upper bound no longer holds for such networks, so they remain susceptible to adversarial examples. Following a similar procedure, we explain why adversarial examples can fool other deep neural networks with skip connections. Furthermore, we demonstrate that a new family of deep neural networks, Neural ODEs (Chen et al., 2018), satisfies a weaker upper bound. This weaker upper bound keeps the change in the output from growing too large, and thus Neural ODEs have natural robustness against adversarial examples. We evaluate the performance of Neural ODEs against ResNet under three white-box adversarial attacks (FGSM, PGD, DI²-FGSM) and one black-box adversarial attack (Boundary Attack). Finally, we show that the natural robustness of Neural ODEs is even better than the robustness of neural networks trained with adversarial training methods such as TRADES and YOPO.

1. INTRODUCTION

Deep neural networks have made great progress in numerous domains of machine learning, especially in computer vision. But Szegedy et al. (2013) found that most existing state-of-the-art neural networks are easily fooled by adversarial examples generated by adding only very small perturbations to the input images. Since realizing the instability of deep neural networks, researchers have proposed different kinds of methods to defend against adversarial examples, such as adversarial training (Goodfellow et al., 2014), data compression (Dziugaite et al., 2016), and distillation defense (Papernot et al., 2016). But each of these methods is a remedy for the original problem, and none of them solves it fundamentally. For example, Moosavi-Dezfooli et al. (2016) showed that no matter how many adversarial examples are added to the training set, there are always new adversarial examples that can successfully attack the adversarially trained deep neural network. So, avoiding adversarial examples technically cannot answer the most essential question: why can such subtle changes in adversarial examples beat deep neural networks? Meanwhile, it leads to a more important question: how can deep neural networks gain natural robustness, so that they can get rid of malicious adversarial examples?

Early explanations for adversarial examples held that a smoothness prior, which is typically valid for kernel methods, implies that imperceptibly tiny perturbations of a given image do not normally change the underlying class, while this smoothness assumption does not hold for deep neural networks because of their high non-linearity (Szegedy et al., 2013). This analysis applies to plain deep neural networks like AlexNet (Krizhevsky et al., 2012). Later, however, Goodfellow et al. (2014) claimed that adversarial examples are a result of models being too linear rather than too non-linear, and that they can be explained as a property of high-dimensional dot products. Unfortunately, both of these explanations seem to imply that adversarial examples are inevitable for deep neural networks. On the other hand, we notice that skip connections have been widely used in deep neural networks since the appearance of Highway Network (Srivastava et al., 2015) and ResNet (He et al., 2016). It turns out that the identity mapping in ResNet is formally equivalent to one step of Euler's method¹.
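The equivalence between a ResNet identity mapping and one step of Euler's method can be sketched numerically. In the snippet below, `f` is a hypothetical stand-in for a residual branch F(x) (a fixed tanh-of-linear map, chosen only for illustration); the point is that the residual update x + F(x) coincides with a forward-Euler step x + h·f(x) for the ODE dx/dt = f(x) with step size h = 1.

```python
import numpy as np

def f(x):
    # Hypothetical residual branch F(x): tanh of a fixed linear map.
    W = np.array([[0.5, -0.2],
                  [0.1,  0.3]])
    return np.tanh(W @ x)

def resnet_block(x):
    # ResNet identity mapping: x_{l+1} = x_l + F(x_l)
    return x + f(x)

def euler_step(x, h):
    # One forward-Euler step for dx/dt = f(x): x_{t+h} = x_t + h * f(x_t)
    return x + h * f(x)

x0 = np.array([1.0, -1.0])
# With step size h = 1, the Euler step coincides exactly with the residual block.
assert np.allclose(resnet_block(x0), euler_step(x0, h=1.0))
```

A Neural ODE replaces this single unit-size step with an adaptive solver integrating dx/dt = f(x) over a continuous interval, which is the viewpoint the analysis in this paper builds on.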

