ROBUSTNESS EVALUATION USING LOCAL SUBSTITUTE NETWORKS

Abstract

Robustness of a neural network against adversarial examples is an important topic when a deep classifier is applied in safety-critical use cases like health care or autonomous driving. To assess robustness, practitioners use a range of tools, from adversarial attacks to exact computation of the distance to the decision boundary. We exploit the fact that robustness of a neural network is a local property and empirically show that computing the same metrics on smaller local substitute networks yields good estimates of the robustness at lower cost. To construct the substitute network, we develop two pruning techniques that preserve the local properties of the initial network around a given anchor point. Our experiments on the MNIST dataset demonstrate that this approach saves a significant amount of computing time and is especially beneficial for larger models.

Figure 1 panels: (a) Global, before pruning; (b) Local, before pruning; (c) Global, 12.5% pruned; (d) Local, 12.5% pruned; (e) Global, 50% pruned; (f) Local, 50% pruned; (g) Global, 100% pruned; (h) Local, 100% pruned.
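As a minimal illustration of local pruning (a sketch under assumed details, not the paper's actual pruning criterion or architecture), one simple way to build a local substitute of a ReLU network is to drop every hidden unit that is inactive at the anchor point; within the anchor's linear region, the substitute then computes exactly the same function as the full network:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy network (illustrative values, not from the paper):
# x in R^2 -> ReLU(W1 @ x + b1) in R^32 -> two class logits.
W1 = rng.normal(size=(32, 2))
b1 = rng.normal(size=32)
W2 = rng.normal(size=(2, 32))
b2 = rng.normal(size=2)

def logits(x, keep=None):
    """Forward pass; `keep` is a 0/1 mask that zeroes out pruned hidden units."""
    h = np.maximum(W1 @ x + b1, 0.0)
    if keep is not None:
        h = h * keep
    return W2 @ h + b2

anchor = np.array([0.3, -0.7])

# Local pruning rule (an assumption for illustration): remove every hidden
# unit whose ReLU output is zero at the anchor.  Inside the anchor's linear
# region the pruned substitute is exactly equivalent to the full network.
h_anchor = np.maximum(W1 @ anchor + b1, 0.0)
keep = (h_anchor > 0).astype(float)

# The substitute agrees with the full network at the anchor ...
assert np.allclose(logits(anchor), logits(anchor, keep))
# ... and (up to tiny boundary effects) at nearby points as well.
x_near = anchor + np.array([1e-4, -1e-4])
assert np.allclose(logits(x_near), logits(x_near, keep), atol=1e-2)
```

This is the intuition behind the figure: the pruned model may behave very differently far away, but in a neighbourhood of the anchor it closely tracks the original decision boundary.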



Figure 1: A toy example in the two-dimensional setting. We show the boundaries of a classifier before (1a and 1b) and after pruning (1c-1h), when up to 100% of the hidden neurons are removed. The sample we are interested in is marked by the black point, and the square around it on the plots with the global view (1a, 1c, 1e and 1g) shows the local region that is depicted on the other four plots (1b, 1d, 1f and 1h). While the global behaviour changes a lot, the boundary around the chosen anchor point remains similar. Most importantly, the distance from the anchor point to the closest adversarial (shown by the black line on plots 1b, 1d, 1f and 1h) does not change significantly. Note that this applies even in the extreme case when we prune all the hidden layers and what remains is a linear classifier (1g and 1h). This means that, instead of solving the complex task of finding the distance to the decision boundary for the initial model, we can save cost by working with the simple local substitute and still obtain a good approximation of the exact solution.
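For the extreme case in panels 1g and 1h, where the fully pruned substitute is a linear classifier f(x) = w·x + b, the distance to the decision boundary has a well-known closed form, |w·x0 + b| / ||w||. A short sketch with illustrative values (w, b and x0 are assumptions, not taken from the paper):

```python
import numpy as np

# Fully pruned substitute: a linear classifier f(x) = w @ x + b
# (illustrative coefficients, not from the paper).
w = np.array([2.0, -1.0])
b = 0.5
x0 = np.array([0.3, -0.7])  # the anchor point

# Closed-form distance from x0 to the boundary f(x) = 0.
dist = abs(w @ x0 + b) / np.linalg.norm(w)

# The closest boundary point is x0 projected along w.
x_adv = x0 - (w @ x0 + b) / (w @ w) * w
assert np.isclose(w @ x_adv + b, 0.0)
assert np.isclose(np.linalg.norm(x_adv - x0), dist)
```

This is why the 100%-pruned substitute is so cheap to analyse: the hard, non-convex distance computation for the original network collapses to a one-line formula on the local substitute.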

