IMPROVING LOCAL EFFECTIVENESS FOR GLOBAL ROBUST TRAINING

Anonymous

Abstract

Despite their popularity, deep neural networks are easily fooled. To alleviate this deficiency, researchers are actively developing new training strategies that encourage models to be robust to small input perturbations. Several successful robust training methods have been proposed. However, many of them rely on strong adversaries, which can be prohibitively expensive to generate when the input dimension is high and the model structure is complicated. We adopt a new perspective on robustness and propose a novel training algorithm that allows a more effective use of adversaries. Our method improves the model's robustness on a local ball centered around each adversary and then, by combining these local balls through a global term, achieves overall robustness. We demonstrate that, by maximizing the use of adversaries via focusing on local balls, we achieve high robust accuracy with weak adversaries. Specifically, our method reaches a robust accuracy similar to that of state-of-the-art approaches trained with strong adversaries on MNIST, CIFAR-10 and CIFAR-100; as a result, the overall training time is reduced. Furthermore, when trained with strong adversaries, our method matches the current state of the art on MNIST and outperforms it on CIFAR-10 and CIFAR-100.

1. INTRODUCTION

With the proliferation of deep neural networks (DNNs) in areas including computer vision, natural language processing and speech recognition, there has been a growing concern over their safety. For example, Szegedy et al. (2013) demonstrated that naturally trained DNNs are in fact fragile. By adding to each input a perturbation that is carefully designed but imperceptible to humans, DNNs that previously reached almost 100% accuracy could hardly make a correct prediction any more. This could cause serious issues in areas such as autonomous navigation or personalised medicine, where an incorrect decision can endanger life. To tackle these issues, training DNNs that are robust to small perturbations has become an active area of research in machine learning. Various algorithms have been proposed (Papernot et al., 2016; Kannan et al., 2018; Zhang et al., 2019b; Qin et al., 2019; Moosavi-Dezfooli et al., 2020; Madry et al., 2018; Ding et al., 2020). Among them, adversarial training (ADV) (Madry et al., 2018) and TRADES (Zhang et al., 2019b) are two of the most frequently used training methods so far. Although developed upon different ideas, both methods require strong adversarial attacks, generally computed through several steps of projected gradient descent. Such attacks can quickly become prohibitive as model complexity and input dimension increase, thereby limiting their applicability. Since the cost of finding strong adversaries is mainly due to the high number of gradient steps performed, one potential approach to alleviate the problem is to use cheap but weak adversaries. Weak adversaries are obtained using fewer gradient steps, in the extreme case a single gradient step. Based on this idea, Wong et al. (2020) argue that, by using random initialization and a larger step-size, adversarial training with weak adversaries found via one gradient step is sufficient to achieve a satisfactory level of robustness.
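To make the one-step idea concrete (a random start inside the perturbation ball, followed by a single signed-gradient step larger than the radius, then projection back), the following is a minimal numpy sketch against a toy logistic-regression model. The model, function names and step sizes are our illustrative assumptions, not the actual setup of Wong et al. (2020).

```python
import numpy as np

def one_step_adversary(x, y, w, b, eps, alpha, seed=0):
    """One-step l_inf adversary with random initialization, in the spirit of
    Wong et al. (2020): start from a uniform random point in the eps-ball,
    take a single signed-gradient step of size alpha (typically alpha > eps),
    then project back onto the ball. The model is a toy logistic regression
    p = sigmoid(w.x + b); all names here are illustrative."""
    rng = np.random.default_rng(seed)             # seeded so the sketch is reproducible
    delta = rng.uniform(-eps, eps, size=x.shape)  # random start inside the eps-ball
    p = 1.0 / (1.0 + np.exp(-(np.dot(w, x + delta) + b)))
    grad = (p - y) * w                            # gradient of cross-entropy w.r.t. the input
    delta = delta + alpha * np.sign(grad)         # one large signed-gradient step
    delta = np.clip(delta, -eps, eps)             # project back onto the eps-ball
    return x + delta
```

Because the single step over-shoots the ball and is then clipped, the perturbation always stays within the `eps` budget while still moving in the loss-increasing direction.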
We term this method one-step ADV from now on. While one-step ADV does indeed exhibit robustness, there is still a noticeable gap when compared with its multi-step counterpart. In this paper, we further bridge the gap by proposing a new robust training algorithm: Adversarial Training via LocAl Stability (ATLAS). Local stability, in our context, means stability of prediction and is the same as local robustness. Specifically, we make the following contributions:

• We adopt a new perspective on robust accuracy and introduce a framework for constructing robust training losses that allow a more effective use of adversaries. The framework consists of a local component and a global component. The local component maximizes the effectiveness of a given adversary by improving the network's robustness on both the adversary and points around it. In other words, the local component attempts to increase the radius of a ball centered at the adversary on which the network is robust. The global component combines all local balls in a regularized way to achieve the desired overall robust performance.
• Based on this framework, and guided by the need for fast robust training, we propose our novel robust training algorithm ATLAS.
• We show that ATLAS makes a more effective use of weak adversaries by comparing it favourably against one-step ADV on three datasets: MNIST, CIFAR-10 and CIFAR-100.
• Although one-step ATLAS is more expensive than other one-step methods, it still allows efficient robust training. We show that, with a one-step weak adversary, ATLAS achieves levels of robust accuracy comparable to multi-step state-of-the-art methods on all datasets.
• Finally, we show that when strong adversaries are used, ATLAS matches the current state of the art on MNIST and outperforms it on CIFAR-10 and CIFAR-100.
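The precise ATLAS loss is developed later in the paper. Purely to fix ideas about the two-component structure just described, the numpy sketch below averages a toy model's loss over the adversary and random points in a small ball around it (a stand-in for the local component) and adds a weighted loss at the natural input (a stand-in for the global term). Every name, the sampling scheme and the form of both terms are our illustrative assumptions, not the paper's actual objective.

```python
import numpy as np

def local_global_loss(x, x_adv, y, w, b, radius, n_samples, lam, seed=0):
    """Schematic only, NOT the ATLAS loss. Local term: average loss over the
    adversary x_adv and random points in a ball of the given radius around
    it. Global term: ordinary loss at the natural input x, weighted by a
    hypothetical coefficient lam. The model is a toy logistic regression
    p = sigmoid(w.x + b)."""
    rng = np.random.default_rng(seed)

    def bce(pt):
        # binary cross-entropy of the toy model at point pt
        p = 1.0 / (1.0 + np.exp(-(np.dot(w, pt) + b)))
        return -(y * np.log(p) + (1 - y) * np.log(1 - p))

    # local component: the adversary plus sampled points in its ball
    pts = [x_adv] + [x_adv + rng.uniform(-radius, radius, size=x.shape)
                     for _ in range(n_samples)]
    local = np.mean([bce(pt) for pt in pts])

    # global component ties the local balls back to the natural data
    return local + lam * bce(x)
```

Minimizing the averaged local term pushes the model to be robust not only at the adversary itself but on a neighbourhood around it, which is the intuition behind "increasing the radius of the ball" above.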

2. RELATED WORKS

Robust training aims to learn a network that gives the same correct output even when the input is slightly perturbed. Existing robust training algorithms can be divided into two categories: natural-image based methods and adversary based methods. Within the first category, the common form of loss is a natural loss term plus a regularizer computed at natural images. We briefly mention some of these methods. Moosavi-Dezfooli et al. (2020) observed empirically that reducing the curvature of the loss function and the decision boundary could lead to robust models; the authors thus propose a regularizer based on the Hessian of the loss function. Closely related is the regularizer, introduced in (Jakubovitz & Giryes, 2018; Hoffman et al., 2019; Ross & Doshi-Velez, 2018), that penalizes the Frobenius norm of the Jacobian of the loss function. The Jacobian regularizer can also be seen as a way of reducing the curvature of the decision boundary (Jakubovitz & Giryes, 2018). Although calculating the norm is computationally expensive, a fast estimator with empirically high accuracy has been developed in Hoffman et al. (2019).

We focus on adversary based robust training, as such methods generally perform better in terms of robust accuracy. Under this category, an early fundamental work is the Fast Gradient Sign Method (FGSM) by Goodfellow et al. (2015). Adversarial Training (ADV) (Madry et al., 2018) is a multi-step variant of FGSM: rather than using one-step FGSM, ADV employs multi-step projected gradient descent (PGD) (Kurakin et al., 2016) with smaller step-sizes to generate perturbed inputs. These modifications have made ADV one of the most effective robust training methods so far (Athalye et al., 2018). Another frequently used robust training method is TRADES (Zhang et al., 2019b). Motivated by a theoretical analysis of the trade-off between natural accuracy and robust accuracy in a binary classification example, TRADES encourages model robustness by adding to the natural loss a regularizer, involving adversaries, that pushes away the decision boundary. A related method is local linearity regularization (LLR) (Qin et al., 2019), in which adversaries are used to maximize the penalty for non-linearity. Applying LLR also allows efficient robust training; we note that the underlying idea of LLR is complementary to ATLAS. In addition, several works have suggested input dependent treatments, including incorporating whether the given input is correctly classified (Wang et al., 2020) and using adaptive perturbations for different inputs (Ding et al., 2019). One major drawback of adversary based methods (Madry et al., 2018; Zhang et al., 2019b; Qin et al., 2019; Wang et al., 2020; Ding et al., 2019) is that most of them rely on strong adversaries, computed via expensive PGD. When the input dimension is high and the model structure is complicated, finding adversaries can be too expensive for these methods to work effectively. Several works have researched possible ways to speed up the process. Algorithmically, Zhang et al. (2019a) cut down the total number of forward and backward passes needed to generate adversaries.
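Since much of the discussion above turns on the cost of multi-step PGD, a minimal numpy sketch of the l_inf PGD attack (Kurakin et al., 2016; Madry et al., 2018) on a toy logistic-regression model may help; the model and hyper-parameters are our illustrative assumptions, and real implementations operate on batches through a deep network.

```python
import numpy as np

def pgd_attack(x, y, w, b, eps, alpha, steps):
    """Multi-step projected gradient descent in the l_inf ball: repeat a
    small signed-gradient ascent step on the loss, projecting back onto the
    eps-ball after each step. The model is a toy logistic regression
    p = sigmoid(w.x + b); the cost grows linearly with `steps`, which is
    why strong (many-step) adversaries are expensive."""
    delta = np.zeros_like(x)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(np.dot(w, x + delta) + b)))
        grad = (p - y) * w                     # gradient of cross-entropy w.r.t. the input
        delta = delta + alpha * np.sign(grad)  # small ascent step on the loss
        delta = np.clip(delta, -eps, eps)      # project onto the eps-ball
    return x + delta
```

Each iteration requires a full forward and backward pass through the model, so a 10-step PGD adversary costs roughly ten times as much as a one-step FGSM adversary, which is the overhead that fast robust training methods try to remove.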

