EXPLORING AND EXPLOITING DECISION BOUNDARY DYNAMICS FOR ADVERSARIAL ROBUSTNESS

Abstract

The robustness of a deep classifier can be characterized by its margins: the distances from the decision boundary to natural data points. However, it is unclear whether existing robust training methods effectively increase the margin for each vulnerable point during training. To understand this, we propose a continuous-time framework for quantifying the relative speed of the decision boundary with respect to each individual point. Through visualizing the moving speed of the decision boundary under Adversarial Training, one of the most effective robust training algorithms, a surprising moving behavior is revealed: the decision boundary moves away from some vulnerable points but simultaneously moves closer to others, decreasing their margins. To alleviate these conflicting dynamics of the decision boundary, we propose Dynamics-Aware Robust Training (DyART), which encourages the decision boundary to engage in movement that prioritizes increasing smaller margins. In contrast to prior works, DyART directly operates on the margins rather than their indirect approximations, allowing for more targeted and effective robustness improvement. Experiments on the CIFAR-10 and Tiny-ImageNet datasets verify that DyART alleviates the conflicting dynamics of the decision boundary and obtains improved robustness under various perturbation sizes compared to state-of-the-art defenses.

1. INTRODUCTION

Deep neural networks have exhibited impressive performance in a wide range of applications (Krizhevsky et al., 2012; Goodfellow et al., 2014; He et al., 2016a). However, they have also been shown to be susceptible to adversarial examples, leading to issues in security-critical applications such as autonomous driving and medicine (Szegedy et al., 2013; Nguyen et al., 2015). To alleviate this problem, adversarial training (AT) (Madry et al., 2017; Shafahi et al., 2019; Zhang et al., 2019; Gowal et al., 2020) was proposed and remains one of the most prevalent defenses against adversarial attacks. Specifically, AT finds worst-case adversarial examples based on some surrogate loss and adds them to the training dataset in order to improve robustness. Despite the success of AT, it has been shown that over-parameterized neural networks still have insufficient model capacity for fitting adversarial training data, partly because AT does not consider the vulnerability difference among data points (Zhang et al., 2021). The vulnerability of a data point can be measured by its margin: its distance to the decision boundary. As depicted in Figure 1a, some data points have smaller margins and are thus more vulnerable to attacks. Since AT does not directly operate on the margins and uses a pre-defined perturbation bound for all data points regardless of their vulnerability difference, it is unclear whether the learning algorithm effectively increases the margin of each vulnerable point. Geometrically, we would like to know whether the decision boundary moves away from the data points, especially the vulnerable ones. As illustrated in Figure 1b, the dynamics of the decision boundary can be conflicting: it moves away from some vulnerable points but simultaneously moves closer to other vulnerable ones during training. This motivates us to ask:

Question 1: Given a training algorithm, how can we analyze the dynamics of the decision boundary with respect to the data points?


Published as a conference paper at ICLR 2023

Figure 1: The movement of the decision boundary. Red triangles and green circles are data points from two classes. Figure 1a shows the vulnerability difference among the data points: some are closer to the decision boundary (vulnerable, solid), whereas others are farther from it (robust, hollow). In Figure 1b, the decision boundary after an update moves away from some vulnerable points (made more robust) but simultaneously moves closer to other vulnerable ones (made less robust). Figure 1c depicts the continuous movement of the decision boundary in Figure 1b.

To answer the above question, we propose a continuous-time framework that quantifies the instantaneous movement of the decision boundary, as shown in Figure 1c. Specifically, we define the relative speed of the decision boundary w.r.t. a point as the time derivative of its margin, which can be interpreted as the speed at which its closest adversarial example moves away from it. We show that this speed can be derived from the training algorithm in closed form.

Using the proposed framework, we empirically compute the speed of the decision boundary w.r.t. data points for AT. As will be shown in Figure 3, the aforementioned conflicting dynamics of the decision boundary (Figures 1b, 1c) are revealed: the decision boundary moves towards many vulnerable points during training and decreases their margins, directly counteracting the objective of robust training. The desirable dynamics of the decision boundary, on the other hand, should increase the margins of all vulnerable points. This leads to another question:

Question 2: How can we design algorithms that encourage the decision boundary to engage in movements that increase margins for vulnerable points, rather than decrease them?

To this end, we propose Dynamics-Aware Robust Training (DyART), which prioritizes moving the decision boundary away from more vulnerable points and increasing their margins.
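For a binary linear classifier, the margin has an exact formula, which makes the conflicting dynamics of Figure 1b easy to reproduce numerically: a single parameter update can increase one point's margin while decreasing another's. The following is a minimal sketch under assumed toy data (a 2D linear model with a hand-picked hypothetical update; the finite-difference speed stands in for the paper's closed-form time derivative):

```python
import numpy as np

def margin_linear(w, b, x):
    """Exact margin of x for the binary linear classifier sign(w.x + b):
    the Euclidean distance from x to the decision boundary {z : w.z + b = 0}."""
    return abs(w @ x + b) / np.linalg.norm(w)

def boundary_speed(w_t, b_t, w_next, b_next, x, dt=1.0):
    """Finite-difference estimate of the relative speed of the decision
    boundary w.r.t. x, i.e. the time derivative of its margin. A positive
    value means the boundary is moving away from x (margin increasing)."""
    return (margin_linear(w_next, b_next, x) - margin_linear(w_t, b_t, x)) / dt

# Two hypothetical vulnerable points from opposite classes.
x1 = np.array([1.0, 0.0])
x2 = np.array([-0.2, 1.0])

# Parameters before and after one hypothetical training update.
w, b = np.array([1.0, 0.0]), 0.0
w_new, b_new = np.array([1.0, 0.3]), 0.1

s1 = boundary_speed(w, b, w_new, b_new, x1)  # positive: boundary moves away from x1
s2 = boundary_speed(w, b, w_new, b_new, x2)  # negative: boundary moves towards x2
```

The same update thus has conflicting impacts on robustness: it helps x1 but hurts x2, which is precisely the behavior the proposed framework is designed to detect.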
Specifically, DyART directly operates on the margins of training data and carefully designs its cost function on margins to induce more desirable dynamics. Note that directly optimizing margins in the input space is technically challenging since it was previously unclear how to compute the gradient of the margin. In this work, we derive the closed-form expression for the gradient of the margin and present an efficient algorithm to compute it, making gradient descent viable for DyART. In addition, since DyART directly operates on margins instead of using a pre-defined uniform perturbation bound for training as in AT, DyART is naturally robust for a wide range of perturbation sizes ϵ. Experimentally, we demonstrate that DyART mitigates the conflicting dynamics of the decision boundary and achieves improved robustness under diverse attack budgets.
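One way to encode "prioritize increasing smaller margins" is a cost whose gradient magnitude decays with the margin, so that gradient descent allocates the largest push to the most vulnerable points. The sketch below uses a hypothetical exponential cost chosen for illustration only; the exact cost function used by DyART is designed later in the paper:

```python
import numpy as np

def margin_cost(r, beta=2.0, r0=1.0):
    """Hypothetical per-example cost on a margin r: exponential decay makes
    |d cost / d r| larger for smaller margins. Points whose margin already
    exceeds the target r0 contribute zero cost (and zero gradient)."""
    r = np.asarray(r, dtype=float)
    return np.where(r < r0, np.exp(-r / beta) - np.exp(-r0 / beta), 0.0)

def margin_weight(r, beta=2.0, r0=1.0):
    """|d cost / d r|: the effective priority a point receives when its
    margin is pushed outward by gradient descent on the cost."""
    r = np.asarray(r, dtype=float)
    return np.where(r < r0, np.exp(-r / beta) / beta, 0.0)

margins = np.array([0.1, 0.5, 2.0])   # two vulnerable points, one robust
costs = margin_cost(margins)
weights = margin_weight(margins)       # decreasing: smallest margin gets top priority
```

Under such a cost, the decision boundary is encouraged to move away from the point with margin 0.1 faster than from the point with margin 0.5, while the already-robust point with margin 2.0 is left alone.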

Summary of contributions. (1) We propose a continuous-time framework to study the relative speed of the decision boundary w.r.t. each individual data point and provide a closed-form expression for the speed. (2) We visualize the speed of the decision boundary for AT and identify the conflicting dynamics of the decision boundary. (3) We present a closed-form expression for the gradient of the margin, allowing for direct manipulation of the margin. (4) We introduce an efficient alternative to compute the margin gradient by replacing the margin with our proposed soft margin, a lower bound of the margin whose approximation gap is controllable. (5) We propose Dynamics-Aware Robust Training (DyART), which alleviates the conflicting dynamics by carefully designing a cost function on soft margins to prioritize increasing smaller margins. Experiments show that DyART obtains improved robustness over state-of-the-art defenses on various perturbation sizes.
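Contribution (4) replaces the hard margin with a soft margin: a lower bound whose gap can be made arbitrarily small. As a generic analogue of this property (not the paper's exact construction), a log-sum-exp softmin over a set of assumed candidate distances is always a lower bound of the true minimum, with a gap of at most log(n)/α:

```python
import numpy as np

def softmin(values, alpha):
    """Smooth lower bound of min(values):
        softmin_alpha(v) = -(1/alpha) * log(sum_i exp(-alpha * v_i)).
    It never exceeds min(values), and since sum_i exp(-alpha*v_i) is at most
    n * exp(-alpha * min(v)), the gap is at most log(n)/alpha: increasing
    alpha makes the bound arbitrarily tight."""
    v = np.asarray(values, dtype=float)
    return -np.log(np.sum(np.exp(-alpha * v))) / alpha

vals = [0.3, 0.7, 1.2]              # hypothetical candidate distances
loose = softmin(vals, alpha=5.0)    # lower bound, visible gap
tight = softmin(vals, alpha=50.0)   # lower bound, nearly exact
```

Unlike the hard min, this surrogate is differentiable everywhere, which is what makes gradient-based training on it tractable.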

2. RELATED WORK

Decision boundary analysis. Prior works on the decision boundary of deep classifiers have studied the small margins in adversarial directions (Karimi et al., 2019), the topology of classification regions

