FINDING ACTUAL DESCENT DIRECTIONS FOR ADVERSARIAL TRAINING

Abstract

Adversarial Training using a strong first-order adversary (PGD) is the gold standard for training Deep Neural Networks that are robust to adversarial examples. We show that, contrary to the general understanding of the method, the gradient at an optimal adversarial example may increase, rather than decrease, the adversarially robust loss. This holds independently of the learning rate. More precisely, we provide a counterexample to a corollary of Danskin's Theorem presented in the seminal paper of Madry et al. (2018), which states that a solution of the inner maximization problem yields a descent direction for the adversarially robust loss. Based on a correct interpretation of Danskin's Theorem, we propose Danskin's Descent Direction (DDi) and verify experimentally that it provides better directions than those obtained by a PGD adversary. Using the CIFAR10 dataset, we further provide a real-world example showing that our method achieves a steeper increase in robustness in the early training stages of smooth-activation networks without BatchNorm, and is more stable than the PGD baseline. As a limitation, PGD training of ReLU+BatchNorm networks still performs better, but current theory is unable to explain this.

1. INTRODUCTION

Adversarial Training (AT) (Goodfellow et al., 2015; Madry et al., 2018) has become the de facto algorithm used to train Neural Networks that are robust to adversarial examples (Szegedy et al., 2014). Variations of AT together with data augmentation yield the best-performing models in public benchmarks (Croce et al., 2020). Despite lacking optimality guarantees for the inner-maximization problem, the simplicity and performance of AT are reason enough to embrace its heuristic nature. From an optimization perspective, the consensus is that AT is a sound algorithm: based on Danskin's Theorem, Madry et al. (2018, Corollary C.2) posit that by finding a maximizer of the inner non-concave maximization problem, i.e., an optimal adversarial example, one can obtain a descent direction for the adversarially robust loss. What if this is not true? Are we potentially overlooking issues in its algorithmic framework? As mentioned in (Dong et al., 2020, Section 2.3), Corollary C.2 in Madry et al. (2018) can be considered the theoretical optimization foundation of the non-convex non-concave min-max optimization algorithms that we now collectively refer to as Adversarial Training. It justifies the two-stage structure of the training loop: first we find one approximately optimal adversarial example, and then we update the model using the gradient (with respect to the model parameters) at the perturbed input. The only drawbacks of a first-order adversary seem to be its computational complexity and the suboptimality of its approximate solutions. Ignoring the computational complexity issue, suppose we have access to a theoretical oracle that provides a single solution of the inner-maximization problem. In such an idealized setting, can we safely assume AT is decreasing the adversarially robust loss on the data sample? According to the aforementioned theoretical results, it would appear so.
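The two-stage training loop described above can be sketched on a toy differentiable model. This is a minimal illustration, not the paper's setup: the linear model, squared loss, and all hyperparameters below are our own illustrative choices.

```python
import numpy as np

def loss(theta, x, y):
    # Toy loss: squared error of a linear model theta @ x.
    return (theta @ x - y) ** 2

def grad_x(theta, x, y):
    # Gradient of the loss with respect to the input.
    return 2.0 * (theta @ x - y) * theta

def grad_theta(theta, x, y):
    # Gradient of the loss with respect to the parameters.
    return 2.0 * (theta @ x - y) * x

def pgd_attack(theta, x, y, eps=0.1, alpha=0.02, steps=20):
    """Stage 1: approximate the inner max over the l_inf ball of radius eps
    with projected (signed) gradient ascent on the perturbation."""
    delta = np.zeros_like(x)
    for _ in range(steps):
        g = grad_x(theta, x + delta, y)
        delta = np.clip(delta + alpha * np.sign(g), -eps, eps)
    return delta

def adversarial_training_step(theta, x, y, lr=0.05):
    # Stage 1: find an (approximately) optimal adversarial example.
    delta = pgd_attack(theta, x, y)
    # Stage 2: update the model using the gradient at the perturbed input.
    return theta - lr * grad_theta(theta, x + delta, y)
```

On this convex toy the loop does decrease the robust loss; the counterexamples in the paper show this can fail in general.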
In this work, we scrutinize the optimization paradigm on which Adversarial Training (AT) has been founded, and we posit that finding multiple solutions of the inner-maximization problem is necessary to find good descent directions for the adversarially robust loss. In doing so, we hope to improve our understanding of the non-convex/non-concave min-max optimization problem that underlies the Adversarial Training methodology, and potentially improve its performance. Our contributions: We present two counterexamples to Madry et al. (2018, Corollary C.2), the motivation behind AT. They show that using the gradient (with respect to the parameters of the model) evaluated at a single solution of the inner-maximization problem can increase the robust loss, i.e., it can harm the robustness of the model. In particular, in Counterexample 2 many descent directions exist, but they cannot be found if we only compute a single solution of the inner-maximization problem. In Section 2 we explain that the flaw in the proof is due to a misunderstanding of the directional-derivative notion that is used in the original work of Danskin (1966). Based on our findings, we propose Danskin's Descent Direction (DDi, Algorithm 1). It aims to overcome the problems of the single-adversarial-example paradigm of AT by exploiting multiple adversarial examples, obtaining better update directions for the network. For a data-label pair, DDi finds the steepest descent direction for the robust loss, assuming that (i) there exists a finite number of solutions of the inner-maximization problem and (ii) they can be found with first-order methods.
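Under Danskin's directional-derivative view, given maximizers δ_1, …, δ_k with gradients g_i = ∇_θ L(θ, x + δ_i, y), the steepest descent direction is the negated minimum-norm element of the convex hull of the g_i. The sketch below (not the paper's Algorithm 1; the function names and the projected-gradient solver on the simplex weights are our own choices) illustrates this combination step:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def min_norm_point(grads, iters=500, lr=0.01):
    """Minimum-norm element of the convex hull of the rows of `grads`,
    found by projected gradient descent on the simplex weights w of the
    objective 0.5 * ||sum_i w_i g_i||^2 (a sketch; dedicated min-norm-point
    or Frank-Wolfe solvers are the classical choice)."""
    k = grads.shape[0]
    w = np.full(k, 1.0 / k)
    G = grads @ grads.T  # Gram matrix of the gradients
    for _ in range(iters):
        w = project_simplex(w - lr * (G @ w))
    return w @ grads

def danskin_descent_direction(grads):
    # Negate and normalize the minimum-norm point of conv{g_1, ..., g_k}.
    p = min_norm_point(np.asarray(grads))
    n = np.linalg.norm(p)
    return -p / n if n > 1e-12 else np.zeros_like(p)
```

With a single maximizer (k = 1) this reduces to the usual negative gradient, recovering standard AT as a special case.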
In Section 5 we verify experimentally that: (i) it is unrealistic to assume a unique solution of the inner-maximization problem, hence making a case for our method DDi, (ii) our method can achieve more stable descent dynamics than the vanilla AT method in synthetic scenarios, and (iii) on the CIFAR10 dataset DDi is more stable and achieves higher robustness levels in the early stages of training, compared with a PGD adversary of equivalent complexity. This is observed in a setting where the conditions of Danskin's Theorem hold, i.e., using differentiable activation functions and removing BatchNorm. As a limitation, PGD training of ReLU+BatchNorm networks still performs better, but there is no theory explaining this. The code to reproduce our results will be available at https://github.com/LIONS-EPFL/ddi_at. Remark. The fact that Madry et al. (2018, Corollary C.2) is false might be well-known in the optimization field. In the convex setting it corresponds to the common knowledge that a negative subgradient of a non-smooth convex function might not be a descent direction, cf. (Boyd, 2014, Section 2.1). However, we believe this is not well-known in the AT community, given that (i) its practical implications, i.e., methods deriving steeper descent updates using multiple adversarial examples, have not been previously introduced, and (ii) the results in Madry et al. (2018) have been central in the development of AT. Hence, our contribution can be understood as raising awareness about the issue and demonstrating its practical implications for AT.
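The convex analogue mentioned in the Remark can be checked numerically. For the non-smooth convex function f(x1, x2) = |x1| + 2|x2| (our own illustrative example, not one from the paper), the vector (1, 2) is a valid subgradient at (1, 0), yet moving along its negative increases f:

```python
import numpy as np

def f(x):
    # Non-smooth convex function |x1| + 2|x2|.
    return abs(x[0]) + 2.0 * abs(x[1])

x0 = np.array([1.0, 0.0])
g = np.array([1.0, 2.0])  # a valid subgradient of f at x0

# Check the subgradient inequality f(y) >= f(x0) + <g, y - x0> at a few points.
for y in [np.array([2.0, 1.0]), np.array([0.0, -1.0]), np.array([1.0, 0.5])]:
    assert f(y) >= f(x0) + g @ (y - x0) - 1e-12

# Yet -g is an ascent direction: f increases along it for small step sizes.
t = 0.01
print(f(x0 - t * g) - f(x0))  # ≈ +0.03 > 0
```

The increase comes entirely from the |x2| term: the step leaves the point where f is non-differentiable in x2, and the subgradient component 2 points the wrong way.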



Figure 1: (a) and (b): comparison of our method (DDi) and the single-adversarial-example method (PGD) on a synthetic min-max problem. Using a single example may increase the robust loss; DDi computes 10 examples and can avoid this. (c): a similar improvement over PGD training on CIFAR10, where DDi with 10 examples speeds up convergence. More details in Section 5.

2. A COUNTEREXAMPLE TO MADRY ET AL. (2018, COROLLARY C.2)

Preliminaries. Let θ ∈ R^d be the parameters of a model, (x, y) ∼ D a data-label distribution, δ a perturbation in a compact set S_0, and L a loss function. The optimization objective of AT is:

min_θ ρ(θ), where ρ(θ) := E_{(x,y)∼D} [ max_{δ∈S_0} L(θ, x + δ, y) ]
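The objective ρ can be made concrete on a one-dimensional toy where the inner maximization is solved exactly by brute force. This is a sketch with our own toy loss; the grid-search "oracle" is only feasible in such an illustrative low-dimensional setting:

```python
import numpy as np

def robust_loss(theta, xs, ys, loss, eps=0.1, grid=201):
    """Empirical version of rho(theta): average over the sample of the
    inner max over delta in [-eps, eps], solved by grid search."""
    deltas = np.linspace(-eps, eps, grid)
    total = 0.0
    for x, y in zip(xs, ys):
        total += max(loss(theta, x + d, y) for d in deltas)
    return total / len(xs)

# Toy loss of a scalar linear model; the inner max sits at the boundary
# delta = +/- eps here, as the loss is convex in delta.
loss = lambda theta, x, y: (theta * x - y) ** 2
print(robust_loss(1.0, [1.0, -1.0], [0.5, -0.5], loss))
```

Such an exact oracle is what the idealized discussion in the Introduction assumes; the paper's point is that even then, a single maximizer per sample need not yield a descent direction for ρ.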

