SELF-ADAPTIVE PERTURBATION RADII FOR ADVERSARIAL TRAINING

Abstract

Adversarial training has been shown to be the most popular and effective technique for protecting models against imperceptible adversarial samples. Despite its success, it also comes with significant performance degradation on clean data. To achieve good performance on both clean and adversarial samples, the main effort has been to search for an adaptive perturbation radius for each training sample, which suffers from a conflict between exact searching and computational overhead. To address this conflict, we first show the superiority of adaptive perturbation radii intuitively with respect to accuracy and theoretically with respect to robustness. We then propose a novel self-adaptive adjustment framework for perturbation radii that requires no tedious searching, and we discuss this framework for both deep neural networks (DNNs) and kernel support vector machines (SVMs). Finally, extensive experimental results show that our framework improves not only natural generalization performance but also adversarial robustness, and that it is competitive with existing searching strategies in running time.

1. INTRODUCTION

The security of machine learning models has long been questioned, since most models are vulnerable to perturbations (Papernot et al., 2016). Extremely tiny perturbations may be imperceptible to humans yet degrade the performance of models such as deep neural networks (DNNs) (Goodfellow et al., 2014; Madry et al., 2017; Papernot et al., 2017), support vector machines (SVMs) (Xiao et al., 2012; Biggio et al., 2012; 2014) and logistic regression (LR) (Papernot et al., 2016). Examples attacked by such perturbations are generally called adversarial examples. To learn robust models, adversarial training has become one of the most effective and widely used methods, especially for DNNs and SVMs (Zhou et al., 2012; Kurakin et al., 2017; Miyato et al., 2018; Wang et al., 2019; Shafahi et al., 2019; Wu et al., 2021). However, the success of adversarial training comes at a cost (Tsipras et al., 2018; Zhang et al., 2019). Specifically, as stated by Tsipras et al. (2018), robustness may be at odds with accuracy: models after adversarial training may fail to generalize well on unperturbed examples. This phenomenon is generally attributed to the fixed attack strength used throughout training, which ignores the fact that different examples may have different intrinsic robustness (Cheng et al., 2020; Zhang et al., 2020). The main effort to mitigate this issue is therefore to find a suitable perturbation radius $\epsilon_i$ for each training sample with explicit or implicit searching strategies. Among explicit strategies, IAAT (Balaji et al., 2019) uses brute-force search to find suitable perturbation radii, and MMA (Ding et al., 2018) finds the optimal $\epsilon_i^*$ via bisection search. As an implicit strategy, Zhang et al. (2020) propose an early-stopped PGD strategy called FAT, which in essence adjusts $\epsilon$ implicitly.
Although FAT skillfully skips the step of searching for $\epsilon_i^*$, it is sensitive to hyperparameters such as the number of PGD steps $\tau$ and the uniform perturbation radius $\epsilon$. Thus, in this paper, we mainly focus on explicit searching strategies, which we briefly review in Table 1. As the table shows, these strategies face an inherent conflict between exact searching and time complexity. To resolve this conflict, we propose a novel self-adaptive adjustment framework (SAAT) for perturbation radii. It achieves a better trade-off between natural generalization performance and adversarial robustness without much computational overhead. First, we show intuitively that adaptive perturbation radii benefit generalization and prove theoretically that they benefit robustness. Then, inspired by self-paced learning (SPL) (Jiang et al., 2015), we design a new learning objective that constructs a self-adaptive perturbation radius for each sample. We discuss SAAT not only for DNNs but also for kernel SVMs, and correspondingly propose two types of optimization algorithms. The first is built on the original minimax formulation of adversarial training. It determines the perturbation radii based on the observation that the worst-case loss of the inner maximization is approximately piecewise linear in the perturbation radius, and then uses a fine search to calibrate the values. The second is built on the kernel perspective for SVMs and DNNs and transforms the original minimax objective into an equivalent minimization problem. Extensive experimental results show that our framework enjoys better natural generalization performance and higher adversarial robustness than other adversarial training algorithms, and that it is competitive with existing searching strategies in terms of training time.
We summarize the main contributions as follows:
• Theoretically, we prove that adaptive perturbation radii yield a tighter upper bound on the expected adversarial risk than fixed, uniform perturbation radii, which implies higher robustness against adversarial examples.
• Our self-adaptive adversarial training algorithms assign a perturbation radius to each sample without exact searching, and achieve a better trade-off between adversarial robustness and natural accuracy.
• The self-adaptive adversarial training strategy that we propose from the kernel perspective applies to both SVMs and DNNs. It optimizes a minimization problem instead of the conventional minimax one, since we transform the inner maximization into a simplified, equivalent form.

2.1. NOTATIONS

We focus on $C$-class classification problems with dataset $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{n}$, where $x_i \in \mathbb{R}^d$ is the input and $y_i \in \{1, \dots, C\}$ is the label. We write $\mathbb{I}\{a\}$ for the indicator function (the 0-1 loss), which returns 1 if $a$ is true and 0 otherwise, and $l(\cdot)$ for a surrogate of the 0-1 loss. The set $B(\delta, \epsilon) = \{\delta : \|\delta\|_p \le \epsilon\}$ means that the sample is perturbed by an $l_p$-norm-constrained¹ perturbation $\delta$ with perturbation radius $\epsilon$. We denote the maximum adversarial loss as $\tilde{l}(x, y, f, \epsilon) = \max_{\delta \in B(\delta, \epsilon)} l(f(x + \delta), y)$, where $f \in \mathcal{F}$ and $\mathcal{F}$ is a class of neural networks $f: \mathcal{X} \to \mathbb{R}$ with depth $D$ and width $H$:

$$\mathcal{F} = \{x \mapsto W_D\,\rho(W_{D-1}\,\rho(\cdots W_1 x \cdots)) : \|W_i\|_F \le M_i,\ i \in [D]\}. \quad (1)$$

Here $\rho(\cdot)$ is an $L_\rho$-Lipschitz activation function and $W_i$ is an $H_i \times H_{i-1}$ matrix, with $H = \max\{H_0, \dots, H_D\}$, $H_D = 1$ and $H_0 = d$. The class of maximum adversarial losses is then $\tilde{l}_{\mathcal{F}} = \{\tilde{l}_f : f \in \mathcal{F}\}$.
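To make the class $\mathcal{F}$ in (1) concrete, the following pure-Python sketch evaluates $x \mapsto W_D \rho(\cdots \rho(W_1 x)\cdots)$ and checks the Frobenius-norm constraints $\|W_i\|_F \le M_i$. Choosing ReLU for $\rho$ (which is 1-Lipschitz, so $L_\rho = 1$) and the helper names `forward` and `in_class` are illustrative assumptions, not part of the paper.

```python
import math


def relu(v):
    # rho(.) = ReLU, an assumed 1-Lipschitz activation (L_rho = 1)
    return [max(0.0, t) for t in v]


def matvec(W, x):
    # W is an H_i x H_{i-1} matrix stored as a list of rows
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]


def frobenius(W):
    return math.sqrt(sum(wij * wij for row in W for wij in row))


def in_class(weights, bounds):
    """True if ||W_i||_F <= M_i holds for every layer i in [D]."""
    return all(frobenius(W) <= M for W, M in zip(weights, bounds))


def forward(weights, x):
    """x -> W_D rho(W_{D-1} rho(... rho(W_1 x) ...)); H_D = 1 gives a scalar score."""
    h = x
    for W in weights[:-1]:
        h = relu(matvec(W, h))
    return matvec(weights[-1], h)[0]
```

For example, with $W_1 = I_2$ and $W_2 = (1, -1)$, the input $(2, 1)$ gives the score $1$, and both layers have $\|W_i\|_F = \sqrt{2}$, so the network lies in $\mathcal{F}$ for $M_1 = M_2 = 1.5$.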

2.2. STANDARD ADVERSARIAL TRAINING

Standard adversarial training considers the following minimax problem:

$$\min_{w} \frac{1}{n}\sum_{i=1}^{n} \max_{\delta_i \in B(\delta_i, \epsilon)} l(y_i, f_w(x_i + \delta_i)), \quad (2)$$

where $w$ is the model parameter and $x_i + \delta_i$ is the adversarial example of $x_i$. The inner maximization follows the principle of adversarial attack and aims to construct the most aggressive adversarial examples (Madry et al., 2017), while the outer minimization finds the model parameters that minimize the loss on these adversarial examples. Note that a single fixed perturbation radius $\epsilon$ is applied to all training samples here.
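As a sanity check on the inner maximization in (2), the sketch below runs an $l_\infty$ PGD-style sign ascent (step size $\epsilon/4$, clipping to $[-\epsilon, \epsilon]$) for a toy linear model $f_w(x) = w^\top x + b$ with the hinge loss. It then compares the result with the closed form $\max(0, 1 - y f_w(x) + \epsilon\|w\|_1)$, which holds for linear models only. The linear model, the hinge loss and the helper names are assumptions made for illustration. For DNNs no such closed form is available, and PGD only approximates the maximum.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def hinge(margin):
    return max(0.0, 1.0 - margin)


def closed_form_max_loss(w, b, x, y, eps):
    """Worst-case hinge loss over ||delta||_inf <= eps for f(x) = w.x + b."""
    margin = y * (dot(w, x) + b)
    return hinge(margin - eps * sum(abs(wj) for wj in w))


def pgd_max_loss(w, b, x, y, eps, steps=10):
    """PGD-style l_inf ascent: sign step of size eps/4, then clip to [-eps, eps]."""
    alpha = eps / 4
    delta = [0.0] * len(x)
    for _ in range(steps):
        # gradient of the negative margin -y * (w.(x + delta) + b) w.r.t. delta is -y * w;
        # the hinge loss is non-increasing in the margin, so this direction raises the loss
        grad = [-y * wj for wj in w]
        delta = [d + alpha * ((g > 0) - (g < 0)) for d, g in zip(delta, grad)]
        delta = [max(min(d, eps), -eps) for d in delta]
    x_adv = [xi + di for xi, di in zip(x, delta)]
    return hinge(y * (dot(w, x_adv) + b))
```

With $w = (1, -2)$, $b = 0.5$, $x = (0.2, 0.1)$, $y = 1$ and $\epsilon = 0.3$, both functions return a loss of about $1.4$. Ascending on the margin rather than on the hinge itself avoids getting stuck where the hinge is flat.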

3. SELF-ADAPTIVE ADVERSARIAL TRAINING

In this section, we first show the superiority of adaptive perturbation radii intuitively and theoretically. Inspired by that, we formulate a novel framework for self-adaptive adversarial training. Then we propose SAAT-minimax to solve the objective.

3.1. SUPERIORITY OF ADAPTIVE PERTURBATION RADII

Although adversarial training with adaptive perturbation radii has been widely studied empirically, its theoretical advantages are seldom explored. To fill this gap, we first show intuitively its benefit for natural generalization and then prove its benefit for adversarial robustness.

Adaptive Perturbation Radii Contribute to Better Generalization Performance. As shown in Fig. 1b, standard adversarial training uses the same perturbation radius for all training samples. For samples near the decision boundary, however, a large perturbation radius causes adversarial examples of different classes to cross over and mix. This distorts the decision boundary and unavoidably hurts accuracy on unperturbed examples. This motivates adversarial training with adaptive perturbation radii. As shown in Fig. 1c, the radius of each sample is set according to its location, which avoids severe distortion of the decision boundary and largely preserves natural generalization.

Adaptive Perturbation Radii Contribute to Lower Adversarial Risk. We now prove that, for binary classification, adaptive perturbation radii yield a tighter upper bound on the adversarial risk than fixed ones, which implies higher robustness against adversarial examples. We first define the expected adversarial risk $R_{rob}$.

Definition 1. (Expected Adversarial Risk) Following Zhang et al. (2019), Schmidt et al. (2018) and Bubeck et al. (2019), to characterize the robustness of a binary classifier $f: \mathbb{R}^d \to \mathbb{R}$ with labels $y \in \{-1, +1\}$, the expected adversarial risk is defined as

$$R_{rob}(f) = \mathbb{E}_{(x, y) \sim \mathcal{D}}\, \mathbb{I}\{\exists \delta \in B(\delta, \epsilon) : y f(x + \delta) \le 0\}.$$

As Figure 1 suggests, there is no need to increase $\epsilon_i$ further once the adversarial example is already misclassified by the classifier.
This leads to the definition of the theoretically optimal adaptive perturbation radius $\epsilon_i^*$.

Definition 2. (Optimal Adaptive Perturbation Radius) Theoretically, the optimal adaptive perturbation radius $\epsilon_i^*$ of each sample is

$$\epsilon_i^* = \begin{cases} \epsilon_{max}, & \text{if } \forall \delta_i \in B(\delta_i, \epsilon_{max}),\ y_i f(x_i + \delta_i) > 0, \\ \arg\min_{\epsilon_i \le \epsilon_{max}} \epsilon_i \ \text{ s.t. } \exists \delta_i \in B(\delta_i, \epsilon_i),\ y_i f(x_i + \delta_i) \le 0, & \text{otherwise}, \end{cases} \quad (3)$$

where $\epsilon_{max}$ is the maximum perturbation radius and $\epsilon_i$ is the perturbation radius assigned to $x_i$.

Remark 3. Examples of the optimal adaptive perturbation radius are shown in Figure 1c. For samples that can be misclassified by an adversarial attack within $\epsilon_{max}$, $\epsilon_i^*$ is the minimum radius that achieves this. For samples that are robustly classified even with $\epsilon_{max}$, $\epsilon_i^* = \epsilon_{max}$.

Before giving our main theorem (Theorem 5), we state Assumption 4.

Assumption 4. For the binary classification surrogate loss $l(\cdot)$, we assume $l(f(x), y) = \phi(y f(x))$, where $\phi$ is non-increasing and $L_\phi$-Lipschitz. Examples include the hinge loss, the logistic loss (Xiang, 2011), the exponential loss (Wyner, 2003) and many others.

Under Assumption 4, we obtain the following upper bound on the expected adversarial risk; the detailed proof is in the appendix.

Theorem 5. When Assumption 4 holds, for any $\omega \in (0, 1)$ and any $\tilde{l}_f \in \tilde{l}_{\mathcal{F}}$, with probability at least $1 - \omega$,

$$R_{rob}(f) \le \frac{1}{n}\sum_{i=1}^{n} \tilde{l}(x_i, y_i, f, \epsilon_i^*) + 3B\sqrt{\frac{\log(2/\omega)}{2n}} + \frac{24B}{\sqrt{n}} L_\phi L_\rho^{D-1} \max\{1, d^{\frac{1}{2} - \frac{1}{p}}\}(X_p + \epsilon_{max})\, Q,$$

where $X_p = \max_{i} \|x_i\|_p$, $Q$ is an architecture-dependent term built from $\log\big(\prod_{i=1}^{D} \frac{\pi^{H_i H_{i-1}/2}}{\Gamma(H_i H_{i-1}/2 + 1)} M_i^{H_i H_{i-1}}\big)$ and $\prod_{i=1}^{D} M_i$, and $\Gamma$ denotes the gamma function.

Next, Theorem 6 shows that the maximum loss $\tilde{l}(x_i, y_i, f, \epsilon)$ increases with $\epsilon$; the detailed proof is in the appendix.

Theorem 6. The maximum loss function $\tilde{l}(x_i, y_i, f, \epsilon)$ is increasing (non-decreasing) in the perturbation radius $\epsilon$.

Remark 7. Since $\epsilon_i^* \le \epsilon_{max}$ for every $i$, Theorem 6 gives $\frac{1}{n}\sum_{i} \tilde{l}(x_i, y_i, f, \epsilon_i^*) \le \frac{1}{n}\sum_{i} \tilde{l}(x_i, y_i, f, \epsilon_{max})$, while the remaining terms of Theorem 5 do not depend on $\epsilon_i^*$. Combining Theorem 5 with Theorem 6, replacing $\epsilon_{max}$ with $\epsilon_i^*$ therefore yields a tighter upper bound on the expected adversarial risk $R_{rob}$. This indicates that adaptive perturbation radii are a better choice during training than a fixed, uniform radius and can lead to higher adversarial robustness.
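For a linear binary classifier $f(x) = w^\top x + b$ under $l_\infty$ perturbations (an assumption made only for illustration), the worst-case margin within radius $\epsilon$ is $y f(x) - \epsilon\|w\|_1$. Both the indicator in Definition 1 and the radius $\epsilon_i^*$ in Definition 2 therefore have closed forms. The sketch below computes them and checks the inequality used in Remark 7 on a toy dataset. The function names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def l1(w):
    return sum(abs(wj) for wj in w)


def worst_margin(w, b, x, y, eps):
    # min over ||delta||_inf <= eps of y * (w.(x + delta) + b)
    return y * (dot(w, x) + b) - eps * l1(w)


def empirical_robust_risk(w, b, data, eps):
    """Empirical version of Definition 1: fraction of samples that some delta misclassifies."""
    return sum(worst_margin(w, b, x, y, eps) <= 0 for x, y in data) / len(data)


def optimal_radius(w, b, x, y, eps_max):
    """Definition 2 for a linear classifier under l_inf perturbations."""
    margin = y * (dot(w, x) + b)
    if worst_margin(w, b, x, y, eps_max) > 0:
        return eps_max           # robust even at eps_max
    if margin <= 0:
        return 0.0               # misclassified without any perturbation
    return margin / l1(w)        # smallest radius at which the margin reaches 0


def adv_hinge(w, b, x, y, eps):
    # maximum adversarial hinge loss tilde-l for the linear model
    return max(0.0, 1.0 - worst_margin(w, b, x, y, eps))
```

On three toy points with $w = (1, -2)$, $b = 0.5$ and $\epsilon_{max} = 0.3$, the radii are $0.5/3$, $0.3$ and $0$. The average loss at $\epsilon_i^*$ is no larger than at $\epsilon_{max}$, as Remark 7 requires.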

3.2. FRAMEWORK OF SELF-ADAPTIVE ADVERSARIAL TRAINING

Although several methods have been proposed to search for a suitable perturbation radius for each training sample, they face a conflict between exact searching and computational load, as mentioned in Section 1 and Table 1. To achieve fast self-adaptive adversarial training, we introduce a self-adaptive regularizer on the perturbation radii, $-\lambda \frac{1}{n}\sum_{i=1}^{n} \epsilon_i$, into formulation (2). This gives our formulation of self-adaptive adversarial training (SAAT):

$$\min_{w, \epsilon} \frac{1}{n}\sum_{i=1}^{n} \Big[\max_{\|\delta_i\|_p \le \epsilon_i} l(y_i, f_w(x_i + \delta_i)) - \lambda \epsilon_i\Big], \quad \text{s.t. } \epsilon_i \in [0, \epsilon_{max}],\ i = 1, \dots, n, \quad (5)$$

where $\epsilon_i$ is the customized perturbation radius of $x_i$ produced by the self-adaptive term and $\lambda$ is the regularization parameter. Thus $\epsilon_i$ is updated dynamically as the maximum adversarial loss of $x_i$ changes.

Remark 8. A similar term is used in self-paced learning (SPL) (Jiang et al., 2015). The core idea of SPL is to learn a model by gradually including samples from easy to complex according to their losses, using a self-paced regularizer to decide whether each sample is selected for training. Inspired by SPL, we assign a specific perturbation radius $\epsilon_i$ to each sample according to its loss. Formally, we impose a self-adaptive regularizer on $\epsilon_i$ and add it to the original formulation of adversarial training.
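To see what the regularizer $-\lambda\epsilon_i$ in (5) does, the sketch below minimizes the per-sample term $\tilde{l}_i(\epsilon) - \lambda\epsilon$ over a uniform grid on $[0, \epsilon_{max}]$. It uses two hypothetical worst-case hinge-loss curves of a linear model with $\|w\|_1 = 3$. The grid search is only an illustration and is not the paper's solver.

```python
def grid_search_radius(adv_loss, lam, eps_max, num=301):
    """Minimize adv_loss(eps) - lam * eps over a uniform grid on [0, eps_max]."""
    grid = [eps_max * t / (num - 1) for t in range(num)]
    return min(grid, key=lambda e: adv_loss(e) - lam * e)


def saat_objective(adv_losses, radii, lam):
    """Objective (5) with the model fixed: mean of tilde_l_i(eps_i) - lam * eps_i."""
    return sum(loss(e) - lam * e for loss, e in zip(adv_losses, radii)) / len(radii)


# Hypothetical worst-case hinge losses of a linear model with ||w||_1 = 3:
# tilde_l(eps) = max(0, 1 - (clean_margin - 3 * eps)).
def sensitive_loss(eps):
    return max(0.0, 1.0 - (0.5 - 3.0 * eps))   # clean margin 0.5


def robust_loss(eps):
    return max(0.0, 1.0 - (2.5 - 3.0 * eps))   # clean margin 2.5
```

With $\lambda = 1$ and $\epsilon_{max} = 0.3$, the sensitive sample receives radius $0$ because its loss grows faster than $\lambda\epsilon$. The robust sample receives $\epsilon_{max}$ because its loss stays at $0$. This matches the intent that samples far from the boundary are trained with larger radii.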

3.3. SAAT-MINIMAX

In this part, we optimize the SAAT framework (5). Specifically, we propose a two-stage search strategy to find approximately optimal perturbation radii. The first stage uses a closed-form solution derived from the approximately piecewise-linear dependence of $\tilde{l}(x_i, y_i, f_w, \epsilon_i)$ on the perturbation radius $\epsilon_i$. The second stage is a fine search that calibrates the results of the first stage. We discuss both stages in detail below.

First, we observe that $\tilde{l}(x_i, y_i, f_w, \epsilon_i)$ is approximately piecewise linear in $\epsilon_i$ for each sample, as shown in Fig. 2, which leads to Assumption 9. This assumption is verified in the appendix.

Assumption 9. $\tilde{l}(x_i, y_i, f_w, \epsilon_i)$ is piecewise linear in $\epsilon_i$:

$$\tilde{l}(x_i, y_i, f_w, \epsilon_i) = \max(0, k_i \epsilon_i + b_i), \quad (6)$$

where $k_i > 0$ is the slope of $\tilde{l}$ with respect to $\epsilon_i$ and $b_i$ is the y-intercept.

With the aid of Assumption 9, Theorem 10 gives the approximately optimal perturbation radii $\epsilon_i^*$ for objective (5); its proof is in the appendix. The settings of $k_i$ and $b_i$ are given in Section 5.1.3.

Theorem 10. For the minimization problem $\min_{\epsilon_i \in [0, \epsilon_{max}]} \tilde{l}(x_i, y_i, f_w, \epsilon_i) - \lambda \epsilon_i$, if $f_w$ is given and Assumption 9 holds, the optimal $\epsilon_i^*$ is

$$\epsilon_i^* = \begin{cases} 0, & \text{if } b_i \ge 0 \text{ and } k_i \ge \lambda, \\ -b_i / k_i, & \text{if } b_i < 0 \text{ and } k_i \ge \lambda, \\ \epsilon_{max}, & \text{otherwise}. \end{cases} \quad (7)$$

When $-b_i / k_i$ exceeds $\epsilon_{max}$, the optimum on $[0, \epsilon_{max}]$ is $\epsilon_{max}$, which is why the radius is clipped in step 4 of Algorithm 1.

The second stage is a fine search that calibrates the result of Theorem 10. Since Assumption 9 may not hold exactly, we use a simple search with a fixed step size to obtain more accurate values of $\epsilon_i^*$. Specifically, if the PGD attack fails to find an adversarial example $x_i + \delta_i$ that is misclassified, $\epsilon_i^*$ is too small, so we set $\epsilon_i^* \leftarrow \epsilon_i^* + \eta$. Otherwise, we set $\epsilon_i^* \leftarrow \epsilon_i^* - \eta$, where $\eta$ is a pre-specified step size.
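Eq. (7) and the fine-search update translate directly into code. The sketch below is a minimal, framework-free version with hypothetical function names. The clipping to $[0, \epsilon_{max}]$ mirrors step 4 of Algorithm 1; applying the same clipping after each fine-search step is our assumption.

```python
def closed_form_radius(k, b, lam, eps_max):
    """Eq. (7): minimizer of max(0, k*eps + b) - lam*eps on [0, eps_max], assuming k > 0."""
    if k >= lam:
        eps = 0.0 if b >= 0 else -b / k
    else:
        eps = eps_max
    # -b/k may exceed eps_max; clip as in step 4 of Algorithm 1
    return max(min(eps, eps_max), 0.0)


def fine_search_step(eps, attack_succeeded, eta, eps_max):
    """One calibration step of the second stage.

    If PGD failed to misclassify x_i + delta_i, eps is too small and grows by eta;
    otherwise it shrinks by eta. Clipping to [0, eps_max] is an assumption.
    """
    eps = eps - eta if attack_succeeded else eps + eta
    return max(min(eps, eps_max), 0.0)
```

For $k_i = 3$, $b_i = -0.6$ and $\lambda = 1$, the closed form gives $\epsilon_i^* = 0.2$. This agrees with a brute-force minimization of $\max(0, 3\epsilon - 0.6) - \epsilon$ on $[0, 0.3]$.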
Finally, we combine the above two-stage search strategy with the standard adversarial training procedure and give the pseudo-code of our SAAT-minimax in Algorithm 1.
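Putting the pieces together, the following pure-Python sketch mirrors the control flow of Algorithm 1 on a toy linear model with the hinge loss. Several choices are ours and hypothetical:

- batch size 1;
- $\delta_i$ initialized at 0;
- the worst-case loss at $\epsilon_{max}$ computed in closed form, which is valid only for linear models;
- sign ascent on the negative margin as the PGD direction.

The sketch shows the structure of SAAT-minimax and does not reproduce the paper's experiments.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def hinge(margin):
    return max(0.0, 1.0 - margin)


def sign(t):
    return (t > 0) - (t < 0)


def closed_form_radius(k, b, lam, eps_max):
    # Eq. (7) followed by the clipping of step 4
    eps = (0.0 if b >= 0 else -b / k) if k >= lam else eps_max
    return max(min(eps, eps_max), 0.0)


def train_saat_minimax(data, eps_max, lam, k, eta, lr=0.1, epochs=20, pgd_steps=5):
    """Toy SAAT-minimax for f(x) = w.x + b0 with the hinge loss (batch size 1)."""
    dim = len(data[0][0])
    w, b0 = [0.0] * dim, 0.0
    alpha = eps_max / 4                       # PGD step size, as in Section 5.1.3
    radii = [0.0] * len(data)
    for _ in range(epochs):
        for i, (x, y) in enumerate(data):
            # Step 3: b_i = tilde_l(eps_max) - k * eps_max, in closed form for a linear model
            w_l1 = sum(abs(wj) for wj in w)
            loss_at_max = hinge(y * (dot(w, x) + b0) - eps_max * w_l1)
            eps = closed_form_radius(k, loss_at_max - k * eps_max, lam, eps_max)
            delta = [0.0] * dim
            for _ in range(pgd_steps):
                # Steps 6-7: sign ascent on the negative margin, then clip to [-eps, eps]
                delta = [d + alpha * sign(-y * wj) for d, wj in zip(delta, w)]
                delta = [max(min(d, eps), -eps) for d in delta]
                # Step 8: fine search; grow eps if PGD failed to misclassify, else shrink it
                x_adv = [xj + dj for xj, dj in zip(x, delta)]
                succeeded = y * (dot(w, x_adv) + b0) <= 0
                eps = max(min(eps - eta if succeeded else eps + eta, eps_max), 0.0)
            radii[i] = eps
            # Step 10: subgradient step on the hinge loss at the adversarial point
            x_adv = [xj + dj for xj, dj in zip(x, delta)]
            if y * (dot(w, x_adv) + b0) < 1.0:
                w = [wj + lr * y * xa for wj, xa in zip(w, x_adv)]
                b0 += lr * y
    return w, b0, radii
```

On four linearly separable points, all calibrated radii stay within $[0, \epsilon_{max}]$ and the trained model classifies every clean point correctly.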

4. SELF-ADAPTIVE ADVERSARIAL TRAINING FROM KERNEL PERSPECTIVE

Traditional adversarial training optimizes a minimax problem. It typically uses a gradient-based iterative solver such as multi-step PGD to approximately solve the inner problem, which often leads to high computational overhead. To address this, we propose a self-adaptive adversarial training strategy from the kernel perspective² that transforms the minimax problem (5) into an equivalent minimization problem. We then discuss the resulting self-adaptive adversarial training algorithms for both DNNs and SVMs.

Algorithm 1 SAAT-minimax with $l_\infty$-norm constrained perturbations
Input: $\mathcal{D}$: training set; $T$: number of epochs; $\epsilon_{max}$: maximum perturbation radius; $\gamma$: learning rate; $K$: PGD steps; $\alpha$: PGD step size; $B$: batch size.
Output: $w$.
1: for epoch $= 1, \dots, T$ do
2:   Choose a batch of training samples $\{(x_i, y_i)\}_{i=1}^{B} \sim \mathcal{D}$.
3:   Obtain $\epsilon_i^*$ via Theorem 10.
4:   $\epsilon_i^* = \max(\min(\epsilon_i^*, \epsilon_{max}), 0)$.
5:   for $k = 1, \dots, K$ do
6:     $\delta_i = \delta_i + \alpha \cdot \mathrm{sign}(\nabla_{\delta_i} l(y_i, f_w(x_i + \delta_i)))$.
7:     $\delta_i = \max(\min(\delta_i, \epsilon_i^*), -\epsilon_i^*)$.
8:     Calibrate $\epsilon_i^*$ via the fine search strategy.
9:   end for
10:  $w = w - \gamma \nabla_w l(y_i, f_w(x_i + \delta_i))$.
11: end for

4.1. PRIMARY RESULTS FROM KERNEL PERSPECTIVE

The transformation of the minimax problem (5) has two steps. We first map the perturbations from the input space to the kernel space, and then solve the unconstrained equivalent form of the inner maximization.

We first discuss the kernelization of the perturbations $\delta$. For an adversarial example $x + \delta$ in the input space, the kernelized example $\phi(x + \delta)$, where $\phi(\cdot)$ is the feature mapping, is hard to characterize directly. Fortunately, Theorem 14 of Xu et al. (2009) provides a tight connection between perturbations in the input and kernel spaces. For a kernel of the form $k(x, x') = g(\|x - x'\|_2)$, the perturbation range of $\phi(x) + \delta_\phi$ tightly covers that of $\phi(x + \delta)$, where $\delta_\phi$ is the perturbation in the kernel space and $\|\delta_\phi\|_2 \le \sqrt{2g(0) - 2g(\epsilon)}$. Since an $l_2$-norm ball can wrap an $l_p$-norm ball, e.g., $\{\|\delta\|_\infty \le \epsilon\} \subseteq \{\|\delta\|_2 \le \sqrt{d}\,\epsilon\}$, this result also applies to other norms. Based on it, the SAAT formulation (5) can be rewritten in the RKHS $\mathcal{H}$ as

$$\min_{f \in \mathcal{H}, \epsilon'} \frac{1}{n}\sum_{i=1}^{n} \Big[\max_{\|\delta_\phi^i\|_2 \le \epsilon'_i} l\big(y_i, \langle f, \phi(x_i) + \delta_\phi^i \rangle_{\mathcal{H}}\big) - \lambda \epsilon'_i\Big], \quad \text{s.t. } \epsilon'_i \in [0, \epsilon'_{max}],\ i = 1, \dots, n, \quad (8)$$

where $\epsilon'_i = \sqrt{2g(0) - 2g(\epsilon_i)}$ and $\epsilon'_{max} = \sqrt{2g(0) - 2g(\epsilon_{max})}$. Theorem 11 then gives a simplified, equivalent form of the inner maximization of Eq. (8); the proof is in the appendix.

Theorem 11. If $f$ is a function in an RKHS $\mathcal{H}$, the inner maximization problem $\max_{\|\delta_\phi^i\|_2 \le \epsilon'} l(y_i, \langle f, \phi(x_i) + \delta_\phi^i \rangle_{\mathcal{H}})$ in (8) is equivalent to the regularized loss $l(y_i, f(x_i) + \epsilon' \|f\|_{\mathcal{H}})$, where $\|\cdot\|_{\mathcal{H}}$ denotes the RKHS norm.

According to this theorem, our goal becomes the following minimization problem:

$$\min_{f \in \mathcal{H}, \epsilon'} \frac{1}{n}\sum_{i=1}^{n} \big\{ l(y_i, f(x_i) + \epsilon'_i \|f\|_{\mathcal{H}}) - \lambda \epsilon'_i \big\}, \quad \text{s.t. } \epsilon'_i \in [0, \epsilon'_{max}],\ i = 1, \dots, n. \quad (9)$$

For problem (9), if we denote $l(y_i, f(x_i) + \epsilon'_i \|f\|_{\mathcal{H}})$ as $\tilde{l}(x_i, y_i, f, \epsilon'_i)$, Theorem 10 applies directly and gives the optimal perturbation radius $\epsilon'^*_i$. Algorithm 2 gives the optimization framework of SAAT from the kernel perspective, which alternately updates $\{\epsilon'^*_i\}_{i=1}^{n}$ and the function $f$. The following subsection discusses its application to DNNs and kernel SVMs in detail.

Since the RKHS norm $\|f\|_{\mathcal{H}}$ cannot be computed on DNNs, we approximate it with the lower bound proposed by Bietti et al. (2019): $\|f\|_{\mathcal{H}} \ge \|f\|_\delta := \sup_{\|\delta\|_2 \le 1} f(x + \delta) - f(x)$. Since the optimal perturbation radii $\epsilon'_i$ are available in closed form, learning objective (9) can then be optimized with algorithms such as SGD (Bottou, 2010) and ADAM (Kingma & Ba, 2014). Algorithm 2 shows the procedure for alternately optimizing $\{\epsilon'^*_i\}_{i=1}^{n}$ and the model function $f$.
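Two ingredients of this section can be checked numerically. The sketch below assumes a Gaussian kernel profile $g(r) = \exp(-r^2/(2\sigma^2))$ to map an input-space radius $\epsilon$ to the kernel-space radius $\epsilon' = \sqrt{2g(0) - 2g(\epsilon)}$. It uses a finite-dimensional feature vector $z$ and weight vector $\theta$ as a stand-in for $\phi(x)$ and $f$. For a margin-based loss as in Assumption 4, Cauchy-Schwarz shows that the worst perturbation in the $\epsilon'$-ball lowers the margin $y\langle\theta, z\rangle$ by exactly $\epsilon'\|\theta\|_2$, attained at $\delta = -y\,\epsilon'\theta/\|\theta\|_2$. All names are hypothetical.

```python
import math


def kernel_radius(eps, sigma):
    """eps' = sqrt(2 g(0) - 2 g(eps)) for the Gaussian profile g(r) = exp(-r^2 / (2 sigma^2))."""
    return math.sqrt(2.0 - 2.0 * math.exp(-eps ** 2 / (2.0 * sigma ** 2)))


def linf_to_l2_radius(eps, d):
    # the l_inf ball of radius eps in R^d lies inside the l_2 ball of radius sqrt(d) * eps
    return math.sqrt(d) * eps


def worst_case_margin(theta, z, y, eps_feat):
    """min over ||delta||_2 <= eps_feat of y * <theta, z + delta>, by Cauchy-Schwarz."""
    norm = math.sqrt(sum(t * t for t in theta))
    return y * sum(t * zj for t, zj in zip(theta, z)) - eps_feat * norm


def attained_margin(theta, z, y, eps_feat):
    # margin at the minimizing perturbation delta = -y * eps_feat * theta / ||theta||
    norm = math.sqrt(sum(t * t for t in theta))
    delta = [-y * eps_feat * t / norm for t in theta]
    return y * sum(t * (zj + dj) for t, zj, dj in zip(theta, z, delta))
```

The kernel radius grows with $\epsilon$ and never exceeds $\sqrt{2}$. For $\theta = (3, 4)$, $z = (1, 0)$, $y = 1$ and $\epsilon' = 0.5$, both margin functions give $0.5$.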

4.2.2. SAAT-SVM ON KERNEL SVMS

Similar to SAAT on DNNs, SAAT on kernel SVMs can be formulated as

$$\min_{f \in \mathcal{H}, \epsilon'} \frac{\|f\|_{\mathcal{H}}^2}{2} + \frac{C}{n}\sum_{i=1}^{n} \big[ l(y_i, f(x_i) + \epsilon'_i \|f\|_{\mathcal{H}}) - \lambda \epsilon'_i \big], \quad \text{s.t. } \epsilon'_i \in [0, \epsilon'_{max}],\ i = 1, \dots, n, \quad (11)$$

where $\frac{1}{2}\|f\|_{\mathcal{H}}^2$ is the added regularizer, as in the SVM formulation of Dai et al. (2014). Since the doubly stochastic gradient descent (DSG) algorithm (Dai et al., 2014) has proved to be a powerful technique for scalable kernel learning, we use it to optimize Eq. (11). The detailed optimization procedure is provided in the appendix.
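Objective (11) can be made concrete with random Fourier features, which DSG also relies on. The sketch below makes four assumptions:

- the kernel is Gaussian;
- $f(x) = \langle\theta, z(x)\rangle$, with an explicit random feature map $z$;
- $\|f\|_{\mathcal{H}}$ is approximated by $\|\theta\|_2$;
- for the hinge loss, the robust term lowers the margin $y f(x)$ by $\epsilon'_i\|f\|_{\mathcal{H}}$.

It only evaluates the objective and is not the DSG algorithm itself. The function names are hypothetical.

```python
import math
import random


def random_fourier_features(dim, num_features, sigma, seed=0):
    """Feature map z with <z(x), z(x')> approximating exp(-||x - x'||^2 / (2 sigma^2))."""
    rng = random.Random(seed)
    omegas = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)] for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def z(x):
        return [scale * math.cos(sum(o * xj for o, xj in zip(om, x)) + p)
                for om, p in zip(omegas, phases)]

    return z


def saat_svm_objective(theta, feats, ys, eps_feat, lam, C):
    """Objective (11) with f(x) = <theta, z(x)>, ||f||_H ~ ||theta||_2 and the hinge loss.

    The robust term is read as lowering the margin y f(x) by eps'_i * ||f||_H.
    """
    norm = math.sqrt(sum(t * t for t in theta))
    total = 0.0
    for z, y, e in zip(feats, ys, eps_feat):
        margin = y * sum(t * zj for t, zj in zip(theta, z))
        total += max(0.0, 1.0 - (margin - e * norm)) - lam * e
    return 0.5 * norm ** 2 + C / len(ys) * total
```

With $\theta = 0$, every hinge term equals $1$, so the objective reduces to $\frac{C}{n}\sum_i (1 - \lambda\epsilon'_i)$. This is a quick check that the $-\lambda\epsilon'_i$ term enters with the intended sign.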

5. EXPERIMENTS

In this section, we compare SAAT with different adversarial training algorithms on MNIST (Lecun & Bottou, 1998), CIFAR10 and CIFAR100 (Krizhevsky & Hinton, 2009) under $l_2$/$l_\infty$-norm constrained perturbations on DNNs. Due to the page limit, we only show part of the $l_\infty$-norm results below. The other results, the experiments on kernel SVMs and the verification of Assumption 9 are presented in the appendix.
• TRADES (Zhang et al., 2019): This method aims to achieve a trade-off between robustness and accuracy by decomposing the robust error into the sum of the natural error and the boundary error.
• SAAT-kernel: Our self-adaptive adversarial training algorithm on DNNs from the kernel perspective. We use both the hinge loss and the cross entropy loss, denoted SAAT-kernel_h and SAAT-kernel_c.
• SAAT-minimax: Our self-adaptive adversarial training algorithm on the minimax problem for DNNs. We use both the hinge loss and the cross entropy loss, denoted SAAT-minimax_h and SAAT-minimax_c.
Four popular attack methods are used in the experiments: FGSM (Goodfellow et al., 2014), 10-PGD (PGD with 10 steps) (Madry et al., 2017), CW (Carlini & Wagner, 2017) and AutoAttack (Croce & Hein, 2020). All attacks can be performed in both $l_2$ and $l_\infty$ versions; the $l_\infty$ settings are given in Section 5.1.2.



¹ In this paper, we consider $l_p$-norm balls with $p \ge 1$ so that the region is convex.
² The kernel perspective means that our function $f$ lies in a reproducing kernel Hilbert space (RKHS) (Iii, 2004).



Figure 1: Conceptual illustration of standard adversarial training and our self-adaptive adversarial training (i.e., SAAT).

Figure 2: Sketch of $\tilde{l}_f(z_i, \epsilon_i)$ with respect to $\epsilon_i$.

• Natural: Natural model training on DNNs, which minimizes the cross entropy loss.
• Standard (Madry et al., 2017): The standard adversarial training method, which uses K-step PGD as the attacker.
• IAAT (Balaji et al., 2019): Instance adaptive adversarial training, which uses brute-force search to assign an instance-specific perturbation radius $\epsilon_i$ to each sample.
• MMA (Ding et al., 2018): Max-margin adversarial training, which directly maximizes the distances from inputs to the decision boundary via binary search for the optimal perturbation radii.
• FAT (Zhang et al., 2020): A friendly adversarial training strategy, which generates friendly adversarial data by stopping the adversarial data search algorithm early.

Table 1: Comparisons of different adversarial training algorithms that aim at better generalization performance on DNNs and SVMs. (Complexity refers to time complexity. $n$ is the training size, $T$ is the number of epochs, $K$ and $\tau$ are the numbers of PGD steps with $\tau \le K$, and $c_1$ and $c_2$ denote the numbers of search steps for $\epsilon_i$.)

Since the RKHS norm ∥f ∥ H cannot be computed on DNNs, we use the lower bound of ∥f ∥ H proposed in

Table 2: Test accuracy (%) of various defense methods trained on MNIST with $l_\infty$-norm constrained perturbations on DNNs. (The results of Natural on clean data are baselines for reference.)
98.44±0.67 97.42±0.82 94.80±0.76 94.42±0.59 93.35±0.46
SAAT-minimax_c 98.88±0.27 96.29±0.54 91.97±0.69 92.27±0.74 90.03±0.57

Table 3: Test accuracy (%) of various defense methods trained on CIFAR10 with $l_\infty$-norm constrained perturbations on DNNs. (The results of Natural on clean data are baselines for reference.)
83.06±0.73 27.84±0.59 20.77±0.47 18.75±0.64 14.33±0.71
SAAT-minimax_h 85.29±0.68 61.52±0.85 53.86±1.37 47.00±0.96 49.84±0.33
SAAT-minimax_c 86.98±0.52 63.73±0.72 51.70±0.66 49.37±0.84 50.68±0.54

Table 4: Test accuracy (%) of various defense methods trained on CIFAR100 with $l_\infty$-norm constrained perturbations on DNNs. (The results of Natural on clean data are baselines for reference.)
SAAT-minimax_h 70.72±0.47 47.18±0.33 39.68±0.51 34.70±0.36 32.77±0.57
SAAT-minimax_c 68.11±0.57 43.80±0.44 35.33±0.39 31.69±0.67 30.83±0.62

5.1.2. ATTACK SETTINGS:


For FGSM and 10-PGD in the $l_\infty$ version, the perturbation radius is set to $\epsilon_{test} = 0.3$ for MNIST and $\epsilon_{test} = 8/255$ for CIFAR10 and CIFAR100. The step size of 10-PGD is $\epsilon_{test}/4$, which is a standard setting for adversarial attacks (Madry et al., 2017; Ding et al., 2018).

5.1.3. IMPLEMENTATION DETAILS:

Under $l_\infty$-norm constrained perturbations, we set $\epsilon_{max} = 0.3$ for MNIST and $\epsilon_{max} = 8/255$ for CIFAR10 and CIFAR100, and the step size is set to $\epsilon_{max}/4$. For all algorithms, we use a batch size of 100 and 10 epochs. We use 5-fold cross validation to choose the learning rate $\gamma \in 2^{[-3, 3]}$. We use the PreAct ResNet18 architecture for CIFAR10 and CIFAR100. For MNIST, we use a network with two convolutional layers of 16 and 32 filters followed by a fully connected layer of 100 units. These are the same model structures as in Wong et al. (2020). For all compared algorithms, we use the cross entropy loss function. For SAAT-minimax and SAAT-kernel, $b_i$ is computed as $\tilde{l}(x_i, y_i, f_w, \epsilon_{max}) - k_i \epsilon_{max}$. The remaining hyperparameters are set per dataset:
• MNIST: $k_i = 2$, $\eta = 0.05$, and the regularization parameter $\lambda$ increases linearly from 1 to 3.
• CIFAR10: $k_i = 0.15$, $\eta = 0.3/255$, and $\lambda$ increases linearly from 0 to 0.5.
• CIFAR100: $k_i = 0.3$, $\eta = 0.3/255$, and $\lambda$ increases linearly from 0 to 0.6.
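The $l_\infty$ hyperparameters of Sections 5.1.2 and 5.1.3 can be collected as below. The dictionary layout and the function names are our assumptions, as is the exact form of the linear $\lambda$ schedule (one value per epoch, endpoints included). The numbers are those stated in the text.

```python
# Values from Sections 5.1.2-5.1.3 (l_inf); the container and names are hypothetical.
EPOCHS = 10
BATCH_SIZE = 100
SETTINGS = {
    "MNIST":    {"eps_max": 0.3,     "k": 2.0,  "eta": 0.05,      "lam": (1.0, 3.0)},
    "CIFAR10":  {"eps_max": 8 / 255, "k": 0.15, "eta": 0.3 / 255, "lam": (0.0, 0.5)},
    "CIFAR100": {"eps_max": 8 / 255, "k": 0.3,  "eta": 0.3 / 255, "lam": (0.0, 0.6)},
}


def pgd_step_size(eps):
    # step size eps / 4, used for training (eps_max) and for 10-PGD evaluation (eps_test)
    return eps / 4


def lambda_at(epoch, num_epochs, lam_start, lam_end):
    """Assumed linear schedule: lam_start at epoch 0 and lam_end at the last epoch."""
    if num_epochs <= 1:
        return lam_end
    return lam_start + (lam_end - lam_start) * epoch / (num_epochs - 1)


def intercept(loss_at_eps_max, k, eps_max):
    # b_i = tilde-l(x_i, y_i, f_w, eps_max) - k_i * eps_max
    return loss_at_eps_max - k * eps_max
```

For example, on MNIST the schedule starts at $\lambda = 1$ in the first epoch and reaches $\lambda = 3$ in the tenth. A worst-case loss of $0.9$ at $\epsilon_{max}$ gives $b_i = 0.9 - 2 \times 0.3 = 0.3$.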

5.2. EXPERIMENTAL RESULTS AND ANALYSES

Robustness against various attacks. We first explore the robustness of adversarial training algorithms against different attacks in Tables 2, 3 and 4. Our SAAT-minimax not only improves natural generalization performance but also has stronger defensive ability against various adversarial examples. The results also suggest that the hinge loss tends to yield higher adversarial robustness than the cross entropy loss. Although SAAT-kernel is not as robust as the adversarial training algorithms based on the minimax problem, it largely improves accuracy on clean data. The compared algorithms improve generalization performance on clean data to some extent, but they sacrifice much robustness under strong attacks, especially CW and AutoAttack.

Running time with different sizes of training samples. Fig. 3 shows the running time of various adversarial training algorithms for training sets of different sizes. SAAT-kernel is much more efficient owing to its single-level (minimization-only) objective. For the other algorithms, which solve the minimax problem, the main cost lies in the K-step PGD attack. Among adversarial training algorithms with adaptive $\epsilon_i$, SAAT-minimax is the fastest, since it avoids brute-force search for the optimal $\epsilon_i^*$. SAAT-minimax takes slightly longer than FAT, because FAT uses an early-stopped PGD strategy. This extra time is small compared with SAAT-minimax's gains in robustness and generalization.

6. CONCLUSION

To achieve a better trade-off between robustness and accuracy without much computational overhead, we propose an adversarial training framework with self-adaptive perturbation radii, named SAAT. Unlike existing works, the framework obtains a closed-form solution for the (approximately) optimal perturbation radii and avoids tedious searching, and it applies to both DNNs and kernel SVMs. Comprehensive experimental results verify that our algorithms improve both adversarial robustness and natural generalization, while remaining competitive with other adversarial training algorithms in running time.

