IMPROVING ROBUSTNESS OF SOFTMAX CROSS-ENTROPY LOSS VIA INFERENCE INFORMATION

Abstract

Adversarial examples easily mislead vision systems based on deep neural networks (DNNs) trained with the softmax cross entropy (SCE) loss. This vulnerability arises because SCE drives DNNs to fit the training samples, whereas the resultant feature distributions of the training and adversarial examples are unfortunately misaligned. Several state-of-the-art methods improve the inter-class separability of training samples by modifying the loss function; we argue that they ignore adversarial examples and therefore achieve only limited robustness to adversarial attacks. In this paper, we exploit the inference region, which inspires us to add margin-like inference information to SCE, resulting in a novel inference-softmax cross entropy (I-SCE) loss that is intuitively appealing and interpretable. The inference information guarantees both inter-class separability and improved generalization to adversarial examples, which is further demonstrated under the min-max framework. Extensive experiments show that, under strong adaptive attacks, DNN models trained with the proposed I-SCE loss achieve superior performance and robustness over the state-of-the-arts.

1. INTRODUCTION

Although deep neural networks have achieved state-of-the-art performance on various tasks (Szegedy et al., 2015; Zagoruyko & Komodakis, 2016; He et al., 2015; Huang et al., 2016; Larsson et al., 2016), it has recently been shown that adversarial examples crafted by adding imperceptible disturbances can easily fool well-trained neural networks (Szegedy et al., 2014; Goodfellow et al., 2015), leading to malfunctions in intelligent systems such as image classification (Goodfellow et al., 2015; Szegedy et al., 2014), natural language processing (Jia & Liang, 2017; Carlini & Wagner, 2018), and autonomous driving (Liu et al., 2019; Chernikova et al., 2019). The vulnerability to adversarial attacks indicates that neural networks do not learn proper feature representations and may overfit on the training samples, even when these are available in large amounts (Ilyas et al., 2019). One reason for this issue lies in the loss function used in training. Take the softmax cross entropy (SCE) loss as an example, which is widely adopted for regressing probabilities and is a core building block for high performance. Neural networks trained with SCE are shown to have limited robustness to input perturbation, and are hence suboptimal in real applications where adversarial attacks exist (Carlini & Wagner, 2017; Goodfellow et al., 2015; Kurakin et al., 2017b; Moosavi-Dezfooli et al., 2016; Papernot et al., 2016a). This issue has motivated many attempts to improve SCE so as to enhance the robustness and anti-attack properties of neural networks (Sun et al., 2014; Schroff et al., 2015; Wen et al., 2016; Wan et al., 2018; Pang et al., 2020). These methods follow the same principle: they minimize the losses to maximally fit the training examples. However, adversarial examples have a distribution misaligned with the training data, meaning that the models fitted in training could be repellent to the adversarial data (Ilyas et al., 2019).
In fact, given a well-trained model, the distribution difference between the training and adversarial data is a blind region to the model, which we term the inference region. The samples in this region should be generalizable by the well-trained model, which is not the case in existing methods, resulting in the vulnerability of neural networks (Szegedy et al., 2014). The reason this region exists, according to our analyses, is that the model overfits on the training data even when large amounts of data are accessible in training and the adversarial data is clearly absent. Hence, how to generalize to the samples in this region remains unresolved, and the above methods fail to take this fact into consideration. In this paper, we exploit the inference region between the distributions of the training data and the adversarial examples. This region guides us to develop an inference schema that imposes a margin-like inference information on the predicted logit of the network. Based on this, we propose an inference-softmax cross entropy (I-SCE) loss, in which the inference information is intuitively regarded as an additive term imposed on the prediction, which is extremely easy to implement and appealing. We further show the robustness of I-SCE under the Min-Max framework. Under severe adversarial attacks, I-SCE still maintains high accuracy and robustness, and has better resistance. Experiments on MNIST and CIFAR10 demonstrate that the proposed loss produces improved effectiveness and robustness compared with state-of-the-art methods.

2. RELATED WORK

Adversarial attacks exist widely in open environments, imposing critical robustness demands on neural networks with respect to the security and overall performance of systems. Therefore, how to design an anti-attack and robust neural network has attracted the interest of many researchers, whose work is briefly reviewed in this section. Adversarial attack: Szegedy et al. (2014) first proposed the concept of adversarial examples and employed the L-BFGS method as the solver of a disturbed problem to mislead neural networks. Goodfellow et al. (2015) proposed the Fast Gradient Sign Method (FGSM) to generate adversarial examples with a single gradient step. Before backpropagation, FGSM was used to perturb the input of the model, which was an early form of adversarial training. Moosavi-Dezfooli et al. (2016) proposed DeepFool, which calculated the minimal necessary disturbance and applied it to construct adversarial examples. By imposing an $\ell_2$ regularization to limit the disturbance scale, DeepFool achieved good performance. After this, Madry et al. (2018) proposed the projected gradient descent (PGD) attack, which had a strong attack strength and was used in adversarial training to improve robustness. Recently, Guo et al. (2019) developed a local-search-based technique to construct a numerical approximation of the gradient, which was then used to perturb a small part of the input image.

Adversarial defense:

The features of adversarial examples can follow a different distribution from the clean training data, making the defense process very difficult. Distillation temperature was used to stabilize the gradient during training, thereby reducing the sensitivity of the model to disturbances (Papernot et al., 2016b). Metzen et al. (2017) introduced a novel model to detect adversarial examples. Chen et al. (2017) injected annealing noise into the softmax function during training to alleviate the early saturation problem of the softmax loss. Xie et al. (2018) proposed the use of random resizing and random padding on images for defense. Ross & Doshi-Velez (2018) and Yan et al. (2018) proposed regularizing the gradients during training to improve model robustness. Farnia et al. (2019) used a spectral regularization as the gradient penalty, which was combined with adversarial training to alleviate vulnerability. In addition, data augmentation (Zhang et al., 2018; Hendrycks et al., 2020) was a typical option to enhance the generalization ability of neural networks and to reduce the risk of overfitting on training data. However, this option cannot completely solve the problem of adversarial attack, which always generates new kinds of adversarial examples. As a top performer, adversarial training (AT) achieved advanced robustness in different adversarial attack environments (Kurakin et al., 2017a; Miyato et al., 2017; Madry et al., 2018; Sinha et al., 2018; Najafi et al., 2019; Shafahi et al., 2019). By using extra adversarial examples, it enabled the model to learn more generalizable feature representations. The AT mechanism accepts various losses and regularizers, and is a powerful tool to resist attacks. Despite this, AT may sacrifice performance on clean input and is computationally expensive (Xie et al., 2019). Schmidt et al. (2018) showed that the sample complexity of robust learning might be much larger than that of standard learning.
Robust loss functions: Many studies have been conducted to improve the widely used SCE loss function, most of which focus on encouraging higher intra-class compactness and greater separation between classes. The contrastive loss (Sun et al., 2014) and the triplet loss (Schroff et al., 2015) were proposed to improve the internal compactness of each class, but suffered from a slowed training process and unstable convergence. Center loss (Wen et al., 2016) avoided the problem of slow convergence and instability by minimizing the Euclidean distance between features and the corresponding class centers, but the resultant robustness was not satisfactory. Liu et al. (2016) converted the softmax loss to the cosine space and proposed that the angular distance margin favours high intra-class compactness and inter-class separability. Wan et al. (2018) proposed the large-margin Gaussian Mixture loss, which used a Gaussian mixture distribution to fit the training data and increased the distance between feature distributions of different classes. Pang et al. (2020) proposed the Max-Mahalanobis center (MMC) loss to induce dense feature regions, encouraging the model to concentrate on learning ordered and compact representations. Different from the previous works, which improve the loss function to better fit the data distribution, the proposed method (i.e. I-SCE) is a much simpler and more interpretable way to enable neural networks to learn freely. Moreover, we advocate that I-SCE encourages the models to be more generalizable with respect to the adversarial data instead of overfitting on the training data.

3. METHODS

In this section, we introduce the inference-softmax cross entropy loss by first presenting the definition of the inference region, which motivates us to develop an inference schema.

3.1. INFERENCE REGION

Current neural networks tend to overfit on the hand-curated clean training data, which, however, does not yield a robust model and instead makes them vulnerable to adversarial attacks. We advocate that this scenario is caused by the misaligned distributions of the clean training data and the adversarial data, and that overfitting prevents the model from tolerating input perturbations. We term this distribution difference the inference region; it characterizes why adversarial examples are outliers to neural networks trained on clean data. Figure 1 gives an illustration.

The softmax cross entropy (SCE) loss is a typical loss function used in training deep models, which imposes a hard constraint on the label of the input, i.e. regressing a probability of 1 on the correct label and probabilities of 0 on the incorrect labels (in the usual case of one-hot label representations). Unfortunately, this hard constraint, on the one hand, causes a difficult regression process in training and, on the other hand, makes the resultant model over-confident in its predictions, hence bringing the issue of vulnerability. This has already been noted in the literature on label smoothing (Szegedy et al., 2016; Papernot et al., 2016b; Müller et al., 2019; Pereyra et al., 2017; Zou et al., 2019), which addresses the problem by designing a soft label or a soft output distribution, i.e. regressing probabilities of $1-\epsilon$ and $\epsilon/K$ on the correct and incorrect labels, respectively, where $K$ is the number of task labels and $\epsilon$ is the smoothing factor. We give an intuitive explanation of the above discussion in Figure 2(a) and Figure 2(b). As seen, SCE encourages label regression from one side along the 0-1 axis, whereas label smoothing drives the regression from both sides around the target probabilities. Besides, we also identify that the margin-based idea in SCE is similar to label smoothing: specifically, the soft label implies a margin between the true distribution and the soft output distribution.
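As a concrete illustration of the soft targets discussed above, the following NumPy sketch (the helper name and the default $\epsilon=0.1$ are our own choices for illustration, not from the paper) builds label-smoothing targets: roughly $1-\epsilon$ mass lands on the correct class and $\epsilon/K$ on each incorrect one; in this common formulation the uniform $\epsilon/K$ share is added to every class, including the correct one.

```python
import numpy as np

def smooth_labels(y, num_classes, eps=0.1):
    """Soften one-hot labels: the correct class gets (1 - eps) + eps/K,
    every incorrect class gets eps/K, so each row still sums to 1."""
    one_hot = np.eye(num_classes)[y]
    return one_hot * (1.0 - eps) + eps / num_classes
```

For `K = 5` and `eps = 0.1`, the correct class receives 0.92 and every other class 0.02, so the regression targets sit strictly inside the probability simplex rather than at its vertices.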
Considering that softmax is a monotonically increasing function, a margin between the label distributions induces a margin between features in the logit layer of a neural network, as in the ArcFace loss (Deng et al., 2019). From Figure 2(c), we see that ArcFace pushes the regression towards the target angles from both sides along a circular axis. While the above analyses tell us that the regression is performed from either one side or both sides, here we propose an alternative definition of the soft label that can be regressed from arbitrary directions in feature space. Specifically, we drop the circle constraint of ArcFace and impose the additive margin on features that are only normalized by their L2-norm. In this way, the resultant features are not necessarily located on a circle or a sphere; on the contrary, the margin is posed isotropically around each sample in the feature space, as shown in Figure 2(d). We empirically demonstrate the effectiveness of this operation over ArcFace. By implementing this margin idea, the inference information is contained in the margin, which helps 1) to avoid overfitting and 2) to improve the generalization ability of the feature representation, driving the decision boundary towards the boundary of the inference region in feature space. Hence, a small perturbation of an adversarial example cannot easily cross the decision boundary, greatly alleviating the vulnerability problem. This schema is simple, interpretable, and effective, as demonstrated in the experiments. In the following, we present the inference-softmax cross entropy in detail.

3.2. INFERENCE-SOFTMAX CROSS ENTROPY

To derive a robust loss for neural network training, in this section we apply the inference schema to SCE and propose an inference-softmax cross entropy (I-SCE) loss, which encourages the tolerance of the model to adversarial perturbations and thus avoids overfitting. Given a $k$-class classification task, the posterior probability predicted by the deep model using softmax is

$$P(y=i\mid x) = \frac{e^{f_i(x)}}{\sum_j e^{f_j(x)}}, \qquad (1)$$

where $i \in [1, k]$ is the label candidate and $f_i$ is the prediction function for the $i$-th class, which comprises both the backbone and the softmax layer in a typical classification network. To remedy the vulnerability of SCE, we impose the inference information on the logits produced by the neural network and propose an inference softmax as

$$P_I(y=i\mid x) = \frac{e^{s f_i(x)+m}}{e^{s f_i(x)+m} + \sum_{j \neq i} e^{f_j(x)}}, \qquad (2)$$

which then induces the inference-softmax cross entropy loss as

$$\mathcal{L}_{\text{I-SCE}} = -\sum_{i=1}^{k} y_i \ln \frac{e^{y_i(s f_i(x)+m) + (1-y_i) f_i(x)}}{e^{y_i(s f_i(x)+m) + (1-y_i) f_i(x)} + \sum_{j \neq i} e^{f_j(x)}}, \qquad (3)$$

where $y_i = 1$ if the ground-truth label of $x$ is $i$ and $y_i = 0$ otherwise, and $s \geq 1$ scales the prediction $f_i(x)$ and controls the gradient update rate on the correct class. Note that we use $y_i$ as an indicator for the inference information; that is, $s$ and $m$ are only imposed on the correct class instead of all classes. As seen, this loss is very easy to implement by simply applying a scale and adding a constant to the prediction of the correct class, which is non-intrusive to the original training code of neural networks. In the implementation of I-SCE, we find that the case $f_i \gg f_j$ for $j \neq i$ possibly occurs, which reduces the effect of $m$. To address this issue, we normalize $f(x)$ by its $L_2$-norm to increase numerical stability. During the inference process, Eq. 2 is calculated by first finding the index $i$ of the maximal value $f_i(x)$ among $i \in [1, k]$ and then applying $s$ and $m$ to the $i$-th class according to this equation. This operation does not change the class decision since $s \geq 1$ and $m > 0$.
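A minimal NumPy sketch of Eq. 3, assuming batched logits and the $L_2$ normalization described above (the function name and defaults are illustrative choices of ours, not the authors' released code):

```python
import numpy as np

def i_sce_loss(logits, y, s=1.0, m=0.1):
    """Sketch of the I-SCE loss: scale s and margin m are applied to the
    logit of the correct class only, after L2-normalizing the logits."""
    f = logits / np.linalg.norm(logits, axis=1, keepdims=True)  # numerical stability
    z = f.copy()
    idx = np.arange(len(y))
    z[idx, y] = s * f[idx, y] + m      # inference information on the true class
    # negative log of the inference-softmax probability of the true class
    logp = z[idx, y] - np.log(np.exp(z).sum(axis=1))
    return -logp.mean()
```

With `s = 1` and `m = 0` the loss reduces to the standard SCE on the normalized logits, matching the degeneration case noted in Appendix A.2; `m > 0` raises the true-class probability and lowers the loss.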

3.3.1. EXPECTED INTERVAL OF CORRECT CLASS

To demonstrate the robustness of I-SCE, we analyze the expected intervals of the correct class predicted by both SCE and I-SCE. Assume a minimum perturbation $\delta$ that makes the model just misclassify. The probability that the SCE model recognizes the adversarial sample $x+\delta$ as the correct label $i$ is

$$P(i\mid x+\delta) = \frac{e^{f_i(x+\delta)}}{\sum_j e^{f_j(x+\delta)}}. \qquad (4)$$

For the I-SCE model, the probability is

$$P_I(i\mid x+\delta) = \frac{e^{s f_i(x+\delta)+m}}{e^{s f_i(x+\delta)+m} + \sum_{j\neq i} e^{f_j(x+\delta)}}. \qquad (5)$$

The expected intervals of the correct class under SCE and I-SCE are defined as

$$L = P(i\mid x) - P(i\mid x+\delta) = \frac{e^{f_i(x)}}{\sum_j e^{f_j(x)}} - \frac{e^{f_i(x+\delta)}}{\sum_j e^{f_j(x+\delta)}} \qquad (6)$$

and

$$L_I = P_I(i\mid x) - P_I(i\mid x+\delta) = \frac{e^{s f_i(x)+m}}{e^{s f_i(x)+m} + \sum_{j\neq i} e^{f_j(x)}} - \frac{e^{s f_i(x+\delta)+m}}{e^{s f_i(x+\delta)+m} + \sum_{j\neq i} e^{f_j(x+\delta)}}, \qquad (7)$$

respectively. The vulnerability of SCE to adversarial attacks implies $f(x+\delta) < f(x)$. Since the perturbation $\delta$ is just large enough to mislead the SCE model, the expected interval measures the maximal level of perturbation that the model is robust to: the larger the interval, the more robust the model. Starting from this point, we show the following property of I-SCE: when $s \geq 1$, $m > 0$, and

$$\frac{s\, e^{s f_i(x)+m}}{\big(e^{s f_i(x)+m} + \sum_{j\neq i} e^{f_j(x)}\big)^2} - \frac{e^{f_i(x)}}{\big(\sum_j e^{f_j(x)}\big)^2} > 0,$$

$L_I$ is larger than $L$. The condition in this property is both theoretically demonstrated and empirically validated in Appendix A.2, which shows that the robustness of I-SCE is improved compared with SCE.
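The following toy computation sketches how the two intervals of Eq. 6 and Eq. 7 compare. The scalar values are assumptions chosen in the spirit of the empirical averages reported in Appendix A.2, not measured results:

```python
import numpy as np

def p_sce(fi, other):                # P(i|x), with 'other' = sum_{j != i} e^{f_j}
    return np.exp(fi) / (np.exp(fi) + other)

def p_isce(fi, other, s, m):         # P_I(i|x), with s and m on the correct class
    return np.exp(s * fi + m) / (np.exp(s * fi + m) + other)

other = 8.0                          # assumed value of sum_{j != i} e^{f_j}
fi, fi_adv = 0.97, 0.5               # f_i(x) and f_i(x + delta), f(x + delta) < f(x)
s, m = 1.5, 0.1

L_sce  = p_sce(fi, other) - p_sce(fi_adv, other)                # interval of Eq. 6
L_isce = p_isce(fi, other, s, m) - p_isce(fi_adv, other, s, m)  # interval of Eq. 7
```

For these illustrative values, `L_isce` exceeds `L_sce`, consistent with the property above: I-SCE leaves a larger perturbation budget before the correct-class probability degrades.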

3.3.2. MIN-MAX FRAMEWORK

The above robustness conclusion also applies to the Min-Max framework (Madry et al., 2018), a typical framework of adversarial attack and defense. The Min-Max framework is formulated as

$$\min_\theta \rho(\theta), \quad \text{where} \quad \rho(\theta) = \mathbb{E}_{(x,y)\sim\mathcal{D}}\Big[\max_{\delta \in \mathcal{S}} \mathcal{L}(\theta, x+\delta, y)\Big],$$

where $\theta$ is the model parameter and $\delta$ is the input perturbation. The inner maximization is an attack process that finds the perturbation that maximally misleads the model $\theta$. The outer minimization is a defense process that encourages the model to tolerate such an attack. We use $\rho_I$ and $\rho$ to denote the objective losses under I-SCE and SCE, respectively. Given an input perturbation $\delta$ and a trained model $\{f_i\}$, when $s \geq 1$ and $m > 0$ we have $P_I(i\mid x+\delta) > P(i\mid x+\delta)$, which is proven in Eq. 12 of Appendix A.2. This means that $P_I$ results in a lower loss than $P$, i.e. $\rho_I < \rho$. The lower loss indicates better defense performance against adversarial attacks, which demonstrates the improved robustness of I-SCE.
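The inner maximization is commonly solved with PGD. The sketch below shows the generic projected-ascent loop under the $\ell_\infty$ geometry; it is a simplified illustration with a caller-supplied gradient function, not the exact procedure or hyper-parameters of Madry et al.:

```python
import numpy as np

def pgd_inner_max(loss_grad, x, eps=0.3, alpha=0.05, steps=10):
    """Inner maximization of the min-max objective: ascend the loss within
    an l_inf ball of radius eps around x (projected gradient ascent)."""
    delta = np.zeros_like(x)
    for _ in range(steps):
        g = loss_grad(x + delta)            # gradient of the loss w.r.t. the input
        delta = delta + alpha * np.sign(g)  # signed ascent step
        delta = np.clip(delta, -eps, eps)   # project back into the l_inf ball
    return delta
```

For a linear loss $\ell(x) = w^\top x$, the loop converges to $\delta = \epsilon \cdot \mathrm{sign}(w)$, the worst-case $\ell_\infty$ perturbation; the outer minimization then trains $\theta$ on $x + \delta$.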

3.4. RELATIONSHIP WITH LARGE MARGIN LEARNING

While the proposed method (I-SCE) can be viewed as a margin-based loss, the difference from the ArcFace loss is how the margin (or inference information) is applied to the logits. The ArcFace loss normalizes both the features and the weights such that the resultant features are located on a hypersphere, and the training process regresses the class targets along the surface of the hypersphere. Instead, the proposed method normalizes only the features, not the weights, in order to place the features in a free space, in which case the regression process can be performed in any direction. We advocate that freeing the sphere constraint brings performance improvements in adversarial defense, which is demonstrated in the experiments. The reason for this effectiveness may be that adversarial perturbations cause large variations in the feature space: constraining the features on a hypersphere would bring a large feature shift if the normalization direction (onto the sphere) is undesirable. By contrast, the proposed method prefers isotropic tolerance to feature perturbations, hence performing better.
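The normalization difference can be sketched as follows (toy NumPy code; the function names are ours). ArcFace-style logits are scaled cosines bounded by the scale factor, while the I-SCE-style variant leaves the class weights free, so logit magnitudes are not confined to a sphere:

```python
import numpy as np

def arcface_style_logits(feat, W, s=1.0):
    """Normalize features AND class weights: logits are scaled cosines,
    so features effectively live on a hypersphere."""
    fn = feat / np.linalg.norm(feat, axis=1, keepdims=True)
    Wn = W / np.linalg.norm(W, axis=0, keepdims=True)
    return s * fn @ Wn

def isce_style_logits(feat, W):
    """Normalize only the features; the class weights stay free, so the
    additive margin acts isotropically in an unconstrained feature space."""
    fn = feat / np.linalg.norm(feat, axis=1, keepdims=True)
    return fn @ W
```

With unnormalized weights, a class logit can exceed 1 in magnitude, whereas the cosine logits cannot; this is the "free space" the text refers to.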

4. EXPERIMENTS

In this section, we conduct a series of experiments on MNIST (Lecun & Bottou, 1998) and CIFAR-10 (Krizhevsky & Hinton, 2009) to demonstrate the effectiveness of the proposed I-SCE. The backbone used in our implementation is ResNet-32 with five stages (He et al., 2016), which is optimized with Adam (Kingma & Ba, 2015). We employ white-box and black-box attacks, including targeted and untargeted PGD (Madry et al., 2018), DeepFool (Moosavi-Dezfooli et al., 2016), and SimBA (Guo et al., 2019). We select state-of-the-art methods as competitors, including the Center loss (Wen et al., 2016), the large-margin Gaussian Mixture (L-GM) loss (Wan et al., 2018), the ArcFace loss (Deng et al., 2019), the Max-Mahalanobis center (MMC) loss (Pang et al., 2020), the random method (Xie et al., 2018), Label Smoothing (Szegedy et al., 2016), and the adversarial training (AT) method (Madry et al., 2018).

4.1. ABLATION STUDIES

There are two hyper-parameters $s$ and $m$ in the proposed I-SCE, which affect the defense performance. We set the ranges as $s \in [1, 2]$ and $m \in (0, 0.1]$, and densely evaluate the performance of I-SCE under different settings and different attacks. Figure 3 illustrates the results, from which we see that the performance is highly correlated with the settings, the attack types, and the datasets. Therefore, to obtain better robustness, the parameters need to be re-tuned for different tasks using a small validation set. In the following experiments, for fair comparison, we set $s = 1$ and $m = 0.1$.

4.2. COMPARISON WITH STATE-OF-THE-ARTS

PGD attack: The PGD attack is a strong white-box attack with untargeted and targeted variants. We use $L_2$-constrained untargeted and targeted PGD attacks for comparison. The results are listed in Table 1 and Table 2. The Clean column is the accuracy on clean samples, $\epsilon$ is the perturbation level, and PGD$^{tar,un}_{10,50}$ denotes the targeted or untargeted attack with 10 or 50 iterations. The results indicate that I-SCE produces better performance than the others in most cases. While AT sometimes achieves good performance, it noticeably sacrifices accuracy on clean examples, e.g. on CIFAR10, and it has weaker defense against strong PGD attacks than I-SCE. By contrast, I-SCE preserves high accuracy.

DeepFool attack: The DeepFool attack generates minimal input perturbations to mislead neural networks. Here, we use the $L_2$-constrained DeepFool attack on MNIST and CIFAR10. From the results in Table 3, it is clearly observed that I-SCE produces much higher performance than all competitors, which have very limited defense ability against DeepFool. The performance improvement of I-SCE is above 50% in most cases, which is significant. In real applications, the minimal disturbance generated by DeepFool is more common than the strongly offensive disturbance generated by PGD. Therefore, the results indicate that I-SCE is more suitable for, and can achieve better performance in, real scenarios than the other methods.

Black-box attack: Robust performance is critical to claim reliable robustness against black-box attacks (Carlini et al., 2019). SimBA (Guo et al., 2019) is a black-box query-based attack, which is employed here. We set the query budget to 300 queries per image on MNIST and 500 queries per image on CIFAR10. The results under different disturbance levels are shown in Table 4, from which we see that I-SCE has higher accuracy and little sacrifice of accuracy compared with the others.
This evidence indicates that I-SCE induces reliable robustness rather than the false robustness caused by, e.g., gradient masking (Athalye et al., 2018).

Feature embedding: To visually investigate the effect of I-SCE, we compute a 3D representation of the input by adding a three-dimensional embedding layer before the output layer. The embedded points are plotted in Figure 4, where the samples are selected from the test sets of MNIST and CIFAR10 without any perturbation. As seen, the samples of SCE are distributed confusedly in the space, where small perturbations of the samples could change the category decision. In contrast, I-SCE produces separable clusters for each class with large margins among them, and hence has higher tolerance to perturbations than the other competitors.

In real applications on mobile devices, shallow networks are generally preferred because of their low computational costs. Hence, in this section, we evaluate the robustness of the proposed I-SCE with shallow networks. Specifically, we follow the same settings of competitors and attack methods as in Section 4. The backbone network is LeNet-5 (Lecun & Bottou, 1998) for MNIST and an 8-layer neural network for CIFAR10. Table 5 and Table 6 report the performance under the PGD attack on MNIST and CIFAR10, respectively. The results indicate that I-SCE performs surprisingly well in all attack cases, while incurring only a slight sacrifice of accuracy on clean data. Notably, the performance gaps between I-SCE and the others are above 50% in many cases, which validates the effectiveness of the proposed schema. More importantly, under severe attacks, I-SCE still shows strong robustness. Table 7 lists the results of all methods under the DeepFool attack. We find that the performance of I-SCE is comparable with the state-of-the-arts. MMC produces the best accuracy under attacks, but noticeably sacrifices accuracy on clean data. By contrast, I-SCE shows a better trade-off between accuracy and robustness.

A.2 PROOF OF THE PROPERTY ON THE EXPECTED INTERVAL OF THE CORRECT CLASS

According to the definitions of $L$ and $L_I$ in Eq. 6 and Eq. 7, we can derive

$$L_I - L = \frac{e^{s f_i(x)+m}}{e^{s f_i(x)+m} + \sum_{j\neq i} e^{f_j(x)}} - \frac{e^{s f_i(x+\delta)+m}}{e^{s f_i(x+\delta)+m} + \sum_{j\neq i} e^{f_j(x+\delta)}} - \frac{e^{f_i(x)}}{\sum_j e^{f_j(x)}} + \frac{e^{f_i(x+\delta)}}{\sum_j e^{f_j(x+\delta)}}. \qquad (10)$$

By defining

$$h(f(x)) = P_I(i\mid x) - P(i\mid x) = \frac{e^{s f_i(x)+m}}{e^{s f_i(x)+m} + \sum_{j\neq i} e^{f_j(x)}} - \frac{e^{f_i(x)}}{\sum_j e^{f_j(x)}}, \qquad (11)$$

we then have

$$h(f(x)) = \frac{e^{s f_i(x)+m}\sum_j e^{f_j(x)} - e^{f_i(x)}\big(e^{s f_i(x)+m} + \sum_{j\neq i} e^{f_j(x)}\big)}{\big(e^{s f_i(x)+m} + \sum_{j\neq i} e^{f_j(x)}\big)\sum_j e^{f_j(x)}} = \frac{\sum_{j\neq i} e^{f_j(x)}\big(e^{s f_i(x)+m} - e^{f_i(x)}\big)}{\big(e^{s f_i(x)+m} + \sum_{j\neq i} e^{f_j(x)}\big)\sum_j e^{f_j(x)}}. \qquad (12)$$

The above equation gives

$$h(f(x)) > 0 \iff e^{s f_i(x)+m} - e^{f_i(x)} > 0 \iff e^{(s-1)f_i(x)+m} > 1 \iff (s-1)f_i(x) + m > 0 \iff s > \frac{f_i(x)-m}{f_i(x)}.$$

Hence, when the parameters satisfy $s \geq 1 > \frac{f_i(x)-m}{f_i(x)}$ and $m > 0$, we have $h(f(x)) > 0$. When $s = 1$ and $m = 0$, $P_I(i\mid x)$ degenerates to $P(i\mid x)$. Similarly, for $h(f(x+\delta))$ we have

$$h(f(x+\delta)) = P_I(i\mid x+\delta) - P(i\mid x+\delta) = \frac{\sum_{j\neq i} e^{f_j(x+\delta)}\big(e^{s f_i(x+\delta)+m} - e^{f_i(x+\delta)}\big)}{\big(e^{s f_i(x+\delta)+m} + \sum_{j\neq i} e^{f_j(x+\delta)}\big)\sum_j e^{f_j(x+\delta)}}. \qquad (13)$$

When $s \geq 1 > \frac{f_i(x+\delta)-m}{f_i(x+\delta)}$ and $m > 0$, $h(f(x+\delta)) > 0$. Based on the above derivations, we have

$$L_I - L = h(f(x)) - h(f(x+\delta)). \qquad (14)$$

To analyze the sign of this quantity, we compute the derivative of $h(f(x))$ with respect to $f_i(x)$ as

$$\frac{\partial h(f(x))}{\partial f_i(x)} = \sum_{j\neq i} e^{f_j(x)} \left( \frac{s\, e^{s f_i(x)+m}}{\big(e^{s f_i(x)+m} + \sum_{j\neq i} e^{f_j(x)}\big)^2} - \frac{e^{f_i(x)}}{\big(\sum_j e^{f_j(x)}\big)^2} \right). \qquad (15)$$

When

$$\frac{s\, e^{s f_i(x)+m}}{\big(e^{s f_i(x)+m} + \sum_{j\neq i} e^{f_j(x)}\big)^2} - \frac{e^{f_i(x)}}{\big(\sum_j e^{f_j(x)}\big)^2} > 0,$$

$h(f(x))$ is monotonically increasing in $f_i(x)$; since $f_i(x+\delta) < f_i(x)$, this guarantees $L_I - L > 0$. However, this condition is not easy to validate analytically for all $s \geq 1$ and $m > 0$. Here, we conduct an experiment to demonstrate it empirically. Specifically, we compute the empirical values $\sum_{j\neq i} e^{f_j(x)} \approx 8$ and $f_i(x) \approx 0.97$ by averaging the corresponding values over all samples in MNIST and CIFAR10. Using these two values, we plot the 3D surface of

$$z = \frac{s\, e^{s f_i(x)+m}}{\big(e^{s f_i(x)+m} + \sum_{j\neq i} e^{f_j(x)}\big)^2} - \frac{e^{f_i(x)}}{\big(\sum_j e^{f_j(x)}\big)^2}$$

with respect to $s$ and $m$, shown in Figure 5. The surface indicates that $z$ is always larger than 0 when $s > 1$ and $m > 0$.
This empirically demonstrates the validity of the conditions in the property.
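The empirical check of the condition can be reproduced with a small grid evaluation (a sketch using the averaged values quoted above; the grid ranges follow the ablation ranges $s \in (1, 2]$ and $m \in (0, 0.1]$):

```python
import numpy as np

S, fi = 8.0, 0.97   # empirical averages of sum_{j != i} e^{f_j(x)} and f_i(x)

def z(s, m):
    """The quantity of Eq. 15 whose positivity makes h(f(x)) increasing."""
    a = np.exp(s * fi + m)
    return s * a / (a + S) ** 2 - np.exp(fi) / (np.exp(fi) + S) ** 2

# evaluate z over the grid s in (1, 2], m in (0, 0.1]
grid = np.array([[z(s, m) for m in np.linspace(0.001, 0.1, 50)]
                 for s in np.linspace(1.001, 2.0, 50)])
```

The minimum of `grid` is positive over the whole range, mirroring the 3D surface in Figure 5.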



As Figure 1 illustrates, the features of adversarial examples reside outside the feature space of the training examples, whereas the decision boundary specified by the well-trained model closely fits the training data area. Considering that new types of adversarial attack incrementally appear in real scenarios, the decision boundary in Figure 1(a) is not good enough to give the right prediction, even if several kinds of input perturbation are involved in training. Instead, adversarial attacks are assumed to result in an isotropic expansion of the feature space, where the expanded region is the inference region, as shown in Figure 1(b). Our task is then to make the model generalize to this region.

Figure 1: Illustration of the inference region: The grey circle region contains the features of the clean data x, and the orange circle region contains the features of the adversarial data x + δ, where δ is the adversarial perturbation. When using SCE, the optimized decision boundary is located closely to the clean data area as shown in subfigure (a), whereas the expected boundary is around the adversarial data area as shown in subfigure (b). Considering the isotropic expansion of the space caused by adversarial perturbation, the inference region is then induced from the annular area.

Figure 2: Intuitive explanation of label regression. (a) is the softmax cross entropy case, which regresses the probabilities from one side. (b) is the label smoothing case, which regresses the soft labels from both sides. (c) is the ArcFace case, which regresses the targets on a circular axis in feature space, i.e. encouraging circular margins between different classes. (d) is the inference-softmax cross entropy case, which regresses the targets from all directions, i.e. encouraging isotropic margins between different classes.

Figure 3: Performance of I-SCE under different parameter settings. The x-axis is s, the y-axis is m, and the z-axis is the accuracy. (a) Deepfool attack on MNIST. (b) Deepfool attack on CIFAR10. (c) Untargeted PGD attack on MNIST. (d) Untargeted PGD attack on CIFAR10.

Figure 4: Illustration of three-dimensional feature embedding.

Figure 5: Parameter selection of $s$ and $m$.

Table 1: Classification accuracy (%) under PGD attack on MNIST.

Table 2: Classification accuracy (%) under PGD attack on CIFAR10.

Table 3: Classification accuracy (%) under DeepFool attack.

Table 4: Classification accuracy (%) under SimBA attack.

Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. CoRR, abs/1605.07146, 2016.

Hongyi Zhang, Moustapha Cissé, Yann N. Dauphin, and David Lopez-Paz. mixup: Beyond empirical risk minimization. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, 2018. URL https://openreview.net/forum?id=r1Ddp1-Rb.

Table 5: Performance (%) of shallow neural networks under PGD attack on MNIST.

Table 6: Performance (%) of shallow neural networks under PGD attack on CIFAR10.

Table 7: Performance (%) of shallow neural networks under DeepFool attack.

