IMPROVING ROBUSTNESS OF SOFTMAX CROSS-ENTROPY LOSS VIA INFERENCE INFORMATION

Abstract

Adversarial examples easily mislead vision systems based on deep neural networks (DNNs) trained with the softmax cross-entropy (SCE) loss. This vulnerability arises because SCE drives DNNs to fit the training samples, while the feature distributions of training and adversarial examples are misaligned. Several state-of-the-art methods improve the inter-class separability of training samples by modifying the loss function; we argue that they ignore adversarial examples and therefore attain only limited robustness to adversarial attacks. In this paper, we exploit the inference region, which inspires us to add margin-like inference information to SCE, yielding a novel inference-softmax cross-entropy (I-SCE) loss that is intuitively appealing and interpretable. The inference information guarantees both inter-class separability and improved generalization to adversarial examples, which is further demonstrated under the min-max framework. Extensive experiments show that, under strong adaptive attacks, DNN models trained with the proposed I-SCE loss achieve superior performance and robustness over the state-of-the-art methods.

1. INTRODUCTION

Although deep neural networks have achieved state-of-the-art performance on various tasks (Szegedy et al., 2015; Zagoruyko & Komodakis, 2016; He et al., 2015; Huang et al., 2016; Larsson et al., 2016), it has recently been shown that adversarial examples, crafted by adding imperceptible disturbances, can easily fool well-trained neural networks (Szegedy et al., 2014; Goodfellow et al., 2015), leading to malfunctions in intelligent systems such as image classification (Goodfellow et al., 2015; Szegedy et al., 2014), natural language processing (Jia & Liang, 2017; Carlini & Wagner, 2018), and autonomous driving (Liu et al., 2019; Chernikova et al., 2019). This vulnerability to adversarial attacks indicates that neural networks do not learn proper feature representations and may overfit on the training samples, even when those samples are of large amounts (Ilyas et al., 2019). One cause of this issue is the loss function used in training. Take the softmax cross-entropy (SCE) loss as an example, which is widely adopted for regressing probabilities and is a core building block of high-performance models. Neural networks trained with SCE are shown to have limited robustness to input perturbations, and are hence suboptimal in real applications where adversarial attacks exist (Carlini & Wagner, 2017; Goodfellow et al., 2015; Kurakin et al., 2017b; Moosavi-Dezfooli et al., 2016; Papernot et al., 2016a). This issue has motivated many attempts to improve SCE so as to enhance the robustness and anti-attack properties of neural networks (Sun et al., 2014; Schroff et al., 2015; Wen et al., 2016; Wan et al., 2018; Pang et al., 2020). These methods follow the same principle: they minimize the loss to maximally fit the training examples. However, adversarial examples follow a distribution misaligned with the training data, meaning that models fitted in this way may fail to generalize to adversarial data (Ilyas et al., 2019).
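For reference, the standard SCE loss discussed above, for a logit vector $z \in \mathbb{R}^C$ and ground-truth class $y$, is:

```latex
\mathcal{L}_{\mathrm{SCE}}(z, y) = -\log \frac{e^{z_y}}{\sum_{j=1}^{C} e^{z_j}}
```

Minimizing this loss only pushes the true-class probability toward one on the training distribution, with no explicit control over samples outside it.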
In fact, given a well-trained model, the distribution difference between the training and adversarial data is a blind region to the model, which we term the inference region. Samples in this region should be generalizable by the well-trained model, but this is not the case in existing methods, resulting in the vulnerability of neural networks (Szegedy et al., 2014). The reason this region exists, according to our analyses, is that the model overfits on the training data, even when large amounts of data are accessible in training, and thus fails to cover the adversarial data. In this paper, we exploit the inference region between the distributions of training data and adversarial examples. This region guides us to develop an inference schema that imposes a margin-like inference information on the predicted logit of the network. Based on this, we propose an inference-softmax cross-entropy (I-SCE) loss. In this loss, the inference information is intuitively regarded as an additive term imposed on the prediction, which is extremely easy to implement and appealing. We further show the robustness of I-SCE under the min-max framework. Under severe adversarial attacks, I-SCE still maintains high accuracy and robustness, and shows better resistance. Experiments on MNIST and CIFAR10 demonstrate that the proposed loss produces improved effectiveness and robustness compared with state-of-the-art methods.
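A minimal sketch of the additive-term idea, assuming the inference information acts like a margin subtracted from the true-class logit before the softmax; the function names and the margin value `m` are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def sce_loss(logits, label):
    """Standard softmax cross-entropy on a single logit vector."""
    z = logits - logits.max()            # stabilize the exponentials
    p = np.exp(z) / np.exp(z).sum()
    return -np.log(p[label])

def i_sce_loss(logits, label, m=1.0):
    """Hypothetical I-SCE sketch: impose a margin-like additive term on the
    true-class logit, so the network must win by at least the margin m."""
    z = logits.astype(float).copy()
    z[label] -= m                        # harder target during training
    return sce_loss(z, label)
```

For any m > 0 this upper-bounds SCE on the same logits: a sample is only "easy" once its true-class logit exceeds the others by the margin, which pushes the learned decision regions outward to cover part of the inference region.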

2. RELATED WORK

Adversarial attacks exist widely in open environments, imposing a critical demand for robust neural networks in systems where security and overall performance matter. How to design an attack-resistant and robust neural network has therefore attracted the interest of many researchers, whose work is briefly reviewed in this section.

Adversarial attack: Szegedy et al. (2014) first proposed the concept of adversarial examples and employed the L-BFGS method as the solver of a perturbation problem that misleads neural networks. Goodfellow et al. (2015) proposed the Fast Gradient Sign Method (FGSM), which generates adversarial examples with a single gradient step. Perturbing the model input with FGSM before backpropagation was an early form of adversarial training. Moosavi-Dezfooli et al. (2016) proposed DeepFool, which calculates the minimal necessary disturbance and applies it to construct adversarial examples; by imposing ℓ2 regularization to limit the disturbance scale, DeepFool achieved good performance. After this, Madry et al. (2018) proposed the projected gradient descent (PGD) attack, which has strong attack strength and is used in adversarial training to improve robustness. Recently, Guo et al. (2019) developed a local-search-based technique to construct a numerical approximation of the gradient, which is then used to perturb a small part of the input image.

Adversarial defense: The features of adversarial examples can follow a different distribution from the clean training data, making defense very difficult. Distillation temperature was used to stabilize the gradient during training, thereby reducing the sensitivity of the model to disturbances (Papernot et al., 2016b). Metzen et al. (2017) introduced a novel model to detect adversarial examples. Chen et al. (2017) injected annealing noise into the softmax function during training to alleviate the early saturation problem of the softmax loss. Xie et al. (2018) proposed random resizing and random padding of images for defense. Ross & Doshi-Velez (2018) and Yan et al. (2018) proposed to regularize the gradients during training to improve model robustness. Farnia et al. (2019) used a spectral regularization as a gradient penalty, combined with adversarial training, to alleviate vulnerability. In addition, data augmentation (Zhang et al., 2018; Hendrycks et al., 2020) was a typical option to enhance the generalization ability of neural networks and to reduce the risk of overfitting on training data. However, augmentation alone cannot completely solve the problem of adversarial attack, which keeps generating new kinds of adversarial examples. As a top performer, adversarial training (AT) achieved advanced robustness in different adversarial attack environments (Kurakin et al., 2017a; Miyato et al., 2017; Madry et al., 2018; Sinha et al., 2018; Najafi et al., 2019; Shafahi et al., 2019). By using extra adversarial examples, AT enables the model to learn more generalizable feature representations; it accepts various losses and regularizers and is a powerful tool to resist attacks. Despite this, AT may sacrifice performance on clean inputs and is computationally expensive (Xie et al., 2019). Schmidt et al. (2018) showed that the sample complexity of robust learning can be much larger than that of standard learning.

Robust loss functions: Many studies have sought to improve the widely used SCE loss, most of them encouraging higher intra-class compactness and greater separation between classes. The contrastive loss (Sun et al., 2014) and the triplet loss (Schroff et al., 2015) were proposed to improve the intra-class compactness, but suffered from slowed training and unstable convergence.

Center loss (Wen et al., 2016) avoided
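The single-gradient-step FGSM attack discussed in this section can be sketched as follows; the toy logistic model supplying an analytic gradient is an illustrative assumption, not from any of the cited works:

```python
import numpy as np

def fgsm(x, grad, eps=0.1):
    """FGSM: one step of size eps along the sign of the loss gradient w.r.t. the input."""
    return x + eps * np.sign(grad)

# Toy logistic model p(y=1|x) = sigmoid(w @ x), so the gradient is available in closed form.
def nll(w, x):
    return -np.log(1.0 / (1.0 + np.exp(-(w @ x))))   # -log p(y=1|x)

def nll_grad_x(w, x):
    s = 1.0 / (1.0 + np.exp(-(w @ x)))
    return -(1.0 - s) * w                            # d/dx of -log sigmoid(w @ x)

w = np.array([1.0, -2.0, 0.5])
x = np.array([0.3, -0.1, 0.8])
x_adv = fgsm(x, nll_grad_x(w, x), eps=0.1)
```

Because each coordinate moves by exactly eps in the direction that locally increases the loss, the perturbation stays inside an ℓ∞ ball of radius eps while the model's loss on `x_adv` grows; with a deep network the gradient would come from backpropagation instead of the closed form above.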

