IMPROVING ROBUSTNESS OF SOFTMAX CROSS-ENTROPY LOSS VIA INFERENCE INFORMATION

Abstract

Adversarial examples easily mislead vision systems based on deep neural networks (DNNs) trained with the softmax cross-entropy (SCE) loss. This vulnerability arises because SCE drives DNNs to fit the training samples, whereas the resulting feature distributions of training and adversarial examples are misaligned. Several state-of-the-art methods improve the inter-class separability of training samples by modifying the loss function; we argue that such methods ignore adversarial examples and therefore attain only limited robustness to adversarial attacks. In this paper, we exploit the inference region, which inspires us to incorporate margin-like inference information into SCE, yielding a novel inference-softmax cross-entropy (I-SCE) loss that is intuitively appealing and interpretable. The inference information guarantees both inter-class separability and improved generalization to adversarial examples, which we further demonstrate under the min-max framework. Extensive experiments show that, under strong adaptive attacks, DNN models trained with the proposed I-SCE loss achieve superior performance and robustness over the state-of-the-art.

1. INTRODUCTION

Although deep neural networks have achieved state-of-the-art performance on various tasks (Szegedy et al., 2015; Zagoruyko & Komodakis, 2016; He et al., 2015; Huang et al., 2016; Larsson et al., 2016), it has recently been shown that adversarial examples, crafted by adding imperceptible perturbations, can easily fool well-trained neural networks (Szegedy et al., 2014; Goodfellow et al., 2015), leading to malfunctions in intelligent systems such as image classification (Goodfellow et al., 2015; Szegedy et al., 2014), natural language processing (Jia & Liang, 2017; Carlini & Wagner, 2018), and autonomous driving (Liu et al., 2019; Chernikova et al., 2019). This vulnerability to adversarial attacks indicates that neural networks do not learn proper feature representations and may overfit the training samples, even when these are available in large amounts (Ilyas et al., 2019). One reason for this issue lies in the loss function used in training. Take the softmax cross-entropy (SCE) loss as an example: it is widely adopted for regressing probabilities and is a core building block of high-performance models, yet neural networks trained with SCE show limited robustness to input perturbations and are hence suboptimal in real applications where adversarial attacks exist (Carlini & Wagner, 2017; Goodfellow et al., 2015; Kurakin et al., 2017b; Moosavi-Dezfooli et al., 2016; Papernot et al., 2016a). This issue has motivated many attempts to improve SCE so as to enhance the robustness and anti-attack properties of neural networks (Sun et al., 2014; Schroff et al., 2015; Wen et al., 2016; Wan et al., 2018; Pang et al., 2020). These methods follow the same principle: they minimize the loss to maximally fit the training examples. However, adversarial examples follow a distribution misaligned with that of the training data, so models fitted to the training data can be repellent to adversarial data (Ilyas et al., 2019).
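As background, the SCE loss and the kind of gradient-based perturbation discussed above can be sketched as follows. This is a minimal NumPy illustration on a hypothetical linear classifier (the weight matrix W, input x, and step size eps are illustrative assumptions, not the paper's model or attack); the perturbation follows the sign-of-gradient idea of Goodfellow et al. (2015).

```python
import numpy as np

def softmax(z):
    # shift by the max logit for numerical stability
    e = np.exp(z - z.max())
    return e / e.sum()

def sce(W, x, y):
    # softmax cross-entropy for one example: L = -log p_y, p = softmax(W @ x)
    return -np.log(softmax(W @ x)[y])

# Hypothetical linear classifier: 3 classes, 5 input features (illustrative only)
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))
x = rng.normal(size=5)
y = 0  # assumed true label

# One sign-of-gradient step on the input. For a linear model, the gradient of
# SCE w.r.t. x has the closed form W^T (p - onehot(y)).
p = softmax(W @ x)
grad_x = W.T @ (p - np.eye(3)[y])
eps = 0.1
x_adv = x + eps * np.sign(grad_x)

# The perturbed input incurs a larger SCE loss on the same model, even though
# x_adv differs from x by at most eps per coordinate.
```

Because the SCE loss of a linear model is convex in the input, this single small step is guaranteed to increase the loss here; for deep networks the same step only increases a first-order approximation, yet it is empirically effective, which is the vulnerability discussed above.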
In fact, given a well-trained model, the distribution difference between the training and adversarial data constitutes a blind region to the model, which we term the inference region. Samples in this region are expected to be generalizable by the well-trained model, which is not the case in existing methods, resulting in the vulnerability of neural networks (Szegedy et al., 2014). The reason this region exists, according to our analyses, is that the model overfits the training data even when large amounts of data are accessible in training and the adversarial data

