INCREASING-MARGIN ADVERSARIAL (IMA) TRAINING TO IMPROVE ADVERSARIAL ROBUSTNESS OF NEURAL NETWORKS

Abstract

Deep neural networks (DNNs), including convolutional neural networks, are known to be vulnerable to adversarial attacks, which may lead to disastrous consequences in life-critical applications. Adversarial samples are usually generated by attack algorithms but can also be induced by white noise, so the threat is real. In this study, we propose a novel training method, named Increasing Margin Adversarial (IMA) Training, to improve DNN robustness against adversarial noise. During training, the IMA method increases the margins of training samples by moving the decision boundaries of the DNN model away from the training samples. The IMA method is evaluated on six publicly available datasets (including a COVID-19 CT image dataset) under strong 100-PGD white-box adversarial attacks, and the results show that the proposed method significantly improves classification accuracy on noisy data while keeping a relatively high accuracy on clean data. We hope our approach may facilitate the development of robust DNN applications, especially for COVID-19 diagnosis using CT images.

1. INTRODUCTION

Deep neural networks (DNNs), especially convolutional neural networks (CNNs), have become the first choice for automated image analysis due to their superior performance. However, recent studies have shown that DNNs are not robust to a special type of noise, called adversarial noise, which was first discovered by Szegedy et al. (2013) and then explained by Goodfellow et al. (2014). Adversarial noise can significantly degrade the robustness of DNNs across a wide range of image classification applications (Akhtar & Mian, 2018), such as handwritten digit recognition (Graese et al., 2016), human face recognition (Mirjalili & Ross, 2017), and even traffic sign detection (Eykholt et al., 2018). DNN-based image segmentation can also be affected by adversarial noise because segmentation is often realized as pixel-wise classification.

The COVID-19 pandemic has infected millions of people and, at the time of writing, has caused about 1 million deaths (WHO, 2020). A large-scale study in China showed that CT had higher sensitivity for the diagnosis of COVID-19 than initial reverse-transcription polymerase chain reaction (RT-PCR) from swab samples (Ai et al., 2020). As reviewed in (Shi et al., 2020), many DNN models for COVID-19 diagnosis from CT images have been developed and have achieved very high classification accuracy. However, none of these studies (Shi et al., 2020) considered DNN robustness against adversarial noise. We modified a Resnet-18 model (He et al., 2016), trained it on a public COVID-19 CT image dataset (Soares et al., 2020), and then tested its robustness (details in Section 3.3). Fig. 1 shows a CT image (denoted by x) of a lung that was infected by COVID-19 and correctly classified as infected. After adding a small amount of noise δ to the image x, the noisy image x + δ is classified as uninfected.
On the test set, although the model achieved ≥ 95% accuracy on the original clean images, its accuracy dropped to zero at a noise level of 0.03. Such a non-robust model clearly cannot be used in real clinical applications. It may be argued that adversarial noises are created by algorithms and therefore may not exist in the real world unless a malicious party wants to hack the system for personal gain at the expense of public health. However, on the COVID-19 dataset, we found that 2.75% of the samples corrupted with uniform white noise at a noise level of 0.05 caused the DNN model to make wrong classifications, which shows that white-noise-induced adversarial samples do exist. For medical applications, 2.75% is not a negligible number, and it is therefore worth developing methods to improve DNN adversarial robustness.

There are mainly two categories of adversarial attacks: white-box and black-box. In a white-box attack, the attacker knows everything about the DNN to be attacked; in a black-box attack, the attacker can only use the DNN as a black box (i.e., send an input to the DNN and get an output, knowing nothing else). From the perspective of defense, we should consider the worst-case scenario: the white-box attack. Many ideas have been explored to improve robustness, but many of them have been shown to be ineffective (Uesato et al., 2018; Tramer et al., 2020). A general and effective strategy is adversarial training (Goodfellow et al., 2014; Madry et al., 2017; Miyato et al., 2018; Ding et al., 2019a): adversarial attack algorithms generate noisy samples that are added to the training set, which is essentially a special data augmentation strategy. Through adversarial training, the DNN model learns from the noisy samples and becomes robust to noise. Adversarial training is straightforward but computationally expensive.
Thus, one needs to make sure that the generated noisy samples can indeed help to improve robustness: samples with too much noise can harm performance, while samples with too little noise may have no effect at all. Generative adversarial training has also been applied directly to improve robustness; however, it can only defend against black-box attacks (Wang & Yu, 2019). In this paper, we propose a novel method, Increasing-Margin Adversarial (IMA) Training, to improve the robustness of deep neural networks for classification tasks. Our method aims to increase the margins of training samples by moving the decision boundaries away from the samples. We evaluated our method on six datasets under 100-PGD white-box attacks, and the results show that the proposed method achieves a significant improvement in DNN robustness against adversarial noise.
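To make the data-augmentation view of adversarial training concrete, the following is a minimal, self-contained NumPy sketch. For brevity it uses a logistic-regression "network" and one-step FGSM-style noise generation rather than the deep networks and PGD used in this paper; the toy dataset, hyperparameters, and function names are all illustrative assumptions, not part of the proposed method.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, eps):
    """One-step L-inf attack: perturb x along the sign of the loss gradient."""
    p = sigmoid(x @ w + b)
    grad_x = (p - y)[:, None] * w[None, :]   # d(cross-entropy)/dx
    return np.clip(x + eps * np.sign(grad_x), 0.0, 1.0)

# toy 2-class data in [0, 1]^2 with true boundary x0 + x1 = 1
n = 200
x = rng.uniform(0, 1, size=(n, 2))
y = (x[:, 0] + x[:, 1] > 1.0).astype(float)

w, b = np.zeros(2), 0.0
lr, eps = 0.5, 0.1
for _ in range(300):
    x_adv = fgsm(x, y, w, b, eps)            # generate noisy samples
    x_all = np.vstack([x, x_adv])            # augment the training set with them
    y_all = np.concatenate([y, y])
    p = sigmoid(x_all @ w + b)
    w -= lr * (x_all.T @ (p - y_all)) / len(y_all)
    b -= lr * np.mean(p - y_all)

acc_clean = np.mean((sigmoid(x @ w + b) > 0.5) == y)
acc_adv = np.mean((sigmoid(fgsm(x, y, w, b, eps) @ w + b) > 0.5) == y)
```

The key design point this sketch illustrates is the trade-off discussed above: `eps` controls how much noise the generated samples carry, and choosing it poorly (too large or too small) makes the augmentation harmful or useless.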

2. METHODOLOGY

2.1 ADVERSARIAL ATTACK AND NEURAL NETWORK ROBUSTNESS

To evaluate the robustness of different adversarial training methods, we use projected gradient descent (PGD) (Madry et al., 2017; Kurakin et al., 2016) to generate adversarial noises, which is widely used for method evaluation (Uesato et al., 2018; Tramer et al., 2020). For the convenience of the reader, we briefly describe PGD. Let x denote an input sample and y its true class label. Let J(x) denote the scalar objective function of PGD, which could be the cross-entropy loss or another classification loss. Let δ denote the adversarial noise, whose magnitude ε is measured by the vector Lp norm of δ, i.e., ε = ||δ||_p, where p is usually inf or 2. PGD adds noise to x iteratively:

x^(i) = clip( x^(i-1) + η · h( ∇J(x^(i-1)) ) )    (1)

where η is the step size, ∇J(x) = ∂J/∂x is the gradient, and i is the iteration index. The initialization is x^(0) = x + ξ, where ξ is random noise with ||ξ||_p ≤ ε, and ||·||_p denotes the vector p-norm. The clip operation in Eq. (1) ensures that ||x^(i) - x||_p ≤ ε (the ε-ball); if x is an image, it also ensures that pixel values stay within the feasible range (e.g., 0 to 1). If the L-inf norm is used, h(∇J) is the sign function; if the L2 norm is used, h(∇J) normalizes ∇J by its L2 norm. The total adversarial noise is δ = x^(N_PGD) - x, where N_PGD is the number of iterations. η is usually set to α · ε / N_PGD, so the algorithm may sweep the ε-ball α times (α ≥ 1) within N_PGD iterations. By adding
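The PGD update in Eq. (1) can be sketched in a few lines of NumPy. This is only an illustrative implementation under stated assumptions: `grad_fn` stands in for the gradient of the attack objective ∇J, the toy linear objective at the bottom is not from the paper, and the projection shown is the L-inf box projection (a full L2 version would project onto an L2 ball instead).

```python
import numpy as np

def pgd_attack(x, grad_fn, eps, n_pgd, alpha=2.0, norm="inf",
               clip_min=0.0, clip_max=1.0, rng=None):
    """Sketch of Eq. (1): x^(i) = clip(x^(i-1) + eta * h(grad J(x^(i-1)))).

    grad_fn(x) must return the gradient of the attack objective J at x.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    eta = alpha * eps / n_pgd                          # step size eta
    x_adv = x + rng.uniform(-eps, eps, size=x.shape)   # random start x + xi
    x_adv = np.clip(x_adv, clip_min, clip_max)
    for _ in range(n_pgd):
        g = grad_fn(x_adv)
        if norm == "inf":
            step = np.sign(g)                          # h = sign for L-inf
        else:
            step = g / (np.linalg.norm(g) + 1e-12)     # h = L2 normalization
        x_adv = x_adv + eta * step
        x_adv = np.clip(x_adv, x - eps, x + eps)       # project into eps-ball
        x_adv = np.clip(x_adv, clip_min, clip_max)     # keep pixels feasible
    return x_adv

# toy objective J(x) = w . x, whose gradient is w everywhere
w = np.array([1.0, -1.0, 1.0, -1.0])
x0 = np.full(4, 0.5)
x_adv = pgd_attack(x0, lambda x: w, eps=0.1, n_pgd=10)
```

With `alpha = 2` the total step budget `n_pgd * eta` is twice `eps`, so for this constant-gradient toy objective the iterate saturates at the boundary of the ε-ball, i.e., `x_adv = x0 + eps * sign(w)`.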



Figure 1: An example of clean and noisy images.

Figure 2: Left: case-0 in BPGD; Right: case-1 in BPGD. Please zoom in for better visualization.

