INCREASING-MARGIN ADVERSARIAL (IMA) TRAINING TO IMPROVE ADVERSARIAL ROBUSTNESS OF NEURAL NETWORKS

Abstract

Deep neural networks (DNNs), including convolutional neural networks, are known to be vulnerable to adversarial attacks, which may lead to disastrous consequences in life-critical applications. Adversarial samples are usually generated by attack algorithms but can also be induced by white noise, so the threat is real. In this study, we propose a novel training method, named Increasing Margin Adversarial (IMA) Training, to improve DNN robustness against adversarial noise. During training, the IMA method increases the margins of training samples by pushing the decision boundaries of the DNN model away from the training samples. The IMA method is evaluated on six publicly available datasets (including a COVID-19 CT image dataset) under strong 100-PGD white-box adversarial attacks, and the results show that the proposed method significantly improves classification accuracy on noisy data while maintaining a relatively high accuracy on clean data. We hope our approach may facilitate the development of robust DNN applications, especially for COVID-19 diagnosis using CT images.

1. INTRODUCTION

Deep neural networks (DNNs), especially convolutional neural networks (CNNs), have become the first choice for automated image analysis due to their superior performance. However, recent studies have shown that DNNs are not robust to a special type of noise, called adversarial noise, which was first discovered by Szegedy et al. (2013) and then explained by Goodfellow et al. (2014). Adversarial noise can significantly degrade the robustness of DNNs across a wide range of image classification applications (Akhtar & Mian, 2018), such as handwritten digit recognition (Graese et al., 2016), human face recognition (Mirjalili & Ross, 2017), and even traffic sign detection (Eykholt et al., 2018). DNN-based image segmentation can also be affected by adversarial noise because segmentation is often realized by pixel classification.

The COVID-19 pandemic has infected millions of people and caused about 1 million deaths as of this writing (WHO, 2020). A large-scale study in China showed that CT had higher sensitivity for the diagnosis of COVID-19 than initial reverse-transcription polymerase chain reaction (RT-PCR) from swab samples (Ai et al., 2020). As reviewed in (Shi et al., 2020), many DNN models for COVID-19 diagnosis from CT images have been developed and achieve very high classification accuracy. However, none of these studies considered DNN robustness against adversarial noise. We modified a Resnet-18 model (He et al., 2016), trained it on a public COVID-19 CT image dataset (Soares et al., 2020), and then tested the model's robustness (details in Section 3.3). Fig. 1 shows a CT image (denoted by x) of a lung infected by COVID-19 that was correctly classified as infected. After a small amount of noise δ is added to the image x, the noisy image x + δ is classified as uninfected.
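The perturbation δ is typically found by an iterative attack such as projected gradient descent (PGD), the attack family used in our evaluation: at each step, the input is moved in the sign direction of the loss gradient and then projected back into an L-infinity ball of radius ε around the original image. The following is a minimal sketch of this idea against a hypothetical binary logistic classifier standing in for a full DNN; the function and parameter names are illustrative and are not the paper's implementation.

```python
import numpy as np

def pgd_attack(w, b, x, y, eps=0.03, alpha=0.007, steps=100):
    """L-inf PGD against a toy logistic classifier p = sigmoid(w.x + b).

    Illustrative sketch only: a real attack would backpropagate
    through a full DNN rather than use this closed-form gradient.
    """
    x_adv = x.copy()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(w @ x_adv + b)))
        # gradient of the cross-entropy loss w.r.t. the input
        grad = (p - y) * w
        x_adv = x_adv + alpha * np.sign(grad)      # ascend the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)   # project into the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)           # stay a valid image
    return x_adv
```

Because the perturbation is bounded by ε in every pixel, the adversarial image x + δ remains visually indistinguishable from x, which is what makes the accuracy drop reported below so concerning.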
On the test set, although the model achieved ≥ 95% accuracy on the original clean images, its accuracy dropped to zero at the noise level of 0.03. This non-robust model clearly cannot be used in real clinical applications. It may be argued that adversarial noises are created by algorithms and therefore may not exist in the real world unless an adversary deliberately attacks the system for personal gain at the expense of public health. However, on the COVID-19 dataset, we found that 2.75% of the samples corrupted by uniform white noise at the noise level of 0.05 caused the DNN model to make wrong classifications, which

