ROBUSTNESS FOR FREE: ADVERSARIALLY ROBUST ANOMALY DETECTION THROUGH DIFFUSION MODEL

Abstract

Deep learning-based anomaly detection models have achieved remarkably high accuracy on commonly used benchmark datasets. However, the robustness of these models may not be satisfactory due to the existence of adversarial examples, which pose significant threats to the practical deployment of deep anomaly detectors. To tackle this issue, we propose an adversarially robust anomaly detector based on the diffusion model. Two properties make diffusion models a perfect match for this task: 1) the diffusion model is itself a reconstruction-based modeling method whose reconstruction error can serve as a natural indicator of the anomaly score; 2) previous studies have shown that diffusion models can purify data for better adversarial robustness. In this work, we highlight that our diffusion-model-based method gains adversarial robustness for free: the diffusion model acts both as an anomaly detector and an adversarial defender, so no extra adversarial training or data purification is needed as in standard robust image classification tasks. We also extend our proposed method to certified robustness against l2-norm-bounded perturbations. Through extensive experiments, we show that our proposed method exhibits outstanding (certified) adversarial robustness while maintaining anomaly detection performance on par with state-of-the-art anomaly detectors on benchmark datasets.

1. INTRODUCTION

Anomaly detection aims at identifying data instances that are inconsistent with the majority of the data, and has been widely applied in domains such as industrial defect detection (Bergmann et al., 2019), IT infrastructure management (Sun et al., 2021), medical diagnostics (Fernando et al., 2021), and cyber security (Feng & Tian, 2021). Recently, deep learning (DL) based anomaly detection methods have achieved remarkable improvements over traditional anomaly detection strategies (Ruff et al., 2021; Pang et al., 2021). DL-based methods take advantage of neural networks to estimate the anomaly score of a data instance, which reflects how likely it is to be an anomaly. One common practice defines the anomaly score as the reconstruction error between the original data instance and the version recovered by a symmetric neural network model (e.g., an autoencoder) (Hawkins et al., 2002; Chen et al., 2017). The insight behind using the reconstruction error as an anomaly score is that a model trained on normal data usually cannot reproduce anomalous instances (Bergmann et al., 2021); thus a high reconstruction error for a data instance indicates a higher probability of it being an anomaly.

Though DL-based anomaly detection methods have achieved remarkably high accuracy on commonly used benchmark datasets (Yu et al., 2021; Lee et al., 2022a), the robustness of the detection models is still unsatisfactory due to the existence of adversarial examples (Goodge et al., 2020; Lo et al., 2022), which pose significant threats to the practical deployment of deep anomaly detectors. Specifically, an imperceptible perturbation of the input data could cause a well-trained anomaly detector to return incorrect detection results. Figure 1 shows a simple case of how such an adversarial attack can disrupt OCR-GAN (Liang et al., 2022), a recent deep image anomaly detector.
We observe that an anomalous "hazelnut" in the upper row, when perturbed with an invisible noise, can cheat the detector into outputting a low anomaly score, while the normal "hazelnut" in the lower row can be perturbed to make the detector raise a false alarm with a high anomaly score. In fact, this robustness issue is not unique to OCR-GAN but is a common problem for various state-of-the-art deep anomaly detection models (as will be seen in our experiments in Section 3).

To tackle this issue, we explore the possibility of using the diffusion model to achieve adversarially robust anomaly detection. As a powerful class of generative models, diffusion models (Ho et al., 2020; Nichol & Dhariwal, 2021) are capable of generating high-quality samples, beating GANs in image synthesis (Dhariwal & Nichol, 2021). Specifically, diffusion models first construct a diffusion process that converts the data into standard Gaussian noise by gradually adding random noise, and then learn a generative process that reverses the diffusion process and generates samples from the noise by denoising one step at a time. Two aspects of diffusion models make them a perfect match for building an adversarially robust anomaly detector: 1) anomaly detection capability, as the diffusion model is itself a reconstruction-based modeling method whose reconstruction error can serve as a natural indicator of the anomaly score. A diffusion model trained on normal data ideally reconstructs anomalies as normal instances through the diffusion and reverse generative processes, thus yielding higher reconstruction errors for anomalies than for normal instances; 2) adversarial robustness, as previous studies have shown that diffusion models can be used as a data purifier to mitigate adversarial noise in supervised learning tasks (Nie et al., 2022), which suggests their potential for defending against adversarial examples in the anomaly detection task.
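The diffusion-then-reconstruct scoring idea described above can be sketched in a few lines. The snippet below is an illustrative numpy sketch, not the paper's implementation: `forward_diffuse`, `anomaly_score`, and the stand-in `reconstruct` callable are hypothetical names, and a trained reverse-process network would take the place of `reconstruct`.

```python
import numpy as np

def forward_diffuse(x0, t, alpha_bar, rng):
    """DDPM forward process: sample from q(x_t | x_0),
    i.e. sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

def anomaly_score(x, reconstruct, t, alpha_bar, rng):
    """Diffuse x to step t, reconstruct it with the (trained) reverse
    process, and use the reconstruction error as the anomaly score."""
    x_t = forward_diffuse(x, t, alpha_bar, rng)
    x_hat = reconstruct(x_t, t)  # stand-in for the learned denoiser
    return float(np.mean((x - x_hat) ** 2))
```

Because the reverse process is trained only on normal data, it pulls any input back toward the normal manifold, so an anomalous input ends up far from its reconstruction and receives a high score.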
Based on these nice properties of diffusion models, we propose a novel adversarially robust anomaly detection method in which the diffusion model acts both as an anomaly detector and an adversarial defender. The introduction of the diffusion model enables us to gain adversarial robustness for free, as no extra adversarial training or data purification is needed. Note that our design is fundamentally different from the purification-based adversarially robust models in standard image classification tasks (Nie et al., 2022), where an extra external purifier (e.g., a diffusion model) is placed before the actual classifier for robust classification; no such purifier is needed in our design¹. We summarize our contributions as follows:

• We build a unified adversarial attack framework for various kinds of anomaly detectors to facilitate the study of adversarial robustness in the anomaly detection domain, through which we systematically evaluate the adversarial robustness of state-of-the-art deep anomaly detection models.

• We propose an anomaly detection method based on the diffusion model, which gains adversarial robustness for free: the diffusion model acts both as an anomaly detector and an adversarial defender, without the need for extra adversarial training or data purification as in standard robust image classification tasks. We also extend our method to certified robustness against l2-norm perturbations through randomized smoothing, which provides additional robustness guarantees.

• We conduct extensive experiments and show that our method exhibits outstanding (certified) adversarial robustness, while also maintaining anomaly detection performance on par with state-of-the-art anomaly detectors on benchmark datasets (Bergmann et al., 2019).
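The randomized-smoothing extension mentioned in the contributions can be sketched as follows, in the style of Cohen et al. (2019): the anomaly decision (score above a threshold or not) is taken by majority vote over Gaussian-noised copies of the input, and the majority probability yields a certified l2 radius. This is an illustrative sketch, not the paper's certification procedure; `certified_radius` and its parameters are hypothetical names, and a rigorous certificate would replace the empirical estimate of the majority probability with a high-confidence lower bound (e.g., Clopper-Pearson).

```python
import numpy as np
from statistics import NormalDist

def certified_radius(score_fn, x, tau, sigma, n, rng):
    """Smooth the binary decision [score_fn(x) > tau] over n Gaussian
    perturbations and return (label, certified l2 radius sigma * Phi^{-1}(pA)),
    where pA is the (empirical) majority-vote probability."""
    votes = sum(
        int(score_fn(x + sigma * rng.standard_normal(x.shape)) > tau)
        for _ in range(n)
    )
    p_anom = votes / n
    label = "anomaly" if p_anom >= 0.5 else "normal"
    p_a = max(p_anom, 1.0 - p_anom)       # empirical majority probability
    if p_a <= 0.5:
        return label, 0.0                 # no majority, nothing to certify
    p_a = min(p_a, 1.0 - 1.0 / (2 * n))   # keep the inverse CDF finite
    return label, sigma * NormalDist().inv_cdf(p_a)
```

The returned radius guarantees (under the smoothing argument) that no l2 perturbation of smaller norm can flip the smoothed detector's decision.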

2. RELATED WORK

Anomaly Detection Methods. Existing anomaly detection methods can be roughly categorized into two kinds: reconstruction-based and feature-based. One commonly used reconstruction-based approach is to train an autoencoder and use the lp-norm distance between an input and its reconstruction as the anomaly score (Hawkins et al., 2002; Chen et al., 2017; Zhou & Paffenroth, 2017). Bergmann et al. (2018) replace the lp distance with SSIM (Wang et al., 2004) to



¹ In fact, the strategy of using the diffusion model as a purifier before another anomaly detector will not work, as the purifier will break the anomaly signals.



Figure 1: An adversarial example on OCR-GAN. δ refers to invisible perturbations. "GT" denotes "Ground Truth".

