ROBUSTNESS FOR FREE: ADVERSARIALLY ROBUST ANOMALY DETECTION THROUGH DIFFUSION MODEL

Abstract

Deep learning-based anomaly detection models have achieved remarkably high accuracy on commonly used benchmark datasets. However, their robustness may be unsatisfactory because of adversarial examples, which pose significant threats to the practical deployment of deep anomaly detectors. To tackle this issue, we propose an adversarially robust anomaly detector based on the diffusion model. Two properties make diffusion models a perfect match for this task: 1) the diffusion model is itself a reconstruction-based model whose reconstruction error can serve as a natural indicator of the anomaly score; 2) previous studies have shown that diffusion models can purify data for better adversarial robustness. In this work, we highlight that our diffusion-model-based method gains adversarial robustness for free: the diffusion model acts both as an anomaly detector and as an adversarial defender, so no extra adversarial training or data purification is needed, unlike in standard robust image classification. We also extend the proposed method to certified robustness against $\ell_2$-norm-bounded perturbations. Through extensive experiments, we show that our method exhibits outstanding (certified) adversarial robustness while maintaining anomaly detection performance on par with state-of-the-art anomaly detectors on benchmark datasets.

1. INTRODUCTION

Anomaly detection aims to identify data instances that are inconsistent with the majority of the data, and it has been widely applied in domains such as industrial defect detection (Bergmann et al., 2019), IT infrastructure management (Sun et al., 2021), medical diagnostics (Fernando et al., 2021), and cyber security (Feng & Tian, 2021). Recently, deep learning (DL) based anomaly detection methods have achieved remarkable improvements over traditional anomaly detection strategies (Ruff et al., 2021; Pang et al., 2021). DL-based methods take advantage of neural networks to estimate the anomaly score of a data instance, which reflects how likely it is to be an anomaly. One common practice defines the anomaly score as the reconstruction error between the original data instance and the one recovered by a symmetric neural network model (e.g., an autoencoder) (Hawkins et al., 2002; Chen et al., 2017). The rationale is that a model trained on normal data usually cannot reproduce anomalous instances (Bergmann et al., 2021), so a high reconstruction error indicates a higher probability that the instance is an anomaly. Although DL-based anomaly detection methods have achieved remarkably high accuracy on commonly used benchmark datasets (Yu et al., 2021; Lee et al., 2022a), the robustness of these models is still unsatisfactory due to the existence of adversarial examples (Goodge et al., 2020; Lo et al., 2022), which pose significant threats to the practical deployment of deep anomaly detectors. Specifically, an imperceptible perturbation of the input data can cause a well-trained anomaly detector to return incorrect detection results. Figure 1 shows a simple case of how such an adversarial attack can disrupt OCR-GAN (Liang et al., 2022), a recent deep image anomaly detector.
We observe that adding imperceptible noise to an anomalous "hazelnut" (upper row) can fool the detector into outputting a low anomaly score, while the normal "hazelnut" (lower row) can likewise be perturbed so that the detector raises a false alarm with a high anomaly score. In fact, this robustness issue is not unique to OCR-GAN but is common to various state-of-the-art deep anomaly detection models (as our experiments in Section 3 show). To tackle this issue, we explore using the diffusion model to achieve adversarially robust anomaly detection. As a powerful class of generative models, diffusion models (Ho et al., 2020; Nichol & Dhariwal, 2021) can generate high-quality samples, outperforming GANs in image synthesis (Dhariwal & Nichol, 2021). Specifically, diffusion models first construct a diffusion process that converts the data into standard Gaussian noise by gradually adding random noise, and then learn a generative process that reverses the diffusion process, generating samples from noise by denoising one step at a time. Two aspects of diffusion models make them a perfect match for building an adversarially robust anomaly detector: 1) anomaly detection capability: the diffusion model itself is a reconstruction-based model whose reconstruction error can serve as a natural indicator of the anomaly score. A diffusion model trained on normal data ideally reconstructs anomalies as normal instances through the diffusion and reverse generative processes, which yields higher reconstruction errors for anomalies than for normal instances; 2) adversarial robustness: previous studies have shown that diffusion models can serve as data purifiers that mitigate adversarial noise in supervised learning tasks (Nie et al., 2022), which suggests their potential for defending against adversarial examples in anomaly detection.
Based on these properties of diffusion models, we propose a novel adversarially robust anomaly detection method in which the diffusion model acts both as an anomaly detector and as an adversarial defender. The diffusion model lets us gain adversarial robustness for free, as no extra adversarial training or data purification is needed. Note that our design is fundamentally different from purification-based robust models for standard image classification (Nie et al., 2022), where an extra external purifier (e.g., a diffusion model) is placed before the actual classifier; our design does not need such a purifier. We summarize our contributions as follows:
• We build a unified adversarial attack framework for various kinds of anomaly detectors to facilitate the study of adversarial robustness in anomaly detection, and use it to systematically evaluate the adversarial robustness of state-of-the-art deep anomaly detection models.
• We propose an anomaly detection method based on the diffusion model that gains adversarial robustness for free: the diffusion model acts both as an anomaly detector and as an adversarial defender, without the adversarial training or data purification required in standard robust image classification. We also extend our method to certified robustness against $\ell_2$-norm perturbations through randomized smoothing, which provides additional robustness guarantees.
• We conduct extensive experiments showing that our method exhibits outstanding (certified) adversarial robustness while maintaining anomaly detection performance on par with state-of-the-art anomaly detectors on benchmark datasets (Bergmann et al., 2019).

2. RELATED WORK

Anomaly Detection Methods. Existing anomaly detection methods can be roughly divided into two kinds: reconstruction-based and feature-based. A common reconstruction-based approach trains an autoencoder and uses the $\ell_p$-norm distance between the input and its reconstruction as the anomaly score (Hawkins et al., 2002; Chen et al., 2017; Zhou & Paffenroth, 2017). Bergmann et al. (2018) replace the $\ell_p$ distance with SSIM (Wang et al., 2004) to better measure perceptual similarity. A more advanced branch of reconstruction-based models combines autoencoders with GANs, implementing the GAN generator as an autoencoder (Hou et al., 2021; Liang et al., 2022; Akçay et al., 2019). These methods additionally incorporate into the anomaly score the similarity between the discriminator features of the input and those of the reconstruction, which boosts performance on categories that are difficult to reconstruct accurately. Feature-based methods use pre-trained ResNets and vision transformers (Yu et al., 2021), or pre-trained networks with feature adaptation (Lee et al., 2022a), to extract discriminative features from normal images, and then estimate the distribution of these normal features with flow-based models (Gudovskiy et al., 2022; Rudolph et al., 2022), KNN (Reiss et al., 2021), or Gaussian modeling (Li et al., 2021). They compute the anomaly score as the distance from the features of a test image to the estimated distribution of normal features.
Adversarial Attacks and Defenses for Anomaly Detectors. To the best of our knowledge, existing attack and defense strategies for anomaly detectors focus only on autoencoder-based models.
Diffusion Models. As a class of powerful generative models, diffusion models have recently attracted considerable attention due to their high sample quality and strong mode coverage (Sohl-Dickstein et al., 2015; Ho et al., 2020; Nichol & Dhariwal, 2021). Nie et al. (2022) used diffusion models to purify adversarial perturbations for downstream robust classification, showing strong empirical robustness. Wolleb et al. (2022) adopt the deterministic DDIM (Song et al., 2020) for supervised anomaly localization. Wyatt et al. (2022) address the same task in an unsupervised setting using DDPM (Ho et al., 2020) with a partial diffusion strategy and simplex noise. Note that these are pixel-level anomaly detection methods, which are not directly comparable to our image-level anomaly detection. Moreover, diffusion models have not yet been studied for improving the adversarial robustness of anomaly detectors.

3. BUILDING ADVERSARIAL ATTACKS FOR ANOMALY DETECTORS THROUGH A UNIFIED FRAMEWORK

To facilitate the study of adversarial robustness across various kinds of anomaly detectors, we first build a unified adversarial attack framework for anomaly detection. We assume the adversarial perturbations are imperceptible, i.e., they do not flip the ground-truth class of the image (label-preserving). The general goal of the framework is to make detectors return incorrect results by reducing the anomaly scores of anomalous samples and increasing those of normal samples. In particular, we use the commonly adopted Projected Gradient Descent (PGD) attack (Madry et al., 2018) to illustrate our attack formulation.
PGD Attack on Anomaly Detectors. Consider a test sample $x \in \mathbb{R}^d$ with label $y \in \{-1, 1\}$ (where $-1$ denotes the anomalous class and $1$ the normal class), and a well-trained anomaly detector $A_\theta: \mathbb{R}^d \to \mathbb{R}$ that computes an anomaly score for each sample. We define the objective of the PGD attack on the anomaly detector as $\arg\max_{x} \mathcal{L}_\theta(x, y)$ with $\mathcal{L}_\theta(x, y) = y A_\theta(x)$, where $y$ determines whether perturbing $x$ increases or decreases its anomaly score. Depending on the perturbation constraint, adversarial examples are generated by $\ell_\infty$-norm or $\ell_2$-norm bounded PGD, respectively:
$$x_{n+1} = \mathcal{P}^{\ell_\infty}_{x,\epsilon}\left\{x_n + \alpha \cdot \operatorname{sgn}\big(\nabla_{x_n} \mathcal{L}_\theta(x_n, y)\big)\right\} \tag{3.1}$$
$$x_{n+1} = \mathcal{P}^{\ell_2}_{x,\epsilon}\left\{x_n + \alpha \frac{\nabla_{x_n} \mathcal{L}_\theta(x_n, y)}{\|\nabla_{x_n} \mathcal{L}_\theta(x_n, y)\|_2}\right\} \tag{3.2}$$
where $\alpha$ is the step size, $n \in \{0, \dots, N-1\}$ indexes the $N$ attack iterations, and $x_0 = x$. $\mathcal{P}^{\ell_p}_{x,\epsilon}\{\cdot\}$ denotes the projection onto the $\ell_p$ ball of radius $\epsilon$ around $x$, which ensures $\|x_{n+1} - x\|_p \le \epsilon$. The final adversarial example is $x_{adv} = x_N$. This attack strategy encompasses previous work on adversarial examples for anomaly detectors, which considered only autoencoder-based models (Lo et al., 2022; Goodge et al., 2020).
Their setting is recovered by specifying the anomaly score as $A_\theta(x) = \|D(E(x)) - x\|$, where $E$ denotes the encoder and $D$ the decoder.
Robustness Evaluation on Existing Anomaly Detectors. Based on the unified PGD attack, we systematically evaluate the adversarial robustness of state-of-the-art detectors with various model architectures. Table 1 shows that the attack effectively exposes the vulnerability of existing anomaly detectors: on the Toothbrush category of the MVTec AD benchmark (Bergmann et al., 2019), the AUC scores of these advanced detectors drop to 0% under adversarial perturbations with $\ell_\infty$ norm of at most 2/255. This suggests that current anomaly detectors are fragile on adversarial data, which motivates us to build anomaly detectors that achieve excellent detection performance and strong adversarial robustness simultaneously.
Table 1: Standard AUC and robust AUC (%) against $\ell_\infty$-PGD attacks ($\epsilon = 2/255$) on the Toothbrush category of MVTec AD, obtained by various state-of-the-art anomaly detectors.

Method                          Standard AUC   Robust AUC
OCR-GAN (Liang et al., 2022)    96.7           0
SPADE (Cohen & Hoshen, 2020)    88.9           0
CFlow (Gudovskiy et al., 2022)  85.3           0
FastFlow (Yu et al., 2021)      94.7           0
CFA (Lee et al., 2022a)         100            0
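The unified attack in Eqs. 3.1 and 3.2 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' attack code: the detector is a hypothetical linear "autoencoder" that keeps the first `K` coordinates and zeroes the rest, and its score is the squared reconstruction error so that the gradient has a closed form (a real detector would be differentiated with autodiff). The names `pgd_attack`, `project` and `anomaly_score` are illustrative.

```python
import numpy as np

K = 4  # hypothetical: the toy "autoencoder" keeps the first K coordinates

def anomaly_score(x):
    """Toy A(x) = ||D(E(x)) - x||^2 where D(E(x)) zeroes coordinates K onwards."""
    return float(np.sum(x[K:] ** 2))

def anomaly_score_grad(x):
    """Closed-form gradient of the toy score (autodiff would be used in practice)."""
    g = np.zeros_like(x)
    g[K:] = 2.0 * x[K:]
    return g

def project(x_adv, x, eps, norm):
    """Project x_adv onto the l_inf or l_2 ball of radius eps centred at x."""
    delta = x_adv - x
    if norm == "inf":
        delta = np.clip(delta, -eps, eps)
    else:
        length = np.linalg.norm(delta)
        if length > eps:
            delta = delta * (eps / length)
    return x + delta

def pgd_attack(x, y, grad_fn, eps, alpha, steps, norm="inf"):
    """Gradient ascent on L(x, y) = y * A(x) with projection (Eqs. 3.1 and 3.2).

    y = +1 (normal sample): raise the anomaly score, i.e. cause a false alarm.
    y = -1 (anomalous sample): lower the anomaly score, i.e. hide the anomaly.
    """
    x_adv = x.copy()
    for _ in range(steps):
        g = y * grad_fn(x_adv)
        if norm == "inf":
            step = alpha * np.sign(g)                        # Eq. 3.1
        else:
            step = alpha * g / (np.linalg.norm(g) + 1e-12)   # Eq. 3.2
        x_adv = project(x_adv + step, x, eps, norm)
    return x_adv

x = np.linspace(-1.0, 1.0, 16)
x_hidden = pgd_attack(x, -1, anomaly_score_grad, eps=2 / 255, alpha=0.5 / 255, steps=10)
x_alarm = pgd_attack(x, +1, anomaly_score_grad, eps=0.2, alpha=0.05, steps=10, norm="2")
print(anomaly_score(x_hidden) < anomaly_score(x) < anomaly_score(x_alarm))  # True
```

With `y=-1` the attack lowers the score of an anomaly (a missed detection), and with `y=+1` it raises the score of a normal sample (a false alarm), matching the two failure modes described for Figure 1.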

4. ADVERSARIALLY ROBUST ANOMALY DETECTION

Before we introduce our diffusion-based robust anomaly detection method, we first give a brief review of diffusion models (Sohl-Dickstein et al., 2015; Ho et al., 2020; Nichol & Dhariwal, 2021).

4.1. PRELIMINARIES ON DIFFUSION MODELS

DDPM (Ho et al., 2020) defines a $T$-step diffusion process $q(x_{1:T} \mid x_0) := \prod_{t=1}^{T} q(x_t \mid x_{t-1})$, parameterized by a well-behaved variance schedule $\beta_1, \dots, \beta_T$ as $q(x_t \mid x_{t-1}) := \mathcal{N}(x_t; \sqrt{1-\beta_t}\, x_{t-1}, \beta_t I)$, which iteratively transforms an unknown data distribution $q(x_0)$ into (approximately) a standard Gaussian $q(x_T) = \mathcal{N}(0, I)$. The generative process $p_\theta(x_{0:T}) := p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t)$ is learned to approximate each $q(x_{t-1} \mid x_t)$ using neural networks:
$$p_\theta(x_{t-1} \mid x_t) := \mathcal{N}\big(x_{t-1}; \mu_\theta(x_t, t), \Sigma_\theta(x_t, t)\big) \tag{4.1}$$
A notable property of the diffusion process is that it allows sampling $x_t$ directly at an arbitrary timestep $t$ given $x_0$. Using the notation $\alpha_t := 1 - \beta_t$ and $\bar{\alpha}_t := \prod_{s=1}^{t} \alpha_s$, we have
$$x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon, \quad \epsilon \sim \mathcal{N}(0, I) \tag{4.2}$$
This property makes it possible to sample $x_t$ quickly. For training, motivated by the connection to score-based generative modeling (Song & Ermon, 2019; 2020), Ho et al. (2020) show that directly predicting the noise term $\epsilon$ results in higher sample quality, especially when combined with a simplified objective that provides no learning signal for $\Sigma_\theta(x_t, t)$:
$$L_{simple} = \mathbb{E}_{t, x_0, \epsilon}\big[\|\epsilon - \epsilon_\theta(x_t, t)\|^2\big] \tag{4.3}$$
In this paper, we follow Nichol & Dhariwal (2021) and train the diffusion model with a hybrid loss for better sample quality with fewer generation steps. More details can be found in Appendix A.
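Eq. 4.2 and the simplified objective in Eq. 4.3 are easy to check numerically. The sketch below assumes the linear schedule of Ho et al. (2020), with $\beta_t$ spaced linearly from $10^{-4}$ to $0.02$ over $T = 1000$ steps; the paper follows Nichol & Dhariwal (2021), whose schedule may differ. The function names and the all-zero "predictor" are hypothetical and only serve to show the shapes involved.

```python
import numpy as np

def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Assumed linear variance schedule beta_1, ..., beta_T (Ho et al., 2020)."""
    return np.linspace(beta_start, beta_end, T)

betas = linear_beta_schedule()
alphas = 1.0 - betas                 # alpha_t = 1 - beta_t
alpha_bar = np.cumprod(alphas)       # bar(alpha)_t = prod_{s <= t} alpha_s

def q_sample(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0) in one shot via Eq. 4.2 (t is 1-indexed).

    Returns the noisy sample and the noise that produced it, which is the
    regression target for eps_theta in Eq. 4.3."""
    eps = rng.standard_normal(x0.shape)
    a = alpha_bar[t - 1]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps, eps

def l_simple(eps, eps_pred):
    """Single-sample estimate of Eq. 4.3: ||eps - eps_theta(x_t, t)||^2."""
    return float(np.sum((eps - eps_pred) ** 2))

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(3, 8, 8))   # toy image scaled to [-1, 1]
t = int(rng.integers(1, 1001))                # t ~ Uniform{1, ..., T}
x_t, eps = q_sample(x0, t, rng)
# A (useless) hypothetical predictor that always outputs zero, just to show the loss.
loss = l_simple(eps, np.zeros_like(eps))
```

Because $x_t$ is drawn in a single step rather than by simulating $t$ transitions, each training iteration costs one network call regardless of $t$.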

4.2. FREERAD: ADVERSARIALLY ROBUST ANOMALY DETECTION FOR FREE

Based on the diffusion model, we now introduce our proposed Robust Anomaly Detection for Free method, termed FreeRAD. FreeRAD consists of two parts: robust reconstruction, which aims to reconstruct a normal version of the input in a robust manner, and anomaly score calculation, which computes the final anomaly score from the robust reconstruction error.
Robust Reconstruction: Robust reconstruction is the first step of FreeRAD and the key to achieving adversarially robust anomaly detection. Since training a diffusion model amounts to predicting the noise added in the diffusion process and then denoising, its reconstruction error can serve as a natural indicator of the anomaly score. Specifically, as shown in Figure 2, for normal data the reconstruction is nearly identical to the input. For anomalous data, the diffusion model (after adding noise and denoising) can "repair" the anomalous regions, yielding a high reconstruction error that makes the anomaly easy to detect. Now consider adversarial robustness in anomaly detection. One basic assumption about adversarial examples is that the perturbation is imperceptible, e.g., it has a small $\ell_p$ norm. In the diffusion process, if we add sufficiently large Gaussian noise to the input data, such adversarial perturbations are dominated by the added Gaussian noise and thus lose their effect. After the reverse diffusion (denoising) process, an adversarially perturbed anomaly is still reconstructed as a normal-looking image and therefore still yields a high reconstruction error, as shown in Figure 2. This suggests that FreeRAD is robust to adversarial data perturbations. Algorithm 1 summarizes the main steps of robust reconstruction. Specifically, to perform adversarially robust reconstruction, we first choose the diffusion step $k$ and apply Eq. 4.2 to $x$ to obtain the diffused image $x_k$. Unlike in diffusion model training, we do not need to diffuse the data into complete Gaussian noise (i.e., a large $k$).
Instead, we pick a moderate $k$ for noise injection and start denoising from there, similar to Nie et al. (2022). Note that $k$ should be chosen so that the Gaussian noise dominates the adversarial perturbations and anomaly signals while the high-level features of the input data are still preserved for reconstruction. For the denoising process, a typical full-shot setting uses all $k$ denoising steps: at each step $t$, we predict the clean input from the current diffused data $x_t$, denoted $\hat{x}_0$, and then sample the next iterate $x_{t-1}$ from the current prediction $\hat{x}_0$ and the current diffused data $x_t$.
Algorithm 1 Full-shot Robust Reconstruction in FreeRAD
Input: test image $x$, diffusion steps $k$ ($k \le T$). Output: reconstruction $\hat{x}$ of $x$.
1: $x_0 = x$
2: $\epsilon \sim \mathcal{N}(0, I)$
3: $x_k = \sqrt{\bar{\alpha}_k}\, x_0 + \sqrt{1 - \bar{\alpha}_k}\, \epsilon$
4: for $t = k, \dots, 1$ do ▷ full-shot denoising
5:   $\hat{x}_0 = \frac{1}{\sqrt{\bar{\alpha}_t}}\big(x_t - \sqrt{1 - \bar{\alpha}_t}\, \epsilon_\theta(x_t, t)\big)$
6:   if $t > 1$ then
7:     $z \sim \mathcal{N}(0, I)$
8:     $x_{t-1} = \frac{\sqrt{\bar{\alpha}_{t-1}}\, \beta_t}{1 - \bar{\alpha}_t}\, \hat{x}_0 + \frac{\sqrt{\alpha_t}\,(1 - \bar{\alpha}_{t-1})}{1 - \bar{\alpha}_t}\, x_t + \Sigma_\theta(x_t, t)^{1/2} z$
9:   end if
10: end for
11: $\hat{x} = \hat{x}_0$
Anomaly Score Calculation: To calculate the final anomaly score in a robust and stable manner, we first compute the Multiscale Reconstruction Error Map (denoted $Err_{ms}$), which accounts for both pixel-wise and patch-wise reconstruction errors. Specifically, for each scale $l$ in $L = \{1, \frac{1}{2}, \frac{1}{4}, \frac{1}{8}\}$, we first compute the error map $Err(x, \hat{x})_l$ between the downsampled input $x_l$ and the downsampled reconstruction $\hat{x}_l$ as $Err(x, \hat{x})_l = \frac{1}{C} \sum_{c=1}^{C} (x_l - \hat{x}_l)^2[c, :, :]$, where $C$ is the number of channels and the square denotes the element-wise square, and then upsample it to the original resolution. The final $Err_{ms}$ is obtained by averaging the error maps over scales and applying a mean filter for better stability, similar to Zavrtanik et al. (2021):
$$Err_{ms}(x, \hat{x}) = \Big(\frac{1}{N_L} \sum_{l \in L} Err(x, \hat{x})_l\Big) * f_{s \times s}$$
where $N_L = |L|$ is the number of scales, $f_{s \times s}$ is the mean filter of size $s \times s$, and $*$ is the convolution operation. Similar to Pirnay & Chai (2022), we take the pixel-wise maximum of the absolute deviation of $Err_{ms}(x, \hat{x})$, measured with respect to normal training data, as the scalar anomaly score. Due to space limits, we defer the complete anomaly score calculation algorithm to Appendix B.2.
One-shot Denoising: One major problem with full-shot denoising (Algorithm 1) is that it is time-consuming, which makes it unacceptable for real-time anomaly detection in critical situations (Sun et al., 2021). Moreover, the multiple sampling steps of full-shot denoising can introduce extra reconstruction error. To overcome these challenges, we investigate an arbitrary-shot denoising process that allows fewer denoising steps (details in Appendix B.1). Based on our results (see Appendix D.2), we observe that one-shot denoising (reducing the for loop in Line 4 of Algorithm 1 to a single iteration) is sufficient to produce an accurate reconstruction with $O(1)$ inference cost, i.e., a single network evaluation regardless of $k$. In this case, the robust reconstruction in FreeRAD reduces to the simple 3-step version shown in Algorithm 2. A similar one-shot idea has also been adopted by Carlini et al. (2022) for robust image classification. By default, we use one-shot robust reconstruction for all experiments in Section 5.
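The multiscale error map $Err_{ms}$ described in the anomaly-score paragraph can be sketched in NumPy as follows. Several details are assumptions the text does not pin down: average pooling for downsampling, nearest-neighbour upsampling, edge padding for the $s \times s$ mean filter, and a scalar score computed as the largest absolute deviation from a hypothetical `reference_map` (for instance, the mean $Err_{ms}$ over normal training images). The complete procedure is in Appendix B.2; this is only an illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def downsample(img, f):
    """Average-pool a (C, H, W) array by integer factor f (assumed pooling)."""
    c, h, w = img.shape
    return img.reshape(c, h // f, f, w // f, f).mean(axis=(2, 4))

def mean_filter(err, s):
    """s x s box filter (s odd) with edge padding; output has the input's shape."""
    p = s // 2
    padded = np.pad(err, p, mode="edge")
    return sliding_window_view(padded, (s, s)).mean(axis=(-2, -1))

def multiscale_error_map(x, x_hat, factors=(1, 2, 4, 8), s=3):
    """Err_ms for scales 1, 1/2, 1/4, 1/8 (downsampling factors 1, 2, 4, 8)."""
    maps = []
    for f in factors:
        # Channel-averaged element-wise squared error at this scale.
        err = ((downsample(x, f) - downsample(x_hat, f)) ** 2).mean(axis=0)
        # Nearest-neighbour upsampling back to H x W (assumed interpolation).
        maps.append(np.repeat(np.repeat(err, f, axis=0), f, axis=1))
    return mean_filter(np.mean(maps, axis=0), s)

def anomaly_score(err_ms, reference_map):
    """Hypothetical scalar score: largest pixel-wise |Err_ms - reference_map|."""
    return float(np.max(np.abs(err_ms - reference_map)))

# Toy example: the reconstruction "repairs" a 4 x 4 defect in the input.
x = np.zeros((3, 32, 32))
x[:, 8:12, 8:12] = 1.0          # anomalous patch in the input
x_hat = np.zeros_like(x)        # repaired, normal-looking reconstruction
err_ms = multiscale_error_map(x, x_hat)
score = anomaly_score(err_ms, reference_map=np.zeros_like(err_ms))
```

The coarse scales spread the error of a small defect over a patch, and the box filter smooths isolated pixel noise, which is the stability argument made in the text.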

Algorithm 2 One-shot Robust Reconstruction in FreeRAD

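Input: test image $x$, diffusion step $k$ ($k \le T$). Output: reconstruction $\hat{x}$ of $x$.
1: $\epsilon \sim \mathcal{N}(0, I)$
2: $x_k = \sqrt{\bar{\alpha}_k}\, x + \sqrt{1 - \bar{\alpha}_k}\, \epsilon$
3: $\hat{x} = \frac{1}{\sqrt{\bar{\alpha}_k}}\big(x_k - \sqrt{1 - \bar{\alpha}_k}\, \epsilon_\theta(x_k, k)\big)$ ▷ one-shot denoising process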
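Algorithms 1 and 2 can be sketched as follows, with an arbitrary noise predictor `eps_model(x_t, t)` standing in for the trained U-Net $\epsilon_\theta$. Assumptions: the linear schedule of Ho et al. (2020), and the fixed posterior variance $\tilde\beta_t$ in place of the learned $\Sigma_\theta$. The helper `noise_to_perturbation_ratio` is a rough, back-of-the-envelope look at the Robust Reconstruction argument: it compares the standard deviation of the injected noise at step $k$ with an $\ell_\infty$ perturbation of $2/255$, which becomes $4/255$ once pixels are rescaled to $[-1, 1]$ (an assumed preprocessing). `make_oracle` is a hypothetical sanity check, not part of FreeRAD.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)       # assumed linear schedule (Ho et al., 2020)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

def ab(t):
    """bar(alpha)_t for a 1-indexed timestep t."""
    return alpha_bar[t - 1]

def predict_x0(x_t, t, eps_model):
    """Line 5 of Algorithm 1 / line 3 of Algorithm 2: invert Eq. 4.2 using eps_theta."""
    return (x_t - np.sqrt(1 - ab(t)) * eps_model(x_t, t)) / np.sqrt(ab(t))

def one_shot_reconstruct(x, k, eps_model, rng):
    """Algorithm 2: diffuse to step k, then denoise with a single model call."""
    x_k = np.sqrt(ab(k)) * x + np.sqrt(1 - ab(k)) * rng.standard_normal(x.shape)
    return predict_x0(x_k, k, eps_model)

def full_shot_reconstruct(x, k, eps_model, rng):
    """Algorithm 1: k ancestral steps. The fixed posterior variance beta_tilde_t
    replaces the learned Sigma_theta here (an assumption for the sketch)."""
    x_t = np.sqrt(ab(k)) * x + np.sqrt(1 - ab(k)) * rng.standard_normal(x.shape)
    for t in range(k, 0, -1):
        x0_hat = predict_x0(x_t, t, eps_model)
        if t > 1:
            c0 = np.sqrt(ab(t - 1)) * betas[t - 1] / (1 - ab(t))
            ct = np.sqrt(alphas[t - 1]) * (1 - ab(t - 1)) / (1 - ab(t))
            var = (1 - ab(t - 1)) / (1 - ab(t)) * betas[t - 1]
            x_t = c0 * x0_hat + ct * x_t + np.sqrt(var) * rng.standard_normal(x.shape)
    return x0_hat

def noise_to_perturbation_ratio(k, eps_linf=4 / 255):
    """Std of the injected noise at step k over the scaled l_inf perturbation
    sqrt(bar(alpha)_k) * eps; 4/255 assumes pixels rescaled from [0, 1] to [-1, 1]."""
    return np.sqrt(1 - ab(k)) / (np.sqrt(ab(k)) * eps_linf)

def make_oracle(x_clean):
    """Hypothetical oracle eps_theta that knows the clean image (sanity check only)."""
    return lambda x_t, t: (x_t - np.sqrt(ab(t)) * x_clean) / np.sqrt(1 - ab(t))

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(3, 16, 16))
oracle = make_oracle(x)
x_hat_one = one_shot_reconstruct(x, 100, oracle, rng)
x_hat_full = full_shot_reconstruct(x, 100, oracle, rng)
ratios = {k: noise_to_perturbation_ratio(k) for k in (50, 100, 200, 300)}
```

With the oracle, both functions return the clean image, because $\hat{x}_0 = x$ at every step; with a trained $\epsilon_\theta$ the one-shot version costs one network call, while the full-shot version costs $k$.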

5. EXPERIMENTS

We compare our proposed FreeRAD with five state-of-the-art anomaly detectors on both clean and adversarially perturbed inputs. FreeRAD also shows competitive robustness compared with defense-enabled anomaly detector baselines and remains robust even under stronger adaptive attacks. Finally, we extend FreeRAD to certified robustness against $\ell_2$-norm perturbations.

5.1. EXPERIMENTAL SETTINGS

Dataset and Model Implementation. We perform experiments on the widely used MVTec Anomaly Detection (MVTec AD) benchmark (Bergmann et al., 2019). MVTec AD comprises 15 sub-datasets with a total of 5354 high-resolution real-world images. Ten of these sub-datasets depict specific objects (e.g., toothbrush, transistor, hazelnut), and the other five depict specific textures (e.g., leather, wood). We resize all images to 256×256 resolution. We implement the diffusion model based on Nichol & Dhariwal (2021) with a U-Net backbone (Ronneberger et al., 2015) and set the total number of diffusion steps to $T = 1000$ for all experiments. During inference, we choose the diffusion step $k \in \{50, 100, 200, 300\}$ for different categories (see Appendix D.1 for a sensitivity test). More hyperparameters are described in Appendix C.1.
Adversarial Attacks. We adopt the commonly used PGD attack (Madry et al., 2018) to compare with state-of-the-art anomaly detection models and defense-enabled anomaly detectors. We also consider the BPDA and EOT attacks (Athalye et al., 2018a) for a more thorough robustness evaluation of defense-enabled anomaly detectors. We set the attack strength to $\epsilon = 2/255$ for $\ell_\infty$-norm attacks and $\epsilon = 0.2$ for $\ell_2$-norm attacks to keep the perturbations imperceptible.
Evaluation Metric. We use the widely adopted AUC (area under the receiver operating characteristic curve) to evaluate anomaly detection performance. Specifically, we report standard AUC, computed on the clean test data, and robust AUC, computed on the adversarially perturbed test examples.
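To make the standard versus robust AUC distinction concrete, here is a small sketch that computes AUC through its rank interpretation (the Mann-Whitney statistic). It is not the evaluation code used for the experiments, and the scores are made up for illustration. The same function yields standard AUC on clean scores and robust AUC on scores of attacked inputs; a robust AUC of 0, as in Table 1, means the attack has fully inverted the ranking of anomalies and normal samples.

```python
def roc_auc(scores, is_anomaly):
    """ROC AUC via its rank (Mann-Whitney) form: the probability that a randomly
    drawn anomaly gets a higher anomaly score than a randomly drawn normal sample,
    with ties counted as 1/2. O(P * N) for clarity rather than speed."""
    pos = [s for s, a in zip(scores, is_anomaly) if a]
    neg = [s for s, a in zip(scores, is_anomaly) if not a]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

labels = [False, False, True, True]              # two normal, two anomalous (toy)
clean_scores = [0.10, 0.20, 0.80, 0.90]          # made-up detector outputs
attacked_scores = [0.90, 0.80, 0.20, 0.10]       # after a (hypothetical) attack
standard_auc = roc_auc(clean_scores, labels)     # 1.0: perfect ranking
robust_auc = roc_auc(attacked_scores, labels)    # 0.0: ranking fully inverted
```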

5.2. COMPARISON WITH THE STATE-OF-THE-ART ANOMALY DETECTORS

We compare FreeRAD with five state-of-the-art methods for image anomaly detection, SPADE (Cohen & Hoshen, 2020), OCR-GAN (Liang et al., 2022), CFlow (Gudovskiy et al., 2022), FastFlow (Yu et al., 2021), and CFA (Lee et al., 2022a), under $\ell_\infty$-PGD and $\ell_2$-PGD attacks.
Table 2: Robust AUC (%) against $\ell_\infty$-PGD attacks ($\epsilon = 2/255$) on MVTec AD, with standard AUC on clean data in parentheses.
Category      OCR-GAN     SPADE       CFlow       FastFlow    CFA         FreeRAD
Texture
Carpet        0 (76.6)    0 (92.8)    0 (98.6)    0 (99.7)    0 (99.4)    70.5 (82.7)
Grid          0 (97)      0 (47.3)    0 (96.6)    0 (100)     0 (99.6)    99.8 (100)
Leather       0 (90.7)    0 (95.4)    0 (100)     6.6 (100)   2.0 (100)   97.8 (100)
Tile          0 (95.6)    0 (96.5)    0 (99.6)    1.3 (100)   0.1 (99.3)  93.9 (99.2)
Wood          0 (95.4)    0 (95.8)    0 (99.7)    0 (99.9)    0 (99.7)    95.2 (98.3)
Object
Bottle        0 (97.7)    0 (97.2)    0 (100)     0 (100)     0.1 (100)   88.1 (100)
Cable         0 (71.5)    0 (84.8)    0 (98.7)    0 (67.4)    0.8 (99.8)  38.9 (79.5)
Capsule       0 (80.4)    0 (89.7)    0 (93.7)    8.9 (99.2)  0 (97)      53.5 (93.9)
Hazelnut      0 (97.7)    0 (88.1)    0 (99.9)    0 (99.5)    0.1 (100)   91.5 (97.5)
Metal Nut     0 (82.6)    0 (71)      0 (100)     0 (98.2)    0 (100)     85.9 (93.5)
Pill          0 (80.8)    0 (80.1)    0 (93.2)    0 (97.8)    0 (98)      39 (97.2)
Screw         0 (99.4)    0 (66.7)    0 (79)      6.6 (91.1)  0 (95.5)    87.6 (99.3)
Toothbrush    0 (96.7)    0 (88.9)    0 (85.3)    0 (94.7)    0 (100)     95.8 (100)
Transistor    0 (75)      0 (90.3)    0 (98.3)    0 (99.4)    0 (100)     74.5 (93.7)
Zipper        0 (80.4)    0 (96.6)    0 (97.5)    17.5 (99.6) 0 (99.7)    96.2 (100)
Average       0 (87.8)    0 (85.4)    0 (96.0)    2.3 (98.5)  0.2 (99.2)  80.5 (95.7)
Table 3: Robust AUC (%) against $\ell_2$-PGD attacks ($\epsilon = 0.2$) on MVTec AD, with standard AUC on clean data in parentheses.
Category      OCR-GAN     SPADE       CFlow       FastFlow    CFA         FreeRAD
Texture
Carpet        ? (76.6)    27.1 (92.8) 13.5 (98.6) 18 (99.7)   65.1 (99.4) 76.6 (82.7)
Grid          0 (97)      4.1 (47.3)  0 (96.6)    0 (100)     50 (99.6)   99.9 (100)
Leather       0 (90.7)    16.5 (95.4) 9.4 (100)   35.4 (100)  77.6 (100)  99.9 (100)
Tile          7.4 (95.6)  45.9 (96.5) 7.8 (99.6)  30.5 (100)  72.4 (99.3) 93.1 (99.2)
Wood          0 (95.4)    11 (95.8)   18.1 (99.7) 22 (99.9)   61.8 (99.7) 95.5 (98.3)
Object
Bottle        0.1 (97.7)  0 (97.2)    48.5 (100)  2.2 (100)   74.6 (100)  95.5 (100)
Cable         3.2 (71.5)  0.9 (84.8)  19.2 (98.7) 0.3 (67.4)  69.5 (99.8) 65.7 (79.5)
Capsule       0 (80.4)    0 (89.7)    1.6 (93.7)  13.8 (99.2) 1.7 (97)    68.1 (93.9)
Hazelnut      18.5 (97.7) 0 (88.1)    4.9 (99.9)  0.8 (99.5)  47.2 (100)  94.3 (97.5)
Metal Nut     2.8 (82.6)  0 (71)      4.4 (100)   1.7 (98.2)  14.3 (100)  87.9 (93.5)
Pill          2.7 (80.8)  0.4 (80.1)  0 (93.2)    0 (97.8)    3.3 (98)    80.3 (97.2)
Screw         0 (99.4)    0 (66.7)    0 (79)      6.6 (91.1)  0 (95.5)    91.8 (99.3)
Toothbrush    0 (96.7)    0 (88.9)    18.3 (85.3) 3.6 (94.7)  38.3 (100)  99.4 (100)
Transistor    1.7 (75)    4.8 (90.3)  8.8 (98.3)  0.4 (99.4)  53.7 (100)  84.3 (93.7)
Zipper        0 (80.4)    3.2 (96.6)  0 (97.5)    19.3 (99.6) 29.2 (99.7) 99.2 (100)
Average       3.7 (87.8)  7.59 (85.4) 10.3 (96.0) 9.9 (98.5)  43.9 (99.2) 88.8 (95.7)
Table 2 presents the robustness against $\ell_\infty$-PGD attacks ($\epsilon = 2/255$) on MVTec AD, and Table 3 shows the robustness against $\ell_2$-PGD attacks ($\epsilon = 0.2$). From Table 2, we observe that our method substantially outperforms previous methods in robust AUC against $\ell_\infty$-PGD attacks. Specifically, our method improves robust AUC on all 15 categories of MVTec AD and obtains an average robust AUC of 80.5%, at least 78.2 percentage points above every baseline. In Table 3, our method improves the average robust AUC against $\ell_2$-PGD attacks by 44.9 percentage points over the strongest baseline, reaching 88.8%. Meanwhile, in terms of anomaly detection performance on clean data, the average standard AUC of our method is on par with state-of-the-art methods such as CFlow (Gudovskiy et al., 2022), FastFlow (Yu et al., 2021), and CFA (Lee et al., 2022a), while beating OCR-GAN (Liang et al., 2022) and SPADE (Cohen & Hoshen, 2020). These results demonstrate the effectiveness of our method in defending against $\ell_\infty$-PGD and $\ell_2$-PGD attacks while maintaining strong anomaly detection performance on benchmark datasets.
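The improvement figures quoted above are absolute differences, in percentage points, between FreeRAD's average robust AUC and that of the strongest baseline in each table. A short check that uses only the Average rows of Tables 2 and 3 (the helper name is illustrative):

```python
# "Average" rows of Tables 2 and 3: mean robust AUC (%) of each baseline.
baselines_linf = {"OCR-GAN": 0.0, "SPADE": 0.0, "CFlow": 0.0, "FastFlow": 2.3, "CFA": 0.2}
baselines_l2 = {"OCR-GAN": 3.7, "SPADE": 7.59, "CFlow": 10.3, "FastFlow": 9.9, "CFA": 43.9}
freerad_linf, freerad_l2 = 80.5, 88.8

def gain_over_best(ours, baselines):
    """Absolute gain in percentage points over the strongest baseline."""
    best = max(baselines, key=baselines.get)
    return best, round(ours - baselines[best], 1)

print(gain_over_best(freerad_linf, baselines_linf))  # ('FastFlow', 78.2)
print(gain_over_best(freerad_l2, baselines_l2))      # ('CFA', 44.9)
```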

5.3. COMPARISON WITH DEFENSE-ENABLED ANOMALY DETECTORS

In this section, we compare FreeRAD with two defense-enabled anomaly detection methods, APAE (Goodge et al., 2020) and PLS (Lo et al., 2022). We perform the same PGD attacks as in Section 5.2. In addition, since APAE's defense contains an optimization loop that is hard to backpropagate through, we further adopt the BPDA attack (Athalye et al., 2018a), which is designed specifically for defenses with obfuscated gradients, and evaluate both FreeRAD and APAE with it for a fair comparison. Table 4 compares FreeRAD with the defense-enabled baselines under PGD and BPDA attacks. FreeRAD outperforms them under all attacks, improving the average robust AUC over all MVTec AD categories by 27.6% to 58.3%. Furthermore, our method improves the average standard AUC on clean data by 31.0% compared with APAE and PLS.
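BPDA (Athalye et al., 2018a) attacks a defense whose gradient is unusable by keeping the defense on the forward pass but replacing its Jacobian with a simple approximation, most commonly the identity, on the backward pass. The sketch below illustrates this idea under stated assumptions: a hypothetical quantization step stands in for APAE's inner optimization loop (the exact gradient of rounding is zero almost everywhere, so plain PGD would not move), and the detector is the same kind of toy squared-error score used earlier. None of the names come from APAE or the paper's code.

```python
import numpy as np

K = 4  # hypothetical toy detector: A(z) = ||z[K:]||^2

def defense(x):
    """Hypothetical non-differentiable preprocessing (quantise to steps of 1/8),
    standing in for a defense such as an inner optimisation loop."""
    return np.round(x * 8) / 8

def score(z):
    return float(np.sum(z[K:] ** 2))

def score_grad(z):
    g = np.zeros_like(z)
    g[K:] = 2.0 * z[K:]
    return g

def bpda_grad(x):
    """BPDA with the identity approximation: run the defense on the forward pass,
    but treat d defense(x) / dx as the identity on the backward pass."""
    return score_grad(defense(x))

def bpda_pgd(x, y, eps, alpha, steps):
    """l_inf PGD on L = y * A(defense(x)) using BPDA gradients."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(y * bpda_grad(x_adv))
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv

x = np.linspace(-1.0, 1.0, 16)
x_adv = bpda_pgd(x, y=-1, eps=0.25, alpha=0.05, steps=10)
print(score(defense(x_adv)) < score(defense(x)))  # True
```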

5.4. DEFENDING AGAINST STRONGER ADAPTIVE ATTACKS

So far we have shown in Sections 5.2 and 5.3 that FreeRAD is robust to PGD and BPDA attacks. To further verify its robustness in more challenging settings, we test FreeRAD against adaptive attacks, where the attacker is assumed to know our diffusion-model-based anomaly detection method and to design attacks against it accordingly. Since the diffusion process in our method introduces extra stochasticity, which plays an important role in defending against adversarial perturbations, we combine PGD with EOT, which is designed to circumvent randomized defenses. In particular, EOT uses Monte Carlo estimation to compute the expected gradient over the randomness as a proxy for the true gradient of the inference model (Athalye et al., 2018b;a; Lee et al., 2022b). Following Nie et al. (2022), we set the number of EOT samples to $n = 20$.
Table 5: Robust AUC against $\ell_\infty$-PGD and $\ell_\infty$-EOT-PGD attacks ($\epsilon = 2/255$, EOT = 20), and $\ell_2$-PGD and $\ell_2$-EOT-PGD attacks ($\epsilon = 0.2$, EOT = 20) on the Bottle, Grid, Toothbrush, and Wood categories of MVTec AD. We also show the difference between the results of PGD and EOT-PGD attacks.
Table 5 shows the robust AUC against EOT-PGD attacks and its difference from the robust AUC under standard PGD attacks on the Bottle, Grid, Toothbrush, and Wood categories of MVTec AD. We observe that EOT only mildly affects adversarial robustness: the average robust AUC drops by only 4.5% and 0.9% relative to standard $\ell_\infty$-PGD and $\ell_2$-PGD attacks, respectively. These results suggest that our method has strong empirical robustness against adaptive attacks with EOT. Since the other baselines use deterministic inference models, applying EOT to evaluate their adversarial robustness is unnecessary.
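The EOT gradient described above can be sketched as a Monte Carlo average of per-draw gradients, plugged into the $\ell_\infty$ PGD update of Eq. 3.1. This is a minimal illustration, not the evaluation code: the randomized detector is hypothetical (Gaussian noise added before a toy squared-error score, loosely mimicking the noise injection of FreeRAD), and its per-draw gradient is written in closed form. `n_samples=20` matches the setting in the text.

```python
import numpy as np

K = 4  # hypothetical toy detector; internal randomness mimics the injected noise

def sampled_score_grad(x, rng, sigma=0.1):
    """Gradient of one random draw of a toy randomized detector
    A(x; n) = ||(x + n)[K:]||^2 with n ~ N(0, sigma^2 I)."""
    z = x + sigma * rng.standard_normal(x.shape)
    g = np.zeros_like(x)
    g[K:] = 2.0 * z[K:]
    return g

def eot_grad(x, rng, n_samples=20):
    """EOT: Monte Carlo average of gradients over n_samples random draws,
    used as a proxy for the gradient of the expected score."""
    return np.mean([sampled_score_grad(x, rng) for _ in range(n_samples)], axis=0)

def eot_pgd(x, y, eps, alpha, steps, rng, n_samples=20):
    """l_inf PGD (Eq. 3.1) driven by EOT gradient estimates."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(y * eot_grad(x_adv, rng, n_samples))
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 16)
x_adv = eot_pgd(x, y=-1, eps=2 / 255, alpha=0.5 / 255, steps=10, rng=rng)
```

Averaging over draws reduces the variance that a single random draw would inject into the sign of each gradient coordinate, which is why EOT is the standard check against defenses that rely on randomness.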

5.5. EXTENSION: CERTIFIED ADVERSARIAL ROBUSTNESS

In this section, we apply randomized smoothing (Cohen et al., 2019) to our diffusion-based anomaly detector and construct a new "smoothed" detector for certified robustness. Given a well-trained FreeRAD detector $A_\theta(\cdot)$ that outputs the anomaly score, we can construct a binary anomaly classifier for any chosen threshold $h$:
$$f(x) = \begin{cases} \text{normal}, & \text{if } A_\theta(x) \le h \\ \text{anomaly}, & \text{otherwise} \end{cases} \tag{5.1}$$
We then make predictions by constructing a Gaussian-smoothed FreeRAD and comparing its output with $h$. The smoothed FreeRAD enjoys provable robustness, as summarized in the following theorem:
Theorem 5.1 [Smoothed FreeRAD]. Given a well-trained FreeRAD detector $A_\theta(x)$, any threshold $h$, and Gaussian noise $\delta \sim \mathcal{N}(0, \sigma^2 I)$: if $\mathbb{P}[A_\theta(x + \delta) > h] \ge p_{anomaly}(h) > 1/2$, then $\mathbb{E}_\delta[A_\theta(x + \Delta + \delta)] > h$ for all perturbations $\|\Delta\|_2 < R(h)$, where $R(h) = \sigma \Phi^{-1}(p_{anomaly}(h))$. Conversely, if $\mathbb{P}[A_\theta(x + \delta) < h] \ge p_{normal}(h) > 1/2$, then $\mathbb{E}_\delta[A_\theta(x + \Delta + \delta)] < h$ for all $\|\Delta\|_2 < R(h)$, where $R(h) = \sigma \Phi^{-1}(p_{normal}(h))$. Here $\Phi^{-1}$ is the inverse of the standard Gaussian CDF.
Theorem 5.1 can be used to certify the robustness of a sample $x$ for any threshold $h$. As in Cohen et al. (2019), $p_{normal}(h)$ and $p_{anomaly}(h)$ can be estimated with Monte Carlo sampling. However, the resulting certified radius depends strongly on the threshold $h$, so the certified accuracy metric cannot fully reflect anomaly detection quality if an inappropriate threshold is selected. To address this issue, we also propose a new certified AUC metric that measures certified robustness across multiple thresholds. Specifically, for each candidate threshold, we make predictions with $\mathbb{E}_\delta[A_\theta(x + \delta)]$ and compute the certified TPR and FPR from the prediction results and their certified radii. After iterating over all thresholds, we compute the final AUC from the collection of certified TPRs and FPRs. Table 6 shows the certified robustness achieved by FreeRAD.
For example, we achieve 98.2% certified AUC at $\ell_2$ radius 0.2 on the Grid sub-dataset, which indicates that no adversarial perturbation with $\ell_2$ norm at most 0.2 can reduce the AUC below 98.2%. One major limitation of randomized smoothing for anomaly detection is that the noise level cannot be too high; otherwise, the anomalous features may be masked by the Gaussian noise, so that the detector cannot distinguish anomalous samples from normal ones. For instance, the certified AUC on the Bottle sub-dataset is only 12.4% at noise level $\sigma = 0.25$. The performance gap between sub-datasets (e.g., 98.2% vs. 12.4%) at the same noise level ($\sigma = 0.25$) indicates that the appropriate noise level may depend on the specific anomaly features.
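The Monte Carlo estimate behind Theorem 5.1 can be sketched with the standard library alone. This is a simplified sketch in the style of Cohen et al. (2019), not the paper's certification code: it certifies the majority-vote prediction of the thresholded detector $f$ under Gaussian noise (whether $\mathbb{P}[A_\theta(x + \delta) > h] > 1/2$, the premise of the theorem), and it makes two labeled simplifications, reusing one batch of samples both to pick the label and to bound its probability, and using a Hoeffding bound instead of the Clopper-Pearson bound used by Cohen et al. `toy_score` is a hypothetical stand-in for $A_\theta$; the certified AUC metric is not implemented here.

```python
import math
import random
from statistics import NormalDist

def certify(score_fn, x, h, sigma, n=1000, alpha=0.001, seed=0):
    """Simplified Monte Carlo certificate for f(x) = anomaly iff A(x) > h under
    Gaussian smoothing. Returns (label, radius), or (None, 0.0) when abstaining.

    Simplifications (assumptions): one batch of samples both picks the label and
    bounds its probability, and a Hoeffding bound replaces Clopper-Pearson."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        hits += score_fn(noisy) > h              # one vote for "anomaly"
    p_anomaly = hits / n
    label = "anomaly" if p_anomaly > 0.5 else "normal"
    p_top = max(p_anomaly, 1.0 - p_anomaly)
    # One-sided lower confidence bound on p_top at level 1 - alpha.
    p_lower = p_top - math.sqrt(math.log(1.0 / alpha) / (2 * n))
    if p_lower <= 0.5:
        return None, 0.0
    return label, sigma * NormalDist().inv_cdf(p_lower)   # R = sigma * Phi^{-1}(p)

def toy_score(z):
    """Hypothetical stand-in for A_theta: energy outside the first 4 coordinates."""
    return sum(v * v for v in z[4:])

print(certify(toy_score, [0.0] * 4 + [3.0] * 12, h=50.0, sigma=0.25))
print(certify(toy_score, [0.0] * 16, h=50.0, sigma=0.25))
```

The radius grows with $\sigma$, but a larger $\sigma$ also drowns subtle anomalies, which is the trade-off behind the Grid versus Bottle gap reported above.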

6. CONCLUSION

Adversarial robustness is a critical factor for the practical deployment of deep anomaly detection models. In this work, we propose an adversarially robust anomaly detector based on the diffusion model that leverages reconstruction error to detect anomalies and utilizes the diffusion process to gradually remove adversarial perturbations for better robustness. We empirically show that our method provides outstanding adversarial robustness while also maintaining strong anomaly detection performance on benchmark datasets. One major advantage of our method is that it gains adversarial robustness for free: the diffusion model functions both as an anomaly detector and an adversarial defender, thus no extra adversarial training or data purification is needed as in standard robust image classification tasks.

A TRAINING OBJECTIVE OF THE DIFFUSION MODEL

In this section, we introduce the hybrid training objective proposed by Nichol & Dhariwal (2021). Specifically, diffusion models can be trained by optimizing the commonly used variational bound on the negative log-likelihood (Ho et al., 2020):

$$
\begin{aligned}
L_{\text{vb}} &:= L_0 + L_1 + \dots + L_{T-1} + L_T, &&\text{(A.1)}\\
L_0 &:= -\log p_\theta(x_0 \mid x_1), &&\text{(A.2)}\\
L_{t-1} &:= D_{\mathrm{KL}}\big(q(x_{t-1} \mid x_t, x_0) \,\|\, p_\theta(x_{t-1} \mid x_t)\big), &&\text{(A.3)}\\
L_T &:= D_{\mathrm{KL}}\big(q(x_T \mid x_0) \,\|\, p(x_T)\big). &&\text{(A.4)}
\end{aligned}
$$

Instead of $L_{\text{vb}}$, Ho et al. (2020) train with the reweighted simplified objective

$$
L_{\text{simple}} = \mathbb{E}_{t, x_0, \epsilon}\big[\|\epsilon - \epsilon_\theta(x_t, t)\|^2\big]. \tag{A.5}
$$

However, a model trained with $L_{\text{simple}}$ loses sample quality when the number of denoising steps is reduced (Nichol & Dhariwal, 2021). Nichol & Dhariwal (2021) find that training with the hybrid objective

$$
L_{\text{hybrid}} = L_{\text{simple}} + \lambda L_{\text{vb}} \tag{A.6}
$$

greatly improves practical applicability by generating high-quality samples with fewer denoising steps. This helps when applying diffusion models to tasks with high efficiency requirements, such as real-time anomaly detection (Sun et al., 2021). In particular, following Nichol & Dhariwal (2021), we parameterize the variance term $\Sigma_\theta(x_t, t)$ in Eq. 4.1 as an interpolation between $\beta_t$ and $\tilde\beta_t$ in the log domain:

$$
\Sigma_\theta(x_t, t) = \exp\big(v \log \beta_t + (1 - v) \log \tilde\beta_t\big), \tag{A.7}
$$

Here $v$ is the model output and $\tilde\beta_t = \frac{1-\bar\alpha_{t-1}}{1-\bar\alpha_t}\beta_t$ is the variance of the forward-process posterior. Following Nichol & Dhariwal (2021), we set $\lambda = 0.001$ and apply a stop-gradient to the $\mu_\theta(x_t, t)$ output for $L_{\text{vb}}$ to prevent $L_{\text{vb}}$ from overwhelming $L_{\text{simple}}$.

B ADDITIONAL ALGORITHMS
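Eqs. A.5 to A.7 can be illustrated numerically with the following sketch. It makes these assumptions:

- The linear β schedule runs from $10^{-4}$ to $0.02$ over $T = 1000$ steps. These are Ho et al.'s endpoints; the text here only says "linear".
- Timesteps are 0-based in the arrays.
- $\log\tilde\beta$ at the first step is clipped to the next value, because $\tilde\beta_1 = 0$. This follows a common implementation convention.

The stop-gradient on $\mu_\theta$ requires an autodiff framework and is not shown. All function names are hypothetical.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)                 # assumed linear schedule
alpha_bar = np.cumprod(1.0 - betas)
alpha_bar_prev = np.append(1.0, alpha_bar[:-1])
beta_tilde = betas * (1.0 - alpha_bar_prev) / (1.0 - alpha_bar)  # posterior variance; beta_tilde[0] == 0
# log(0) is undefined at the first step, so reuse the next value there.
log_beta_tilde = np.log(np.append(beta_tilde[1], beta_tilde[1:]))


def learned_variance(v, t):
    """Eq. A.7: exp(v log beta_t + (1 - v) log beta_tilde_t) for a 0-based step t."""
    return np.exp(v * np.log(betas[t]) + (1.0 - v) * log_beta_tilde[t])


def l_simple(eps, eps_pred):
    """Batch estimate of Eq. A.5, averaging the squared error over all entries."""
    return float(np.mean((eps - eps_pred) ** 2))


def l_hybrid(l_simple_value, l_vb_value, lam=0.001):
    """Eq. A.6 with lambda = 0.001."""
    return l_simple_value + lam * l_vb_value
```

With $v = 1$ the variance equals $\beta_t$, and with $v = 0$ it equals $\tilde\beta_t$. Intermediate outputs interpolate geometrically between the two.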

B.1 ARBITRARY-SHOT ROBUST RECONSTRUCTION IN FREERAD

In this section, we provide the complete algorithm for arbitrary-shot robust reconstruction, motivated by Nichol & Dhariwal (2021). We are given an arbitrary sequence of denoising steps $S = \{S_m, S_{m-1}, \dots, S_1\}$ with $m \le k$ and $k = S_m > S_{m-1} > \dots > S_1 \ge 1$. At each step $t = m, \dots, 1$, we predict the clean input from the current diffused data $x_{S_t}$, denoted $\hat x_0$. We then sample the next iterate $x_{S_{t-1}}$ from the current prediction $\hat x_0$ and the current diffused data $x_{S_t}$.

Algorithm 3 Arbitrary-shot Robust Reconstruction in FreeRAD
Input: Test image: $x$, diffusion steps: $k$, arbitrary generation steps: $S = \{S_m, S_{m-1}, \dots, S_1\}$ ($m \le k$, $k = S_m > S_{m-1} > \dots > S_1 \ge 1$)
Output: Reconstruction of $x$: $\hat x$
1: $x_0 = x$
2: $\epsilon \sim \mathcal{N}(0, I)$
3: $x_k = \sqrt{\bar\alpha_k}\, x_0 + \sqrt{1 - \bar\alpha_k}\, \epsilon$
4: for $t = m, \dots, 1$ do ▷ arbitrary-shot denoising
5:   $\hat x_0 = \frac{1}{\sqrt{\bar\alpha_{S_t}}}\big(x_{S_t} - \sqrt{1 - \bar\alpha_{S_t}}\, \epsilon_\theta(x_{S_t}, S_t)\big)$
6:   if $t > 1$ then
7:     $z \sim \mathcal{N}(0, I)$
8:     $x_{S_{t-1}} = \frac{\sqrt{\bar\alpha_{S_{t-1}}}\,\beta_{S_t}}{1 - \bar\alpha_{S_t}}\,\hat x_0 + \frac{\sqrt{\alpha_{S_t}}\,(1 - \bar\alpha_{S_{t-1}})}{1 - \bar\alpha_{S_t}}\,x_{S_t} + \Sigma_\theta(x_{S_t}, S_t)^{1/2}\, z$
9:   end if
10: end for
11: $\hat x = \hat x_0$

Under review as a conference paper at ICLR 2023
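The loop in Algorithm 3 can be sketched in NumPy as follows. This is a hypothetical sketch, not the FreeRAD code, and it makes these assumptions:

- `eps_model(x_t, t)` stands in for the trained noise predictor $\epsilon_\theta$.
- The β schedule is linear from $10^{-4}$ to $0.02$ with $T = 1000$.
- The respaced coefficients are computed as $\alpha_{S_t} = \bar\alpha_{S_t}/\bar\alpha_{S_{t-1}}$ and $\beta_{S_t} = 1 - \alpha_{S_t}$, following the respacing of Nichol & Dhariwal (2021).
- The fixed posterior variance replaces the learned $\Sigma_\theta$.

```python
import numpy as np


def linear_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    """alpha_bar[t] for t = 0..T with alpha_bar[0] = 1 (assumed linear beta schedule)."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.concatenate([[1.0], np.cumprod(1.0 - betas)])


def arbitrary_shot_reconstruct(x, steps, eps_model, alpha_bar, rng):
    """Sketch of Algorithm 3; steps = [S_m, ..., S_1], strictly decreasing, S_m = k, S_1 >= 1."""
    k = steps[0]
    eps = rng.standard_normal(x.shape)
    x_t = np.sqrt(alpha_bar[k]) * x + np.sqrt(1.0 - alpha_bar[k]) * eps  # line 3: diffuse to step k
    for i, s in enumerate(steps):
        ab_t = alpha_bar[s]
        # Line 5: predict the clean image from the current noisy iterate.
        x0_hat = (x_t - np.sqrt(1.0 - ab_t) * eps_model(x_t, s)) / np.sqrt(ab_t)
        if i + 1 < len(steps):                      # line 6: t > 1
            ab_prev = alpha_bar[steps[i + 1]]
            alpha_s = ab_t / ab_prev                # respaced alpha_{S_t}
            beta_s = 1.0 - alpha_s                  # respaced beta_{S_t}
            mean = (np.sqrt(ab_prev) * beta_s / (1.0 - ab_t) * x0_hat
                    + np.sqrt(alpha_s) * (1.0 - ab_prev) / (1.0 - ab_t) * x_t)
            # Fixed posterior variance, used here in place of the learned Sigma_theta.
            var = beta_s * (1.0 - ab_prev) / (1.0 - ab_t)
            x_t = mean + np.sqrt(var) * rng.standard_normal(x.shape)  # lines 7-8
    return x0_hat                                   # line 11
```

For one-shot denoising, pass `steps=[k]`. The loop then runs once and returns $\hat x_0$ directly from $x_k$.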

B.2 ANOMALY SCORE CALCULATION

In this section, we provide the complete algorithm for anomaly score calculation. Given a test image $x \in \mathbb{R}^{C\times H\times W}$ and its reconstruction $\hat x \in \mathbb{R}^{C\times H\times W}$ obtained by FreeRAD, we first calculate the Multiscale Reconstruction Error Map. In particular, we choose a scale schedule $L = \{1, \frac12, \frac14, \frac18\}$. For each scale $l$, we compute the error map $\mathrm{Err}(x, \hat x)_l$ between the downsampled input $x_l$ and the downsampled reconstruction $\hat x_l$ as $\frac{1}{C}\sum_{c=1}^{C}(x_l - \hat x_l)^2[c,:,:]$, where the square is taken element-wise. The map is then upsampled to the original resolution. The final $\mathrm{Err}_{ms}$ is obtained by averaging the error maps over all scales and applying a mean filter for better stability, similar to Zavrtanik et al. (2021):

$$
\mathrm{Err}_{ms}(x, \hat x) = \Big(\frac{1}{N_L}\sum_{l\in L}\mathrm{Err}(x, \hat x)_l\Big) * f_{s\times s},
$$

where $N_L = |L|$ is the number of scales, $f_{s\times s}$ is the mean filter of size $s\times s$, and $*$ denotes convolution. Similar to Pirnay & Chai (2022), we take the pixel-wise maximum of the absolute deviation of $\mathrm{Err}_{ms}(x, \hat x)$ from that of the normal training data as the scalar anomaly score.
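The multiscale error map and the anomaly score can be sketched in NumPy as follows. The text does not specify the resampling operators or border handling, so this sketch assumes the following:

- Downsampling uses average pooling, and upsampling uses nearest-neighbour repetition.
- The mean filter uses edge padding.
- H and W are divisible by 8.

The function names are hypothetical. `train_err_maps` holds precomputed $\mathrm{Err}_{ms}(z, \hat z)$ maps for the normal training images.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

FACTORS = (1, 2, 4, 8)  # integer factors for the scale schedule l in {1, 1/2, 1/4, 1/8}


def downsample(img, f):
    """Average-pool a (C, H, W) array by integer factor f (assumes H, W divisible by f)."""
    c, h, w = img.shape
    return img.reshape(c, h // f, f, w // f, f).mean(axis=(2, 4))


def upsample(err, f):
    """Nearest-neighbour upsampling of an (h, w) map by integer factor f."""
    return np.repeat(np.repeat(err, f, axis=0), f, axis=1)


def mean_filter(err, s):
    """s x s mean filter with edge padding (s odd); output keeps the input shape."""
    padded = np.pad(err, s // 2, mode="edge")
    return sliding_window_view(padded, (s, s)).mean(axis=(-2, -1))


def multiscale_error(x, x_hat, s=3):
    """Err_ms(x, x_hat): scale-averaged, mean-filtered reconstruction error map (H, W)."""
    maps = []
    for f in FACTORS:
        diff2 = (downsample(x, f) - downsample(x_hat, f)) ** 2  # element-wise square
        maps.append(upsample(diff2.mean(axis=0), f))            # channel mean, back to H x W
    return mean_filter(np.mean(maps, axis=0), s)


def anomaly_score(x, x_hat, train_err_maps, s=3):
    """Max absolute deviation of Err_ms from the mean Err_ms of normal training images."""
    ref = train_err_maps.mean(axis=0)
    return float(np.max(np.abs(multiscale_error(x, x_hat, s) - ref)))
```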

Algorithm 4 Anomaly Score Calculation in FreeRAD

Input: Test image: $x \in \mathbb{R}^{C\times H\times W}$, reconstructed image: $\hat x \in \mathbb{R}^{C\times H\times W}$
Output: Anomaly score: $A(x)$
1: for $l$ in $L = \{1, \frac12, \frac14, \frac18\}$ do ▷ $L$ is a downsampling scale schedule
2:   $x_l = \mathrm{downsample}(l, x) \in \mathbb{R}^{C\times(lH)\times(lW)}$
3:   $\hat x_l = \mathrm{downsample}(l, \hat x) \in \mathbb{R}^{C\times(lH)\times(lW)}$
4:   $\mathrm{Err}(x, \hat x)_l = \mathrm{upsample}\big(\frac1l, \frac1C\sum_{c=1}^{C}(x_l - \hat x_l)^2[c,:,:]\big) \in \mathbb{R}^{H\times W}$ ▷ element-wise square
5: end for
6: $\mathrm{Err}_{ms}(x, \hat x) = \big(\frac{1}{N_L}\sum_{l\in L}\mathrm{Err}(x, \hat x)_l\big) * f_{s\times s} \in \mathbb{R}^{H\times W}$ ▷ $f_{s\times s}$ is a mean filter of size $s\times s$
7: $A(x) = \max\big(\big|\mathrm{Err}_{ms}(x, \hat x) - \frac{1}{N_Z}\sum_{z\in Z}\mathrm{Err}_{ms}(z, \hat z)\big|\big)$ ▷ $Z$ is the set of normal training images

The diffusion model in our experiments uses the linear noise schedule (Ho et al., 2020). The number of channels in the first layer is 128, the number of attention heads is 1, and the attention resolution is 16 × 16. We implement our method in PyTorch. We train the model using the Adam optimizer with a learning rate of $10^{-4}$ and a batch size of 2, for 30000 iterations on every category. We set the number of diffusion steps to T = 1000 for training. The choice of k for each category is listed in Table 7.

A principle for anomaly detection on clean data is that k should be chosen so that the Gaussian noise dominates the anomaly signals while the high-level features of the input are still preserved for reconstruction. For adversarial data, k should also be large enough that the added Gaussian noise dominates the adversarial perturbation. As shown in Table 9, our method obtains the best performance on clean data at k = 25. However, the robust AUC at this k is unsatisfactory, because the noise added in the diffusion process cannot dominate the adversarial perturbations. Therefore, we can choose a larger k (e.g., 50) to obtain better robust performance with a slight performance loss on clean data.

In this section, Table 10 reports the anomaly detection performance of FreeRAD on clean data at varying numbers of denoising steps. We use Algorithm 3 for reconstruction and Algorithm 4 to compute the anomaly score. Specifically, we test several denoising schedules, from one-shot denoising (1 step) to full-shot denoising (k steps), including intermediate settings of 0.05k, 0.1k, 0.25k, and 0.5k steps. One-shot denoising obtains the highest AUC scores on all four datasets.

Moreover, Table 11 reports the inference time (in seconds) at varying numbers of denoising steps on an NVIDIA Tesla K80 GPU. The inference time increases linearly with the number of denoising steps. With one-shot denoising, FreeRAD processes a single image in 0.5 seconds, which demonstrates its applicability to real-time tasks. These results indicate that FreeRAD with one-shot reconstruction achieves both the best detection effectiveness and the best time efficiency.
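The trade-off behind the choice of k can be made concrete with a short calculation. Since $x_k = \sqrt{\bar\alpha_k}x_0 + \sqrt{1-\bar\alpha_k}\epsilon$, rescaling by $1/\sqrt{\bar\alpha_k}$ shows that the injected noise has standard deviation $\sqrt{(1-\bar\alpha_k)/\bar\alpha_k}$ relative to the clean image. The sketch assumes the linear schedule endpoints $\beta_1 = 10^{-4}$ and $\beta_T = 0.02$ of Ho et al. (2020), which the text does not state. The helper names are hypothetical, and the printed values are illustrative only.

```python
import numpy as np


def linear_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    """alpha_bar[t - 1] holds alpha_bar_t for t = 1..T (assumed linear schedule)."""
    return np.cumprod(1.0 - np.linspace(beta_start, beta_end, T))


def noise_to_signal(k, alpha_bar):
    """Std of the diffusion noise at step k, relative to the (rescaled) clean image."""
    return float(np.sqrt((1.0 - alpha_bar[k - 1]) / alpha_bar[k - 1]))


ab = linear_alpha_bar()
for k in (25, 50, 100, 200, 300):  # the k values tested on MVTec AD
    print(f"k={k:3d}  noise/signal={noise_to_signal(k, ab):.3f}")
```

The ratio grows quickly with k. A larger k therefore drowns out more of any fixed-size perturbation, and also more of the fine anomaly detail, which matches the clean-versus-robust tension described above.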

D.3 COMPARISON WITH ROBUST ANOMALY DETECTION METHODS

In this section, we compare FreeRAD with robust anomaly detection methods such as the Robust Autoencoder (RAE) (Zhou & Paffenroth, 2017). RAE was proposed to handle noise and outliers in the data, although that work did not explicitly consider adversarial perturbations. Table 12 shows that our method still largely outperforms RAE on both clean and adversarial data.

Following the definition of novelty detection in Yang et al. (2021), we perform novelty detection experiments on the CIFAR-10 dataset (Krizhevsky et al., 2009), which contains 60000 natural images from 10 categories. In the novelty detection setting, one category is regarded as the known class and the other categories as novel classes. Hence, we train a separate model for each category. We compare FreeRAD with two state-of-the-art methods, FastFlow (Yu et al., 2021) and CFA (Lee et al., 2022a). Table 14 summarizes the standard AUC and robust AUC against ℓ∞-PGD and ℓ2-PGD attacks. The results show that our method still largely outperforms the baselines in robust AUC while maintaining strong novelty detection performance on clean data.

Table 14 (robust AUC, standard AUC in parentheses):

| Class | ℓ∞-PGD FastFlow | ℓ∞-PGD CFA | ℓ∞-PGD Ours | ℓ2-PGD FastFlow | ℓ2-PGD CFA | ℓ2-PGD Ours |
|---|---|---|---|---|---|---|
| Plane | 4.1 (74.2) | 1.2 (71.8) | 54.9 (79.3) | 10.3 (74.2) | 3.5 (71.8) | 64.8 (79.3) |
| Car | 1.9 (81.7) | 0.0 (76.3) | 50.6 (70.4) | 6.9 (81.7) | 3.9 (76.3) | 61.7 (70.4) |
| Bird | – (63.0) | – (68.1) | 57.9 (71.9) | 2.1 (63.0) | 1.7 (68.1) | 62.2 (71.9) |
| Cat | 0.2 (45.6) | 0.3 (58.7) | 28.7 (56.7) | 0.8 (45.6) | 1.3 (58.7) | 38.9 (56.7) |
| Deer | 0.5 (56.1) | 1.0 (74.6) | 60.5 (71.5) | 1.6 (56.1) | 5.4 (74.6) | 63.2 (71.5) |
| Dog | 1.0 (72.7) | 1.0 (64.5) | 39.1 (56.4) | 3.7 (72.7) | 3.0 (64.5) | 45.4 (56.4) |
| Frog | 0.0 (79.7) | 0.9 (81.7) | 59.6 (71.3) | 1.5 (79.7) | 5.2 (81.7) | 62.5 (71.3) |
| Horse | 1.2 (76.4) | 1.6 (74.9) | 47.1 (60.4) | 4.6 (76.4) | 5.0 (74.9) | 51.7 (60.4) |
| Ship | 1.8 (81.2) | 1.3 (81.0) | 65.2 (77.5) | 8.0 (81.2) | 5.6 (81.0) | 68.9 (77.5) |
| Truck | 3.7 (83.7) | 0.4 (74.8) | 23.2 (45.0) | 13.1 (83.7) | 3.9 (74.8) | 28.5 (45.0) |
| Average | 1.5 (71.4) | 0.8 (72.6) | 48.7 (66.0) | 5.3 (71.4) | 3.9 (72.6) | 54.8 (66.0) |
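The novelty detection protocol and the AUC metric can be sketched as follows. The sketch assumes that the robust AUC is the same ROC AUC computed on adversarially perturbed test inputs instead of clean ones. It uses the rank (Mann-Whitney) form of the AUC, which is the quantity scikit-learn's `roc_auc_score` returns for binary labels. The helper names are hypothetical.

```python
from typing import List, Sequence


def one_vs_rest_labels(classes: Sequence[int], normal_class: int) -> List[int]:
    """Novelty-detection labels: 0 for the known (normal) class, 1 for every novel class."""
    return [0 if c == normal_class else 1 for c in classes]


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random anomaly outscores a random normal sample (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic in the number of samples. It is fine for illustration, but a sort-based implementation is preferable for the full CIFAR-10 test set.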

E MORE ANALYSIS OF THE ADVERSARIAL ROBUSTNESS OF DDPMS

In this section, we provide further theoretical analysis of the defense mechanism of DDPMs. The adversarial perturbations are dominated by the noise added in the diffusion process, so the clean data distribution and the adversarially perturbed data distribution move closer to each other. Intuitively, after the full diffusion process, any data converge to a pure standard Gaussian, as mentioned in Section 4.1. This suggests that small adversarial perturbations are gradually washed out and have little effect on the final output after denoising. Moreover, the stochasticity introduced by sampling in the diffusion process (Eq. 4.2) makes DDPMs well suited to randomized smoothing, yielding certified robustness without much loss in anomaly detection performance. The following theorem confirms that the diffusion process in DDPMs makes the KL divergence between the diffused clean data distribution and the diffused adversarially perturbed data distribution decrease gradually.

Theorem E.1. Let $p(x)$ be any clean data distribution and $q(x)$ any adversarially perturbed data distribution. Let $p_t$ denote the distribution of $x_t$ obtained from the $t$-step diffusion process in Eq. 4.2 when $x_0 \sim p(x)$, and let $q_t$ denote the distribution of $x_t$ obtained in the same way when $x_0 \sim q(x)$. If $t \in [0, T]$ and $T \to \infty$, the diffusion process in DDPM converges to a continuous process and

$$
\frac{\partial D_{\mathrm{KL}}(p_t \,\|\, q_t)}{\partial t} \le 0,
$$

i.e., the KL divergence between $p_t$ and $q_t$ decreases monotonically during the diffusion process.

Proof: The proof mainly follows Song et al. (2021) and Nie et al. (2022). Following the derivation in Song et al. (2021), the discrete Markov chain used in DDPM, $x_i = \sqrt{1-\beta_i}\,x_{i-1} + \sqrt{\beta_i}\,\epsilon_{i-1}$, $i = 1, \dots, T$, can be rewritten as

$$
x_i = \sqrt{1 - \frac{\bar\beta_i}{T}}\; x_{i-1} + \sqrt{\frac{\bar\beta_i}{T}}\; \epsilon_{i-1}, \quad i = 1, \dots, T, \tag{E.1}
$$

where $\bar\beta_i = T\beta_i$. When $T \to \infty$, $\bar\beta_i$ becomes a function $\beta(t)$ indexed by $t \in [0, 1]$.
Denoting $\bar\beta_i$, $x_i$, and $\epsilon_i$ by $\beta(\frac{i}{T})$, $x(\frac{i}{T})$, and $\epsilon(\frac{i}{T})$, respectively, we can rewrite Eq. E.1 as follows:



In fact, using the diffusion model as a purifier in front of another anomaly detector would not work, because the purification would also destroy the anomaly signals.



Figure 1: An adversarial example on OCR-GAN. δ refers to invisible perturbations. "GT" denotes "Ground Truth".

Figure 2: Reconstruction results of normal data, anomalous data, and adversarially perturbed data using our model. The observed reconstruction is robust to adversarial noise.

C MORE DETAILS OF EXPERIMENTAL SETTINGS

C.1 HYPERPARAMETERS OF THE DIFFUSION MODEL

Goodge et al. (2020) consider perturbations to anomalous data that make the model categorize them as normal by reducing their reconstruction error. As a defense, they propose APAE, which uses approximate projection and feature weighting to improve adversarial robustness. Lo et al. (2022) extend a similar attack strategy to both normal and anomalous data. They propose Principal Latent Space as a defense for adversarially robust novelty detection, where only semantic-shift anomalies are considered. While these methods achieve a certain level of robustness, their performance on clean anomaly detection tasks is still far from satisfactory.

Standard AUC (in parentheses) and robust AUC against ℓ∞-PGD attacks (ϵ = 2/255) on the MVTec AD dataset, obtained by different state-of-the-art anomaly detectors and ours.

Standard AUC (in parentheses) and robust AUC against ℓ2-PGD attacks (ϵ = 0.2) on the MVTec AD dataset, obtained by different state-of-the-art anomaly detectors and ours.

Average standard AUC and robust AUC against ℓ∞-PGD/BPDA (ϵ = 2/255) and ℓ2-PGD/BPDA (ϵ = 0.2) attacks on MVTec AD, obtained by PLS, APAE, and ours.

Certified AUC on the Bottle, Grid, Toothbrush, and Wood datasets from the MVTec AD benchmark at varying levels of Gaussian noise σ.

Ho et al. (2020) suggest that directly optimizing this variational bound L vb would produce much more gradient noise during training and propose a reweighted simplified objective L simple :

The choices of k for each category of the MVTec AD dataset.

Here we first report the anomaly detection performance of FreeRAD on clean data at varying diffusion steps k at inference time. We test with k ∈ {25, 50, 100, 200, 300}. As shown in Table 8, different datasets may not share the same optimal k.

AUC results on 15 categories from MVTec AD at varying diffusion steps k at inference time

Standard AUC and robust AUC against ℓ2-PGD attacks (ϵ = 0.2) at varying diffusion steps k on Capsule and Pill from MVTec AD.

AUC results on Screw, Toothbrush, Wood, and Transistor at varying numbers of denoising steps. The choice of k for each category follows Table 7.

Inference time (in seconds) for a single image on Toothbrush and Transistor at varying numbers of denoising steps. The increase in inference time over one-shot denoising is given in parentheses. The choice of k for each category follows Table 7.

Average standard AUC and robust AUC against ℓ∞-PGD (ϵ = 2/255) and ℓ2-PGD (ϵ = 0.2) attacks on MVTec AD, obtained by RAE and ours.

We have shown in Section 5.4 that FreeRAD is robust to the adaptive EOT-PGD attack. In this section, we include an additional strong attack baseline, AutoAttack (Croce & Hein, 2020), which ensembles multiple white-box and black-box attacks such as APGD and Square attacks. Specifically, we use two versions of AutoAttack: (i) standard AutoAttack and (ii) random AutoAttack (EOT+AutoAttack), which is designed for evaluating stochastic defenses. Table 13 summarizes the standard AUC and robust AUC of FreeRAD. The robust AUC of FreeRAD against AutoAttack is still much higher than that of other state-of-the-art methods against the relatively weaker PGD attacks, as shown in Tables 2 and 3. Thus, there is no need to evaluate the other methods against the stronger AutoAttack.

Standard AUC and robust AUC against ℓ∞-AutoAttack (ϵ = 2/255) and ℓ2-AutoAttack (ϵ = 0.2) on Bottle, Grid, Toothbrush, and Wood from MVTec AD.

D.5 EXPERIMENTS ON NOVELTY DETECTION DATASET

Novelty detection (i.e., semantic anomaly detection) refers to the problem of determining whether test data come from the known class (normal) or a novel class (anomalous).

Standard AUC (in parentheses) and robust AUC against ℓ∞-PGD attacks (ϵ = 2/255) and ℓ2-PGD attacks (ϵ = 0.2) on the CIFAR-10 dataset, obtained by different state-of-the-art anomaly detectors and ours.

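$$
x(t + \Delta t) = \sqrt{1 - \beta(t+\Delta t)\Delta t}\; x(t) + \sqrt{\beta(t+\Delta t)\Delta t}\; \epsilon(t) \approx x(t) - \frac12 \beta(t+\Delta t)\Delta t\, x(t) + \sqrt{\beta(t+\Delta t)\Delta t}\; \epsilon(t), \tag{E.2}
$$

where $\Delta t = \frac1T$ and $t \in \{0, \frac1T, \dots, \frac{T-1}{T}\}$. The approximate equality holds when $\Delta t \ll 1$. Hence, in the limit $T \to \infty$ (i.e., $\Delta t = \frac1T \to 0$), Eq. E.2 converges to the continuous-time SDE

$$
dx = -\frac12 \beta(t)\, x\, dt + \sqrt{\beta(t)}\, dw,
$$

whose diffusion coefficient is $g(t) = \sqrt{\beta(t)}$. Following the same proof as Theorem 3.1 in Nie et al. (2022), we have

$$
\frac{\partial D_{\mathrm{KL}}(p_t \,\|\, q_t)}{\partial t} = -\frac12 g^2(t)\, D_F(p_t \,\|\, q_t),
$$

Here $D_F(p_t \,\|\, q_t) := \int p_t(x)\, \|\nabla_x \log p_t(x) - \nabla_x \log q_t(x)\|_2^2\, dx \ge 0$ is the Fisher divergence, and $D_F(p_t \,\|\, q_t) = 0$ iff $p_t = q_t$. Thus, $\frac{\partial D_{\mathrm{KL}}(p_t \| q_t)}{\partial t} \le 0$.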
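The monotone decrease in Theorem E.1 can be checked numerically in a toy case where the KL divergence has a closed form. This is a sanity check of a special case, not a proof. We assume, for illustration only, the following:

- The clean and perturbed distributions are the one-dimensional Gaussians $p_0 = \mathcal N(\mu_p, s^2)$ and $q_0 = \mathcal N(\mu_q, s^2)$.
- The β schedule is linear from $10^{-4}$ to $0.02$ with $T = 1000$.

Then $p_t = \mathcal N(\sqrt{\bar\alpha_t}\mu_p, \bar\alpha_t s^2 + 1 - \bar\alpha_t)$, and likewise for $q_t$. Therefore $D_{\mathrm{KL}}(p_t\|q_t) = \bar\alpha_t(\mu_p-\mu_q)^2 / \big(2(\bar\alpha_t s^2 + 1 - \bar\alpha_t)\big)$. This expression is increasing in $\bar\alpha_t$ and hence decreasing in $t$.

```python
import math


def alpha_bar(t, T=1000, beta_start=1e-4, beta_end=0.02):
    """Product of (1 - beta_i) over the first t steps of an assumed linear schedule."""
    prod = 1.0
    for i in range(t):
        beta = beta_start + (beta_end - beta_start) * i / (T - 1)
        prod *= 1.0 - beta
    return prod


def diffused_kl(mu_p, mu_q, s, t):
    """KL(p_t || q_t) for p_0 = N(mu_p, s^2) and q_0 = N(mu_q, s^2) after t forward steps."""
    ab = alpha_bar(t)
    var_t = ab * s ** 2 + (1.0 - ab)  # shared variance of p_t and q_t
    return ab * (mu_p - mu_q) ** 2 / (2.0 * var_t)


for t in (0, 25, 50, 100, 300):
    print(t, diffused_kl(0.0, 0.05, 0.1, t))
```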

