DENSEPURE: UNDERSTANDING DIFFUSION MODELS FOR ADVERSARIAL ROBUSTNESS

Abstract

Diffusion models have recently been employed to improve certified robustness through the process of denoising. However, a theoretical understanding of why diffusion models are able to improve certified robustness is still lacking, which prevents further improvement. In this study, we close this gap by analyzing the fundamental properties of diffusion models and establishing the conditions under which they can enhance certified robustness. This deeper understanding allows us to propose a new method, DensePure, designed to improve the certified robustness of a pretrained model (i.e., a classifier). Given an (adversarial) input, DensePure runs the reverse process of the diffusion model multiple times (with different random seeds) to obtain multiple reversed samples, passes them through the classifier, and takes a majority vote over the inferred labels to make the final prediction. This design of using multiple runs of denoising is informed by our theoretical analysis of the conditional distribution of the reversed sample. Specifically, when the data density of a clean sample is high, its conditional density under the reverse process of a diffusion model is also high; thus sampling from the latter conditional distribution can purify the adversarial example and return the corresponding clean sample with high probability. By using the highest-density point in the conditional distribution as the reversed sample, we identify the robust region of a given instance under the diffusion model's reverse process. We show that this robust region is a union of multiple convex sets and is potentially much larger than the robust regions identified in previous works. In practice, DensePure can approximate the label of the high-density region in the conditional distribution so that it can enhance certified robustness.
We conduct extensive experiments to demonstrate the effectiveness of DensePure by evaluating its certified robustness given a standard model via randomized smoothing. We show that DensePure is consistently better than existing methods on ImageNet, with 7% improvement on average. Project page: https://densepure.github.io/.

1. INTRODUCTION

Diffusion models have been shown to be a powerful image generation tool (Ho et al., 2020; Song et al., 2021b) owing to their iterative diffusion and denoising processes. These models have achieved state-of-the-art performance on sample quality (Dhariwal & Nichol, 2021; Vahdat et al., 2021) as well as effective mode coverage (Song et al., 2021a). A diffusion model usually consists of two processes: (i) a forward diffusion process that converts data to noise by gradually adding noise to the input, and (ii) a reverse generative process that starts from noise and generates data by denoising one step at a time (Song et al., 2021b).

Given the natural denoising property of diffusion models, empirical studies have leveraged them for adversarial purification (Nie et al., 2022; Wu et al., 2022; Carlini et al., 2022). For instance, Nie et al. (2022) proposed DiffPure, which employs diffusion models for purification. They empirically show that, by carefully choosing the amount of Gaussian noise added during the diffusion process, adversarial perturbations can be removed while preserving the true label semantics. Despite the significant empirical results, there is no provable guarantee of the achieved robustness. A concurrent work (Carlini et al., 2022) instantiated the randomized smoothing approach with the diffusion model to offer a provable guarantee of model robustness against $L_2$-norm bounded adversarial examples. However, they do not provide a theoretical understanding of why and how diffusion models contribute to such nontrivial certified robustness.

Our Approach. We are the first to theoretically analyze the fundamental properties of diffusion models to understand why and how diffusion models enhance certified robustness. This deeper understanding allows us to propose a new method, DensePure, which uses diffusion models to improve the certified robustness of any given classifier more effectively.
An illustration of the DensePure framework is provided in Figure 1; it consists of a pretrained diffusion model and a pretrained classifier. DensePure incorporates two steps: (i) using the reverse process of the diffusion model to obtain a sample of the posterior data distribution conditioned on the adversarial input; and (ii) repeating the reverse process multiple times with different random seeds to approximate the label of the high-density region in the conditional distribution via a simple majority vote strategy. In particular, given an adversarial input, we repeatedly feed it into the reverse process of the diffusion model to get multiple reversed examples and feed them into the classifier to obtain their labels. We then apply a majority vote on the set of labels to get the final predicted label.

DensePure is inspired by our theoretical analysis, where we show that the reverse process of the diffusion model provides a conditional distribution of the reversed sample given an adversarial input. Sampling from this conditional distribution can enhance the certified robustness. Specifically, we prove that a high data density of the clean samples is a sufficient condition for the conditional density of the reversed samples to also be high. Therefore, in DensePure, samples from the conditional distribution can recover the ground-truth labels with high probability. To keep the analysis rigorous and easy to interpret, we use the highest-density point in the conditional distribution as the deterministic reversed sample for the classifier prediction. We show that the robust region for a given sample under the diffusion model's reverse process is the union of multiple convex sets, each surrounding a region around the ground-truth label.
Compared with the robust region of previous work (Cohen et al., 2019), which focuses on only one region with the ground-truth label, such a union of multiple convex sets has the potential to provide a much larger robust region, resulting in higher certified robustness. Moreover, the characterization implies that the size of the robust regions is affected by the relative density of, and the distance between, data regions with the ground-truth label and those with other labels.

We conduct extensive experiments on the ImageNet and CIFAR-10 datasets under different settings to evaluate the certifiable robustness of DensePure. In particular, we follow the setting of Carlini et al. (2022) and rely on randomized smoothing to certify robustness against adversarial perturbations bounded in the $L_2$-norm. We show that DensePure achieves a new state-of-the-art certified robustness on the standard pretrained model without further tuning any model parameters (e.g., smooth augmentation, Cohen et al. (2019)). On ImageNet, it achieves consistently higher certified accuracy than the existing methods, with 7% improvement on average, for every $\sigma$ at every radius $\epsilon$.

Technical Contributions. In this paper, we take the first step to understand why and how diffusion models contribute to certified robustness. We make contributions on both theoretical and empirical fronts:
(1) In theory, we prove that an adversarial example can be recovered back to the original clean sample with high probability via the reverse process of a diffusion model.
(2) In theory, we characterize the robust region for each point by further taking the highest-density point in the conditional distribution generated by the reverse process as the reversed sample. We show that the robust region for a given sample under the diffusion model's reverse process has the potential to be larger than those of previous work. To the best of our knowledge, this is the first work that characterizes the robust region of using the reverse process of the diffusion model for adversarial purification.
(3) In practice, we propose DensePure based on our theoretical analysis. We demonstrate that DensePure is consistently better than existing methods on ImageNet, with 7% improvement on average.

2. PRELIMINARIES AND BACKGROUNDS

Continuous-Time Diffusion Model. The diffusion model has two components: the diffusion process followed by the reverse process. Given an input random variable $x_0 \sim p$, the diffusion process adds isotropic Gaussian noise to the data so that the diffused random variable at time $t$ is $x_t = \sqrt{\alpha_t}\,(x_0 + \epsilon_t)$, with $\epsilon_t \sim \mathcal{N}(0, \sigma_t^2 I)$ and $\sigma_t^2 = (1 - \alpha_t)/\alpha_t$; we denote $x_t \sim p_t$. The forward diffusion process can also be defined by the stochastic differential equation

$$dx = h(x, t)\,dt + g(t)\,dw, \qquad \text{(SDE)}$$

where $x_0 \sim p$, $h : \mathbb{R}^d \times \mathbb{R} \to \mathbb{R}^d$ is the drift coefficient, $g : \mathbb{R} \to \mathbb{R}$ is the diffusion coefficient, and $w(t) \in \mathbb{R}^n$ is the standard Wiener process. Under mild conditions B.1, the reverse process exists and removes the added noise by solving the reverse-time SDE (Anderson, 1982)

$$d\hat{x} = \left[h(\hat{x}, t) - g(t)^2 \nabla_{\hat{x}} \log p_t(\hat{x})\right] dt + g(t)\,d\bar{w}, \qquad \text{(reverse-SDE)}$$

where $dt$ is an infinitesimal reverse time step and $\bar{w}(t)$ is a reverse-time standard Wiener process. In our context, we use the conventions of VP-SDE (Song et al., 2021b), where $h(x; t) := -\frac{1}{2}\gamma(t)\,x$ and $g(t) := \sqrt{\gamma(t)}$ with $\gamma(t)$ positive and continuous over $[0, 1]$, such that $x(t) = \sqrt{\alpha_t}\,x(0) + \sqrt{1 - \alpha_t}\,\epsilon$, where $\alpha_t = e^{-\int_0^t \gamma(s)\,ds}$ and $\epsilon \sim \mathcal{N}(0, I)$. We use $\{x_t\}_{t\in[0,1]}$ and $\{\hat{x}_t\}_{t\in[0,1]}$ to denote the diffusion process and the reverse process generated by SDE and reverse-SDE respectively, which follow the same distribution.

Discrete-Time Diffusion Model (or DDPM (Ho et al., 2020)). DDPM constructs a discrete Markov chain $\{x_0, x_1, \cdots, x_i, \cdots, x_N\}$ as the forward process for the training data $x_0 \sim p$, such that $P(x_i \mid x_{i-1}) = \mathcal{N}(x_i; \sqrt{1 - \beta_i}\,x_{i-1}, \beta_i I)$, where $0 < \beta_1 < \beta_2 < \cdots < \beta_N < 1$ are predefined noise scales such that $x_N$ approximates Gaussian white noise. Denoting $\alpha_i = \prod_{s=1}^{i} (1 - \beta_s)$, we have $P(x_i \mid x_0) = \mathcal{N}(x_i; \sqrt{\alpha_i}\,x_0, (1 - \alpha_i) I)$, i.e., $x_i(x_0, \epsilon) = \sqrt{\alpha_i}\,x_0 + \sqrt{1 - \alpha_i}\,\epsilon$, $\epsilon \sim \mathcal{N}(0, I)$.
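The discrete forward process above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the linear noise schedule, the helper names (`cumulative_alphas`, `diffuse`, `noise_variance`), and the toy input are assumptions made for the example and are not taken from the paper.

```python
import math
import random


def cumulative_alphas(betas):
    """Return alpha_i = prod_{s<=i} (1 - beta_s) for i = 1..N."""
    alphas, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alphas.append(running)
    return alphas


def diffuse(x0, alpha_i, rng):
    """Sample x_i = sqrt(alpha_i) * x0 + sqrt(1 - alpha_i) * eps, eps ~ N(0, I)."""
    return [
        math.sqrt(alpha_i) * v + math.sqrt(1.0 - alpha_i) * rng.gauss(0.0, 1.0)
        for v in x0
    ]


def noise_variance(alpha_i):
    """sigma^2 = (1 - alpha) / alpha, the noise level in x = sqrt(alpha) * (x0 + eps)."""
    return (1.0 - alpha_i) / alpha_i


# Hypothetical linear schedule, chosen only for illustration.
N = 1000
betas = [1e-4 + (0.02 - 1e-4) * i / (N - 1) for i in range(N)]
alphas = cumulative_alphas(betas)

rng = random.Random(0)
x_i = diffuse([0.5, -0.2, 0.1], alphas[99], rng)  # diffused sample at step i = 100
```

The two parametrizations of the forward process agree because $\sqrt{\alpha}\,\sigma = \sqrt{1 - \alpha}$ when $\sigma^2 = (1 - \alpha)/\alpha$, which is what `noise_variance` encodes.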
The reverse process of DDPM learns a reverse-direction variational Markov chain $p_\theta(\hat{x}_{i-1} \mid \hat{x}_i) = \mathcal{N}(\hat{x}_{i-1}; \mu_\theta(\hat{x}_i, i), \Sigma_\theta(\hat{x}_i, i))$. Ho et al. (2020) define $\epsilon_\theta$ as a function approximator that predicts $\epsilon$ from $x_i$, such that

$$\mu_\theta(\hat{x}_i, i) = \frac{1}{\sqrt{1 - \beta_i}} \left( \hat{x}_i - \frac{\beta_i}{\sqrt{1 - \alpha_i}}\,\epsilon_\theta(\hat{x}_i, i) \right).$$

The reverse-time samples are then generated by

$$\hat{x}_{i-1} = \frac{1}{\sqrt{1 - \beta_i}} \left( \hat{x}_i - \frac{\beta_i}{\sqrt{1 - \alpha_i}}\,\epsilon_{\theta^*}(\hat{x}_i, i) \right) + \sqrt{\beta_i}\,\epsilon, \qquad \epsilon \sim \mathcal{N}(0, I),$$

and the optimal parameters $\theta^*$ are obtained by solving

$$\theta^* := \arg\min_\theta \mathbb{E}_{x_0, \epsilon} \left\| \epsilon - \epsilon_\theta\!\left(\sqrt{\alpha_i}\,x_0 + \sqrt{1 - \alpha_i}\,\epsilon,\; i\right) \right\|_2^2.$$

Randomized Smoothing. Randomized smoothing is used to certify the robustness of a given classifier against $L_2$-norm bounded perturbations. It transforms the classifier $f$ into a smooth version $g(x) = \arg\max_c P_{\epsilon \sim \mathcal{N}(0, \sigma^2 I)}(f(x + \epsilon) = c)$, where $g$ is the smooth classifier and $\sigma$ is a hyperparameter of $g$ that controls the trade-off between robustness and accuracy. Cohen et al. (2019) show that $g(x)$ induces certifiable robustness for $x$ under the $L_2$-norm with radius $R$, where

$$R = \frac{\sigma}{2} \left( \Phi^{-1}(p_A) - \Phi^{-1}(p_B) \right);$$

$p_A$ and $p_B$ are the probabilities of the most probable class and the "runner-up" class respectively, and $\Phi^{-1}$ is the inverse of the standard Gaussian CDF. Both $p_A$ and $p_B$ can be estimated with arbitrarily high confidence via the Monte Carlo method (Cohen et al., 2019).
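As a small illustration of the radius formula above, the following sketch evaluates $R$ with the Python standard library. The function name `certified_radius` and the example probabilities are assumptions for illustration; in practice $p_A$ and $p_B$ would be replaced by a Monte Carlo lower bound and upper bound, which this sketch does not compute.

```python
from statistics import NormalDist


def certified_radius(p_a, p_b, sigma):
    """R = sigma / 2 * (Phi^{-1}(p_a) - Phi^{-1}(p_b)), as in Cohen et al. (2019).

    p_a: probability of the most probable class under Gaussian noise.
    p_b: probability of the runner-up class under Gaussian noise.
    """
    if not (0.0 < p_b <= p_a < 1.0):
        raise ValueError("expected 0 < p_b <= p_a < 1")
    inv_cdf = NormalDist().inv_cdf  # inverse of the standard Gaussian CDF
    return 0.5 * sigma * (inv_cdf(p_a) - inv_cdf(p_b))


# Illustrative values only.
radius = certified_radius(p_a=0.9, p_b=0.1, sigma=0.5)
```

The radius grows linearly with $\sigma$ and with the gap between the two class probabilities; it is zero when the top two classes are equally likely.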

3. THEORETICAL ANALYSIS

In this section, we theoretically analyze why and how the diffusion model can enhance the robustness of a given classifier. We analyze SDE and reverse-SDE directly, as they generate the same stochastic process $\{x_t\}_{t\in[0,T]}$ and existing work establishes approximations to reverse-SDE (Song et al., 2021b; Ho et al., 2020). We first show in Theorem 3.1 that, given a diffusion model, solving reverse-SDE generates a conditional distribution based on the scaled adversarial sample, which has high density on data regions that have high data density and are near the adversarial sample. See detailed conditions in B.1.

Theorem 3.1. Under conditions B.1, solving equation reverse-SDE starting from time $t$ and sample $x_{a,t} = \sqrt{\alpha_t}\,x_a$ will generate a reversed random variable $\hat{x}_0$ with density

$$P(\hat{x}_0 = x \mid \hat{x}_t = x_{a,t}) \propto p(x) \cdot \frac{1}{\sqrt{(2\pi\sigma_t^2)^n}} \exp\!\left( -\frac{\|x - x_a\|_2^2}{2\sigma_t^2} \right),$$

where $p$ is the data distribution and $\sigma_t^2 = \frac{1 - \alpha_t}{\alpha_t}$ is the variance of the Gaussian noise added at time $t$ in the diffusion process.

Proof. (sketch) Under conditions B.1, we know $\{x_t\}_{t\in[0,1]}$ and $\{\hat{x}_t\}_{t\in[0,1]}$ follow the same distribution, and the rest of the proof follows from Bayes' rule. Please see the full proofs of this and the following theorems in Appendix B.2.

Remark 1. Note that $P(\hat{x}_0 = x \mid \hat{x}_t = x_{a,t}) > 0$ if and only if $p(x) > 0$; thus the generated reverse sample will lie in the data region where we train classifiers. In Theorem 3.1, the conditional density $P(\hat{x}_0 = x \mid \hat{x}_t = x_{a,t})$ is high if both $p(x)$ and the Gaussian term have high values, i.e., $x$ has high data density and is close to the adversarial sample $x_a$. The latter condition is reasonable since adversarial perturbations are typically bounded due to budget constraints. The above argument therefore implies that a reversed sample is more likely to have the ground-truth label if the data region with the ground-truth label has high data density.
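To make the density in Theorem 3.1 concrete, the following toy sketch evaluates it on a finite set of candidate clean points. The candidates, their densities, and the function name `reverse_posterior` are hypothetical choices for illustration; the Gaussian normalizing constant is omitted because it cancels after normalization.

```python
import math


def reverse_posterior(candidates, density, x_a, sigma_t):
    """Normalized weights proportional to p(x) * exp(-||x - x_a||^2 / (2 sigma_t^2))."""
    weights = []
    for x, p in zip(candidates, density):
        sq_dist = sum((u - v) ** 2 for u, v in zip(x, x_a))
        weights.append(p * math.exp(-sq_dist / (2.0 * sigma_t ** 2)))
    total = sum(weights)
    return [w / total for w in weights]


# Three hypothetical clean points with data densities 0.6, 0.3, 0.1.
candidates = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
density = [0.6, 0.3, 0.1]

# An adversarial input between the first two points.
posterior = reverse_posterior(candidates, density, x_a=(0.4, 0.0), sigma_t=0.5)
```

In this toy setting the first candidate receives the largest weight because it is both dense and close to `x_a`, while the distant third candidate receives almost none, matching the reading of Remark 1.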
For the sake of theoretical analysis and understanding, we take the point with the highest conditional density $P(\hat{x}_0 = x \mid \hat{x}_t = x_{a,t})$ as the reversed sample, defined as $\mathcal{P}(x_a; t) := \arg\max_x P(\hat{x}_0 = x \mid \hat{x}_t = x_{a,t})$. $\mathcal{P}(x_a; t)$ is a representative of the high-density data region in the conditional distribution, and $\mathcal{P}(\cdot\,; t)$ is a deterministic purification model. In the following, we characterize the robust region for the data region with the ground-truth label under $\mathcal{P}(\cdot\,; t)$. The robust region and robust radius for a general deterministic purification model given a classifier are defined below.

Definition 3.2 (Robust Region and Robust Radius). Given a classifier $f$ and a point $x_0$, let $\mathcal{G}(x_0) := \{x : f(x) = f(x_0)\}$ be the data region where samples have the same label as $x_0$. Then, given a deterministic purification model $\mathcal{P}(\cdot\,; \psi)$ with parameter $\psi$, we define the robust region of $\mathcal{G}(x_0)$ under $\mathcal{P}$ and $f$ as $\mathcal{D}^f_{\mathcal{P}}(\mathcal{G}(x_0); \psi) := \{x : f(\mathcal{P}(x; \psi)) = f(x_0)\}$, i.e., the set of $x$ such that the purified sample $\mathcal{P}(x; \psi)$ has the same label as $x_0$ under $f$. Further, we define the robust radius of $x_0$ as $r^f_{\mathcal{P}}(x_0; \psi) := \max\{ r : x_0 + ru \in \mathcal{D}^f_{\mathcal{P}}(\mathcal{G}(x_0); \psi),\ \forall \|u\|_2 \le 1 \}$, i.e., the radius of the maximum inscribed ball of $\mathcal{D}^f_{\mathcal{P}}(\mathcal{G}(x_0); \psi)$ centered at $x_0$. We omit $\mathcal{P}$ and $f$ when they are clear from the context and write $\mathcal{D}(\mathcal{G}(x_0); \psi)$ and $r(x_0; \psi)$ instead.

Remark 2. In Definition 3.2, the robust region (resp. radius) is defined for each class (resp. point). When using the point with the highest $P(\hat{x}_0 = x \mid \hat{x}_t = x_{a,t})$ as the reversed sample, $\psi := t$.

Now, given a sample $x_0$ with the ground-truth label, we are ready to characterize the robust region $\mathcal{D}(\mathcal{G}(x_0); \psi)$ under the purification model $\mathcal{P}(\cdot\,; t)$ and classifier $f$. Intuitively, if the adversarial sample $x_a$ is near $x_0$ (in Euclidean distance), $x_a$ keeps the same label semantics as $x_0$, and so does the purified sample $\mathcal{P}(x_a; t)$, which implies that $f(\mathcal{P}(x_a; \psi)) = f(x_0)$.
However, the condition that $x_a$ is near $x_0$ is sufficient but not necessary, since we can still achieve $f(\mathcal{P}(x_a; \psi)) = f(x_0)$ if $x_a$ is near any sample $\tilde{x}_0$ with $f(\mathcal{P}(\tilde{x}_a; \psi)) = f(x_0)$. In the following, we show that the robust region $\mathcal{D}(\mathcal{G}(x_0); \psi)$ is the union of the convex robust sub-regions surrounding every $\tilde{x}_0$ with the same label as $x_0$. The following theorem characterizes the convex robust sub-region and the robust region respectively.

Theorem 3.3. Under conditions B.1 and classifier $f$, let $x_0$ be the sample with the ground-truth label and $x_a$ be the adversarial sample. Then (i) the purified sample $\mathcal{P}(x_a; t)$ will have the ground-truth label if $x_a$ falls into the following convex set,

$$\mathcal{D}_{\mathrm{sub}}(x_0; t) := \bigcap_{\{x_0' : f(x_0') \neq f(x_0)\}} \left\{ x_a : (x_a - x_0)^\top (x_0' - x_0) < \sigma_t^2 \log\frac{p(x_0)}{p(x_0')} + \frac{\|x_0' - x_0\|_2^2}{2} \right\},$$

and further, (ii) the purified sample $\mathcal{P}(x_a; t)$ will have the ground-truth label if and only if $x_a$ falls into the following set,

$$\mathcal{D}(\mathcal{G}(x_0); t) := \bigcup_{\tilde{x}_0 : f(\tilde{x}_0) = f(x_0)} \mathcal{D}_{\mathrm{sub}}(\tilde{x}_0; t).$$

In other words, $\mathcal{D}(\mathcal{G}(x_0); t)$ is the robust region for the data region $\mathcal{G}(x_0)$ under $\mathcal{P}(\cdot\,; t)$ and $f$.

Proof. (sketch) (i) Each half-space defined by the inequality corresponds to an $x_0'$ with $f(x_0') \neq f(x_0)$, and every $x_a$ inside it satisfies $P(\hat{x}_0 = x_0 \mid \hat{x}_t = x_{a,t}) > P(\hat{x}_0 = x_0' \mid \hat{x}_t = x_{a,t})$. This implies that $\mathcal{P}(x_a; t) \neq x_0'$ and $f(\mathcal{P}(x_a; \psi)) = f(x_0)$. Convexity holds because the intersection of convex sets is convex. (ii) The "if" direction follows directly from (i). The "only if" direction holds because if $x_a \notin \mathcal{D}(\mathcal{G}(x_0); t)$, then there exists $\tilde{x}_1$ such that $f(\tilde{x}_1) \neq f(x_0)$ and $P(\hat{x}_0 = \tilde{x}_1 \mid \hat{x}_t = x_{a,t}) > P(\hat{x}_0 = \tilde{x}_0 \mid \hat{x}_t = x_{a,t})$ for all $\tilde{x}_0$ such that $f(\tilde{x}_0) = f(x_0)$, and thus $f(\mathcal{P}(x_a; \psi)) \neq f(x_0)$.

Remark 3. Theorem 3.3 implies that when the data region $\mathcal{G}(x_0)$ has higher data density and larger distances to data regions with other labels, it tends to have a larger robust region, and points in the data region tend to have a larger robust radius.
Since adversarial attacks typically have small magnitude, with a large robust region the adversarial sample can be recovered to the clean sample with high probability. The literature focuses more on the robust radius (lower bound) $r(\mathcal{G}(x_0); t)$ (Cohen et al., 2019; Carlini et al., 2022), which can be obtained by finding the maximum inscribed ball inside $\mathcal{D}(\mathcal{G}(x_0); t)$ centered at $x_0$. Note that although $\mathcal{D}_{\mathrm{sub}}(x_0; t)$ is convex, $\mathcal{D}(\mathcal{G}(x_0); t)$ generally is not. Therefore, finding $r(\mathcal{G}(x_0); t)$ is a non-convex optimization problem. In particular, it can be formulated as a disjunctive optimization problem with integer indicator variables, which is typically NP-hard to solve. One alternative is to find the maximum inscribed ball in $\mathcal{D}_{\mathrm{sub}}(x_0; t)$, which can be formulated as a convex optimization problem whose optimal value provides a lower bound for $r(\mathcal{G}(x_0); t)$. However, $\mathcal{D}(\mathcal{G}(x_0); t)$ has the potential to provide a much larger robust radius because it might connect different convex robust sub-regions into one, as shown in Figure 2.

Figure 2: An illustration of the robust region $\mathcal{D}(\mathcal{G}(x_0); t) = \bigcup_{i=0}^{2} \mathcal{D}_{\mathrm{sub}}(x_i; t)$, where $x_0, x_1, x_2$ are samples with the ground-truth label and $x_3$ is a sample with another label. $x_a = x_0 + \epsilon_a$ is an adversarial sample such that $\mathcal{P}(x_a; t) = x_1 \neq x_0$; thus the classification is correct but $x_a$ is not reversed back to $x_0$. $r_{\mathrm{sub}}(x_0) < r(x_0)$ illustrates our claim that the union leads to a larger robust radius.

In practice, we cannot guarantee an exact reverse process like reverse-SDE; instead, we establish an approximate reverse process to mimic the exact one. As long as the approximate reverse process is close enough to the exact reverse process, the two will generate sufficiently close conditional distributions based on the adversarial sample. The density and locations of the data regions in the two conditional distributions will then not differ much, and neither will the robust region for each data region.
We take the score-based diffusion model of Song et al. (2021b) as an example and use Theorem 3.4 to bound the KL-divergence between the conditional distributions generated by reverse-SDE and by the score-based diffusion model. Ho et al. (2020) showed that using variational inference to fit DDPM is equivalent to optimizing an objective resembling that of the score-based diffusion model with a specific weighting scheme, so the results can be extended to DDPM.

Theorem 3.4. Under conditions B.1, we have

$$D_{\mathrm{KL}}\!\left( P(\hat{x}_0 = x \mid \hat{x}_t = x_{a,t}) \,\|\, P(x^\theta_0 = x \mid x^\theta_t = x_{a,t}) \right) = J_{\mathrm{SM}}(\theta, t; \lambda(\cdot)),$$

where $\{\hat{x}_\tau\}_{\tau\in[0,t]}$ and $\{x^\theta_\tau\}_{\tau\in[0,t]}$ are the stochastic processes generated by reverse-SDE and the score-based diffusion model respectively,

$$J_{\mathrm{SM}}(\theta, t; \lambda(\cdot)) := \frac{1}{2} \int_0^t \mathbb{E}_{p_\tau(x)} \left[ \lambda(\tau) \left\| \nabla_x \log p_\tau(x) - s_\theta(x, \tau) \right\|_2^2 \right] d\tau,$$

$s_\theta(x, \tau)$ is the score function that approximates $\nabla_x \log p_\tau(x)$, and $\lambda : \mathbb{R} \to \mathbb{R}$ is any weighting scheme used in training score-based diffusion models.

Proof. (sketch) Let $\mu_t$ and $\nu_t$ be the path measures of the reverse processes $\{\hat{x}_\tau\}_{\tau\in[0,t]}$ and $\{x^\theta_\tau\}_{\tau\in[0,t]}$ respectively, both started from $x_{a,t}$. Under conditions B.1, $\mu_t$ and $\nu_t$ are uniquely defined and the KL-divergence can be computed via the Girsanov theorem (Oksendal, 2013).

Remark 4. Theorem 3.4 shows that the smaller the training loss, the closer the conditional distributions generated by reverse-SDE and the score-based diffusion model, and that the two coincide if the training loss is zero. Furthermore, by Pinsker's inequality, the total variation (a distance metric) is upper bounded by

$$D_{\mathrm{TV}}\!\left( P(\hat{x}_0 = x \mid \hat{x}_t = x_{a,t}) \,\|\, P(x^\theta_0 = x \mid x^\theta_t = x_{a,t}) \right) \le \sqrt{\tfrac{1}{2} J_{\mathrm{SM}}(\theta, t; \lambda(\cdot))}.$$
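The half-space characterization of the convex sub-region in Theorem 3.3 lends itself to a small numerical sketch. The code below assumes a finite set of other-label points with known densities, which is a simplification made only for illustration; the function names and the toy data are hypothetical. Since each constraint is a half-space whose boundary lies at distance $c / \|x_0' - x_0\|_2$ from $x_0$ (with $c$ the right-hand side of the inequality), the largest ball centered at $x_0$ inside the sub-region has radius equal to the smallest such distance.

```python
import math


def half_space_offset(x0, p_x0, x_prime, p_prime, sigma_t):
    """Right-hand side of the inequality for one other-label point x_prime."""
    sq_dist = sum((b - a) ** 2 for a, b in zip(x0, x_prime))
    return sigma_t ** 2 * math.log(p_x0 / p_prime) + 0.5 * sq_dist


def in_sub_region(x_a, x0, p_x0, others, sigma_t):
    """True if x_a satisfies every half-space inequality defining D_sub(x0; t).

    others: list of (x_prime, p_prime) pairs for points with a different label.
    """
    for x_prime, p_prime in others:
        lhs = sum((u - a) * (b - a) for u, a, b in zip(x_a, x0, x_prime))
        if not lhs < half_space_offset(x0, p_x0, x_prime, p_prime, sigma_t):
            return False
    return True


def sub_region_radius(x0, p_x0, others, sigma_t):
    """Supremum radius of a ball centered at x0 that stays inside D_sub(x0; t)."""
    radius = math.inf
    for x_prime, p_prime in others:
        dist = math.sqrt(sum((b - a) ** 2 for a, b in zip(x0, x_prime)))
        offset = half_space_offset(x0, p_x0, x_prime, p_prime, sigma_t)
        # A non-positive offset means x0 itself violates the constraint.
        radius = min(radius, max(offset, 0.0) / dist)
    return radius


# Toy example: one clean point and two hypothetical other-label points.
x0, p_x0 = (0.0, 0.0), 0.6
others = [((2.0, 0.0), 0.4), ((0.0, 4.0), 0.6)]
r_sub = sub_region_radius(x0, p_x0, others, sigma_t=0.5)
```

With equal densities the radius reduces to half the distance to the nearest other-label point; a higher density at `x0` pushes the boundary outward by $\sigma_t^2 \log(p(x_0)/p(x_0')) / \|x_0' - x_0\|_2$, in line with Remark 3. This only gives the lower bound from a single sub-region, not the radius of the full union.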

4. DENSEPURE

Inspired by the theoretical analysis, we introduce DensePure and show how to calculate its certified robustness radius via the randomized smoothing algorithm.

Framework. Our framework, DensePure, consists of two components: (1) an off-the-shelf diffusion model with reverse process $\mathrm{rev}$ and (2) an off-the-shelf base classifier $f$. The pipeline of DensePure is shown in Figure 1. Given an input $x$, we feed it into the reverse process $\mathrm{rev}$ of the diffusion model to get the reversed sample $\mathrm{rev}(x)$, and repeat this process $K$ times to get $K$ reversed samples $\{\mathrm{rev}(x)_1, \cdots, \mathrm{rev}(x)_K\}$. We feed these $K$ reversed samples into the classifier to get the corresponding predictions $\{f(\mathrm{rev}(x)_1), \cdots, f(\mathrm{rev}(x)_K)\}$ and then apply the majority vote, termed MV, on these predictions to get the final predicted label

$$\hat{y} = \mathrm{MV}(\{f(\mathrm{rev}(x)_1), \cdots, f(\mathrm{rev}(x)_K)\}) = \arg\max_c \sum_{i=1}^{K} \mathbb{1}\{f(\mathrm{rev}(x)_i) = c\}.$$

Certified Robustness of DensePure with Randomized Smoothing. In this paragraph, we describe the algorithm for calculating the certified robustness of DensePure via randomized smoothing (RS), which offers robustness guarantees for a model within an $L_2$-norm ball. In particular, we follow a setting similar to that of Carlini et al. (2022), which uses a DDPM-based diffusion model. The overall algorithm contains three steps:

(1) Our framework estimates $n$, the number of steps used for the reverse process of the DDPM-based diffusion model. Since randomized smoothing (Cohen et al., 2019) adds Gaussian noise $\epsilon \sim \mathcal{N}(0, \sigma^2 I)$ to the data input $x$ to get the randomized data input $x_{\mathrm{rs}} = x + \epsilon$, we match the noise required by the randomized example $x_{\mathrm{rs}}$ with the noise of the diffused data $x_n$ (i.e., $x_n \sim \mathcal{N}(x_n; \sqrt{\alpha_n}\,x_0, (1 - \alpha_n) I)$) after $n$ diffusion steps, so that $\alpha_n = \frac{1}{1 + \sigma^2}$. In this way, we can compute the corresponding timestep $n = \arg\min_s \left\{ \left| \alpha_s - \frac{1}{1 + \sigma^2} \right| : s \in [N] \right\}$.

(2) Given the timestep $n$ calculated above, we scale $x_{\mathrm{rs}}$ by $\sqrt{\alpha_n}$ to obtain the scaled randomized smoothing sample $\sqrt{\alpha_n}\,x_{\mathrm{rs}}$. Then we feed $\sqrt{\alpha_n}\,x_{\mathrm{rs}}$ into the reverse process of the diffusion model $K$ times to get the reversed sample set $\{\hat{x}^1_0, \hat{x}^2_0, \cdots, \hat{x}^i_0, \cdots, \hat{x}^K_0\}$.

(3) We feed the obtained reversed sample set into a standard off-the-shelf classifier $f$ to get the corresponding predicted labels $\{f(\hat{x}^1_0), f(\hat{x}^2_0), \ldots, f(\hat{x}^i_0), \ldots, f(\hat{x}^K_0)\}$, and apply the majority vote, denoted $\mathrm{MV}(\cdots)$, on these predicted labels to get the final label for $x_{\mathrm{rs}}$.

Fast Sampling. To calculate the reversed sample, the standard reverse process of DDPM-based models requires repeatedly applying a "single-step" operation $n$ times to get the reversed sample $\hat{x}_0$, i.e., $\hat{x}_0 = \mathrm{Reverse}(\cdots \mathrm{Reverse}(\cdots \mathrm{Reverse}(\mathrm{Reverse}(\sqrt{\alpha_n}\,x_{\mathrm{rs}}; n); n - 1); \cdots; i); \cdots; 1)$. Here $\hat{x}_{i-1} = \mathrm{Reverse}(\hat{x}_i; i)$ is equivalent to sampling $\hat{x}_{i-1}$ from $\mathcal{N}(\hat{x}_{i-1}; \mu_\theta(\hat{x}_i, i), \Sigma_\theta(\hat{x}_i, i))$, where $\mu_\theta(\hat{x}_i, i) = \frac{1}{\sqrt{1 - \beta_i}} \left( \hat{x}_i - \frac{\beta_i}{\sqrt{1 - \alpha_i}}\,\epsilon_\theta(\hat{x}_i, i) \right)$ and $\Sigma_\theta := \exp(v \log \beta_i + (1 - v) \log \tilde{\beta}_i)$. Here $v$ is a parameter learned by DDPM and $\tilde{\beta}_i = \frac{1 - \alpha_{i-1}}{1 - \alpha_i}\,\beta_i$. To reduce the time complexity, we use the uniform sub-sampling strategy from Nichol & Dhariwal (2021): we uniformly sample a subsequence of size $b$ from the original $N$-step reverse process. Note that Carlini et al. (2022) set $b = 1$ for "one-shot" sampling; in this case, $\hat{x}_0 = \frac{1}{\sqrt{\alpha_n}} \left( x_n - \sqrt{1 - \alpha_n}\,\epsilon_\theta(\sqrt{\alpha_n}\,x_{\mathrm{rs}}, n) \right)$ is a deterministic value, so the reverse process does not produce a posterior data distribution conditioned on the input. Instead, we can set the number of sub-sampled DDPM steps to be larger than one ($b > 1$) in order to sample from a posterior data distribution conditioned on the input. The details of the fast sampling are given in Appendix C.2.
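The three steps of the certification pipeline can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `reverse` and `classify` are hypothetical callables standing in for the diffusion model's (stochastic) reverse process and the off-the-shelf classifier, inputs are plain lists of floats, and `alphas[s - 1]` is assumed to hold $\alpha_s$ for $s = 1, \ldots, N$.

```python
import math
from collections import Counter


def select_timestep(alphas, sigma):
    """Step (1): 1-based n whose alpha_n is closest to 1 / (1 + sigma^2)."""
    target = 1.0 / (1.0 + sigma ** 2)
    best = min(range(len(alphas)), key=lambda s: abs(alphas[s] - target))
    return best + 1


def majority_vote(labels):
    """Most frequent label; ties go to the label that appeared first."""
    return Counter(labels).most_common(1)[0][0]


def densepure_predict(x_rs, alphas, sigma, reverse, classify, num_votes):
    """Steps (2) and (3): scale, run the reverse process K times, and vote.

    reverse(x, n): hypothetical reverse process started at step n.
    classify(x): hypothetical classifier returning a label.
    """
    n = select_timestep(alphas, sigma)
    scaled = [math.sqrt(alphas[n - 1]) * v for v in x_rs]
    labels = [classify(reverse(scaled, n)) for _ in range(num_votes)]
    return majority_vote(labels)
```

For the vote to be informative, `reverse` has to be stochastic (corresponding to $b > 1$ sub-sampled steps above); with a deterministic one-shot denoiser all `num_votes` labels coincide.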

5. EXPERIMENTS

In this section, we use DensePure to evaluate certified robustness on two standard datasets, CIFAR-10 (Krizhevsky et al., 2009) and ImageNet (Deng et al., 2009).

5.1. MAIN RESULTS

We evaluate DensePure on subsets of CIFAR-10 and ImageNet. We choose the same subsets as in Cohen et al. (2019): 500 samples for CIFAR-10 and 100 samples for ImageNet (the results with 500 samples are shown in Appendix D.10). The results are shown in Table 1. For CIFAR-10, compared with the models that are carefully trained with randomized smoothing techniques in an end-to-end manner (i.e., w/o off-the-shelf classifier), we observe that our method with the standard off-the-shelf classifier outperforms them at smaller ϵ = {0.25, 0.5} on both CIFAR-10 and ImageNet datasets while achieving comparable performance at larger ϵ = {0. than them. These results verify the non-trivial adversarial robustness improvements introduced by the diffusion model. For ImageNet, our method is consistently better than all prior methods by a large margin.

Since both Carlini et al. (2022) and DensePure use the diffusion model, to better understand the importance of our design, which approximates the label of the high-density region in the conditional distribution, we compare DensePure with Carlini et al. (2022) in a more fine-grained manner. We show the detailed certified robustness of the model for different σ at different radii for CIFAR-10 in Figure 3-left and for ImageNet in Figure 3-right. We also present our results of certified accuracy at different ϵ in Appendix D.3. From these results, we find that our method is still consistently better at most ϵ (except ϵ = 0) across different σ. The performance margin between our method and Carlini et al. (2022) becomes even larger at large ϵ. These results further indicate that, although the diffusion model improves model robustness, leveraging the posterior data distribution conditioned on the input instance via the reverse process (as DensePure does), instead of using a single sample (as in Carlini et al., 2022), is the key to better robustness.
Additionally, we use off-the-shelf classifiers, which are ViT-based architectures trained on a larger dataset. In the later ablation study section, we select the CNN-based architecture Wide-ResNet trained from scratch on the standard dataset. Our method still achieves non-trivial robustness. Further, our experiments in Appendix D.7 show that removing the diffusion model from DensePure deteriorates the performance, which further verifies that our design is non-trivial.

5.2. ABLATION STUDY

Voting samples (K). We first show how K affects the certified accuracy. For efficiency, we select b = 10. We conduct experiments on both datasets. We show the certified accuracy at different radii r for σ = 0.25 in Figure 4. The results for σ = 0.5, 1.0 and CIFAR-10 are shown in Appendix D.4. Compared with the baseline (Carlini et al., 2022), we find that a larger majority vote number leads to better certified accuracy. This verifies that DensePure indeed benefits adversarial robustness and that a good approximation of the label of the high-density region requires a large number of voting samples. We find that our certified accuracy almost converges at K = 40. Thus, we set K = 40 for our experiments. The results with other σ show a similar tendency. To further improve time efficiency, we can use K-Consensus (Horváth et al., 2021). It accelerates the majority vote process by 45% ∼ 60% with a negligible performance drop. The experimental details and results are in Appendix D.8.

Fast sampling steps (b)

To investigate the role of b, we conduct additional experiments with b ∈ {2, 5} at σ = 0.25. The results on ImageNet are shown in Figure 4, and the results for σ = 0.5, 1.0 and CIFAR-10 are shown in Appendix D.5. With majority vote, we find that a larger b leads to better certified accuracy, since a larger b generates images of higher quality. Without majority vote, the results show the opposite: a larger b leads to lower certified accuracy, which contradicts our intuition. We conjecture that, although more sampling steps normally lead to better image recovery quality, they also bring more randomness, increasing the probability that the reversed image falls into a data region with the wrong label. These results further verify that majority vote is necessary for better performance.

6. RELATED WORK

Using an off-the-shelf generative model to purify adversarial perturbations has become an important direction in adversarial defense. Previous works have developed various purification methods based on different generative models, such as GANs (Samangouei et al., 2018), autoregressive generative models (Song et al., 2018), and energy-based models (Du & Mordatch, 2019; Grathwohl et al., 2020; Hill et al., 2021). More recently, as diffusion models (or score-based models) achieve better generation quality than other generative models (Ho et al., 2020; Dhariwal & Nichol, 2021), many works consider using diffusion models for adversarial purification (Nie et al., 2022; Wu et al., 2022; Sun et al., 2022). Although they have found good empirical results in defending against existing adversarial attacks (Nie et al., 2022), there is no provable guarantee of the robustness of such methods. On the other hand, certified defenses provide guarantees of robustness (Mirman et al., 2018; Cohen et al., 2019; Lecuyer et al., 2019; Salman et al., 2020; Horváth et al., 2021; Zhang et al., 2018; Raghunathan et al., 2018a; b; Salman et al., 2019b; Wang et al., 2021). They provide a lower bound on model accuracy under constrained perturbations. Among them, approaches Lecuyer et al. 2022) based on randomized smoothing (Cohen et al., 2019) show great scalability and achieve promising performance on large networks and datasets. The work most similar to ours is Carlini et al. (2022), which uses diffusion models combined with standard classifiers for certified defense. They treat the diffusion model as a black box, without a theoretical understanding of why and how diffusion models contribute to such nontrivial certified robustness.

7. CONCLUSION

In this work, we theoretically prove that the diffusion model can purify adversarial examples back to the corresponding clean sample with high probability, as long as the data density of the corresponding clean samples is high enough. Our theoretical analysis characterizes the conditional distribution of the reversed samples given the adversarial input, generated by the reverse process of the diffusion model. Using the highest-density point in the conditional distribution as the deterministic reversed sample, we identify the robust region of a given instance under the diffusion model's reverse process, which is potentially much larger than that of previous methods. Our analysis inspires us to propose an effective pipeline, DensePure, for adversarial robustness. We conduct comprehensive experiments to show the effectiveness of DensePure by evaluating the certified robustness via the randomized smoothing algorithm. Note that DensePure is an off-the-shelf pipeline that does not require training a smooth classifier. Our results show that DensePure achieves new SOTA certified robustness against $L_2$-norm bounded perturbations. We hope that our work sheds light on an in-depth understanding of the diffusion model for adversarial robustness.

Limitations. The time complexity of DensePure is high since it requires repeating the reverse process multiple times. In this paper, we use fast sampling to reduce the time complexity and show that the setting (b = 2 and K = 10) can achieve nontrivial certified accuracy. We leave more advanced fast sampling strategies as a future direction.

ETHICS STATEMENT

Our work can positively impact society by improving the robustness and security of AI systems. We have not involved human subjects or dataset releases; instead, we carefully follow the provided licenses of existing data and models for developing and evaluating our method.

APPENDIX

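Here is the appendix.

A NOTATIONS

B.1 ASSUMPTIONS

(i) The data distribution $p \in C^2$ and $\mathbb{E}_{x \sim p}\left[\|x\|_2^2\right] < \infty$.
(ii) $\forall t \in [0, T]$: $h(\cdot, t) \in C^1$, $\exists C > 0$, $\forall x \in \mathbb{R}^n, t \in [0, T]$: $\|h(x, t)\|_2 \le C(1 + \|x\|_2)$.
(iii) $\exists C > 0$, $\forall x, y \in \mathbb{R}^n$: $\|h(x, t) - h(y, t)\|_2 \le C\|x - y\|_2$.
(iv) $g \in C$ and $\forall t \in [0, T]$, $|g(t)| > 0$.
(v) $\forall t \in [0, T]$: $s_\theta(\cdot, t) \in C^1$, $\exists C > 0$, $\forall x \in \mathbb{R}^n, t \in [0, T]$: $\|s_\theta(x, t)\|_2 \le C(1 + \|x\|_2)$.
(vi) $\exists C > 0$, $\forall x, y \in \mathbb{R}^n$: $\|s_\theta(x, t) - s_\theta(y, t)\|_2 \le C\|x - y\|_2$.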

B.2 THEOREMS AND PROOFS

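Theorem 3.1. Under conditions B.1, solving the reverse-SDE starting from time $t$ and point $x_{a,t} = \sqrt{\alpha_t}\, x_a$ generates a reversed random variable $\hat{x}_0$ with conditional distribution

$$\mathbb{P}(\hat{x}_0 = x \mid \hat{x}_t = x_{a,t}) \propto p(x) \cdot \frac{1}{\sqrt{(2\pi\sigma_t^2)^n}} \exp\left(-\frac{\|x - x_a\|_2^2}{2\sigma_t^2}\right),$$

where $\sigma_t^2 = \frac{1 - \alpha_t}{\alpha_t}$ is the variance of the Gaussian noise added at timestamp $t$ in the diffusion process SDE.

Proof. Under the assumption, $\{x_t\}_{t \in [0,1]}$ and $\{\hat{x}_t\}_{t \in [0,1]}$ follow the same distribution, which means

$$\begin{aligned}
\mathbb{P}(\hat{x}_0 = x \mid \hat{x}_t = x_{a,t})
&= \frac{\mathbb{P}(\hat{x}_0 = x, \hat{x}_t = x_{a,t})}{\mathbb{P}(\hat{x}_t = x_{a,t})}
= \frac{\mathbb{P}(x_0 = x, x_t = x_{a,t})}{\mathbb{P}(x_t = x_{a,t})} \\
&= \mathbb{P}(x_0 = x)\, \frac{\mathbb{P}(x_t = x_{a,t} \mid x_0 = x)}{\mathbb{P}(x_t = x_{a,t})} \\
&\propto \mathbb{P}(x_0 = x) \cdot \frac{1}{\sqrt{(2\pi\sigma_t^2)^n}} \exp\left(-\frac{\|x - x_a\|_2^2}{2\sigma_t^2}\right) \\
&= p(x) \cdot \frac{1}{\sqrt{(2\pi\sigma_t^2)^n}} \exp\left(-\frac{\|x - x_a\|_2^2}{2\sigma_t^2}\right),
\end{aligned}$$

where the third equation is due to the chain rule of probability and the proportionality step is a result of the diffusion process: given $x_0 = x$, the noised sample $x_t$ is Gaussian with mean $\sqrt{\alpha_t}\, x$ and covariance $(1 - \alpha_t) I$, so at $x_t = \sqrt{\alpha_t}\, x_a$ the exponent is $-\alpha_t \|x - x_a\|_2^2 / (2(1 - \alpha_t)) = -\|x - x_a\|_2^2 / (2\sigma_t^2)$, and all factors that do not depend on $x$ are absorbed into the proportionality constant.

Theorem 3.3. Under conditions B.1 and classifier $f$, let $x_0$ be the sample with ground-truth label and $x_a$ be the adversarial sample. Then (i) the purified sample $\mathcal{P}(x_a; t)$ will have the ground-truth label if $x_a$ falls into the following convex set,

$$D_{\mathrm{sub}}(x_0; t) := \bigcap_{\{x_0' :\, f(x_0') \neq f(x_0)\}} \left\{ x_a : (x_a - x_0)^\top (x_0' - x_0) < \sigma_t^2 \log\frac{p(x_0)}{p(x_0')} + \frac{\|x_0' - x_0\|_2^2}{2} \right\},$$

and further, (ii) the purified sample $\mathcal{P}(x_a; t)$ will have the ground-truth label if and only if $x_a$ falls into the following set,

$$D(G(x_0); t) := \bigcup_{\tilde{x}_0 :\, f(\tilde{x}_0) = f(x_0)} D_{\mathrm{sub}}(\tilde{x}_0; t).$$

In other words, $D(G(x_0); t)$ is the robust region for data region $G(x_0)$ under $\mathcal{P}(\cdot; t)$ and $f$.

Proof. We start with part (i). The main idea is to prove that a point $x_0'$ such that $f(x_0') \neq f(x_0)$ has lower density than $x_0$ in the conditional distribution in Theorem 3.1, so that $\mathcal{P}(x_a; t)$ cannot be $x_0'$. In other words, we should have

$$\mathbb{P}(\hat{x}_0 = x_0 \mid \hat{x}_t = x_{a,t}) > \mathbb{P}(\hat{x}_0 = x_0' \mid \hat{x}_t = x_{a,t}).$$

By Theorem 3.1, this is equivalent to

$$\begin{aligned}
& p(x_0) \cdot \frac{1}{\sqrt{(2\pi\sigma_t^2)^n}} \exp\left(-\frac{\|x_0 - x_a\|_2^2}{2\sigma_t^2}\right) > p(x_0') \cdot \frac{1}{\sqrt{(2\pi\sigma_t^2)^n}} \exp\left(-\frac{\|x_0' - x_a\|_2^2}{2\sigma_t^2}\right) \\
\Leftrightarrow\ & \log\frac{p(x_0)}{p(x_0')} > \frac{1}{2\sigma_t^2}\left( \|x_0 - x_a\|_2^2 - \|x_0' - x_a\|_2^2 \right) \\
\Leftrightarrow\ & \log\frac{p(x_0)}{p(x_0')} > \frac{1}{2\sigma_t^2}\left( \|x_0 - x_a\|_2^2 - \|x_0' - x_0 + x_0 - x_a\|_2^2 \right) \\
\Leftrightarrow\ & \log\frac{p(x_0)}{p(x_0')} > \frac{1}{2\sigma_t^2}\left( 2(x_a - x_0)^\top (x_0' - x_0) - \|x_0' - x_0\|_2^2 \right).
\end{aligned}$$

Re-organizing the above inequality, we obtain

$$(x_a - x_0)^\top (x_0' - x_0) < \sigma_t^2 \log\frac{p(x_0)}{p(x_0')} + \frac{1}{2}\|x_0' - x_0\|_2^2.$$

Note that the order of $x_a$ is at most one in every term of the above inequality, so the inequality defines a half-space in $\mathbb{R}^n$ for every $(x_0, x_0')$ pair. Further, the inequality has to hold for every $x_0'$ such that $f(x_0') \neq f(x_0)$; therefore, by intersecting over all such half-spaces, we obtain a convex set $D_{\mathrm{sub}}(x_0; t)$.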
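The equivalence used in part (i) can be checked numerically. The following is a minimal sketch, assuming a toy data distribution supported on three hand-picked points in the plane with made-up density values; the function names, points, and densities are illustrative assumptions and are not taken from the paper. It compares, for random adversarial inputs, membership in the intersection of half-spaces against a direct comparison of the unnormalized conditional densities of Theorem 3.1.

```python
import math
import random


def posterior_log_weight(x, log_p, x_a, sigma2):
    """Log of p(x) * exp(-||x - x_a||^2 / (2 sigma_t^2)), up to a constant."""
    sq_dist = sum((xi - ai) ** 2 for xi, ai in zip(x, x_a))
    return log_p - sq_dist / (2.0 * sigma2)


def in_half_space(x_a, x0, log_p0, x0_prime, log_p0_prime, sigma2):
    """Half-space inequality of Theorem 3.3(i) for one competitor x0_prime."""
    lhs = sum((a - c) * (d - c) for a, c, d in zip(x_a, x0, x0_prime))
    sq = sum((d - c) ** 2 for c, d in zip(x0, x0_prime))
    return lhs < sigma2 * (log_p0 - log_p0_prime) + 0.5 * sq


# Hypothetical toy setup: one clean point and two points with other labels.
x0, log_p0 = (0.0, 0.0), math.log(0.6)
competitors = [((2.0, 0.0), math.log(0.3)), ((0.0, 3.0), math.log(0.1))]
sigma2 = 0.5  # sigma_t^2, chosen arbitrarily for the illustration

rng = random.Random(0)
agree = True
for _ in range(1000):
    x_a = (rng.uniform(-3.0, 4.0), rng.uniform(-3.0, 4.0))
    # Membership in D_sub(x0; t): all half-space inequalities hold.
    in_region = all(
        in_half_space(x_a, x0, log_p0, c, lp, sigma2) for c, lp in competitors
    )
    # Direct check: x0 has the largest conditional density among the points.
    w0 = posterior_log_weight(x0, log_p0, x_a, sigma2)
    beats_all = all(
        w0 > posterior_log_weight(c, lp, x_a, sigma2) for c, lp in competitors
    )
    agree = agree and (in_region == beats_all)
```

In this sketch `agree` stays true because the two tests are algebraically the same inequality; a larger `sigma2` or a larger density gap moves the half-space boundaries away from `x0`.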

Then we prove part (ii).

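On the one hand, if $x_a \in D(G(x_0); t)$, then there exists one $\tilde{x}_0$ such that $f(\tilde{x}_0) = f(x_0)$ and $x_a \in D_{\mathrm{sub}}(\tilde{x}_0; t)$. By part (i), $\tilde{x}_0$ has higher probability than all other points with different labels from $x_0$ in the conditional distribution $\mathbb{P}(\hat{x}_0 = x \mid \hat{x}_t = x_{a,t})$ characterized by Theorem 3.1. Therefore, $\mathcal{P}(x_a; t)$ should have the same label as $x_0$. On the other hand, if $x_a \notin D(G(x_0); t)$, then there is a point $\tilde{x}_1$ with different label from $x_0$ such that for any $\tilde{x}_0$ with the same label as $x_0$, $\mathbb{P}(\hat{x}_0 = \tilde{x}_1 \mid \hat{x}_t = x_{a,t}) > \mathbb{P}(\hat{x}_0 = \tilde{x}_0 \mid \hat{x}_t = x_{a,t})$. In other words, $\mathcal{P}(x_a; t)$ would have a different label from $x_0$.

Theorem 3.4. Under the score-based diffusion model (Song et al., 2021b) and conditions B.1, we have

$$D_{\mathrm{KL}}\left( \mathbb{P}(\hat{x}_0 = x \mid \hat{x}_t = x_{a,t}) \,\|\, \mathbb{P}(x_0^\theta = x \mid x_t^\theta = x_{a,t}) \right) = J_{\mathrm{SM}}(\theta, t; \lambda(\cdot)),$$

where $\{\hat{x}_\tau\}_{\tau \in [0,t]}$ and $\{x_\tau^\theta\}_{\tau \in [0,t]}$ are stochastic processes generated by the reverse-SDE and the score-based diffusion model respectively,

$$J_{\mathrm{SM}}(\theta, t; \lambda(\cdot)) := \frac{1}{2} \int_0^t \mathbb{E}_{p_\tau(x)}\left[ \lambda(\tau) \left\| \nabla_x \log p_\tau(x) - s_\theta(x, \tau) \right\|_2^2 \right] d\tau,$$

$s_\theta(x, \tau)$ is the score function that approximates $\nabla_x \log p_\tau(x)$, and $\lambda : \mathbb{R} \to \mathbb{R}$ is any weighting scheme used in training score-based diffusion models.

Proof. Similar to the proof of (Song et al., 2021a, Theorem 1), let $\mu_t$ and $\nu_t$ be the path measures for the reverse processes $\{\hat{x}_\tau\}_{\tau \in [0,t]}$ and $\{x_\tau^\theta\}_{\tau \in [0,t]}$ respectively, based on the scaled adversarial sample $x_{a,t}$. Under conditions B.1, the KL-divergence can be computed via the Girsanov theorem (Oksendal, 2013):

$$\begin{aligned}
& D_{\mathrm{KL}}\left( \mathbb{P}(\hat{x}_0 = x \mid \hat{x}_t = x_{a,t}) \,\|\, \mathbb{P}(x_0^\theta = x \mid x_t^\theta = x_{a,t}) \right) \\
&= -\mathbb{E}_{\mu_t}\left[ \log \frac{d\nu_t}{d\mu_t} \right] \\
&\overset{(i)}{=} \mathbb{E}_{\mu_t}\left[ \int_0^t g(\tau) \left( \nabla_x \log p_\tau(x) - s_\theta(x, \tau) \right) dw_\tau + \frac{1}{2} \int_0^t g(\tau)^2 \left\| \nabla_x \log p_\tau(x) - s_\theta(x, \tau) \right\|_2^2 d\tau \right] \\
&\overset{(ii)}{=} \mathbb{E}_{\mu_t}\left[ \frac{1}{2} \int_0^t g(\tau)^2 \left\| \nabla_x \log p_\tau(x) - s_\theta(x, \tau) \right\|_2^2 d\tau \right] \\
&= \frac{1}{2} \int_0^t \mathbb{E}_{p_\tau(x)}\left[ g(\tau)^2 \left\| \nabla_x \log p_\tau(x) - s_\theta(x, \tau) \right\|_2^2 \right] d\tau \\
&= J_{\mathrm{SM}}\left(\theta, t; g(\cdot)^2\right),
\end{aligned}$$

where (i) is due to the Girsanov theorem and (ii) is due to the martingale property of Itô integrals.

In the original schedule $S$, $S_j = j$ is the $j$-th element.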
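Within this context, we adapt the original $\alpha$ schedule $\alpha^S = \{\alpha_1, \cdots, \alpha_i, \cdots, \alpha_n\}$ used for single-step sampling to the new schedule $\alpha^{S^b} = \{\alpha_{S^b_1}, \cdots, \alpha_{S^b_j}, \cdots, \alpha_{S^b_b}\}$ (i.e., $\alpha^{S^b}_i = \alpha_{S^b_i} = \alpha_{S_{\lfloor n - \frac{in}{b} \rfloor}}$ is the $i$-th element in $\alpha^{S^b}$). We calculate the corresponding schedules $\beta^{S^b} = \{\beta^{S^b}_1, \beta^{S^b}_2, \cdots, \beta^{S^b}_i, \cdots, \beta^{S^b}_b\}$ and $\tilde{\beta}^{S^b} = \{\tilde{\beta}^{S^b}_1, \tilde{\beta}^{S^b}_2, \cdots, \tilde{\beta}^{S^b}_i, \cdots, \tilde{\beta}^{S^b}_b\}$, where

$$\beta_{S^b_i} = \beta^{S^b}_i = 1 - \frac{\alpha^{S^b}_i}{\alpha^{S^b}_{i-1}}, \qquad \tilde{\beta}_{S^b_i} = \tilde{\beta}^{S^b}_i = \frac{1 - \alpha^{S^b}_{i-1}}{1 - \alpha^{S^b}_i}\, \beta_{S^b_i}.$$

With these new schedules, we can use $b$ reverse steps to calculate

$$\hat{x}_0 = \mathrm{Reverse}(\cdots \mathrm{Reverse}(\mathrm{Reverse}(x_n; S^b_b); S^b_{b-1}); \cdots; 1).$$

Since $\Sigma_\theta(x_{S^b_i}, S^b_i)$ is parameterized as a range between $\beta^{S^b}$ and $\tilde{\beta}^{S^b}$, it is automatically rescaled. Thus, $\hat{x}_{S^b_{i-1}} = \mathrm{Reverse}(\hat{x}_{S^b_i}; S^b_i)$ is equivalent to sampling $x_{S^b_{i-1}}$ from $\mathcal{N}(x_{S^b_{i-1}}; \mu_\theta(x_{S^b_i}, S^b_i), \Sigma_\theta(x_{S^b_i}, S^b_i))$.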
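The respacing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it reads $\alpha$ as the cumulative noise-schedule product, treats the subsequence as $b$ values that start at $n$ and end at $1$ (one possible reading of the indexing), and follows the usual respacing convention of processing the kept timesteps in increasing order with a previous value of $1$ before the first kept step. The function names and the linear toy schedule are hypothetical.

```python
def fast_sampling_steps(n, b):
    """Subsequence with b timesteps: floor(n - j*n/b) for j = 0..b-2, then 1."""
    if not 2 <= b <= n:
        raise ValueError("need 2 <= b <= n")
    # (n*b - j*n) // b equals floor(n - j*n/b) in exact integer arithmetic.
    head = [(n * b - j * n) // b for j in range(b - 1)]
    return head + [1]


def respaced_betas(alphas_cumprod, steps):
    """Return (beta, beta_tilde) per kept timestep, both keyed by timestep.

    alphas_cumprod[t - 1] is the cumulative alpha at 1-indexed timestep t.
    """
    betas, betas_tilde = {}, {}
    prev_alpha = 1.0  # assumption: cumulative alpha before the first kept step
    for t in sorted(steps):
        alpha = alphas_cumprod[t - 1]
        beta = 1.0 - alpha / prev_alpha
        betas[t] = beta
        betas_tilde[t] = (1.0 - prev_alpha) / (1.0 - alpha) * beta
        prev_alpha = alpha
    return betas, betas_tilde


# Toy linear beta schedule with N = 1000 steps (an assumption for illustration).
N = 1000
alphas_cumprod, running = [], 1.0
for i in range(N):
    running *= 1.0 - (1e-4 + (0.02 - 1e-4) * i / (N - 1))
    alphas_cumprod.append(running)

steps = fast_sampling_steps(n=400, b=10)
betas, betas_tilde = respaced_betas(alphas_cumprod, steps)
```

A useful sanity check on any such respacing is that the product of $(1 - \beta)$ over the kept steps telescopes back to the cumulative $\alpha$ at timestep $n$, so the total amount of noise removed is unchanged while the number of network evaluations drops from $n$ to $b$.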

D.1 IMPLEMENTATION DETAILS

We select three different noise levels σ ∈ {0.25, 0.5, 1.0} for certification. The sampling numbers when computing the certified radius are n = 100,000 for CIFAR-10 and n = 10,000 for ImageNet. We evaluate the certified robustness on a 500-sample subset of the CIFAR-10 test set and a 100-sample subset of the ImageNet validation set. For the parameters of DensePure, we set K = 40 and b = 10 except for the results in the ablation study.

D.2 BASELINES

We select randomized-smoothing-based methods, including PixelDP (Lecuyer et al., 2019), RS (Cohen et al., 2019), SmoothAdv (Salman et al., 2019a), Consistency (Jeong & Shin, 2020), MACER (Zhai et al., 2020), Boosting (Horváth et al., 2021), SmoothMix (Jeong et al., 2021), Denoised (Salman et al., 2020), Lee (Lee, 2021), and Carlini (Carlini et al., 2022), as our baselines. We find that our method outperforms Carlini et al. (2022) for all σ among different classifiers.

D.7 EXPERIMENTS FOR RANDOMIZED SMOOTHING WITHOUT DIFFUSION MODEL

To show the effectiveness of our diffusion model design, we remove the diffusion model from our pipeline and conduct experiments. First, we remove the diffusion model and perform randomized smoothing only on the pretrained classifiers used in DensePure (i.e., ViT-B/16 for CIFAR-10 and BEiT for ImageNet). The results are shown in Table C and Table D. The number in brackets is the robust accuracy of the pretrained classifier minus the robust accuracy of DensePure. From the results, we conclude that without the help of diffusion models, neither ViT nor BEiT reaches high certified accuracy.

Second, we conduct additional experiments to fairly compare with randomized smoothing without diffusion models under majority vote settings. Specifically, we activate droppath in BEiT at the inference stage to support majority votes. The other settings are the same as in DensePure. The results are shown in Table E. The number in brackets is the robust accuracy of BEiT with majority votes minus the robust accuracy of DensePure. We find that simply performing majority votes on the BEiT classifier does not result in higher certified robustness.

Third, to compare with randomized smoothing without a diffusion model, we also evaluate certified accuracy with Gaussian-augmentation-trained ViT models on CIFAR-10. The results in Table F show that DensePure still achieves higher certified accuracy than randomized smoothing on Gaussian-augmented models without diffusion models. The numbers in brackets are the differences between the robust accuracy of Gaussian-augmentation randomized smoothing and that of DensePure.

D.8 EXPERIMENTS FOR K-CONSENSUS AGGREGATION

To improve the efficiency of our algorithm, we try K-consensus aggregation, where an early stop is triggered if the classification results of K consecutive reversed samples are the same. Here we calculate the certified robustness for 100 subsamples of CIFAR-10 and ImageNet with 2 sampling steps, a maximum of 10 majority votes, and consensus threshold k = 3. Results are shown in Table G and Table H. The "Avg MV" column in the tables gives the average number of majority votes actually required by our algorithm. For instance, if the predicted labels of the first 3 reversed samples are the same, the actual number of majority votes is 3. The numbers in brackets are the differences in certified accuracy with and without K-consensus aggregation.
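The early-stopping rule can be sketched as below. This is a minimal illustration, assuming that labels arrive one reversed sample at a time and that the returned prediction is the majority vote over the labels collected so far; the text does not specify which label is returned on an early stop, so that choice, like the function name `k_consensus_vote`, is an assumption.

```python
from collections import Counter


def k_consensus_vote(label_stream, max_votes=10, k=3):
    """Consume labels lazily; stop after k equal consecutive labels or max_votes.

    Returns (predicted_label, number_of_votes_used).
    """
    labels = []
    run = 0  # length of the current streak of identical labels
    for label in label_stream:
        run = run + 1 if labels and label == labels[-1] else 1
        labels.append(label)
        if run >= k or len(labels) >= max_votes:
            break
    if not labels:
        raise ValueError("label_stream produced no labels")
    # Assumption: report the majority label among the votes actually drawn.
    prediction = Counter(labels).most_common(1)[0][0]
    return prediction, len(labels)


# Usage with a generator, so later (expensive) samples are never computed
# once consensus is reached. The label values here are made up.
def fake_classifier_outputs():
    for label in [7, 7, 7, 2, 7, 7]:
        yield label


prediction, votes_used = k_consensus_vote(fake_classifier_outputs())
```

Because the stream is consumed lazily, each label can be produced by one reverse-process run followed by one classifier call, and the cost per input scales with the "Avg MV" count rather than with the maximum number of votes.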

NUMBERS

We also conduct additional experiments with 2 sampling steps and 5 majority votes. The results are shown in Table I. We find that our method still achieves better results than the existing method. We also increase the number of ImageNet test samples from 100 to 500 and report the updated results in Table J and Table K. We draw similar conclusions.



Figure 1: Pipeline of DensePure.

Figure 2: An illustration of the robust region D(x 0 ; t) =


Figure 3: Comparing our method vs. Carlini et al. (2022) on CIFAR-10 and ImageNet. The lines represent the certified accuracy under different $L_2$ perturbation bounds for Gaussian noise levels σ ∈ {0.25, 0.50, 1.00}.

We follow the experimental setting from Carlini et al. (2022). Specifically, for CIFAR-10, we use the 50-M unconditional improved diffusion model from Nichol & Dhariwal (2021) as the diffusion model. We select the ViT-B/16 model (Dosovitskiy et al., 2020) pretrained on ImageNet-21k and finetuned on CIFAR-10 as the classifier, which achieves 97.9% accuracy on CIFAR-10. For ImageNet, we use the unconditional 256×256 guided diffusion model from Dhariwal & Nichol (2021) as the diffusion model and the pretrained BEiT large model (Bao et al., 2021) trained on ImageNet-21k as the classifier, which achieves 88.6% top-1 accuracy on the validation set of ImageNet-1k. We select three different noise levels σ ∈ {0.25, 0.5, 1.0} for certification. For the parameters of DensePure, we set K = 40 and b = 10 except for the results in the ablation study. The details about the baselines are in the appendix.
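Each smoothing noise level σ has to be tied to a diffusion timestep; Algorithm 2 in Appendix C.1 does this by picking the timestep $n$ whose $\alpha_n$ is closest to $1/(1+\sigma^2)$. The sketch below illustrates that matching rule only. It assumes a toy linear beta schedule with 1000 steps and reads $\alpha$ as the cumulative product of $(1 - \beta)$; the schedules of the diffusion models named above may differ, and the function names are hypothetical.

```python
def linear_alphas_cumprod(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta) for a linear beta schedule (toy choice)."""
    values, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta
        values.append(running)
    return values


def match_timestep(sigma, alphas_cumprod):
    """1-indexed timestep n minimizing |alpha_n - 1 / (1 + sigma^2)|."""
    target = 1.0 / (1.0 + sigma ** 2)
    best_index = min(
        range(len(alphas_cumprod)),
        key=lambda s: abs(alphas_cumprod[s] - target),
    )
    return best_index + 1


alphas = linear_alphas_cumprod()
timesteps = {sigma: match_timestep(sigma, alphas) for sigma in (0.25, 0.5, 1.0)}
```

Larger σ maps to a later timestep because the cumulative $\alpha$ decreases monotonically along the schedule, so stronger smoothing noise means a longer reverse process unless fast sampling is used.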

Figure 4: Ablation study on ImageNet. The left image shows the certified accuracy among different vote numbers with different radius ϵ ∈ {0.0, 0.25, 0.5, 0.75}. The right image shows the certified accuracy with different fast sampling steps b.

accuracy of our method among different classifiers. BEiT and ViT are pre-trained on a larger dataset, ImageNet-22k, and fine-tuned on ImageNet-1k and CIFAR-10, respectively. Wide-ResNet is trained on ImageNet-1k for ImageNet and trained on CIFAR-10 from scratch for CIFAR-10.

Different architectures. One advantage of DensePure is that it uses an off-the-shelf classifier, so it can plug in any classifier. We choose convolutional neural network (CNN)-based architectures: Wide-ResNet28-10 (Zagoruyko & Komodakis, 2016) for CIFAR-10 with 95.1% accuracy and Wide-ResNet50-2 for ImageNet with 81.5% top-1 accuracy, at σ = 0.25. The results are shown in Table 2 and Figure E in Appendix D.6. Results for more model architectures and σ on ImageNet are also shown in Appendix D.6. We show that our method can enhance the certified robustness of any given classifier trained on the original data distribution. Notably, although the performance of the CNN-based classifier is lower than that of the Transformer-based classifier, DensePure with a CNN-based model as the classifier can outperform Carlini et al. (2022) with a ViT-based model as the classifier (except ϵ = 0 for CIFAR-10).


C MORE DETAILS ABOUT DENSEPURE

C.1 PSEUDO-CODE

We provide the pseudo-code of DensePure in Algorithm 1 and Algorithm 2.

Algorithm 1 DensePure pseudo-code with the highest density point
1: Initialization: choose off-the-shelf diffusion model and classifier $f$, choose $\psi = t$
2: Input sample $x_a = x_0 + \epsilon_a$
3: Compute $\hat{x}_0 = \mathcal{P}(x_a; \psi)$
4: $\hat{y} = f(\hat{x}_0)$

Algorithm 2 DensePure pseudo-code with majority vote
1: Initialization: choose off-the-shelf diffusion model and classifier $f$, choose $\sigma$
2: Compute $\alpha_n = \frac{1}{1+\sigma^2}$, $n = \arg\min_s \left\{ \left| \alpha_s - \frac{1}{1+\sigma^2} \right| \;:\; s \in \{1, 2, \cdots, N\} \right\}$
3: Generate input sample $x_{rs} = x_0 + \epsilon$, $\epsilon \sim \mathcal{N}(0, \sigma^2 I)$
4: Choose schedule $S^b$, get $\hat{x}_0^i \leftarrow \mathrm{rev}(\sqrt{\alpha_n}\, x_{rs})_i$, $i = 1, 2, \ldots, K$ with Fast Sampling
5: $\hat{y} = \mathrm{MV}(\{f(\hat{x}_0^1), \ldots, f(\hat{x}_0^K)\}) = \arg\max_c \sum_{i=1}^{K} \mathbb{1}\{f(\hat{x}_0^i) = c\}$

C.2 DETAILS ABOUT FAST SAMPLING

Applying the single-step operation $n$ times is a time-consuming process. In order to reduce the time complexity, we follow the method used in Nichol & Dhariwal (2021) and sample a subsequence $S^b$ with $b$ values (i.e., $S^b = \{n, \lfloor n - \frac{n}{b} \rfloor, \cdots, 1\}$, where $S^b_j$ is the $j$-th element in $S^b$, $S^b_j = \lfloor n - \frac{jn}{b} \rfloor$ for all $j < b$, and $S^b_b = 1$) from the original schedule $S$ (i.e., $S = \{n, n-1, \cdots, 1\}$).

Figure A: Certified accuracy among different vote numbers with different radius. Each line in the figure represents the certified accuracy among different vote numbers K with Gaussian noise σ = 0.50.

Figure B: Certified accuracy among different vote numbers with different radius. Each line in the figure represents the certified accuracy among different numbers K with Gaussian noise σ = 1.00.

Figure D: Certified accuracy with different fast sampling steps b. Each line in the figure shows the certified accuracy among different L 2 adversarial perturbation bound with Gaussian σ = 1.00.



Certified accuracy compared with Carlini et al. (2022) for CIFAR-10 at all σ. The numbers in brackets are the differences in certified accuracy between the two methods. Our diffusion model and classifier are the same as in Carlini et al. (2022).

Among the baselines, PixelDP, RS, SmoothAdv, Consistency, MACER, and SmoothMix require training a smooth classifier for better certification performance, while the others do not. Salman et al. and Lee use an off-the-shelf classifier but do not use a diffusion model. The most similar to ours is Carlini et al., which also uses both an off-the-shelf diffusion model and an off-the-shelf classifier. The above two settings mainly follow Carlini et al. (2022), which makes it easier to compare with their results.

D.3 MAIN RESULTS FOR CERTIFIED ACCURACY

We compare with Carlini et al. (2022) in a more fine-grained way. We provide results of certified accuracy at different ϵ in Table A for CIFAR-10 and Table B for ImageNet. We include the accuracy difference between ours and Carlini et al. (2022) in brackets in the tables. We can observe from the tables that the certified accuracy of our method outperforms Carlini et al.

8. ACKNOWLEDGMENT

We thank the support of NSF grant No.1910100, NSF CNS 2046726, C3 AI and DHS under grant No. 17STQAC00001-06-00, DARPA under grant N66001-15-C-4066, the Center for Long-Term Cybersecurity, and Berkeley Deep Drive. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors, and do not necessarily reflect the views of the sponsors.

REPRODUCIBILITY STATEMENT

For theoretical analysis, all necessary assumptions are listed in B.1 and the complete proofs are included in B.2. The experimental setting and datasets are provided in section 5. The pseudo-code for DensePure is in C.1 and the fast sampling procedures are provided in C.2.

annex

| ✗ | (77.8) 68.8 | (75.8) 58.1 | (72.9) 48.5 | (52.3) 37.8 | (55.0) 50.0 | (55.0) 44.0 | (55.0) 34.0 | (41.0) 24.0 | (41.0) 17.0
MACER (Zhai et al., 2020) | ✗ | (81.0) 71.0 | (81.0) 59.0 | (66.0) 46.0 | (66.0) 38.0 | (68.0) 57.0 | (64.0) 43.0 | (64.0) 31.0 | (48.0) 25.0 | (48.0) 14.0
Boosting (Horváth et al., 2021) | ✗ | (83.4) 70.6 | (76.8) 60.4 | (71.6) 52.4 | (73.0) 38.8 | (65.6) 57.0 | (57.0) 44.6 | (57.0) 38.4 | (44.6) 28.6 | (38.6) 21.2
SmoothMix (Jeong et al., 2021) | ✓ | (77.1) 67.9 | (77.1) 57.9 | (74.2) 47.7 | (61.8) 37.2 | (55.0) 50.0
| ✓ | (88.0) 73.8 | (88.0) 56.2 | (88.0) 41.6 | (74.2) 31.0 | (82.0) 74.0 | (77.2) 59.8 | (77.2) 47.0 | (64.6) 31.0 | (64.6) 19.0
Ours | ✓ | (87.6) 76.6 | (87.6) 64.6 | (87.6) 50.4 | (73.6) 37.4 | (84.0) 77.8 | (80.2) 67.0 | (80.2) 54.6 | (67.8) 42.2 | (67.8) 25.8

Table J: Certified accuracy compared with existing works. The certified accuracy at ϵ = 0 for each model is in the parentheses. The certified accuracy for each cell is from the respective papers except Carlini et al. (2022). Our diffusion model and classifier are the same as Carlini et al. (2022), where the off-the-shelf classifier uses ViT-based architectures trained on a large dataset (ImageNet-22k).

