EFFICIENT RANDOMIZED SMOOTHING BY DENOISING WITH LEARNED SCORE FUNCTION

Abstract

Randomized smoothing with various noise distributions is a promising approach to protect classifiers from ℓ_p adversarial attacks. However, it requires an ensemble of classifiers trained with different noise types and magnitudes, which is computationally expensive. In this work, we present an efficient method for randomized smoothing that does not require any re-training of classifiers. We build upon denoised smoothing, which prepends a denoiser to a pre-trained classifier. We investigate two approaches to the image denoising problem for randomized smoothing and show that using the score function is suitable for both. Moreover, we present an efficient denoising algorithm that scales to randomized smoothing and can be applied regardless of noise type or level. To validate, we demonstrate the effectiveness of our methods through extensive experiments on CIFAR-10 and ImageNet under various ℓ_p adversaries.

1. INTRODUCTION

Deep image classifiers are susceptible to deliberately crafted perturbations known as adversarial attacks (Szegedy et al., 2013; Goodfellow et al., 2014; Carlini & Wagner, 2017). Even though many works proposed heuristics that can annul or mitigate adversarial attacks, most of them were broken by stronger attacks (Athalye et al., 2018; Athalye & Carlini, 2018). The vulnerability of empirical defenses has led researchers to scrutinize certified defenses, which guarantee that a model's output is constant within an allowed set around a given input. Unfortunately, many provable defenses are not feasible for large-scale neural networks because of their constraints on the architecture. On the other hand, randomized smoothing is a practical method that does not restrict the choice of neural network. Randomized smoothing converts any base classifier into a smoothed classifier by making predictions over randomly perturbed samples. The smoothed classifier is then guaranteed to have an ℓ_p certified radius, which is theoretically derived from the noise distribution used for smoothing. Since Cohen et al. (2019) derived a tight ℓ_2 certified radius for Gaussian randomized smoothing, subsequent works studied certification bounds for various distributions (Teng et al., 2020; Yang et al., 2020). Because base classifiers are required to predict on randomly perturbed samples, naturally trained classifiers are not sufficient for randomized smoothing. Therefore, many works proposed training an ensemble of base classifiers tailored for randomized smoothing. However, since each trained classifier only applies to a specific noise distribution and level, it is expensive to protect against various ℓ_p adversaries and robustness strengths. In this work, we tackle the inefficiency of training a random ensemble of base classifiers by prepending one universal image denoiser to the pre-trained classifier. The idea of using a denoiser for randomized smoothing was first introduced by Salman et al. (2020) and is referred to as denoised smoothing.
One step further, we study the general image denoising problem for randomized smoothing with two different approaches: 1) directly training an image denoiser, and 2) solving an optimization problem that uses a generative model to project onto the learned data manifold. Then, we show that the score function, which is the gradient of the log-density, is crucial for both approaches. We exploit multi-scale denoising score matching (Song & Ermon, 2019) for score estimation, and propose an efficient simulated annealing algorithm for image denoising. Remark that we only require one score network to certify various noise distributions and levels. We provide experiments on the ImageNet and CIFAR-10 datasets to show the efficacy of our methods. Specifically, our denoisers perform better than the original denoised smoothing, while being applicable to various noise types without any re-training. Furthermore, we compare with the random-ensemble based method, which we refer to as white-box smoothing, and show that our method is comparable to it. In sum, we list our contributions:

• We propose novel score-based image denoisers for randomized smoothing.

• We improve denoised smoothing, which was originally proposed by Salman et al. (2020), and generalize it to other distributions without training any neural networks.

2. RANDOMIZED SMOOTHING AND DENOISED SMOOTHING

2.1 BACKGROUNDS ON RANDOMIZED SMOOTHING

Let f : R^d → Y be a classifier and q be a distribution on R^d. Then randomized smoothing with q is a method that converts the base classifier f to the associated smoothed classifier g, where g(x) returns the class which is most likely to be predicted by the base classifier f when x is perturbed by a random noise sampled from q, i.e.,

g(x) = arg max_{c ∈ Y} Pr_{u ∼ q(u)}[f(x + u) = c].    (1)

The noise distribution is usually a symmetric log-concave distribution, i.e., q(u) = exp(−φ(u)) for some even and convex φ.
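In practice, the arg max over Y is approximated by Monte Carlo sampling: classify many randomly perturbed copies of the input and take the majority vote. A minimal sketch for the Gaussian case, where the base classifier `f`, the input `x`, and the sample count are toy placeholders rather than the paper's actual implementation:

```python
import numpy as np

def smoothed_predict(f, x, sigma=0.25, n=1000, seed=0):
    """Monte Carlo estimate of g(x) for Gaussian smoothing:
    classify n perturbed copies of x and return the majority vote.
    A full certification pipeline would also bound the top-class
    probabilities statistically; this sketch only estimates the arg max."""
    rng = np.random.default_rng(seed)
    counts = {}
    for _ in range(n):
        u = rng.normal(0.0, sigma, size=x.shape)  # u ~ N(0, sigma^2 I)
        c = f(x + u)
        counts[c] = counts.get(c, 0) + 1
    return max(counts, key=counts.get)

# Toy base classifier: thresholds the mean pixel value.
f = lambda z: int(z.mean() > 0.5)
x = np.full((3, 8, 8), 0.7)  # toy "image" with mean intensity 0.7
print(smoothed_predict(f, x))  # majority vote: class 1
```

Averaging over 192 pixels makes the perturbed mean concentrate tightly around 0.7, so the vote is nearly unanimous here; for real networks the vote distribution is what determines the certificate.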
Note that to control the robustness/accuracy tradeoff, we embed the noise level λ into q, i.e., q_λ(u) = exp(−φ(u/λ)). We mix the notations q and q_λ throughout the paper.

Robustness guarantee for smoothed classifiers. Suppose an adversary can perturb the input x inside an allowed set B, which is usually an ℓ_p ball centered at x. For the case when B is an ℓ_2 ball and q is the Gaussian distribution N(0, σ²I), g(x) is robust within the radius

R = (σ/2) (Φ⁻¹(p_1) − Φ⁻¹(p_2)),    (2)

where Φ⁻¹ is the inverse of the standard Gaussian cumulative distribution function, p_1 = max_c Pr[f(x + u) = c], and p_2 = max_{c ≠ g(x)} Pr[f(x + u) = c]. Cohen et al. (2019) first derived this certified radius by using the Neyman-Pearson lemma, and later Salman et al. (2019a) showed an alternative derivation using the Lipschitz property of the smoothed classifier. Furthermore, when q is a centered Laplace distribution, the robustness certificate for an ℓ_1 radius was derived by Teng et al. (2020). Later, the proof methods were generalized to various distributions (which may not be log-concave) that can certify various ℓ_p radii (Yang et al., 2020). Remark that the robustness guarantee depends on the noise distribution q_λ and on the performance of the base classifier f under random perturbation with q_λ.

2.2 DENOISED SMOOTHING VIA IMAGE DENOISING

Even though randomized smoothing can convert any classifier into a provably robust classifier, the smoothed classifiers obtained from naturally trained classifiers are below the standard, as such classifiers are not capable of predicting randomly perturbed samples. Many previous studies focused on training classifiers accustomed to randomized smoothing, which spans from noisy data augmentation (Cohen et al., 2019; Li et al., 2019) to its variants such as adversarial training (Salman et al., 2019a) or stability training (Lee et al., 2019; Zhai et al., 2019). However, such methods are computationally expensive and require a massive number of classifiers per noise type and level.

The idea of prepending a denoiser to the classifier was first introduced by Salman et al. (2020). By training a denoiser D_θ : R^d → R^d, the smoothed classifier converted from f ∘ D_θ outperforms the 'no-denoiser' baseline. They proposed training denoisers with a mean squared error (MSE) loss or a classification (CLF) loss, or combining both. Formally, they are

L_MSE(θ) = E_{x∼p, u∼q}[ ||D_θ(x + u) − x||² ],    (3)

L_CLF(θ) = E_{x∼p, u∼q}[ L_CE(F(D_θ(x + u)), f(x)) ],    (4)

where L_CE is the cross-entropy loss and F is the soft version of the hard classifier f. They showed that training with the CLF loss performs better than training a denoiser with only the MSE loss. Alternatively, Saremi & Srivastava (2020) trained a neural empirical Bayes estimator that can refine white noise. Nonetheless, those methods still suffer from the expensive training of numerous denoisers with respect to each noise type and level.
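The two denoiser training objectives can be written down concretely. The sketch below is a toy single-sample NumPy rendering of the MSE and CLF losses, with `denoiser` and the soft classifier `F` as placeholder callables; none of these names come from Salman et al.'s code:

```python
import numpy as np

def mse_loss(denoiser, x, u):
    """MSE objective for one sample: squared reconstruction error
    of the denoised noisy input against the clean input."""
    return np.mean((denoiser(x + u) - x) ** 2)

def clf_loss(denoiser, F, x, u):
    """CLF objective for one sample: cross-entropy between the soft
    prediction F on the denoised input and the hard pseudo-label
    f(x) = argmax F(x) obtained on the clean input."""
    probs = F(denoiser(x + u))          # soft prediction on denoised input
    hard_label = int(np.argmax(F(x)))   # pseudo-label from clean input
    return -np.log(probs[hard_label] + 1e-12)

# Toy check: an identity "denoiser" leaves the noise untouched,
# so the MSE loss reduces to the mean squared noise, mean(u**2).
identity = lambda z: z
x = np.array([0.2, 0.8])
u = np.array([0.1, -0.1])
print(mse_loss(identity, x, u))
```

In the actual method both expectations are estimated over minibatches of images and freshly sampled noise, and the two losses can be combined with a weighting term.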


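For reference, the Gaussian ℓ_2 certified radius R = (σ/2)(Φ⁻¹(p_1) − Φ⁻¹(p_2)) from Section 2.1 is a one-line computation once the class probabilities p_1 and p_2 are known; a minimal sketch using only the Python standard library, with illustrative probability values that are not from the paper's experiments:

```python
from statistics import NormalDist

def certified_radius(p1, p2, sigma):
    """l2 certified radius for Gaussian randomized smoothing
    (Cohen et al., 2019): R = (sigma/2) * (Phi^-1(p1) - Phi^-1(p2)),
    where Phi^-1 is the standard Gaussian inverse CDF."""
    Phi_inv = NormalDist().inv_cdf
    return 0.5 * sigma * (Phi_inv(p1) - Phi_inv(p2))

# Illustrative values: a very confident smoothed classifier
# (p1 = 0.99, p2 = 0.01) with noise level sigma = 0.5.
print(certified_radius(0.99, 0.01, 0.5))  # ≈ 1.163
```

In practice p_1 must be lower-bounded and p_2 upper-bounded from Monte Carlo samples (e.g., via binomial confidence intervals); plugging in raw empirical frequencies does not yield a valid certificate.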