EFFICIENT RANDOMIZED SMOOTHING BY DENOISING WITH LEARNED SCORE FUNCTION

Abstract

Randomized smoothing with various noise distributions is a promising approach to protect classifiers from ℓp adversarial attacks. However, it requires an ensemble of classifiers trained with different noise types and magnitudes, which is computationally expensive. In this work, we present an efficient method for randomized smoothing that does not require any re-training of classifiers. We build upon denoised smoothing, which prepends a denoiser to a pre-trained classifier. We investigate two approaches to the image denoising problem for randomized smoothing and show that the score function is suitable for both. Moreover, we present an efficient algorithm that scales to randomized smoothing and can be applied regardless of noise type or level. We demonstrate the effectiveness of our methods through extensive experiments on CIFAR-10 and ImageNet, under various ℓp adversaries.

1. INTRODUCTION

Deep image classifiers are susceptible to deliberately crafted perturbations known as adversarial attacks (Szegedy et al., 2013; Goodfellow et al., 2014; Carlini & Wagner, 2017). Even though many works proposed heuristics that can annul or mitigate adversarial attacks, most of them were broken by stronger attacks (Athalye et al., 2018; Athalye & Carlini, 2018). The vulnerability of empirical defenses has led researchers to scrutinize certified defenses, which guarantee that a model's output is constant within an allowed set around a given input. Unfortunately, many provable defenses are not feasible for large-scale neural networks because of their constraints on the architecture. On the other hand, randomized smoothing is a practical method that does not restrain the choice of neural network. Randomized smoothing converts any base classifier into a smoothed classifier by making predictions over randomly perturbed samples. The smoothed classifier is then guaranteed to have an ℓp certified radius, which is theoretically derived from the noise type used for smoothing. Since Cohen et al. (2019) derived a tight ℓ2 certified radius for Gaussian randomized smoothing, subsequent works studied certification bounds for various distributions (Teng et al., 2020; Yang et al., 2020). As base classifiers are required to predict on randomly perturbed samples, naturally trained classifiers are not sufficient for randomized smoothing. Therefore, many works proposed training an ensemble of base classifiers tailored to randomized smoothing. However, since each trained classifier only applies to a specific noise distribution and level, it is expensive to protect against various ℓp adversaries and robustness strengths. In this work, we tackle the inefficiency of training a random-ensemble of base classifiers by prepending one universal image denoiser to the pre-trained classifier. The idea of using a denoiser for randomized smoothing was first introduced by Salman et al.
(2020) and is referred to as denoised smoothing. Going one step further, we study the general image denoising problem for randomized smoothing with two different approaches: 1) direct training of an image denoiser, and 2) solving an optimization problem that uses a generative model to project onto the learned data manifold. We then show that the score function, which is the gradient of the log-density, is crucial for both approaches. We exploit multi-scale denoising score matching (Song & Ermon, 2019) for score estimation, and propose an efficient simulated-annealing algorithm for image denoising. Remark that we require only one score network to certify various noise distributions and levels. We provide experiments on the ImageNet and CIFAR-10 datasets to show the efficacy of our methods. Specifically, our denoisers perform better than the original denoised smoothing, while being applicable to various noise types without any re-training. Further-
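The smoothing procedure described above can be illustrated with a minimal sketch of Gaussian randomized smoothing: the smoothed classifier takes a majority vote of the base classifier over noisy copies of the input, and Cohen et al. (2019) show the vote winner is certifiably robust in ℓ2 with radius R = (σ/2)(Φ⁻¹(pA) − Φ⁻¹(pB)). The names below (`smoothed_predict`, `base_classifier`, the toy classifier) are illustrative, not from the paper.

```python
import numpy as np

def smoothed_predict(base_classifier, x, sigma=0.25, n_samples=100, seed=0):
    """Majority vote of the base classifier over Gaussian-perturbed copies of x.

    The resulting smoothed classifier is certifiably robust in l2 with radius
    R = (sigma / 2) * (Phi^{-1}(pA) - Phi^{-1}(pB)), where pA and pB bound the
    probabilities of the top two classes under the same Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    counts = {}
    for _ in range(n_samples):
        noisy = x + sigma * rng.standard_normal(x.shape)  # x + N(0, sigma^2 I)
        label = base_classifier(noisy)
        counts[label] = counts.get(label, 0) + 1
    return max(counts, key=counts.get)  # most frequent predicted class

# Toy base classifier: thresholds the mean pixel value.
clf = lambda img: int(img.mean() > 0.5)
x = np.full((3, 4, 4), 0.9)       # input far from the decision boundary
print(smoothed_predict(clf, x))   # majority vote recovers class 1
```

In practice the vote counts also feed a hypothesis test that either certifies a radius or abstains; this sketch keeps only the prediction step.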
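The role of the score function in denoising can likewise be sketched with Tweedie's formula: given a score estimate s(x) ≈ ∇x log p_σ(x) of the Gaussian-smoothed density, a one-step denoiser is x̂ = x + σ²·s(x). The sketch below is an assumption-laden toy, using the exact closed-form score of a Gaussian rather than a learned score network.

```python
import numpy as np

def score_denoise(x_noisy, score_fn, sigma):
    """One-step denoising via Tweedie's formula:
    x_hat = x_noisy + sigma**2 * score_fn(x_noisy),
    where score_fn approximates grad_x log p_sigma(x)."""
    return x_noisy + sigma ** 2 * score_fn(x_noisy)

# Toy case: data concentrated at mu, so the sigma-smoothed density is
# N(mu, sigma^2 I) and its exact score is (mu - x) / sigma**2.
mu = np.ones(8)
sigma = 0.5
exact_score = lambda x: (mu - x) / sigma ** 2

rng = np.random.default_rng(0)
x_noisy = mu + sigma * rng.standard_normal(8)   # Gaussian-perturbed sample
x_hat = score_denoise(x_noisy, exact_score, sigma)

# With the exact score, Tweedie's formula recovers the clean point exactly.
print(np.allclose(x_hat, mu))  # True
```

A denoised-smoothing pipeline then simply feeds `score_denoise(x + noise, ...)` into the pre-trained classifier; with a learned score network the recovery is approximate rather than exact.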

