A DISTRIBUTIONAL ROBUSTNESS CERTIFICATE BY RANDOMIZED SMOOTHING

Abstract

The robustness of deep neural networks against adversarial example attacks has received much attention recently. In this work we focus on the certified robustness of smoothed classifiers, and propose to use the worst-case population loss over noisy inputs as a robustness metric. Under this metric, we derive a tractable upper bound that serves as a robustness certificate by exploiting duality. To improve robustness, we further propose a noisy adversarial learning procedure that minimizes this upper bound following the robust optimization framework. The smoothness of the loss function makes the problem easy to optimize even for non-smooth neural networks. We show how our robustness certificate compares with others and how it improves over previous works. Experiments on a variety of datasets and models verify that, in terms of empirical accuracy, our approach exceeds state-of-the-art certified and heuristic methods for defending against adversarial examples.

1. INTRODUCTION

Deep neural networks (DNNs) are known to be vulnerable to adversarial example attacks: by feeding a DNN slightly perturbed inputs, an attacker can alter the prediction output. Such attacks can be fatal in safety-critical systems such as autonomous vehicles or automated tumor diagnosis. A DNN is robust when it resists such attacks: as long as the magnitude of the perturbation is not too large (usually imperceptible to humans), the model produces the expected output regardless of the specific perturbation. Various approaches have been proposed for improving the robustness of DNNs, with or without a performance guarantee. Although a number of approaches have been proposed for certified robustness, it remains unclear how robustness should be defined. For example, Cohen et al. (2019); Pinot et al. (2019); Li et al. (2019); Lecuyer et al. (2019) propose smoothed classifiers to ensure that adversarially perturbed inputs are classified into the same class as the unperturbed ones. However, since randomized noise is injected into both inputs, there is no guarantee that either input is classified into the correct class: the adversarially perturbed input may receive the same label as the original input, yet that label may itself be wrong. In this case, the robustness guarantee is vacuous. Further, the robustness guarantee is provided at the instance level, i.e., within a certain perturbation range, modifying an input instance cannot change the prediction output. But a DNN is a statistical model that should be evaluated on the input distribution rather than on a single instance. Instead of counting the number of input instances meeting the robustness definition, it is preferable to evaluate the robustness of a DNN over the input distribution.
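The smoothed-classifier construction discussed above can be illustrated with a minimal Monte Carlo sketch. Here a toy linear classifier stands in for the base DNN f, and the function names and constants are ours (illustrative, not from any cited work): the smoothed prediction is the majority vote of f over Gaussian perturbations of the input.

```python
import numpy as np

rng = np.random.default_rng(1)

def base_classifier(x):
    # Toy base classifier f: two linear logits (stand-in for a DNN).
    W = np.array([[1.0, -1.0], [-0.5, 2.0]])
    return int(np.argmax(W @ x))

def smoothed_predict(x, sigma=0.25, m=1000):
    """Monte Carlo estimate of the smoothed classifier
       g(x) = argmax_c P_z[f(x + z) = c],  with z ~ N(0, sigma^2 I)."""
    votes = np.zeros(2, dtype=int)
    for _ in range(m):
        votes[base_classifier(x + rng.normal(0.0, sigma, size=x.shape))] += 1
    return int(np.argmax(votes)), votes / m

x = np.array([1.0, 0.2])
label, probs = smoothed_predict(x)
```

Because the vote fractions vary smoothly with x, small shifts of the input cannot flip the majority class when the vote margin is large — which is exactly the property the instance-level certificates exploit.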
We introduce the distributional risk as a DNN robustness metric, and propose a noisy adversarial learning (NAL) procedure based on distributionally robust optimization, which provides a provable guarantee. Consider a base classifier f that maps an instance x₀ to its label y. It has been shown that, when fed a perturbed instance x (within an ℓ₂ ball centered at x₀), the smoothed classifier g(x) = E_z[f(x + z)] with z ∼ N(0, σ²I) provably returns the same label as g(x₀). However, such a robustness guarantee does not ensure that g(x₀) is correctly classified as y, resulting in unsatisfying performance in practice. Instead, we evaluate robustness as the worst-case loss over the distribution of noisy inputs. For simplicity, we jointly express the input instance and its label as x₀ ∼ P₀, where P₀ is the distribution of the original input. Using ℓ(·) as the loss function, we evaluate DNNs by the worst-case distributional risk sup_S E_S[ℓ(θ; s)], where the classifier is parameterized by θ ∈ Θ, and s = x + z ∼ S with S a distribution within a certain distance from P₀. We prove that this loss is upper bounded by a data-dependent certificate, which can be optimized by the noisy adversarial training procedure:

    minimize_{θ∈Θ} sup_S E_S[ℓ(θ; s)].    (1)

Compared to previous robustness certificates via smoothed classifiers, our method provides a provable guarantee w.r.t. the ground-truth input distribution. Letting the optimized θ parameterize g(·) and f(·) respectively, we further show that the smoothed classifier g(·) yields a tighter robustness certificate than f(·), due to a tighter bound on the worst-case loss. The key is that, for mild perturbations, we adopt a Lagrangian relaxation of the usual loss ℓ(θ; x + z) as the robust surrogate; the surrogate is strongly concave in x and hence easy to optimize. Our approach enjoys a convergence guarantee similar to that of Sinha et al. (2018), but unlike Sinha et al. (2018), our approach does not require ℓ to be smooth and can thus be applied to arbitrary neural networks. The advantage of the smoothed classifier also lies in a tighter robustness certificate than that of the base classifier. The intuition is that, in the inner maximization step, instead of seeking a single direction that maximizes the loss, our approach performs gradient ascent along the direction that maximizes the total loss of examples sampled from the neighborhood of the original input. The noisy adversarial training procedure thus produces smoothed classifiers that are robust, with a certified bound, against the neighborhood of the worst-case adversarial examples. Highlights of our contribution are as follows. First, we review the drawbacks of previous definitions of robustness, and propose to evaluate robustness by the worst-case loss over the input distribution. Second, we derive a data-dependent upper bound for the worst-case loss, constituting a robustness certificate. Third, by minimizing the robustness certificate in the training loop, we propose noisy adversarial learning for enhancing model robustness, in which the smoothness property entails the computational tractability of the certificate. Through both theoretical analysis and experimental results, we verify that our certified DNNs enjoy better accuracy than the state of the art in defending against adversarial example attacks.
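The inner maximization step described above — gradient ascent on a Lagrangian-relaxed loss averaged over noise samples from the neighborhood of the input — can be sketched on a toy problem. We substitute a logistic loss of a linear model for the DNN loss ℓ; the penalty weight gamma, step size, and all function names are our illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(theta, x, y):
    # Logistic loss of a linear classifier (toy stand-in for a DNN loss).
    return np.log1p(np.exp(-y * (theta @ x)))

def grad_x(theta, x, y):
    # Gradient of the logistic loss with respect to the input x.
    s = 1.0 / (1.0 + np.exp(y * (theta @ x)))  # sigmoid(-y * theta.x)
    return -y * s * theta

def inner_maximize(theta, x0, y, gamma=5.0, sigma=0.1, m=16,
                   steps=50, lr=0.05):
    """Gradient ascent on the Lagrangian surrogate
         mean_i loss(theta; x + z_i, y) - gamma * ||x - x0||^2,
    averaged over m Gaussian noise samples (the 'noisy' part of NAL).
    A large gamma makes the surrogate strongly concave in x."""
    z = rng.normal(0.0, sigma, size=(m, x0.size))
    x = x0.copy()
    for _ in range(steps):
        g = np.mean([grad_x(theta, x + zi, y) for zi in z], axis=0)
        g -= 2.0 * gamma * (x - x0)   # the penalty keeps x near x0
        x += lr * g
    return x

theta = np.array([1.0, -2.0])
x0, y = np.array([0.5, 0.5]), 1.0
x_adv = inner_maximize(theta, x0, y)
```

In the full training loop, the outer step would then update theta by gradient descent on the loss evaluated at noisy copies of x_adv, alternating with this inner ascent.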

2. RELATED WORK

Works proposed to defend against adversarial example attacks fall into the following categories.
Certified defenses are certifiably robust against any adversarial input within an ℓ_p-norm perturbation range from the original input. One line of work constructs a computationally tractable relaxation for computing an upper bound on the worst-case loss over all valid attacks. For instance, Sinha et al. (2018) adopt a Lagrangian relaxation of the loss function, which is provably robust against adversarial input distributions within a Wasserstein ball centered at the original input distribution. The certificate in our work is also built on a Lagrangian relaxation of the worst-case loss, but has broader applicability than Sinha et al. (2018), with a tighter loss bound due to the smoothness property. An alternative line of work selects appropriate surrogates for each neuron activation layer by layer (Weng et al. (2018); Zhang et al. (2018)) to facilitate the search for a certified lower bound, which Zhang et al. (2020) further integrate with interval bound propagation (Gowal et al. (2018)).
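The interval bound propagation idea mentioned above can be made concrete with a minimal sketch: an axis-aligned input box is pushed through one affine layer and a ReLU, yielding sound elementwise output bounds. The weights and the epsilon below are toy values of our own choosing.

```python
import numpy as np

def ibp_affine(l, u, W, b):
    # Propagate the box [l, u] through x -> W x + b:
    # track the box center c and radius r; |W| @ r bounds the spread.
    c, r = (l + u) / 2.0, (u - l) / 2.0
    c2 = W @ c + b
    r2 = np.abs(W) @ r
    return c2 - r2, c2 + r2

def ibp_relu(l, u):
    # ReLU is monotone, so the bounds just clamp at zero.
    return np.maximum(l, 0.0), np.maximum(u, 0.0)

W = np.array([[1.0, -1.0], [0.5, 0.5]])
b = np.zeros(2)
x = np.array([1.0, 0.0])
eps = 0.1  # l_inf perturbation radius

l, u = ibp_affine(x - eps, x + eps, W, b)
l, u = ibp_relu(l, u)
# l = [0.8, 0.4], u = [1.2, 0.6]: every perturbed input maps inside this box
```

If the lower bound of the true class logit stays above the upper bounds of all other logits across the whole box, the prediction is certified for every perturbation within eps.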

