LOCALIZED RANDOMIZED SMOOTHING FOR COLLECTIVE ROBUSTNESS CERTIFICATION

Abstract

Models for image segmentation, node classification and many other tasks map a single input to multiple labels. By perturbing this single shared input (e.g. the image), an adversary can manipulate several predictions (e.g. misclassify several pixels). Collective robustness certification is the task of provably bounding the number of robust predictions under this threat model. The only dedicated method that goes beyond certifying each output independently is limited to strictly local models, where each prediction is associated with a small receptive field. We propose a more general collective robustness certificate that applies to all types of models. We further show that this approach is beneficial for the larger class of softly local models, where each output depends on the entire input but assigns different levels of importance to different input regions (e.g. based on their proximity in the image). The certificate is based on our novel localized randomized smoothing approach, in which the random perturbation strength for each input region is proportional to its importance for the outputs. Localized smoothing Pareto-dominates existing certificates on both image segmentation and node classification tasks, simultaneously offering higher accuracy and stronger certificates.
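The core idea of localized smoothing can be illustrated with a minimal sketch: noise is sampled with a small standard deviation near the input region that matters for a given output and a large standard deviation elsewhere, and the smoothed prediction is a majority vote over noisy samples. Note that this is a simplified 1-D illustration under assumed parameters (`sigma_near`, `sigma_far`, a distance-based importance scheme), not the paper's actual certification procedure.

```python
import numpy as np

def localized_noise_std(n_inputs, center, sigma_near=0.1, sigma_far=1.0, radius=1):
    """Per-coordinate noise levels: weak noise within `radius` of the output's
    region of interest, strong noise elsewhere. (Illustrative scheme only.)"""
    idx = np.arange(n_inputs)
    return np.where(np.abs(idx - center) <= radius, sigma_near, sigma_far)

def smoothed_prediction(base_classifier, x, sigma, n_samples=500, seed=0):
    """Majority vote of the base classifier under anisotropic Gaussian noise."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=(n_samples, x.shape[0]))
    votes = np.array([base_classifier(x + eps) for eps in noise])
    values, counts = np.unique(votes, return_counts=True)
    return values[np.argmax(counts)]

# Toy base classifier whose output depends only on coordinate 2:
# the low noise assigned to that coordinate keeps the vote stable.
sigma = localized_noise_std(5, center=2, radius=1)
pred = smoothed_prediction(lambda z: int(z[2] > 0), np.ones(5) * 2.0, sigma)
```

Because the noise on the unimportant coordinates is much stronger, the same sampled inputs can simultaneously certify robustness of all outputs to large perturbations in the regions they largely ignore.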

1. INTRODUCTION

There is a wide range of tasks that require models to make multiple predictions based on a single input. For example, semantic segmentation requires assigning a label to each pixel in an image. When deploying such multi-output classifiers in practice, their robustness should be a key concern. After all, just like simple classifiers (Szegedy et al., 2014), they can fall victim to adversarial attacks (Xie et al., 2017; Zügner & Günnemann, 2019; Belinkov & Bisk, 2018). Even without an adversary, random noise or measurement errors can cause predictions to change unexpectedly. We propose a novel method providing provable guarantees on how many predictions can be changed by an adversary. Because all outputs operate on the same input, they have to be attacked simultaneously through a single perturbed input, which can be more challenging for an adversary than attacking them independently. A proper collective robustness certificate must account for this. The only dedicated collective certificate that goes beyond certifying each output independently (Schuchardt et al., 2021) is only beneficial for models we call strictly local, where each output depends on a small, pre-defined subset of the input. Multi-output classifiers, however, are often only softly local: while all their predictions are in principle dependent on the entire input, each output may assign different importance to different subsets. For example, convolutional networks for image segmentation can have small effective receptive fields (Luo et al., 2016; Liu et al., 2018), i.e. they primarily use a small region of the image when labeling each pixel. Many models for node classification are based on the homophily assumption that connected nodes are mostly of the same class; thus, they primarily use features from neighboring nodes.
Transformers, which can in principle attend to arbitrary parts of the input, may in practice learn "sparse" attention maps, with the prediction for each token being mostly determined by a few (not necessarily nearby) tokens (Shi et al., 2021) .

