COLLECTIVE ROBUSTNESS CERTIFICATES: EXPLOITING INTERDEPENDENCE IN GRAPH NEURAL NETWORKS

Abstract

In tasks like node classification, image segmentation, and named-entity recognition we have a classifier that simultaneously outputs multiple predictions (a vector of labels) based on a single input, i.e. a single graph, image, or document, respectively. Existing adversarial robustness certificates consider each prediction independently and are thus overly pessimistic for such tasks. They implicitly assume that an adversary can use different perturbed inputs to attack different predictions, ignoring the fact that we have a single shared input. We propose the first collective robustness certificate which computes the number of predictions that are simultaneously guaranteed to remain stable under perturbation, i.e. cannot be attacked. We focus on Graph Neural Networks and leverage their locality property (perturbations only affect the predictions in a close neighborhood) to fuse multiple single-node certificates into a drastically stronger collective certificate. For example, on the Citeseer dataset our collective certificate for node classification increases the average number of certifiable feature perturbations from 7 to 351.

1. INTRODUCTION

Most classifiers are vulnerable to adversarial attacks (Akhtar & Mian, 2018; Hao-Chen et al., 2020). Slight perturbations of the data are often sufficient to manipulate their predictions. Even in scenarios where attackers are not present, it is critical to ensure that models are robust, since data can be noisy, incomplete, or anomalous. We study classifiers that collectively output many predictions based on a single input. This includes node classification, link prediction, molecular property prediction, image segmentation, part-of-speech tagging, named-entity recognition, and many other tasks. Various techniques have been proposed to improve the adversarial robustness of such models. One example is adversarial training (Goodfellow et al., 2015), which has been applied to part-of-speech tagging (Han et al., 2020), semantic segmentation (Xu et al., 2020b), and node classification (Feng et al., 2019). Graph-related tasks in particular have spawned a rich assortment of techniques. These include Bayesian models (Feng et al., 2020), data-augmentation methods (Entezari et al., 2020), and various robust network architectures (Zhu et al., 2019; Geisler et al., 2020). There are also robust loss functions which either explicitly model an adversary trying to cause misclassifications (Zhou & Vorobeychik, 2020) or use regularization terms derived from robustness certificates (Zügner & Günnemann, 2019). Other methods try to detect adversarially perturbed graphs (Zhang et al., 2019; Xu et al., 2020a) or directly correct perturbations using generative models (Zhang & Ma, 2020). However, none of these techniques provide guarantees, and they can only be evaluated based on their ability to defend against known adversarial attacks. Once a technique is established, it may subsequently be defeated using novel attacks (Carlini & Wagner, 2017). We are therefore interested in deriving adversarial robustness certificates which provably guarantee that a model is robust.
In this work we focus on node classification.1 Here, the goal is to assign a label to each node in a single (attributed) graph. Node classification can be the target of either local or global adversarial attacks. Local attacks, such as Nettack (Zügner et al., 2018; Zügner et al., 2020), attempt to alter the prediction of a particular node in the graph. Global attacks, as proposed by Zügner & Günnemann (2019), attempt to alter the predictions of many nodes at once. With global attacks, the attacker is constrained by the fact that all predictions are based on a single shared input. To successfully attack some nodes the attacker might need to insert certain edges in the graph, while for another set of nodes the same edges must not be inserted. With such mutually exclusive adversarial perturbations, the attacker is forced to make a choice and can attack only one subset of nodes (see Fig. 1). Existing certificates (Zügner & Günnemann, 2019; Bojchevski & Günnemann, 2019; Bojchevski et al., 2020) are designed for local attacks, i.e. to certify the predictions of individual nodes. So far, there is no dedicated certificate for global attacks, i.e. to certify the predictions of many nodes at once.1 A naïve certificate for global attacks can be constructed from existing single-node certificates as follows: one simply certifies each node's prediction independently and counts how many are guaranteed to be robust. This, however, implicitly assumes that an adversary can use different perturbed inputs to attack different predictions, ignoring the fact that we have a single shared input. We propose a collective robustness certificate for global attacks that directly computes the number of simultaneously certifiable nodes, i.e. nodes for which we can guarantee that their predictions will not change. This certificate explicitly models that the attacker is limited to a single shared input and thus accounts for the resulting mutual exclusivity of certain attacks.
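As a concrete illustration, the naïve construction amounts to a per-node counting step. The following sketch uses invented numbers: `radii[i]` stands for the certified radius that some single-node base certificate returns for prediction i, and `budget` is the adversary's global budget.

```python
# Naive global certificate built from independent single-node certificates.
# radii[i] is the certified radius of prediction i: the base certificate
# guarantees prediction i cannot change under up to radii[i] perturbations.
# All numbers are invented for illustration.
radii = [3, 0, 5, 1, 2]
budget = 2  # global adversarial budget (e.g. number of edge insertions)

# A prediction is (naively) certified iff even the full budget, spent
# entirely against this one node, stays within its certified radius.
naive_certified = sum(r >= budget for r in radii)
print(naive_certified)
```

Because each node is checked against the full budget in isolation, this count treats the adversary as if it could craft a separate perturbed graph per node.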
Specifically, we fuse multiple single-node certificates, which we refer to as base certificates, into a drastically (and provably) stronger collective one. Our approach is independent of how the base certificates are derived, and any improvement to the base certificates directly translates into an improvement of the collective certificate. The key property we exploit is locality. For example, in a k-layer message-passing graph neural network (Gilmer et al., 2017) the prediction for any given node depends only on the nodes in its k-hop neighborhood. Similarly, the predicted segment for any pixel depends only on the pixels in its receptive field, and the named entity assigned to any word depends only on the words in its surroundings. For classifiers that satisfy locality, perturbations to one part of the graph do not affect all nodes. Adversaries are thus faced with a budget allocation problem: it might be possible to attack different subsets of nodes via perturbations to different subgraphs, but performing all perturbations at once could exceed their adversarial budget. The naïve approach discussed above ignores this, overestimating how many nodes can be attacked. We design a simple (mixed-integer) linear program (LP) that enforces a single perturbed graph. It leverages locality by only considering the amount of perturbation within each receptive field when evaluating the single-node certificates (see Fig. 1). We evaluate our approach on different datasets and with different base certificates. We show that incorporating locality alone is sufficient to obtain significantly better results. Our proposed certificate:

• Is the first collective certificate that explicitly models simultaneous attacks on multiple outputs.

• Fuses individual certificates into a provably stronger certificate by explicitly modeling locality.
• Is the first node classification certificate that can model not only global and local budgets, but also the number of adversary-controlled nodes, regardless of whether the base certificates support this.
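To make the budget-allocation argument concrete, here is a minimal brute-force sketch of the collective idea on a toy instance. The paper formulates this as a mixed-integer linear program; plain enumeration, as below, is only feasible for tiny examples. The receptive fields, radii, and budget are all invented for illustration.

```python
from itertools import combinations

# Toy instance: 5 perturbable edges, 4 predictions.
# receptive[i] = indices of perturbable edges that can influence prediction i
# (its receptive field); radius[i] = certified radius from the base
# certificate: prediction i is provably stable as long as at most radius[i]
# perturbations fall inside its receptive field.
receptive = [{0, 1}, {1, 2}, {2, 3}, {3, 4}]
radius = [1, 1, 1, 1]
budget = 2          # global adversarial budget (number of edge flips)
edges = range(5)

def naive_certificate():
    """Certify each prediction independently: prediction i is safe only if
    no budget-respecting perturbation confined to its receptive field can
    exceed its certified radius."""
    return sum(min(budget, len(receptive[i])) <= radius[i]
               for i in range(len(receptive)))

def collective_certificate():
    """Enforce one shared perturbed graph: minimize, over all perturbation
    sets within budget, the number of predictions that stay certified."""
    worst = len(receptive)
    for k in range(budget + 1):
        for chosen in combinations(edges, k):
            chosen = set(chosen)
            robust = sum(len(chosen & receptive[i]) <= radius[i]
                         for i in range(len(receptive)))
            worst = min(worst, robust)
    return worst

print(naive_certificate(), collective_certificate())
```

On this toy instance the naïve count certifies no prediction, since the adversary could spend the whole budget inside any single receptive field. Enforcing one shared perturbation set certifies three of the four predictions: attacking two nodes at once would require perturbing more edges than the budget allows, exactly the mutual exclusivity the collective certificate exploits.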



1 While we focus on node classification, our approach can easily be applied to other multi-output classifiers. Chiang et al. (2020) certify multi-object detection, but they still treat each detected object independently.



Figure 1: Previous certificates consider each node independently. Most nodes cannot be certified since the adversary can choose a different perturbed graph per node (left). This is impossible in practice due to mutually exclusive perturbations. Our collective certificate enforces a single perturbed graph (center). It aggregates the amount of perturbation within each receptive field and then evaluates a single-node certificate to determine whether the corresponding prediction is robust (right).

