LIMITATIONS OF PIECEWISE LINEARITY FOR EFFICIENT ROBUSTNESS CERTIFICATION

Abstract

Certified defenses against small-norm adversarial examples have received growing attention in recent years; however, the certified accuracies of state-of-the-art methods remain far below those of their non-robust counterparts, despite the fact that benchmark datasets have been shown to be well-separated at far larger radii than the literature typically attempts to certify. In this work, we offer insights that identify potential factors in this performance gap. Specifically, our analysis reveals that piecewise linearity imposes fundamental limitations on the tightness of leading certification techniques. In practical terms, these limitations manifest as a greater capacity requirement for models that are to be certified efficiently, over and above the capacity necessary to learn a robust decision boundary, which has been studied in prior work. Because addressing the limitations of piecewise linearity by scaling up model capacity may give rise to further difficulties, particularly regarding robust generalization, we conclude by suggesting that developing smooth activation functions may be the way forward for advancing the performance of certified neural networks.

1. INTRODUCTION

Since the discovery of adversarial examples (Szegedy et al., 2014), defenses against malicious input perturbations to deep learning systems have received notable attention. While many early-proposed defenses, such as adversarial training (Madry et al., 2018), are heuristic in nature, a growing body of work seeking provable defenses has arisen (Cohen et al., 2019; Croce et al., 2019; Fromherz et al., 2021; Huang et al., 2021; Jordan et al., 2019; Lee et al., 2020; Leino & Fredrikson, 2021; Leino et al., 2021; Li et al., 2019; Singla et al., 2022; Trockman & Kolter, 2021; Wong et al., 2018; Zhang et al., 2018). Generally, such defenses attempt to provide a certificate of local robustness (given formally in Definition 1), which guarantees that a network's prediction on a given point is stable under small perturbations (typically measured in the ℓ2 (Euclidean) or sometimes ℓ∞ norm); this precludes the possibility of small-norm adversarial examples on certified points. The success of a certified defense is typically measured empirically using verified robust accuracy (VRA), which reflects the fraction of points that are both (i) classified correctly and (ii) certified as locally robust. Despite the fact that perfect robust classification (i.e., 100% VRA) is known to be possible on standard datasets at the adversarial perturbation budgets used in the literature (Yang et al., 2020b), this possibility is far from realized in the current state of the art. For example, on the benchmark dataset CIFAR-10, state-of-the-art methods offering deterministic guarantees of ℓ2 robustness¹ have remained at approximately 60% VRA (Huang et al., 2021; Leino et al., 2021; Singla et al., 2022; Trockman & Kolter, 2021), while non-robust models handily eclipse 95% accuracy.
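To make the VRA metric concrete, the following sketch computes it under a simple Lipschitz-based certificate: if the gap between the top two logits exceeds √2·K·ε, where K is a global ℓ2 Lipschitz constant of the network's logit map, then the predicted class cannot change within an ℓ2 ball of radius ε. This is a minimal illustration of one common certification style, not the procedure of any particular method cited above; the function name and the use of a single global Lipschitz constant are illustrative assumptions.

```python
import numpy as np

def verified_robust_accuracy(logits, labels, lipschitz_const, epsilon):
    """Fraction of points that are (i) correctly classified and
    (ii) certified locally robust via a global-Lipschitz margin bound.

    logits: (n, k) array of network outputs
    labels: (n,) array of ground-truth class indices
    lipschitz_const: global L2 Lipschitz constant K of the logit map
    epsilon: L2 perturbation radius to certify
    """
    preds = logits.argmax(axis=1)
    correct = preds == labels
    # Margin between the top logit and the runner-up.
    sorted_logits = np.sort(logits, axis=1)
    margin = sorted_logits[:, -1] - sorted_logits[:, -2]
    # The difference of any two logits is (sqrt(2) * K)-Lipschitz in L2,
    # so a margin above sqrt(2) * K * epsilon certifies the prediction.
    certified = margin > np.sqrt(2) * lipschitz_const * epsilon
    return float(np.mean(correct & certified))
```

Note that a point counts toward VRA only if both conditions hold: a correctly classified but uncertified point, and a certified but misclassified point, are each counted as failures. Loose certificates (e.g., an over-estimated K) can therefore depress VRA even when the underlying classifier is truly robust.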
It is difficult to account precisely for this discrepancy, though, among other reasons, state-of-the-art methods typically use loose bounds to perform certification (exact certification being NP-complete for general ReLU networks (Katz et al., 2017; Sinha et al., 2018)), which conceivably leads to falsely flagging truly robust points or to over-regularization of the learned model. While conservative approximations may be necessary to perform efficient certification (and to facilitate efficient robust training), it is certainly possible that they foil reasonable hopes for "optimality." In this work, we



¹ In this work we primarily consider certified defenses that provide a deterministic guarantee of local robustness, as opposed to a statistical guarantee. For further discussion of this point, see Section 4.

