CERTIFIED TRAINING: SMALL BOXES ARE ALL YOU NEED

Abstract

To obtain deterministic guarantees of adversarial robustness, specialized training methods are used. We propose SABR, a novel such certified training method, based on the key insight that propagating interval bounds for a small but carefully selected subset of the adversarial input region is sufficient to approximate the worst-case loss over the whole region while significantly reducing approximation errors. We show in an extensive empirical evaluation that SABR outperforms existing certified defenses in terms of both standard and certifiable accuracies across perturbation magnitudes and datasets, pointing to a new class of certified training methods promising to alleviate the robustness-accuracy trade-off.

1. INTRODUCTION

As neural networks are increasingly deployed in safety-critical domains, formal robustness guarantees against adversarial examples (Biggio et al., 2013; Szegedy et al., 2014) are becoming ever more important. However, despite significant progress, specialized training methods that improve certifiability at the cost of severely reduced accuracies are still required to obtain deterministic guarantees. Given an input region defined by an adversary specification, both training and certification methods compute a network's reachable set by propagating a symbolic over-approximation of this region through the network (Singh et al., 2018; 2019a; Gowal et al., 2018a). Depending on the propagation method, both the computational complexity and the approximation tightness can vary widely. For certified training, an over-approximation of the worst-case loss is computed from this reachable set and then optimized (Mirman et al., 2018; Wong et al., 2018). Surprisingly, the least precise propagation methods yield the highest certified accuracies, as more precise methods induce harder optimization problems (Jovanovic et al., 2021). However, the large approximation errors incurred by these imprecise methods lead to over-regularization and thus poor accuracy. Combining precise worst-case loss approximations with a tractable optimization problem is thus the core challenge of certified training.

In this work, we tackle this challenge and propose a novel certified training method, SABR (Small Adversarial Bounding Regions), based on the following key insight: by propagating small but carefully selected subsets of the adversarial input region with imprecise methods (i.e., BOX), we can obtain both well-behaved optimization problems and precise approximations of the worst-case loss.
This yields less over-regularized networks, allowing SABR to improve on state-of-the-art certified defenses in terms of both standard and certified accuracies across settings, thereby pointing to a new class of certified training methods.
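To make the key insight concrete, the following is an illustrative sketch (not the authors' implementation) of interval (BOX) bound propagation through an affine layer followed by a ReLU, together with a SABR-style selection of a small box inside the adversarial region. The parameter name `tau` for the box-size ratio and the use of an adversarial point `x_adv` as the box center are assumptions made for illustration.

```python
import numpy as np

def box_affine(lb, ub, W, b):
    """Propagate the interval [lb, ub] through x -> W @ x + b.

    The center is mapped exactly; the radius grows with |W|, which is
    why interval bounds loosen quickly over many layers.
    """
    center, radius = (lb + ub) / 2.0, (ub - lb) / 2.0
    new_center = W @ center + b
    new_radius = np.abs(W) @ radius  # worst-case growth of the box
    return new_center - new_radius, new_center + new_radius

def box_relu(lb, ub):
    """Propagate the interval through ReLU (monotone, so element-wise)."""
    return np.maximum(lb, 0.0), np.maximum(ub, 0.0)

def sabr_box(x, eps, tau, x_adv):
    """Select a small box of radius tau * eps centered near an adversarial
    point x_adv, shifted so it stays inside the original eps-ball around x."""
    r = tau * eps
    center = np.clip(x_adv, x - eps + r, x + eps - r)
    return center - r, center + r
```

Propagating the small box through the layers above incurs a radius roughly `tau` times that of the full eps-ball at every affine layer, so the resulting worst-case loss bound is far tighter, while the box center (chosen adversarially) keeps the bound a meaningful proxy for the true worst case.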

Main Contributions Our main contributions are:

• A novel certified training method, SABR, reducing over-regularization to improve both standard and certified accuracy ( §3).
• A theoretical investigation motivating SABR by deriving new insights into the growth of BOX relaxations during propagation ( §4).
• An extensive empirical evaluation demonstrating that SABR outperforms all state-of-the-art certified training methods in terms of both standard and certifiable accuracies on MNIST, CIFAR-10, and TINYIMAGENET ( §5).

* Equal contribution

