EXPLOITING VERIFIED NEURAL NETWORKS VIA FLOATING POINT NUMERICAL ERROR

Abstract

Motivated by the need to reliably characterize the robustness of deep neural networks, researchers have developed verification algorithms for deep neural networks. Given a neural network, a verifier aims to answer whether certain properties are guaranteed with respect to all inputs in a space. However, little attention has been paid to floating point numerical error in neural network verification. We exploit floating point errors in the inference and verification implementations to construct adversarial examples for neural networks that a verifier claims to be robust with respect to certain inputs. We argue that, to produce sound verification results, any verification system must accurately (or conservatively) model the effects of any floating point computations in the network inference or the verification system itself.

1. INTRODUCTION

Deep neural networks (DNNs) are known to be vulnerable to adversarial inputs (Szegedy et al., 2014): images, audio, or text that are indistinguishable from benign inputs to human perception yet cause a DNN to produce substantially different results. This situation has motivated the development of network verification algorithms that claim to prove the robustness of a network (Bunel et al., 2020; Tjeng et al., 2019; Salman et al., 2019), specifically that the network produces identical classifications for all inputs in a perturbation space around a given input. Verification algorithms typically reason about the behavior of the network assuming real-valued arithmetic. In practice, however, the computation of both the verifier and the neural network is performed on physical computers that use floating point numbers and floating point arithmetic to approximate the underlying real-valued computations. This use of floating point introduces numerical error that can potentially invalidate the guarantees that the verifiers claim to provide. Moreover, the existence of multiple software and hardware systems for DNN inference further complicates the situation, because different implementations exhibit different numerical error characteristics.

We present concrete instances where numerical error leads to unsound verification of real-valued networks. Specifically, we train robust networks on the MNIST and CIFAR10 datasets. We work with the MIPVerify complete verifier (Tjeng et al., 2019) and several inference implementations included in the PyTorch (Paszke et al., 2019) framework. For each implementation, we construct image pairs (x_0, x_adv) where x_0 is a brightness-modified natural image, such that the implementation classifies x_adv differently from x_0, x_adv falls in an ℓ∞-bounded perturbation space around x_0, and the verifier incorrectly claims that no such adversarial image x_adv exists for x_0 within the perturbation space.
Moreover, we show that the incomplete verifier CROWN is also vulnerable to floating point error. Our method of constructing adversarial images is not limited to our setting; it applies to any verifier that does not soundly model floating point arithmetic.
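The implementation-dependence described above can be illustrated with a toy example (ours, not from the paper): floating point addition is not associative, so two mathematically equivalent implementations of the same dot product, differing only in summation order, can produce slightly different logits. Near a decision boundary, such a discrepancy can flip the predicted class even though a verifier reasoning in exact real arithmetic sees a single, robust value.

```python
# Toy illustration of implementation-dependent floating point error:
# the same float32 dot product, accumulated in two different orders,
# typically yields two slightly different results.
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)  # stand-in for a weight row
x = rng.standard_normal(1024).astype(np.float32)  # stand-in for an input

# Accumulate the dot product front-to-back in float32.
forward = np.float32(0.0)
for wi, xi in zip(w, x):
    forward = np.float32(forward + wi * xi)

# Accumulate the same dot product back-to-front in float32.
backward = np.float32(0.0)
for wi, xi in zip(w[::-1], x[::-1]):
    backward = np.float32(backward + wi * xi)

# The two orders agree only approximately; the gap is typically nonzero.
print(abs(float(forward) - float(backward)))
```

Real inference backends differ far more than this (vectorized reductions, fused multiply-adds, GPU kernels), which is why different PyTorch implementations of the same network exhibit different numerical error characteristics.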

2. BACKGROUND AND RELATED WORK

Training robust networks: Researchers have developed various techniques to train robust networks (Madry et al., 2018; Mirman et al., 2018; Tramer & Boneh, 2019; Wong et al., 2020). Madry et al. formulate robust training as minimizing the worst-case loss over the input perturbation space and propose to train robust networks on the data generated by the Projected Gradient Descent (PGD) adversary.
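As a minimal sketch of the PGD adversary referenced above (our illustration, not the paper's code), consider a linear model f(x) = w·x, whose gradient with respect to x is simply w. PGD repeatedly takes a signed-gradient step and projects back onto the ℓ∞ ball of radius eps around the original input; the function and parameter names here are ours.

```python
# Hedged sketch of an l_inf-bounded PGD attack on a toy linear model
# f(x) = w @ x, so the input gradient is just w.
import numpy as np

def pgd_linear(x0, w, eps, step, iters):
    """Maximize w @ x subject to ||x - x0||_inf <= eps."""
    x = x0.copy()
    for _ in range(iters):
        # Signed-gradient ascent step (gradient of w @ x w.r.t. x is w).
        x = x + step * np.sign(w)
        # Projection onto the eps-ball around x0 in the l_inf norm.
        x = np.clip(x, x0 - eps, x0 + eps)
    return x

x0 = np.zeros(4, dtype=np.float32)
w = np.array([1.0, -2.0, 0.5, 0.0], dtype=np.float32)
adv = pgd_linear(x0, w, eps=0.1, step=0.05, iters=10)
print(np.max(np.abs(adv - x0)))  # never exceeds eps = 0.1
```

For a real network, the hand-written gradient is replaced by backpropagation through the loss, and adversarial training minimizes the loss on the perturbed inputs `adv` instead of the clean inputs.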

