FINDING PHYSICAL ADVERSARIAL EXAMPLES FOR AUTONOMOUS DRIVING WITH FAST AND DIFFERENTIABLE IMAGE COMPOSITING

Abstract

There is considerable evidence that deep neural networks are vulnerable to adversarial perturbations applied directly to their digital inputs. However, it remains an open question whether this translates to vulnerabilities in real-world systems. Specifically, in the context of image inputs to autonomous driving systems, an attack can be achieved only by modifying the physical environment, so as to ensure that the resulting stream of video inputs to the car's controller leads to incorrect driving decisions. Inducing this effect on the video inputs indirectly through the environment requires accounting for system dynamics and tracking viewpoint changes. We propose a scalable and efficient approach for finding adversarial physical modifications, using a differentiable approximation for the mapping from environmental modifications, namely rectangles drawn on the road, to the corresponding video inputs to the controller network. Given the color, position, and orientation parameters of the rectangles, our mapping composites them onto pre-recorded video streams of the original environment. Our mapping accounts for geometric and color variations, is differentiable with respect to rectangle parameters, and uses multiple original video streams obtained by varying the driving trajectory. When combined with a neural network-based controller, our approach allows the design of adversarial modifications through end-to-end gradient-based optimization. We evaluate our approach using the CARLA autonomous driving simulator, and show that it is significantly more scalable and far more effective at generating attacks than a prior black-box approach based on Bayesian Optimization.
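To make the idea of a compositing mapping that is differentiable in the rectangle parameters concrete, the sketch below renders a rectangle with sigmoid-softened edges and alpha-blends it onto a frame. This is only a rough illustration of the general technique, not the paper's actual implementation: all function names and parameters here are hypothetical, the "frame" is a synthetic stand-in for a recorded video frame, and the sketch omits the geometric (perspective) and color-variation modeling described in the abstract. Because the mask is smooth in the center and size parameters, any loss computed on the composited frame admits gradients with respect to them.

```python
import numpy as np

def soft_rect_mask(h, w, cx, cy, rw, rh, sharp=2.0):
    """Soft (sigmoid-edged) axis-aligned rectangle mask in (0, 1).

    Smooth in cx, cy, rw, rh, so a loss on the composited frame is
    differentiable with respect to these rectangle parameters.
    (Hypothetical helper; the paper also handles orientation and
    perspective, which are omitted here.)
    """
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    # Product of four soft half-plane indicators: left/right and top/bottom edges.
    mx = sig(sharp * (xs - (cx - rw / 2))) * sig(sharp * ((cx + rw / 2) - xs))
    my = sig(sharp * (ys - (cy - rh / 2))) * sig(sharp * ((cy + rh / 2) - ys))
    return mx * my  # ~1 inside the rectangle, ~0 outside, smooth at the edges

def composite(frame, color, cx, cy, rw, rh):
    """Alpha-blend a colored soft rectangle onto a frame of shape (H, W, 3)."""
    m = soft_rect_mask(frame.shape[0], frame.shape[1], cx, cy, rw, rh)[..., None]
    return (1.0 - m) * frame + m * np.asarray(color, dtype=float)

# Synthetic gray "road" frame; paint a dark rectangle onto it.
frame = np.full((64, 64, 3), 0.5)
out = composite(frame, (0.0, 0.0, 0.0), cx=32.0, cy=40.0, rw=20.0, rh=8.0)
```

In a full attack pipeline, the output of such a mapping would be fed to the controller network, and gradients of the controller's loss would flow back through the blend to the rectangle parameters for end-to-end optimization.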

1. INTRODUCTION

Computer vision has made revolutionary advances in recent years by leveraging a combination of deep neural network architectures with abundant high-quality perceptual data. One of the transformative applications of computational perception is autonomous driving: autonomous cars and trucks are already being evaluated for use in geofenced settings, and partial autonomy, such as highway assistance, brings state-of-the-art perception to vehicles available to consumers. However, a history of tragic crashes involving autonomous driving, most notably by Tesla (Thorbecke, 2020) and Uber (Hawkins, 2019), reveals that modern perceptual architectures still have limitations even in non-adversarial driving environments. More concerning still is the growing body of evidence that state-of-the-art deep neural networks used in perception tasks are highly vulnerable to adversarial perturbations: imperceptible noise added to an input image and deliberately designed to cause misclassification (Goodfellow et al., 2014; Yuan et al., 2019; Modas et al., 2020). Furthermore, several lines of work consider specifically physical adversarial examples, which modify the scene being captured by a camera rather than the image itself (Kurakin et al., 2016; Eykholt et al., 2018; Sitawarin et al., 2018; Dutta, 2018; Duan et al., 2020).

Despite this body of evidence demonstrating vulnerabilities in deep neural network perceptual architectures, it is not evident that such vulnerabilities are consequential in realistic autonomous driving, even when cameras are the primary sensors. First, most such attacks involve independent perturbations to a given input image, whereas autonomous driving is a dynamical system: a fixed adversarial perturbation to a scene is perceived through a series of distinct, but highly interdependent, perspectives. Second, self-driving is a complex system that maps perceptual inputs

