FINDING PHYSICAL ADVERSARIAL EXAMPLES FOR AUTONOMOUS DRIVING WITH FAST AND DIFFERENTIABLE IMAGE COMPOSITING

Abstract

There is considerable evidence that deep neural networks are vulnerable to adversarial perturbations applied directly to their digital inputs. However, it remains an open question whether this translates into vulnerabilities in real-world systems. Specifically, in the context of image inputs to autonomous driving systems, an attack can be achieved only by modifying the physical environment, so as to ensure that the resulting stream of video inputs to the car's controller leads to incorrect driving decisions. Inducing this effect on the video inputs indirectly through the environment requires accounting for system dynamics and tracking viewpoint changes. We propose a scalable and efficient approach for finding adversarial physical modifications, using a differentiable approximation for the mapping from environmental modifications (namely, rectangles drawn on the road) to the corresponding video inputs to the controller network. Given the color, position, and orientation parameters of the rectangles, our mapping composites them onto pre-recorded video streams of the original environment. Our mapping accounts for geometric and color variations, is differentiable with respect to the rectangle parameters, and uses multiple original video streams obtained by varying the driving trajectory. When combined with a neural network-based controller, our approach allows the design of adversarial modifications through end-to-end gradient-based optimization. We evaluate our approach using the Carla autonomous driving simulator, and show that it is significantly more scalable and far more effective at generating attacks than a prior black-box approach based on Bayesian Optimization.

1. INTRODUCTION

Computer vision has made revolutionary advances in recent years by leveraging a combination of deep neural network architectures with abundant high-quality perceptual data. One of the transformative applications of computational perception is autonomous driving, with autonomous cars and trucks already being evaluated for use in geofenced settings, and partial autonomy, such as highway assistance, leveraging state-of-the-art perception embedded in vehicles available to consumers. However, a history of tragic crashes involving autonomous driving, most notably Tesla (Thorbecke, 2020) and Uber (Hawkins, 2019), reveals that modern perceptual architectures still have limitations even in non-adversarial driving environments. More concerning still is the growing abundance of evidence that state-of-the-art deep neural networks used in perception tasks are highly vulnerable to adversarial perturbations: imperceptible noise added to an input image and deliberately designed to cause misclassification (Goodfellow et al., 2014; Yuan et al., 2019; Modas et al., 2020). Furthermore, several lines of work specifically consider physical adversarial examples, which modify the scene being captured by a camera rather than the image itself (Kurakin et al., 2016; Eykholt et al., 2018; Sitawarin et al., 2018; Dutta, 2018; Duan et al., 2020). Despite this body of evidence demonstrating vulnerabilities in deep neural network perceptual architectures, it is not evident that such vulnerabilities are consequential in realistic autonomous driving, even when cameras are the primary sensors. First, most such attacks involve independent perturbations to a given input image, whereas autonomous driving is a dynamical system, so a fixed adversarial perturbation to a scene is perceived through a series of distinct but highly interdependent perspectives. Second, self-driving is a complex system that maps perceptual inputs to control outputs. Consequently, even if we succeed in causing the control outputs to deviate from normal, the vehicle will then perceive a sequence of frames different from those encountered on its normal path, and will typically deploy self-correcting behavior in response. For example, if the vehicle is driving straight and begins swerving toward the opposite lane, its own perception will inform the controller that it is going in the wrong direction, and the controller will steer it back on course.

Figure 1: Overview. We collect and calibrate frames from the unmodified environment (shown in the green box), and, given a choice of attack pattern parameters, composite the pattern to create approximate renderings of frames corresponding to placing the pattern in the environment. Our compositing function is differentiable with respect to the attack pattern parameters, so we can use end-to-end gradient-based optimization when attacking a differentiable control network, causing it to output incorrect controls that make the vehicle deviate from its intended trajectory (from the green to the blue trajectory, as shown in the right column) and crash.

To address these limitations, Bayesian Optimization (BO) (Archetti and Candelieri, 2019) was recently proposed as a way to design physical adversarial examples (two black rectangles on road pavement) in Carla autonomous driving simulations (Dosovitskiy et al., 2017) against end-to-end autonomous driving architectures (Boloor et al., 2020). The key challenge with this approach, however, is that attack design must execute actual experiments (e.g., simulations or actual driving) for a large number of iterations (1,000 in the work above), making it impractical for large-scale or physical driving evaluation. Furthermore, it is not clear how well BO scales as the complexity of the adversarial space grows beyond two black rectangles.
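The geometric core of the compositing step in Figure 1 is mapping an attack pattern from road-plane coordinates into each recorded camera frame, for which a planar homography is the standard model. The numpy sketch below (all names and the example homography are hypothetical, for illustration only) projects the corners of a parameterized rectangle through such a homography.

```python
import numpy as np

def rectangle_corners(cx, cy, w, h, theta):
    """Corners of a road-plane rectangle with center (cx, cy),
    size (w, h), and orientation theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    local = np.array([[-w / 2, -h / 2], [w / 2, -h / 2],
                      [w / 2,  h / 2], [-w / 2,  h / 2]])
    # Rotate by theta, then translate to the rectangle center.
    return local @ np.array([[c, s], [-s, c]]) + np.array([cx, cy])

def project_points(H, pts):
    """Apply a 3x3 planar homography H to an (N, 2) array of points,
    with the usual homogeneous divide."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    proj = pts_h @ H.T
    return proj[:, :2] / proj[:, 2:3]

# Hypothetical road-to-image homography with mild perspective.
H = np.array([[1.0, 0.0,   5.0],
              [0.0, 1.0,   2.0],
              [0.0, 0.001, 1.0]])
corners = rectangle_corners(cx=10.0, cy=20.0, w=4.0, h=2.0, theta=0.3)
image_corners = project_points(H, corners)
```

Since every operation above is smooth in the rectangle parameters, the projected corner locations can be differentiated with respect to them, which is the property the compositing pipeline relies on.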
We propose a highly scalable framework for designing physically realizable adversarial examples against end-to-end autonomous driving architectures. Our framework is illustrated in Figure 1 and develops a differentiable pipeline for digitally approximating driving scenarios. The proposed approximation makes use of image compositing, learning homography and color mappings from a bird's-eye view of embedded adversarial examples to their projections in images from actual driving frames, and sampling sequences of actual frames with small random noise added to the controls to ensure adequate coverage of possible perspectives. The entire process can then be fed into an automatic differentiation framework to obtain adversarial examples that maximize a car's deviation from its normal sequence of controls (e.g., steering angle) for a target driving scenario. We evaluate the proposed framework using Carla simulations in comparison with the state-of-the-art BO method. Our experiments show that the resulting attacks are significantly stronger, with effects on induced deviations and road infractions often considerably outperforming BO, at a small fraction of the actual driving runs required for training. Furthermore, we show that our approach yields attacks that are robust to unforeseen variations in weather and visibility.

Related Work: Attacks on deep neural networks for computer vision tasks have been the subject of extensive prior research (Goodfellow et al., 2014; Yuan et al., 2019; Modas et al., 2020; Vorobeychik and Kantarcioglu, 2018). The most common variant introduces imperceptible noise into the pixels of an image in order to induce prediction errors, such as misclassification of the image or failure to detect an object in it.
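As a toy illustration of the end-to-end optimization described above, the sketch below composites a soft-edged (hence differentiable) rectangle onto a small grayscale frame, feeds the result to a stand-in linear "controller," and adjusts the rectangle parameters to maximize the induced steering deviation. Finite-difference gradients stand in for automatic differentiation, and the frame, controller, and all names are hypothetical simplifications of the actual pipeline.

```python
import numpy as np

def soft_rect_mask(xs, ys, cx, cy, w, h, sharp=4.0):
    """Sigmoid-edged rectangle mask: smooth, hence differentiable
    with respect to the rectangle parameters."""
    mx = 1.0 / (1.0 + np.exp(-sharp * (w / 2 - np.abs(xs - cx))))
    my = 1.0 / (1.0 + np.exp(-sharp * (h / 2 - np.abs(ys - cy))))
    return mx * my

def composite(frame, params, color=0.0):
    """Blend a rectangle of the given color into the frame."""
    cx, cy, w, h = params
    ys, xs = np.mgrid[0:frame.shape[0], 0:frame.shape[1]].astype(float)
    m = soft_rect_mask(xs, ys, cx, cy, w, h)
    return (1.0 - m) * frame + m * color

def controller(frame, weights):
    """Stand-in for a driving network: a linear map to a steering angle."""
    return float(np.sum(weights * frame))

def attack_loss(params, frame, weights, nominal):
    """Negative squared deviation of steering from its nominal value;
    minimizing this loss maximizes the induced deviation."""
    return -(controller(composite(frame, params), weights) - nominal) ** 2

def optimize(params, frame, weights, nominal, lr=0.5, steps=50, eps=1e-3):
    """Gradient descent with finite-difference gradients (a stand-in for
    automatic differentiation) and simple backtracking on the step size."""
    params = np.asarray(params, dtype=float)
    loss = attack_loss(params, frame, weights, nominal)
    for _ in range(steps):
        g = np.zeros_like(params)
        for i in range(len(params)):
            d = np.zeros_like(params)
            d[i] = eps
            g[i] = (attack_loss(params + d, frame, weights, nominal)
                    - attack_loss(params - d, frame, weights, nominal)) / (2 * eps)
        cand = params - lr * g
        cand_loss = attack_loss(cand, frame, weights, nominal)
        if cand_loss < loss:      # accept only improving steps
            params, loss = cand, cand_loss
        else:
            lr *= 0.5             # otherwise shrink the step size
    return params

rng = np.random.default_rng(0)
frame = rng.uniform(0.4, 0.6, size=(16, 16))   # toy "road" frame
weights = rng.normal(size=(16, 16))
nominal = controller(frame, weights)
adv = optimize([8.0, 8.0, 4.0, 4.0], frame, weights, nominal)
deviation = abs(controller(composite(frame, adv), weights) - nominal)
```

The soft mask is the key design choice: a hard rectangle boundary would make the composited image piecewise constant in the position and size parameters, whereas the sigmoid edges give usable gradients everywhere.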
A more recent line of research has investigated physical adversarial examples (Kurakin et al., 2016; Athalye et al., 2017; Eykholt et al., 2018; Sitawarin et al., 2018; Dutta, 2018; Duan et al., 2020), where the explicit goal is to implement these in the physical scene, so that images of the scene subsequently captured by the camera and fed into a deep neural network result in a prediction error. In a related effort, Liu et al. (2019) developed a differentiable renderer that allows the attacker to devise higher-level perturbations of an image scene, such as geometry and lighting. However, most of these approaches attack a fixed input

