DRIVING THROUGH THE LENS: IMPROVING GENERALIZATION OF LEARNING-BASED STEERING USING SIMULATED ADVERSARIAL EXAMPLES

Abstract

To ensure the wide adoption and safety of autonomous driving, vehicles need to be able to drive under various lighting, weather, and visibility conditions in different environments. These external and environmental factors, along with internal factors associated with sensors, can pose significant challenges to perceptual data processing, and hence to the decision-making of the vehicle. In this work, we address this critical issue by analyzing the sensitivity of the learning algorithm with respect to varying quality of the image input for autonomous driving. Using the results of the sensitivity analysis, we further propose an algorithm to improve the overall performance of the task of "learning to steer". The results show that our approach improves learning outcomes by up to 48%. A comparative study between our approach and related techniques, such as data augmentation and adversarial training, confirms the effectiveness of our algorithm as a way to improve the robustness and generalization of neural network training for self-driving cars.

1. INTRODUCTION

Autonomous driving is a complex task that requires many software and hardware components to operate reliably under highly disparate and often unpredictable conditions. While on the road, vehicles are likely to experience day and night, clear and foggy conditions, sunny and rainy days, as well as bright cityscapes and dark tunnels. All these external factors can lead to quality variations in image data, which then serve as inputs to autonomous systems. Adding to the complexity are internal factors of the camera (e.g., those associated with hardware), which can also yield varying-quality images as input to learning algorithms. One can harden machine learning systems against these degradations by simulating them at train time (Chao et al., 2019). However, there is currently a lack of algorithmic tools for analyzing the sensitivity of real-world neural network performance to the properties of simulated training images and, more importantly, of mechanisms to leverage such a sensitivity analysis for improving learning outcomes. In this work, we quantify the influence of varying-quality images on the task of "learning to steer" and provide a systematic approach to improving the performance of the learning algorithm based on quantitative analysis.

Image degradations can often be simulated at training time by adjusting a combination of image quality attributes, including blur, noise, distortion, color representations (such as RGB or CMY), and hue, saturation, and intensity values (HSV). However, identifying the correct combination of simulated attribute parameters to obtain optimal performance on real data during training is a difficult, if not impossible, task, requiring exploration of a high-dimensional parameterized space. The first goal of this work is to design a systematic method for studying, predicting, and quantifying the impact of an image degradation on system performance after training.
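As a concrete illustration (and not the paper's actual implementation), this kind of attribute-parameterized degradation can be sketched in a few lines of NumPy; the function name, parameter names, and the particular blur/noise/intensity combination below are our own illustrative choices:

```python
import numpy as np

def degrade(image, blur_radius=1, noise_std=0.05, value_scale=0.8, rng=None):
    """Apply an illustrative combination of quality degradations to an
    (H, W, 3) float RGB image with values in [0, 1]."""
    if rng is None:
        rng = np.random.default_rng(0)
    out = image.copy()

    # Box blur: average over a (2r+1) x (2r+1) neighborhood per channel.
    if blur_radius > 0:
        k = 2 * blur_radius + 1
        padded = np.pad(out, ((blur_radius, blur_radius),
                              (blur_radius, blur_radius), (0, 0)), mode="edge")
        blurred = np.zeros_like(out)
        for dy in range(k):
            for dx in range(k):
                blurred += padded[dy:dy + out.shape[0], dx:dx + out.shape[1]]
        out = blurred / (k * k)

    # Additive Gaussian noise, standing in for sensor noise.
    out = out + rng.normal(0.0, noise_std, size=out.shape)

    # Crude global intensity scaling, standing in for an HSV value shift.
    out = out * value_scale

    return np.clip(out, 0.0, 1.0)
```

Each argument corresponds to one axis of the degradation space; a full simulator would expose one such knob per image-quality attribute.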
We do this by measuring the similarity between real-world datasets and simulated datasets with degradations using the well-known Fréchet Inception Distance (FID). We find that the FID between simulated and real datasets is a good predictor of whether training on simulated data will produce good performance in the real world. We also use the FID between different simulated datasets as a unified metric to parameterize the severity of various image quality degradations on the same FID-based scale (see Section 3).

Our second goal is to borrow concepts from the adversarial training literature (Madry et al., 2018; Shafahi et al., 2019; Xie et al., 2020) to build a scalable training scheme that improves the robustness of autonomous driving systems against multi-faceted image degradations while increasing the overall accuracy of the steering task for self-driving cars. Our proposed method builds a dataset of adversarially degraded images by applying evolutionary optimization within the space of possible degradations during training. The method begins by training on a combination of real and simulated/degraded data using arbitrary degradation parameters. On each training iteration, the parameters are updated to generate a new degradation combination so that system performance is (approximately) minimized. The network is then trained on these adversarially simulated images to promote robustness. Our proposed algorithm speeds up the combinatorial degradation updates by discretizing the search space using our FID-based parameterization (see Section 4).

Experiments show that the algorithm improves the task performance of "learning to steer" by up to 48% in mean accuracy over strong baselines. We compare our approach with other related techniques, such as data augmentation and adversarial training, and the results show that our method consistently achieves higher performance.
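For reference, the Fréchet distance underlying FID has a closed form between Gaussians fitted to two sets of feature activations. The NumPy-only sketch below computes that closed form directly; in practice the activations would come from an Inception network, which we omit here:

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two (N, D) feature sets:
    ||mu_a - mu_b||^2 + Tr(S_a + S_b - 2 (S_a S_b)^{1/2})."""
    mu_a, mu_b = feats_a.mean(0), feats_b.mean(0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)

    # Tr((S_a S_b)^{1/2}) via the symmetric form S_a^{1/2} S_b S_a^{1/2}.
    w, v = np.linalg.eigh(cov_a)
    sqrt_a = (v * np.sqrt(np.clip(w, 0, None))) @ v.T
    inner = sqrt_a @ cov_b @ sqrt_a
    tr_sqrt = np.sqrt(np.clip(np.linalg.eigvalsh(inner), 0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2 * tr_sqrt)
```

The eigendecomposition route avoids a SciPy dependency for the matrix square root; identical feature sets give a distance of (numerically) zero, and a pure mean shift gives its squared norm.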
The method also improves performance on datasets contaminated with complex combinations of perturbations (by up to 33%), and additionally boosts performance on degradations that are not seen during training, such as simulated snow, fog, and frost (by up to 29%), as discussed in Section 4.2. For evaluation, we propose a more comprehensive robustness evaluation standard covering four different scenarios: clean data, single-perturbation data, multiple-perturbation data, and unseen-perturbation data. While previous works usually test one or two of these scenarios, our work is among the first to report results under all four scenarios, each of which is important for evaluating the robustness of algorithms. We also plan to release "learn to steer under perturbations" datasets for benchmarking. These datasets will contain a base dataset; simulated adversarial datasets with multiple levels of image degradation due to either single or multiple image quality attributes; and simulated adversarial datasets with five levels of combinatorial perturbations due to a different set of unseen factors, following the image corruptions in ImageNet-C (Hendrycks & Dietterich, 2019), totaling about 1.2M images and 120 datasets in all (see Section 4.5).
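The evolutionary degradation search outlined in the introduction can be sketched as a simple (1+λ)-style hill climb over severity levels discretized on a common scale. Everything below is a hypothetical stand-in: `evaluate` replaces a real validation pass of the steering model on degraded data, and the level counts and weights are arbitrary:

```python
import numpy as np

# Hypothetical discretization: each degradation type has severity levels
# 0..4 on a shared scale, and a candidate is one level per type.
N_TYPES, N_LEVELS = 3, 5  # e.g. (blur, noise, hue shift); illustrative only

def evaluate(levels):
    """Stand-in for steering accuracy on data degraded at `levels`.
    A real system would degrade a validation batch and run the model;
    here a toy linear penalty makes higher severities harder."""
    weights = np.array([0.08, 0.05, 0.03])
    return 1.0 - float(weights @ np.asarray(levels))

def adversarial_levels(generations=30, pop=8, seed=0):
    """Evolutionary search for the degradation combination that
    (approximately) minimizes task performance."""
    rng = np.random.default_rng(seed)
    best = rng.integers(0, N_LEVELS, size=N_TYPES)
    best_score = evaluate(best)
    for _ in range(generations):
        for _ in range(pop):
            cand = best.copy()
            i = rng.integers(N_TYPES)           # mutate one degradation type
            cand[i] = rng.integers(N_LEVELS)    # to a random severity level
            score = evaluate(cand)
            if score < best_score:              # keep the worst-performing
                best, best_score = cand, score  # combination found so far
    return best, best_score
```

The returned combination would then be used to generate the adversarially degraded images for the next round of training.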

2. RELATED WORK

The influence of noise and distortion effects in real images on learning tasks has been explored over the last five years. For example, researchers have examined the impact of optical blur on convolutional neural networks and presented a fine-tuning method for recovering lost accuracy using blurred images (Vasiljevic et al., 2016). A similar fine-tuning method recovered accuracy lost to image distortions other than blur (Zhou et al., 2017). While these fine-tuning methods are promising, Dodge & Karam (2017b) found that tuning to one type of image quality reduction causes poor generalization to other types of quality reduction. Dodge & Karam (2017a) compared the image classification performance of deep neural networks and humans and found it to be similar on images of good quality; however, deep neural networks struggle significantly more than humans on low-quality, distorted, or noisy images. Color spaces have also been shown to affect the robustness of learning models: it was observed that perturbations degrade learning performance more in the Y channel of the YCbCr color space, which is analogous to the YUV color space (Pestana et al., 2020). Another work (Wu et al., 2020) studies the effect of Instagram filters, which mostly change the coloring of an image, on learning tasks. In this work, we study nine common factors characterizing image quality, i.e., blur, noise, distortion, the three color (Red-Green-Blue, or RGB) channels, and hue, saturation, and intensity values (HSV). Not only does our study analyze a more comprehensive set of image attributes that could influence the learning-based steering task, but we also parameterize these nine factors into one integrated image-quality space using the Fréchet Inception Distance as the unifying metric, thus enabling us to conduct sensitivity analysis.
Researchers have also explored how to improve the robustness of learning algorithms under various image quality degradations. One recent work (Tran et al., 2017) provides a novel Bayesian formulation for data augmentation. Cubuk et al. (2018) propose an approach to automatically search for improved data augmentation policies. Ghosh et al. (2018) analyze the performance of convolutional neural networks under quality degradations due to common causes such as compression loss, noise, blur, and contrast, and introduce a method to improve the learning outcomes using

