EyeDAS: Securing Perception of Autonomous Cars Against the Stereoblindness Syndrome

Abstract

The ability to detect whether an object is a 2D or 3D object is extremely important in autonomous driving, since a detection error can have life-threatening consequences, endangering the safety of the driver, passengers, pedestrians, and others on the road. Methods proposed to distinguish between 2D and 3D objects (e.g., liveness detection methods) are not suitable for autonomous driving, because they are object dependent or do not consider the constraints associated with autonomous driving (e.g., the need for real-time decision-making while the vehicle is moving). In this paper, we present EyeDAS, a novel few-shot learning-based method aimed at securing an object detector (OD) against the threat posed by the stereoblindness syndrome (i.e., the inability to distinguish between 2D and 3D objects). We evaluate EyeDAS's real-time performance using 2,000 objects extracted from seven YouTube video recordings of street views taken by a dash cam from the driver's seat perspective. When applying EyeDAS to seven state-of-the-art ODs as a countermeasure, EyeDAS was able to reduce the 2D misclassification rate from 71.42-100% to 2.4% with a 3D misclassification rate of 0% (TPR of 1.0). EyeDAS also outperforms the baseline method, achieving an AUC of over 0.999.

1. Introduction

After years of research and development, automobile technology is rapidly approaching the point at which human drivers can be replaced, as commercial cars are now capable of supporting semi-autonomous driving. To create a reality that consists of commercial semi-autonomous cars, scientists had to develop the computerized driver intelligence required to: (1) continuously create a virtual perception of the physical surroundings (e.g., detect pedestrians, road signs, cars, etc.), (2) make decisions, and (3) perform the corresponding action (e.g., notify the driver, turn the wheel, stop the car). While computerized driver intelligence has brought semi-autonomous driving to new heights in terms of safety (1), recent incidents have shown that semi-autonomous cars suffer from the stereoblindness syndrome: they react to 2D objects as if they were 3D objects due to their inability to distinguish between these two types of objects. This fact threatens autonomous car safety, because a 2D object (e.g., an image of a car, dog, or person) on a nearby advertisement that is misdetected as a real object can trigger a reaction from a semi-autonomous car (e.g., cause it to stop in the middle of the road), as shown in Fig. 1. Such undesired reactions may endanger drivers, passengers, and nearby pedestrians as well. As a result, there is a need to secure semi-autonomous cars against the perceptual challenge caused by the stereoblindness syndrome.

The perceptual challenge caused by the stereoblindness syndrome stems from object detectors' (which obtain data from cars' video cameras) misclassification of 2D objects. One might argue that the stereoblindness syndrome can be addressed by adopting a sensor fusion approach: by cross-correlating data from the video cameras with data obtained by sensors aimed at detecting depth (e.g., ultrasonic sensors, radar).
However, due to safety concerns, a "safety first" policy is implemented in autonomous vehicles, which causes them to consider a detected object as a real object even when it is detected by a single sensor without additional validation from another sensor (2; 3). This is also demonstrated in Fig. 1, which shows how Teslas misdetect 2D objects as real objects. In addition, while various methods have used liveness detection algorithms to detect whether an object is 2D or 3D (4; 5; 6), the proposed methods do not provide the functionality required to distinguish between 2D and 3D objects in an autonomous driving setup, because they are object dependent (they cannot generalize between different objects, e.g., cars and pedestrians) and do not take into account the real-time constraints associated with autonomous driving. As a result, there is a need for dedicated functionality that validates the detections of video camera based object detectors and considers the constraints of autonomous driving.
In this paper, we present EyeDAS, a committee of models that validates objects detected by the on-board object detector. EyeDAS aims to secure a single-channel object detector that obtains data from a video camera and provides a solution to the stereoblindness syndrome, i.e., it distinguishes between 2D and 3D objects while taking the constraints of autonomous driving (both safety and real-time constraints) into account. EyeDAS can be deployed on existing advanced driver-assistance systems (ADASs) without the need for additional sensors. EyeDAS is based on few-shot learning and consists of four lightweight unsupervised models, each of which utilizes a unique feature extraction method and outputs a 3D confidence score. Finally, a meta-classifier uses the output of the four models to determine whether the given object is a 2D or 3D object.

We evaluate EyeDAS using a dataset collected from seven YouTube video recordings of street views taken by a dash cam from the driver's seat perspective; the 2D objects in the dataset were extracted from various billboards that appear in the videos. When applying EyeDAS to seven state-of-the-art ODs as a countermeasure, EyeDAS was able to reduce the 2D misclassification rate from 71.42-100% to 2.4% with a 3D misclassification rate of 0% (TPR of 1.0). We also show that EyeDAS outperforms the baseline method and achieves an AUC of over 0.999.

In this research we make the following contributions: (1) we present a practical method for securing object detectors against the stereoblindness syndrome that meets the constraints of autonomous driving (safety and real-time constraints), and (2) we show that the method can be applied using few-shot learning, can be used to detect whether an inanimate object is a 2D or 3D object (i.e., it distinguishes a real car from an advertisement containing an image of a car), and can generalize to different types of objects and between cities.
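The committee-of-experts structure described above can be sketched as follows. This is a minimal illustration only: the expert names, scoring functions, averaging meta-classifier, and 0.5 decision threshold are placeholders, not the feature extractors or trained meta-classifier actually used by EyeDAS.

```python
from typing import Callable, List


class Expert:
    """One unsupervised expert: maps a sequence of cropped object frames
    to a 3D confidence score in [0, 1]. The scoring function is a
    placeholder for one of the four feature-extraction methods."""

    def __init__(self, name: str, score_fn: Callable[[List[str]], float]):
        self.name = name
        self.score_fn = score_fn

    def score(self, frames: List[str]) -> float:
        return self.score_fn(frames)


class Committee:
    """Combines the experts' confidence scores via a meta-classifier
    and outputs a final 2D/3D decision."""

    def __init__(self, experts: List[Expert],
                 meta: Callable[[List[float]], float],
                 threshold: float = 0.5):  # threshold is an assumption
        self.experts = experts
        self.meta = meta
        self.threshold = threshold

    def is_3d(self, frames: List[str]) -> bool:
        scores = [e.score(frames) for e in self.experts]
        return self.meta(scores) >= self.threshold


# Toy experts returning fixed confidences, purely for illustration.
experts = [Expert(name, lambda frames, c=conf: c)
           for name, conf in [("expert_a", 0.9), ("expert_b", 0.8),
                              ("expert_c", 0.7), ("expert_d", 0.95)]]

# Simple averaging meta-classifier stands in for the learned one.
committee = Committee(experts, meta=lambda s: sum(s) / len(s))
print(committee.is_3d(["crop_t0", "crop_t1"]))  # True (mean 0.8375 >= 0.5)
```

The key design point this sketch reflects is that each expert scores the object independently, so a lightweight meta-classifier can fuse the four scores into one decision under real-time constraints.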
The remainder of this paper is structured as follows: In Section 2, we review related work. In Section 3, we present EyeDAS and explain its architecture, design considerations, and each expert in the committee of models. In Section 4, we evaluate EyeDAS's performance under the constraints of autonomous driving, based on various YouTube video recordings taken by a dash cam in several places around the world. In Section 5, we discuss the limitations of EyeDAS, and in Section 6, we present a summary.



Figure 1: Two well-known incidents that demonstrate how Teslas misdetect 2D objects as real objects.

