EXPLOITING PLAYBACKS IN UNSUPERVISED DOMAIN ADAPTATION FOR 3D OBJECT DETECTION

Anonymous

Abstract

Self-driving cars must detect other vehicles and pedestrians in 3D to plan safe routes and avoid collisions. State-of-the-art 3D object detectors, based on deep learning, have shown promising accuracy but are prone to overfit to domain idiosyncrasies, causing them to fail in new environments, a serious problem if autonomous vehicles are meant to operate freely. In this paper, we propose a novel learning approach that drastically reduces this gap by fine-tuning the detector on pseudo-labels in the target domain, which our method generates while the vehicle is parked, based on replays of previously recorded driving sequences. In these replays, objects are tracked over time and detections are interpolated and extrapolated; crucially, this leverages future information to catch hard cases. We show, on five autonomous driving datasets, that fine-tuning the detector on these pseudo-labels substantially reduces the domain gap to new driving environments, yielding drastic improvements in accuracy and detection reliability.

1. INTRODUCTION

One of the fundamental learning problems in the context of self-driving cars is the detection and localization of other traffic participants, such as cars, cyclists, and pedestrians, in 3D. Typically, the input consists of LiDAR or pseudo-LiDAR (Wang et al., 2019b) point clouds (sometimes with accompanying images), and the outputs are sets of tight 3D bounding boxes that envelop the detected objects. The problem is particularly challenging because the predictions must be highly accurate, reliable, and, importantly, made in real time. The current state of the art in 3D object detection is based on deep learning approaches (Qi et al., 2018; Shi et al., 2019; Yang et al., 2018; Shi et al., 2020), trained on short driving segments with labeled bounding boxes (Geiger et al., 2012; 2013), which yield up to 80% average precision on held-out segments (Shi et al., 2020). However, as with all machine learning, these techniques succeed when the training data distribution matches the test data distribution.

One possibility to ensure train/test consistency is to constrain self-driving cars to a small geo-fenced area. Here, a fleet of self-driving taxis might together collect accurate training data with exhaustive coverage so that the accuracy of the system is guaranteed. This, however, is fundamentally limiting. Ultimately, one would like to allow self-driving cars to be driven freely anywhere, similar to a human-driven car. This unconstrained scenario introduces an inherent adaptation problem: the car producer cannot foresee where the owner will ultimately operate the car. The perception system might be trained on urban roads in Germany (Geiger et al., 2013; 2012), but the car may be driven in mountainous regions of the USA, where other cars may be larger and fewer, the roads may be snowy, and buildings may look different. Past work has shown that such differences can cause a >35% drop in the accuracy of extant systems (Wang et al., 2020).
Closing this adaptation gap is one of the biggest remaining challenges for freely operating self-driving vehicles. Car owners, however, are likely to spend most of their driving time on similar routes (commuting to work, grocery stores, etc.), and leave their cars parked (e.g., at night) for extended amounts of time. This raises an intriguing possibility: the car can collect training data on these frequent trips, then retrain itself while offline to adapt to this new environment for subsequent online driving. Unfortunately, the data the car collects are unlabeled. The challenge is thus one of unsupervised domain adaptation (Gong et al., 2012): the detection system, having been previously trained on labeled data from a source domain, must now adapt to a target domain where only unlabeled data are available.
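To make the offline replay idea concrete: once per-frame detections have been linked into a track, missed detections can be filled in by interpolating between confident detections and extrapolating beyond the first and last ones, exploiting both past and future frames of the recorded sequence. The following is a minimal sketch of that densification step for a single tracked object, reduced to box centers; the function name, the dict-based track representation, and the constant-velocity extrapolation are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def densify_track(detections, frames):
    """Produce a pseudo-label center for every frame of a replay.

    detections: dict mapping frame index -> box center (confident detections,
                at least two of them), e.g. {2: [0.0, 0.0], 4: [2.0, 0.0]}
    frames:     all frame indices for which pseudo-labels are wanted
    Missing frames between detections are linearly interpolated; frames before
    the first / after the last detection are extrapolated with the velocity of
    the nearest segment (a simple constant-velocity assumption).
    """
    keys = sorted(detections)
    t = np.array(keys, dtype=float)                       # detection times
    c = np.array([detections[k] for k in keys], float)    # (N, D) centers
    out = {}
    for f in frames:
        if f in detections:
            out[f] = np.asarray(detections[f], dtype=float)
        elif f < t[0]:
            # extrapolate backward using the first segment's velocity
            v = (c[1] - c[0]) / (t[1] - t[0])
            out[f] = c[0] + v * (f - t[0])
        elif f > t[-1]:
            # extrapolate forward using the last segment's velocity
            v = (c[-1] - c[-2]) / (t[-1] - t[-2])
            out[f] = c[-1] + v * (f - t[-1])
        else:
            # interpolate each coordinate independently between detections
            out[f] = np.array([np.interp(f, t, c[:, d])
                               for d in range(c.shape[1])])
    return out
```

Because the whole sequence is available offline, the extrapolation can run both backward and forward in time, which is precisely how future information helps recover hard cases (e.g., an object only detected once it comes close).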

