OOD-ODBENCH: AN OBJECT DETECTION BENCHMARK FOR OOD GENERALIZATION ALGORITHMS

Abstract

The prevailing assumption in machine learning tasks, such as object detection, is that the test data are drawn from the same distribution as the training data, a setting known as IID (Independent and Identically Distributed). In real practice, however, models inevitably confront OOD (Out-of-Distribution) scenarios, and deploying an object detection algorithm without understanding its OOD generalization performance is risky. On the other hand, a plethora of OOD generalization algorithms has been proposed to close the gap between the in-house and open-world performance of machine learning systems. Their effectiveness, however, has only been demonstrated on image classification tasks; how these algorithms perform on more complex tasks remains an open question. In this paper, we first specify the setting of OOD-OD (OOD generalization object detection). We then propose OOD-ODBench, a suite of four OOD-OD benchmark datasets for evaluating various object detection and OOD generalization algorithms. Through extensive experiments on OOD-ODBench, we find that existing OOD generalization algorithms fail dramatically when applied to the more complex object detection task. This raises questions about the current progress of a large number of these algorithms and whether they can be effective in practice beyond simple toy examples. We sincerely hope that OOD-ODBench can serve as a foothold for future research on OOD generalization object detection.

1. INTRODUCTION

Modern object detection methods (Liu et al., 2021; Huang et al., 2019; Pang et al., 2019; Wu et al., 2019; Zhang et al., 2020a; Sun et al., 2020; Zhu et al., 2021; Ge et al., 2021) have made substantial progress on various applications, such as autonomous driving and industrial defect detection. Tremendous effort has been devoted to improving detector performance on standard datasets such as MS-COCO (Lin et al., 2014). While these efforts have had an impact on industry (Redmon et al., 2016; Redmon & Farhadi, 2017; 2018; Bochkovskiy et al., 2020; Ge et al., 2021), the improvements have recently become marginal, and most achievements rest on an inherent assumption, i.e., that the training data and the test data are IID (Independent and Identically Distributed). This assumption is unlikely to hold in real-world scenarios. For example, an autonomous driving system suffers under changing environmental conditions (Dai & Gool, 2018; Volk et al., 2019); a medical system fails to work consistently across hospitals when data are collected from different equipment (de Castro et al., 2019; Albadawy et al., 2018; Perone et al., 2019). As a consequence, models trained on IID datasets are susceptible to subtle shifts in the test data distribution (Out-of-Distribution) and fail to generalize to real scenarios (Torralba & Efros, 2011). Previous research addressing this train-test discrepancy can be summarized as either "less complex" or "complex but not general". From the first perspective, a plethora of Domain Generalization (DG) algorithms (Arjovsky et al., 2019; Ahuja et al., 2021; Li et al., 2018b; Sun & Saenko, 2016; Xu et al., 2020c; Yan et al., 2020; Krueger et al., 2021; Pezeshki et al., 2020; Parascandolo et al., 2021; Koyama & Yamaguchi, 2021; Huang et al., 2020; Sagawa et al., 2019) concentrate on improving OOD generalization ability, but they are evaluated only on image classification; their effectiveness on the more complex object detection task is unknown. From the second perspective, numerous Domain Adaptation (DA) algorithms (Chen et al., 2018; He & Zhang, 2020; Rodriguez & Mikolajczyk, 2019; Xu et al., 2020a; Su et al., 2020; Xu et al., 2020b; Soviany et al., 2019; Deng et al., 2020; Chen et al., 2021) aim to build an optimal object detector that can be generalized into

