TIB: DETECTING UNKNOWN OBJECTS VIA TWO-STREAM INFORMATION BOTTLENECK

Abstract

Detecting diverse objects, including those never seen during model training, is critical for the safe deployment of object detectors. To this end, the task of unsupervised out-of-distribution object detection (OOD-OD) has been proposed to detect unknown objects without relying on an auxiliary dataset. For this task, it is important to reduce the impact of lacking unknown data for supervision and to leverage in-distribution (ID) data to improve the model's discrimination ability. In this paper, we propose a Two-Stream Information Bottleneck (TIB) method, which consists of a standard Information Bottleneck (IB) and a dedicated Reverse Information Bottleneck (RIB). Specifically, after extracting the features of an ID image, we first define a standard IB network to disentangle instance representations that are beneficial for localizing and recognizing objects. Meanwhile, we present RIB to obtain simulative OOD features that alleviate the impact of lacking unknown data. Unlike the standard IB, which aims to extract task-relevant compact representations, RIB obtains task-irrelevant representations by reversing the optimization objective of the standard IB. Next, to further enhance discrimination ability, a mixture of information bottlenecks is designed to sufficiently capture object-related information. In the experiments, our method is evaluated on OOD-OD and incremental object detection. The significant performance gains over baselines demonstrate the superiority of our method.

1. INTRODUCTION

With the resurgence of deep neural networks, many advances in object detection Ren et al. (2015); Redmon et al. (2016); Carion et al. (2020); Chen et al. (2022) have been achieved. Most existing methods follow a closed-set assumption that the training and testing processes share the same category space. However, practical scenarios are open and filled with unknown objects, presenting significant challenges for object detectors trained under the closed-set assumption. To this end, the task of unsupervised out-of-distribution object detection (OOD-OD) Du et al. (2022b) has recently been proposed, whose goal is to accurately detect objects never seen during training without accessing any auxiliary data. Addressing this task helps promote the safe deployment of object detectors in real scenes, e.g., autonomous driving.

The main challenge of unsupervised OOD-OD is the lack of supervision signals from OOD data during training Du et al. (2022b). In particular, as shown in the left part of Fig. 1, an object detector is typically optimized only on in-distribution (ID) data. During inference, the detector can accurately localize and recognize ID objects but easily produces overconfident incorrect predictions for OOD objects. The reason is that the detector cannot learn a clear discrimination boundary between ID and OOD objects when no OOD data is available for supervision. Thus, one feasible solution for this task is to extract simulative OOD data from the ID data and use it to improve the discrimination ability of the object detector. To obtain simulative OOD data, it is common to leverage generative methods, e.g., generative adversarial networks Lee et al. (2018a) and mixup Zhang et al. (2018), to synthesize OOD images.
Though these methods have been demonstrated to be effective, synthesizing a large number of images increases computational costs. Meanwhile, it is difficult for synthesized images to cover the overall object space, which may weaken the discrimination performance for certain unknown objects.
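To make the generative baseline concrete, the mixup-style synthesis mentioned above can be sketched as follows. This is a generic illustration of mixup (Zhang et al., 2018) applied to produce crude simulative OOD samples from pairs of ID images, not the method proposed in this paper; the function name and the choice of the Beta parameter are our own assumptions.

```python
import numpy as np

def mixup_images(x1, x2, alpha=1.0, rng=None):
    """Blend two ID images with a Beta-sampled coefficient (mixup).

    The convex combination lies off the ID image manifold and can serve
    as a crude simulative OOD sample. `alpha` controls the Beta(alpha,
    alpha) distribution of the mixing coefficient; alpha=1.0 gives a
    uniform coefficient in [0, 1].
    """
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)  # mixing coefficient in [0, 1]
    return lam * x1 + (1.0 - lam) * x2
```

Because each synthesized sample is a pointwise interpolation of only two ID images, a very large number of such mixtures is needed to approximate the space of unknown objects, which is exactly the cost and coverage limitation noted above.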

