TIB: DETECTING UNKNOWN OBJECTS VIA TWO-STREAM INFORMATION BOTTLENECK

Abstract

Detecting diverse objects, including those never seen during model training, is critical for the safe application of object detectors. To this end, the task of unsupervised out-of-distribution object detection (OOD-OD) has been proposed, which aims to detect unknown objects without relying on an auxiliary dataset. For this task, it is important to reduce the impact of lacking unknown data for supervision and to leverage in-distribution (ID) data to improve the model's discrimination ability. In this paper, we propose the Two-Stream Information Bottleneck (TIB), which consists of a standard Information Bottleneck (IB) and a dedicated Reverse Information Bottleneck (RIB). Specifically, after extracting the features of an ID image, we first define a standard IB network to disentangle instance representations that are beneficial for localizing and recognizing objects. Meanwhile, we present RIB to obtain simulative OOD features that alleviate the impact of lacking unknown data. Unlike the standard IB, which aims to extract compact task-relevant representations, RIB obtains task-irrelevant representations by reversing the optimization objective of the standard IB. Next, to further enhance the discrimination ability, a mixture of information bottlenecks is designed to sufficiently capture object-related information. In the experiments, our method is evaluated on OOD-OD and incremental object detection. The significant performance gains over baselines demonstrate the effectiveness of our method.

1. INTRODUCTION

With the resurgence of deep neural networks, object detection has advanced considerably Ren et al. (2015); Redmon et al. (2016); Carion et al. (2020); Chen et al. (2022). Most existing methods follow a closed-set assumption, i.e., the training and testing processes share the same category space. However, practical scenarios are open and filled with unknown objects, which presents significant challenges for object detectors trained under the closed-set assumption. To this end, the task of unsupervised out-of-distribution object detection (OOD-OD) Du et al. (2022b) was recently proposed, whose goal is to accurately detect objects never seen during training without accessing any auxiliary data. Addressing this task helps promote the safe deployment of object detectors in real scenes, e.g., autonomous driving.
The main challenge of unsupervised OOD-OD is the lack of supervision signals from OOD data during training Du et al. (2022b). In particular, as shown in the left part of Fig. 1, an object detector is typically optimized only on in-distribution (ID) data. During inference, the detector can accurately localize and recognize ID objects but easily produces overconfident incorrect predictions for OOD objects. The reason is that, without OOD data for supervision, the object detector cannot learn a clear decision boundary between ID objects and OOD objects. Thus, one feasible solution for this task is to extract simulative OOD data from the ID data and use it to improve the discrimination ability of the object detector. To obtain simulative OOD data, it is common to leverage generative methods, e.g., generative adversarial networks Lee et al. (2018a) and mixup Zhang et al. (2018), to synthesize OOD images. Though these methods have been demonstrated to be effective, using a large number of synthesized images may increase computational costs.
Meanwhile, it is difficult for synthesized images to cover the entire object space, which may weaken the discrimination performance for certain unknown objects.
2017) is defined to decompose an Instance map from the backbone representations, which is instrumental in localizing and recognizing objects accurately. Besides, the standard IB aims to extract maximally compressed features of the input while preserving as much task-relevant information as possible Lee et al. (2021). In contrast, OOD features can be regarded as irrelevant to the current task. Thus, we present RIB to obtain an OOD map, which is used to extract task-irrelevant representations by reversing the optimization objective of the standard IB. Concretely, by maximizing the discrepancy between the predictions from the Instance map and those from the OOD map, while simultaneously minimizing the classification loss, the OOD map is encouraged to contain plentiful object-irrelevant information. This is beneficial for extracting simulative OOD features and improves the discrimination ability.
Furthermore, recent research Schulz et al. (2020) has shown that IB is an effective mechanism for capturing object information. Inspired by this idea, we design a mixture of information bottlenecks to purify object-related information from multiple different facets. By combining this information, the discrimination ability can be further enhanced.
In the experiments, our method is evaluated separately on OOD-OD and incremental object detection Kj et al. (2021). Extensive experimental results demonstrate the effectiveness of our method. The contributions of our work are summarized as follows:
• We propose the Two-Stream Information Bottleneck, consisting of a standard IB and a dedicated RIB. In particular, RIB aims to obtain simulative OOD features by maximizing the prediction discrepancy between ID features and OOD features, which reduces the impact of lacking unknown data for supervision.
• We design a mixture of information bottlenecks to purify object-related information from multiple different facets, which enhances the object-related information in the features used for classification and improves detection performance.
• Experimental results show that our method effectively improves the performance of OOD-OD and incremental object detection. In particular, on PASCAL VOC Everingham et al. (2010), our method reduces FPR95 by around 10.42% compared with the baseline method Du et al. (2022b).
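For readers unfamiliar with the FPR95 metric quoted in the last contribution, the following is a minimal sketch of how it is commonly computed: the fraction of OOD samples that are still accepted as ID at the score threshold where 95% of ID samples are accepted. The function name `fpr_at_95_tpr` and the convention that a higher score means "more ID-like" are assumptions made for illustration; this section does not specify the scoring function used in the paper.

```python
import numpy as np


def fpr_at_95_tpr(id_scores, ood_scores):
    """False positive rate on OOD samples at 95% true positive rate on ID samples.

    Assumed convention (not from the paper): higher score = more ID-like.
    """
    id_scores = np.asarray(id_scores, dtype=float)
    ood_scores = np.asarray(ood_scores, dtype=float)
    # Threshold at the 5th percentile of ID scores, so that roughly 95% of
    # ID samples score at or above it and are accepted as ID.
    threshold = np.percentile(id_scores, 5)
    # OOD samples at or above the threshold are wrongly accepted as ID.
    return float(np.mean(ood_scores >= threshold))
```

A lower value is better: it means fewer unknown objects are mistaken for known ones when the detector is tuned to retain 95% of ID objects.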



Figure 1: Two-Stream Information Bottleneck for OOD-OD. 'RPN' is the Region-Proposal Network with RoI Alignment. The green boxes are OOD objects. The red and black lines indicate, respectively, the decision boundary between ID and OOD objects and the decision boundary between ID objects belonging to different categories. Due to the lack of unknown data for supervision, the traditional object detector cannot distinguish ID objects from OOD objects effectively. Our method aims to generate simulative OOD features by maximizing the prediction discrepancy between the features extracted by the IB module and those extracted by the RIB module, which enhances the discrimination ability.
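The objective described in the caption, maximizing the prediction discrepancy between the two streams while keeping the classification loss low, can be illustrated with a small sketch. This is not the authors' implementation: the discrepancy measure (mean absolute difference between softmax outputs), the trade-off parameter `weight`, and all function names are assumptions chosen for illustration, since this section does not give the exact loss.

```python
import numpy as np


def softmax(logits):
    # Subtract the row-wise max for numerical stability.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cross_entropy(logits, labels):
    # Mean negative log-likelihood of the true class.
    p = softmax(logits)
    return float(-np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12)))


def prediction_discrepancy(logits_id, logits_ood):
    # Assumed measure: mean absolute difference between class probabilities
    # predicted from the IB (instance) stream and the RIB (OOD) stream.
    return float(np.mean(np.abs(softmax(logits_id) - softmax(logits_ood))))


def two_stream_loss(logits_id, logits_ood, labels, weight=1.0):
    # Minimizing this value lowers the classification loss on the instance
    # stream and raises the discrepancy between the two streams.
    # `weight` is a hypothetical trade-off parameter.
    return cross_entropy(logits_id, labels) - weight * prediction_discrepancy(
        logits_id, logits_ood
    )
```

Under these assumptions, the loss equals the plain classification loss when both streams predict identically, and it decreases as the OOD-stream predictions move away from the instance-stream predictions.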

