UNBIASED TEACHER FOR SEMI-SUPERVISED OBJECT DETECTION

Abstract

Semi-supervised learning, i.e., training networks with both labeled and unlabeled data, has made significant progress recently. However, existing works have primarily focused on image classification tasks and neglected object detection, which requires more annotation effort. In this work, we revisit Semi-Supervised Object Detection (SS-OD) and identify the pseudo-labeling bias issue in SS-OD. To address this, we introduce Unbiased Teacher, a simple yet effective approach that jointly trains a student and a gradually progressing teacher in a mutually-beneficial manner. Together with a class-balance loss to downweight overly confident pseudo-labels, Unbiased Teacher consistently improves state-of-the-art methods by significant margins on the COCO-standard, COCO-additional, and VOC benchmarks. Specifically, Unbiased Teacher achieves a 6.8 absolute mAP improvement over the state-of-the-art method when using 1% of the labeled data on MS-COCO, and around 10 mAP improvements over the supervised baseline when using only 0.5, 1, or 2% of the labeled data on MS-COCO.

1. INTRODUCTION

The availability of large-scale datasets and computational resources has allowed deep neural networks to achieve strong performance on a wide variety of tasks. However, training these networks requires a large number of labeled examples that are expensive to annotate and acquire. As an alternative, Semi-Supervised Learning (SSL) methods have received growing attention (Sohn et al., 2020a; Berthelot et al., 2020; 2019; Laine & Aila, 2017; Tarvainen & Valpola, 2017; Sajjadi et al., 2016; Lee, 2013; Grandvalet & Bengio, 2005). Yet, these advances have primarily focused on image classification, rather than object detection, where bounding box annotations require more effort.

In this work, we revisit object detection under the SSL setting (Figure 1): an object detector is trained on a single dataset where only a small amount of labeled bounding boxes and a large amount of unlabeled data are provided, or an object detector is jointly trained with a large labeled dataset as well as a large external unlabeled dataset.

A straightforward way to address Semi-Supervised Object Detection (SS-OD) is to adapt existing advanced semi-supervised image classification methods (Sohn et al., 2020a). Unfortunately, object detection has some unique characteristics that interact poorly with such methods. For example, the class imbalance inherent in object detection impedes the use of pseudo-labeling: there exists both foreground-background imbalance and imbalance among the foreground classes (see Section 3.3). These imbalances make models trained in SSL settings prone to generating biased predictions.
Pseudo-labeling, one of the most successful SSL methods in image classification (Lee, 2013; Sohn et al., 2020a), is particularly prone to such bias in object detection. To overcome these issues, we propose a general framework, Unbiased Teacher: an approach that jointly trains a Student and a slowly progressing Teacher in a mutually-beneficial manner, in which the Teacher generates pseudo-labels to train the Student, and the Student gradually updates the Teacher via Exponential Moving Average (EMA), while the Teacher and the Student are given differently augmented input images (see Figure 3). Inside this framework, (i) we utilize the pseudo-labels as explicit supervision for both the RPN and the ROIhead and thus alleviate the overfitting issues in both modules; (ii) we prevent the detrimental effects of noisy pseudo-labels by exploiting the Teacher-Student dual models (see further discussion and analysis in Section 4.2); and (iii) with the use of EMA training and the Focal loss (Lin et al., 2017b), we address the pseudo-labeling bias caused by class imbalance and thus improve the quality of the pseudo-labels. As a result, our object detector achieves significant performance improvements.

We benchmark Unbiased Teacher under the SSL setting using the MS-COCO and PASCAL VOC datasets, namely COCO-standard, COCO-additional, and VOC. When using only 1% of the labeled data from MS-COCO (COCO-standard), Unbiased Teacher achieves a 6.8 absolute mAP improvement over the state-of-the-art method, STAC (Sohn et al., 2020b), and consistently achieves around 10 absolute mAP improvements over the supervised baseline when using only 0.5, 1, 2, or 5% of the labeled data.

We highlight the contributions of this paper as follows:

• By analyzing object detectors trained with limited supervision, we identify that the class imbalance inherent in object detection impedes the effectiveness of pseudo-labeling on the SS-OD task.
• We thus propose a simple yet effective method, Unbiased Teacher, to address the pseudo-labeling bias issue caused by the class imbalance in ground-truth labels and the overfitting issue caused by the scarcity of labeled data.

• Our Unbiased Teacher achieves state-of-the-art performance on SS-OD across the COCO-standard, COCO-additional, and VOC datasets. We also provide an ablation study to verify the effectiveness of each proposed component.
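To make the class-balance component above concrete, the Focal loss (Lin et al., 2017b) downweights the loss contribution of already-confident predictions. The scalar sketch below is ours for illustration only; the function name, the value of gamma, and the example probabilities are not taken from the paper's implementation:

```python
import math

def focal_loss(p, gamma=2.0):
    """Focal loss for the probability p assigned to the true class:
    FL(p) = -(1 - p)^gamma * log(p).  Confident predictions (p near 1)
    are downweighted relative to standard cross-entropy -log(p)."""
    return -((1.0 - p) ** gamma) * math.log(p)

# A very confident prediction contributes almost nothing, while an
# uncertain one keeps most of its cross-entropy loss:
for p in (0.95, 0.5):
    print(f"p={p}: CE={-math.log(p):.4f}  FL={focal_loss(p):.4f}")
# FL(0.5) = 0.25 * CE(0.5), i.e. about 0.1733
```

With gamma = 0 the expression reduces to plain cross-entropy, so gamma directly controls how aggressively confident (typically dominant-class) predictions are suppressed.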

2. RELATED WORKS

Semi-Supervised Learning. The majority of recent SSL methods typically consist of (1) input augmentations and perturbations, and (2) consistency regularization. They regularize the model to be invariant and robust to certain augmentations of the input, which requires the outputs given the original and augmented inputs to be consistent. For example, existing approaches apply conventional data augmentations (Berthelot et al., 2019; Laine & Aila, 2017; Sajjadi et al., 2016; Tarvainen &

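The consistency-regularization idea described above can be sketched as follows. Everything here is a toy stand-in of our own making: `predict` plays the role of a trained model and `augment` the role of a label-preserving input augmentation; a real implementation would compare network outputs, not scalars:

```python
import random

def consistency_loss(predict, x, augment, n_samples=4):
    """Mean squared difference between the prediction on an input and the
    predictions on randomly augmented versions of it (consistency
    regularization): the model is penalized for not being invariant."""
    y = predict(x)
    losses = []
    for _ in range(n_samples):
        y_aug = predict(augment(x))
        losses.append((y - y_aug) ** 2)
    return sum(losses) / n_samples

# Toy model that is invariant to integer shifts, paired with an
# integer-shift augmentation: the consistency loss is exactly zero.
predict = lambda x: x % 1.0
augment = lambda x: x + random.randint(-3, 3)
print(consistency_loss(predict, 0.25, augment))  # 0.0
```

A non-invariant model would incur a positive loss here, which is the signal these SSL methods backpropagate on unlabeled inputs.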


Note that there have been many works that leverage EMA, e.g., ADAM optimization (Kingma & Ba, 2015), Batch Normalization (Ioffe & Szegedy, 2015), self-supervised learning (He et al., 2020; Grill et al., 2020), and SSL image classification (Tarvainen & Valpola, 2017). We, for the first time, show its effectiveness in combating class imbalance issues and the detrimental effect of pseudo-labels for the object detection task.
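The EMA update by which the Student refreshes the Teacher can be sketched as follows. This is a minimal illustration assuming parameters stored as a flat dict of floats; the function name and the smoothing coefficient `alpha` are ours, not values from the paper:

```python
def ema_update(teacher_params, student_params, alpha=0.999):
    """One exponential-moving-average step:
    theta_teacher <- alpha * theta_teacher + (1 - alpha) * theta_student.
    A large alpha makes the Teacher a slowly progressing, temporally
    smoothed copy of the Student."""
    return {
        name: alpha * teacher_params[name] + (1.0 - alpha) * student_params[name]
        for name in teacher_params
    }

# Toy example with alpha=0.5 so the drift is visible: the Teacher moves
# halfway toward the Student on each update.
teacher = {"w": 0.0}
student = {"w": 1.0}
for _ in range(3):
    teacher = ema_update(teacher, student, alpha=0.5)
print(teacher["w"])  # 0.875
```

Because the Teacher averages over many Student states, single noisy gradient steps (e.g., from a bad pseudo-label) have little effect on it, which is why its pseudo-labels are more stable than the Student's own predictions.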



Pseudo-labels may thus be biased towards dominant and overly confident classes (background) while ignoring minor and less confident classes (foreground). As a result, adding biased pseudo-labels into the semi-supervised training aggravates the class-imbalance issue and introduces severe overfitting. As shown in Figure 2, taking a two-stage object detector as an example, there exists heavy overfitting on the foreground/background classification in the RPN and the multi-class classification in the ROIhead (but not on the bounding box regression).

Figure 1: (a) Illustration of semi-supervised object detection, where the model observes a set of labeled data and a set of unlabeled data in the training stage. (b) Our proposed model can efficiently leverage the unlabeled data and perform favorably against the existing semi-supervised object detection works, including CSD (Jeong et al., 2019) and STAC (Sohn et al., 2020b).
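A small sketch of why naive confidence-thresholded pseudo-labeling inherits this bias: boxes are kept only above a confidence threshold, and a detector trained on imbalanced data tends to score dominant classes higher. The detections, class names, scores, and threshold below are fabricated purely for illustration:

```python
def filter_pseudo_labels(detections, tau=0.7):
    """Keep predicted boxes whose confidence exceeds tau, as in
    confidence-thresholded pseudo-labeling."""
    return [d for d in detections if d["score"] > tau]

# Toy detections: a well-represented class ("person") gets high scores,
# while a rare class ("scissors") is predicted with low confidence, so
# thresholding keeps only the dominant class -- the bias described above.
detections = [
    {"cls": "person",   "score": 0.92},
    {"cls": "person",   "score": 0.85},
    {"cls": "scissors", "score": 0.41},
]
kept = filter_pseudo_labels(detections, tau=0.7)
print([d["cls"] for d in kept])  # ['person', 'person']
```

Training on such pseudo-labels feeds the imbalance back into the model, which is the feedback loop the class-balance loss and EMA Teacher are designed to break.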


* Work done partially while interning at Facebook.

