PSEUDOSEG: DESIGNING PSEUDO LABELS FOR SEMANTIC SEGMENTATION

Abstract

Recent advances in semi-supervised learning (SSL) demonstrate that a combination of consistency regularization and pseudo-labeling can effectively improve image classification accuracy in the low-data regime. Compared to classification, semantic segmentation tasks require much more intensive labeling costs. Thus, these tasks greatly benefit from data-efficient training methods. However, the structured outputs in segmentation pose particular difficulties (e.g., in designing pseudo-labeling and augmentation) for applying existing SSL strategies. To address this problem, we present a simple and novel re-design of pseudo-labeling to generate well-calibrated structured pseudo labels for training with unlabeled or weakly-labeled data. Our proposed pseudo-labeling strategy is agnostic to the network structure and can be applied in a one-stage consistency training framework. We demonstrate the effectiveness of the proposed pseudo-labeling strategy in both low-data and high-data regimes. Extensive experiments validate that pseudo labels generated by wisely fusing diverse sources, together with strong data augmentation, are crucial to consistency training for semantic segmentation.

1. INTRODUCTION

Image semantic segmentation is a core computer vision task that has been studied for decades. Compared with other vision tasks, such as image classification and object detection, human annotation of pixel-accurate segmentation is dramatically more expensive. Given sufficient pixel-level labeled training data (i.e., the high-data regime), current state-of-the-art segmentation models (e.g., DeepLabv3+ (Chen et al., 2018)) produce satisfactory segmentation predictions for common practical usage. Recent exploration demonstrates further improvement in high-data regime settings with large-scale data, including self-training (Chen et al., 2020a; Zoph et al., 2020) and backbone pretraining (Zhang et al., 2020a). In contrast, the performance of segmentation models drops significantly given very limited pixel-labeled data (i.e., the low-data regime). Such ineffectiveness in the low-data regime hinders the applicability of segmentation models. Therefore, instead of improving high-data regime segmentation, our work focuses on data-efficient segmentation training that relies on only a small amount of pixel-labeled data and leverages extra unlabeled or weakly annotated (e.g., image-level) data to improve performance, with the aim of narrowing the gap to supervised models trained with fully pixel-labeled data.

Our work is inspired by the recent success of semi-supervised learning (SSL) for image classification, which demonstrates promising performance given very limited labeled data and a sufficient amount of unlabeled data. Successful examples include MeanTeacher (Tarvainen & Valpola, 2017), UDA (Xie et al., 2019), MixMatch (Berthelot et al., 2019b), FeatMatch (Kuo et al., 2020), and FixMatch (Sohn et al., 2020a). One outstanding idea in this type of SSL is consistency training: making predictions consistent among multiple augmented views of the same image.
FixMatch (Sohn et al., 2020a) shows that using high-confidence one-hot pseudo labels obtained from weakly-augmented unlabeled data to train their strongly-augmented counterparts is the key to the success of SSL in image classification.

* Work done during internship at Google Cloud AI Research.
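To make the FixMatch mechanism concrete, the sketch below (a minimal NumPy illustration; the function name, threshold value, and array shapes are our own choices, not from the paper's code) computes hard pseudo labels from predictions on weakly-augmented inputs, keeps only those above a confidence threshold, and applies a cross-entropy loss to the predictions on the strongly-augmented views of the same images:

```python
import numpy as np

def fixmatch_pseudo_label_loss(weak_logits, strong_logits, threshold=0.95):
    """Illustrative FixMatch-style unlabeled loss.

    weak_logits, strong_logits: (N, C) logits for weakly- and
    strongly-augmented views of the same N unlabeled images.
    """
    # Softmax over the weakly-augmented predictions (numerically stable).
    w = weak_logits - weak_logits.max(axis=1, keepdims=True)
    probs = np.exp(w) / np.exp(w).sum(axis=1, keepdims=True)

    conf = probs.max(axis=1)       # per-example confidence
    pseudo = probs.argmax(axis=1)  # hard (one-hot) pseudo labels
    mask = conf >= threshold       # keep only high-confidence labels

    # Cross-entropy of strongly-augmented predictions vs. the pseudo labels,
    # averaged over the confident subset only.
    s = strong_logits - strong_logits.max(axis=1, keepdims=True)
    log_probs = s - np.log(np.exp(s).sum(axis=1, keepdims=True))
    ce = -log_probs[np.arange(len(pseudo)), pseudo]
    return float((ce * mask).sum() / max(mask.sum(), 1))
```

For classification, each example contributes one pseudo label; for segmentation, the same thresholding would have to be applied per pixel over a structured output, which is exactly where the design of well-calibrated pseudo labels becomes non-trivial.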
The source code is available at https://github.com/googleinterns/wss.

