PUSH AND PULL: COMPETING FEATURE-PROTOTYPE INTERACTIONS IMPROVE SEMI-SUPERVISED SEMANTIC SEGMENTATION

Abstract

This paper challenges semi-supervised segmentation with a rethink of the feature-prototype interaction in the classification head. Specifically, we view each weight vector in the classification head as the prototype of a semantic category. The basic practice in the softmax classifier is to pull a feature towards its positive prototype (i.e., the prototype of its class) and to push it away from its negative prototypes. In this paper, we focus on the interaction between a feature and its negative prototypes, which is always "pushing" to make them dissimilar. While the pushing-away interaction is necessary, this paper reveals a new mechanism: the contrary interaction of pulling close negative prototypes is also beneficial. We have two insights for this counter-intuitive interaction: 1) some pseudo negative prototypes might actually be positive, so the pulling interaction can help resist pseudo-label noise, and 2) some true negative prototypes contain beneficial contextual information. Therefore, we integrate these two competing interactions into a Push-and-Pull Learning (PPL) method. On the one hand, PPL introduces the novel pulling-close interaction between features and negative prototypes with a feature-to-prototype attention. On the other hand, PPL reinforces the original pushing-away interaction with multi-prototype contrastive learning. While PPL is very simple, experiments show that it substantially improves semi-supervised segmentation and sets a new state of the art.

1. INTRODUCTION

This paper considers the semi-supervised semantic segmentation task. We focus on an essential component of the segmentation model, i.e., the classification head, which consists of a set of learnable weight vectors. These weight vectors are usually viewed as a set of prototypes representing the corresponding semantic categories. The essential training process is to pull each deep feature towards its positive prototype (i.e., the prototype of its class) and to push it away from its negative prototypes. The "pushing-away" interaction between a feature and its negative prototypes makes them dissimilar to each other and is critical for discriminating different classes. In the remainder of this paper, our discussion focuses on the interaction between features and their negative prototypes and neglects the positive prototypes (unless explicitly pointed out). While this "pushing-away" interaction is necessary, this paper reveals a new mechanism: the contrary interaction of pulling close features and their negative prototypes is also beneficial. This "pulling-close" interaction may seem counter-intuitive at first glance but is actually reasonable from two insights: 1) It brings a task-specific benefit for semi-supervised segmentation by resisting pseudo-label noise. Specifically, the popular pseudo-label-based pipeline is inevitably confronted with pseudo-label noise: some pseudo negative prototypes might actually be positive. Under this condition, the "pulling-close" interaction gives the feature a chance to approach its actually-positive prototype. Experiments confirm that the pulling interaction effectively reduces pseudo-label noise and that this task-specific benefit is the primary reason for our improvement (Section 4.4). 2) It brings a general benefit for both semi-supervised and fully-supervised segmentation, because some negative prototypes contain contextual information.
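To make the standard pushing/pulling interactions concrete, note that they fall directly out of the cross-entropy gradient of a softmax classification head. The NumPy sketch below (our own simplified illustration with hypothetical shapes, not the paper's implementation) shows that a gradient-descent step on a feature moves it toward its (pseudo-)positive prototype and away from every negative prototype:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def feature_grad(features, prototypes, labels):
    """Gradient of pixel-wise cross-entropy w.r.t. each feature.

    features:   (N, D) pixel features
    prototypes: (C, D) classifier weight vectors, one prototype per class
    labels:     (N,)   (pseudo) class labels

    The gradient equals (p - onehot) @ prototypes, so a descent step
    pulls each feature toward its positive prototype (coefficient
    p_y - 1 < 0) and pushes it away from every negative prototype
    (coefficient p_c > 0): the push/pull structure described above.
    """
    probs = softmax(features @ prototypes.T)       # (N, C) class probabilities
    onehot = np.eye(prototypes.shape[0])[labels]   # (N, C) one-hot labels
    return (probs - onehot) @ prototypes           # (N, D) gradient per feature
```

If a pseudo label is wrong, the same formula shows the feature is actively pushed away from its actually-positive prototype, which is the noise the pulling-close interaction is designed to counteract.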
Specifically, since our prototypes (i.e., the weight vectors) are learned from the whole training set through back-propagation, they naturally provide clues for deriving contextual information. Some prior works (Yuan et al., 2020; Jin et al., 2021; Zhou et al., 2022) already show that exploring the contextual information in mean features improves segmentation accuracy. In contrast, we find that directly using the weight vectors for contextual information brings a similar benefit and is relatively simple. Empirically, we show that this general benefit is a secondary reason for our improvement (as detailed in Section 4.4). These two insights motivate us to propose Push-and-Pull Learning (PPL) for semi-supervised segmentation, as shown in Fig. 1. On the one hand, PPL reinforces the basic "pushing negative prototypes" interaction with multi-prototype contrastive learning, i.e., using multiple (instead of one) prototypes to better represent each category. On the other hand, PPL enforces the "pulling negative prototypes" interaction through a feature-to-prototype attention. Specifically, each feature absorbs information from all the prototypes using a standard cross-attention layer.
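The pulling-close interaction via feature-to-prototype attention can be sketched as follows. This is a minimal single-head version without learned Q/K/V projections (a simplification of the standard cross-attention layer the method uses; the temperature `tau` is our hypothetical parameter):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def feature_to_prototype_attention(features, prototypes, tau=1.0):
    """Pulling-close interaction: each feature attends over ALL class
    prototypes (positive and negative alike) and absorbs a convex
    mixture of them, added back residually.

    features:   (N, D) pixel features
    prototypes: (C, D) class prototypes
    tau:        hypothetical softmax temperature
    """
    attn = softmax(features @ prototypes.T / tau)  # (N, C) attention weights
    context = attn @ prototypes                    # (N, D) absorbed prototype info
    return features + context                      # refined features
```

Because the attention weights are a soft distribution over all prototypes, a feature whose pseudo label is wrong can still absorb information from its actually-positive prototype, rather than being strictly pushed away from it.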
Although these two interactions share the same effect of pulling close positive prototypes, we name the resulting method Push-and-Pull Learning to highlight their competing effect on the negative prototypes. Another important advantage of the proposed PPL is that it can be cascaded to enlarge its benefit. After the pulling-close interaction, the features are refined with improved resistance against pseudo-label noise; we therefore re-evaluate their pseudo labels to increase label accuracy. Based on the refined deep features, we may append another round of PPL for further improvement. Extensive experiments on popular benchmarks validate that cascading multiple PPL stages accumulates multiple rounds of improvement (Appendix B.2) and sets a new state of the art (e.g., 77.03% and 74.22% mIoU on PASCAL VOC 2012 using 1/8 and 1/16 labels). Moreover, the ablation study shows that, of the two benefits of PPL, the task-specific benefit is the major reason for its superiority: while the general benefit improves fully-supervised segmentation (the upper bound of semi-supervised segmentation) by a small margin, the benefit of resisting pseudo-label noise largely reduces the gap between the few-label and full-label regimes. To sum up, this paper makes the following contributions:

• We investigate the interactions between features and negative prototypes for semi-supervised segmentation and reveal a novel mechanism: while the standard pushing-away interaction is necessary, the contrary interaction of pulling close negative prototypes is also beneficial.

• We correspondingly propose the Push-and-Pull Learning (PPL) method. PPL combines two competing interactions and brings two benefits, i.e., resisting pseudo-label noise and leveraging contextual information. Moreover, cascading multiple PPL stages accumulates its benefits and thus enlarges the improvement.

• We empirically show that PPL substantially improves semi-supervised segmentation by reducing the accuracy gap between full supervision and semi supervision. The achieved results set a new state of the art on two popular benchmarks.
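The cascading idea above, refine features, then re-evaluate pseudo labels, then repeat, can be illustrated with a toy sketch. Everything here is a hypothetical simplification of ours (shapes, the max-over-prototypes class score, and the attention without learned projections are our assumptions, not the paper's exact design); it also shows multiple prototypes per class:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ppl_stage(features, prototypes, tau=1.0):
    """One hypothetical PPL stage: refine features by attending over all
    prototypes, then re-evaluate pseudo labels on the refined features.

    features:   (N, D) pixel features
    prototypes: (C, K, D) -- K prototypes per class, reflecting
                intra-class diversity; a class's score is taken as the
                max over its K prototypes.
    """
    C, K, D = prototypes.shape
    flat = prototypes.reshape(C * K, D)
    attn = softmax(features @ flat.T / tau)        # attend over all C*K prototypes
    refined = features + attn @ flat               # pulling-close refinement
    scores = (refined @ flat.T).reshape(-1, C, K)  # similarity to every prototype
    pseudo = scores.max(axis=-1).argmax(axis=-1)   # class = best of its K prototypes
    return refined, pseudo

def cascade(features, prototypes, num_stages=2):
    """Cascading PPL stages accumulates rounds of refinement."""
    for _ in range(num_stages):
        features, pseudo = ppl_stage(features, prototypes)
    return features, pseudo
```

Each stage re-evaluates pseudo labels on features that have already absorbed prototype information, which is why multiple stages can accumulate improvement rather than merely repeat the same update.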



Figure 1: Comparison between the standard feature-prototype interaction and our push-and-pull learning. (a) In the standard classification head, the interaction with the negative prototypes is always "pushing-away". (b) This paper reveals that in addition to the pushing-away interaction, a competing pulling-close interaction between features and negative prototypes is also beneficial. Therefore, the proposed push-and-pull learning (PPL) combines the pushing-away and pulling-close interactions for negative prototypes. Moreover, we use multiple prototypes to represent each individual class to better reflect the intra-class diversity, which is also beneficial. Circles and squares represent prototypes and features, respectively. Different classes are in different colors.

