MODELING THE UNCERTAINTY WITH MAXIMUM DISCREPANT STUDENTS FOR SEMI-SUPERVISED 2D POSE ESTIMATION

Anonymous

Abstract

Semi-supervised pose estimation is a practically challenging task in computer vision. Although numerous excellent semi-supervised classification methods have emerged, they typically use confidence to evaluate the quality of pseudo-labels, which is difficult to do in pose estimation. In pose estimation, confidence represents only the likelihood that a position in the heatmap is a keypoint, not the quality of that prediction. In this paper, we propose a simple yet efficient framework that estimates the quality of pseudo-labels in semi-supervised pose estimation by modeling their uncertainty. Concretely, under the dual mean-teacher framework, we construct two maximum discrepant students (MDSs) to effectively push the two teachers to generate different decision boundaries for the same sample. Moreover, we create multiple uncertainties to assess the quality of the pseudo-labels. Experimental results demonstrate that our method improves the performance of semi-supervised pose estimation on three datasets.



However, semi-supervised classification methods, such as Ke et al. (2019), commonly evaluate the quality of generated pseudo-labels based on confidence, a criterion that cannot be applied to semi-supervised pose estimation. In the pose estimation task, confidence can only be used to locate keypoints in the heatmap predicted by the model, not to evaluate the quality of the prediction. As shown in Fig. 1, as the number of epochs increases, the model's prediction confidence for this keypoint gradually increases, but the error of the prediction does not decrease accordingly. This phenomenon indicates that confidence in pose estimation is not directly related to the quality of predictions.
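The gap between confidence and prediction quality can be illustrated with a minimal sketch. It assumes, as is common for heatmap-based models, that the keypoint is decoded as the heatmap argmax and that confidence is the peak value. The helper names (`toy_heatmap`, `decode_keypoint`, `pixel_error`) and all numbers are hypothetical toy values chosen for illustration; they are not the data behind Fig. 1.

```python
import math


def toy_heatmap(size, peak_xy, peak_value):
    """Build a size x size heatmap that is zero except for a single peak (toy data)."""
    heatmap = [[0.0] * size for _ in range(size)]
    x, y = peak_xy
    heatmap[y][x] = peak_value
    return heatmap


def decode_keypoint(heatmap):
    """Return ((x, y), confidence): the argmax location and the peak value."""
    best_value = -math.inf
    best_xy = (0, 0)
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > best_value:
                best_value = value
                best_xy = (x, y)
    return best_xy, best_value


def pixel_error(pred_xy, true_xy):
    """Euclidean distance in pixels between predicted and true keypoint."""
    return math.hypot(pred_xy[0] - true_xy[0], pred_xy[1] - true_xy[1])


true_xy = (3, 5)  # hypothetical ground-truth keypoint

# Two hypothetical predictions for the same keypoint: the later one is
# far more confident, but its peak sits at the same wrong location.
early_xy, early_conf = decode_keypoint(toy_heatmap(8, (6, 2), 0.4))
late_xy, late_conf = decode_keypoint(toy_heatmap(8, (6, 2), 0.9))

early_err = pixel_error(early_xy, true_xy)
late_err = pixel_error(late_xy, true_xy)

print(f"early: confidence={early_conf:.2f}, error={early_err:.2f} px")
print(f"late:  confidence={late_conf:.2f}, error={late_err:.2f} px")
```

In this toy case the confidence more than doubles while the localization error is unchanged, so a confidence threshold alone would accept the later pseudo-label even though it is no more accurate.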



task in computer vision, pose estimation has been widely used in various fields such as action recognition, security monitoring, and animal experimentation. Most pose estimation methods use an MSE loss to cast keypoint location prediction as Gaussian heatmap regression. State-of-the-art methods (Zhang et al. (2020a), Zhang et al. (2020b), Huang et al. (2019), Li et al. (2019), Cheng et al. (2019), Chen et al. (2018), Xiao et al. (2018), Tang et al. (2018), Newell et al. (2017), Cao et al. (2017), Newell et al. (2016), Wei et al. (2016)) require a large amount of labeled data. However, labeling pose keypoints is costly and time-consuming for many practical tasks. How to effectively train a model without sufficient labeled data has therefore attracted attention in the field of pose estimation.

A simple idea is to apply semi-supervised classification methods to the pose estimation task. Many outstanding studies have emerged in the field of semi-supervised classification, which can be categorized into methods based on consistency constraints and methods based on pseudo-labels. For example, Miyato et al. (2018) and Sajjadi et al. (2016) mainly employ reasonable stochastic transformations and perturbations to improve performance, an approach derived from consistency constraints. Meanwhile, Tarvainen & Valpola (2017) and Laine & Aila (2016) filter high-quality pseudo-labels from the predictions on unlabeled samples. In addition, the two approaches have been combined for better performance, as in Zhang et al. (2021), Chen et al. (2020), and Berthelot et al. (2019).
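The Gaussian heatmap regression mentioned above can be sketched as follows. This is a minimal illustration rather than the setup of any cited method: it assumes an unnormalized 2D Gaussian target centered on the annotated keypoint and a per-pixel mean squared error. The function names, the 8 x 8 grid, and `sigma=1.0` are hypothetical choices.

```python
import math


def gaussian_heatmap(height, width, cx, cy, sigma=1.0):
    """Return a height x width grid holding an unnormalized Gaussian peaked at (cx, cy)."""
    return [
        [
            math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]


def mse_loss(pred, target):
    """Mean squared error averaged over all heatmap pixels."""
    total = 0.0
    count = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            total += (p - t) ** 2
            count += 1
    return total / count


# Target heatmap for a keypoint annotated at (x=3, y=5).
target = gaussian_heatmap(8, 8, cx=3, cy=5)

# A stand-in "prediction" whose peak is shifted one pixel to the right.
shifted = gaussian_heatmap(8, 8, cx=4, cy=5)

loss_exact = mse_loss(target, target)
loss_shifted = mse_loss(shifted, target)

print(f"loss for exact prediction:   {loss_exact:.4f}")
print(f"loss for shifted prediction: {loss_shifted:.4f}")
```

The loss is zero only when the predicted heatmap matches the target everywhere, and it grows as the predicted peak moves away from the annotated location, which is why this formulation needs a keypoint annotation for every training sample.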

Availability

The project code and our dataset are publicly available at https://github.com/Qi2019KB/MDSs/

