DIRICHLET-BASED UNCERTAINTY CALIBRATION FOR ACTIVE DOMAIN ADAPTATION

Abstract

Active domain adaptation (DA) aims to maximally boost model adaptation on a new target domain by actively selecting limited target data to annotate, whereas traditional active learning methods may be less effective since they do not consider the domain shift issue. Although active DA methods address this by further proposing targetness to measure the representativeness of target-domain characteristics, their predictive uncertainty is usually based on the prediction of deterministic models, which can easily be miscalibrated on data with distribution shift. Considering this, we propose a Dirichlet-based Uncertainty Calibration (DUC) approach for active DA, which simultaneously achieves the mitigation of miscalibration and the selection of informative target samples. Specifically, we place a Dirichlet prior on the prediction and interpret the prediction as a distribution on the probability simplex, rather than a point estimate as in deterministic models. This enables us to consider all possible predictions, mitigating the miscalibration caused by relying on a single one. Then, a two-round selection strategy based on different uncertainty origins is designed to select target samples that are both representative of the target domain and conducive to discriminability. Extensive experiments on cross-domain image classification and semantic segmentation validate the superiority of DUC.

1. INTRODUCTION

Despite the superb performance of deep neural networks (DNNs) on various tasks (Krizhevsky et al., 2012; Chen et al., 2015), their training typically requires massive annotations, which poses a formidable cost for practical applications. Moreover, they commonly assume that training and testing data follow the same distribution, making the model brittle to distribution shifts (Ben-David et al., 2010). As an alternative, unsupervised domain adaptation (UDA) has been widely studied; it assists model learning on an unlabeled target domain by transferring knowledge from a labeled source domain (Ganin & Lempitsky, 2015; Long et al., 2018). Despite the great advances of UDA, the unavailability of target labels greatly limits its performance, leaving a large gap to its supervised counterpart. In fact, given an acceptable budget, a small set of target data can be annotated to significantly boost the performance of UDA. With this consideration, recent works (Fu et al., 2021; Prabhu et al., 2021) integrate the idea of active learning (AL) into DA, resulting in active DA. The core of active DA is to annotate the most valuable target samples so as to maximally benefit the adaptation. However, traditional AL methods based on either predictive uncertainty or diversity are less effective for active DA, since they do not consider the domain shift. Methods based on predictive uncertainty (e.g., margin (Joshi et al., 2009) or entropy (Wang & Shang, 2014)) cannot measure the target-representativeness of samples; as a result, the selected samples are often redundant and less informative. Diversity-based methods (Sener & Savarese, 2018; Nguyen & Smeulders, 2004), in turn, may select samples that are already well-aligned with the source domain (Prabhu et al., 2021). Aware of these issues, active DA methods integrate both predictive uncertainty and targetness into the selection process (Su et al., 2019; Fu et al., 2021; Prabhu et al., 2021).
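As context, the two classic uncertainty scores mentioned above can be sketched as follows (a minimal NumPy illustration; the function names and toy numbers are ours, not from any cited work):

```python
import numpy as np

def entropy_score(probs):
    """Predictive entropy: higher = more uncertain (cf. Wang & Shang, 2014)."""
    return -np.sum(probs * np.log(probs + 1e-12), axis=-1)

def margin_score(probs):
    """Gap between the top-2 class probabilities: lower = more uncertain
    (cf. Joshi et al., 2009)."""
    s = np.sort(probs, axis=-1)
    return s[..., -1] - s[..., -2]

# Toy softmax outputs for three unlabeled samples over three classes.
probs = np.array([
    [0.34, 0.33, 0.33],  # near-uniform: high entropy, tiny margin
    [0.98, 0.01, 0.01],  # confident: low entropy, large margin
    [0.50, 0.45, 0.05],  # two competing classes: small margin
])
# An entropy-based strategy queries argmax(entropy_score); a margin-based one
# queries argmin(margin_score).
```

Note that neither score reflects how "target-like" a sample is, which is precisely the limitation discussed above.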
Yet the existing focus is on the measurement of targetness, e.g., via a domain discriminator (Su et al., 2019) or clustering (Prabhu et al., 2021). The predictive uncertainty these methods use is still mainly based on the prediction of deterministic models, which is essentially a point estimate (Sensoy et al., 2018) and can easily be miscalibrated on data with distribution shift (Guo et al., 2017). As shown in Fig. 1(a), a standard DNN is wrongly overconfident on most target data, and its predictive uncertainty is correspondingly unreliable. To solve this, we propose a Dirichlet-based Uncertainty Calibration (DUC) method for active DA, which is mainly built on Dirichlet-based evidential deep learning (EDL) (Sensoy et al., 2018). In EDL, a Dirichlet prior is placed on the class probabilities, so that the prediction is interpreted as a distribution on the probability simplex. That is, the prediction is no longer a point estimate, and each possible prediction occurs with a certain probability. The resulting benefit is that the miscalibration of a single prediction can be mitigated by considering all possible predictions. For illustration, we plot the expected entropy of all possible predictions under the Dirichlet-based model in Fig. 1(a): most target data with domain shift are calibrated to have greater uncertainty, which avoids the omission of potentially valuable target samples that occurs in deterministic-model-based methods. Besides, based on Subjective Logic (Jøsang, 2016), the Dirichlet-based evidential model intrinsically captures different origins of uncertainty: the lack of evidence and the conflict of evidence. This property further motivates us to consider different uncertainty origins during sample selection, so as to comprehensively measure the value of samples from different aspects.
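To make the contrast concrete, the entropy of the expected prediction (the point estimate) and the expected entropy over all predictions drawn from the Dirichlet can be compared as below (a Monte-Carlo sketch under Dirichlet parameters of our own choosing; variable names are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def point_entropy(alpha):
    """Entropy of the single expected prediction E[p] = alpha / alpha_0."""
    p = alpha / alpha.sum()
    return float(-np.sum(p * np.log(p + 1e-12)))

def expected_entropy(alpha, n=20000):
    """Monte-Carlo estimate of E_{p ~ Dir(alpha)}[H(p)], i.e., the average
    entropy over all possible predictions on the probability simplex."""
    ps = rng.dirichlet(alpha, size=n)
    return float(np.mean(-np.sum(ps * np.log(ps + 1e-12), axis=-1)))

# A shifted target sample for which little evidence was collected (alpha near
# the uniform prior) vs. a source-like sample with strong evidence for class 0.
alpha_shifted = np.array([1.2, 1.1, 1.0])
alpha_confident = np.array([40.0, 1.0, 1.0])
# The low-evidence sample keeps a high expected entropy, so it is not mistaken
# for a confidently classified one.
```

By Jensen's inequality (entropy is concave), the expected entropy never exceeds the entropy of the expected prediction for the same Dirichlet; the calibration effect in Fig. 1(a) comes from comparing the evidential model against a separately trained deterministic DNN.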
Specifically, we introduce the distribution uncertainty to express the lack of evidence, which mainly arises from the distribution mismatch, i.e., the model is unfamiliar with the data and lacks knowledge about it. In addition, the conflict of evidence is expressed as the data uncertainty, which comes from the natural complexity of the data, e.g., low discriminability. The two uncertainties are respectively captured by the spread and the location of the Dirichlet distribution on the probability simplex. As shown in Fig. 1(b), the real-world style of the first target image obviously differs from the source domain, and its prediction presents a broader spread on the probability simplex, i.e., higher distribution uncertainty. This uncertainty enables us to measure targetness without introducing a domain discriminator or clustering, greatly saving computational cost. The second target image, by contrast, provides different information mainly from the aspect of discriminability, with the Dirichlet distribution concentrated around the center of the simplex. Based on these two origins of uncertainty, we design a two-round selection strategy to select both target-representative and discriminability-conducive samples for label query. Contributions: 1) We explore the uncertainty miscalibration problem ignored by existing active DA methods, and achieve informative sample selection and uncertainty calibration simultaneously within a unified framework. 2) We provide a novel perspective for active DA by introducing the Dirichlet-based evidential model, and design an uncertainty origin-aware selection strategy to comprehensively evaluate the value of samples. Notably, no domain discriminator or clustering is used, which is more elegant and saves computational cost. 3) Extensive experiments on both cross-domain image classification and semantic segmentation validate the superiority of our method.
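Under one standard decomposition for Dirichlet models, spread-related and location-related uncertainties, and a two-round selection built on them, can be sketched as follows (a Monte-Carlo illustration with our own notation; the paper's exact definitions of the two uncertainties and of the candidate-pool size may differ in detail):

```python
import numpy as np

rng = np.random.default_rng(0)

def entropies(alpha, n=20000):
    """Return (total, data, distribution) uncertainty for Dir(alpha).

    total = H(E[p])      : entropy of the mean prediction
    data  = E[H(p)]      : expected entropy, high when mass sits near the
                           centre of the simplex (conflicting evidence)
    dist  = total - data : mutual information, high when the Dirichlet is
                           spread out over the simplex (lack of evidence)
    """
    ps = rng.dirichlet(alpha, size=n)
    mean = ps.mean(0)
    total = float(-np.sum(mean * np.log(mean + 1e-12)))
    data = float(np.mean(-np.sum(ps * np.log(ps + 1e-12), axis=-1)))
    return total, data, total - data

def two_round_select(alphas, kappa, budget):
    """Round 1: keep the kappa samples with the highest distribution
    uncertainty (target-representative). Round 2: among them, pick the
    `budget` samples with the highest data uncertainty
    (discriminability-conducive)."""
    u = np.array([entropies(a) for a in alphas])
    cand = np.argsort(-u[:, 2])[:kappa]               # top-kappa by dist. unc.
    picked = cand[np.argsort(-u[cand, 1])[:budget]]   # top-budget by data unc.
    return picked.tolist()
```

For example, with `alphas = [Dir(1,1,1), Dir(2,2,1), Dir(30,1,1)]`, round one discards the confident third sample, and round two keeps the sample whose mass concentrates near the simplex centre.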

2. RELATED WORK

Active Learning (AL) aims to reduce the labeling cost by querying the most informative samples to annotate (Ren et al., 2022), and the core of AL is the query strategy for sample selection. Committee-based strategies select samples with the largest prediction disagreement between multiple classifiers (Seung et al., 1992; Dagan & Engelson, 1995). Representative-based strategies choose a set of representative samples in the latent space by clustering or core-set selection (Nguyen & Smeulders,



Figure 1: (a): point-estimate entropy of a DNN and expected entropy of the Dirichlet-based model, where the colors of points denote class identities. Both models are trained with source data. (b): examples of the prediction distributions of three "monitor" images on the simplex. The model is trained with images of "keyboard", "computer" and "monitor" from the Clipart domain of the Office-Home dataset. For the two images from the Real-World domain, the entropy of the expected prediction cannot distinguish them, whereas U_dis and U_data, calculated from the prediction distribution, can reflect what contributes more to their uncertainty and be utilized to guarantee the information diversity of the selected data.

