PARTIAL LABEL UNSUPERVISED DOMAIN ADAPTATION WITH CLASS-PROTOTYPE ALIGNMENT

Abstract

Partial label learning (PLL) tackles the problem where each instance is associated with a set of candidate labels, only one of which is the ground-truth label. Most existing PLL approaches assume that the training and test sets share an identical data distribution. However, this assumption does not hold in many real-world scenarios where the training and test data come from different distributions. In this paper, we formalize this learning scenario as a new problem called partial label unsupervised domain adaptation (PLUDA). To address this challenging PLUDA problem, we propose a novel Prototype Alignment based PLUDA method named PAPLUDA, which dynamically refines the pseudo-labels of instances from both the source and target domains by consulting the outputs of a teacher-student model in a moving-average manner, and bridges the cross-domain discrepancy through inter-domain class-prototype alignment. In addition, a teacher-student model based contrastive regularization is deployed to enhance prediction stability and hence improve the class prototypes in both domains. Comprehensive experimental results demonstrate that PAPLUDA achieves state-of-the-art performance on widely used benchmark datasets.

1. INTRODUCTION

Partial label learning (PLL) is a typical weakly supervised learning problem, where each training instance is assigned a candidate label set, only one of which is valid. PLL has gained increasing attention from the research community due to its effectiveness in reducing annotation costs in various real-world scenarios, such as face naming (Hüllermeier & Beringer, 2006), web mining (Luo & Orabona, 2010), and ecoinformatics (Liu & Dietterich, 2014). Nevertheless, standard PLL assumes that the training and test data are sampled from the same distribution. Under this assumption, a model learned from the training data is expected to generalize well on the test data. However, the assumption does not hold in many real-world scenarios where the training and test data come from different distributions: for example, they may be collected from different sources, or the training set may be outdated because data change over time. In such cases, there is a discrepancy between the training and test data distributions, and naively applying off-the-shelf PLL models can lead to significant test performance degradation. Meanwhile, the unavailability of ground-truth labels prevents the deployment of existing unsupervised domain adaptation (UDA) methods (Tzeng et al., 2017; Dong et al., 2021; Na et al., 2021; Shen et al., 2022). We formalize this new learning scenario of PLL with a training-test distribution gap as the partial label unsupervised domain adaptation (PLUDA) problem.
By integrating the challenges of both PLL and UDA, the PLUDA problem has the following characteristics: (1) the source and target domains have different distributions but share the same set of classes; (2) data in the source domain have only partial labels, i.e., each instance is associated with a candidate label set, while the target domain has only unlabeled data; (3) the candidate label set of each source instance contains the ground-truth label along with irrelevant noisy labels, while labels outside the candidate set are true negative labels. The goal of the PLUDA task is to learn a domain-invariant prediction model from the partial-label source domain that generalizes well to the unlabeled target domain. Although both PLL and UDA have been studied intensively in the literature, to the best of our knowledge, no existing work addresses the integrated challenges of PLUDA in a unified framework. PLUDA is related to but distinct from the weakly supervised domain adaptation (WSDA) problem studied in the recent literature (Shu et al., 2019; Xie et al., 2022). WSDA assumes the ground-truth labels of the source domain instances are corrupted (e.g., replaced) with noisy labels, and has been studied as an effort to reduce annotation costs. Some researchers address the WSDA problem with off-the-shelf technologies and obtain good performance. For example, Xie et al. (2022) exploit the bilateral relationships between the source and target domains to construct a universal framework, GearNet, on top of the existing domain adaptation methods TCL (Shu et al., 2019) and DANN (Tzeng et al., 2017). In this paper, we propose a novel prototype alignment based partial label unsupervised domain adaptation approach, PAPLUDA, to address the combined PLL and UDA challenges simultaneously in the newly formalized PLUDA learning scenario.
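The setting above can be stated compactly. The notation below is our own shorthand rather than the paper's exact formulation: $\mathcal{D}_s$ and $\mathcal{D}_t$ denote the source and target domains, and $\mathcal{Y}$ the shared label space.

```latex
% PLUDA setup (notation is illustrative)
\mathcal{D}_s = \{(x_i^s, Y_i^s)\}_{i=1}^{n_s}, \quad
  y_i^s \in Y_i^s \subseteq \mathcal{Y} \;\; (\text{ground truth } y_i^s \text{ unobserved}),
\qquad
\mathcal{D}_t = \{x_j^t\}_{j=1}^{n_t}, \quad P_s(x) \neq P_t(x)
```

The goal is then a predictor $f$ with small target risk $\mathbb{E}_{(x,y)\sim P_t}[\ell(f(x),y)]$, learned without any ground-truth labels in either domain.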
The proposed PAPLUDA approach contains three pseudo-label based components that collaborate with each other to tackle PLUDA learning. First, we conduct soft label disambiguation to dynamically rectify the pseudo-labels of training instances from both domains toward the ground-truth labels, which disambiguates the partial labels and sets up the foundation for cross-domain class alignment. Second, we propose an inter-domain class-prototype based alignment to minimize the discrepancy between prototypes of the same class from the source and target domains while maximizing the gaps between prototypes of different classes. Finally, we deploy a teacher-student model based contrastive regularization to enhance the reliability of the pseudo-labels, and hence improve the class prototypes and the inter-domain prototype alignment. Overall, the contributions of this paper can be summarized as follows:

• A new challenging learning problem, PLUDA, is proposed, which is more practical than the separate PLL and UDA problems as it simultaneously drops the common supervised learning assumptions of accurate data labels and identical training and test distributions.

• A novel PAPLUDA approach is proposed to tackle the PLUDA problem.

• Comprehensive experiments are conducted on benchmarks, and the results validate the effectiveness of the proposed PAPLUDA.
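Two of the ingredients above can be sketched concretely. The snippet below is a minimal illustration under our own assumptions (the function names, the soft-label weighting, and the temperature value are ours, not the paper's exact formulation): a moving-average teacher update, soft-pseudo-label-weighted class prototypes, and a contrastive loss that pulls matching source/target prototypes together while pushing mismatched classes apart.

```python
import numpy as np

def ema_update(teacher, student, momentum=0.99):
    """Moving-average teacher update: teacher <- m*teacher + (1-m)*student."""
    return {k: momentum * teacher[k] + (1.0 - momentum) * student[k]
            for k in teacher}

def class_prototypes(features, soft_labels):
    """Soft-pseudo-label-weighted class prototypes.

    features: (N, D) array; soft_labels: (N, C) rows summing to 1.
    Returns a (C, D) matrix of per-class weighted mean features.
    """
    weights = soft_labels / (soft_labels.sum(axis=0, keepdims=True) + 1e-8)
    return weights.T @ features

def prototype_alignment_loss(proto_s, proto_t, temperature=0.1):
    """Contrastive alignment of cross-domain class prototypes:
    maximize same-class similarity, minimize cross-class similarity."""
    ps = proto_s / np.linalg.norm(proto_s, axis=1, keepdims=True)
    pt = proto_t / np.linalg.norm(proto_t, axis=1, keepdims=True)
    logits = (ps @ pt.T) / temperature          # (C, C) similarity matrix
    log_z = np.log(np.exp(logits).sum(axis=1))  # per-row softmax normalizer
    return float(np.mean(log_z - np.diag(logits)))
```

When the source and target prototypes of each class coincide, the loss is near zero; permuting the class assignments in one domain drives it up, which is the signal the alignment term exploits.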



2. RELATED WORK

Partial label learning (PLL) deals with the problem where each training instance is associated with a set of candidate labels, only one of which is valid. Many methods adapt existing learning techniques to partial-label training data and disambiguate the candidate noisy labels by aggregating predictions. For maximum likelihood techniques, the likelihood of each partial-label training instance is computed from the probabilities of its candidate labels (Jin & Ghahramani, 2002; Liu & Dietterich, 2012). For k-nearest neighbor techniques, the candidate labels of neighboring instances are integrated for the final prediction via weighted voting (Hüllermeier & Beringer, 2006; Gong et al., 2017; Zhang & Yu, 2015). For maximum margin techniques, the classification margin on each partial-label instance is defined by discriminating the modeling outputs of candidate labels from those of non-candidate labels (Nguyen & Caruana, 2008; Yu & Zhang, 2016). Apart from adapting off-the-shelf techniques to partial-label data, some researchers address PLL by adapting the partial-label data to existing learning techniques. For example, Zhang et al. (2017) transform a partial-label training set into multiple binary training sets, which are then used to build multiple binary classifiers corresponding to an ECOC coding matrix. Wu & Zhang (2018) adopt a one-vs-one decomposition strategy to enable binary decomposition for learning from partial-label data. Although these works produce competitive performance, they are restricted to linear models and have difficulty handling large-scale datasets. To alleviate these limitations, deep learning based PLL has recently started gaining attention from the research community. Yao et al. (2020) address PLL with deep convolutional neural networks by exploiting a temporal-ensembling technique. Meanwhile, Yan & Guo (2020) handle PLL with multilayer perceptrons by means of batch label correction. Wen et al.
(2021) present a family of loss functions called leveraged weighted loss, which takes the work of Lv et al. (2020) as a special case. Feng et al. (2020) present two provably consistent methods from the perspective of the partial label generation process: a risk-consistent method and a classifier-consistent method.
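The k-nearest neighbor disambiguation strategy mentioned above can be illustrated with a short sketch. This is a generic inverse-distance weighted vote over candidate sets under our own assumptions (function name, weighting, and tie-breaking are ours), not the exact procedure of any cited method.

```python
import numpy as np

def knn_partial_label_vote(X_train, candidate_sets, x_query, k=3, num_classes=None):
    """Predict a label for x_query by weighted voting over the candidate
    label sets of its k nearest training instances (illustrative sketch).

    Each neighbor votes only for labels in its own candidate set, with
    weight inversely proportional to its distance, spread evenly over
    its candidates.
    """
    if num_classes is None:
        num_classes = 1 + max(max(s) for s in candidate_sets)
    dists = np.linalg.norm(X_train - x_query, axis=1)
    neighbors = np.argsort(dists)[:k]
    votes = np.zeros(num_classes)
    for i in neighbors:
        weight = 1.0 / (dists[i] + 1e-8)
        for c in candidate_sets[i]:
            votes[c] += weight / len(candidate_sets[i])
    return int(np.argmax(votes))
```

A query point near a cluster whose candidate sets agree on one label receives that label, even though no single training instance carries an unambiguous annotation; this is the aggregation-based disambiguation the averaging-style PLL methods rely on.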

