MUTUAL PARTIAL LABEL LEARNING WITH COMPETITIVE LABEL NOISE

Abstract

Partial label learning (PLL) is an important weakly supervised learning problem, where each training instance is associated with a set of candidate labels that includes both the true label and additional noisy labels. Most existing PLL methods assume the candidate noisy labels are randomly chosen, which rarely holds in real-world learning scenarios. In this paper, we consider a more realistic PLL scenario with competitive label noise that is more difficult to distinguish from the true label than random label noise. We propose a novel Mutual Learning based PLL approach named ML-PLL to address this challenging problem. ML-PLL learns a prediction-network-based classifier and a class-prototype-based classifier cooperatively through interactive mutual learning and label correction. Moreover, we use a transformation network to model the association relationships between the true label and the candidate labels, and learn it together with the prediction network to match the observed candidate labels in the training data and enhance label correction. Extensive experiments are conducted on several benchmark PLL datasets, and the proposed ML-PLL approach demonstrates state-of-the-art performance for partial label learning.

1. INTRODUCTION

As it is costly and difficult to annotate each instance with a precise label, weakly supervised learning (WSL) has been widely studied in recent years (Zhou, 2018), which includes, but is not limited to, semi-supervised learning (Van Engelen & Hoos, 2020; Ouali et al., 2020), noisy label learning (Natarajan et al., 2013; Feng et al., 2021), positive-unlabeled learning (Kiryo et al., 2017; Shu et al., 2020), and partial multi-label learning (Xie & Huang, 2018; Yan & Guo, 2021). Partial label learning (PLL) is a typical WSL problem that aims to learn a model from training samples with overcomplete labels; that is, each training sample is associated with a set of candidate labels that includes both the true label and additional noisy labels. PLL has been widely applied in many real-world learning scenarios, including automatic face naming (Hüllermeier & Beringer, 2006; Zeng et al., 2013), web mining (Luo & Orabona, 2010), and multimedia content analysis (Zeng et al., 2013). Since the ground-truth label is hidden in the candidate label set and unavailable to the learning algorithm, the main challenge of PLL lies in candidate label disambiguation. To address this challenge, two main label disambiguation strategies have been proposed: average-based disambiguation and identification-based disambiguation. Average-based disambiguation treats each candidate label equally in the model training phase and averages the modeling outputs from all candidate labels in the testing phase (Cour et al., 2011; Hüllermeier & Beringer, 2006; Zhang & Yu, 2015). Although this strategy is simple and intuitive, it lets the true label be overwhelmed by noisy labels for lack of differentiation, which leads to poor prediction performance.
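The weakness of average-based disambiguation described above can be made concrete with a small numerical sketch. The code below is not any cited method's implementation; it merely shows that under a uniform soft target over the candidate set, a model that concentrates on a noisy candidate incurs exactly the same loss as one that concentrates on the true label:

```python
import numpy as np

def average_based_target(candidates, num_classes):
    """Uniform soft target over the candidate set (average-based view)."""
    t = np.zeros(num_classes)
    t[candidates] = 1.0 / len(candidates)
    return t

def cross_entropy(probs, target, eps=1e-12):
    return -np.sum(target * np.log(probs + eps))

# Toy example: 5 classes; the candidate set {1, 3, 4} hides true label 3.
num_classes = 5
candidates = [1, 3, 4]
target = average_based_target(candidates, num_classes)

# One model concentrates on the noisy candidate 1, the other on the
# true label 3 -- the uniform target cannot prefer either of them.
probs_noisy = np.array([0.02, 0.90, 0.02, 0.03, 0.03])
probs_true  = np.array([0.02, 0.03, 0.02, 0.90, 0.03])
print(cross_entropy(probs_noisy, target), cross_entropy(probs_true, target))
```

The two printed losses are identical, which is precisely why averaging alone cannot disambiguate the candidate set.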
The identification-based disambiguation strategy treats the ground-truth label as a latent variable and tries to identify the true label by deriving different confidence scores for the candidate labels (Feng & An, 2018; 2019; Yao et al., 2020b; Yu & Zhang, 2016; Xu et al., 2021). By handling the candidate labels with discrimination, identification-based approaches achieve relatively better prediction performance than average-based ones, but they can still suffer from accumulating label identification errors that severely disrupt subsequent model training. In addition, these existing methods are usually restricted to standard machine learning frameworks with linear or kernel-based models, which have difficulty dealing with large-scale datasets. Recently, deep learning has been widely used to address PLL problems (Feng et al., 2020b; Wu et al., 2022). For example, the work in (Feng et al., 2020b) proposes two PLL methods based on deep neural networks, a risk-consistent algorithm and a classifier-consistent algorithm. The methods in (Wen et al., 2021; Lv et al., 2020) try to progressively identify the ground-truth label during training by employing a self-training technique. A more recent work (Wang et al., 2022b) tackles PLL with contrastive representation learning and class-prototype based label disambiguation. Although it achieves satisfactory prediction performance, the contrastive learning procedure is time-consuming and resource-demanding. Moreover, all these methods share a common drawback: they implicitly assume random noise in the label space; that is, they assume the candidate noisy labels are sampled from a uniform generating procedure.
However, during the noisy label creation process in real-world scenarios such as crowd-sourcing, the noisy labels are typically associated with the true label and dependent on the input sample, which makes the label noise more difficult to distinguish from the ground-truth label than random labels. In this paper, we consider a more realistic and challenging PLL learning scenario, where the noisy labels are competitive and hard to distinguish from the true label given the input data sample. Intuitively, competitive noisy labels exhibit stronger associations with the true label than random ones, and hence are more likely to be chosen as candidate labels. For example, in online image annotation, when the object contained in an image is an "alpaca", a competitive label "camel" has a large probability of being chosen by an annotator with limited expertise due to the similar appearances of "alpaca" and "camel", while labels such as "dog" or "duck" are less likely to be picked as part of the candidate label set due to their relatively weak association with the ground-truth "alpaca". Motivated by this consideration, we propose a novel and effective Mutual Partial Label Learning approach (ML-PLL) for the competitive label noise scenario, which learns a prediction-network-based classifier and a class-prototype-based classifier interactively through label correction and mutual learning. Specifically, ML-PLL performs noisy label correction by integrating the outputs of both classifiers, while using the corrected pseudo-labels as the targets for training the prediction network and using the output of the prediction network as the target for training the class-prototype-based classifier.
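The contrast between random and competitive candidate-label generation described earlier can be sketched as follows. The class-similarity matrix here is a hypothetical stand-in (in practice such confusability is unknown and input-dependent, e.g. "alpaca" vs. "camel"); this is an illustration of the noise model, not the paper's benchmark construction:

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes = 5

# Hypothetical class-similarity matrix: classes 0 and 1 are easily
# confused (think "alpaca" vs. "camel"), the rest are weakly related.
S = np.array([
    [1.0, 0.8, 0.1, 0.1, 0.1],
    [0.8, 1.0, 0.1, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.2, 0.2],
    [0.1, 0.1, 0.2, 1.0, 0.2],
    [0.1, 0.1, 0.2, 0.2, 1.0],
])

def candidate_set(y, k, competitive=True):
    """Draw k noisy labels for true label y and return them with y."""
    others = np.array([c for c in range(num_classes) if c != y])
    if competitive:
        p = S[y, others] / S[y, others].sum()        # similar classes favored
    else:
        p = np.full(len(others), 1.0 / len(others))  # uniform (random) noise
    noise = rng.choice(others, size=k, replace=False, p=p)
    return {y, *noise}

print(candidate_set(0, 2, competitive=True))
```

Under the competitive setting, class 1 enters the candidate set of class 0 far more often than under uniform sampling, which is exactly what makes disambiguation harder.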
In addition, a transformation network is proposed to model the association relationships between the true label and the noisy candidate labels, which further enhances classifier training with respect to ground-truth label disambiguation. Extensive experiments on several benchmark PLL datasets demonstrate that the proposed ML-PLL approach achieves state-of-the-art performance.
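The interplay between the two classifiers can be illustrated schematically. The fusion weight `alpha` and both score vectors below are hypothetical placeholders rather than the authors' implementation; the sketch only shows how a corrected pseudo-label distribution restricted to the candidate set could be formed:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def correct_label(net_logits, proto_scores, candidate_mask, alpha=0.5):
    """Fuse the two classifiers' beliefs, restricted to the candidate
    set, into a corrected pseudo-label distribution (schematic only)."""
    fused = alpha * softmax(net_logits) + (1 - alpha) * softmax(proto_scores)
    fused = fused * candidate_mask         # non-candidates get zero mass
    return fused / fused.sum()

# Toy example: 4 classes, candidate set {0, 2}.
mask = np.array([1.0, 0.0, 1.0, 0.0])
net_logits   = np.array([2.0, 0.5, 1.0, 0.1])   # prediction network output
proto_scores = np.array([1.5, 0.2, 0.4, 0.3])   # class-prototype similarities

pseudo = correct_label(net_logits, proto_scores, mask)
# The pseudo-label would then supervise the prediction network, while the
# network's output in turn supervises the prototype classifier (mutual
# learning); only the fused target is shown here.
print(pseudo.argmax(), pseudo.round(3))
```

Both hypothetical classifiers favor class 0 here, so the fused target concentrates on it while noisy non-candidates are zeroed out.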

2.1. STANDARD PARTIAL LABEL LEARNING

The main challenge in addressing the PLL problem lies in how to disambiguate the candidate labels. Average-based disambiguation and identification-based disambiguation are the two main strategies deployed in PLL. The average-based strategy treats each candidate label equally during model induction and averages the modeling outputs from all candidate labels as the final prediction. For example, the works in (Cour et al., 2011; Zhang & Yu, 2015) distinguish the averaged candidate label prediction from the non-candidate ones. The identification-based strategy treats the ground-truth label as a latent variable and identifies the true label by deriving confidence scores for all candidate labels (Feng & An, 2018; 2019; Yu & Zhang, 2016). For example, the works in (Zhang et al., 2016; Xu et al., 2019; Wang et al., 2022a) try to identify the true label by employing iterative label refining procedures and leveraging topological information in the feature space. However, these methods may suffer from cumulative errors induced by the error-prone label confidence estimation along the topological structure. In addition, a number of PLL methods employ off-the-shelf learning techniques, such as maximum likelihood, k-nearest neighbors, maximum margin, boosting, and error-correcting output codes (ECOC), to tackle PLL problems. For the maximum likelihood technique, the likelihood of each PL training sample is defined over its candidate label set instead of its implicit ground-truth label (Liu & Dietterich, 2012). For the k-nearest neighbor technique, the candidate labels from neighbor

