DECOMPOSITIONAL GENERATION PROCESS FOR INSTANCE-DEPENDENT PARTIAL LABEL LEARNING

Abstract

Partial label learning (PLL) is a typical weakly supervised learning problem in which each training example is associated with a set of candidate labels, among which only one is true. Most existing PLL approaches assume that the incorrect labels in each training example are picked at random as candidate labels and therefore model the generation process of the candidate labels in a simple way. However, these approaches usually do not perform as well as expected, because the generation process of the candidate labels is always instance-dependent and deserves to be modeled in a refined way. In this paper, we consider instance-dependent PLL and assume that the generation process of the candidate labels can be decomposed into two sequential parts: the correct label emerges first in the mind of the annotator, and then, due to labeling uncertainty, incorrect labels related to the feature are selected alongside the correct label as candidates. Motivated by this consideration, we propose a novel PLL method that performs Maximum A Posteriori (MAP) estimation based on an explicitly modeled generation process of candidate labels via decomposed probability distributions. Extensive experiments on manually corrupted benchmark datasets and real-world datasets validate the effectiveness of the proposed method.

1. INTRODUCTION

Partial label learning (PLL) aims to deal with the problem where each instance is provided with a set of candidate labels, only one of which is the correct label. The problem of learning from partial label examples naturally arises in a number of real-world scenarios such as web data mining Luo & Orabona (2010).

It is challenging to avoid overfitting on candidate labels, especially when the candidate labels depend on the instances. For this reason, previous methods assume that the candidate labels are instance-independent. Unfortunately, it is often the case that the incorrect labels related to the feature are more likely to be picked as candidate labels for each instance. Recent work Xu et al. (2021) has also shown that instance-dependent PLL imposes additional challenges but is more realistic in practice than the instance-independent case. In this paper, we focus on instance-dependent PLL by considering the essential generation process of candidate labels.

To begin with, let us carefully rethink how candidate labels arise in most manual annotation scenarios. When one annotates an instance, the correct label emerges first in the mind of the annotator, but the incorrect labels related to the feature of the instance confuse the annotator, so the correct label and some incorrect labels are packed together as the candidate labels. Therefore, the generation process of candidate labels in instance-dependent PLL can be decomposed into two stages, i.e., the generation of the correct label of the instance and the generation of the incorrect labels related to the instance, which can be described by a Categorical distribution and Bernoulli distributions, respectively. Motivated by the above consideration, we propose a novel PLL method named IDGP, i.e., Instance-Dependent partial label learning via Decompositional Generation Process.
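As an illustration, the two-stage generation process described above can be simulated directly. The following is a hypothetical sketch, not the paper's implementation; the function and parameter names are ours. Stage one draws the correct label from a Categorical distribution, and stage two independently adds each incorrect label as a candidate with an instance-dependent Bernoulli probability.

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_candidate_labels(correct_probs, flip_probs):
    """Sample one candidate label set via the two-stage process:
    (1) draw the correct label from a Categorical distribution over classes;
    (2) independently add each incorrect label with an instance-dependent
        Bernoulli probability.
    correct_probs: shape (k,), Categorical parameters (sums to 1).
    flip_probs: shape (k,), Bernoulli parameter for each incorrect label.
    Returns the correct label and a boolean candidate mask of shape (k,).
    """
    k = len(correct_probs)
    y = rng.choice(k, p=correct_probs)        # stage 1: the correct label
    candidates = rng.random(k) < flip_probs   # stage 2: confusing incorrect labels
    candidates[y] = True                      # the correct label is always a candidate
    return y, candidates

# Toy instance with 4 classes: class 2 is most plausible, and the classes
# visually similar to it (here 1 and 3) get higher flip probabilities.
correct_probs = np.array([0.05, 0.15, 0.7, 0.1])
flip_probs = np.array([0.05, 0.4, 0.0, 0.4])
y, candidates = generate_candidate_labels(correct_probs, flip_probs)
print(y, np.flatnonzero(candidates))
```

Because the flip probabilities depend on the instance's features, classes that confuse the annotator enter the candidate set far more often than unrelated ones, which is exactly the instance-dependent behavior the paper targets.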
Before performing IDGP, the distributions of the correct label and of the incorrect labels given a training example are modeled explicitly by decoupled probability distributions, i.e., a Categorical distribution and Bernoulli distributions. We then perform Maximum A Posteriori (MAP) estimation on the PLL training dataset to derive a risk minimizer. To optimize the risk, a Dirichlet distribution and Beta distributions are leveraged, owing to conjugacy, to model the conditional priors and to estimate the parameters of the Categorical and Bernoulli distributions. Finally, we refine the prior information by iteratively updating the parameters of the corresponding conjugate distributions, improving the performance of the predictive model in each epoch.

Our contributions can be summarized as follows:

• We explicitly model, for the first time, the generation process of candidate labels in instance-dependent PLL. The entire generation process is decomposed into the generation of the correct label of the instance and the generation of the incorrect labels, which can be described by a Categorical distribution and Bernoulli distributions, respectively.

• We optimize the models of the Categorical and Bernoulli distributions via the MAP technique, where the corresponding conjugate distributions, i.e., the Dirichlet distribution and the Beta distribution, are induced.

• We derive an estimation error bound for our approach, which demonstrates that the empirical risk minimizer approximately converges to the optimal risk minimizer as the number of training examples grows to infinity.
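The conjugacy that makes the iterative prior refinement tractable can be illustrated with the standard closed-form posterior updates. This is a minimal sketch under toy counts, not the authors' implementation; the function names and the numbers are ours. With a Dirichlet prior on the Categorical parameters and a Beta prior on each Bernoulli parameter, the posterior stays in the same family, so "refining the prior" reduces to a parameter update.

```python
import numpy as np

def dirichlet_posterior(alpha, label_counts):
    # Dirichlet(alpha) prior + Categorical observations -> Dirichlet(alpha + counts)
    return alpha + label_counts

def beta_posterior(a, b, num_flipped, num_not_flipped):
    # Beta(a, b) prior + Bernoulli observations -> Beta(a + #ones, b + #zeros)
    return a + num_flipped, b + num_not_flipped

alpha = np.ones(4)                    # uniform Dirichlet prior over 4 classes
counts = np.array([2, 1, 10, 3])      # toy correct-label counts (or soft estimates)
alpha_post = dirichlet_posterior(alpha, counts)
cat_mean = alpha_post / alpha_post.sum()   # posterior-mean Categorical parameters

a, b = beta_posterior(1.0, 1.0, num_flipped=6, num_not_flipped=14)
flip_mean = a / (a + b)                    # posterior-mean Bernoulli parameter
print(cat_mean, flip_mean)
```

Each epoch, re-running these updates with the current estimates plays the role of refining the prior information: the posterior parameters from one round become the prior for the next.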

2. RELATED WORK

In this section, we briefly review the literature on PLL from two aspects, i.e., traditional PLL and deep PLL. The former absorbs many classical machine learning techniques and usually utilizes linear models, while the latter embraces deep learning and builds upon deep neural networks. We focus on the assumptions about the generation of candidate labels that underlie a representative part of these methods.

Besides web data mining, PLL naturally arises in multimedia content analysis Zeng et al. (2013); Chen et al. (2017) and ecoinformatics Liu & Dietterich (2012); Tang & Zhang (2017), and a number of methods have been proposed to improve its practical performance. Identification-based approaches Jin & Ghahramani (2002); Nguyen & Caruana (2008); Liu & Dietterich (2012); Chen et al. (2014); Yu & Zhang (2016) regard the correct label as a latent variable and try to identify it. Average-based approaches Hüllermeier & Beringer (2006); Cour et al. (2011); Zhang & Yu (2015) treat all the candidate labels equally and average the modeling outputs as the prediction. Xu et al. (2019b) use topological information in the feature space to iteratively update the confidence of each candidate label or the label distribution. It should be noted that most traditional PLL methods ignore the process that generates the candidate labels. The Logistic Stick-Breaking Conditional Multinomial Model proposed by Liu & Dietterich (2012) does depict the generation process, but it assumes that the candidate labels are instance-independent.

Deep PLL has been studied recently and has advanced the practical application of PLL, since deep approaches are not restricted to linear models and low-efficiency optimization. Yao et al. (2020a) pioneer the use of deep convolutional neural networks and employ an uncertainty regularization term and a temporal-ensembling term to train the deep model. Lv et al. (2020) propose a progressive identification method that makes PLL compatible with arbitrary models and optimizers while, for the first time, performing impressively on image classification benchmarks. In addition, risk-consistent methods Feng et al. (2020); Wen et al. (2021) and classifier-consistent methods Lv et al. (2020); Feng et al. (2020) are proposed for deep models. Furthermore, for deep models, Wang et al. (2022) investigate contrastive representation learning, Zhang et al. (2021) adapt the class activation map, and Wu et al. (2022) revisit consistency regularization in PLL.

AVAILABILITY

Source code is available at https://github.com/

