MULTI-LEVEL GENERATIVE MODELS FOR PARTIAL LABEL LEARNING WITH NON-RANDOM LABEL NOISE

Anonymous

Abstract

Partial label (PL) learning tackles the problem where each training instance is associated with a set of candidate labels that contains the true label together with irrelevant noise labels. In this paper, we propose a novel multi-level generative model for partial label learning (MGPLL), which addresses the PL problem by learning both a label-level and a feature-level adversarial generator under a bi-directional mapping framework between label vectors and data samples. MGPLL uses a conditional noise label generation network to model non-random noise labels and perform label denoising, and a multi-class predictor to map training instances to the denoised label vectors, while a conditional data feature generator forms the inverse mapping from denoised label vectors to data samples. Both the noise label generator and the data feature generator are learned in an adversarial manner to match the observed candidate labels and data features, respectively. Extensive experiments on both synthesized and real-world partial label datasets demonstrate that the proposed approach achieves state-of-the-art performance for partial label learning.

1. INTRODUCTION

Partial label (PL) learning is a weakly supervised learning problem with ambiguous labels (Hüllermeier & Beringer, 2006; Zeng et al., 2013), where each training instance is assigned a set of candidate labels, among which only one is the true label. Since it is typically difficult and costly to annotate instances precisely, the task of partial label learning naturally arises in many real-world learning scenarios, including automatic face naming (Hüllermeier & Beringer, 2006; Zeng et al., 2013) and web mining (Luo & Orabona, 2010). As the true label information is hidden in the candidate label set, the main challenge of PL learning lies in identifying the ground-truth labels among the candidate noise labels in order to learn a good prediction model. Some previous works adapt existing learning techniques to handle candidate label sets directly and perform label disambiguation implicitly (Gong et al., 2018; Nguyen & Caruana, 2008; Wu & Zhang, 2018). These methods exploit the strengths of standard classification techniques and have produced promising results on PL learning. Another line of work pursues explicit label disambiguation by trying to identify the true labels from the noise labels in the candidate label sets. For example, the work in (Feng & An, 2018) estimates the latent label distribution with iterative label propagation and then induces a prediction model by fitting the learned latent label distribution. The work in (Lei & An, 2019) exploits a self-training strategy to induce label confidence values and learns classifiers in an alternating manner by minimizing the squared loss between the model predictions and the learned label confidence matrix. However, these methods suffer from cumulative errors induced in either the separate label distribution estimation steps or the error-prone label confidence estimation process.
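The alternating self-training scheme described above can be sketched as follows. This is a minimal illustration only, assuming a linear ridge-regression scorer; the function name `self_training_pl` and the confidence matrix `Q` are hypothetical names, and the sketch is not the actual method of (Lei & An, 2019) or the proposed MGPLL model:

```python
import numpy as np

def self_training_pl(X, S, n_iters=20, reg=1e-3):
    """Alternating self-training sketch for partial-label data.

    X: (n, d) feature matrix; S: (n, k) binary candidate-label mask.
    Alternates between (a) fitting a linear scorer W to the current
    label confidence matrix Q under a squared loss, and (b) re-estimating
    Q by renormalizing the model's scores within each candidate set.
    """
    n, d = X.shape
    # Initialize confidences uniformly over each candidate set.
    Q = S / S.sum(axis=1, keepdims=True)
    for _ in range(n_iters):
        # (a) Closed-form ridge fit: minimize ||X W - Q||^2 + reg * n * ||W||^2.
        W = np.linalg.solve(X.T @ X + reg * n * np.eye(d), X.T @ Q)
        # (b) Re-estimate confidences: clip scores to be positive, mask
        #     to the candidate set, and renormalize each row to sum to one.
        scores = np.clip(X @ W, 1e-6, None) * S
        Q = scores / scores.sum(axis=1, keepdims=True)
    return W, Q
```

The symmetry among candidates breaks because different instances of the same class carry different noise labels, so the fitted scorer accumulates more evidence for the shared true label than for any individual noise label.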
Moreover, all these methods share a common drawback: they implicitly assume random noise in the label space, i.e., that the noise labels are randomly distributed over the label space for each instance. In real-world problems, however, the appearance of noise labels usually depends on the target true label. For example, when the object contained in an image is a "computer", the noise label "TV" could be added due to a recognition mistake or image ambiguity, but the object is much less likely to be annotated as "lamp" or "curtain", and the probability of getting noise labels such as "tree" or "bike" is smaller still.
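To make the contrast concrete, the following sketch samples candidate label sets in which the noise labels depend on the true label. The function name `sample_candidates` and the confusion matrix `conf` are illustrative assumptions, not part of any cited method; the random-noise assumption corresponds to the special case where each row of `conf` is uniform over the non-true labels:

```python
import numpy as np

def sample_candidates(y_true, conf, n_noise=1, rng=None):
    """Draw a candidate label set whose noise labels depend on the true label.

    conf: (k, k) row-stochastic matrix where conf[y, z] is the probability
    that class z appears as a noise label when the true label is y
    (with conf[y, y] = 0, since the true label is not a noise label).
    """
    rng = rng or np.random.default_rng()
    k = conf.shape[0]
    # Noise labels are sampled from the true-label-conditioned distribution,
    # rather than uniformly at random over the label space.
    noise = rng.choice(k, size=n_noise, replace=False, p=conf[y_true])
    return {int(y_true), *map(int, noise)}
```

Under the "computer" example, a row of `conf` would put most of its mass on "TV", little on "lamp" or "curtain", and essentially none on "tree" or "bike".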

