LEVERAGED ASYMMETRIC LOSS WITH DISAMBIGUATION FOR MULTI-LABEL RECOGNITION WITH ONE-POSITIVE ANNOTATIONS

Abstract

In the problem of multi-label learning from single positive labels (SPL), we learn the potentially multiple labels of each instance from one observable positive annotation. Despite many efforts to solve this problem, an effective algorithm with a sound theoretical understanding is still needed. In this paper, we propose a novel loss function for the SPL problem, called leveraged asymmetric loss with disambiguation (LASD), in which we introduce a pair of leverage parameters to address the severe negative-positive imbalance. From the theoretical perspective, we analyze the SPL problem, for the first time, from the perspective of risk consistency, which links SPL losses with losses for ordinary multi-label classification. We prove the consistency of our proposed LASD loss to the cost-sensitive Hamming loss, which provides guidance for the empirical choice of our proposed leverage parameters. In experiments, we demonstrate the effectiveness of our proposed LASD loss function over other state-of-the-art methods and empirically verify our theoretical results.

1. INTRODUCTION

Different from standard multi-class classification, where each instance is tagged with one target label, multi-label classification (Liu et al., 2022; Li et al., 2022) allows an instance to have multiple labels and thus is applicable to wider real-world scenarios. For example, a picture can contain multiple objects (Lanchantin et al., 2021; Hu et al., 2021), a sentence can express multiple emotions (Huang et al., 2021; Fei et al., 2020), and a song can belong to multiple genres (Shrivastava et al., 2020; Pellegrini & Masquelier, 2021). Despite the wide applications of multi-label learning, the existence of multiple labels further increases the difficulty of annotating high-quality labels (Deng et al., 2014). On the one hand, label annotation can be extremely laborious and costly (Deng et al., 2014). On the other hand, small objects or rare classes are often inevitably ignored by human annotators (Liu et al., 2021; Wolfe et al., 2005). To deal with such problems, researchers loosen the requirements on label annotation and propose the "Partial Multi-Label" (PML) paradigm, where the label set of each instance can be a subset of its complete label set (Xie & Huang, 2021; Yan & Guo, 2021; Lyu et al., 2021; Li et al., 2021). Recently, based on PML classification, Cole et al. (2021) take a step further and present the paradigm called Single Positive Labels (SPL), where the data provides only one correct (positive) label for each instance. A simple example illustrates the difference between the PML and SPL paradigms: consider an image containing a sofa, a chair, and a potted plant, but neither a person nor a car. Under the PML paradigm, we are given the labels sofa (yes), chair (yes), person (no), others (unknown), while under the SPL paradigm, we only know sofa (yes), others (unknown). In fact, many real-world multi-class datasets are potentially multi-labeled, and SPL algorithms can be used directly to explore their underlying multi-labels.
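The sofa/chair example above can be made concrete with label vectors. The sketch below is illustrative only (the encoding 1 = positive, 0 = confirmed negative, -1 = unknown, and the five class names are our assumptions, not a notation fixed by the paper):

```python
import numpy as np

# One image that truly contains {sofa, chair, potted plant} out of five classes.
# Encoding (illustrative): 1 = positive, 0 = confirmed negative, -1 = unknown.
classes = ["sofa", "chair", "potted plant", "person", "car"]

full_labels = np.array([1, 1, 1, 0, 0])     # fully supervised multi-label
pml_labels = np.array([1, 1, -1, 0, -1])    # PML: a subset of labels observed
spl_labels = np.array([1, -1, -1, -1, -1])  # SPL: exactly one observed positive

# SPL observes one positive and leaves every other label unknown.
assert (spl_labels == 1).sum() == 1
assert (spl_labels == -1).sum() == len(classes) - 1
```

This also shows why SPL is the harder setting: unlike PML, no negative label is ever observed, so any treatment of the unknown entries must guard against false negatives.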
For example, the empirical study by Stock & Cisse (2018) reveals the multi-label nature of images in the ImageNet dataset (Russakovsky et al., 2015). By reusing previous datasets for multi-class problems, SPL can save a massive amount of money, time, and labor on data collection. The algorithmic study of SPL is relatively under-explored. Cole et al. (2021) first formally define the problem of SPL and propose the adaptive SPL loss, which shows satisfactory experimental performance. However, it suffers from the shortcoming that one of its hyper-parameters, the average number of positive labels, is hard to define and may vary with data selection. Subsequently, Verelst et al. (2022) and Zhou et al. (2022) also aim to solve the SPL problem, introducing a spatial consistency loss and entropy maximization, respectively. However, theoretical properties of SPL methods remain in great demand. An alternative idea for solving SPL problems is to utilize PML classifiers. The first category explores the relationship among labels and models the label correlations (Chen et al., 2019; Durand et al., 2019; Huynh & Elhamifar, 2020). However, this category requires at least two labels per instance and thus is incompatible with the SPL problem. The second category turns the PML problem into an optimization problem (Sun et al., 2010; Bucak et al., 2011; Cabral et al., 2011; Xu et al., 2013). However, most of these algorithms only show promising performance on conventional datasets with sufficient positive labels per instance. In other words, directly applying these algorithms to SPL may induce severe performance degradation. Besides, one can also solve the SPL problem through positive-unlabeled (PU) frameworks for multi-label learning (Sun et al., 2010; Hsieh et al., 2015; Han et al., 2018; Kanehira & Harada, 2016).
Nonetheless, these methods are rarely explored in the SPL setting, and most of them cannot be directly applied to large-scale multi-label image classification. Under such conditions, to solve the SPL problem, we propose a new loss function called leveraged asymmetric loss with disambiguation (LASD), which explicitly copes with the challenges of extreme label imbalance and label self-disambiguation. To establish the effectiveness of our proposed loss function, we for the first time use the concept of risk consistency to show the relationship between loss functions for SPL and those for fully supervised multi-label data. The contributions of this paper can be summarized as follows:

• We propose a novel loss function for the SPL problem, in which we introduce a pair of leverage parameters to address the negative-positive label imbalance, which is more severe in SPL than in ordinary multi-label learning. Moreover, we resort to a self-labeling mechanism to disambiguate the unobserved labels and alleviate the adverse impact of false negatives.

• We for the first time analyze an SPL loss function from the perspective of risk consistency. Under mild sampling assumptions, we first show the theoretical link between arbitrary SPL losses and losses for ordinary multi-label learning. Then we prove the risk consistency of our LASD loss to the cost-sensitive Hamming loss, which guarantees the effectiveness of LASD in dealing with severe label imbalance. This result also provides theoretical guidance for the choice of the leverage parameters.

• In experiments, we compare our proposed loss with other state-of-the-art SPL methods on multiple multi-label image classification datasets and show the effectiveness of our method. Empirical analyses of the leverage parameters and ablation studies are also conducted.
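The exact form of LASD is defined later in the paper; as a rough illustration of the two ingredients named above, here is a hypothetical sketch combining leverage parameters with a simple self-labeling rule. The names `lam_pos`, `lam_neg`, the threshold `tau`, and the BCE-style form are all our illustrative assumptions, not the paper's definition:

```python
import numpy as np

def leveraged_spl_loss(probs, observed, lam_pos=1.0, lam_neg=0.1, tau=0.9):
    """Illustrative SPL loss sketch (NOT the paper's LASD definition).

    probs    : (L,) predicted probabilities for L classes.
    observed : (L,) with 1 for the single observed positive, -1 for unknown.
    lam_pos, lam_neg : leverage parameters reweighting the positive-style
                       and negative-style terms to counter label imbalance.
    tau      : confidence threshold above which an unknown label is
               disambiguated into a pseudo-positive (self-labeling).
    """
    eps = 1e-12
    pos = observed == 1
    unk = observed == -1
    # Observed positive: standard positive BCE term, leveraged by lam_pos.
    loss = lam_pos * -np.log(probs[pos] + eps).sum()
    # Unknown labels: confident predictions become pseudo-positives; the
    # rest are treated as negatives but leveraged down by lam_neg, which
    # softens the penalty on potential false negatives.
    pseudo_pos = unk & (probs > tau)
    neg = unk & ~pseudo_pos
    loss += lam_pos * -np.log(probs[pseudo_pos] + eps).sum()
    loss += lam_neg * -np.log(1.0 - probs[neg] + eps).sum()
    return loss / len(probs)

probs = np.array([0.95, 0.92, 0.40, 0.05, 0.10])
observed = np.array([1, -1, -1, -1, -1])
print(leveraged_spl_loss(probs, observed))
```

Note how `lam_neg < lam_pos` encodes the asymmetry: with one positive against many assumed negatives per instance, unweighted negative terms would dominate the gradient.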

2. RELATED WORK

2.1. METHODS TARGETING THE SPL PROBLEM

Cole et al. (2021) first proposed the idea of SPL. Their method regards the false-negative labels as label noise, decreases the weights of the negative labels in the loss function, and uses label smoothing (Szegedy et al., 2016) to regularize the model. However, their loss function contains a hyper-parameter describing the average number of positive labels per instance, which is hard to obtain and can vary across datasets. Verelst et al. (2022) utilized a consistency loss that keeps feature-map outputs consistent across training epochs, making the multi-label model act in synergy with the ubiquitous random-resized-crop data augmentation. However, they pay less attention to the high negative-positive label imbalance in the SPL problem. Moreover, both of the works above fail to justify their proposed methods theoretically. Zhou et al. (2022) for the first time introduced the idea of entropy maximization on the unlabeled data and utilized asymmetric pseudo-labeling to address the negative-positive imbalance. Kim et al. (2022) discussed a loss-correction method for the SPL problem, while SPL algorithms with a sound theoretical understanding from the statistical view are still yet to be exploited.
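The two ingredients attributed to Cole et al. (2021) above, down-weighted negative terms and label smoothing, can be sketched generically. This is our own illustration of those standard techniques, not their published loss; the weight `gamma` and smoothing strength `eps_smooth` are assumed names:

```python
import numpy as np

def weak_negative_loss(probs, targets, gamma=0.1, eps_smooth=0.1):
    """Generic sketch: down-weighted negatives plus label smoothing.

    targets   : 1 for the observed positive, 0 for assumed negatives
                (the usual "assume unobserved = negative" SPL baseline).
    gamma     : weight shrinking the (possibly false) negative terms.
    eps_smooth: label-smoothing strength pulling hard targets off 0/1.
    """
    eps = 1e-12
    # Smooth the hard targets: 1 -> 1 - eps_smooth, 0 -> eps_smooth.
    t = targets * (1.0 - eps_smooth) + (1.0 - targets) * eps_smooth
    bce = -(t * np.log(probs + eps) + (1.0 - t) * np.log(1.0 - probs + eps))
    # Keep full weight on the observed positive, shrink the rest.
    weights = np.where(targets == 1, 1.0, gamma)
    return float((weights * bce).mean())
```

Both tricks reduce the damage of false negatives but leave the imbalance itself unmodeled, which is the gap the criticism in this subsection points at.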

2.2. PML AND PU METHODS FOR SPL PROBLEM

The majority of these methods can be summarized into two branches: the first branch establishes a new setting with positive and negative labels and solves the problem by designing efficient losses and optimization

