LEVERAGED ASYMMETRIC LOSS WITH DISAMBIGUATION FOR MULTI-LABEL RECOGNITION WITH ONE-POSITIVE ANNOTATIONS

Abstract

In the problem of multi-label learning from single positive labels (SPL), we learn the potential multiple labels from one observable single positive annotation. Despite many efforts to solve this problem, an effective algorithm with a sound theoretical understanding is still lacking. In this paper, we propose a novel loss function for the SPL problem, called leveraged asymmetric loss with disambiguation (LASD), in which we introduce a pair of leverage parameters to address the severe negative-positive imbalance. From the theoretical perspective, we analyze the SPL problem, for the first time, through the lens of risk consistency, which links the SPL loss with losses for ordinary multi-label classification. We prove the consistency of our proposed LASD loss with respect to the cost-sensitive Hamming loss, which provides guidance for the empirical choice of the proposed leverage parameters. In experiments, we demonstrate the effectiveness of our proposed LASD loss function over other state-of-the-art methods and empirically verify our theoretical results.

1. INTRODUCTION

Different from standard multi-class classification, where each instance is tagged with one target label, multi-label classification (Liu et al., 2022; Li et al., 2022) allows an instance to have multiple labels and is thus applicable to a wider range of real-world scenarios. For example, a picture can contain multiple objects (Lanchantin et al., 2021; Hu et al., 2021), a sentence can express multiple emotions (Huang et al., 2021; Fei et al., 2020), and a song can belong to multiple genres (Shrivastava et al., 2020; Pellegrini & Masquelier, 2021). Despite the wide applications of multi-label learning, the existence of multiple labels further increases the difficulty of annotating high-quality labels (Deng et al., 2014). On the one hand, label annotation can be extremely laborious and costly (Deng et al., 2014). On the other hand, small objects or rare classes are often inevitably overlooked by human annotators (Liu et al., 2021; Wolfe et al., 2005). To deal with such problems, researchers loosen the labeling requirements and propose the "Partial Multi-Label" (PML) paradigm, where the annotation of each instance can be a subset of its complete label set (Xie & Huang, 2021; Yan & Guo, 2021; Lyu et al., 2021; Li et al., 2021). Recently, building on PML classification, Cole et al. (2021) take a step further and present the paradigm called Single Positive Labels (SPL), where the data provides only one correct (positive) label for each instance. A simple example illustrates the difference between the PML and SPL paradigms: consider an image containing a sofa, a chair, and a potted plant, but no person or car. Under the PML paradigm, we are given the labels: sofa (yes), chair (yes), person (no), others (unknown), while under the SPL paradigm, we only know the label: sofa (yes), others (unknown). In fact, many real-world multi-class datasets are potentially multi-labeled, and SPL algorithms can be used directly to explore their underlying multi-labels.
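The toy example above can be made concrete as label vectors. The following sketch is only an illustration of the annotation paradigms, not code from the paper; the encoding convention (1 = observed positive, 0 = observed negative, None = unknown) and all names are our own assumptions.

```python
# Classes in the toy example from the text.
classes = ["sofa", "chair", "potted plant", "person", "car"]

# Ground-truth multi-label set: sofa, chair, and potted plant are present.
full_labels = {"sofa": 1, "chair": 1, "potted plant": 1, "person": 0, "car": 0}

# PML annotation: a subset of entries is observed; the rest are unknown.
pml_labels = {"sofa": 1, "chair": 1, "person": 0}

# SPL annotation: exactly one positive label is observed.
spl_labels = {"sofa": 1}

def observed(annotation, classes):
    """Per-class observation: 1 (positive), 0 (negative), or None (unknown)."""
    return [annotation.get(c) for c in classes]

print(observed(pml_labels, classes))  # [1, 1, None, 0, None]
print(observed(spl_labels, classes))  # [1, None, None, None, None]
```

The SPL vector is strictly less informative than the PML one: every class except the single annotated positive is unknown, which is what makes the negative-positive imbalance in SPL learning so severe.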
For example, the empirical study by Stock & Cisse (2018) reveals the multi-label nature of images in the ImageNet dataset (Russakovsky et al., 2015). By reusing existing multi-class datasets, SPL can save a massive amount of money, time, and labor on data collection. However, the algorithmic study of SPL is relatively under-explored. Cole et al. (2021) first formally define the SPL problem and propose the adaptive SPL loss, which shows satisfactory experimental performance. However, it suffers from the shortcoming that one of its hyper-parameters, the average number of positive labels, is hard to determine and may vary with data selection. Subsequently, Verelst et al. (2022) and Zhou et al. (2022) also aim to solve the SPL problem, and introduce the

