LONG-TAILED PARTIAL LABEL LEARNING VIA DYNAMIC REBALANCING

Abstract

Real-world data usually couples label ambiguity with heavy class imbalance, challenging the algorithmic robustness of partial label learning (PLL) and long-tailed learning (LT). The straightforward combination of the two, i.e., LT-PLL, suffers from a fundamental dilemma: LT methods build upon a given class distribution that is unavailable in PLL, while the performance of PLL is severely degraded in the long-tailed context. We show that even with the aid of an oracle class prior, state-of-the-art methods underperform due to an adverse fact: the constant rebalancing in LT is harsh to the label disambiguation in PLL. To overcome this challenge, we propose a dynamic rebalancing method, termed RECORDS, that does not assume any prior knowledge about the class distribution. Based on a parametric decomposition of the biased model output, our method constructs a dynamic adjustment that is benign to the label disambiguation process and theoretically converges to the oracle class prior. Extensive experiments on three benchmark datasets demonstrate the significant gains of RECORDS over a range of baselines. The code is publicly available.

1. INTRODUCTION

Partial label learning (PLL) originates from real-world scenarios in which the annotation for each sample is an ambiguous set containing the groundtruth and other confusing labels. This is common when we gather annotations from news websites with several tags (Luo & Orabona, 2010), videos with several characters of interest (Chen et al., 2018), or labels from multiple annotators (Gong et al., 2018). The ideal assumption behind PLL is that the collected data is approximately uniformly distributed over classes. However, a more natural assumption in the above real-world applications is that the data is imbalanced, typically following a long-tailed law, which must be considered if we deploy PLL methods in online systems. This poses a new challenge for PLL studies: the robustness of algorithms to both category imbalance and label ambiguity.

Existing efforts, partial label learning and long-tailed learning, have independently studied partial aspects of this problem over the past decades. Standard PLL requires label disambiguation from the candidate sets along with the training of an ordinary classifier (Feng et al., 2020). The mainstream approach estimates label-wise confidence to implicitly or explicitly re-weight the classification loss (a minimal sketch is given below), e.g., PRODEN (Lv et al., 2020), LW (Wen et al., 2021), CAVL (Fei et al., 2022), and CORR (Wu et al., 2022), which have achieved state-of-the-art performance in PLL. In long-tailed learning, the core difficulty lies in diminishing the inherent bias induced by heavy class imbalance (Chawla et al., 2002; Menon et al., 2013). A simple but fairly effective method is logit adjustment (Menon et al., 2021; Ren et al., 2020), which has proven very powerful in a range of recent studies (Cui et al., 2021; Narasimhan & Menon, 2021).

Nevertheless, in the more practical long-tailed partial label learning (LT-PLL) problem, several dilemmas remain for the above two paradigms. One straightforward concern is that the skewed long-tailed distribution exacerbates the bias towards head classes in label disambiguation, easily resulting in trivial solutions that are excessively confident on the head classes. More importantly, most state-of-the-art long-tailed learning methods cannot be directly used in LT-PLL, since they require the class distribution to be available, which is agnostic in PLL due to label ambiguity. In addition, we discover that even when an oracle class distribution prior is applied during training, existing techniques underperform in LT-PLL and even fail in some cases. In Figure 1, we trace the average prediction of a PLL model, PRODEN (Lv et al., 2020), on a uniform test set. Normally, the backbone PLL method PRODEN exhibits predictions biased towards head classes, shown by the blue curve; ideally, we would expect that with the intervention of the state-of-the-art logit adjustment in LT, the predictions for all classes become equally confident, namely, the purple curve. However, as can be seen, PRODEN calibrated by the oracle prior actually performs worse and is prone to over-adjusting towards the tail classes, as shown by the orange curve. This is because logit adjustment in LT leverages a constant class distribution prior to rebalance the training and does not consider the dynamics of label disambiguation.
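As a concrete reference point for the confidence re-weighting used by PLL baselines such as PRODEN, the following is a minimal PyTorch sketch; the function name and normalization details are illustrative rather than the exact implementation of any cited method.

```python
import torch
import torch.nn.functional as F

def pll_reweighted_loss(logits: torch.Tensor, candidate_mask: torch.Tensor) -> torch.Tensor:
    """Confidence-based re-weighting for PLL (PRODEN-style sketch).

    logits:         (B, C) raw classifier outputs.
    candidate_mask: (B, C) binary mask, 1 for labels in the candidate set.
    """
    # Label-wise confidence: the model's own predictions, renormalized over
    # the candidate set (treated as a constant, so no gradient flows through).
    with torch.no_grad():
        probs = F.softmax(logits, dim=-1) * candidate_mask
        conf = probs / probs.sum(dim=-1, keepdim=True).clamp(min=1e-12)
    # Cross-entropy against the soft, disambiguated targets.
    log_probs = F.log_softmax(logits, dim=-1)
    return -(conf * log_probs).sum(dim=-1).mean()
```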
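The constant rebalancing that Figure 1 shows over-adjusting can likewise be sketched as the logit-adjusted loss of Menon et al. (2021), shown here for ordinary supervised learning; in the "Oracle-LA" variant the same shift is applied to the logits inside the PLL loss. Note that it assumes a fixed class prior, which in LT-PLL would have to come from an oracle.

```python
import torch
import torch.nn.functional as F

def logit_adjusted_loss(logits, targets, class_prior, tau=1.0):
    """Constant logit adjustment (Menon et al., 2021): shift the logits by
    tau * log(prior) during training, so that head classes need higher raw
    confidence to win and the unadjusted logits become balanced at test time.
    `class_prior` is a fixed length-C probability vector, unknown in LT-PLL.
    """
    adjusted = logits + tau * torch.log(class_prior + 1e-12)
    return F.cross_entropy(adjusted, targets)
```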
Specifically, at the early stage, when the true label is still ambiguous within the candidate set, over-adjusting the logits only strongly confuses the classifier, which is detrimental to the overall training. Thus, as we can see in Figure 1, the average prediction on the tail classes becomes too high along with the training. Based on the above analysis, compared with the previous constant rebalancing, a dynamic rebalancing mechanism that is friendly to the training dynamics is preferable.

Following this intuition, we propose a novel method, termed REbalanCing fOR Dynamic biaS (RECORDS), for LT-PLL. Specifically, we perform a parametric decomposition of the biased model output and implement a dynamic adjustment by maintaining a prototype feature with momentum updates during training. Our empirical and theoretical analysis demonstrates that the resulting dynamic parametric class distribution asymptotically approaches the statistical prior while remaining benign to the overall training. A quick glance at the performance of RECORDS is given by the red curve in Figure 1, which approximately fits the expected purple curve throughout the training. Our contributions can be summarized as follows (a sketch of the core mechanism follows this list):

1. We delve into the more practical but under-explored LT-PLL scenario and identify several challenges in this task that cannot be addressed, and may even lead to failure, by the straightforward combination of current long-tailed learning and partial label learning.

2. We propose RECORDS for LT-PLL, which conducts a dynamic adjustment to rebalance the training without requiring any prior knowledge about the class distribution. Theoretical and empirical analysis shows that the dynamic parametric class distribution asymptotically approaches the oracle class distribution while being more friendly to label disambiguation.

3. Our method is orthogonal to existing PLL methods and can be easily plugged into them in an end-to-end manner. Extensive experiments on three benchmark datasets under the long-tailed setting and a range of PLL methods demonstrate the effectiveness of the proposed RECORDS. Notably, we show a 32.03% improvement in classification performance over the best baseline, CORR, on the PASCAL VOC dataset.
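To make the dynamic rebalancing idea concrete, here is a minimal sketch of one way it can be realized; the class name, momentum value, and update schedule are illustrative assumptions, not the exact formulation presented later in the paper.

```python
import torch
import torch.nn.functional as F

class DynamicRebalancer:
    """Sketch of RECORDS-style dynamic rebalancing: estimate the model's
    current class-level bias from a momentum-averaged feature prototype and
    subtract it from the logits before label disambiguation.
    """

    def __init__(self, feat_dim: int, momentum: float = 0.9):
        self.momentum = momentum
        self.prototype = torch.zeros(feat_dim)  # running mean feature

    @torch.no_grad()
    def update(self, features: torch.Tensor):
        # Momentum update with the mean feature of the current mini-batch.
        batch_mean = features.mean(dim=0)
        self.prototype = (self.momentum * self.prototype.to(batch_mean.device)
                          + (1.0 - self.momentum) * batch_mean)

    def debias(self, logits: torch.Tensor, classifier: torch.nn.Module) -> torch.Tensor:
        # Parametric estimate of the current class distribution: the model's
        # own prediction on the prototype feature (the "bias" component of
        # the decomposed output).
        with torch.no_grad():
            est_prior = F.softmax(classifier(self.prototype), dim=-1)
        # Remove the estimated bias, analogous to logit adjustment but with
        # a prior that tracks the training dynamics instead of a constant.
        return logits - torch.log(est_prior + 1e-12)
```

Under this reading, `update` is called on the extracted features at each training step, and the debiased logits are fed to any base PLL loss (e.g., the re-weighted loss sketched above), which is what makes the adjustment pluggable into existing PLL methods end to end.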



Figure 1: Average classifier prediction (on the CIFAR-100 test set) of different methods during training (on LT-PLL training set CIFAR-100-LT with imbalance ratio ρ = 100 and ambiguity q = 0.05). "PRODEN" (Lv et al., 2020) is a popular PLL method. "PRODEN + Oracle-LA" denotes PRODEN with the state-of-the-art logit adjustment (Menon et al., 2021; Hong et al., 2021) in LT under the oracle prior. "PRODEN + RECORDS" is PRODEN with our proposed calibration. "Uniform" characterizes the expected average confidence on different classes.

