SUBCLASS-BALANCING CONTRASTIVE LEARNING FOR LONG-TAILED RECOGNITION

Abstract

Long-tailed recognition with imbalanced class distributions naturally emerges in practical machine learning applications. Existing methods such as data reweighting, resampling, and supervised contrastive learning enforce class balance at the price of introducing an imbalance between instances of head classes and tail classes, which may ignore the rich underlying semantic substructures of the former and exaggerate the biases in the latter. We overcome these drawbacks with a novel "subclass-balancing contrastive learning (SBCL)" approach that clusters each head class into multiple subclasses of sizes comparable to the tail classes and enforces representations to capture the two-layer class hierarchy between the original classes and their subclasses. Since the clustering is conducted in the representation space and updated during the course of training, the subclass labels preserve the semantic substructures of head classes. Meanwhile, it does not overemphasize tail-class samples, so each individual instance contributes to the representation learning equally. Hence, our method achieves both instance- and subclass-balance, while the original class labels are also learned through contrastive learning among subclasses from different classes. We evaluate SBCL on a suite of long-tailed benchmark datasets, on which it achieves state-of-the-art performance. In addition, we present extensive analyses and ablation studies of SBCL to verify its advantages.

1. INTRODUCTION

In reality, datasets often follow a Zipfian distribution over classes with a long tail (Zipf, 2013; Spain & Perona, 2007), i.e., a few classes (head classes) contain significantly more instances than the remaining tail classes. Such tail classes can be of great importance in high-stakes applications, e.g., the patient class in medical diagnosis or the accident class in autonomous driving (Cao et al., 2019; Shen et al., 2015). However, training on such class-imbalanced datasets can result in a severely biased model with a noticeable performance drop on classification tasks (Wang et al., 2017; Mahajan et al., 2018; Zhong et al., 2019; Ando & Huang, 2017; Buda et al., 2018; Collobert et al., 2008; Yang et al., 2019). To overcome the challenges posed by long-tailed data, data resampling (Ando & Huang, 2017; Buda et al., 2018; Chawla et al., 2002; Shen et al., 2016) and loss reweighting (Byrd & Lipton, 2019; Cao et al., 2019; Cui et al., 2019; Dong et al., 2018) have been widely applied, but they cannot fully leverage all the head-class samples. Recent work discovered that supervised contrastive learning (SCL) (Khosla et al., 2020) can achieve state-of-the-art (SOTA) performance on benchmark datasets for long-tailed recognition (Kang et al., 2020; Li et al., 2022). Specifically, k-positive contrastive learning (KCL) (Kang et al., 2020) and its successor, targeted supervised contrastive learning (TSC) (Li et al., 2022), revamp SCL by encouraging the learned feature space to be class-balanced and uniformly distributed. However, methods enforcing class-balance often come at the price of instance-imbalance, i.e., each individual instance of a tail class has a much greater impact on model training than an instance of a head class. Such instance-imbalance can significantly degrade long-tailed recognition performance for several reasons.
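To make the instance-imbalance concrete, the following numerical sketch (with hypothetical class sizes of our own choosing) compares per-instance sampling probabilities under instance-balanced and class-balanced sampling:

```python
import numpy as np

# Hypothetical long-tailed class sizes (imbalance ratio 5000/50 = 100).
class_sizes = np.array([5000, 1000, 200, 50])
C = len(class_sizes)

# Instance-balanced sampling: every instance is equally likely to be drawn.
p_instance = np.full(C, 1.0 / class_sizes.sum())

# Class-balanced sampling: each class gets probability mass 1/C, split
# uniformly among its instances, inflating each tail instance's influence.
p_class = (1.0 / C) / class_sizes

print(p_class[-1] / p_class[0])  # tail instance weighted ~100x a head instance
```

The ratio equals the imbalance ratio itself: under class-balanced sampling, a single tail-class sample carries as much weight as a hundred head-class samples, which is exactly the instance-imbalance discussed above.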
On the one hand, the limited samples in each tail class might not be sufficiently representative of the whole class, so even a small bias in them can be enormously exaggerated by class-balancing methods and result in sub-optimal classifiers or representations. On the other hand, head classes usually have more complicated semantic substructures, e.g., multiple high-density regions of the data distribution, so simply downweighting head-class samples and treating them equally can easily lose critical structural information. For example, images of a head class "cat" might be highly diverse in breeds and colors, which need to be captured by different features, but downweighting or subsampling them may easily lose such information; meanwhile, a tail class "platypus" might only contain a few similar images that are unlikely to cover all the representative features. Therefore, it is non-trivial to enforce both class-balance and instance-balance simultaneously in the same method.

Can we remove the negative impact of class-imbalance while still retaining the advantages of instance-balance? In this paper, we achieve both through subclass-balancing contrastive learning (SBCL), a novel supervised contrastive learning objective defined on subclasses, which are clusters within each head class that have comparable size to the tail classes and are adaptively updated during training. Instead of sacrificing instance-balance for class-balance, our method achieves both instance- and subclass-balance by exploring the head-class structure in the learned representation space of the model-in-training. In particular, we propose a bi-granularity contrastive loss that enforces each sample (1) to be closer to samples from the same subclass than to all other samples; and (2) to be closer to samples from a different subclass but the same class than to samples from any other subclass.
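The two requirements above can be sketched as a hypothetical two-level contrastive loss over precomputed features (a numpy sketch of the stated principle, not the paper's exact objective; the function name and temperature values are our own):

```python
import numpy as np

def bi_granularity_loss(z, subclass, cls, tau1=0.1, tau2=0.5):
    """Hypothetical sketch: z is an (N, d) matrix of L2-normalized features,
    `subclass` and `cls` are integer label arrays of length N."""
    sim = z @ z.T  # cosine similarities
    N, loss = len(z), 0.0
    for i in range(N):
        others = np.arange(N) != i
        # (1) same-subclass positives, contrasted against all other samples
        pos1 = others & (subclass == subclass[i])
        if pos1.any():
            denom = np.exp(sim[i][others] / tau1).sum()
            loss += -np.log(np.exp(sim[i][pos1] / tau1) / denom).mean()
        # (2) same-class, different-subclass positives, contrasted against
        #     all samples from any other subclass
        pos2 = others & (cls == cls[i]) & (subclass != subclass[i])
        neg2 = others & (subclass != subclass[i])
        if pos2.any():
            denom = np.exp(sim[i][neg2] / tau2).sum()
            loss += -np.log(np.exp(sim[i][pos2] / tau2) / denom).mean()
    return loss / N

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 4))
z /= np.linalg.norm(z, axis=1, keepdims=True)
cls = np.array([0, 0, 0, 0, 1, 1, 1, 1])       # two original classes
subclass = np.array([0, 0, 1, 1, 2, 2, 3, 3])  # two subclasses per class
print(bi_granularity_loss(z, subclass, cls))
```

A larger temperature on level (2) than on level (1) would make the cross-subclass attraction weaker than the within-subclass attraction, keeping subclasses compact while still grouping them by class.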
While the former learns representations with balanced and compact subclasses, the latter preserves the class structure at the subclass level by encouraging each class's subclasses to be closer to each other than to any other class's subclasses. Hence, our method can learn an accurate classifier distinguishing the original classes while enjoying both instance- and subclass-balance. In this paper, we apply SBCL to several visual recognition tasks to demonstrate its superiority over previous work (e.g., KCL (Kang et al., 2020), TSC (Li et al., 2022)). To summarize, this paper makes the following contributions:

(a) We provide a new design principle for leveraging supervised contrastive learning in long-tailed recognition, i.e., aiming at both instance- and subclass-balance instead of class-balance at the expense of instance-balance.

(b) We propose a novel instantiation of this design principle, subclass-balancing contrastive learning (SBCL), which consists of two major components: subclass-balancing adaptive clustering and a bi-granularity contrastive loss.

(c) Empirically, we compare SBCL against state-of-the-art methods on three visual tasks (image classification, object detection, and instance segmentation) to demonstrate its effectiveness in handling class imbalance. We also conduct a series of experiments to analyze the efficacy of SBCL.
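The clustering component could be sketched as follows: a minimal k-means pass (our own simplified stand-in for the adaptive clustering procedure, with illustrative names) that splits one head class's features into roughly tail-class-sized subclasses:

```python
import numpy as np

def cluster_head_class(features, tail_size, iters=20, seed=0):
    """Split one head class's (n, d) feature matrix into
    k = ceil(n / tail_size) subclasses via plain k-means, so that each
    subclass is roughly tail-class sized. Simplified illustration only."""
    rng = np.random.default_rng(seed)
    n = len(features)
    k = max(1, int(np.ceil(n / tail_size)))
    # Initialize centers from randomly chosen samples of this class.
    centers = features[rng.choice(n, size=k, replace=False)]
    assign = np.zeros(n, dtype=int)
    for _ in range(iters):
        # Assign each sample to its nearest center.
        dists = ((features[:, None, :] - centers[None]) ** 2).sum(-1)
        assign = dists.argmin(axis=1)
        # Recompute each center as the mean of its members.
        for j in range(k):
            members = features[assign == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return assign, k

# A head class with 500 samples and tail classes of size ~50
# yields k = 10 subclasses of comparable size.
feats = np.random.default_rng(1).normal(size=(500, 8))
assign, k = cluster_head_class(feats, tail_size=50)
print(k)  # 10
```

In the full method, such clustering would run in the current representation space and be refreshed during training so that subclass labels track the evolving features.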

2. BACKGROUND AND NOTATIONS

Long-tailed recognition. Long-tailed recognition aims to learn a classifier from a training dataset with a long-tailed class distribution, i.e., a few classes (head classes) contain many samples while most classes (tail classes) contain only a few, and the major challenge is that the model must recognize all classes equally well. Let $\mathcal{D} = \{x_i, y_i\}_{i \in [n]}$ be a long-tailed training dataset, where $x_i$ denotes a sample and $y_i \in [C]$ its label. Denote by $\mathcal{D}_k \subseteq \mathcal{D}$ the set of instances belonging to class $k$ and by $n_k = |\mathcal{D}_k|$ the number of samples in class $k$. The total number of training samples over the $C$ classes is $n = \sum_{k=1}^{C} n_k$. Without loss of generality, we follow prior work (Kang et al., 2019; Hong et al., 2021) in assuming that the classes are sorted by cardinality in decreasing order (i.e., if $i < j$, then $n_i \geq n_j$) and $n_1 \gg n_C$. In addition, we define the imbalance ratio as $\max_{k \in [C]} n_k / \min_{k \in [C]} n_k = n_1 / n_C$. Finally, let $f_\theta(\cdot)$ be a deep feature extractor, e.g., a neural network, parameterized by $\theta$, and let $w_c$ be the linear classifier of class $c$; the classifier we aim to learn is $h(x_i) = \arg\max_{c \in [C]} w_c^\top f_\theta(x_i)$.

Supervised contrastive learning. Recent studies have shown that supervised contrastive learning (SCL) (Khosla et al., 2020) provides a strong performance gain for long-tailed recognition, and its variants have achieved state-of-the-art (SOTA) performance (Kang et al., 2020; Li et al., 2022). Specifically, SCL learns the feature extractor $f_\theta(\cdot)$ by maximizing the discriminativeness of positive instances, i.e., instances from the same class. The learning objective for a single training sample $x_i$ is

$$\mathcal{L}_{\mathrm{SCL}}(x_i) = -\frac{1}{|P(i)|} \sum_{p \in P(i)} \log \frac{\exp(z_i^\top z_p / \tau)}{\sum_{a \in A(i)} \exp(z_i^\top z_a / \tau)},$$

where $z_i = f_\theta(x_i)$ is the normalized feature of $x_i$, $P(i)$ is the set of positives of $x_i$ (samples sharing its label) in the batch, $A(i)$ is the set of all samples in the batch other than $x_i$, and $\tau > 0$ is a temperature hyperparameter.
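Assuming precomputed normalized features, the SCL objective can be sketched in a few lines (a hypothetical numpy version for illustration; batching and data augmentation details are omitted):

```python
import numpy as np

def scl_loss(z, labels, tau=0.1):
    """Per-batch SCL sketch: z is (N, d) L2-normalized, labels is (N,)."""
    N = len(z)
    exp_sim = np.exp(z @ z.T / tau)
    loss, count = 0.0, 0
    for i in range(N):
        others = np.arange(N) != i            # A(i): everyone but x_i
        pos = others & (labels == labels[i])  # P(i): same-class samples
        if not pos.any():
            continue
        # Average of -log softmax over the positives of x_i.
        loss += -np.log(exp_sim[i][pos] / exp_sim[i][others].sum()).mean()
        count += 1
    return loss / max(count, 1)

rng = np.random.default_rng(0)
z = rng.normal(size=(6, 4))
z /= np.linalg.norm(z, axis=1, keepdims=True)
labels = np.array([0, 0, 1, 1, 2, 2])
print(scl_loss(z, labels))
```

Note that every sample contributes one term per positive pair, so under a long-tailed batch the head classes dominate the loss, which is exactly the imbalance that KCL, TSC, and SBCL address in different ways.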

