BEYOND RE-BALANCING: DISTRIBUTIONALLY ROBUST AUGMENTATION AGAINST CLASS-CONDITIONAL DISTRIBUTION SHIFT IN LONG-TAILED RECOGNITION

Abstract

As a fundamental and practical problem, long-tailed recognition has drawn considerable attention. In this paper, we investigate an essential but rarely noticed issue in long-tailed recognition: Class-Conditional Distribution (CCD) shift caused by scarce instances, which manifests as a significant discrepancy between the empirical CCDs of training and test data, especially for tail classes. We present empirical evidence that this shift is a key factor limiting the performance of existing long-tailed learning methods, and in the course of our analysis we provide a novel understanding of these methods. Motivated by this, we propose an adaptive data augmentation method, Distributionally Robust Augmentation (DRA), to learn models that are more robust to CCD shift. A new generalization bound under mild conditions shows that the objective of DRA partially bounds the balanced risk on the test distribution. Experimental results verify that DRA outperforms related data augmentation methods without extra training cost and significantly improves the performance of several existing long-tailed recognition methods.

1. INTRODUCTION

Recently, visual recognition has achieved significant progress, driven by the development of deep neural networks (He et al., 2016) as well as large-scale datasets (Russakovsky et al., 2015). However, in contrast with manually balanced datasets, real-world data often follows a long-tailed distribution over classes, i.e., a few classes contain many instances (head classes), whereas most classes contain only a few instances (tail classes) (Liu et al., 2019; Van Horn & Perona, 2017). Training models on long-tailed datasets usually leads to degenerate results, including over-preference for head classes, undesired estimation bias, and poor generalization (Zhou et al., 2020; Cao et al., 2019; Kang et al., 2019).

To address these issues, various solutions have been proposed. Many of them focus on correcting the imbalanced label distribution to simulate class-balanced model training. Direct re-balancing, such as re-sampling and re-weighting, is the most intuitive (Huang et al., 2016; Zhang et al., 2021b). Recently, two-stage methods, which apply a re-balancing strategy when tuning the classifier (Kang et al., 2019) or defer re-weighting until after initialization (Cao et al., 2019), have been verified to be effective. Logit adjustment uses margin-based losses or post-hoc adjustment to rectify the biased predictions caused by the long-tailed distribution (Menon et al., 2020; Ren et al., 2020; Hong et al., 2021). Formally, denoting an input-label pair as (x, y), classification or recognition models are trained to estimate the posterior probability P(y|x) ∝ P(y)P(x|y). In long-tailed recognition scenarios, most solutions actually obey the following assumption: the class distribution P(y) shifts from training to test (usually class-imbalanced in training but class-balanced in test), while the class-conditional distribution (CCD) P(x|y) remains consistent, i.e., P_train(y) ≠ P_test(y) and P_train(x|y) = P_test(x|y) (Menon et al., 2020; Ren et al., 2020).
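The label-shift assumption above can be illustrated with a minimal post-hoc logit-adjustment sketch: under P_train(y) ≠ P_test(y) with a shared P(x|y), the training posterior is corrected by subtracting the log training prior and adding the log test prior. This is not the paper's DRA method; the class counts and logits below are illustrative values chosen for the example.

```python
import numpy as np

# Illustrative long-tailed training set: one head class, two tail classes.
train_counts = np.array([1000, 100, 10])
train_prior = train_counts / train_counts.sum()   # P_train(y)
test_prior = np.full(3, 1.0 / 3.0)                # balanced test set: P_test(y)

# Hypothetical model logits f(x) for a single input; the model leans toward
# the head class because it was trained on the imbalanced distribution.
logits = np.array([2.0, 1.5, 1.4])

# Post-hoc adjusted logit: f_y(x) - log P_train(y) + log P_test(y).
# With a uniform test prior the last term is a constant shift, so only the
# subtraction of the log training prior changes the argmax.
adjusted = logits - np.log(train_prior) + np.log(test_prior)

print(np.argmax(logits))    # head class wins on raw logits
print(np.argmax(adjusted))  # a tail class wins after adjustment
```

Note that this correction is exact only when the empirical CCD matches the true P(x|y); the shift between the two for scarce tail classes is precisely the gap the paper targets.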
Under this assumption, a series of methods including direct re-balancing and logit adjustment have been proved to be Fisher-consistent (Menon et al., 2020). We argue that although the consistent-CCD assumption (Menon et al., 2020) is reasonable when there is no sampling bias within each class, estimating P(x|y) by the empirical CCD is unreliable, especially for tail classes where samples are extremely scarce. Therefore, to obtain a generalizable model, the shift between the empirical CCD and the ideal CCD cannot be ignored. Our focus does not overlap

