ASYNCHRONOUS MODELING: A DUAL-PHASE PERSPECTIVE FOR LONG-TAILED RECOGNITION

Abstract

This work explores deep-learning-based classification models on real-world datasets with a long-tailed distribution. Most previous works address the long-tailed classification problem by re-balancing the overall distribution within the whole dataset or by directly transferring knowledge from data-rich classes to data-poor ones. In this work, we consider the gradient distortion that arises in long-tailed classification when the gradients on data-rich and data-poor classes are incorporated simultaneously: the overall gradient direction is shifted toward the data-rich classes, while its variance is enlarged by gradient fluctuation on the data-poor classes. Motivated by this phenomenon, we propose to disentangle the distinctive effects of data-rich and data-poor gradients and asynchronously train a model via a dual-phase learning process. The first phase concerns only the data-rich classes. In the second phase, besides the standard classification upon data-poor classes, we propose an exemplar memory bank to reserve representative examples and a memory-retentive loss via graph matching to retain the relation between the two phases. Extensive experimental results on four commonly used long-tailed benchmarks, including CIFAR100-LT, Places-LT, ImageNet-LT and iNaturalist 2018, highlight the excellent performance of our proposed method.

1. INTRODUCTION

Past years have witnessed huge progress in visual recognition with the successful application of deep convolutional neural networks (CNNs) on large-scale datasets, e.g., ImageNet ILSVRC 2012 (Russakovsky et al., 2015) and Places (Zhou et al., 2017). Such datasets are usually artificially collected and exhibit an approximately uniform distribution over the number of samples in each class. Real-world datasets, however, are typically long-tailed: a few classes occupy the majority of instances in the dataset (data-rich), while most classes contain only very few samples (data-poor) (Reed, 2001; Van Horn & Perona, 2017). When modeling such datasets, many standard methods suffer severe degradation of overall performance. More specifically, the recognition ability on classes with very few instances is significantly impaired (Liu et al., 2019). One prominent direction is to apply class re-sampling or loss re-weighting to balance the influence of different classes (Byrd & Lipton, 2019; Shu et al., 2019); another alternative is to transfer knowledge (Wang et al., 2017; Liu et al., 2019) under the assumption that knowledge obtained on the data-rich classes should benefit the recognition of data-poor classes. Recently, more sophisticated models have been designed, either based on new findings (Zhou et al., 2020; Kang et al., 2020) or by combining all available techniques (Zhu & Yang, 2020). However, the nature of the long-tailed setting still makes it difficult to achieve gains comparable to those on balanced datasets.
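The loss re-weighting baseline mentioned above can be sketched as a cross-entropy whose per-class weights are inversely proportional to class frequency, so that losses on data-poor classes contribute comparably to data-rich ones. The NumPy function below is a minimal, illustrative sketch of this generic baseline, not the method of any cited work; the function name and toy class counts are hypothetical.

```python
import numpy as np

def reweighted_cross_entropy(logits, labels, class_counts):
    """Cross-entropy with per-class weights inversely proportional to class frequency.

    logits: (N, C) unnormalized scores; labels: (N,) int class ids;
    class_counts: number of training samples per class (length C).
    """
    counts = np.asarray(class_counts, dtype=np.float64)
    # Inverse-frequency weights, normalized so balanced counts give weight 1.
    weights = counts.sum() / (len(counts) * counts)
    # Numerically stable log-softmax.
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    per_sample = -log_probs[np.arange(len(labels)), labels] * weights[labels]
    return per_sample.mean()

# Toy long-tailed setup: class 0 is data-rich (900 samples), class 1 data-poor (100).
logits = np.array([[2.0, 0.5], [0.3, 1.5]])
labels = np.array([0, 1])
loss = reweighted_cross_entropy(logits, labels, class_counts=[900, 100])
```

With balanced counts the weights collapse to 1 and the function reduces to standard cross-entropy; with long-tailed counts the tail sample's loss is up-weighted.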
In contrast to the aforementioned strategies, we approach the long-tailed recognition problem by analyzing gradient distortion in long-tailed data, which we attribute to the interaction between gradients generated by data-rich and data-poor classes: the direction of the overall gradient is shifted closer to the gradient on data-rich classes, and its norm variance is increased by the dramatic variation in the gradient generated by data-poor classes. The degraded performance relative to balanced datasets indicates that this gradient distortion is harmful during model training. Motivated by this, we hypothesize that the combined treatment of gradients generated by data-rich and data-poor classes could be improper in long-tailed data, and we attempt to disentangle these two gradients. We thus propose the conception of asynchronous modeling and split the original network to promote a
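The two effects described above can be illustrated with a minimal NumPy simulation. All quantities here are hypothetical (a toy gradient model with Gaussian per-sample noise and made-up sample counts), not measurements from the paper: when the overall gradient weights each class by its sample count, its direction aligns with the data-rich gradient, and the class-mean gradient estimated from few data-poor samples fluctuates far more.

```python
import numpy as np

rng = np.random.default_rng(0)

def cos(a, b):
    """Cosine similarity between two gradient vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

d = 64                    # parameter dimension (illustrative)
n_head, n_tail = 900, 20  # hypothetical per-epoch sample counts

# Hypothetical "true" per-class gradient directions; each class-mean gradient
# is estimated from n noisy per-sample gradients, so the tail mean is noisier.
g_head_true = rng.normal(size=d)
g_tail_true = rng.normal(size=d)
g_head = g_head_true + rng.normal(size=(n_head, d)).mean(axis=0)
g_tail = g_tail_true + rng.normal(size=(n_tail, d)).mean(axis=0)

# The overall gradient weights each class by its sample count, so its
# direction is pulled toward the data-rich classes.
g_all = (n_head * g_head + n_tail * g_tail) / (n_head + n_tail)

# Fluctuation of the class-mean gradient across redraws of the noise: the
# data-poor estimate varies roughly sqrt(n_head / n_tail) times more.
head_means = np.stack([rng.normal(size=(n_head, d)).mean(axis=0) for _ in range(100)])
tail_means = np.stack([rng.normal(size=(n_tail, d)).mean(axis=0) for _ in range(100)])
fluctuation_ratio = tail_means.std() / head_means.std()
```

In this toy setting `cos(g_all, g_head)` is close to 1 while `cos(g_all, g_tail)` is near chance, mirroring the direction shift, and `fluctuation_ratio` is well above 1, mirroring the enlarged variance.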

