EXPLORING BALANCED FEATURE SPACES FOR REPRESENTATION LEARNING

Abstract

Existing self-supervised learning (SSL) methods are mostly used to train representation models on artificially balanced datasets (e.g. ImageNet). It is unclear how well they perform in practical scenarios where datasets are often imbalanced w.r.t. the classes. Motivated by this question, we conduct a series of studies on the performance of self-supervised contrastive learning and supervised learning methods over multiple datasets whose training instance distributions vary from balanced to long-tailed. Our findings are quite intriguing. Unlike supervised methods, which suffer a large performance drop, self-supervised contrastive learning methods perform stably well even when the datasets are heavily imbalanced. This motivates us to explore the balanced feature spaces learned by contrastive learning, in which the feature representations present similar linear separability w.r.t. all the classes. Our further experiments reveal that a representation model producing a balanced feature space generalizes better than one yielding an imbalanced space across multiple settings. Inspired by these insights, we develop a novel representation learning method, called k-positive contrastive learning. It effectively combines the strengths of the supervised method and the contrastive learning method to learn representations that are both discriminative and balanced. Extensive experiments demonstrate its superiority on multiple recognition tasks, including both long-tailed and conventional balanced ones. Code is available at https://github.com/bingykang/BalFeat.

1. INTRODUCTION

Self-supervised learning (SSL) has been widely explored as it can learn data representations without requiring manual annotations and offers the attractive potential of leveraging the vast amount of unlabeled data in the wild to obtain strong representation models (Gidaris et al., 2018; Noroozi & Favaro, 2016; He et al., 2020; Chen et al., 2020a; Wu et al., 2018). For instance, some recent SSL methods (Hénaff et al., 2019; Oord et al., 2018; Hjelm et al., 2018; He et al., 2020) use the unsupervised contrastive loss (Hadsell et al., 2006) to train representation models by maximizing instance discriminativeness; the resulting models are shown to generalize well across various downstream tasks, and even surpass their supervised learning counterparts in some cases (He et al., 2020; Chen et al., 2020a). Despite this great success, existing SSL methods focus on learning data representations from artificially balanced datasets (e.g. ImageNet (Deng et al., 2009)) where all the classes have similar numbers of training instances. In reality, however, since the classes in natural images follow the Zipfian distribution, datasets are usually imbalanced and exhibit a long-tailed distribution (Zipf, 1999; Spain & Perona, 2007), i.e., some classes have significantly fewer training instances than others. Such imbalanced datasets are very challenging for supervised learning methods to model, leading to a noticeable performance drop (Wang et al., 2017; Mahajan et al., 2018; Zhong et al., 2019). Thus several interesting questions arise: How well will SSL methods perform on imbalanced datasets? Will the quality of their learned representations deteriorate as it does for supervised learning methods? Or can they perform stably well? Answering these questions is important for understanding the behavior of SSL in practice, yet they remain open, as no research investigation has been conducted along this direction so far.
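To make the unsupervised contrastive loss referenced above concrete, the following is a minimal numpy sketch of an InfoNCE-style loss for a single anchor. It is illustrative only: the function name, the toy embeddings, and the temperature value are our own choices, not taken from any of the cited methods (which in practice operate on batches of learned network embeddings).

```python
import numpy as np

def info_nce_loss(anchor, positive, negatives, temperature=0.07):
    """Unsupervised contrastive (InfoNCE-style) loss for one anchor.

    The anchor is pulled toward its positive (e.g. another augmented
    view of the same image) and pushed away from the negatives
    (embeddings of other images). All vectors are L2-normalized so
    that similarities are cosine similarities.
    """
    def normalize(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)

    a = normalize(anchor)
    p = normalize(positive)
    n = normalize(negatives)

    # Similarity logits: positive pair first, then all negatives.
    logits = np.concatenate([[a @ p], n @ a]) / temperature
    # Cross-entropy treating the positive (index 0) as the correct class.
    logits -= logits.max()  # for numerical stability
    return -logits[0] + np.log(np.exp(logits).sum())

rng = np.random.default_rng(0)
anchor = rng.normal(size=8)
positive = anchor + 0.05 * rng.normal(size=8)  # near-duplicate view
negatives = rng.normal(size=(16, 8))           # unrelated samples
loss_easy = info_nce_loss(anchor, positive, negatives)
loss_hard = info_nce_loss(anchor, -anchor, negatives)  # mismatched positive
```

As a sanity check, `loss_easy` (positive close to the anchor) comes out far smaller than `loss_hard` (positive pointing away from it), reflecting that minimizing this loss makes each instance discriminable from all others.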

