CLASS IMBALANCE IN FEW-SHOT LEARNING

Abstract

Few-shot learning aims to train models on a limited number of labeled samples from a support set in order to generalize to unseen samples from a query set. In the standard setup, the support set contains an equal number of data points for each class. This assumption overlooks many practical considerations arising from the dynamic nature of the real world, such as class imbalance. In this paper, we present a detailed study of few-shot class imbalance along three axes: dataset vs. support set imbalance, the effect of different imbalance distributions (linear, step, random), and the effect of rebalancing techniques. We extensively compare over 10 state-of-the-art few-shot learning methods using backbones of different depths on multiple datasets. Our analysis reveals that 1) compared to the balanced task, performance on the class-imbalanced counterparts always drops, by up to 18.0% for optimization-based methods, although feature-transfer and metric-based methods generally suffer less; 2) strategies used to mitigate imbalance in supervised learning can be adapted to the few-shot case, resulting in better performance; and 3) the effects of imbalance at the dataset level are less significant than the effects at the support set level. The code to reproduce the experiments is released under an open-source license.

1. INTRODUCTION

Deep learning methods are well known for their state-of-the-art performance on a variety of tasks (LeCun et al., 2015; Russakovsky et al., 2015; Schmidhuber, 2015). However, they often require training on large labeled datasets to acquire robust and generalizable features. Few-Shot Learning (FSL) (Chen et al., 2019; Wang et al., 2019b; Bendre et al., 2020) aims at reducing this burden by defining a distribution over tasks, with each task containing a few labeled data points (support set) and a set of target data (query set) belonging to the same set of classes. A common way to train FSL methods is through episodic meta-training (Vinyals et al., 2017), in which the model is repeatedly exposed to batches of tasks sampled from a task distribution and then tested on a different but similar distribution in the meta-testing phase. The prefix "meta" is commonly used to distinguish the high-level training and evaluation routines of meta-learning (outer loop) from the training and evaluation routines at the single-task level (inner loop).

Limitations. Standard meta-training overlooks many challenges stemming from real-world dynamics, such as class imbalance (CI). The standard setting assumes that all classes in the support set contain the same number of data points, whereas in many practical applications the number of samples per class may vary (Buda et al., 2018; Leevy et al., 2018). Given the limited amount of data used in FSL, even a small difference in the number of samples between classes can introduce significant levels of imbalance. Most FSL methods are not designed to cope with these more challenging settings. Figure 1 exemplifies these considerations by showing that several state-of-the-art FSL methods underperform when tested under three CI regimes (linear, step, random).

Previous work.
Previous work mainly focuses on a single imbalance case or groups several settings into one task, offering limited insight into the effects of CI on FSL and making it challenging to quantify those effects (Guan et al., 2020; Triantafillou et al., 2020; Lee et al., 2019; Chen et al., 2020). A common approach to mitigate imbalance is Random-Shot meta-training (Triantafillou et al., 2020), which exposes the model to imbalanced tasks during meta-training. However, previous work provides little insight into the effectiveness of this procedure on the imbalanced FSL evaluation task. Furthermore, little work investigates meta-training outcomes under an imbalanced distribution of classes at the (meta-)dataset level, although this case is common in recent FSL applications (Ochal et al., 2020; Guan et al., 2020) and meta-learning benchmarks (Triantafillou et al., 2020). The CI problem is well known within the supervised learning community, which has systematically produced strategies to deal with it, such as the popular Random Over-Sampling (Japkowicz & Stephen, 2002), which rebalances minority classes by uniform resampling. While such strategies have been extensively studied on many supervised learning problems, there is little understanding of how they behave with recently proposed FSL methods in the low-data regime.

Our work and main contributions. In this paper, we provide, for the first time, a detailed analysis of the CI problem within the FSL framework. Our results show that even small CI levels can introduce a significant performance drop for all the methods considered. Moreover, we find that only a few models benefit from Random-Shot meta-training (Triantafillou et al., 2020; Lee et al., 2019; Chen et al., 2020) over classical (balanced) episodic meta-training (Vinyals et al., 2017), while pairing either meta-training procedure with Random Over-Sampling offers a substantial advantage.
The experimental results also show that the severity of imbalance at the dataset level depends on the size of the dataset. Our contributions can be summarized as follows:

1. A systematic, comprehensive, and in-depth study of the effects of CI within the FSL framework along three axes: (i) dataset vs. support set imbalance, (ii) the effect of different imbalance distributions (linear, step, random), and (iii) the effect of rebalancing techniques, such as random over-sampling and the recently proposed Random-Shot meta-training (Triantafillou et al., 2020).

2. Novel insights into the capabilities of meta-learning and support set adaptation under the CI regime, supported by extensive results on over 10 FSL methods with different imbalance settings, backbones, support set sizes, and datasets.

3. Insight into the previously unaddressed CI problem in the (meta-)training dataset, showing that the effects of imbalance at the dataset level are less significant than the effects at the support set level.
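To make the three imbalance distributions concrete, the following sketch generates per-class shot counts for a support set under the linear, step, and random schemes. The function name, the `min_k`/`max_k` parameters, and the rounding choices are illustrative assumptions, not the paper's exact parameterization.

```python
import random

def imbalanced_shots(num_classes, min_k, max_k, scheme, rng=None):
    """Return a list of per-class shot counts for an imbalanced support set.

    Illustrative sketch only: the exact parameterizations used in the paper
    may differ. `min_k`/`max_k` bound the minority/majority class sizes.
    """
    rng = rng or random.Random(0)
    if scheme == "linear":
        # Shot counts grow linearly from the minority to the majority class.
        if num_classes == 1:
            return [max_k]
        step = (max_k - min_k) / (num_classes - 1)
        return [round(min_k + i * step) for i in range(num_classes)]
    if scheme == "step":
        # Half the classes are minority (min_k shots), half majority (max_k).
        half = num_classes // 2
        return [min_k] * half + [max_k] * (num_classes - half)
    if scheme == "random":
        # Each class draws its shot count uniformly at random.
        return [rng.randint(min_k, max_k) for _ in range(num_classes)]
    raise ValueError(f"unknown scheme: {scheme}")
```

For example, a 5-way task with shots between 1 and 9 yields `[1, 3, 5, 7, 9]` under the linear scheme, while the step scheme splits the classes into two groups of fixed sizes.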



Figure 1: Accuracy (mean percentage over 3 runs) and 95% confidence intervals for FSL methods on balanced tasks (red bars) vs. 3 imbalanced tasks (blue bars). Most methods perform significantly worse on the imbalanced tasks, as shown by the lower accuracy of the blue bars.

2.1 CLASS IMBALANCE

In classification, imbalance occurs when at least one class (the majority class) contains a higher number of samples than the others. The classes with the lowest number of samples are called minority classes. If uncorrected, conventional supervised loss functions, such as (multi-class) cross-entropy, skew the learning process in favor of the majority class, introducing bias and poor generalization toward the minority class samples (Buda et al., 2018; Leevy et al., 2018). Imbalance approaches are categorized into three groups: data-level, algorithm-level, and hybrid. Data-level strategies manipulate and create new data points to equalize data sampling. Popular data-level methods include Random Over-Sampling (ROS) and Random Under-Sampling (RUS) (Japkowicz & Stephen, 2002). ROS randomly resamples data points from the minority classes, while RUS randomly discards a portion of the majority classes to decrease imbalance levels. Algorithm-level strategies adjust the learning algorithm itself, for example through regularization or by modifying the loss/cost function. Weighted loss is a common approach in which each sample's loss is weighted by the inverse frequency of that sample's class. Focal loss (Lin et al., 2017) is another type of cost function that has seen wide success. Hybrid methods combine one or more types of strategies (e.g., Two-Phase Training; Havaei et al., 2017).
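As a concrete illustration of the data-level and algorithm-level strategies above, the sketch below applies ROS to a toy support set and computes inverse-frequency class weights for a weighted loss. The function names and the list-of-(x, label) representation are hypothetical conveniences, not from the paper.

```python
import random
from collections import Counter

def random_over_sample(support, rng=None):
    """Random Over-Sampling (ROS): resample minority-class items with
    replacement until every class matches the majority-class count.

    `support` is a list of (x, label) pairs (illustrative representation).
    """
    rng = rng or random.Random(0)
    by_class = {}
    for x, y in support:
        by_class.setdefault(y, []).append((x, y))
    target = max(len(items) for items in by_class.values())
    balanced = []
    for items in by_class.values():
        balanced.extend(items)
        # Draw extra copies uniformly at random from the minority class.
        balanced.extend(rng.choice(items) for _ in range(target - len(items)))
    return balanced

def inverse_frequency_weights(labels):
    """Per-class weights for a weighted loss: w_c = N / (C * n_c),
    i.e. proportional to the inverse frequency of each class."""
    counts = Counter(labels)
    n, c = len(labels), len(counts)
    return {y: n / (c * k) for y, k in counts.items()}
```

With a 3-vs-1 support set, ROS duplicates the single minority sample until both classes have three items, and the minority class receives the larger loss weight.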

