RECALL LOSS FOR IMBALANCED IMAGE CLASSIFICATION AND SEMANTIC SEGMENTATION

Abstract

Class imbalance is a fundamental problem in computer vision applications such as semantic segmentation and image classification. In particular, uneven class distributions in a training dataset often result in unsatisfactory performance on underrepresented classes. Many works have proposed to weight the standard cross entropy loss with pre-computed weights based on class statistics such as the number of samples or class margins. These methods have two major drawbacks: 1) constantly up-weighting minority classes can introduce excessive false positives, especially in semantic segmentation; 2) many recent works have found that pre-computed weights have adverse effects on representation learning. In this regard, we propose a hard-class mining loss that reshapes the vanilla cross entropy loss so that the loss for each class is weighted dynamically based on its changing recall performance. We show mathematically that the novel recall loss transitions gradually between the standard cross entropy loss and the well-known inverse frequency cross entropy loss, and balances precision and accuracy. We first demonstrate that the proposed loss effectively balances precision and accuracy on semantic segmentation datasets and leads to significant performance improvements over other state-of-the-art loss functions used in semantic segmentation, especially on shallow networks. On image classification, we design a simple two-head training strategy to show that the novel loss function improves representation learning on imbalanced datasets. We outperform the previously best performing method by 5.7% on Places365-LT and by 1.1% on iNaturalist.
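The dynamic weighting described above can be sketched in a few lines. The sketch below is an illustrative reconstruction from the abstract, not the authors' exact formulation: it assumes the per-class weight is the false-negative rate (1 − recall) estimated from the current batch, so poorly recalled classes are up-weighted and well-recalled classes contribute little; the function name `recall_cross_entropy` and the batch-level recall estimate are our assumptions.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the class dimension.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def recall_cross_entropy(logits, targets, num_classes, eps=1e-6):
    """Illustrative recall-weighted cross entropy (assumed form).

    Each class's cross entropy term is scaled by (1 - recall), where
    recall is computed from the hard predictions on this batch, so the
    weights change dynamically as training progresses.
    """
    probs = softmax(logits)
    preds = probs.argmax(axis=1)
    weights = np.ones(num_classes)
    for c in range(num_classes):
        mask = targets == c
        if mask.any():
            recall = (preds[mask] == c).mean()
            # Hard (low-recall) classes get larger weight; eps avoids
            # a weight of exactly zero for perfectly recalled classes.
            weights[c] = 1.0 - recall + eps
    # Standard per-sample cross entropy, then per-class reweighting.
    ce = -np.log(probs[np.arange(len(targets)), targets] + eps)
    return float((weights[targets] * ce).mean())
```

Note the contrast with statistics-balanced losses: the weights here depend on current model performance rather than fixed class frequencies, so a minority class that the model already recalls well is no longer up-weighted, which is how the loss avoids accumulating false positives.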

1. INTRODUCTION

Dataset imbalance is an important problem for many computer vision tasks such as semantic segmentation and image classification. In semantic segmentation, imbalance arises from the natural frequency and varying sizes of different classes. For example, in an outdoor driving segmentation dataset, light poles and pedestrians are minority classes compared to large classes such as building, sky, and road, yet these minority classes are often more important than the large classes for safety reasons. In image classification, imbalance can arise from data collection: some classes are more difficult to obtain data for than others. For example, the iNaturalist dataset (Van Horn et al., 2018) contains images of over 8000 natural species; since some species are rare, the dataset exhibits the notorious long-tail distribution. When presented with imbalanced datasets, the standard cross entropy loss often yields unsatisfactory results, as the training process naturally biases towards large classes, resulting in low accuracy and precision on small classes.

Researchers have studied the imbalance problem extensively for classification, detection, and segmentation. Most prior research focuses on designing balanced loss functions, which we group into three categories: region-based losses, statistics-balanced losses, and performance-balanced losses. Region-based losses directly optimize region metrics (e.g., the Jaccard index (Rahman & Wang, 2016)) and are mainly popular in medical segmentation applications. Statistics-balanced losses (e.g., LDAM (Cao et al., 2019), Class-Balanced (CB) loss (Cui et al., 2019)) up- or down-weight the contribution of a class based on its class margin or class size; however, they tend to encourage excessive false positives on minority classes to improve mean accuracy, especially in segmentation, and a recent study (Zhou et al., 2020) also shows that the weighting undermines the generic representation learning capability of the feature extractors. Performance-balanced losses (e.g., focal loss (Lin et al., 2017)) use a certain performance indicator to weigh

