RECALL LOSS FOR IMBALANCED IMAGE CLASSIFICATION AND SEMANTIC SEGMENTATION

Abstract

Class imbalance is a fundamental problem in computer vision applications such as semantic segmentation and image classification. Specifically, uneven class distributions in a training dataset often result in unsatisfactory performance on underrepresented classes. Many works have proposed to weight the standard cross entropy loss with pre-computed weights based on class statistics such as the number of samples and class margins. There are two major drawbacks to these methods: 1) constantly up-weighting minority classes can introduce excessive false positives, especially in semantic segmentation; 2) many recent works have discovered that pre-computed weights have adverse effects on representation learning. In this regard, we propose a hard-class mining loss by reshaping the vanilla cross entropy loss such that it weights the loss for each class dynamically based on changing recall performance. We show mathematically that the novel recall loss varies gradually between the standard cross entropy loss and the well-known inverse frequency cross entropy loss, and balances precision and accuracy. We first demonstrate that the proposed loss effectively balances precision and accuracy on semantic segmentation datasets and leads to significant performance improvement over other state-of-the-art loss functions used in semantic segmentation, especially on shallow networks. On image classification, we design a simple two-head training strategy to show that the novel loss function improves representation learning on imbalanced datasets. We outperform the previously best performing method by 5.7% on Place365-LT and by 1.1% on iNaturalist.

1. INTRODUCTION

Dataset imbalance is an important problem for many computer vision tasks such as semantic segmentation and image classification. In semantic segmentation, imbalance occurs as a result of the natural occurrence and varying sizes of different classes. For example, in an outdoor driving segmentation dataset, light poles and pedestrians are considered minority classes compared to large classes such as building, sky, and road. These minority classes are often more important than large classes for safety reasons. In image classification, imbalance can occur as a result of data collection: some classes are more difficult to obtain data for than others. For example, the iNaturalist dataset (Van Horn et al., 2018) has collected images of over 8000 natural species. Since some species are rare, the dataset exhibits the notorious long-tail distribution. When presented with imbalanced datasets, the standard cross entropy loss often yields unsatisfactory results, as the training process naturally biases towards large classes, resulting in low accuracy and precision on small classes.

Researchers have studied the imbalance problem for classification, detection, and segmentation extensively. Most prior research has been on designing balanced loss functions. We classify existing loss functions under three categories: region-based losses, statistics-balanced losses, and performance-balanced losses. Region-based losses directly optimize region metrics (e.g., Jaccard index (Rahman & Wang, 2016)) and are mainly popular in medical segmentation applications; statistics-balanced losses (e.g., LDAM (Cao et al., 2019)) weight the loss with pre-computed weights derived from class statistics such as the number of samples; performance-balanced losses weight the loss for each class according to a training performance metric.

We propose a novel performance-balanced loss using the recall metric to address the imbalance problem. The recall loss down/up-weights a class based on the training recall performance of that class. It is an example of hard class mining, as opposed to the hard example mining strategy in the focal loss.
Unlike the statistics-balanced losses, the recall loss dynamically changes its weights during training based on per-class recall performance (see fig. 1(a)). This dynamism is the key to overcoming many drawbacks of the statistics-balanced losses. In our experiments, the statistics-balanced Class-Balanced (CB) loss improves accuracy at the expense of Intersection over Union (IOU), which considers false positives in semantic segmentation. In contrast, our recall loss effectively balances between precision and recall of each class, and hence it improves accuracy while maintaining a competitive IOU. Experiments on two benchmark semantic segmentation datasets demonstrate that the proposed recall loss performs significantly better than state-of-the-art loss functions used in prior works. We also show that while statistics-balanced losses negatively affect representation learning, the recall loss improves representation learning for imbalanced image classification and achieves state-of-the-art results with our simple decoupled network (fig. 1(b),(c)) on two common benchmarks. Specifically, we outperform previous state-of-the-art methods on Place-LT by 5.7% and on iNaturalist2018 by 1.1%.

Our main contributions are summarized below.

• We introduce a novel loss function based on the metric recall. Recall loss weights the standard cross entropy loss for each class with its instantaneous training recall performance.

• The proposed recall loss learns a better semantic segmentation model that provides improved and balanced performance in terms of accuracy and IOU. We demonstrate the loss on both synthetic and real semantic segmentation datasets.

• The proposed loss also improves feature learning in image classification. We show state-of-the-art results on two common classification benchmarks with a simple decoupled network.
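To make the dynamic weighting concrete, the following is a minimal PyTorch sketch of the idea described above: each class's cross entropy term is weighted by (1 − recall) estimated from the current predictions, so well-recalled classes are down-weighted and poorly-recalled classes are up-weighted. The function name, the batch-level recall estimate, and the small epsilon floor are illustrative assumptions, not the authors' exact implementation (which may, e.g., use a running estimate of recall).

```python
import torch
import torch.nn.functional as F

def recall_weighted_ce(logits, targets, num_classes, eps=1e-6):
    """Sketch of a recall-weighted cross entropy (hypothetical implementation).

    Per-class weight = 1 - recall_c, with recall estimated from the
    current batch's argmax predictions. Classes absent from the batch
    keep a neutral weight of 1.
    """
    preds = logits.argmax(dim=1)
    weights = torch.ones(num_classes, device=logits.device)
    for c in range(num_classes):
        mask = targets == c
        if mask.any():
            # Fraction of true class-c samples predicted as c.
            recall_c = (preds[mask] == c).float().mean()
            # eps keeps the weight positive even at perfect recall.
            weights[c] = 1.0 - recall_c + eps
    return F.cross_entropy(logits, targets, weight=weights)
```

Note how the weights are recomputed at every call: as recall on a class improves during training, its weight decays toward zero, which is the dynamism contrasted above with fixed, pre-computed statistics-balanced weights.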

2. RELATED WORK

Imbalance in Image Classification. Various losses have been proposed to deal with imbalance or long-tail distributions in image classification. Cost-sensitive loss (Khan et al., 2017) proposes to



Figure 1: (a) We show normalized recall weights over iterations for recall loss. The horizontal axis is arranged in descending order of pixel percentage; % indicates class pixel percentage. The recall weights change dynamically according to the performance metric recall. (b) We design a Simple Decoupled Network (SDN) to decouple representation and classifier learning. (c) At inference, only one branch of the SDN is used.
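The two-head structure in fig. 1(b),(c) can be sketched as follows. This is a hypothetical minimal version only: the backbone, feature dimension, and the assignment of losses to heads (standard cross entropy on one head for representation learning, recall loss on the other for classification) are assumptions for illustration, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class SimpleDecoupledNet(nn.Module):
    """Hypothetical sketch of a two-head decoupled network (SDN-style).

    A shared backbone feeds two classifier heads during training;
    only one head is used at inference, as in fig. 1(c).
    """
    def __init__(self, in_dim=32, feat_dim=64, num_classes=10):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.head_repr = nn.Linear(feat_dim, num_classes)  # e.g., trained with standard CE
        self.head_cls = nn.Linear(feat_dim, num_classes)   # e.g., trained with recall loss

    def forward(self, x):
        feats = self.backbone(x)
        if self.training:
            # Both heads receive gradients during training.
            return self.head_repr(feats), self.head_cls(feats)
        # Single branch at inference.
        return self.head_cls(feats)
```

The design intent is that the representation head shields the shared backbone from the re-weighted loss, which the paper argues can otherwise harm representation learning.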

