ONE SIZE DOESN'T FIT ALL: ADAPTIVE LABEL SMOOTHING

Abstract

This paper concerns the use of objectness measures to improve the calibration of Convolutional Neural Networks (CNNs). CNNs have proven to be very good classifiers and generally localize objects well; however, the loss functions typically used to train classification CNNs do not penalize the inability to localize an object, nor do they account for an object's relative size in the given image. During training on ImageNet-1K, almost all approaches apply random crops to the images, and this transformation sometimes provides the CNN with background-only samples. This causes classifiers to depend on context, which is harmful for safety-critical applications. We present a novel approach to classification that combines objectness and label smoothing during training. Unlike previous methods, we compute a smoothing factor that adapts to the relative object size within each image. As a result, our approach produces confidences grounded in the size of the object being classified rather than relying on context to make correct predictions. We present extensive results on ImageNet demonstrating that CNNs trained with adaptive label smoothing are much less likely to be overconfident in their predictions. We show qualitative results using class activation maps and quantitative results on classification and transfer learning tasks. Our approach yields an order-of-magnitude reduction in confidence when predicting on context-only images compared to baselines. Using transfer learning, we gain 0.021 AP on MS COCO compared to the hard-label approach.

1. INTRODUCTION

Convolutional neural networks (CNNs) have been used to address computer vision problems for over two decades (LeCun, 1998); in particular, they have shown promising results on object detection and localization tasks since 2013 (Krizhevsky et al., 2012; Russakovsky et al., 2015; Girshick et al., 2018). Unfortunately, modern CNNs are overconfident in their predictions (Lakshminarayanan et al., 2017; Hein et al., 2019) and suffer from reliability issues due to miscalibration (Guo et al., 2017a). Problems related to overconfidence, generalization, bias, and reliability represent a severe limitation of current CNNs in real-world applications. We address the problems of overconfidence and contextual bias in this work.

In classification CNNs, ground-truth labels are typically provided as a one-hot (hard-label) representation of class probabilities: vectors of 0s and 1s, with a single 1 indicating the pertinent class. Szegedy et al. (2016) introduced label smoothing, which replaces these hard targets with soft labels formed as a weighted average of the hard targets and a uniform distribution over classes, improving learning speed and generalization. Label smoothing mitigates weight magnification (Mukhoti et al., 2020; Müller et al., 2019); in contrast, hard targets tend to increase the magnitudes of the logits and produce overconfident predictions (Szegedy et al., 2016; Müller et al., 2019). Both uniform label smoothing and traditional hard labels force CNNs to produce high-confidence predictions even when the pertinent object is absent during training. To obtain more reliable confidence measures, we use an objectness measure to derive a smoothing factor for every sample undergoing a unique scale-and-crop transformation, in an adaptive manner. Safely deploying deep-learning-based models has also become a more immediate challenge (Amodei et al., 2016).
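The difference between uniform and adaptive label smoothing can be sketched in a few lines. The uniform scheme follows Szegedy et al. (2016); the adaptive variant below is a hypothetical illustration in which the smoothing factor is simply one minus the object's relative area in the crop (the exact mapping from objectness to smoothing factor used in this paper is defined later, not here):

```python
import numpy as np

def smooth_labels(hard_label: int, num_classes: int, eps: float = 0.1) -> np.ndarray:
    """Uniform label smoothing (Szegedy et al., 2016): mix the one-hot
    target with a uniform distribution over all classes."""
    one_hot = np.eye(num_classes)[hard_label]
    return (1.0 - eps) * one_hot + eps / num_classes

def adaptive_smooth_labels(hard_label: int, num_classes: int,
                           object_fraction: float) -> np.ndarray:
    """Illustrative adaptive smoothing: the smoothing factor grows as the
    object's relative area in the crop shrinks, so a background-only crop
    (object_fraction ~ 0) receives a near-uniform target while a crop
    filled by the object receives a near-one-hot target. The linear
    mapping eps = 1 - object_fraction is an assumption for illustration."""
    eps = 1.0 - object_fraction  # more smoothing for smaller objects
    one_hot = np.eye(num_classes)[hard_label]
    return (1.0 - eps) * one_hot + eps / num_classes
```

For example, with four classes, a crop entirely covered by the object yields a one-hot target, while a background-only crop yields the uniform target `[0.25, 0.25, 0.25, 0.25]`, so the network is never rewarded for confident predictions on context alone.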
As a community, we need to obtain high accuracies, but also to provide reliable uncertainty measures for CNNs. We can improve the precision

