ENERGY-BASED OUT-OF-DISTRIBUTION DETECTION FOR MULTI-LABEL CLASSIFICATION

Abstract

Out-of-distribution (OOD) detection is essential to prevent anomalous inputs from causing a model to fail during deployment. While improved methods for OOD detection in multi-class classification have emerged, OOD detection for multi-label classification remains underexplored and relies on rudimentary techniques. We propose SumEnergy, a simple and effective method that estimates OOD indicator scores by aggregating energy scores across labels. We show that SumEnergy can be mathematically interpreted from a joint likelihood perspective. Our results show consistent improvement over previous methods based on maximum-valued scores, which fail to capture joint information from multiple labels. We demonstrate the effectiveness of our method on three common multi-label classification benchmarks: MS-COCO, PASCAL-VOC, and NUS-WIDE. SumEnergy reduces the FPR95 by up to 10.05% compared to the previous best baseline, establishing state-of-the-art performance.

1. INTRODUCTION

Out-of-distribution (OOD) detection is central to reliably deploying machine learning models in open-world environments, where new forms of test-time data may appear that were absent at training time. The problem of OOD detection has gained significant research attention lately, given its importance for safety-critical applications such as unseen disease identification (Cao et al., 2020). However, recent studies have primarily focused on detecting OOD examples in multi-class classification, where each sample is assigned to one and only one label (Bevandić et al., 2018; Hein et al., 2019; Hendrycks & Gimpel, 2016; Lakshminarayanan et al., 2017; Lee et al., 2018; Liang et al., 2018; Mohseni et al., 2020; Chen et al., 2020; Hsu et al., 2020; Liu et al., 2020). This assumption can be restrictive in many real-world settings, where images often contain multiple objects of interest. For example, self-driving cars must differentiate between the road, traffic signs, and obstacles within a single frame; in the medical domain, multiple abnormalities may be present in a single image (Wang et al., 2017). Multi-label classification is desirable here since it places no constraint on the number of classes an instance can be assigned to.

Currently, OOD detection in multi-label classification remains relatively underexplored. Of the multi-label methods evaluated in Hendrycks et al. (2019), MaxLogit achieved the best performance. However, simply using the maximum-valued logit is limiting because it does not incorporate information available from other possible labels. As seen in Figure 1, MaxLogit can only capture the difference between the dominant outputs for dog (in-dist.) and car (OOD), while positive evidence from another dominant label, cat (in-dist.), is discarded.
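The limitation above can be made concrete with a minimal sketch. The logit vectors below are hypothetical (they are not from the paper's experiments) and illustrate the dog/cat/car scenario of Figure 1: because MaxLogit keeps only the single largest logit, two inputs with identical dominant logits receive identical scores, regardless of any second confident label.

```python
import numpy as np

def max_logit_score(logits):
    """OOD score from the maximum-valued logit (higher = more in-dist.)."""
    return np.max(logits)

# Hypothetical per-label logits over three classes: [dog, cat, car].
in_dist = np.array([6.0, 5.5, -2.0])   # dog and cat both confidently present
ood     = np.array([6.0, -3.0, -2.5])  # a single (spurious) high logit

# Both inputs get the same score: the second confident label (cat)
# contributes nothing to the MaxLogit score.
print(max_logit_score(in_dist))  # 6.0
print(max_logit_score(ood))      # 6.0
```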
Other baseline methods such as ODIN (Liang et al., 2018) and Mahalanobis (Lee et al., 2018) also derive scores from a single maximum (e.g., the calibrated softmax score or Mahalanobis distance), and likewise fail to capture joint information. While energy scores have recently demonstrated superior OOD detection performance in the multi-class setting (Liu et al., 2020), the method does not trivially generalize to multi-label classification, where labels are not mutually exclusive. A key challenge therefore lies in how to leverage information across different labels. In this paper, we propose an energy-based method for OOD detection in the multi-label setting, which estimates OOD indicator scores jointly from multiple labels. We propose a simple and effective aggregation mechanism, SumEnergy, that combines the label-wise energy scores derived from individual labels. We show that the SumEnergy scores are mathematically
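As a rough sketch of the aggregation idea, suppose each label k is scored by a binary (present/absent) free energy computed from its logit, and the per-label scores are summed. The exact energy definition here is an assumption for illustration, not the paper's formal derivation; the logit values are the same hypothetical dog/cat/car example as above.

```python
import numpy as np

def label_wise_free_energy(logit):
    # Assumed binary free energy for one label with states {logit, 0}:
    # E_k(x) = -log(1 + exp(f_k(x))). logaddexp keeps it numerically stable.
    return -np.logaddexp(0.0, logit)

def sum_energy_score(logits):
    # Aggregate by summing negative label-wise energies over all labels;
    # higher score = more likely in-distribution.
    return -np.sum([label_wise_free_energy(f) for f in logits])

in_dist = np.array([6.0, 5.5, -2.0])   # dog and cat both confident
ood     = np.array([6.0, -3.0, -2.5])  # only one dominant logit

# Unlike MaxLogit, the second confident label (cat) raises the score,
# so the two inputs are now separable.
print(sum_energy_score(in_dist) > sum_energy_score(ood))  # True
```

The design point is that summation lets every confidently-predicted label contribute positive evidence, rather than discarding all but the dominant one.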

