ON THE IMPORTANCE OF IN-DISTRIBUTION CLASS PRIOR FOR OUT-OF-DISTRIBUTION DETECTION

Anonymous authors

Abstract

Given a pre-trained in-distribution (ID) model, inference-time out-of-distribution (OOD) detection methods aim to recognize OOD data that arrive at inference time. However, several representative methods share an unproven assumption: the probability that an OOD data point belongs to each ID class is the same, i.e., the probabilities that OOD data belong to ID classes form a uniform distribution. In this paper, we show both theoretically and empirically that this assumption makes these methods incapable of recognizing OOD data when the ID model is trained on class-imbalanced data. Fortunately, by analyzing the causal relations between ID/OOD classes and features, we identify several common scenarios in which the probabilities that OOD data belong to ID classes should instead follow the ID-class-prior distribution. Based on this finding, we propose two effective strategies for modifying previous inference-time OOD detection methods: 1) if a method explicitly uses the uniform distribution, we replace it with the ID-class-prior distribution; 2) otherwise, we reweight the method's scores according to the similarity between the ID-class-prior distribution and the softmax outputs of the pre-trained model. Extensive experiments show that both strategies significantly improve the accuracy of recognizing OOD data when the ID model is pre-trained on imbalanced data. As a highlight, when evaluated on the iNaturalist dataset, our method achieves a ∼36% increase in AUROC and a ∼61% decrease in FPR95 compared with the original Energy method, reflecting the importance of the ID-class prior in OOD detection and opening a new avenue for studying this problem.
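The two modification strategies above can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the KL-divergence score in strategy 1 and the inner-product similarity in strategy 2 are assumed choices, and `class_prior`, `prior_kl_score`, and `reweighted_score` are hypothetical names.

```python
import math

def softmax(logits):
    """Convert logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl(p, q):
    """KL divergence KL(p || q) between discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Strategy 1: a method that compares the softmax output against the
# uniform distribution instead compares it against the ID-class prior.
def prior_kl_score(logits, class_prior):
    return kl(softmax(logits), class_prior)

# Strategy 2: reweight an existing OOD score by the similarity between
# the ID-class prior and the softmax output (inner product is used here
# purely for illustration).
def reweighted_score(base_score, logits, class_prior):
    p = softmax(logits)
    similarity = sum(pi * qi for pi, qi in zip(p, class_prior))
    return base_score * similarity
```

When the softmax output matches the class prior exactly, `prior_kl_score` is zero, indicating an ID-like prediction under an imbalanced prior; a flat softmax output, which the uniform-distribution assumption would treat as maximally OOD-like, is scored against the prior instead.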

1. INTRODUCTION

How to reliably deploy machine learning models in real-world scenarios has been attracting more and more attention (Huang et al., 2021; Liang et al., 2018; Liu et al., 2020). In real-world scenarios, test data usually contain both known and unknown classes (Hendrycks & Gimpel, 2017). We expect the deployed model to eliminate the interference of unknown classes while classifying known classes well. Nevertheless, current models tend to be overconfident on unknown classes (Nguyen et al., 2015) and thus confuse known and unknown classes, which increases the risk of deploying these models in the real world. Especially when the scenarios are life-critical (e.g., autonomous driving), we cannot take the risk of deploying unreliable models. This motivates researchers to study out-of-distribution (OOD) detection, where we need to identify unknown classes (i.e., OOD classes) while classifying known classes (i.e., in-distribution (ID) classes) well at the same time (Hendrycks & Gimpel, 2017; Hendrycks et al., 2019).

In OOD detection, a well-known branch develops inference-time/post hoc OOD detection methods (Huang et al., 2021; Liang et al., 2018; Liu et al., 2020; Hendrycks & Gimpel, 2017; Lee et al., 2018b; Sun et al., 2021), where we are given a pre-trained ID model and aim to recognize upcoming OOD data. The key advantage of inference-time OOD detection methods is that the classification performance on ID data is unaffected, since we only use the ID model rather than changing it. A general way to design a large-scale-friendly inference-time OOD detection method is to propose a score function based on the ID model's information. For example, maximum softmax probability (MSP) uses the ID model's outputs (Hendrycks & Gimpel, 2017), and GradNorm uses the ID model's gradients (Huang et al., 2021). The smaller a data point's score, the more likely that data point is OOD.
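As a concrete illustration of such score functions, the sketch below computes the MSP score (Hendrycks & Gimpel, 2017) and, for comparison, the energy-based score of Liu et al. (2020). It is a minimal plain-Python sketch operating on raw logits; the function names `msp_score` and `energy_score` are ours.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def msp_score(logits):
    """MSP score: the maximum softmax probability.
    Higher means the input is more ID-like."""
    return max(softmax(logits))

def energy_score(logits, temperature=1.0):
    """Energy-based score: the negative free energy
    T * logsumexp(logits / T). Higher means more ID-like."""
    m = max(logits)
    return temperature * (
        m / temperature
        + math.log(sum(math.exp((z - m) / temperature) for z in logits))
    )
```

A confident logit vector such as `[10, 0, 0]` receives a higher score than a flat one such as `[1, 1, 1]` under both functions, so flat, OOD-like outputs are flagged by their lower scores.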

