Unleashing Mask: Explore the Intrinsic Out-of-Distribution Detection Capability

Anonymous

Abstract

Out-of-distribution (OOD) detection is an important aspect of safely deploying machine learning models in real-world applications. Previous approaches either design better scoring functions or utilize knowledge of outliers to equip well-trained models with the ability of OOD detection. However, few of them explore the intrinsic OOD detection capability of a given model. In this work, we discover that a model trained on in-distribution data passes through an intermediate stage with higher OOD detection performance than its final stage, consistently across different settings, and we further identify learning with atypical samples as the critical cause. Based on these empirical insights, we propose a new method, Unleashing Mask (UM), that restores the OOD discriminative capability of the model. Specifically, we utilize a mask to identify the memorized atypical samples and fine-tune the model to forget them. Extensive experiments have been conducted to characterize and verify the effectiveness of our method.

1. INTRODUCTION

Out-of-distribution (OOD) detection has drawn increasing attention as machine learning models are deployed in open-world scenarios (Nguyen et al., 2015; Lee et al., 2018a). Since test samples can naturally arise from a label-different distribution, identifying OOD inputs is important, especially for safety-critical applications like autonomous driving and medical intelligence. Previous studies focus on designing a series of scoring functions (Hendrycks & Gimpel, 2017b; Liang et al., 2018; Lee et al., 2018a; Liu et al., 2020; Sun et al., 2021; 2022) for OOD uncertainty estimation, or on fine-tuning with auxiliary outlier data to better distinguish OOD inputs (Hendrycks et al., 2019c; Tack et al., 2020; Mohseni et al., 2020; Sehwag et al., 2021; Wei et al., 2022; Ming et al., 2022). Despite the promising results achieved by previous methods (Hendrycks & Gimpel, 2017a; Hendrycks et al., 2019c; Liu et al., 2020; Ming et al., 2022), little attention has been paid to whether the well-trained given model is the most appropriate one for OOD detection. In general, models deployed for various applications have different targets (e.g., multi-class classification) (Goodfellow et al., 2016) instead of OOD detection (Nguyen et al., 2015; Lee et al., 2018a). However, most representative scoring functions, e.g., MSP (Hendrycks & Gimpel, 2017b), ODIN (Liang et al., 2018), and Energy (Liu et al., 2020), uniformly leverage the given models for OOD detection. Considering this target-oriented discrepancy, a critical question arises: does the well-trained given model have the optimal OOD detection capability? If not, how can we find a more appropriate model for OOD detection? In this work, we start by revealing an important observation (as illustrated in Figure 1), i.e., there exists a historical training stage where the model has higher OOD detection performance than the final well-trained one.
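For concreteness, the two most common scoring functions mentioned above, MSP and the energy score, can be sketched in a few lines of NumPy. This is our own minimal illustration (function names are ours, not the paper's code): both take the classifier's logits and return a score where higher means "more ID-like".

```python
import numpy as np

def msp_score(logits):
    # Maximum softmax probability (MSP): the model's top-class confidence.
    # Subtract the row-wise max before exponentiating for numerical stability.
    z = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return probs.max(axis=-1)

def energy_score(logits, T=1.0):
    # Negative free energy, T * logsumexp(logits / T), used as an ID score
    # (higher score => more ID-like). Computed with the log-sum-exp trick.
    z = logits / T
    m = z.max(axis=-1, keepdims=True)
    return T * (m.squeeze(-1) + np.log(np.exp(z - m).sum(axis=-1)))
```

A confidently classified input (one dominant logit) receives a higher score under both functions than an input with flat logits, which is exactly the property these detectors thresholded to separate ID from OOD data.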
This is generally true across different OOD/ID datasets (Netzer et al., 2011; Van Horn et al., 2018; Cimpoi et al., 2014), learning rate schedules (Loshchilov & Hutter, 2017), and model structures (Huang et al., 2017; Zagoruyko & Komodakis, 2016). The empirical results in Figure 1 reflect the inconsistency between gaining better OOD detection capability (Nguyen et al., 2015) and pursuing better performance on ID data. We delve into the differences between the intermediate model and the final model by visualizing the misclassified examples. As shown in Figure 2, one possible cause of the obscured detection capability is the memorization of atypical samples (at the semantic level) that are hard for the model to learn. Seeking zero error on those samples makes the model more confident on OOD data (see Figures 1(b) and 1(c)).
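The observation that an intermediate checkpoint can detect OOD inputs better than the final model is typically quantified with a threshold-free metric such as AUROC computed over ID and OOD scores. As a sketch under our own assumptions (not the paper's evaluation code), AUROC can be computed directly as the probability that a random ID sample outscores a random OOD sample:

```python
import numpy as np

def auroc(id_scores, ood_scores):
    # AUROC equals the Mann-Whitney U statistic: the probability that a
    # randomly drawn ID score exceeds a randomly drawn OOD score,
    # counting ties as one half. O(n*m) pairwise version for clarity.
    id_scores = np.asarray(id_scores, dtype=float)
    ood_scores = np.asarray(ood_scores, dtype=float)
    greater = (id_scores[:, None] > ood_scores[None, :]).mean()
    ties = (id_scores[:, None] == ood_scores[None, :]).mean()
    return greater + 0.5 * ties

# Hypothetical usage: score each saved checkpoint on held-out ID and OOD
# data (e.g. with an MSP or energy detector) and compare AUROC per epoch;
# the paper's Figure 1 plots exactly this kind of per-stage curve.
```

An AUROC of 1.0 means the scores separate ID from OOD perfectly, while 0.5 is chance level; comparing this value across training stages is what reveals the intermediate-stage peak discussed above.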

