THE DEVIL IS IN THE WRONGLY-CLASSIFIED SAMPLES: TOWARDS UNIFIED OPEN-SET RECOGNITION

Abstract

Open-set Recognition (OSR) aims to identify test samples whose classes are not seen during the training process. Recently, Unified Open-set Recognition (UOSR) has been proposed to reject not only unknown samples but also known but wrongly classified samples, which tends to be more practical in real-world applications. In this paper, we deeply analyze the UOSR task under different training and evaluation settings to shed light on this promising research direction. For this purpose, we first evaluate the UOSR performance of several OSR methods and report a significant finding: the UOSR performance consistently surpasses the OSR performance of the same method by a large margin. We show that the reason lies in the known but wrongly classified samples, as their uncertainty distribution is extremely close to that of unknown samples rather than that of known and correctly classified samples. Second, we analyze how two training settings of OSR (i.e., pre-training and outlier exposure) influence UOSR. We find that although both are beneficial for distinguishing known and correctly classified samples from unknown samples, pre-training is also helpful for identifying known but wrongly classified samples while outlier exposure is not. In addition to different training settings, we also formulate a new evaluation setting for UOSR called few-shot UOSR, where only one or five samples per unknown class are available during evaluation to help identify unknown samples. We propose FS-KNNS for few-shot UOSR, which achieves state-of-the-art performance under all settings.

1. INTRODUCTION

Neural networks have achieved tremendous success in closed-set classification (Deng et al., 2009), where the test samples share the same In-Distribution (InD) class set with the training samples. Open-Set Recognition (OSR) (Scheirer et al., 2013) is proposed to tackle the challenge that samples whose classes are not seen during training, i.e., Out-of-Distribution (OoD) data, may occur in real-world applications and should be rejected. However, some researchers have argued that the model should reject not only OoD samples but also InD samples that are Wrongly classified (InW), as the model gives wrong answers for both of them. Therefore, Unified Open-set Recognition (UOSR) (Kim et al., 2021) is proposed to accept only InD samples that are Correctly classified (InC) and to reject OoD and InW samples simultaneously. The difference between UOSR and OSR lies in the InW samples: OSR is supposed to accept them, while UOSR has the opposite purpose. UOSR is actually more useful in most real-world applications, but it has received little attention from the research community, as it was proposed very recently and lacks comprehensive systematic study. Therefore, we deeply analyze the UOSR problem in this work to fill this gap. We first apply existing OSR methods to UOSR in Sec. 3, and then analyze UOSR under different training settings and evaluation settings in Sec. 4 and Sec. 5, respectively. In Sec. 3, several existing OSR methods are applied to UOSR, and we find that the UOSR performance is consistently and significantly better than the OSR performance of the same method, as shown in Fig. 1 (a). We show that this phenomenon holds for different network architectures, datasets, and domains (image and video recognition). We find that the devil is in the InW samples, whose uncertainty distribution is similar to that of OoD samples rather than InC samples.
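The OSR/UOSR gap can be made concrete with a small sketch. Both tasks rank test samples by a single uncertainty score; the only difference is whether InW samples sit on the accept side (OSR) or the reject side (UOSR). The scores below are synthetic, chosen only to mimic the observation that InW uncertainty clusters near OoD uncertainty; the resulting AUROC gap is illustrative, not a reproduction of the paper's numbers.

```python
import numpy as np

# AUROC as the probability that a "reject" sample scores higher than an
# "accept" sample (pairwise form, fine for a small synthetic sketch).
def auroc(accept_scores, reject_scores):
    a = np.asarray(accept_scores)[:, None]
    r = np.asarray(reject_scores)[None, :]
    return (r > a).mean() + 0.5 * (r == a).mean()

# Synthetic uncertainty scores: InW uncertainty resembles OoD, not InC.
rng = np.random.default_rng(0)
u_inc = rng.normal(0.10, 0.05, 1000).clip(0, 1)  # known, correctly classified
u_inw = rng.normal(0.60, 0.10, 400).clip(0, 1)   # known, wrongly classified
u_ood = rng.normal(0.65, 0.10, 1000).clip(0, 1)  # unknown

# OSR: accept all InD (InC + InW), reject OoD.
osr_auroc = auroc(np.concatenate([u_inc, u_inw]), u_ood)
# UOSR: accept only InC, reject both InW and OoD.
uosr_auroc = auroc(u_inc, np.concatenate([u_inw, u_ood]))

print(f"OSR AUROC:  {osr_auroc:.3f}")
print(f"UOSR AUROC: {uosr_auroc:.3f}")  # higher: InW scores overlap with OoD
```

Because the InW scores overlap heavily with the OoD scores, moving InW from the accept side to the reject side raises the AUROC, which is exactly the consistent gap reported in Fig. 1 (a).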
Therefore, the false positive predictions in OSR tend to be InW samples, which is extremely important but overlooked by all existing OSR works. In Sec. 4, we analyze how two training settings, pre-training and outlier exposure, influence UOSR. Pre-training uses weights trained on a large-scale dataset for better downstream performance, and outlier exposure introduces unlabeled background data into training to help the model distinguish InD and OoD samples. We find that both improve InC/OoD discrimination, which explains why they are beneficial for OSR. However, pre-training is also helpful for InC/InW discrimination, while outlier exposure performs comparably or even worse when distinguishing InC and InW samples. The performance of UOSR can be regarded as the combined result of InC/OoD and InC/InW discrimination, so both techniques can boost the performance of UOSR. We build a comprehensive UOSR benchmark that involves both pre-training and outlier exposure settings, as shown in Fig. 1 (b). In addition to the two aforementioned training settings, we introduce a new evaluation setting for UOSR in Sec. 5. We formulate few-shot UOSR, similar to SSD (Sehwag et al., 2022), which proposes few-shot OSR, where 1 or 5 samples per OoD class are introduced as references to better identify OoD samples. We first develop a KNN-based (Sun et al., 2022) baseline, FS-KNN, for few-shot UOSR. Although InC/OoD discrimination is improved thanks to the introduced OoD reference samples, InC/InW discrimination is severely harmed compared to the SoftMax baseline (Hendrycks & Gimpel, 2017). To alleviate this problem, we propose FS-KNNS, which dynamically fuses the FS-KNN score with the SoftMax uncertainty score to maintain high InC/InW and InC/OoD performance simultaneously. Our FS-KNNS achieves state-of-the-art performance under all settings in the UOSR benchmark, as shown in Fig. 1 (b), even without outlier exposure during training.
Note that the InC/OoD performances of FS-KNNS and FS-KNN are comparable, but their distinct InC/InW performances make FS-KNN better at OSR and FS-KNNS better at UOSR, which illustrates the difference between few-shot OSR and UOSR and the importance of InW samples during evaluation.
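As a rough illustration of why such a dynamic fusion can preserve both discriminations, the sketch below gates between a KNN-style score over few-shot OoD references and the SoftMax score. The score forms, the soft gate, and the parameters `tau` and `temp` are our own illustrative assumptions, not the paper's actual FS-KNNS formulation.

```python
import numpy as np

# SoftMax uncertainty baseline (Hendrycks & Gimpel, 2017):
# one minus the maximum softmax probability.
def softmax_uncertainty(logits):
    z = logits - logits.max()
    p = np.exp(z) / np.exp(z).sum()
    return 1.0 - p.max()

# Assumed FS-KNN-style score: similarity of an L2-normalized feature to its
# k nearest few-shot OoD reference features (higher -> more OoD-like).
def knn_uncertainty(feat, ood_refs, k=1):
    sims = ood_refs @ feat
    return np.sort(sims)[-k:].mean()

# Hypothetical dynamic fusion: trust the KNN score when it gives strong OoD
# evidence, fall back to SoftMax otherwise, keeping InC/InW discrimination.
def fs_knns_uncertainty(feat, logits, ood_refs, k=1, tau=0.5, temp=0.1):
    u_knn = knn_uncertainty(feat, ood_refs, k)
    u_sm = softmax_uncertainty(logits)
    w = 1.0 / (1.0 + np.exp(-(u_knn - tau) / temp))  # soft gate on KNN evidence
    return w * u_knn + (1.0 - w) * u_sm

# Toy setup: 5 OoD reference features on coordinate axes of an 8-d space.
refs = np.eye(8)[:5]
ood_feat = refs[0]       # coincides with a reference
ind_feat = np.eye(8)[6]  # far from every reference

u_ood = fs_knns_uncertainty(ood_feat, np.array([1.0, 1.0, 1.0]), refs)
u_inw = fs_knns_uncertainty(ind_feat, np.array([1.2, 1.0, 0.9]), refs)   # unsure logits
u_inc = fs_knns_uncertainty(ind_feat, np.array([10.0, 0.0, 0.0]), refs)  # confident logits

print(u_ood, u_inw, u_inc)  # fused score ranks OoD > InW > InC, as UOSR requires
```

A pure KNN score would assign the InD samples identical (near-zero) uncertainty regardless of classification confidence, collapsing InC/InW discrimination; the gate lets the SoftMax score take over for InD-looking samples.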

2. TOWARDS UNIFIED OPEN-SET RECOGNITION

In this section, we first formalize the UOSR problem and then discuss the relation between UOSR and other uncertainty-related tasks. Unified Open-set Recognition. Suppose the training dataset is D_train = {(x_i, y_i)}_{i=1}^{N} ⊂ X × C, where X refers to the input space, e.g., images or videos, and C refers to the InD class set. In closed-set



Figure 1: (a) shows that the UOSR performance is significantly better than the OSR performance for the same method, which illustrates that the uncertainty distributions of these OSR methods are actually closer to the expectation of UOSR than to that of OSR. (b) shows the UOSR performance under different settings and the skeleton of this paper. Results are based on the ResNet50 backbone. CIFAR100 and TinyImageNet are the InD and OoD datasets, respectively. (TS: Train from Scratch. TP: Train from Pre-training. OE: Outlier Exposure. FS: Few-shot.)

