ADAPTIVE ROBUST EVIDENTIAL OPTIMIZATION FOR OPEN SET DETECTION FROM IMBALANCED DATA

Abstract

Open set detection (OSD) aims at identifying data samples of an unknown class (i.e., the open set) from those of known classes (i.e., the closed set) based on a model trained only on closed set samples. However, a closed set may exhibit a highly imbalanced class distribution. Accurately differentiating between open set samples and those from a minority class in the closed set poses a fundamental challenge, as the model may be equally uncertain when recognizing samples from the minority class. In this paper, we propose Adaptive Robust Evidential Optimization (AREO), which offers a principled way to quantify sample uncertainty through evidential learning while optimally balancing model training over all classes in the closed set through adaptive distributionally robust optimization (DRO). To prevent the model from focusing primarily on the most difficult samples, as standard DRO would, we perform adaptive DRO training governed by a novel multi-scheduler learning mechanism that ensures an optimal training behavior: the model gives sufficient attention to the difficult samples and the minority class while remaining capable of learning common patterns from the majority classes. Our experimental results on multiple real-world datasets demonstrate that the proposed model outputs uncertainty scores that clearly separate closed set from open set samples, and that its detection results outperform competitive baselines.

1. INTRODUCTION

In many practical scenarios (e.g., drug discovery, anomaly detection, etc.), it is likely to encounter unknown samples, and it is desirable for the model to properly detect these samples as unknown. Various approaches have been proposed to tackle the unknown sample detection problem (Bendale & Boult, 2016; Sun et al., 2020), using techniques such as the Weibull-calibrated SVM (W-SVM) (Scheirer et al., 2013), reconstruction error (Zhang & Patel, 2017), nearest neighbor (Júnior et al., 2016), and quasi-linear function (Cevikalp & Yavuz, 2017). As a representative example, the OpenMax framework removes softmax from the last layer of a neural network and includes an additional layer to produce the probability of a sample being unknown. This essentially redistributes the probability mass over (K + 1) classes (with unknown being a new class). Multiple efforts follow this direction (Sun et al., 2020; Neal et al., 2018). While this technique is viable for detecting open-set samples, the additional layer is only included during the testing phase; as a result, training still follows the closed set assumption. Recent advances in uncertainty quantification provide a more systematic way to break the closed set limitation by explicitly modeling the uncertainty mass that corresponds to the unknown class. One representative work is the evidential deep learning (EDL) model (Sensoy et al., 2018), which treats the predicted multi-class probability as a multinomial opinion according to subjective logic (Jøsang, 2016). Similar to EDL, Prior Networks (PNs) (Malinin & Gales, 2018) explicitly consider the distributional uncertainty that quantifies the distributional mismatch. Posterior Networks (Charpentier et al., 2020) further improve PNs by leveraging normalizing flows for density estimation in the latent space to predict a posterior distribution, which can be used to separate out-of-distribution (OOD) samples from in-distribution ones.
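To make the subjective-logic view concrete, the following minimal sketch converts non-negative per-class evidence (e.g., a ReLU'd network output, as in the EDL formulation) into belief masses and an uncertainty mass; the specific evidence values are illustrative, not from the paper.

```python
import numpy as np

def edl_opinion(evidence):
    """Convert non-negative per-class evidence into a subjective-logic
    multinomial opinion, following the EDL formulation.

    evidence: array of shape (K,).
    Returns (belief_masses, uncertainty_mass); together they sum to 1.
    """
    evidence = np.asarray(evidence, dtype=float)
    K = evidence.shape[0]
    alpha = evidence + 1.0   # Dirichlet parameters alpha_k = e_k + 1
    S = alpha.sum()          # Dirichlet strength
    belief = evidence / S    # b_k = e_k / S
    uncertainty = K / S      # u = K / S; high when total evidence is scarce
    return belief, uncertainty

# A well-recognized closed-set sample accumulates evidence;
# an open-set sample yields little evidence and hence high uncertainty.
b_known, u_known = edl_opinion([40.0, 2.0, 1.0])
b_open, u_open = edl_opinion([0.5, 0.3, 0.2])
```

Thresholding the uncertainty mass `u` then gives a natural OSD score: samples with `u` near 1 carry almost no class evidence and are flagged as open set.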
Despite the promising progress in OSD, which focuses on differentiating samples from the closed and open sets, limited attention has been devoted to the situation where the closed set involves highly imbalanced classes, which is quite common in many practical settings. For example, in anomaly detection, the known types of anomalies available for model training are usually unevenly distributed across multiple categories (e.g., car accident vs. shooting). Similarly, in computer-aided medical diagnosis, the diseases known to the model may be highly imbalanced based on the available cases. Thus, under the standard Empirical Risk Minimization (ERM) training framework, the model may not learn properly from the minority class due to the lack of positive samples. As a result, it is more likely to misidentify a minority-class sample as an unknown-class sample during OSD, leading to a high false-positive rate. Distributionally Robust Optimization (DRO) offers an effective means to handle an imbalanced class distribution in the closed set setting (Qi et al., 2020; Zhu et al., 2019). In DRO, the worst-case weighted loss is optimized, where the weights are searched within a given neighborhood (referred to as the uncertainty set) of the empirical sample distribution so as to maximize the overall loss. By expanding the uncertainty set, the model is encouraged to assign higher weights to difficult samples. As a result, samples from the minority class will receive more emphasis during training if they are not properly learned (and hence incur a larger loss). Another common solution to an imbalanced class distribution in the closed set is oversampling, which yields a more balanced class distribution (Chawla et al., 2002). While both oversampling and DRO may help to improve closed set performance, neither is adequate to address OSD from imbalanced data.
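The worst-case weighting described above can be sketched for one common choice of uncertainty set. Assuming a KL-divergence ball around the uniform empirical distribution (one of several possible choices; the paper's exact formulation may differ), duality gives the maximizing weights an exponential-tilting form, where a smaller temperature corresponds to a larger uncertainty set:

```python
import numpy as np

def dro_weights(losses, tau):
    """Worst-case sample weights for a KL-regularized uncertainty set
    around the uniform empirical distribution. By Lagrangian duality the
    inner maximization yields w_i proportional to exp(loss_i / tau); a
    smaller tau corresponds to a larger uncertainty set.
    """
    losses = np.asarray(losses, dtype=float)
    z = np.exp((losses - losses.max()) / tau)  # numerically stable softmax
    return z / z.sum()

losses = np.array([0.1, 0.2, 2.5, 0.15])   # one hard sample among easy ones
w_erm = dro_weights(losses, tau=1e6)       # huge tau: nearly uniform (ERM-like)
w_dro = dro_weights(losses, tau=0.5)       # small tau: mass shifts to the hard sample
```

This illustrates both the appeal and the risk noted in the text: as the set expands (small `tau`), nearly all of the weight concentrates on the most difficult sample, crowding out the rest of the data.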
A fundamental challenge lies in the interplay between samples from the minority class and the difficult samples from the majority classes. Simply oversampling the minority class may neglect these difficult samples. Similarly, applying DRO with a flexible uncertainty set may put too much emphasis on these difficult samples and ignore the minority class as well as some representative samples from the majority classes, which hinders proper model training. In fact, directly applying these models to OSD may lead to even worse detection performance, as evidenced by our experimental results. A few recent approaches attempt to address OSD under the class-imbalanced setting. Liu et al. (2019) leverage the visual similarity across the centroids of closed set classes to allow more effective training from the minority class samples. However, samples from the minority class may look quite different from most other samples, making such a strategy less effective. Further, Wang et al. (2022) try to push minority class samples away from open set ones in the feature space using contrastive learning. However, the final OSD depends heavily on the selection of open set samples, as evidenced by our experimental results. To systematically tackle the fundamental challenge outlined above, we propose Adaptive Robust Evidential Optimization (AREO), which offers a principled way to quantify sample uncertainty through evidential learning while optimally balancing model training over all classes in the closed set through novel adaptive DRO learning. To prevent the model from focusing primarily on the most difficult samples, as standard DRO would, the adaptive learning strategy gradually increases the size of the uncertainty set using a Multi-Scheduler Function (MSF), which allows the model to learn from easy to hard samples. A class-ratio biased loss is further assigned to the minority class to ensure proper learning from its limited samples.
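A minimal sketch of this adaptive idea is shown below, under simplifying assumptions: the uncertainty-set size is controlled by a temperature that is annealed geometrically over training (an illustrative schedule, not the paper's exact MSF), and the class-ratio bias is implemented as inverse-frequency loss scaling. All names and schedules here are hypothetical.

```python
import numpy as np

def adaptive_dro_weights(losses, class_ids, class_counts, epoch, total_epochs,
                         tau_start=10.0, tau_end=0.5):
    """Illustrative adaptive DRO weighting. The temperature tau is annealed
    so the effective uncertainty set grows from near-ERM (early epochs,
    easy samples matter) toward worst-case DRO (late epochs, hard samples
    matter). Losses are first scaled by inverse class frequency so the
    minority class is not drowned out by the majority classes.
    """
    t = epoch / max(total_epochs - 1, 1)
    tau = tau_start * (tau_end / tau_start) ** t          # geometric annealing
    counts = np.asarray(class_counts, dtype=float)
    bias = (counts.sum() / counts)[np.asarray(class_ids)]  # class-ratio bias
    scaled = np.asarray(losses, dtype=float) * bias
    z = np.exp((scaled - scaled.max()) / tau)              # stable softmax
    return z / z.sum()

# Three majority-class samples and one hard minority-class sample.
losses = [1.0, 1.0, 1.0, 3.0]
w_early = adaptive_dro_weights(losses, [0, 0, 0, 1], [3, 1], epoch=0, total_epochs=10)
w_late = adaptive_dro_weights(losses, [0, 0, 0, 1], [3, 1], epoch=9, total_epochs=10)
```

Early in training the weights stay close to uniform, letting the model pick up common patterns from the majority classes; as the schedule progresses, weight shifts toward the biased minority-class loss.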
Our main contribution is fourfold:
• a novel extension of DRO to evidential learning, which enables principled uncertainty quantification under the class-imbalanced setting, critical for many applications, including OSD;
• adaptive DRO training governed by a uniquely designed multi-scheduler learning mechanism that ensures an optimal training behavior, giving sufficient attention to the difficult samples and the minority class while remaining capable of learning common patterns from the majority classes;
• a theoretical connection to a boosting model (i.e., AdaBoost), which ensures the favorable convergence and generalization properties of AREO;
• state-of-the-art OSD performance on various datasets.

2. RELATED WORK

Open set detection. Various SVM-based techniques (Scheirer et al., 2013; Jain et al., 2014; Scheirer et al., 2014) have been proposed for OSD. For instance, Scheirer et al. (2013) proposed an SVM-based model that performs detection using a Weibull-calibrated SVM (W-SVM) by leveraging Extreme Value Theory (EVT). Reconstruction-based approaches have also been proposed (Zhang & Patel, 2017), where a threshold on the reconstruction error decides whether a sample is from a known or an unknown class. Other traditional models, such as nearest neighbor (Júnior et al., 2016) and quasi-linear function (Cevikalp & Yavuz, 2017), have been explored as well. Deep learning models have been increasingly applied for open

