INFORMATIVE OUTLIER MATTERS: ROBUSTIFYING OUT-OF-DISTRIBUTION DETECTION USING OUTLIER MINING

Anonymous

Abstract

Detecting out-of-distribution (OOD) inputs is critical for safely deploying deep learning models in an open-world setting. However, existing OOD detection solutions can be brittle in the open world, facing various types of adversarial OOD inputs. While methods that leverage auxiliary OOD data have emerged, our analysis reveals a key insight: the majority of auxiliary OOD examples may not meaningfully improve the decision boundary of the OOD detector. In this paper, we propose a theoretically motivated method, Adversarial Training with informative Outlier Mining (ATOM), which improves the robustness of OOD detection. We show that by mining informative auxiliary OOD data, one can significantly improve OOD detection performance and, somewhat surprisingly, generalize to unseen adversarial attacks. ATOM achieves state-of-the-art performance under a broad family of classic and adversarial OOD evaluation tasks. For example, on the CIFAR-10 in-distribution dataset, ATOM reduces the FPR95 (false positive rate at 95% true positive rate) by up to 57.99% under adversarial OOD inputs, surpassing the previous best baseline by a large margin.

1. INTRODUCTION

Out-of-distribution (OOD) detection has become an indispensable part of building reliable open-world machine learning models (Amodei et al., 2016). An OOD detector determines whether an input is from the same distribution as the training data or from a different distribution (i.e., out-of-distribution). The performance of the OOD detector is central to safety-critical applications such as autonomous driving (Eykholt et al., 2018) and rare disease identification (Blauwkamp et al., 2019). Despite exciting progress in OOD detection, previous methods mostly focused on clean OOD data (Hendrycks & Gimpel, 2016; Liang et al., 2018; Lee et al., 2018; Lakshminarayanan et al., 2017; Hendrycks et al., 2018; Mohseni et al., 2020). Scant attention has been paid to the robustness aspect of OOD detection. Recent works (Hein et al., 2019; Sehwag et al., 2019; Bitterwolf et al., 2020) considered worst-case OOD detection under adversarial perturbations (Papernot et al., 2016; Goodfellow et al., 2014; Biggio et al., 2013; Szegedy et al., 2013). For example, an OOD image (e.g., a mailbox) can be perturbed so that the OOD detector misclassifies it as in-distribution (traffic sign data). Such an adversarial OOD example is then passed to the image classifier and triggers an undesirable prediction and action (e.g., speed limit 70). It therefore remains an important question how to make out-of-distribution detection algorithms robust in the presence of small perturbations to OOD inputs.

In this paper, we begin by formally formulating the task of robust OOD detection and providing a theoretical analysis under a simple Gaussian data model. While recent OOD detection methods (Hendrycks et al., 2018; Hein et al., 2019; Meinke & Hein, 2019; Mohseni et al., 2020) have leveraged auxiliary OOD data, they often sample uniformly at random from the auxiliary dataset.
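As background, the score-and-threshold structure common to these detectors, and the FPR95 metric reported above, can be sketched in a few lines. This is a minimal illustration in plain Python; the names and the simple rank-based threshold rule are illustrative, not the exact procedure of any cited method.

```python
def choose_threshold(scores_in, tpr=0.95):
    """Pick a threshold that keeps roughly `tpr` of in-distribution
    scores above it (convention: higher score = more in-distribution)."""
    ranked = sorted(scores_in)
    cut = int((1.0 - tpr) * len(ranked))
    return ranked[cut]

def is_ood(score, threshold):
    """The detector G(x): flag an input as OOD when its score falls
    below the chosen threshold."""
    return score < threshold

def fpr_at_tpr(scores_in, scores_out, tpr=0.95):
    """FPR95-style metric: the fraction of OOD inputs wrongly accepted
    as in-distribution at the threshold achieving `tpr` on ID data."""
    tau = choose_threshold(scores_in, tpr)
    accepted = [s for s in scores_out if s >= tau]
    return len(accepted) / len(scores_out)
```

An adversary attacking such a detector only needs to push an OOD input's score above the threshold, which is why the decision boundary induced by the threshold is the central object of study.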
Contrary to this common practice, our analysis reveals a key insight: the majority of auxiliary OOD examples may not provide useful information for improving the decision boundary of the OOD detector. Under a Gaussian model of the data, we theoretically show that outlier mining significantly improves the error bound of the OOD detector in the presence of non-informative auxiliary OOD data. Motivated by this insight, we propose Adversarial Training with informative Outlier Mining (ATOM), which justifies the theoretical intuitions above and achieves state-of-the-art performance on a broad family of classic and adversarial OOD evaluation tasks for modern neural networks. We show that by carefully choosing which OOD data to train on, one can significantly improve the robustness of an OOD detector and, somewhat surprisingly, generalize to unseen adversarial attacks. We note that while hard negative mining has been extensively used in various learning tasks such as object recognition (Felzenszwalb et al., 2009; Gidaris & Komodakis, 2015; Shrivastava et al., 2016), to the best of our knowledge, we are the first to exploit the novel connection between hard example mining and OOD detection. We show both empirically and theoretically that hard example mining significantly improves the generalization and robustness of OOD detection.

To evaluate our method, we provide a unified framework that allows examining the robustness of OOD detection algorithms under a broad family of OOD inputs, as illustrated in Figure 1. Our evaluation includes the existing classic OOD evaluation task (Natural OOD) and the adversarial OOD evaluation task (L∞ OOD). In addition, we introduce two new adversarial OOD evaluation tasks: Corruption OOD and Compositional OOD. Under these evaluation tasks, ATOM achieves state-of-the-art performance compared to eight competitive OOD detection methods (refer to Appendix B.3 for a detailed description of these methods).
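The mining step this insight suggests can be illustrated with a short sketch: rank the auxiliary pool by OOD score and train on the samples closest to the decision boundary. The function name, the dictionary-based pool, and the skip fraction `q_frac` are hypothetical conveniences for illustration, not ATOM's exact training procedure.

```python
def mine_informative_outliers(aux_scores, n_select, q_frac=0.0):
    """Rank auxiliary OOD samples by their OOD score (a low score means
    the sample looks in-distribution, i.e., it is near the boundary and
    informative), optionally skip the very hardest `q_frac` fraction,
    then take the next `n_select` samples for training.

    `aux_scores` maps sample ids to OOD scores (illustrative format)."""
    ranked = sorted(aux_scores, key=aux_scores.get)  # ascending OOD score
    start = int(q_frac * len(ranked))
    return ranked[start:start + n_select]
```

Sampling uniformly from the pool instead would mostly return samples with high OOD scores, which the detector already rejects with a wide margin and which therefore contribute little to sharpening the boundary.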
On the Natural OOD evaluation task, ATOM achieves comparable and often better performance than current state-of-the-art methods. On the L∞ OOD evaluation task, ATOM outperforms the current state-of-the-art method ACET by a large margin (e.g., on CIFAR-10, by 53.9%). Under the new Corruption OOD evaluation task, where the attack is unknown at training time, ATOM also achieves much better results than previous methods (e.g., on CIFAR-10, it outperforms the previous best method by 30.99%). While almost every method fails under the hardest Compositional OOD evaluation task, ATOM still achieves impressive results (e.g., on CIFAR-10, it reduces the FPR by 57.99%). The performance is noteworthy since ATOM is not trained explicitly on corrupted OOD inputs. In summary, our contributions are:

• Firstly, we contribute a theoretical analysis formalizing the intuition of mining hard outliers for improving the robustness of OOD detection.

• Secondly, we contribute a theoretically motivated method, ATOM, which leads to state-of-the-art performance on both classic and adversarial OOD evaluation tasks. We conduct extensive evaluations and ablation analyses to demonstrate the effectiveness of informative outlier mining.

• Lastly, we provide a unified evaluation framework that allows future research to examine the robustness of OOD detection algorithms under a broad family of OOD inputs.



Figure 1: When deploying an image classification system (OOD detector G(x) + image classifier f (x)) in an open world, there can be multiple types of out-of-distribution examples. We consider a broad family of OOD inputs, including (a) Natural OOD, (b) L∞ OOD, (c) Corruption OOD, and (d) Compositional OOD. A detailed description of these OOD inputs can be found in Section 5.1. In (b-d), a perturbed OOD input (e.g., a perturbed mailbox image) can mislead the OOD detector into classifying it as an in-distribution sample. This can trigger the downstream image classifier f (x) to predict it as one of the in-distribution classes (e.g., speed limit 70). Through adversarial training with informative outlier mining (ATOM), our method can robustify the decision boundary of the OOD detector G(x), which leads to improved performance across all types of OOD inputs. Solid lines are actual computation flow.
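Perturbed inputs like those in panel (b) are typically crafted with projected signed-gradient ascent on the detector's in-distribution score, constrained to an L∞ ball around the clean OOD image. A minimal sketch follows, assuming a caller-supplied gradient oracle `grad_fn` (a hypothetical stand-in for backpropagation through the detector's score):

```python
def pgd_linf_attack(x, grad_fn, eps, step, n_steps):
    """Perturb an OOD input to raise the detector's in-distribution
    score: take signed-gradient ascent steps, then project back into
    the L-inf ball of radius `eps` around the original input `x`.
    `grad_fn(x_adv)` returns the per-coordinate gradient of the score."""
    x_adv = list(x)
    for _ in range(n_steps):
        g = grad_fn(x_adv)
        # ascend along the sign of the gradient (L-inf steepest ascent)
        x_adv = [xi + step * (1 if gi > 0 else -1 if gi < 0 else 0)
                 for xi, gi in zip(x_adv, g)]
        # project each coordinate back into [x_i - eps, x_i + eps]
        x_adv = [min(max(xa, xo - eps), xo + eps)
                 for xa, xo in zip(x_adv, x)]
    return x_adv
```

The Compositional OOD task in panel (d) combines such a perturbation with a corruption, which is why it is the hardest setting in the evaluation.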

