A SIMPLE AND EFFECTIVE BASELINE FOR OUT-OF-DISTRIBUTION DETECTION USING AN ABSTENTION CLASS

Anonymous

Abstract

Refraining from confidently predicting when faced with categories of inputs different from those seen during training is an important requirement for the safe deployment of deep learning systems. While simple to state, this has proven a particularly challenging problem in deep learning, where models often make overconfident predictions in such situations. In this work we present a simple but highly effective approach to out-of-distribution detection that uses the principle of abstention: when encountering a sample from an unseen class, the desired behavior is to abstain from predicting. Our approach uses a network with an extra abstention class, trained on a dataset augmented with an uncurated set containing a large number of out-of-distribution (OoD) samples that are assigned the label of the abstention class; the model thereby learns an effective discriminator between in-distribution and out-of-distribution samples. We compare this relatively simple approach against a wide variety of more complex methods proposed both for out-of-distribution detection and for uncertainty modeling in deep learning, and empirically demonstrate its effectiveness on numerous benchmarks and deep architectures for image recognition and text classification, often outperforming existing approaches by significant margins. Given its simplicity and effectiveness, we propose that this approach be adopted as an additional baseline for future work in this domain.
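The core mechanics of the abstention-class setup described above can be sketched in a few lines: augment the training set with OoD samples labeled with an extra (K+1)-th class, and abstain at test time whenever that class wins. The code below is a minimal illustrative sketch, not the paper's implementation; the number of classes and the array shapes are hypothetical placeholders.

```python
import numpy as np

NUM_CLASSES = 5                 # hypothetical number of in-distribution classes (0..4)
ABSTAIN = NUM_CLASSES           # index of the extra abstention class

def augment_with_ood(in_dist_x, in_dist_y, ood_x):
    """Assign every OoD sample the abstention label and merge with in-distribution data."""
    ood_y = np.full(len(ood_x), ABSTAIN)
    x = np.concatenate([in_dist_x, ood_x])
    y = np.concatenate([in_dist_y, ood_y])
    return x, y

def predict_or_abstain(logits):
    """Return the predicted class index, or None (abstain) if the extra class wins."""
    k = int(np.argmax(logits))
    return None if k == ABSTAIN else k
```

Training then proceeds with a standard (K+1)-way cross-entropy loss on the augmented data; no change to the loss function or architecture beyond the extra output unit is required.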

1. INTRODUCTION AND RELATED WORK

Most supervised machine learning has been developed under the assumption that the distribution of classes seen at train and test time is the same. However, the real world is unpredictable and open-ended, and making machine learning systems robust to the presence of unknown categories and out-of-distribution samples has become increasingly essential for their safe deployment. While refraining from predicting when uncertain is intuitively obvious to humans, the peculiarities of DNNs make them overconfident on unknown inputs (Nguyen et al., 2015), which makes this a challenging problem to solve in deep learning. A very active sub-field of deep learning, known as out-of-distribution (OoD) detection, has emerged in recent years that attempts to impart to deep neural networks the quality of "knowing when they don't know". The most straightforward approach in this regard uses the DNN's output as a proxy for predictive confidence. For example, a simple baseline for detecting OoD samples using thresholded softmax scores was presented in Hendrycks & Gimpel (2016), where the authors provided empirical evidence that for DNN classifiers, in-distribution predictions tend to have higher winning scores than OoD samples, thus empirically justifying softmax thresholding as a useful baseline. However, this approach is vulnerable to the pathologies discussed in Nguyen et al. (2015). Subsequently, increasingly sophisticated methods have been developed to attack the OoD problem. Liang et al. (2018) introduced a detection technique that perturbs the inputs in the direction of increasing the confidence of the network's predictions, based on the observation that the magnitude of gradients on in-distribution data tends to be larger than on OoD data. The method proposed in Lee et al. (2018) also involves input perturbation, but confidence in this case is measured by a Mahalanobis distance score computed from the mean and covariance of the pre-softmax features. A drawback of such methods, however, is that they introduce a number of
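The thresholded-softmax baseline of Hendrycks & Gimpel (2016) discussed above can be sketched as follows. This is an illustrative sketch, not the authors' code; the threshold value is a hypothetical placeholder that would in practice be tuned on held-out validation data.

```python
import numpy as np

def softmax(logits):
    z = logits - np.max(logits)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def is_ood(logits, threshold=0.7):
    """Flag a sample as OoD when its winning softmax score falls below the threshold.
    The default threshold of 0.7 is a hypothetical choice for illustration."""
    return float(np.max(softmax(logits))) < threshold
```

A confidently classified input yields a winning score near 1 and passes the check, while near-uniform scores on an unfamiliar input fall below the threshold and are flagged as OoD; the pathologies of Nguyen et al. (2015) arise precisely when OoD inputs nevertheless receive high winning scores.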

