INTERPRETABLE OUT-OF-DISTRIBUTION DETECTION USING PATTERN IDENTIFICATION

Abstract

Out-of-distribution (OoD) detection for data-based programs is a goal of paramount importance. Common approaches in the literature tend to train detectors that require both in-distribution (IoD) and OoD validation samples, and/or implement confidence metrics that are often abstract and therefore difficult to interpret. In this work, we propose to use existing work from the field of explainable AI, namely the PARTICUL pattern identification algorithm, in order to build more interpretable and robust OoD detectors for visual classifiers. Crucially, this approach does not require retraining the classifier and is tuned directly on the IoD dataset, making it applicable to domains where OoD does not have a clear definition. Moreover, pattern identification allows us to provide images from the IoD dataset as reference points to better explain the confidence scores. We demonstrate that the detection capabilities of this approach are on par with existing methods through an extensive benchmark across four datasets and two definitions of OoD. In particular, we introduce a new benchmark based on perturbations of the IoD dataset, which provides a known and quantifiable evaluation of the discrepancy between the IoD and OoD datasets and serves as a reference value for the comparison between various OoD detection methods. Our experiments show that the robustness of all metrics under test does not solely depend on the nature of the IoD dataset or the OoD definition, but also on the architecture of the classifier, which stresses the need for thorough experimentation in future work on OoD detection.

1. INTRODUCTION

A fundamental aspect of software safety is arguably the modelling of its expected operational domain through a formal or semi-formal specification, giving clear boundaries on when it is sensible to deploy the program and when it is not. It is however difficult to define such boundaries for machine learning programs, especially for visual classifiers based on artificial neural networks (ANNs), which are the subject of this paper. Indeed, such programs process high-dimensional data (images, videos) and are the result of a complex optimization procedure, but they do not embed clear failure modes that could be triggered in the case of an unknown distribution, with potentially dire consequences in critical applications. Although it is difficult to characterize an operational distribution, one can still measure its dissimilarity to other distributions. In this context, Out-of-Distribution (OoD) detection - which aims to detect whether an input of an ANN is in-distribution (IoD) or outside of it - serves several purposes. It helps characterize the extent to which the ANN can operate outside a bounded dataset (which is important due to the incompleteness of the training set w.r.t. the operational domain). It also constitutes a surrogate measure of the generalization abilities of the ANN. Finally, OoD detection can help assess when an input is too far away from the operational domain, which prevents misuses of the program and increases its safety.

2. RELATED WORK AND CONTRIBUTION

Out-of-distribution detection. The maximum class probability (MCP) obtained after softmax normalization of the classifier logits already constitutes a good baseline for OoD detection Hendrycks & Gimpel (2017). However, neural networks tend to be overconfident in their predictions Szegedy et al. (2014); Lee et al. (2018a); Hein et al. (2019), even when they are wrong, which may result in false claims of confidence. Hence the development of enhancements such as temperature scaling Liang et al. (2017), ensemble learning Nguyen et al. (2020) or True-Class Probability learning Corbiere et al. (2021) (assuming such information is known). Other types of confidence measures have also been developed, with various operational settings. Note that in this work, we focus on methods that can be applied to pre-trained classifiers. Therefore, we exclude methods such as Lee et al. (2018a); Hein et al. (2019); Hendrycks et al. (2019) - which integrate the learning of the confidence measure within the training objective of the model - or specific architectures from the field of Bayesian deep learning that aim at capturing uncertainty by design Gal & Ghahramani (2016). While efficient, these approaches may prove costly or impractical in an industrial context where substantial resources might already have been dedicated to obtaining an accurate model for the task at hand. Moreover, we make a distinction between methods that require a validation set composed of OoD samples for the calibration of hyper-parameters (OoD-specific), and methods that do not require such a validation set and are therefore "OoD-agnostic" (Liu et al. (2020)).

OoD-specific methods. ODIN Liang et al. (2018) extends the effect of temperature scaling with the use of small adversarial perturbations, applied to the input sample, that aim at increasing the maximum softmax score. OoD detection is performed by measuring the gain in softmax score after calibrating the temperature value and the perturbation intensity on a validation set, so that perturbations lead to a greater margin for IoD data than for OoD data. Other approaches also attempt to capture the "normal" behaviour of the different layers of the classifier: FSSD Huang et al. (2021) states that the latent representations of OoD samples through a CNN classifier are clustered around a point called the feature-space singularity (FSS), which serves as a reference point for OoD detection. Similarly, Lee et al. (2018b) propose a confidence score based on the Mahalanobis distance between a new sample and class-conditional Gaussian distributions inferred from the training set. Both approaches operate upon multiple layers of the network and require a set of OoD samples for calibrating the relative importance of each layer in the final confidence score.

OoD-agnostic methods. Liu et al. (2020) propose a framework based on energy scores (in practice, the denominator of the softmax normalization function), which can be used either during inference on a pre-trained model or to fine-tune the model for more discriminative properties. The Fractional Neuron Region Distance (FNRD) Hond et al. (2021) computes the range of activations for each neuron over the training set, then provides a score describing how many neurons are activated outside their boundaries for a given input. Finally, Attribution-Based Confidence (ABC) Jha et al. (2019) does not require access to IoD data and equates the confidence of the network to its local stability, by sampling the neighbourhood of a given input and measuring stability through attribution methods. Although this last method relies on fewer prerequisites, it is computationally expensive (e.g., gradient computation) and may not be suited to runtime constraints.

As noted by Hendrycks et al. (2019), OoD-specific and OoD-agnostic methods are usually "not directly comparable" due to their different prerequisites. Therefore, in this work, we mainly compare OoD-agnostic methods, using FSSD Huang et al. (2021) only as a reference measure to illustrate possible detection gaps between OoD-agnostic and OoD-specific methods. We also exclude ABC Jha et al. (2019) from our experiments due to its computational cost.

It is also important to note that all the methods presented above are evaluated on different datasets and definitions of OoD (e.g., different datasets, distribution shifts), which only gives a partial picture of their robustness Tajwar et al. (2021). Although works such as OpenOOD Yang et al. (2022) - which aims at standardizing the evaluation of OoD detection, anomaly detection and open-set recognition into a unified benchmark - are invaluable for the community, most datasets commonly in use (MNIST Deng (2012), CIFAR-10/100 Krizhevsky (2009)) contain low-resolution images that may not reflect the detection capabilities of the methods under test in more realistic operational settings. Moreover, when evaluating the ability of an OoD detection method to discriminate between IoD and OoD datasets, it is often difficult to quantify the discrepancy between these two datasets independently from the method under test, and therefore to exhibit a "ground truth" value of what this margin should be. Therefore, in this paper we propose a new type of OoD benchmark, based on perturbations of the IoD dataset, which aims at measuring the correlation between the OoD detection score of a given method on the perturbed dataset (OoD) and the intensity of the perturbation, under the hypothesis that the intensity of the perturbation can serve as a ground-truth measure of the discrepancy between the IoD and OoD datasets.
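To make the Mahalanobis-based score of Lee et al. (2018b) more concrete, the sketch below computes it for a single feature layer with a shared covariance matrix; the full method combines several layers with OoD-calibrated weights. The function name and NumPy formulation are ours, not the authors' implementation.

```python
import numpy as np

def mahalanobis_score(features, class_means, shared_cov):
    """Sketch of a single-layer Mahalanobis confidence score: negative squared
    Mahalanobis distance to the closest class-conditional Gaussian.
    Higher score = closer to a training class = more likely IoD."""
    prec = np.linalg.inv(shared_cov)  # shared precision matrix
    dists = []
    for mu in class_means:
        d = features - mu
        # per-sample squared Mahalanobis distance d^T * prec * d
        dists.append(np.einsum('ij,jk,ik->i', d, prec, d))
    return -np.min(np.stack(dists, axis=0), axis=0)
```

In the original method, the class means and covariance are estimated on the training set, and a small OoD validation set calibrates how the per-layer scores are combined.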
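The energy score of Liu et al. (2020) is easy to contrast with the MCP baseline, since both are computed from the classifier logits alone. A minimal NumPy sketch (function names are ours):

```python
import numpy as np

def mcp_score(logits):
    """Maximum class probability after softmax normalization.
    Higher score = more confident = more likely IoD."""
    z = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return probs.max(axis=-1)

def energy_score(logits, T=1.0):
    """Negative free energy T * logsumexp(logits / T), i.e. the (log of the)
    softmax denominator. Higher score = more likely IoD."""
    z = logits / T
    m = z.max(axis=-1, keepdims=True)  # stable log-sum-exp
    return T * (np.log(np.exp(z - m).sum(axis=-1)) + m.squeeze(-1))
```

Both functions are OoD-agnostic: detection then reduces to thresholding the score, with the threshold chosen on IoD data only (e.g., to reach a target true-positive rate).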
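FNRD can be sketched as follows, assuming per-neuron activations have already been extracted from selected layers of the pre-trained classifier; the class interface is illustrative, not the implementation of Hond et al. (2021).

```python
import numpy as np

class FNRD:
    """Sketch of the Fractional Neuron Region Distance: the fraction of
    neurons whose activation falls outside the [min, max] range observed
    on the training set. Higher score = more likely OoD."""

    def fit(self, train_activations):
        # train_activations: (n_samples, n_neurons) array collected by
        # running the training set through the classifier
        self.lo = train_activations.min(axis=0)
        self.hi = train_activations.max(axis=0)
        return self

    def score(self, activations):
        # fraction of neurons activated outside their nominal range
        outside = (activations < self.lo) | (activations > self.hi)
        return outside.mean(axis=-1)
```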
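The proposed perturbation benchmark can be illustrated by the following sketch, which perturbs IoD samples with Gaussian noise of increasing intensity and records the mean confidence score at each intensity; under the hypothesis above, a well-behaved detector should produce scores that decrease with the intensity. All names are ours, and Gaussian noise is only one possible choice of perturbation.

```python
import numpy as np

def perturbation_benchmark(images, score_fn, intensities, seed=0):
    """Apply Gaussian noise of increasing standard deviation to a batch of
    IoD `images` and return the mean confidence score (as computed by
    `score_fn`, higher = more IoD) at each intensity."""
    rng = np.random.default_rng(seed)
    means = []
    for sigma in intensities:
        noisy = images + rng.normal(0.0, sigma, size=images.shape)
        means.append(float(np.mean(score_fn(noisy))))
    return means
```

The correlation between `intensities` and the returned means (e.g., a rank correlation) can then serve as the evaluation metric, with the intensity acting as the ground-truth discrepancy measure.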

