PERFECT DENSITY MODELS CANNOT GUARANTEE ANOMALY DETECTION

Abstract

Thanks to the tractability of their likelihood, some deep generative models show promise for seemingly straightforward but important applications like anomaly detection, uncertainty estimation, and active learning. However, the likelihood values empirically attributed to anomalies conflict with the expectations these proposed applications suggest. In this paper, we take a closer look at the behavior of distribution densities and show that these quantities carry less meaningful information than previously thought, beyond estimation issues or the curse of dimensionality. We conclude that the use of these likelihoods for out-of-distribution detection relies on strong and implicit hypotheses, and highlight the necessity of explicitly formulating these assumptions for reliable anomaly detection.

1. INTRODUCTION

Several machine learning methods aim at extrapolating a behavior observed on training data in order to produce predictions on new observations. But every so often, such extrapolation can result in wrong outputs, especially on points that we would consider infrequent with respect to the training distribution. Faced with unusual situations, whether adversarial (Szegedy et al., 2013; Carlini & Wagner, 2017) or just rare (Hendrycks & Dietterich, 2019), a desirable behavior from a machine learning system would be to flag these outliers so that the user can assess whether the result is reliable and gather more information if need be (Zhao & Tresp, 2019; Fu et al., 2017). This can be critical for applications like medical decision making (Lee et al., 2018) or autonomous vehicle navigation (Filos et al., 2020), where such outliers are ubiquitous. What are the situations that are deemed unusual? Defining these anomalies (Hodge & Austin, 2004; Pimentel et al., 2014) manually can be laborious if not impossible, so generally applicable, automated methods are preferable. In that regard, the framework of probabilistic reasoning has been an appealing formalism, because natural candidates for outliers are situations that are improbable, or out-of-distribution. Since the true probability distribution density p*_X of the data is often not provided, one would instead use an estimator, p_X^(θ), learned from this data to assess the regularity of a point. Density estimation has been a particularly challenging task on high-dimensional problems.
However, recent advances in deep probabilistic models, including variational auto-encoders (Kingma & Welling, 2014; Rezende et al., 2014; Vahdat & Kautz, 2020), deep autoregressive models (Uria et al., 2014; van den Oord et al., 2016b;a), and flow-based generative models (Dinh et al., 2014; 2016; Kingma & Dhariwal, 2018), have shown promise for density estimation, which has the potential to enable accurate density-based methods (Bishop, 1994) for anomaly detection. Yet, several works have observed that a significant gap persists between the potential of density-based anomaly detection and empirical results. For instance, Choi et al. (2018), Nalisnick et al. (2018), and Hendrycks et al. (2018) noticed that generative models trained on a benchmark dataset (e.g., CIFAR-10, Krizhevsky et al., 2009) and tested on another (e.g., SVHN, Netzer et al., 2011) are not able to identify the latter as out-of-distribution with current methods. Different hypotheses have been formulated to explain that discrepancy, ranging from the curse of dimensionality (Nalisnick et al., 2019) to a significant mismatch between p_X^(θ) and p*_X (Choi et al., 2018; Fetaya et al., 2020; Kirichenko et al., 2020; Zhang et al., 2020). In this work, we propose a new perspective on this discrepancy and challenge the expectation that density estimation should enable anomaly detection. We show that the aforementioned discrepancy
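To make the density-based scheme under discussion concrete, here is a minimal sketch of likelihood-threshold anomaly detection in the spirit of Bishop (1994): fit a density estimator p_X^(θ) to training data, then flag test points whose estimated log-likelihood falls below a low quantile of the training likelihoods. The diagonal Gaussian fit, function names, and threshold choice are illustrative assumptions, not the paper's method; in the settings cited above, a deep generative model would play the role of `log_likelihood`.

```python
import numpy as np

def fit_gaussian(x):
    """Maximum-likelihood fit of a diagonal Gaussian to training data x of shape (n, d)."""
    return x.mean(axis=0), x.var(axis=0) + 1e-6  # small floor avoids zero variance

def log_likelihood(x, mean, var):
    """Per-example log density log p_X^(theta)(x) under the fitted diagonal Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var, axis=1)

def anomaly_flags(train, test, quantile=0.01):
    """Flag test points whose log-likelihood falls below the `quantile` level
    of the training log-likelihoods (True = flagged as out-of-distribution)."""
    mean, var = fit_gaussian(train)
    threshold = np.quantile(log_likelihood(train, mean, var), quantile)
    return log_likelihood(test, mean, var) < threshold

rng = np.random.default_rng(0)
in_dist = rng.normal(0.0, 1.0, size=(5000, 8))  # stand-in "training distribution"
far_ood = rng.normal(6.0, 1.0, size=(100, 8))   # clearly shifted test data
flags = anomaly_flags(in_dist, far_ood)         # most far-OOD points get flagged here
print(flags.mean())
```

On such a toy example with a well-separated shift, thresholding works as hoped; the argument of this paper is that even a perfect estimator p_X^(θ) = p*_X gives no such guarantee in general, as the CIFAR-10/SVHN observations above illustrate.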

