OUTLIER PRESERVING DISTRIBUTION MAPPING AUTOENCODERS

Abstract

State-of-the-art deep outlier detection methods map data into a latent space with the aim of placing outliers far away from inliers in this space. Unfortunately, this is shown to often fail because the divergence penalty these methods adopt pushes outliers into the same high-probability regions as inliers. We propose a novel method, OP-DMA, that successfully addresses this problem. OP-DMA succeeds in mapping outliers to low-probability regions in the latent space by leveraging a novel Prior-Weighted Loss (PWL) that exploits the insight that outliers are likely to have a higher reconstruction error than inliers. Building on this insight, OP-DMA explicitly encourages outliers to be mapped to low-probability regions of its latent space by weighting the reconstruction error of each point by a multivariate Gaussian probability density function evaluated at that point's latent representation. We formally prove that OP-DMA succeeds in mapping outliers to low-probability regions. Our experimental study demonstrates that OP-DMA consistently outperforms state-of-the-art methods on a rich variety of outlier detection benchmark datasets.
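To make the weighting idea in the abstract concrete, the following is a minimal sketch of a prior-weighted loss for a single point: the reconstruction error is multiplied by a standard multivariate Gaussian density evaluated at the point's latent code. The function names (`gaussian_pdf`, `prior_weighted_loss`) and the use of N(0, I) as the prior are illustrative assumptions; the actual OP-DMA loss may differ in its details.

```python
import numpy as np

def gaussian_pdf(z, dim):
    # Standard multivariate normal N(0, I) density evaluated at latent code z.
    return np.exp(-0.5 * np.sum(z ** 2)) / ((2 * np.pi) ** (dim / 2))

def prior_weighted_loss(x, x_hat, z):
    # Reconstruction error weighted by the prior density at the latent code z.
    # A point with high reconstruction error (a likely outlier) incurs a large
    # penalty unless it is mapped to a region where the prior density is low,
    # so minimizing this loss pushes outliers toward low-probability regions.
    recon_err = np.sum((x - x_hat) ** 2)
    return gaussian_pdf(z, z.shape[0]) * recon_err
```

Under this weighting, the same reconstruction error costs more near the mode of the prior than in its tails, which is what drives outliers outward.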

1. INTRODUCTION

Background. Outlier detection, the task of discovering abnormal instances in a dataset, is critical for applications ranging from fraud detection and measurement error identification to system fault detection (Singh & Upadhyaya, 2012). Since outliers are by definition rare, it is often infeasible to obtain enough labeled outlier examples that are representative of all the forms outliers could take. Consequently, unsupervised outlier detection methods that do not require prior labeling of inliers or outliers are frequently adopted (Chandola et al., 2009).

State-of-the-Art Deep Learning Methods for Outlier Detection. Deep learning methods for outlier detection commonly utilize the reconstruction error of an autoencoder model as the outlier score (Sakurada & Yairi, 2014; Vu et al., 2019). However, directly using the reconstruction error as the outlier score has a major flaw. As the learning process converges, the reconstruction errors of both outliers and inliers tend toward the average reconstruction error (the same outlier score), making them indistinguishable (Beggel et al., 2019). This is demonstrated in Figure 1a, which shows that the average reconstruction error of outliers converges to that of the inliers. To overcome this shortcoming, recent work (Beggel et al., 2019; Perera et al., 2019) utilizes the distribution-mapping capabilities of generative models that encourage data to follow a prior distribution in the latent space. These cutting-edge methods assume that while inlier points will be mapped to follow the target prior distribution, outliers will not due to their anomalous nature. Instead, outliers will be mapped to low-probability regions of the prior distribution, making it easy to detect them as outliers (Beggel et al., 2019; Perera et al., 2019). However, this widely held assumption has been shown to not hold in practice (Perera et al., 2019).
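The conventional reconstruction-error scoring described above can be sketched with a linear "autoencoder" (projection onto the top principal directions), standing in for the learned nonlinear encoder/decoder of a deep model. The synthetic data and subspace setup here are illustrative assumptions, not any benchmark from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: inliers lie near a 2-D subspace of R^5; outliers do not.
T = rng.normal(size=(200, 2))                      # latent inlier factors
W = rng.normal(size=(2, 5))                        # subspace basis
X_in = T @ W + 0.1 * rng.normal(size=(200, 5))     # inliers + small noise
X_out = rng.normal(0.0, 3.0, size=(5, 5))          # off-subspace outliers
X = np.vstack([X_in, X_out])

# Linear "autoencoder": encode with the top-k principal directions,
# decode by projecting back; deep methods learn nonlinear maps instead.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2
Z = Xc @ Vt[:k].T                                  # encoder
X_hat = Z @ Vt[:k]                                 # decoder
score = np.sum((Xc - X_hat) ** 2, axis=1)          # reconstruction-error score

# Points far from the inlier subspace reconstruct poorly, so the last
# five rows (the outliers) receive markedly higher scores.
```

Early in training such a gap exists, but, as Figure 1a illustrates for deep models, continued training shrinks it: the model eventually reconstructs outliers nearly as well as inliers, which is the flaw motivating OP-DMA.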
Unfortunately, as shown in Figure 1b, both inliers and outliers are still mapped to the same high-probability regions of the target prior distribution, making them difficult to distinguish.

Problem Definition. Given a dataset X ⊂ R^M of multivariate observations, let f : R^M → R^N, N ≤ M, be a function from the multivariate feature space of X to a latent space f(x) ∈ R^N such that f(X) ∼ P_Z, where P_Z is a known and tractable prior probability density function. The dataset X is composed as X = X_O ∪ X_I, where X_O and X_I are the sets of outlier and inlier points, respectively. During training, it is unknown whether any given point x ∈ X is an outlier

