RESTRICTED GENERATIVE PROJECTION FOR ONE-CLASS CLASSIFICATION AND ANOMALY DETECTION

Anonymous

Abstract

We present a novel framework for one-class classification and anomaly detection. The core idea is to learn a mapping that transforms the unknown distribution of the training (normal) data into a known target distribution, chosen so that the transformed distribution of unknown abnormal data differs from it. Crucially, the target distribution should be sufficiently simple, compact, and informative. Simplicity ensures that we can sample from the distribution easily; compactness ensures that the decision boundary between normal and abnormal data is clear and reliable; informativeness ensures that the transformed data preserve the important information of the original data. We therefore propose to use a truncated Gaussian, a uniform distribution in a hyperball, a uniform distribution on a hypersphere, or a uniform distribution between two concentric hyperspheres as the target distribution. We then minimize the distance between the transformed data distribution and the target distribution while keeping the reconstruction error for the original data sufficiently small. Our model is simple and easy to train, especially compared with methods based on generative models. Comparative studies on several benchmark datasets verify the effectiveness of our method in comparison to baselines.

1. INTRODUCTION

Anomaly detection (AD) aims to distinguish abnormal data from normal data using a model trained only on normal data, without any information about abnormal data (Chandola et al., 2009; Pang et al., 2021; Ruff et al., 2021). AD is useful in numerous real-world problems such as intrusion detection in video surveillance, fraud detection in finance, and fault detection for sensors. Many AD methods have been proposed in the past decades (Schölkopf et al., 1999; 2001; Tax & Duin, 2004; Liu et al., 2008). For instance, Schölkopf et al. (2001) proposed the one-class support vector machine (OC-SVM), which finds, in a high-dimensional kernel feature space, a hyperplane yielding a large distance between the normal training data and the origin. Tax & Duin (2004) presented the support vector data description (SVDD), which obtains a spherically shaped boundary (with minimum volume) around the normal training data to identify abnormal samples. There are also many deep-learning-based AD methods (Erfani et al., 2016; Ruff et al., 2018; Golan & El-Yaniv, 2018; Hendrycks et al., 2018; Abati et al., 2019; Pidhorskyi et al., 2018; Zong et al., 2018; Wang et al., 2019; Liznerski et al., 2020; Qiu et al., 2021; Raghuram et al., 2021; Wang et al., 2021), which may be organized into three categories. The first category is based on compression and reconstruction. These methods usually use an autoencoder (Hinton & Salakhutdinov, 2006; Kingma & Welling, 2013) to learn a low-dimensional representation from which the high-dimensional data can be reconstructed (Vincent et al., 2008; Wang et al., 2021). An autoencoder trained on normal data is expected to have a much higher reconstruction error on unknown abnormal data than on normal data.
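As a toy illustration of the reconstruction-based scoring used by this first category, the sketch below substitutes a fixed linear projection for a trained autoencoder; all variable names and the linear encoder/decoder pair are our own illustrative stand-ins, not part of any cited method:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a trained autoencoder: project onto a low-dimensional
# subspace (encoder) and map back (decoder). In practice both maps are
# deep networks trained to minimize reconstruction error on normal data.
W = rng.standard_normal((10, 2))            # m=10 input dims, d=2 latent dims
encode = lambda X: X @ W                    # R^m -> R^d
decode = lambda Z: Z @ np.linalg.pinv(W)    # R^d -> R^m

def anomaly_score(X):
    """Squared reconstruction error ||g(f(x)) - x||^2 per sample."""
    return np.sum((decode(encode(X)) - X) ** 2, axis=1)

# Normal data lie on the learned subspace; generic points do not.
normal = rng.standard_normal((5, 2)) @ W.T     # exactly in the column span of W
anomaly = rng.standard_normal((5, 10)) * 3.0   # generic high-dimensional points

print(anomaly_score(normal).max())    # near zero
print(anomaly_score(anomaly).mean())  # much larger
```

Thresholding this score then yields the normal/abnormal decision; a real system would pick the threshold from a validation quantile of normal-data scores.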
The second category combines classical one-class classification (Tax & Duin, 2004; Golan & El-Yaniv, 2018) with deep learning (Ruff et al., 2018; 2019; 2020; Perera & Patel, 2019; Bhattacharya et al., 2021; Shenkar & Wolf, 2022; Chen et al., 2022). For instance, Ruff et al. (2018) proposed a method called Deep SVDD. The main idea is to use a deep network to construct a minimum-radius hypersphere that encloses all the training data, while unknown abnormal data are expected to fall outside. The last category is based on generative or adversarial learning (Malhotra et al., 2016; Deecke et al., 2018; Pidhorskyi et al., 2018; Nguyen et al., 2019; Perera et al., 2019; Goyal et al., 2020; Raghuram et al., 2021; Yan et al., 2021). For example, Perera et al. (2019) proposed to use a generative adversarial network (GAN) (Goodfellow et al., 2014) with a constrained latent representation to detect anomalies in image data. Goyal et al. (2020) presented a method called deep robust one-class classification (DROCC), which aims to find a low-dimensional manifold accommodating the normal data via an adversarial optimization approach. Although deep learning AD methods have shown promising performance on various datasets, they still have limitations. For instance, one-class classification methods such as Deep SVDD (Ruff et al., 2018) only ensure that the normal data can be enclosed by a hypersphere; they cannot guarantee that the normal data are distributed evenly within it, which may leave large empty regions inside the hypersphere and hence yield an unreliable decision boundary. Adversarial learning methods such as (Nguyen et al., 2019; Perera et al., 2019; Goyal et al., 2020) may suffer from high computational cost and unstable optimization. In this work, we present a restricted generative projection framework for one-class classification and anomaly detection.
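The hypersphere decision rule of the second category can be sketched as follows. The embeddings below are synthetic stand-ins for the outputs of a trained network, and the 95%-quantile soft-boundary radius is an illustrative assumption of ours rather than the rule used by any specific cited method:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical embeddings of normal training data; in Deep SVDD-style
# methods a network is trained so these cluster tightly around a center c.
train_z = rng.standard_normal((200, 2)) * 0.1 + 1.0

c = train_z.mean(axis=0)                     # hypersphere center
dist = np.linalg.norm(train_z - c, axis=1)
radius = np.quantile(dist, 0.95)             # soft-boundary radius (assumed rule)

def is_anomaly(z):
    """Flag points whose embedding falls outside the learned hypersphere."""
    return np.linalg.norm(z - c) > radius

print(is_anomaly(np.array([1.0, 1.0])))   # close to the center
print(is_anomaly(np.array([5.0, -5.0])))  # far from the center
```

The "empty region" limitation discussed above is visible here: any point mapped inside the radius is accepted, even if no training embedding lies nearby.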
The model of the framework is efficient to train and able to provide reliable decision boundaries for precise anomaly detection. Our main idea is to train a deep neural network that converts the distribution of normal training data to a target distribution that is simple, compact, and informative; such a target provides a reliable decision boundary for identifying abnormal data. There are many choices for the target distribution, such as a truncated Gaussian or a uniform distribution on a hypersphere. Our contributions are three-fold.

• We present a novel framework for one-class classification and anomaly detection. It transforms the data distribution to target distributions that are easily violated by unknown abnormal data.

• We present several simple, compact, and informative target distributions and propose to minimize the distance between the transformed data distribution and these target distributions via the maximum mean discrepancy (MMD).

• We conduct extensive experiments comparing the performance of different target distributions and comparing our method with state-of-the-art competitors. Numerical results on five benchmark datasets verify the effectiveness of our method.
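The target distributions named above all admit simple samplers, which is the "simplicity" property the abstract requires. The NumPy sketch below (function names are ours) draws uniform samples on a hypersphere, in a hyperball, and between two concentric hyperspheres:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_sphere(n, d, r=1.0):
    """Uniform on the (d-1)-sphere of radius r: normalize Gaussian draws."""
    g = rng.standard_normal((n, d))
    return r * g / np.linalg.norm(g, axis=1, keepdims=True)

def sample_ball(n, d, r=1.0):
    """Uniform in the d-ball: sphere direction times radius r * U^(1/d)."""
    u = rng.random((n, 1)) ** (1.0 / d)
    return r * u * sample_sphere(n, d)

def sample_shell(n, d, r_in=0.8, r_out=1.0):
    """Uniform between two concentric hyperspheres (a spherical shell)."""
    # The radius density must be proportional to r^(d-1) on [r_in, r_out],
    # so invert the CDF (r^d - r_in^d) / (r_out^d - r_in^d).
    u = rng.random((n, 1))
    radii = (r_in**d + u * (r_out**d - r_in**d)) ** (1.0 / d)
    return radii * sample_sphere(n, d)
```

A truncated Gaussian can be sampled similarly by rejection (redraw Gaussian samples whose norm exceeds the truncation radius); we omit it for brevity.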

2. METHODOLOGY

Suppose we have a set of $m$-dimensional training data $X = \{x_1, x_2, \ldots, x_n\}$ drawn from an unknown bounded distribution $\mathcal{D}_x$, and any sample drawn from $\mathcal{D}_x$ is normal. We want to train a model on $X$ to determine whether a test point $x_{\text{new}}$ is drawn from $\mathcal{D}_x$ or not. One may consider estimating the density function $p_x$ of $\mathcal{D}_x$ using techniques such as kernel density estimation (Rosenblatt, 1956). If the estimate $\hat{p}_x$ is good enough, one can determine whether $x_{\text{new}}$ is normal according to the value of $\hat{p}_x(x_{\text{new}})$: if $\hat{p}_x(x_{\text{new}})$ is zero or close to zero, $x_{\text{new}}$ is abnormal; otherwise, $x_{\text{new}}$ is normal.¹ However, the dimensionality of the data is often high, which makes it very difficult to obtain a good estimate $\hat{p}_x$. We propose instead to learn a mapping $\mathcal{T}: \mathbb{R}^m \to \mathbb{R}^d$ that transforms the unknown bounded distribution $\mathcal{D}_x$ to a known distribution $\mathcal{D}_z$, while there still exists a mapping $\mathcal{T}': \mathbb{R}^d \to \mathbb{R}^m$ that approximately recovers $\mathcal{D}_x$ from $\mathcal{D}_z$. Let $p_z$ be the density function of $\mathcal{D}_z$. Then we can determine whether $x_{\text{new}}$ is normal according to the value of $p_z(\mathcal{T}(x_{\text{new}}))$. More precisely, we want to solve the following problem:
$$\min_{\mathcal{T},\, \mathcal{T}'} \; M\big(\mathcal{T}(\mathcal{D}_x), \mathcal{D}_z\big) + \lambda\, M\big(\mathcal{T}'(\mathcal{T}(\mathcal{D}_x)), \mathcal{D}_x\big),$$
where $M(\cdot, \cdot)$ denotes some distance metric between two distributions and $\lambda$ is a trade-off parameter for the two terms. Note that if $\lambda = 0$, $\mathcal{T}$ may convert any distribution to $\mathcal{D}_z$ and lose the ability to distinguish normal data from abnormal data. Based on the universal approximation theorems (Pinkus, 1999; Lu et al., 2017) and the substantial success of neural networks, we use deep neural networks (DNNs) to model $\mathcal{T}$ and $\mathcal{T}'$, respectively. Let $f_\theta$ and $g_\phi$ be two DNNs with parameters $\theta$ and $\phi$, respectively. We solve the following problem:
$$\min_{\theta,\, \phi} \; M\big(f_\theta(\mathcal{D}_x), \mathcal{D}_z\big) + \lambda\, M\big(g_\phi(f_\theta(\mathcal{D}_x)), \mathcal{D}_x\big).$$
¹ Here we assume that the distributions of normal and abnormal data do not overlap; otherwise, it is difficult to determine whether a single point is normal or not.
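In our experiments the distribution distance $M$ is the maximum mean discrepancy. A minimal empirical MMD² estimator with a Gaussian kernel, in its biased (V-statistic) form, can be sketched as follows; the bandwidth $\sigma$ is a free parameter chosen here for illustration:

```python
import numpy as np

def mmd2(X, Y, sigma=1.0):
    """Biased empirical MMD^2 between samples X and Y with the Gaussian
    kernel k(a, b) = exp(-||a - b||^2 / (2 sigma^2))."""
    def k(A, B):
        d2 = (np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :]
              - 2.0 * A @ B.T)
        return np.exp(-d2 / (2.0 * sigma**2))
    return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()

rng = np.random.default_rng(0)
# Two samples from the same distribution vs. two from shifted distributions.
same = mmd2(rng.standard_normal((500, 2)), rng.standard_normal((500, 2)))
diff = mmd2(rng.standard_normal((500, 2)), rng.standard_normal((500, 2)) + 3.0)
print(same, diff)  # near zero vs. clearly positive
```

In training, $X$ would be the minibatch embeddings $f_\theta(x_i)$ and $Y$ a fresh sample drawn from the target $\mathcal{D}_z$, with the estimator differentiable with respect to $\theta$ when written in an autodiff framework.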

