DEEP BATCH ACTIVE ANOMALY DETECTION WITH DIVERSE QUERIES

Abstract

Selecting informative data points for expert feedback can significantly improve the performance of anomaly detection in various contexts, such as medical diagnostics or fraud detection. In this paper, we determine a set of conditions under which the ranking of anomaly scores generalizes from labeled queries to unlabeled data. Inspired by these conditions, we propose a new querying strategy for batch active anomaly detection that leads to systematic improvements over current approaches. It selects a diverse set of data points for labeling, achieving high data coverage with a limited budget. These labeled data points provide weak supervision to the unsupervised anomaly detection problem. However, correctly identifying anomalies in the contaminated training data requires an estimate of the contamination ratio. We show how this anomaly rate can be estimated from the query set by importance-weighting, removing the associated bias due to the non-uniform sampling procedure. Extensive experiments on image, tabular, and video data sets show that our approach results in state-of-the-art active anomaly detection performance.

1. INTRODUCTION

Detecting anomalies in data is a fundamental task in machine learning with applications in various domains, from industrial fault detection to medical diagnosis. The main idea is to train a model (such as a neural network) on a data set of "normal" samples to minimize the loss of an auxiliary (e.g., self-supervised) task. Using the loss function to score test data, one hopes to obtain low scores for normal data and high scores for anomalies (Ruff et al., 2021). Oftentimes, the training data is contaminated with unlabeled anomalies, and many approaches either hope that training will be dominated by the normal samples (inlier priority; Wang et al., 2019) or try to detect and exploit anomalies in the training data (e.g., Qiu et al., 2022a).

In some setups, expert feedback is available to check whether individual samples are normal or should be considered anomalies. These labels are usually expensive to obtain but are very valuable for guiding an anomaly detector during training. For example, in a medical setting, one may ask a medical doctor to confirm whether a given image shows normal or abnormal cellular tissue. Other application areas include detecting network intrusions or machine failures. As expert feedback is typically expensive, it is essential to find effective strategies for querying informative data points.

Previous work on active anomaly detection primarily involves domain-specific applications and/or ad hoc architectures, making it hard to disentangle modeling choices from querying strategies (Trittenbach et al., 2021). This paper aims to disentangle the different factors that affect detection accuracy. We theoretically analyze generalization performance under various querying strategies and find that diversified sampling systematically improves over existing popular querying strategies, such as querying data based on their predicted anomaly score or around the decision boundaries.
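To make the diversified sampling idea concrete, the following is a minimal sketch of one standard way to select a diverse query batch: greedy farthest-point (k-center) selection in an embedding space. This is an illustrative assumption, not necessarily the exact procedure used in the paper; the function name and the use of Euclidean distance are our own choices.

```python
import numpy as np

def diverse_query_batch(embeddings: np.ndarray, budget: int, seed: int = 0) -> list:
    """Greedy farthest-point (k-center) selection of a diverse query batch.

    Picks a random first point, then repeatedly queries the point farthest
    from all points selected so far, so that the labeled batch covers the
    data well even under a small labeling budget.
    """
    rng = np.random.default_rng(seed)
    n = embeddings.shape[0]
    selected = [int(rng.integers(n))]
    # Distance of every point to its nearest already-selected point.
    min_dist = np.linalg.norm(embeddings - embeddings[selected[0]], axis=1)
    for _ in range(budget - 1):
        nxt = int(np.argmax(min_dist))  # point worst covered so far
        selected.append(nxt)
        d = np.linalg.norm(embeddings - embeddings[nxt], axis=1)
        min_dist = np.minimum(min_dist, d)
    return selected
```

In contrast to top-anomaly-score querying, this selection depends only on data coverage, which is the property the generalization conditions in Section 1 are about.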
Based on these findings, we propose active latent outlier exposure (ALOE): a state-of-the-art active learning strategy compatible with many unsupervised and self-supervised losses for anomaly detection (Ruff et al., 2021; Qiu et al., 2022a). ALOE draws information from both queried and unqueried parts of the data based on two equally-weighted losses. Its sole hyperparameter, the assumed anomaly rate, can be efficiently estimated with an importance sampling estimate. We show on a multitude of data sets (images, tabular data, and video) that ALOE leads to a new state of the art. In summary, our main contributions are as follows:

1. We prove that the ranking of anomaly scores generalizes from labeled queries to unlabeled data under certain conditions that characterize how well the queries cover the data. Based on this theory, we propose a diverse querying strategy for batch active anomaly detection.

2. We propose ALOE, a new active learning framework compatible with a large number of deep anomaly detection losses. It trains on both the labeled queries and the unlabeled data, and we characterize in Thm. 1 how the performance on the queried samples generalizes to the unlabeled training data. We also show how all major hyperparameters in ALOE can be eliminated, making the approach easy to use. To this end, we provide an importance-sampling-based estimate of the rate of anomalies in the data.

3. We provide an extensive benchmark for deep active anomaly detection. Our experiments on image, tabular, and video data provide evidence that ALOE with diverse querying significantly outperforms existing methods. Comprehensive ablations disentangle the benefits of each component.

Our paper is structured as follows. Section 2 discusses related work in deep active anomaly detection. Section 3 introduces our main algorithm. Section 4 discusses experimental results on image, video, and tabular data. Finally, we conclude this work in Section 5.
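The importance-sampling-based anomaly-rate estimate mentioned in contribution 2 can be sketched as follows. Since a diversity-seeking query strategy samples points non-uniformly, the raw fraction of anomalies among the queries is a biased estimate of the contamination ratio; re-weighting each label by the inverse of its selection probability (a Horvitz-Thompson-style estimator) removes this bias. This form and the parameter names are our illustrative assumptions, not necessarily the paper's exact estimator.

```python
import numpy as np

def estimate_anomaly_rate(labels: np.ndarray, inclusion_probs: np.ndarray,
                          n_total: int) -> float:
    """Importance-weighted estimate of the anomaly rate from a query set.

    labels:          binary expert labels for the queried points (1 = anomaly).
    inclusion_probs: probability with which each queried point was selected.
    n_total:         size of the full (labeled + unlabeled) training pool.

    Dividing each label by its inclusion probability corrects for the
    non-uniform sampling, so the estimate is unbiased for the pool-wide rate.
    """
    return float(np.sum(labels / inclusion_probs) / n_total)
```

As a sanity check, under uniform sampling of k points from a pool of n (inclusion probability k/n for every point), the estimator reduces to the plain fraction of anomalies among the queries.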
2. RELATED WORK

-Yaniv, 2018; Hendrycks et al., 2019; Bergman and Hoshen, 2020; Qiu et al., 2021; Shenkar and Wolf, 2022; Qiu et al., 2022b). Our work resides in the self-supervised anomaly detection category and can be extended to other data modalities if an appropriate loss is provided.

While all these methods assume that the training data consists of only normal samples, in many practical applications the training pool may be contaminated with unidentified anomalies (Vilhjálmsson and Nordborg, 2013; Steinhardt et al., 2017). This can be problematic because detection accuracy typically deteriorates as the contamination ratio increases (Wang et al., 2019). Addressing this, refinement (Zhou and Paffenroth, 2017; Yoon et al., 2021) attempts to cleanse the training pool by removing the anomalous samples therein, although they may provide valuable training signals. As a remedy, Qiu et al. (2022a) propose to jointly infer binary labels for each datum (normal vs. anomalous) while updating the model parameters of a deep anomaly detector based on outlier exposure. Our work also makes the contaminated-data assumption and employs the training signal of abnormal data.

Active Anomaly Detection. Active learning for anomaly detection was pioneered by Pelleg and Moore (2004). Most research in active anomaly detection is on shallow detectors. Many works query samples that are close to the decision boundary of a one-class SVM (Görnitz et al., 2013; Yin et al., 2018) or a density model (Ghasemi et al., 2011). Siddiqui et al. (2018) and Das et al. (2016) propose to query the most anomalous instance, while Das et al. (2019) employ a tree-based ensemble to query both anomalous and diverse samples. A recent survey compares the aforementioned query strategies applied to one-class classifiers (Trittenbach et al., 2021).

Recently, deep active anomaly detection has received a lot of attention. Pimentel et al. (2020) query samples with the top anomaly scores for autoencoder-based methods, while Ning et al. (2022) improve the querying by taking diversity into consideration. Tang et al. (2020) use an ensemble of deep anomaly detectors and query the most likely anomalies for each detector separately. Russo et al. (2020) query samples where the model is uncertain about its predictions. Pang et al. (2021) and Zha et al. (2020) propose querying strategies based on reinforcement learning, which requires additional labeled data sets. In contrast, our work does not require a labeled data set to start active learning. Our work is also not comparable to Pelleg and Moore (2004) and Ghasemi et al. (2011), who fit a density model on the raw input data, which is known to be problematic for high-dimensional data (Nalisnick et al., 2018). Other querying strategies from the papers discussed above are fairly general and can be applied in combination with various backbone models. In this paper we study these methods in the

