TOWARDS LIGHTWEIGHT, MODEL-AGNOSTIC AND DIVERSITY-AWARE ACTIVE ANOMALY DETECTION

Abstract

Active Anomaly Discovery (AAD) is flourishing in the anomaly detection research area; it aims to incorporate analysts' feedback into unsupervised anomaly detectors. However, existing AAD approaches usually prioritize the samples with the highest anomaly scores for user labeling, which hinders the exploration of anomalies that were initially ranked lower. Moreover, most existing AAD approaches are tailored to a specific unsupervised detector and are therefore difficult to extend to other detection models. To tackle these problems, we propose a lightweight, model-agnostic and diversity-aware AAD method, named LMADA. In LMADA, we design a diversity-aware sample selector powered by Determinantal Point Processes (DPPs). It considers the diversity of samples in addition to their anomaly scores for feedback querying. Furthermore, we propose a model-agnostic tuner. It approximates diverse unsupervised detectors with a unified proxy model, on top of which the feedback information is incorporated by a lightweight non-linear representation adjuster. Through extensive experiments on 8 public datasets, LMADA achieves an average F1-score improvement of 74%, outperforming other comparative AAD approaches. Moreover, LMADA delivers significant performance gains regardless of the underlying unsupervised detector.

1. INTRODUCTION

Anomaly detection aims to detect the data samples that exhibit significantly different behaviors compared with the majority. It has been applied in various domains, such as fraud detection (John & Naaz, 2019), cyber intrusion detection (Sadaf & Sultana, 2020), medical diagnosis (Fernando et al., 2021), and incident detection (Wang et al., 2020). Numerous unsupervised anomaly detectors have been proposed (Zhao et al., 2019; Boukerche et al., 2020; Wang et al., 2019). However, practitioners are usually unsatisfied with their detection accuracy (Das et al., 2016), because there is usually a discrepancy between the detected outliers and the actual anomalies of interest to users (Das et al., 2017; Zha et al., 2020; Siddiqui et al., 2018). To mitigate this problem, Active Anomaly Discovery (AAD) (Das et al., 2016) was proposed to incorporate analysts' feedback into unsupervised detectors so that the detection output better matches the actual anomalies. The general workflow of Active Anomaly Discovery is shown in Fig. 1. In the beginning, a base unsupervised anomaly detector is trained. After that, a small number of samples are selected and presented to analysts for feedback querying. The labeled samples are then utilized to update the detector so as to incorporate the feedback information. Based on the updated detection model, a new set of samples is recommended for the next feedback iteration. This process repeats until the labeling budget is exhausted, after which the tuned detection model is ready to be applied. Despite the progress of existing AAD methods (Das et al., 2017; Zha et al., 2020; Siddiqui et al., 2018; Keller et al., 2012; Zhang et al., 2019; Li et al., 2019; Das et al., 2016), some intrinsic limitations of these approaches still pose great barriers to their real-world application.
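The feedback loop in Fig. 1 can be sketched as follows. Everything here is an illustrative stand-in, not a component of any particular AAD method: a toy distance-based detector, the analyst mocked by ground-truth labels, and a crude score correction in place of a real model update.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: 95 inliers around the origin, 5 anomalies in an offset cluster.
X = np.vstack([rng.normal(0, 1, (95, 2)), rng.normal(5, 1, (5, 2))])
y = np.array([0] * 95 + [1] * 5)      # ground truth; stands in for the analyst

def detector_scores(X):
    # Stand-in base detector: distance from the data centroid.
    return np.linalg.norm(X - X.mean(axis=0), axis=1)

scores = detector_scores(X)
labeled = {}                          # sample index -> analyst label
budget, batch = 10, 2

while len(labeled) < budget:
    # 1. Select: top-ranked unlabeled samples (the common AAD strategy).
    candidates = [i for i in np.argsort(-scores) if i not in labeled]
    queried = candidates[:batch]
    # 2. Query the analyst (mocked here by the ground-truth labels).
    for i in queried:
        labeled[i] = y[i]
    # 3. Update: a crude correction that clamps labeled samples' scores;
    #    a real AAD method would retrain or tune the detector instead.
    for i, lab in labeled.items():
        scores[i] = scores.max() + 1.0 if lab == 1 else scores.min() - 1.0

print(len(labeled))                   # the full labeling budget has been spent
```

Step 1 is exactly the top-selection strategy whose limitations Sec. 2.1 discusses.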
Firstly, most AAD methods adopt the top-selection strategy for feedback querying (Das et al., 2017; Zha et al., 2020; Siddiqui et al., 2018; Li et al., 2019), i.e., the samples with the highest anomaly scores are always prioritized for user labeling. However, this strategy hinders the exploration of actual anomalies that the base detector does not initially score highly. As such, these AAD approaches are highly susceptible to over-fitting to the top-ranked samples, resulting in a suboptimal recall with respect to all anomalies. We demonstrate this with a real example in Sec. 2.1. Secondly, most existing AAD approaches (Das et al., 2017; 2016; Siddiqui et al., 2018) are tightly tailored to a certain kind of detection model, making it difficult to extend them to other unsupervised detectors. They need to modify the internal structure of a particular type of unsupervised detector to endow it with the ability to integrate feedback. Re-designing them for each of the wide variety of unsupervised detection models is therefore impractical and ad hoc. Recent AAD methods (Zha et al., 2020; Li et al., 2019) attempt to generalize to arbitrary detectors. However, they scale poorly because their model size grows with the number of samples.


To tackle these problems in AAD, we propose a Lightweight, Model-Agnostic and Diversity-Aware active anomaly detection approach, named LMADA. It consists of two components, i.e., a sample selector (for sample selection) and a model tuner (for feedback incorporation). In the sample selector, we take the anomaly scores as well as the diversity of samples into account, instead of solely selecting the most anomalous ones for feedback querying. Specifically, we fuse anomaly scores and feedback repulsion scores into a diversity-aware sampling technique powered by Determinantal Point Processes (DPPs) (Chen et al., 2018; Kulesza et al., 2012). In the model tuner, we first leverage a neural network as a proxy model to approximate an arbitrary unsupervised detector. After that, we fix the weights of the proxy model and learn a representation adjuster on top of it. The representation adjuster is responsible for transforming the input feature vector to fit the feedback-labeled samples. Finally, each sample to be detected is transformed by the representation adjuster and then fed into the base detector to estimate its anomaly score. In this way, the model tuner hides the details of different unsupervised detectors and achieves lightweight feedback incorporation via only a non-linear representation transformation. We conducted extensive experiments on 8 public AD datasets to evaluate the effectiveness of our proposed method. The experimental results show that LMADA achieves an average F1-score improvement of 74%, outperforming other comparative AAD approaches under the same feedback sample budget. In addition, we also validated that LMADA works well under various unsupervised anomaly detectors.
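The two-step model tuner can be illustrated with a minimal numpy sketch under simplifying assumptions: a toy distance-based detector stands in for the arbitrary base detector, the proxy is a one-hidden-layer network trained by plain gradient descent, the adjuster is a linear residual map (LMADA's is non-linear), and the feedback labels are mocked. None of this is LMADA's actual implementation; it only shows the approximate-freeze-adjust structure.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))

def base_detector(x):
    # Stand-in for an arbitrary unsupervised detector: distance from origin.
    return np.linalg.norm(x, axis=1)

# Step 1: train a small neural proxy f(x) that mimics the detector's scores.
d, h = 4, 16
W1 = rng.normal(0.0, 0.3, (d, h)); b1 = np.zeros(h)
W2 = rng.normal(0.0, 0.3, (h, 1)); b2 = np.zeros(1)
target = base_detector(X)[:, None]
for _ in range(3000):
    H = np.tanh(X @ W1 + b1)
    pred = H @ W2 + b2
    g = 2.0 * (pred - target) / len(X)      # MSE gradient w.r.t. pred
    gH = (g @ W2.T) * (1.0 - H ** 2)
    W2 -= 0.05 * (H.T @ g);  b2 -= 0.05 * g.sum(0)
    W1 -= 0.05 * (X.T @ gH); b1 -= 0.05 * gH.sum(0)

def proxy(z):
    return np.tanh(z @ W1 + b1) @ W2 + b2

# Step 2: freeze the proxy and learn a representation adjuster on top of it.
# Here the adjuster is a linear residual map z = x + x @ A for brevity.
A = np.zeros((d, d))
normal_idx = np.arange(5)                   # mock feedback: "these are normal"
Xn = X[normal_idx]
for _ in range(300):
    H = np.tanh((Xn + Xn @ A) @ W1 + b1)
    gH = (W2.T * (1.0 - H ** 2)) / len(Xn)  # gradient of the mean proxy score
    A -= 0.05 * (Xn.T @ (gH @ W1.T))        # descend it w.r.t. A

# At detection time: adjust each sample, then re-score it with the
# untouched base detector.
adjusted_scores = base_detector(Xn + Xn @ A)
```

The division of labor mirrors the paragraph above: only the proxy needs to be differentiable, so the base detector itself is never modified, and feedback is absorbed entirely by the small adjuster.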

2. RELATED WORK AND MOTIVATION

In this section, we briefly introduce existing AAD work and analyze its limitations from two aspects: (1) sample selection and (2) feedback incorporation.

2.1. SAMPLE SELECTION

Most AAD approaches (Siddiqui et al., 2018; Das et al., 2017; Zha et al., 2020; Li et al., 2019; Das et al., 2016) adopt the top-selection strategy. Anomalous samples that are not initially ranked at the top by the base detector have little chance of being selected for feedback, and therefore can hardly be recalled subsequently. We show a real example using KDD-99 SA¹, a well-known intrusion detection dataset. The dataset contains one normal class (96.7%) and 11 anomalous classes (3.3%) of various intrusion types. We applied the widely used Isolation Forest (Liu et al., 2012) detector to this dataset and found that the recall was around 0.28. We show the anomaly score distributions for the normal samples and the three major intrusion types in Fig. 2. Only the samples of two intrusion types, i.e., "neptune" and "satan", are assigned high anomaly scores (0.60 ∼ 0.70). However, the samples of another major intrusion type, "smurf" (accounting for 71.27% of all anomalous samples), are assigned relatively low anomaly scores (0.50 ∼ 0.55), even below the anomaly scores of many normal samples (4168 normal samples, versus only 15 "smurf" anomalies, were assigned anomaly scores over 0.55). Under this circumstance, selecting only the top samples for feedback can hardly improve the recall for the "smurf" type. In LMADA, we consider both anomaly scores as well as the diversity of samples during the sample selection. In this way, samples



¹ https://archive.ics.uci.edu/ml/machine-learning-databases/kddcup99-mld/kddcup.data.gz



Figure 1: The general workflow of AAD.

