LEARN TO KNOW UNKNOWNS: A BIONIC MEMORY NETWORK FOR UNSUPERVISED ANOMALY DETECTION

Abstract

Is generalization always beneficial? Over-strong generalization renders a model insensitive to anomalies. Unsupervised anomaly detection requires only unlabeled non-anomalous data to learn and generalize normal patterns, which yields a modest reconstruction error on normal instances and a significant reconstruction error on anomalies. Over-strong generalization, however, makes the reconstruction errors of normal instances and anomalies indistinguishable: the model reconstructs even unknown anomalies well, so their reconstruction error goes unnoticed. Inspired by the cascade structure of the hippocampus and cortex in human brain memory, we propose a re-representation memory network called Random Forgetting Twin Memory (RFTM), which decomposes the latent space and introduces a configurable reintegration mechanism to suppress overgeneralization. RFTM exhibits striking brain-like memory characteristics, which enable the model to know what it does not know. RFTM can be plugged in at the model level with a single line of code, without adding any extra terms at the loss-function level. RFTM-based models achieve state-of-the-art experimental results on different public benchmarks.

1. INTRODUCTION

Anomaly detection (AD) refers to the identification of deviant samples based on known rules, expectations, or distributions. It has been extensively studied in many fields requiring attention to rare events, such as healthcare (Šabić et al., 2021; Arabahmadi et al., 2022), financial fraud (Hilal et al., 2021; Sanober et al., 2021), and defect detection (Fu et al., 2022; Cui et al., 2022). Machine learning-based approaches are increasingly adopted for AD tasks, with ever more impressive results. Supervised and unsupervised methods are the two branches of machine learning-based AD (Omar et al., 2013). The former requires labeled data for model training, while the latter does not. Because the occurrence of anomalies is a small-probability event, the supervised approach is restricted by extremely imbalanced samples: there are not enough labeled anomaly samples for the model to learn from. In contrast, the unsupervised approach does not require anomaly samples. Unsupervised anomaly detection (UAD) only needs to fit unlabeled normal samples to learn the normal patterns, so it can identify an anomaly when an instance deviates from those patterns. UAD has therefore attracted extensive research interest due to its label-free characteristic. As a classic UAD paradigm, reconstruction-based frameworks have achieved excellent performance in recent years on image (Li et al., 2021; Schneider et al., 2022), video (Deepak et al., 2021; Chang et al., 2022), and time series (Thill et al., 2021; Kieu et al., 2022) AD tasks. At the same time, however, several studies (Gong et al., 2019; Park et al., 2020) show that reconstruction-based AD suffers from an overgeneralization problem (OGP). Generalization is not always a good thing: over-strong generalization in UAD causes the reconstruction model to fail to detect anomalies.
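As an illustration of the reconstruction-based UAD paradigm described above, the following sketch scores samples by per-sample reconstruction error. It uses a closed-form linear autoencoder (a PCA projection standing in for a trained neural reconstruction model); the data, dimensions, and function names are hypothetical and chosen only for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Normal" training data: samples lying on a 2-D subspace of R^10.
basis = rng.normal(size=(2, 10))
normal_train = rng.normal(size=(500, 2)) @ basis

def fit_linear_ae(x, k):
    """Closed-form linear autoencoder: top-k principal directions act as
    the tied encoder/decoder weights (stand-in for a trained network)."""
    mean = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:k]

def reconstruction_error(x, mean, components):
    z = (x - mean) @ components.T           # encode to k-dim bottleneck
    x_hat = z @ components + mean           # decode
    return ((x - x_hat) ** 2).mean(axis=1)  # per-sample MSE anomaly score

mean, comps = fit_linear_ae(normal_train, k=2)

normal_test = rng.normal(size=(100, 2)) @ basis  # on the normal subspace
anomalies = rng.normal(size=(100, 10))           # off-subspace inputs

err_normal = reconstruction_error(normal_test, mean, comps)
err_anom = reconstruction_error(anomalies, mean, comps)
print(err_normal.mean() < err_anom.mean())  # normals reconstruct better
```

Under the implicit assumption that the model cannot reconstruct off-pattern inputs, thresholding this error separates the two populations; the OGP discussed next is precisely the failure of that assumption as model capacity grows.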
Figure 1: OGP demonstrated in the inference stage using reconstruction models with different numbers of bottleneck neurons trained on MNIST. The number of bottleneck neurons increases from 2 to 512, standing for increasing model complexity. Images from MNIST are taken as normal, and inputs from Fashion-MNIST are taken as anomalous. The reconstruction error decreases gently with model complexity for the normal handwritten digit 5, but decreases rapidly for the trousers image from Fashion-MNIST. Once its complexity grows beyond a certain point, the model trained on MNIST reconstructs the Fashion-MNIST data well; the deviation shrinks as complexity increases, which makes the anomalies undetectable. (a) A visual illustration of OGP in reconstruction. (b) The framework of reconstruction-based AD. (c) The collapse of IIA.

The OGP was first observed in the work of Zong et al. (2018). Gong et al. (2019) first briefly demonstrated and reported the problem in a one-class classification (OC) setting. In the following, we give the first formal description of OGP in a broad sense and demonstrate it in a scenario that is not limited to OC. The framework of reconstruction-based AD is shown in Figure 1b. The model is first trained by minimizing the reconstruction error of normal samples; in the inference phase, the trained model then measures the anomaly degree of unseen instances by their reconstruction error. The reconstruction error directly reflects the deviation between an unknown sample and the learned normal patterns: the larger the reconstruction error, the higher the anomaly degree. However, this framework rests on an implicit ideal assumption (IIA) that a model trained only on normal samples cannot reconstruct anomalous samples well. Model generalization strengthens as model complexity increases, so the input data are reconstructed indiscriminately well, as shown in Figure 1a. Inputs containing both normal and anomalous samples then yield reconstruction errors with indistinguishable differences in the inference phase, and the IIA collapses, as shown in Figure 1c. The OGP is particularly prominent and challenging in unsupervised semantic anomaly detection (USAD). The example shown in Figure 1a follows the routine setting, i.e., one dataset is taken as normal and another as anomalous.

Different from the routine setting, Ahmed & Courville (2020) argue that anomalies should be defined at the semantic level, recommending that one class within a dataset be held out as anomalous while the remaining classes serve as normal. The main challenges of USAD are as follows. First, the labels of the normal classes are not accessible, so a classifier cannot be trained with supervised labels to obtain tight bounds that describe the normal patterns. Second, the number of classes making up the normal pattern is unknown, so the OGP is more likely to occur in this multimodal case: more unlabeled classes of data lead to stronger generalization. Third, the anomalies are at the semantic level and invisible, so the OGP can easily occur if the bounds on the normal pattern are not tight enough. Finally, the OGP can in principle be alleviated at its origin by reducing the complexity of the reconstruction model. However, there may exist no model of optimal complexity that both generalizes well to normal samples and fails to overgeneralize to anomalous samples, as IIA requires; and even if such an optimum exists, searching for it over multiple training runs is computationally intolerable. In recent years, a line of work has begun to address these problems. MemAE (Gong et al., 2019) proposed a memory module that stores a number of prototypes of latent information and re-represents the latent space with a sparse attention mechanism. MNAD (Park et al., 2020) introduces a transformation matrix to remap the latent space, using
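The memory-based re-representation behind MemAE-style approaches can be sketched as follows. This is a simplified numpy illustration of the general mechanism (addressing a bank of learned prototypes via softmax attention with hard shrinkage for sparsity), not the authors' RFTM; the shapes, threshold value, and function names are assumptions made for the example:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def memory_rerepresent(z, memory, shrink_thres=0.02):
    """Re-represent latent codes z as sparse combinations of memory items.

    z:      (batch, d) latent codes from the encoder
    memory: (n_items, d) prototype slots (learned, here random for demo)
    """
    attn = softmax(z @ memory.T)              # addressing weights per slot
    # Hard shrinkage: zero-out small weights so an anomalous code cannot
    # be composed from many weakly matching prototypes.
    attn = np.where(attn > shrink_thres, attn, 0.0)
    attn = attn / np.maximum(attn.sum(axis=1, keepdims=True), 1e-12)
    return attn @ memory                      # re-represented latent codes

rng = np.random.default_rng(0)
memory = rng.normal(size=(10, 4))   # 10 prototype slots, 4-dim latent
z = rng.normal(size=(3, 4))
z_hat = memory_rerepresent(z, memory)
print(z_hat.shape)  # (3, 4)
```

Because the decoder only ever sees convex combinations of stored prototypes, inputs far from every prototype are forced toward the normal pattern and incur a large reconstruction error, which is how this family of methods suppresses overgeneralization.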

