LEARN TO KNOW UNKNOWNS: A BIONIC MEMORY NETWORK FOR UNSUPERVISED ANOMALY DETECTION

Abstract

Is generalization always beneficial? Over-strong generalization renders a model insensitive to anomalies. Unsupervised anomaly detection requires only unlabeled non-anomalous data to learn and generalize normal patterns, so that normal instances are reconstructed with small error while anomalies incur large reconstruction error. Over-strong generalization, however, makes the reconstruction errors of normal instances and anomalies indistinguishable: the model reconstructs even unknown anomalies well, so their reconstruction error goes unnoticed. Inspired by the cascade structure of the hippocampus and cortex in human brain memory, we propose a re-representation memory network called Random Forgetting Twin Memory (RFTM), which decomposes the latent space and introduces a configurable reintegration mechanism to suppress overgeneralization. RFTM exhibits striking brain-like memory characteristics that enable the model to know what it does not know. It can be plugged in with a single line of code at the model level, without adding any extra loss terms at the loss-function level. RFTM-based models achieve state-of-the-art experimental results on different public benchmarks.

1. INTRODUCTION

Anomaly detection (AD) refers to the identification of deviant samples based on known rules, expectations, or distributions. It has been extensively studied in fields that require attention to rare events, such as healthcare (Šabić et al., 2021; Arabahmadi et al., 2022), financial fraud (Hilal et al., 2021; Sanober et al., 2021), and defect detection (Fu et al., 2022; Cui et al., 2022). Machine learning-based approaches are increasingly adopted for AD tasks, with more and more impressive results. Supervised and unsupervised methods are the two branches of machine learning-based AD (Omar et al., 2013): the former requires labeled data for model training, while the latter does not. Because the occurrence of anomalies is a small-probability event, the supervised approach is restricted by extremely imbalanced samples, meaning there are not enough labeled anomaly samples for the model to learn from. In contrast, the unsupervised approach does not require anomaly samples. Unsupervised anomaly detection (UAD) only needs to fit unlabeled normal samples to learn the normal patterns, and can then identify an instance as anomalous when it deviates from those patterns. UAD has therefore attracted extensive research interest due to its label-free characteristic. As a classic UAD paradigm, reconstruction-based frameworks have achieved excellent performance in recent years on image (Li et al., 2021; Schneider et al., 2022), video (Deepak et al., 2021; Chang et al., 2022), and time series (Thill et al., 2021; Kieu et al., 2022) AD tasks. At the same time, however, some studies (Gong et al., 2019; Park et al., 2020) show that reconstruction-based AD suffers from an overgeneralization problem (OGP). Generalization is not always a good thing: over-strong generalization in UAD causes the reconstruction model to fail to detect anomalies. The OGP was first observed in Figure 1 of Zong et al. (2018). Gong et al. (2019) first briefly demonstrated and reported this problem in a one-class classification (OC) case. In the following, we give the first formal description of OGP in a broad sense and demonstrate it in a scenario that is not limited to OC. The framework of reconstruction-based AD is shown in Figure 1b. The model is first trained by minimizing the reconstruction error of normal samples in the training phase, and then the trained model measures the anomaly degree
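The train-on-normal, score-by-reconstruction-error principle described above can be sketched as follows. This is a minimal illustration only, not the paper's RFTM model: the linear autoencoder, the synthetic one-dimensional "normal" subspace, and all variable names are our own assumptions for demonstration purposes.

```python
import numpy as np

# Sketch of reconstruction-based UAD: fit an autoencoder to normal data
# only, then use per-sample reconstruction error as the anomaly score.
rng = np.random.default_rng(0)

# Synthetic "normal" training data lying near a 1-D subspace of R^3.
normal = rng.normal(size=(500, 1)) @ np.array([[1.0, 2.0, 3.0]])
normal += 0.01 * rng.normal(size=normal.shape)

# Linear autoencoder (3 -> 1 -> 3) trained by plain gradient descent
# to minimize the mean squared reconstruction error on normal samples.
W_enc = rng.normal(scale=0.1, size=(3, 1))
W_dec = rng.normal(scale=0.1, size=(1, 3))
lr = 0.01
for _ in range(2000):
    z = normal @ W_enc                                 # latent code
    recon = z @ W_dec                                  # reconstruction
    err = recon - normal
    grad_dec = z.T @ err / len(normal)                 # dL/dW_dec
    grad_enc = normal.T @ (err @ W_dec.T) / len(normal)  # dL/dW_enc
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

def anomaly_score(x):
    """Per-sample squared reconstruction error, used as the anomaly degree."""
    recon = (x @ W_enc) @ W_dec
    return np.sum((x - recon) ** 2, axis=1)

normal_test = np.array([[1.0, 2.0, 3.0]])   # on the learned normal pattern
anomaly = np.array([[3.0, -2.0, 1.0]])      # deviates from it
print(anomaly_score(normal_test), anomaly_score(anomaly))
```

The test instance that follows the normal pattern is reconstructed with near-zero error, while the deviating instance is not; thresholding this score yields the detector. The OGP arises precisely when this gap collapses, i.e., when the model also reconstructs anomalies well.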

