ITERATIVE IMAGE INPAINTING WITH STRUCTURAL SIMILARITY MASK FOR ANOMALY DETECTION

Anonymous

Abstract

Autoencoders have emerged as popular methods for unsupervised anomaly detection. Autoencoders trained on normal data are expected to reconstruct only normal features, allowing anomalies to be detected by thresholding reconstruction errors. In practice, however, autoencoders fail to model small details and yield blurry reconstructions, which makes anomaly detection challenging. Moreover, there is an objective mismatch: models are trained to minimize the total reconstruction error, whereas they are expected to produce a small deviation on normal pixels and a large deviation on anomalous pixels. To tackle these two issues, we propose an iterative image inpainting method that reconstructs only the partial regions specified by an adaptive inpainting mask matrix. The method constructs inpainting masks from an anomaly score based on structural similarity. With the inpainting mask overlaid on the image, each pixel is either bypassed or reconstructed according to its anomaly score, which enhances reconstruction quality. Alternately updating the inpainted images and the masks refines the anomaly score directly and follows the expected objective at test time. We evaluated the proposed method on the MVTec Anomaly Detection dataset. Our method outperformed the previous state of the art in several categories and showed remarkable improvement on high-frequency textures.

1. INTRODUCTION

Anomaly detection (AD) is the task of identifying rare events or items that differ from the majority of the data. It has many real-world applications, such as medical diagnosis (Baur et al., 2018; Zimmerer et al., 2019a), defect detection in factories (Matsubara et al., 2018; Bergmann et al., 2019), early detection of plant disease (Wang et al., 2019), and X-ray security screening in public spaces (Griffin et al., 2018). Because manual inspection by humans is slow, expensive, and error-prone, automated visual inspection is a popular application of artificial intelligence. When transferring knowledge from humans to machines, anomalous samples are scarce because anomalies occur rarely and because it is difficult to annotate and categorize the various defects beforehand. Therefore, AD methods typically take an unsupervised approach: they learn compact features of the data from normal samples and detect anomalies by thresholding an anomaly score that measures the deviation from the learned features. Deep neural networks (Goodfellow et al., 2016) are commonly used to handle high-dimensional images and learn their features. In this work, we focus on reconstruction-based unsupervised AD. This approach trains a model to reconstruct only the normal dataset and classifies data as normal or anomalous by thresholding the reconstruction error (An & Cho, 2015). The architectures are based on deep neural networks such as deep autoencoders (Hinton & Salakhutdinov, 2006), variational autoencoders (VAEs) (Kingma & Welling, 2013; Rezende et al., 2014), or autoencoders combined with generative adversarial networks (GANs) (Goodfellow et al., 2014). These models compress high-dimensional information onto a data manifold in a lower-dimensional latent space by reconstructing the input data under certain constraints on the latent space, such as a prior distribution or an information bottleneck (Alemi et al., 2016).
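The thresholding step described above can be illustrated with a minimal sketch. It assumes a per-pixel squared error as the anomaly score and uses a copy of the normal image as a stand-in for the output of a trained autoencoder. The function names `anomaly_map` and `detect`, the threshold value, and the synthetic defect are all hypothetical choices made for illustration, not details taken from this work.

```python
import numpy as np

def anomaly_map(image, reconstruction):
    """Per-pixel squared reconstruction error (averaged over channels if present)."""
    err = (image.astype(np.float64) - reconstruction.astype(np.float64)) ** 2
    return err.mean(axis=-1) if err.ndim == 3 else err

def detect(image, reconstruction, threshold):
    """Threshold the anomaly map; return (pixel_mask, image_score, is_anomalous)."""
    scores = anomaly_map(image, reconstruction)
    mask = scores > threshold              # pixel-level decision
    image_score = float(scores.max())      # image-level score: worst pixel
    return mask, image_score, image_score > threshold

# Toy data: a "normal" grayscale image and a copy with a synthetic 4x4 defect.
rng = np.random.default_rng(0)
normal = rng.uniform(0.4, 0.6, size=(32, 32))
defective = normal.copy()
defective[10:14, 10:14] = 1.0

# Assumption: an ideal model reproduces the normal content and ignores the defect.
reconstruction = normal.copy()

mask, score, is_anomalous = detect(defective, reconstruction, threshold=0.05)
```

In this idealized setting the mask covers exactly the defect pixels. With a real autoencoder, blurry reconstructions raise the error on normal pixels as well, which is what makes the choice of threshold difficult.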
A known issue of the reconstruction-based AD approach is that autoencoders fail to model small details and yield blurry image reconstructions. This is especially the case for high-frequency textures, such as carpet, leather, and tile (Bergmann et al., 2019). Dehaene et al. (2020) also pointed out that there is no guarantee that autoencoders generalize to out-of-sample inputs, and that local defects added to normal images can deteriorate the reconstruction of whole images. From the viewpoint of the signal-to-noise ratio (SNR),

