FEW-SHOT ANOMALY DETECTION ON INDUSTRIAL IMAGES THROUGH CONTRASTIVE FINE-TUNING

Anonymous

Abstract

Detecting abnormal products through imagery data is essential to quality control in manufacturing. Existing approaches to anomaly detection (AD) often rely on a substantial amount of anomaly-free samples to train representation and density models. Nevertheless, large anomaly-free datasets may not always be available before the inference stage, which requires building an anomaly detection framework with only a handful of normal samples, a.k.a. few-shot anomaly detection (FSAD). We propose two techniques to address the challenges in FSAD. First, we employ a model pretrained on a large source dataset to initialize model weights. To ameliorate the covariate shift between source and target domains, we adopt contrastive training on the few-shot target domain data. Second, to encourage learning representations suitable for downstream AD, we further incorporate cross-instance pairs to increase tightness within the normal-sample cluster and better separation between normal and synthesized negative samples. Extensive evaluations on six few-shot anomaly detection benchmarks demonstrate the effectiveness of the proposed method.

1. INTRODUCTION

Industrial defect detection is an important real-world use case for visual anomaly detection methods. In this setting, anomaly detection models typically have to be trained with only defect-free, or normal, images, as defects rarely occur on functioning production lines. Anomaly detection methods for this one-class classification setting typically assume that normal images are available in abundance, even though this may not always be the case. For example, in applications such as semiconductor manufacturing, where image acquisition requires 3D scans using specialized equipment (Pahwa et al., 2021), acquiring defect-free images is time-consuming and costly. Flexible manufacturing systems also require rapid adaptation to changes in the type and quantity of products to be manufactured (Shivanand, 2006). As a result, large numbers of defect-free images may not be available for new products, or in the initial stages of bootstrapping a visual inspection system. Although anomaly detection in general is a well-studied topic (Chandola et al., 2009; Pang et al., 2021b), anomaly detection on images with only a few normal and no abnormal images, or few-shot anomaly detection (FSAD), has only recently begun to receive attention from the community (Sheynin et al., 2021; Huang et al., 2022). In their pioneering work, Sheynin et al. (2021) developed a generative adversarial model to distinguish transformed image patches from generated ones. However, such adversarial models may be tricky to tune (Kodali et al., 2017), and the method requires multiple transformations on test samples at inference time, resulting in additional computational overhead. The more recent work of Huang et al. (2022) learns a common model over multiple classes of normal images using a feature registration proxy task, but their method requires a training set with normal images from multiple known classes, which is a more restrictive setting.
In this work, we develop a simple yet effective method for few-shot anomaly detection. We achieve this by synergistically combining transfer learning from a pretrained model with representation learning on the few-shot normal data. Finetuning from a backbone network pretrained on a large source-domain dataset, e.g. ImageNet (Russakovsky et al., 2015), allows reusing good low-level feature extractors and better initialization of network parameters (Kornblith et al., 2019). We believe finetuning from pretrained weights is particularly valuable for few-shot anomaly detection, where there is not enough training data to learn good representations from scratch. However, as pointed out by existing work (Xu et al., 2022; Li et al., 2021b), directly reusing the pretrained weights may not fully unleash the power of finetuning, probably for two reasons. First, when the source-domain data has a different distribution from the target domain, the covariate shift (Wang & Deng, 2018) causes performance degradation. Second, anomaly detection requires feature representations that separate normal samples from abnormal ones, and the representations learned from ImageNet pretraining tasks, mostly semantic image classification, are not necessarily optimal for anomaly detection. To ameliorate the covariate shift between source- and target-domain data, we first propose contrastive training to adapt the pretrained model weights to the target data distribution for downstream anomaly detection. Given the initial model weights, we optimize a contrastive loss defined on all available few-shot normal examples so that the pretrained low-level features are adjusted towards the target data distribution. We further encourage the learnt feature representations to suit the downstream anomaly detection task by encouraging normal samples to form a cluster in feature space.
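As an illustration, the contrastive adaptation step might take the form of a SimCLR-style NT-Xent loss over embeddings of two augmented views of the few-shot normal images. The following NumPy sketch is one plausible instantiation under that assumption; the exact loss used in the paper may differ:

```python
import numpy as np

def nt_xent_loss(z1, z2, tau=0.5):
    """NT-Xent contrastive loss over two augmented views.

    z1, z2: (N, D) embeddings of two augmentations of the same N images.
    Positive pair: (z1[i], z2[i]); every other embedding acts as a negative.
    """
    z = np.concatenate([z1, z2], axis=0)               # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit norm -> cosine sims
    sim = z @ z.T / tau                                # (2N, 2N) similarity logits
    np.fill_diagonal(sim, -np.inf)                     # exclude self-similarity
    n = len(z1)
    # index of each sample's positive partner in the concatenated batch
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    logprob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -logprob[np.arange(2 * n), pos].mean()
```

During finetuning, a loss of this form would be minimized on embeddings produced by the pretrained backbone, gradually adapting its features to the target distribution.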
To achieve this clustering, we introduce a cross-instance positive pair loss that randomly samples two normal samples and encourages their feature embeddings to be close. Note that this differs from standard contrastive training, as closeness is encouraged across two different normal samples instead of between a sample and its augmented version. Finally, when prior knowledge of the anomalies is available, e.g. when we are able to synthesize negative examples (Li et al., 2021a), we further introduce an additional negative pair loss to encourage better separation between normal and synthesized anomalous examples. We empirically reveal that the choice of negative sample synthesis is crucial to the success of FSAD and should be exercised only when concrete prior knowledge of the anomalies is available. We summarize the contributions of this work as follows:
• We approach anomaly detection for industrial defect inspection from a transfer learning perspective. We propose contrastive training on few-shot normal samples in the target domain to alleviate the distribution shift between source and target domains.
• We further introduce a cross-instance positive pair loss to encourage normal samples to form a tight cluster in the embedding space for better density-based anomaly detection.
• When prior knowledge of negative samples is available, a negative pair loss is further incorporated to allow better separation between normal and synthesized negative samples.
• We demonstrate superior performance on 4 real-world industrial defect identification datasets and 2 synthetic corruption identification datasets.
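The cross-instance positive pair loss and the optional negative pair loss might be sketched as follows. This is an illustrative reading only: the cosine-similarity formulation and the `margin` parameter are our assumptions, not taken from the paper.

```python
import numpy as np

def _cos(a, b):
    """Row-wise cosine similarity between two (N, D) arrays."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return (a * b).sum(axis=1)

def cross_instance_losses(z_norm, z_syn_neg=None, margin=0.0):
    """Illustrative cross-instance pair losses.

    Positive-pair term: pull embeddings of two *different* normal samples
    together (randomly paired by permutation). Negative-pair term (optional):
    push normal embeddings away from synthesized-anomaly embeddings so their
    cosine similarity stays below `margin`.
    """
    idx = np.random.permutation(len(z_norm))           # random normal-normal pairs
    pos_loss = (1.0 - _cos(z_norm, z_norm[idx])).mean()
    neg_loss = 0.0
    if z_syn_neg is not None:
        neg_loss = np.maximum(0.0, _cos(z_norm, z_syn_neg) - margin).mean()
    return pos_loss, neg_loss
```

In practice, terms of this kind would be combined with the standard contrastive loss and minimized jointly; the synthesized negatives could come from, e.g., CutPaste-style augmentations (Li et al., 2021a).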

2. RELATED WORK

Anomaly Detection: Traditional anomaly detection (AD) methods include PCA, cluster analysis (Kim & Scott, 2012), and one-class classification (Schölkopf et al., 2001). With the advent of deep learning, representation learning is employed to avoid manual feature engineering and kernel construction. This has led to novel anomaly detection methods based on generative adversarial networks (GANs) (Perera et al., 2019; Schlegl et al., 2017) and autoencoders (Bergmann et al., 2019a). Among them, AnoGAN (Schlegl et al., 2017) learns the manifold of normal samples; because the generator is trained solely on normal samples, anomalous samples cannot be perfectly projected onto the normal manifold. However, it requires expensive optimization to detect abnormal samples, and training GANs is prone to well-known challenges including instability and mode collapse. Among the autoencoder-based approaches, Bergmann et al. (2019a) adopted the SSIM metric as the similarity measure between input and reconstructed images. Recently, an effective line of work approaches AD through representation learning and formulates AD as detecting outliers in the learned representation space (Ruff et al., 2018; Golan & El-Yaniv, 2018; Sohn et al., 2021). Among these works, Deep SVDD (Ruff et al., 2018) learns a feature embedding that groups normal samples close to a cluster center. Follow-up works develop self-supervised pretraining methods that learn representations suitable for separating abnormal samples from normal ones by optimizing a proxy task (Golan & El-Yaniv, 2018; Sohn et al., 2021; Li et al., 2021a). Anomaly detection is then implemented by fitting a density model on the learnt representations of the normal training samples. These approaches prevail on many anomaly detection benchmarks and are computationally efficient.
Nevertheless, representation learning requires a substantial amount of training data, which may not be readily available in certain industrial environments.
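The density-model scoring step mentioned above can be sketched with one common choice, a Gaussian density fitted to the normal-sample features, with the Mahalanobis distance as the anomaly score; the specific density model varies across the cited works.

```python
import numpy as np

def fit_gaussian(feats):
    """Fit a Gaussian density to (N, D) features of normal training samples."""
    mean = feats.mean(axis=0)
    # small ridge keeps the covariance invertible for few-shot N
    cov = np.cov(feats, rowvar=False) + 1e-6 * np.eye(feats.shape[1])
    return mean, np.linalg.inv(cov)

def anomaly_score(x, mean, cov_inv):
    """Mahalanobis distance of a test feature vector to the normal cluster."""
    d = x - mean
    return float(np.sqrt(d @ cov_inv @ d))
```

A test image is then flagged as anomalous when its score exceeds a threshold chosen on the normal data.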

