RED PANDA: DISAMBIGUATING IMAGE ANOMALY DETECTION BY REMOVING NUISANCE FACTORS

Abstract

Anomaly detection methods strive to discover patterns that differ from the norm in a meaningful way. This goal is ambiguous, as different human operators may find different attributes meaningful. An image differing from the norm in an attribute such as pose may be considered anomalous by some operators, while others may consider the attribute irrelevant. Breaking from previous research, we present a new anomaly detection method that allows operators to exclude an attribute when detecting anomalies. Our approach aims to learn representations that do not contain information about such nuisance attributes. Anomaly scoring is performed using a density-based approach. Importantly, our approach does not require specifying the attributes in which anomalies could appear, which is typically impossible in anomaly detection, but only the attributes to ignore. An empirical investigation verifies the effectiveness of our approach.

1. INTRODUCTION

Anomaly detection, discovering unusual patterns in data, is a key capability for many machine learning and computer vision applications. In the typical setting, the learner is provided with training data consisting only of normal samples, and is then tasked with classifying new samples as normal or anomalous. It has emerged that the representations used to describe data are key for anomaly detection in images and videos (Reiss et al., 2021). Advances in deep representation learning (Huh et al., 2016) have been used to significantly boost anomaly detection performance on standard benchmarks. However, these methods have not specifically addressed biases in the data they use. Anomaly detection methods that suffer from such biases may produce more errors overall, and may incorrectly classify some types of samples as anomalies more often than others. A major source of such biases is the presence of additional, nuisance factors (Lee & Wang, 2020). One of the most important and unsolved challenges of anomaly detection is resolving the ambiguity between relevant and nuisance attributes.

As a motivating example, let us consider the application of detecting unusual vehicles using road cameras. Normal samples consist of images of known vehicle types. When aiming to detect anomalies, we may encounter two kinds of difficulties: (i) The distribution of unknown vehicles (anomalies) is not known at training time. E.g., unexpected traffic may come in many forms: a horse cart, heavy construction equipment, or even wild animals. This is the standard problem addressed by most anomaly detection methods (Ruff et al., 2018; Reiss et al., 2021; Tack et al., 2020). (ii) The normal data may be biased. For example, assume all agricultural machinery appearing during the collection of normal data was moving towards the farmlands. During inference in another season, we may see the same equipment moving in the other direction (and viewed from a different angle).
This novel view might be incorrectly perceived as an anomaly. Unlike previous works, we aim to disambiguate between true anomalies (e.g., unseen vehicle types) and unusual variations of nuisance attributes in normal data (e.g., a known vehicle previously observed only from another direction). Flagging normal samples with unusual nuisance-attribute variations as anomalies may be a source of false alarms. In addition, such errors may introduce an undesirable imbalance in the detected anomalies, or even discriminate against certain groups. There are many settings where some attribute combinations are missing from the training dataset but are nevertheless normal: assembly-line training images may be biased in terms of lighting conditions or camera angles, although these may be irrelevant to the anomaly score; photos of people may be biased in terms of ethnicity, for example when collected in specific geographical areas. Moreover, in some cases, normal attribute combinations may be absent simply due to the rarity of some attributes (e.g., rare car colors for specific car models).

The task of learning to ignore nuisance attributes requires a general approach. While simple heuristics may sometimes suffice, they suffer from inherent weaknesses: (i) lack of generalization to new image types and nuisance attributes; (ii) targeting a specific type of anomaly, which means they will fail to generalize to new, unexpected anomalies. While nuisance attribute removal is easy when the representation is already disentangled into nuisance and relevant components (e.g., in some tabular data settings), most image representations are highly entangled. Our technical approach removes nuisance attributes by learning representations that are independent of them. It takes as input a training set of normal samples with a labeled nuisance attribute.
We utilize a domain-supervised disentanglement approach (Kahana & Hoshen, 2022) to remove the information associated with the provided nuisance attribute, while preserving as much uncorrelated information about the image as possible. Specifically, we train an encoder with an additional per-domain contrastive loss term to learn a representation that is independent of the labeled nuisance attribute. For example, an encoder guided to be invariant to the viewing angle would be trained to contrast images of cars driving to the left against similar images, but not against images of cars driving to the right. Additionally, a conditional generator is trained on the representations with a reconstruction term, ensuring the representations remain informative. We stress that the reconstruction loss is used only to encourage the informativeness of our encoder; we do not use reconstruction errors to score anomalies. The combination of the two loss terms yields informative representations that are less sensitive to the nuisance attributes. Although the obtained representation is far from completely invariant to the nuisance attributes, it provides significant gains on several benchmarks. The representations are then combined with a standard density estimation method (k nearest neighbors) for anomaly scoring.

Our setting differs from previous ones, as it relies only on nuisance attribute labels. Few anomaly detection algorithms consider the case where the training set contains attribute labels, and therefore most methods do not aim to ignore nuisance attributes. Out-of-distribution detection assumes that normal data are labeled with the value of the relevant attribute and that anomalies belong to a novel class, outside the set of labeled values (Salehi et al., 2021; Hendrycks et al., 2020; Hendrycks & Gimpel, 2016).
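As a rough illustration of the per-domain contrastive idea described above (each anchor is contrasted only against samples sharing its nuisance-attribute label, so the encoder gains nothing by encoding that attribute), here is a simplified NumPy sketch of a generic InfoNCE-style objective. This is not the authors' implementation; all function and variable names are our own.

```python
import numpy as np

def per_domain_contrastive_loss(z, z_aug, nuisance_labels, temperature=0.1):
    """InfoNCE-style loss where the denominator for each anchor contains
    only samples from the anchor's own nuisance domain.

    z, z_aug        : (N, D) embeddings of two views of the same images
    nuisance_labels : (N,) nuisance-attribute label per image
    """
    # L2-normalize so dot products are cosine similarities
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    z_aug = z_aug / np.linalg.norm(z_aug, axis=1, keepdims=True)
    sim = (z @ z_aug.T) / temperature  # (N, N) similarity matrix

    losses = []
    for i in range(len(z)):
        # candidates: only samples sharing anchor i's nuisance label
        same = np.flatnonzero(nuisance_labels == nuisance_labels[i])
        # positive is the anchor's own augmented view, sim[i, i]
        log_denom = np.log(np.exp(sim[i, same]).sum())
        losses.append(log_denom - sim[i, i])
    return float(np.mean(losses))
```

Because the positive pair is always included in the denominator, the loss is non-negative; restricting negatives to same-domain samples means the encoder is never rewarded for separating, say, left-facing from right-facing cars.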
The weakly supervised setting assumes future anomalies will be similar to a few labeled anomalous samples available during training (Cozzolino et al., 2018; Gianchandani et al., 2019; Deecke et al., 2021). However, this type of knowledge is often limiting due to the inherent unpredictability of anomalies. In contrast, we only require knowledge of the factors that are not indicative of the anomalies we wish to find, while assuming no specific knowledge of the expected anomalies. In fact, labels for attributes we wish to ignore are often provided with the datasets, such as information about the sensor used to collect the data. In other cases, such labels are easily predicted using pre-trained classifiers such as CLIP (Radford et al., 2021).

As this task is novel, we present new benchmarks and new metrics for evaluation. Our benchmarks incorporate normal examples that exhibit unusual variation in a nuisance attribute. Our evaluation metrics measure both the overall anomaly detection accuracy and the false alarm rate due to mistaking normal samples with nuisance variation for anomalies. Our experiments indicate that using our approach to remove the dependence on a nuisance attribute from the representation improves these metrics on our evaluation datasets. While our method currently handles only quite simple cases, this study indicates a way forward for tackling more realistic ones.

Contributions: (i) Introducing the novel setting of Negative Attribute Guided Anomaly Detection (NAGAD). (ii) Presenting new evaluation benchmarks and metrics for the NAGAD setting. (iii) Proposing a new approach, REpresentation Disentanglement for Pre-trained Anomaly Detection Adaptation (Red PANDA), which uses domain-supervised disentanglement to address this setting. (iv) Demonstrating the potential of our approach through empirical evaluation.
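To make the scoring and evaluation stages concrete, the following minimal NumPy sketch shows kNN density scoring over representations, together with the two kinds of quantities the metrics above capture: overall detection accuracy (a rank-based AUROC estimate) and the false alarm rate on nuisance-varied normal samples. All function names are illustrative, not the benchmark code.

```python
import numpy as np

def knn_anomaly_scores(train_feats, test_feats, k=2):
    """Mean Euclidean distance to the k nearest normal training
    representations; higher scores indicate more anomalous samples."""
    # pairwise distances via broadcasting: (n_test, n_train)
    dists = np.linalg.norm(test_feats[:, None, :] - train_feats[None, :, :], axis=2)
    return np.sort(dists, axis=1)[:, :k].mean(axis=1)

def auroc(scores_normal, scores_anomalous):
    """Probability that a random anomaly outscores a random normal
    sample (rank-based AUROC estimate, ties counted as half)."""
    s_n = np.asarray(scores_normal)[:, None]
    s_a = np.asarray(scores_anomalous)[None, :]
    return float((s_a > s_n).mean() + 0.5 * (s_a == s_n).mean())

def nuisance_false_alarm_rate(scores_nuisance_normal, threshold):
    """Fraction of normal-but-nuisance-varied samples wrongly flagged
    as anomalous at a given score threshold."""
    return float(np.mean(np.asarray(scores_nuisance_normal) > threshold))
```

If the nuisance information has been removed from the representation, a known vehicle seen from a new angle stays close to its training neighbors and receives a low score, while a truly novel vehicle type remains far away and scores high.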



The presented benchmarks are available on GitHub at: https://github.com/NivC/RedPANDA.

