DETECTING OUT-OF-DISTRIBUTION DATA WITH SEMI-SUPERVISED FEATURE NETWORKS

Abstract

Anomalous and out-of-distribution (OOD) data present a significant challenge to the robustness of decisions taken by deep neural networks, with myriad real-world consequences. State-of-the-art OOD detection techniques use embeddings learned by large pre-trained transformers. We demonstrate that graph structures and topological properties can be leveraged to detect both far-OOD and near-OOD data reliably, simply by characterising each data point (image) as a network of related features (visual concepts). Furthermore, we facilitate human-in-the-loop machine learning by expressing this data in terms of high-level, domain-specific concepts. We obtained 97.95% AUROC on far-OOD and 98.79% AUROC on near-OOD detection tasks based on the LSUN dataset, comparable to the performance of state-of-the-art techniques.

1. INTRODUCTION

Trustworthy machine learning systems must hand over decisions they are not confident about to human experts. Most machine learning pipelines operate on a closed-world assumption: the test data is assumed to be drawn IID from the same distribution as the training data. The difficulty of OOD detection depends primarily on how semantically close the outliers are to the inliers. Based on this difficulty [Winkens et al. (2020)], the OOD detection task is split into the following settings.

1. Near-OOD refers to semantic shifts in the data, such as between SVHN and MNIST. This is generally the more challenging problem, and the AUROC hovers around 93 per cent for state-of-the-art methods [Fort et al. (2021)].

2. Far-OOD refers to covariate shift, which is less difficult to detect. The AUROC hovers around 99 per cent in the current state of the art [Fort et al. (2021)].

Common sense is an essential yet absent element of AI systems. This crucial ability, shared by most humans, to judge and understand everyday things remains a non-trivial problem for machines [Xu et al. (2021)]. The absence of common sense prevents intelligent systems from understanding a changing world (distribution drift), behaving reasonably in unforeseen situations (such as OOD detection), and learning quickly from new experiences (i.e. exploiting prior information). Furthermore, this information is hard to learn, encode and represent. In humans, this shared and largely undefined knowledge base is acquired through extensive exposure to open-domain data, such as basic physical phenomena. In this paper, we operate under the assumption that common sense can be learnt from patterns of occurrences, and that this knowledge can be learnt in a domain-specific manner. Our strategy therefore relies on creating a commonsense service that learns from experience, based on computational models that mimic child cognition towards scenes and reasoning.
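As a concrete reference for the AUROC figures quoted above: the metric equals the probability that a randomly chosen OOD sample receives a higher detector score than a randomly chosen in-distribution sample. A minimal NumPy sketch (the scores are illustrative placeholders, not our experimental data):

```python
import numpy as np

def auroc(scores_in, scores_out):
    """AUROC for an OOD detector where higher score = more OOD.

    Computed as the fraction of (in, out) pairs where the OOD sample
    scores higher, with ties counted as half a win (the Mann-Whitney
    U interpretation of the ROC curve's area).
    """
    s_in = np.asarray(scores_in, dtype=float)[:, None]
    s_out = np.asarray(scores_out, dtype=float)[None, :]
    wins = (s_out > s_in).sum() + 0.5 * (s_out == s_in).sum()
    return wins / (s_in.size * s_out.size)
```

A perfect detector, whose every OOD score exceeds every in-distribution score, yields an AUROC of 1.0; chance-level scoring yields 0.5.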
Intuition. Graphs provide a general language for describing and analysing entities and the interactions between them. We want to use the rich relational structure among visual concepts in complex domains to represent commonsense concepts. Our hypothesis is that this leads to better OOD prediction while maintaining justifications humans can understand.

Contributions. This work makes the following contributions. 1. We propose a novel semi-supervised geometric-learning-based framework that operates on human-interpretable concepts, relying on representing each data point (image) as a graph of visual features. 2. We demonstrate that our technique performs on par with state-of-the-art methods on near- and far-OOD tasks based on the LSUN dataset.

Using large pre-trained transformer networks does improve performance, but relies heavily on the assumption that the embeddings they generate are infallible. In our case, since we do not need the relational structures between objects to be human-readable, and in the interest of reducing computational overhead, we favour object detection networks over scene graph generation networks.
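To illustrate the representation in contribution 1, the sketch below builds a graph whose nodes are detected visual concepts and whose edges connect spatially close detections. The proximity heuristic, function name and example labels are illustrative assumptions, not the exact construction used in our pipeline:

```python
import numpy as np

def build_feature_graph(detections, dist_thresh=100.0):
    """Build a concept graph from object detections.

    detections: list of (label, (cx, cy)) pairs -- a visual concept
    and the centre of its bounding box. Nodes are the detected
    concepts; an edge joins two detections whose centres lie within
    dist_thresh pixels, a simple spatial proxy for 'related features'.
    Returns (nodes, edges), with edges as index pairs into nodes.
    """
    nodes = [label for label, _ in detections]
    centres = np.array([centre for _, centre in detections], dtype=float)
    edges = []
    for i in range(len(detections)):
        for j in range(i + 1, len(detections)):
            if np.linalg.norm(centres[i] - centres[j]) <= dist_thresh:
                edges.append((i, j))
    return nodes, edges
```

For a bedroom scene, a detected bed and bedside lamp would be linked while a distant window would remain an isolated node; the resulting graph is what the downstream geometric learner consumes.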

2. RELATED WORK

Commonsense knowledge graphs (CSKGs) are gaining popularity [Ilievski et al. (2021); Guan et al. (2019)] as sources of background knowledge (domain-specific conceptual and syntactic information) intended to help with downstream reasoning tasks such as question answering and planning. In our context, we intend to use these for OOD data detection. For this, we exploit the recent advances in geometric learning. The same methods allow graph neural networks to predict molecular properties and characteristics of social media conversations.
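To make the geometric-learning machinery concrete, the following is a minimal mean-aggregation message-passing layer with a mean-pooling readout, written in plain NumPy. It is an illustrative sketch of how graph neural networks propagate node features and produce a graph-level embedding, not the architecture used in this work:

```python
import numpy as np

def gnn_layer(X, A, W):
    """One mean-aggregation message-passing layer.

    X: (n, d) node features; A: (n, n) binary adjacency matrix;
    W: (d, d_out) learnable weights. Each node averages its own and
    its neighbours' features, then applies a linear map and ReLU.
    """
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    deg = A_hat.sum(axis=1, keepdims=True)    # neighbourhood sizes
    H = (A_hat @ X) / deg                     # mean over neighbourhood
    return np.maximum(H @ W, 0.0)             # linear map + ReLU

def readout(H):
    """Graph-level embedding: mean-pool the node states, e.g. for
    feeding a downstream OOD classifier or property predictor."""
    return H.mean(axis=0)
```

Stacking such layers lets information flow along multi-hop paths in the concept graph before the pooled embedding is scored.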

3. PROBLEM SETUP

This paper assesses the problem of differentiating between in-distribution and out-of-distribution image examples with a pre-trained neural network. Assume two distributions, D_in and D_out, are drawn from the space X. Let D_in denote an in-distribution dataset of (x_in, y_in) pairs, where x denotes the input feature vector and y_in ∈ Y_in := {1, . . . , K} denotes the class label. Let D_out denote an out-of-distribution dataset of (x_out, y_out) pairs, where y_out ∈ Y_out := {K + 1, . . . , K + O} and Y_out ∩ Y_in = ∅. In our experiments, we sample from the mixture of the two distributions: conditioned on a latent variable Z, a sample is drawn as P(X | Z = 0) = D_in for in-distribution and P(X | Z = 1) = D_out for out-of-distribution. Our problem setup allows access to OOD samples for training. We are therefore presented with the following challenge: given an image X drawn from the mixture distribution over (X, Z), can we distinguish whether the image comes from the in-distribution D_in?
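The sampling procedure for this mixture can be sketched as follows; the dataset contents and the function name are placeholders, and the detector's task is to recover z from x alone:

```python
import random

def sample_mixture(d_in, d_out, p_out, rng):
    """Draw one (x, z) pair from the mixture distribution.

    With probability p_out, set z = 1 and draw x from the OOD
    dataset d_out; otherwise set z = 0 and draw x from the
    in-distribution dataset d_in.
    """
    z = 1 if rng.random() < p_out else 0
    x = rng.choice(d_out if z else d_in)
    return x, z
```

At evaluation time the detector sees only x; the hidden label z is used solely to compute metrics such as AUROC.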



Scene Graph Generation (SGG) refers to the task of automatically mapping an image or a video into a semantic structural scene graph [Zhu et al. (2022)], which requires the accurate labelling of detected objects and their relational structures. Although this is a tricky task, the availability of extensive datasets, such as Visual Genome [Krishna et al. (2016)], and of massive models, such as OSCAR [Li et al. (2020); Zhang et al. (2021)] and RelationFormer [Shit et al. (2022)], has yielded impressive results.

Detecting out-of-distribution (OOD) points in relatively low-dimensional spaces has been studied extensively [Pimentel et al. (2014)]. Conventionally, these methods include density estimation, nearest-neighbour algorithms, and clustering analysis. The density-estimation approach uses probabilistic models to estimate the in-distribution density and declares a data point out of distribution if it lies in a low-density region. Clustering-based methods rely on distance measures between points to find out-of-distribution points that lie far from any neighbourhood. The primary drawback of these methods has always been their inadequacy on high-dimensional data [Theis et al. (2015)], such as images.
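A minimal sketch of one such classical method, scoring each point by its distance to the k-th nearest in-distribution training point (a nearest-neighbour variant chosen purely for illustration; it shares the high-dimensionality weakness noted above):

```python
import numpy as np

def knn_ood_scores(train, test, k=3):
    """Distance-based OOD scores.

    train: (n, d) in-distribution feature matrix; test: (m, d) query
    points. Each query is scored by its Euclidean distance to its
    k-th nearest training point; larger scores indicate points that
    lie far from the in-distribution neighbourhood, i.e. likely OOD.
    """
    # Pairwise distances via broadcasting: (m, n) matrix.
    d = np.linalg.norm(test[:, None, :] - train[None, :, :], axis=-1)
    return np.sort(d, axis=1)[:, k - 1]
```

Thresholding these scores yields the in/out decision; sweeping the threshold traces out the ROC curve used for the AUROC figures reported in this paper.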

