DECOMPOSING TEXTURE AND SEMANTIC FOR OUT-OF-DISTRIBUTION DETECTION Anonymous

Abstract

Out-of-distribution (OOD) detection tasks have made significant progress recently since the distribution mismatch between training and testing can severely deteriorate the reliability of AI systems. Nevertheless, the lack of precise interpretation for the in-distribution (ID) limits the application of the OOD detection methods to real-world systems. To tackle this, we decompose the definition of the ID into texture and semantics, motivated by the demands of real-world scenarios. We also design new benchmarks to measure the robustness that OOD detection methods should have. To achieve a good balance between the OOD detection performance and robustness, our method takes a divide-and-conquer approach. Specifically, the proposed model first handles each component of the texture and semantics separately and then fuses these later. This philosophy is empirically proven by a series of benchmarks including both the proposed and the conventional counterpart.

1. INTRODUCTION

Out-of-distribution (OOD) detection is the task that recognizes whether the given data comes from the distribution of training samples, also known as in-distribution (ID), or not. Any machine learning-based system could receive input samples that have a completely disparate distribution from the training environments (e.g., dataset). Since the distribution shift can severely degrade the model performance (Amodei et al., 2016) , it is a potential threat to reliable real-world AI systems. However, the ambiguous definition of the ID limits the feasibility of the OOD detection method in real-world applications, considering the various OOD scenarios. For example, subtle corruption is a clear signal of OOD in the machine vision field while a change in semantic information might not be. On the other hand, an autonomous driving system may assume the ID from the semanticoriented perspective; e.g., an unseen traffic sign is OOD. Unfortunately, most of the conventional OOD detection methods and benchmarks (Zhang et al., 2021; Tack et al., 2020; Ren et al., 2019; Chan et al., 2021) assume the ID as a single-mode thus they cannot handle other aspects of OOD properly (Figure 1a ). To tackle this, we revisit the definition of the ID by decomposing it into two factors: texture and semantics (Figure 1b ). For the texture factor, we define OOD as the textural difference between the ID and OOD datasets. On the contrary, the semantic OOD focuses on the class labels that do not exist in the ID environment. Note that the two aspects have a trade-off relationship, thus detecting both problems with a single model is challenging with the (conventional) entangled OOD perspective. 2018) investigated the texture-shape cue conflict in the deep network and a series of subsequent studies (Hermann et al., 2019; Li et al., 2020; Ahmed & Courville, 2020) explored how to achieve a balance between these perspectives. However, they only analyzed the texture-shape bias inherited in deep networks. Instead, we focus on analyzing the texture and semantic characteristics underlying the ID to build a more practically applicable OOD detection method.

Geirhos et al. (

Unfortunately, to the best of our knowledge, none of the studies on the OOD detection benchmark have thoroughly analyzed the definition of the ID. It can be problematic when the method judges the image corrupted by negligible distortion as OOD, even when the environment can tolerate the small changes in texture. Because of such a complicated scenario, it is crucial to evaluate the OOD detection method in a comprehensive way that goes beyond the simple benchmark. Thus, in this study, we propose a new approach to measuring the performance of the method according to the decomposed 1

