FINE-GRAIN INFERENCE ON OUT-OF-DISTRIBUTION DATA WITH HIERARCHICAL CLASSIFICATION

Abstract

Machine learning methods must be trusted to make appropriate decisions in real-world environments, even when faced with out-of-distribution (OOD) samples. Many current approaches simply aim to detect OOD examples and alert the user when an unrecognized input is given. However, when the OOD sample significantly overlaps with the training data, a binary anomaly-detection decision is neither interpretable nor explainable, and provides little information to the user. We propose a new model for OOD detection that makes predictions at varying levels of granularity: as the inputs become more ambiguous, the model predictions become coarser and more conservative. Consider an animal classifier that encounters an unknown bird species and a car. Both cases are OOD, but the user gains more information if the classifier recognizes that its uncertainty over the particular species is too large and predicts "bird" instead of detecting the sample as OOD. Furthermore, we diagnose the classifier's performance at each level of the hierarchy, improving the explainability and interpretability of the model's predictions. We demonstrate the effectiveness of hierarchical classifiers for both fine- and coarse-grained OOD tasks.

1. INTRODUCTION

Real-world computer vision systems will encounter out-of-distribution (OOD) samples while making or informing consequential decisions. Therefore, it is crucial to design machine learning methods that make reasonable predictions for anomalous inputs that are outside the scope of the training distribution. Recently, research has focused on detecting inputs during inference that are OOD with respect to the training distribution (Ahmed & Courville, 2020; Hendrycks & Gimpel, 2017; Hendrycks et al., 2019; Hsu et al., 2020; Huang & Li, 2021; Lakshminarayanan et al., 2017; Lee et al., 2018; Liang et al., 2018; Liu et al., 2020; Neal et al., 2018; Roady et al., 2020; Inkawhich et al., 2022). These methods typically use a threshold on the model's "confidence" to produce a binary decision indicating whether the sample is in-distribution (ID) or OOD. However, binary decisions based on model heuristics offer little interpretability or explainability. The fundamental problem is that there are many ways for a sample to be out-of-distribution. Ideally, a model should provide more nuanced information about how a sample differs from the training data. For example, if a bird classifier is presented with a novel bird species, we would like it to recognize that the sample is a bird rather than simply reporting OOD. Conversely, if the bird classifier is shown an MNIST digit, it should indicate that the digit is outside its domain of expertise. Recent studies have shown that fine-grained OOD samples are significantly more difficult to detect, especially when there is a large number of training classes (Ahmed & Courville, 2020; Huang & Li, 2021; Roady et al., 2020; Zhang et al., 2021; Inkawhich et al., 2021).
We argue that the difficulty stems from trying to address two opposing objectives: learning semantically meaningful features to discriminate between ID classes while also maintaining tight decision boundaries to avoid misclassification on fine-grained OOD samples (Ahmed & Courville, 2020; Huang & Li, 2021). We hypothesize that additional information about the relationships between classes could help determine those decision boundaries and simultaneously offer more interpretable predictions. To address these challenges, we propose a new method based on hierarchical classification. The approach is illustrated in Figure 1. Rather than directly outputting a distribution over all possible classes, as in a flat network, hierarchical classification methods leverage the relationships between classes to produce conditional probabilities for each node in the tree. This can simplify the classification
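To make the idea concrete, the following is a minimal sketch of hierarchical inference with conditional probabilities, not the paper's actual method: the hierarchy, class names, and the cumulative-probability stopping rule are illustrative assumptions. The model walks down the tree, multiplying each node's conditional probability into a path probability, and stops at the deepest node whose path probability still clears a confidence threshold, so ambiguous inputs yield coarser predictions.

```python
import numpy as np

# Toy class hierarchy (illustrative; each internal node maps to its children).
TREE = {
    "root": ["animal", "vehicle"],
    "animal": ["bird", "dog"],
    "bird": ["sparrow", "robin"],
}

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def predict(logits_per_node, threshold=0.9):
    """Descend the hierarchy, multiplying conditional probabilities along
    the path; stop early when the path probability falls below threshold."""
    node, path_prob = "root", 1.0
    while node in TREE:
        probs = softmax(logits_per_node[node])   # conditional dist. over children
        best = int(np.argmax(probs))
        if path_prob * probs[best] < threshold:
            break  # too uncertain: report the current, coarser node
        path_prob *= probs[best]
        node = TREE[node][best]
    return node, path_prob
```

For a confidently recognized dog image the walk reaches the leaf "dog", while for an unknown bird species the per-species conditionals are nearly flat, so the walk stops at "bird" rather than forcing a species-level guess or a bare OOD flag.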

