AN EMPIRICAL EXPLORATION OF OPEN-SET RECOGNITION VIA LIGHTWEIGHT STATISTICAL PIPELINES

Anonymous

Abstract

Machine-learned safety-critical systems need to be self-aware and reliably know their unknowns in the open world. This is often explored through the lens of anomaly/outlier detection or out-of-distribution modeling. One popular formulation is that of open-set classification, where an image classifier trained for 1-of-K classes should also recognize images belonging to a (K+1)-th "other" class, not present in the training set. Recent work has shown that, somewhat surprisingly, most if not all existing open-world methods do not work well on high-dimensional open-world images (Shafaei et al., 2019). In this paper, we carry out an empirical exploration of open-set classification, and find that combining classic statistical methods with carefully computed features can dramatically outperform prior work. We extract features from off-the-shelf (OTS) state-of-the-art networks for the underlying K-way closed-world task. We leverage insights from the retrieval community for computing feature descriptors that are low-dimensional (via pooling and PCA) and normalized (via L2-normalization), enabling the modeling of training data densities via classic statistical tools such as kmeans and Gaussian Mixture Models (GMMs). Finally, we (re)introduce the task of open-set semantic segmentation, which requires classifying individual pixels into one of K known classes or an "other" class. In this setting, our feature-based statistical models noticeably outperform prior open-world methods.

1. INTRODUCTION

Embodied perception and autonomy require systems to be self-aware and reliably know their unknowns. This requirement is often formulated as the open-set recognition problem (Scheirer et al., 2012): a system, e.g., a K-way classification model, should recognize anomalous examples that do not belong to one of the K closed-world classes. This is a significant challenge for machine-learned systems, which notoriously over-generalize to anomalies and unknowns on which they should instead raise a warning flag (Amodei et al., 2016).

Open-world benchmarks: Curating open-world benchmarks is hard (Liu et al., 2019). One common strategy re-purposes existing classification datasets into closed vs. open examples, e.g., declaring MNIST digits 0-5 as closed and 6-9 as open (Neal et al., 2018; Oza & Patel, 2019; Geng et al., 2020). In contrast, anomaly/out-of-distribution (OOD) benchmarks usually generate anomalous samples by adding examples from different datasets, e.g., declaring CIFAR as anomalous for MNIST (Ge et al., 2017; Oza & Patel, 2019; Liu et al., 2019).

In this paper, we carry out a rigorous empirical exploration of open-set recognition of high-dimensional images. We explore simple statistical models such as Nearest Class Means (NCMs), kmeans, and Gaussian Mixture Models (GMMs). Our hypothesis is that such classic statistical methods can reliably model the closed-world distribution (through the closed-world training data), and hence flag examples that fall outside this distribution as open-set.

Contribution 1: We build classic statistical models on top of off-the-shelf (OTS) features computed by the underlying K-way classification network. We find it crucial to use OTS features that have been pre-trained and post-processed appropriately (discussed further below). Armed with such features, we find that classic statistical models such as kmeans and GMMs (Murphy, 2012) can outperform prior work. We describe two core technical insights below.

Insight 1: We first reduce feature dimensionality via spatial pooling and PCA (Turk & Pentland, 1991).
Then, to ensure features are invariant to scalings, we adopt L2-normalization (Gong et al., 2014; Gordo et al., 2017). While these are somewhat standard practices for deep feature extraction in areas such as retrieval, their combination is not well explored in the open-set literature (Bendale & Boult, 2016; Grathwohl et al., 2019). Given a particular OTS K-way classification network, we determine the "right" feature processing through validation. In particular, we find that L2-normalization greatly boosts open-world recognition performance, while spatial pooling and PCA together reduce feature dimension by three orders of magnitude without degrading performance, resulting in a lightweight pipeline.

Contribution 2: We (re)introduce the problem of open-set semantic segmentation. Interestingly, classic benchmarks explicitly evaluate background pixels outside the set of K classes of interest (Everingham et al., 2015). However, contemporary benchmarks such as Cityscapes (Cordts et al., 2016) ignore such pixels during evaluation (see Fig. 1).
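The feature post-processing described above (spatial pooling, PCA reduction, and L2-normalization) can be sketched as follows. This is a minimal illustration, not the authors' released code; the array shapes, the `out_dim=64` target dimension, and the function name are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

def postprocess_features(feats, pca=None, out_dim=64):
    """Pool, PCA-reduce, and L2-normalize OTS network activations.

    feats: (N, C, H, W) activations from a pre-trained network.
    Returns (N, out_dim) descriptors and the fitted PCA model,
    so the same projection can be reused on test data.
    """
    # Global average pooling over spatial dims: (N, C, H, W) -> (N, C)
    pooled = feats.mean(axis=(2, 3))
    # PCA reduces dimensionality by orders of magnitude
    if pca is None:
        pca = PCA(n_components=out_dim).fit(pooled)
    reduced = pca.transform(pooled)
    # L2-normalization makes descriptors invariant to feature scalings
    normed = reduced / np.linalg.norm(reduced, axis=1, keepdims=True)
    return normed, pca
```

Fitting the PCA once on closed-world training features and reusing it at test time keeps the pipeline lightweight, since downstream density models then operate on low-dimensional unit-norm vectors.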



Figure 1: We motivate open-set recognition with safety concerns in autonomous systems. Left: State-of-the-art semantic segmentation networks (Wang et al., 2019) do not model "strollers", which are outside the K closed-set categories in the Cityscapes benchmark (Cordts et al., 2016). Here, the network misclassifies the "stroller" as a "motorcycle", which can be a critical mistake when fed into an autonomy stack because the two objects exhibit different behaviours (and so require different plans for obstacle avoidance). Right: While classic semantic segmentation benchmarks explicitly evaluate background pixels outside the set of K classes (Everingham et al., 2015), contemporary benchmarks such as Cityscapes ignore such pixels during evaluation. As a result, most segmentation networks also ignore such pixels during training. Perhaps surprisingly, such ignored pixels include vulnerable objects like wheelchairs and strollers (see left). We repurpose these ignored pixels as open-set examples that are from the (K+1)-th "other" class, allowing for a large-scale exploration of open-set recognition via semantic segmentation.

Intuitively, classifiers can easily overfit to the available set of open-world images, which is unlikely to exhaustively span the open world outside the K classes of interest.

Insight 2: Pre-training networks (e.g., on ImageNet (Deng et al., 2009)) is a common practice for traditional closed-world tasks. However, to the best of our knowledge, open-world methods do not sufficiently exploit pre-training (Oza & Patel, 2019). Hendrycks et al. (2019a) report that pre-training improves anomaly detection using softmax confidence thresholding (Hendrycks & Gimpel, 2017). We find pre-training to be a crucial factor in learning better representations that support more sophisticated open-world reasoning. Intuitively, pre-trained networks expose themselves to diverse data that may look similar to open-world examples encountered at test-time. We operationalize this intuition by building statistical models on top of existing discriminative networks, which tend to make use of pre-training by design. We demonstrate that this significantly outperforms features trained from scratch, as is done in most prior open-set work.
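The density-modeling step described above can be sketched with a GMM fit on closed-world descriptors: test samples with low log-likelihood under the closed-world density are flagged as the "other" class. This is a hedged illustration of the general approach, not the paper's exact configuration; the component count, function names, and threshold-by-validation step are assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_open_set_gmm(train_desc, n_components=5, seed=0):
    """Fit a GMM to closed-world training descriptors.

    train_desc: (N, D) post-processed feature descriptors.
    """
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type='full', random_state=seed)
    gmm.fit(train_desc)
    return gmm

def open_set_scores(gmm, test_desc):
    """Per-sample log-likelihood under the closed-world density.

    Higher = more closed-world-like; a rejection threshold would be
    chosen on a validation set.
    """
    return gmm.score_samples(test_desc)
```

An NCM or kmeans variant follows the same pattern, with (negative) distance to the nearest class mean or cluster centroid in place of the GMM log-likelihood.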

