ONE-CLASS CLASSIFICATION ROBUST TO GEOMETRIC TRANSFORMATIONS

Abstract

Recent studies on one-class classification have achieved remarkable performance by employing a self-supervised classifier that predicts the geometric transformation applied to in-class images. However, they cannot identify in-class images at all when the input images are geometrically transformed (e.g., rotated images), because their classification-based in-class scores assume that input images always have a fixed viewpoint, similar to the images used for training. Pointing out that humans can easily recognize such transformed images as the same class, in this work we aim to propose a one-class classifier robust to geometrically-transformed inputs, named GROC. To this end, we introduce a conformity score that indicates how strongly an input image agrees with one of the predefined in-class transformations, then utilize the conformity score with our proposed agreement measures for one-class classification. Our extensive experiments demonstrate that GROC is able to accurately distinguish in-class images from out-of-class images regardless of whether the inputs are geometrically transformed or not, whereas the existing methods fail.

1. INTRODUCTION

One-class classification refers to the problem of identifying whether an input example belongs to a single target class (in-class) or to any novel class (out-of-class). The main challenge of this task is that only in-class examples are available at training time. Thus, using only positive examples, a model has to learn a decision boundary that distinguishes in-class examples from out-of-class examples, whose distribution is assumed to be unknown in practice. Early work on one-class classification mainly utilized kernel-based methods (Schölkopf et al., 2000; Tax & Duin, 2004) to find a hypersphere (or hyperplane) enclosing all training in-class examples, or density estimation techniques (Parzen, 1962) to measure the likelihood of an input example. In the era of deep learning, numerous studies have tried to employ deep neural networks to effectively learn from high-dimensional data (e.g., images). Most of them aim to detect out-of-class examples based on density estimation, adopting the architecture of autoencoders (Ruff et al., 2018; Zong et al., 2018) or generative adversarial networks (GANs) (Schlegl et al., 2017; Zenati et al., 2018). Nevertheless, their supervision is not informative enough to capture the semantics of high-dimensional data for a target class, which eventually leads to limited performance. Recently, there have been several attempts to make use of self-supervised learning (Golan & El-Yaniv, 2018; Hendrycks et al., 2019; Bergman & Hoshen, 2020) for more informative supervision on the target class, and they made a major breakthrough on this problem. They build a self-labeled image set by applying a set of geometric transformations to training images, then train a classifier to accurately predict the transformation applied to each input image. This approach achieved state-of-the-art performance for one-class classification even without modeling the latent distribution of in-class examples for density estimation.
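The self-supervised approach described above can be sketched as follows. This is a minimal illustrative example, not the implementation of any particular paper: it builds the self-labeled set using 90-degree rotations as the transformation family, and scores an input by the classifier's softmax probability for the transformation that was actually applied (function and variable names are our own).

```python
import numpy as np

# Transformation family: k * 90-degree rotations, labeled by k.
ROTATIONS = [0, 1, 2, 3]

def build_self_labeled_set(images):
    """Apply every rotation to every image; the rotation index serves
    as the self-supervised label for training the classifier."""
    data, labels = [], []
    for img in images:
        for k in ROTATIONS:
            data.append(np.rot90(img, k))
            labels.append(k)
    return np.stack(data), np.array(labels)

def in_class_score(probs, labels):
    """Classification-based in-class score: the average softmax
    probability the classifier assigns to the transformation that was
    actually applied. In-class images, whose transformations the
    classifier predicts well, receive high scores."""
    return float(np.mean(probs[np.arange(len(labels)), labels]))

# Toy usage: one 4x4 "image" expands into 4 self-labeled examples.
images = [np.arange(16, dtype=float).reshape(4, 4)]
data, labels = build_self_labeled_set(images)
print(data.shape, labels)  # (4, 4, 4) [0 1 2 3]
```

Note that this scoring scheme is exactly what breaks down for rotated inputs: a pre-rotated in-class image makes the classifier predict the "wrong" transformation, so the score collapses even though the image belongs to the target class.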
However, all the aforementioned methods are quite vulnerable to spatial variance within the images, because they were developed under the assumption that in-class (and out-of-class) images have a fixed viewpoint. In particular, the existing self-supervised methods fail entirely on inputs with varying viewpoints, because their ability to predict the geometric transformation relies on that fixed viewpoint. Note that humans usually recognize that images of a target object with different viewpoints belong to the same class; in this sense, the one-class classifiers also

