OPEN-WORLD SEMI-SUPERVISED LEARNING

Abstract

Supervised and semi-supervised learning methods have traditionally been designed for the closed-world setting, which rests on the assumption that unlabeled test data contains only classes previously encountered in the labeled training data. However, the real world is often open and dynamic, and thus novel, previously unseen classes may appear in the test data or during model deployment. Here, we introduce a new open-world semi-supervised learning setting in which the model is required to recognize previously seen classes, as well as to discover novel classes never seen in the labeled dataset. To tackle the problem, we propose ORCA, an approach that jointly learns a feature representation and a classifier on the labeled and unlabeled subsets of the data. The key idea in ORCA is to introduce an uncertainty-based adaptive margin that effectively circumvents the bias caused by the imbalance of variance between seen and novel classes. We demonstrate that ORCA accurately discovers novel classes and assigns samples to previously seen classes on standard benchmark image classification datasets, including CIFAR and ImageNet. Remarkably, despite solving the harder task, ORCA outperforms semi-supervised methods on seen classes, as well as novel class discovery methods on unseen classes, achieving 7% and 151% improvements on seen and unseen classes of the ImageNet dataset.

1. INTRODUCTION

With the advent of deep learning, remarkable breakthroughs have been achieved, and current machine learning systems excel on tasks with large quantities of labeled data. Despite these strengths, the vast majority of models are designed for the closed-world setting, rooted in the assumption that training and test data come from the same set of predefined classes. This assumption, however, rarely holds in practice, as labeling data depends on domain-specific knowledge which can be severely incomplete and insufficient to account for all possible scenarios. Thus, it is unrealistic to expect that one can identify and prelabel all categories/classes ahead of time and manually supervise machine learning models. In contrast to the commonly assumed closed world, the real world is inherently dynamic and open: new classes can emerge in the test data that have never been encountered during training. The open-world setting requires models to classify previously seen classes, but also to effectively handle never-before-seen classes. This task is very natural to human intelligence; children can effortlessly recognize previously learnt concepts, but also detect the patterns and differences of new ones. However, it is still an open question whether we can design versatile models that can successfully deal with the world of the unknown, while not forgetting the world of the known.

Semi-supervised learning (SSL) (Chapelle et al., 2009) aims to leverage unlabeled data when labels are difficult and costly to obtain. Recent works (Oliver et al., 2018; Chen et al., 2020b) show that incorporating novel classes in the unlabeled set degrades the performance of SSL methods. To alleviate this limitation, Guo et al. (2020) ensure the safety of SSL in the presence of novel classes. However, the ability to differentiate between seen and unseen classes is not sufficient, as we need methods that can properly handle unseen classes.
On the other hand, methods for discovering novel classes (Hsu et al., 2018; 2019; Han et al., 2019; 2020) utilize labeled data solely to learn a richer representation, and are not able to recognize seen and discover unseen classes at the same time.

To address the challenges of open-world SSL, we propose ORCA (open-world with uncertainty-based adaptive margin), an approach that can discover novel classes while at the same time achieving high performance on classifying previously seen classes. Using both labeled and unlabeled data, ORCA learns a joint embedding function, parameterized by a convolutional neural network, and a linear classifier that assigns samples to previously seen classes or to novel classes discovered by ORCA. The starting point is to initialize the model using self-supervised pretraining, which has previously shown effectiveness for semi-supervised learning (Zhai et al., 2019) and novel class discovery (Han et al., 2020; Van Gansbeke et al., 2020). The objective function in ORCA consists of three main components: (i) a supervised loss on labeled data, (ii) a pairwise loss on labeled and unlabeled data estimated using pseudo-labels inferred from the most confident pairwise similarities, and (iii) a regularization term that prevents the model from assigning all unlabeled samples to the same class. However, naively combining the supervised and pairwise losses leads to a bias towards seen classes, which reduces the ability to adapt to novel classes. To mitigate this bias, the key idea in ORCA lies in introducing an uncertainty-based adaptive margin in the supervised loss that gradually decreases plasticity and increases discriminability of the model during training. We evaluate ORCA on benchmark image classification datasets. The results show that ORCA significantly outperforms SSL methods on the task of recognizing previously seen classes, as well as novel class discovery methods on the task of discovering unseen classes. On the latter task, ORCA improves the performance of baseline methods by 51% on CIFAR-100 and 151% on the ImageNet-100 dataset.

2. RELATED WORK

Open-world SSL lies at the intersection of semi-supervised learning, novel class discovery and open-world recognition.

Semi-supervised learning (SSL). While the literature on SSL (Chapelle et al., 2009) is vast, the two most explored directions are utilizing the structure of the unlabeled data using consistency regularization (Sajjadi et al., 2016; Laine & Aila, 2016), or entropy minimization (Grandvalet & Bengio, 2005). Closely related to our work are pseudo-labeling based approaches (Lee, 2013; Sohn et al., 2020), which generate pseudo-labels for the more confident unlabeled samples and use them as targets in a standard supervised loss function. Under the typically assumed closed-world assumption, SSL methods achieve performance highly competitive with supervised methods; however, recent works (Oliver et al., 2018; Chen et al., 2020b) show that incorporating novel classes in the unlabeled set degrades their performance.
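As a concrete illustration of confidence-based pseudo-labeling, the sketch below keeps only the unlabeled samples whose top predicted probability clears a threshold and uses the argmax class as a hard training target. The function name and the 0.95 default are illustrative, not the values of any specific method:

```python
import numpy as np

def select_pseudo_labels(probs, threshold=0.95):
    # Keep unlabeled samples the model is confident about, and treat the
    # argmax class as a hard label for a standard supervised loss.
    conf = probs.max(axis=1)
    keep = np.flatnonzero(conf >= threshold)
    return keep, probs.argmax(axis=1)[keep]
```

Under the closed-world assumption this works well; but when novel classes are hidden in the unlabeled set, the confident predictions are still forced onto seen classes, which is the failure mode that open-world SSL must address.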



Figure 1: In open-world semi-supervised learning, the unlabeled dataset may contain classes that have never been encountered in the labeled set. The model needs to be able to classify samples into previously seen classes, but also distinguish between unseen classes.

Here, we introduce open-world semi-supervised learning. In this setting, the unlabeled dataset may contain classes that have never been seen in the labeled dataset, and the model needs to be able to: (i) recognize when a sample from the unlabeled data belongs to one of the seen classes present in the labeled dataset, and (ii) automatically discover novel/unseen classes without any previous knowledge (Figure 1). The latter requires the ability to identify new features that can separate unseen classes. Open-world SSL is related to, but differs from, continual learning (Kirkpatrick et al., 2017), generalized zero-shot learning (Xian et al., 2017), open-set (Scheirer et al., 2012) and open-world recognition (Bendale & Boult, 2015). In particular, by utilizing both labeled and unlabeled data, open-world SSL relies on transductive inference, unlike continual learning, which sequentially learns new tasks while trying to mitigate catastrophic forgetting. In contrast to generalized zero-shot learning, open-world SSL does not assume prior knowledge about the unseen classes. Finally, unlike open-set recognition, it requires separating the unknown classes, and unlike open-world recognition, it does not need any external supervision.

