EFFECTIVE CROSS-INSTANCE POSITIVE RELATIONS FOR GENERALIZED CATEGORY DISCOVERY

Abstract

We tackle the problem of generalized category discovery (GCD). GCD considers the open-world problem of automatically clustering a partially labelled dataset, in which the unlabelled data contain instances from both novel categories and the labelled classes. In this paper, we address the GCD problem without assuming a known category number in the unlabelled data. We propose a framework, named CiP, that bootstraps the representation by exploiting Cross-instance Positive relations in the partially labelled data for contrastive learning, which existing methods neglect. First, to obtain reliable cross-instance relations that facilitate representation learning, we introduce a semi-supervised hierarchical clustering algorithm, named selective neighbor clustering (SNC), which produces a clustering hierarchy directly from the connected components of a graph constructed by selective neighbors. We further extend SNC to assign labels to the unlabelled instances when the class number is given. Moreover, we present a method to estimate the unknown class number using SNC with a joint reference score that considers clustering indexes of both labelled and unlabelled data. Finally, we thoroughly evaluate our CiP framework on public generic image recognition datasets (CIFAR-10, CIFAR-100, and ImageNet-100) and challenging fine-grained datasets (CUB, Stanford Cars, and Herbarium19), establishing a new state of the art on all of them.

1. INTRODUCTION

After training on large-scale datasets with human annotations, existing machine learning models can achieve superb performance (e.g., Krizhevsky et al., 2012). However, the success of these models relies heavily on the assumption that they are only tasked to recognize images from the same set of extensively annotated classes on which they were trained. This limits their application in the real open world, where we encounter data without annotations and from unseen categories. Indeed, increasing effort has been devoted to more realistic settings. For example, semi-supervised learning (SSL) (Chapelle et al., 2006) aims at training a robust model using both labelled and unlabelled data from the same set of classes; few-shot learning (Snell et al., 2017) tries to learn models that can generalize to new classes with few annotated samples; open-set recognition (OSR) (Scheirer et al., 2012) learns to tell whether or not an unlabelled image belongs to one of the classes on which the model was trained. More recently, the problem of novel category discovery (NCD) (Han et al., 2019; 2020; Fini et al., 2021) has been introduced, which learns models that automatically partition unlabelled data from unseen categories by transferring knowledge from seen categories. Early NCD methods assume that the unlabelled images are all from unseen categories only. NCD has recently been extended to a more generalized setting, called generalized category discovery (GCD) (Vaze et al., 2022b), which relaxes this assumption to better reflect the real world, i.e., unlabelled images come from both seen and unseen categories. In this paper, we tackle the problem of GCD by drawing inspiration from the baseline method (Vaze et al., 2022b).
In (Vaze et al., 2022b), a vision transformer model was first trained for representation learning using supervised contrastive learning on labelled data and self-supervised contrastive learning on both labelled and unlabelled data. With the learned representation, semi-supervised k-means (Han et al., 2019) was then adopted for label assignment across all instances. In addition, based on semi-supervised k-means, (Vaze et al., 2022b) also introduced an algorithm to estimate the unknown category number for the unlabelled data by examining possible category numbers in a given range. However, this approach has several limitations. First, during representation learning, the method considers labelled and unlabelled data independently, and uses a stronger training signal for the
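To make the baseline's representation-learning stage concrete, the two contrastive objectives can be sketched as below. This is a minimal NumPy sketch under assumed shapes and an illustrative temperature, not the authors' implementation (which operates on ViT features with a projection head): `unsup_contrastive_loss` treats two augmented views of the same image as the only positive pair, while `sup_contrastive_loss` treats all labelled instances of the same class as positives.

```python
import numpy as np

def _log_softmax_rows(sim):
    # Numerically stable row-wise log-softmax; -inf entries contribute zero mass.
    m = sim.max(axis=1, keepdims=True)
    return sim - (m + np.log(np.exp(sim - m).sum(axis=1, keepdims=True)))

def unsup_contrastive_loss(z1, z2, temperature=0.07):
    """Self-supervised (InfoNCE-style) loss over two augmented views.
    z1, z2: (N, D) L2-normalised embeddings; view i of z1 is positive only
    with view i of z2, and every other instance acts as a negative."""
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)          # (2N, D)
    sim = z @ z.T / temperature                   # pairwise cosine similarities
    np.fill_diagonal(sim, -np.inf)                # mask self-similarity
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    logp = _log_softmax_rows(sim)
    return -logp[np.arange(2 * n), pos].mean()

def sup_contrastive_loss(z, labels, temperature=0.07):
    """Supervised contrastive loss on the labelled subset.
    z: (N, D) L2-normalised embeddings; all same-label pairs are positives."""
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)
    logp = _log_softmax_rows(sim)
    same = labels[:, None] == labels[None, :]
    np.fill_diagonal(same, False)                 # exclude the anchor itself
    # average log-probability over each anchor's positives
    per_anchor = np.where(same, logp, 0.0).sum(1) / np.maximum(same.sum(1), 1)
    return -per_anchor.mean()
```

In the baseline, the two losses are combined with a weighting coefficient; the sketch only illustrates how the positive sets differ between the unsupervised and supervised terms.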
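The label-assignment step can likewise be sketched. The following is a simplified NumPy version in the spirit of semi-supervised k-means (Han et al., 2019), not the published implementation: labelled instances stay clamped to their ground-truth clusters, only unlabelled instances are re-assigned, and novel clusters are seeded by farthest-point initialisation (a simplifying assumption here).

```python
import numpy as np

def semi_supervised_kmeans(X_l, y_l, X_u, k, n_iter=50):
    """Sketch of semi-supervised k-means: labelled points never change
    cluster, unlabelled points are assigned to the nearest centroid.
    Assumes k >= the number of labelled classes."""
    classes = np.unique(y_l)
    # Initialise one centroid per labelled class from its class mean.
    centers = [X_l[y_l == c].mean(axis=0) for c in classes]
    # Seed the remaining (novel) centroids with farthest-point initialisation.
    while len(centers) < k:
        d = ((X_u[:, None] - np.stack(centers)[None]) ** 2).sum(-1).min(axis=1)
        centers.append(X_u[d.argmax()])
    centers = np.stack(centers)
    for _ in range(n_iter):
        d = ((X_u[:, None] - centers[None]) ** 2).sum(-1)
        assign_u = d.argmin(axis=1)               # re-assign unlabelled only
        for j in range(k):
            members = [X_u[assign_u == j]]
            if j < len(classes):                  # labelled points are clamped
                members.append(X_l[y_l == classes[j]])
            members = np.concatenate(members, axis=0)
            if len(members):
                centers[j] = members.mean(axis=0)
    return assign_u, centers
```

Setting k requires knowing (or estimating) the total category number, which is exactly the assumption the category-number estimation algorithm of (Vaze et al., 2022b), and later our SNC-based estimator, aims to remove.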

