EFFECTIVE CROSS-INSTANCE POSITIVE RELATIONS FOR GENERALIZED CATEGORY DISCOVERY

Abstract

We tackle the problem of generalized category discovery (GCD). GCD considers the open-world problem of automatically clustering a partially labelled dataset, in which the unlabelled data contain instances from both novel categories and the labelled classes. In this paper, we address the GCD problem without assuming a known category number in the unlabelled data. We propose a framework, named CiP, to bootstrap the representation by exploiting Cross-instance Positive relations for contrastive learning in the partially labelled data, which existing methods neglect. First, to obtain reliable cross-instance relations that facilitate representation learning, we introduce a semi-supervised hierarchical clustering algorithm, named selective neighbor clustering (SNC), which produces a clustering hierarchy directly from the connected components in the graph constructed by selective neighbors. We also extend SNC to assign labels to the unlabelled instances when the class number is given. Moreover, we present a method to estimate the unknown class number using SNC with a joint reference score that considers clustering indexes of both labelled and unlabelled data. Finally, we thoroughly evaluate our CiP framework on public generic image recognition datasets (CIFAR-10, CIFAR-100, and ImageNet-100) and challenging fine-grained datasets (CUB, Stanford Cars, and Herbarium19), establishing a new state of the art on all of them.

1. INTRODUCTION

After training on large-scale datasets with human annotations, existing machine learning models can achieve superb performance (e.g., Krizhevsky et al., 2012). However, the success of these models relies heavily on the fact that they are only tasked with recognizing images from the same set of classes, with large-scale human annotations, on which they are trained. This limits their application in the real open world, where we will encounter data without annotations and from unseen categories. Indeed, more and more effort has been devoted to dealing with more realistic settings. For example, semi-supervised learning (SSL) (Chapelle et al., 2006) aims at training a robust model using both labelled and unlabelled data from the same set of classes; few-shot learning (Snell et al., 2017) tries to learn models that can generalize to new classes with few annotated samples; open-set recognition (OSR) (Scheirer et al., 2012) learns to tell whether or not an unlabelled image belongs to one of the classes on which the model is trained. More recently, the problem of novel category discovery (NCD) (Han et al., 2019; 2020; Fini et al., 2021) has been introduced, which learns models to automatically partition unlabelled data from unseen categories by transferring knowledge from seen categories. One assumption in early NCD methods is that unlabelled images all come from unseen categories. NCD has recently been extended to a more general setting, called generalized category discovery (GCD) (Vaze et al., 2022b), by relaxing this assumption to better reflect the real world, i.e., unlabelled images come from both seen and unseen categories. In this paper, we tackle the problem of GCD by drawing inspiration from the baseline method (Vaze et al., 2022b).
In Vaze et al. (2022b), a vision transformer model was first trained for representation learning using supervised contrastive learning on labelled data and self-supervised contrastive learning on both labelled and unlabelled data. With the learned representation, semi-supervised k-means (Han et al., 2019) was then adopted for label assignment across all instances. In addition, based on semi-supervised k-means, Vaze et al. (2022b) also introduced an algorithm to estimate the unknown category number for the unlabelled data by examining possible category numbers in a given range. However, this approach has several limitations. First, during representation learning, the method considers labelled and unlabelled data independently, and uses a stronger training signal for the labelled data, which might compromise the representation of the unlabelled data. Second, the method requires a known category number for performing label assignment. Third, the category number estimation method is slow, as it needs to run the clustering algorithm multiple times to test different category numbers. To overcome the above limitations, we propose a new approach for GCD (CiP) which does not require a known unseen category number and considers Cross-instance Positive relations in unlabelled data for better representation learning. At the core of our approach is our novel semi-supervised hierarchical clustering algorithm with selective neighbors, named selective neighbor clustering (SNC), which takes inspiration from the parameter-free hierarchical clustering method FINCH (Sarfraz et al., 2019). SNC can not only generate reliable pseudo labels for cross-instance positive relations, but also estimate unseen category numbers without the need for repeated runs of the clustering algorithm. SNC builds a graph indicating all carefully selected neighbor relations constrained by the labelled instances, and produces clusters directly from the connected components in the graph.
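As a rough illustration of producing clusters from the connected components of a neighbor graph, the following Python sketch builds a single level of such a partition. It is a simplified stand-in and not the SNC algorithm itself: the linking rules (labelled instances chained within their class, unlabelled instances linked to their Euclidean nearest neighbor), the function name `first_level_partition`, and the toy data are all assumptions made for illustration.

```python
import math


def first_level_partition(features, labels):
    """Cluster by connected components of a constrained neighbor graph.

    features: list of equal-length numeric tuples.
    labels:   list with a class id for labelled instances, None otherwise.
    Returns one cluster id per instance.
    NOTE: the linking rules below are illustrative assumptions.
    """
    n = len(features)
    parent = list(range(n))  # union-find forest over instances

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    last_seen = {}  # class id -> most recent labelled instance of that class
    for i, y in enumerate(labels):
        if y is not None:
            # Labelled: link only to a labelled instance of the same class.
            if y in last_seen:
                union(last_seen[y], i)
            last_seen[y] = i
        else:
            # Unlabelled: link to the nearest other instance.
            nearest = min(
                (j for j in range(n) if j != i),
                key=lambda j: math.dist(features[i], features[j]),
            )
            union(i, nearest)

    # Relabel union-find roots as consecutive cluster ids.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Toy data: two labelled classes plus a group with no labelled member.
features = [(0, 0), (0, 1), (0.2, 0.4),
            (10, 10), (10, 11), (10.3, 10.4),
            (50, 50), (50, 51)]
labels = [0, 0, None, 1, 1, None, None, None]
clusters = first_level_partition(features, labels)
```

Because each unlabelled instance contributes a single link and labelled instances link only within their own class, no component in this sketch can contain two labelled classes. Repeating the step on cluster means would yield coarser levels, in the spirit of FINCH.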
SNC iteratively constructs a hierarchy of partitions with different granularity, while satisfying the constraints imposed by the labelled instances. With a one-by-one merging strategy, SNC can quickly estimate a reliable class number without repeated runs of the algorithm, which makes it significantly faster than the method of Vaze et al. (2022b). The main contributions of this paper can be summarized as follows: (1) we propose a new GCD framework, named CiP, which exploits more cross-instance positive relations in the partially labelled set to strengthen the connections among all instances, fostering representation learning for better category discovery; (2) we introduce a semi-supervised hierarchical clustering algorithm, named SNC, that can be adopted for reliable pseudo label generation during training and label assignment during testing; (3) we further leverage SNC for class number estimation by exploring intrinsic and extrinsic clustering quality based on a joint reference score considering both labelled and unlabelled data; (4) we comprehensively evaluate our CiP framework on both generic image recognition datasets and challenging fine-grained datasets, and demonstrate state-of-the-art performance across the board.
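To make the idea of cross-instance positive relations in contribution (1) concrete, the sketch below computes a contrastive loss in which every other instance sharing an anchor's pseudo label is treated as a positive. This is the generic supervised-contrastive form applied to pseudo labels and is not claimed to be the exact CiP objective; the function name `cross_instance_contrastive_loss`, the temperature value, and the toy embeddings are assumptions for illustration.

```python
import math


def cross_instance_contrastive_loss(embeddings, pseudo_labels, temperature=0.1):
    """Mean contrastive loss with positives defined by shared pseudo labels.

    NOTE: a generic supervised-contrastive-style sketch, assumed here for
    illustration; the temperature is a hypothetical default.
    """
    n = len(embeddings)
    # L2-normalise so that dot products are cosine similarities.
    normed = []
    for v in embeddings:
        norm = math.sqrt(sum(x * x for x in v))
        normed.append([x / norm for x in v])

    total, counted = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n)
                     if j != i and pseudo_labels[j] == pseudo_labels[i]]
        if not positives:
            continue  # anchors without any positive are skipped
        # Temperature-scaled similarity of the anchor to every other instance.
        logits = {
            j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
            for j in range(n) if j != i
        }
        log_denom = math.log(sum(math.exp(l) for l in logits.values()))
        # Average negative log-probability over the anchor's positives.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


# Two tight groups of embeddings (toy values).
emb = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
loss_consistent = cross_instance_contrastive_loss(emb, [0, 0, 1, 1])
loss_inconsistent = cross_instance_contrastive_loss(emb, [0, 1, 0, 1])
```

On this toy input, pseudo labels that agree with the geometry of the embeddings give a much lower loss than labels that cut across the two groups, which is why the reliability of the pseudo labels matters for this kind of objective.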

2. RELATED WORK

Our work is related to novel/generalized category discovery, semi-supervised learning, and open-set recognition. Novel category discovery (NCD) aims at discovering new classes in unlabelled data by leveraging knowledge learned from labelled data. It was pioneered by Han et al. (2019) with a transfer clustering approach. Some earlier works on cross-domain/task transfer learning (Hsu et al., 2018a; b) can also be adopted to tackle this problem. Han et al. (2020) proposed an efficient method called AutoNovel (aka RankStats) using ranking statistics. They first learned a good embedding using low-level self-supervised learning on all data, followed by supervised learning on labelled data for higher-level features. They then introduced robust ranking statistics to determine whether two unlabelled instances are from the same class for NCD. Several subsequent works build on RankStats. For example, Jia et al. (2021) proposed to use WTA hashing (Yagnik et al., 2011) for NCD in single- and multi-modal data; Zhao & Han (2021) extended NCD with dual ranking statistics and knowledge distillation. Fini et al. (2021) proposed UNO, which uses a unified cross-entropy loss to train on labelled and unlabelled data. Chi et al. (2022) proposed meta discovery, which links NCD to meta learning with limited labelled data. Vaze et al. (2022b) introduced generalized category discovery (GCD), which extends NCD by allowing unlabelled data from both old and new classes. They first finetuned a pretrained DINO ViT (Caron et al., 2021) with both a supervised contrastive loss and a self-supervised contrastive loss. Semi-supervised k-means was then adopted for label assignment. A concurrent work called ORCA (Cao et al., 2022) addressed a similar problem by formulating it as open-world semi-supervised learning.
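The ranking-statistics test attributed to RankStats above can be pictured with a short sketch: two instances are treated as the same class when their feature vectors share the same set of top-ranked dimensions. This reflects our reading of the idea rather than the reference implementation; the helper names, the value of `k`, and the toy feature vectors are assumptions for illustration.

```python
def top_k_dims(feature, k):
    """Return the set of indices of the k largest feature values."""
    ranked = sorted(range(len(feature)), key=lambda d: feature[d], reverse=True)
    return frozenset(ranked[:k])


def same_class_by_rank(feat_a, feat_b, k=5):
    """Pairwise pseudo label: True when the top-k index sets coincide.

    NOTE: k is a hypothetical default; the order within the top-k is ignored.
    """
    return top_k_dims(feat_a, k) == top_k_dims(feat_b, k)


# Toy feature vectors (assumed non-negative, e.g. post-activation).
a = [0.9, 0.1, 0.8, 0.0, 0.7]
b = [0.6, 0.2, 0.95, 0.1, 0.5]
c = [0.1, 0.9, 0.8, 0.0, 0.7]
pair_ab = same_class_by_rank(a, b, k=3)  # top-3 sets are both {0, 2, 4}
pair_ac = same_class_by_rank(a, c, k=3)  # {0, 2, 4} versus {1, 2, 4}
```

Comparing index sets rather than raw feature values makes the test insensitive to small changes in magnitude, which is the sense in which such a statistic is robust.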
We draw inspiration from Vaze et al. (2022b) and develop a novel method to tackle GCD by exploring cross-instance correlations on labelled and unlabelled data, which have been neglected in Vaze et al. (2022b).
Semi-supervised learning (SSL) has long been studied in the machine learning community (Chapelle et al., 2006). It aims at learning a good model by leveraging unlabelled data from the same set of classes as the labelled data. Various methods have been proposed for SSL. For example, Π-model (Laine & Aila, 2017) uses self-ensembling to leverage label predictions on different epochs and under different conditions; Mean Teacher (Tarvainen & Valpola, 2017) averages model weights instead of label predictions; FixMatch (Sohn et al., 2020) and FlexMatch (Zhang et al., 2021)

