CONTRASTIVE HIERARCHICAL CLUSTERING

Abstract

Deep clustering has been dominated by flat clustering models, which split a dataset into a predefined number of groups. Although recent methods achieve extremely high similarity with the ground-truth classes on popular benchmarks, the information contained in a flat partition is limited. In this paper, we introduce CoHiClust, a Contrastive Hierarchical Clustering model based on deep neural networks, which can be applied to large-scale image data. By employing a self-supervised learning approach, CoHiClust distills the base network into a binary tree without access to any labeled data. The hierarchical clustering structure can be used to analyze the relationships between clusters as well as to measure the similarity between data points. Experiments performed on typical image benchmarks demonstrate that CoHiClust generates a reasonable structure of clusters, which is consistent with our intuition and image semantics. Moreover, by applying the proposed pruning strategy, we can restrict the hierarchy to the requested number of clusters (leaf nodes) and obtain clustering accuracy that outperforms existing hierarchical baselines.

1. INTRODUCTION

Clustering, a fundamental branch of unsupervised learning, is often one of the first steps in data analysis, and it finds applications in anomaly detection Barai & Dey (2017), personalized recommendations Zhang et al. (2014), or bioinformatics Lakhani et al. (2015). Since clustering does not use any information about class labels, representation learning becomes an integral part of deep clustering methods. Initial approaches use representations taken from pre-trained models Guérin et al. (2017); Naumov et al. (2021) or employ auto-encoders in joint training of the representation and the clustering model Guo et al. (2017a); Mautz et al. (2019). More recent works designed for image data frequently follow the self-supervised learning principle, where the representation is trained on pairs of similar images automatically generated by data augmentations Li et al. (2021b); Dang et al. (2021). Since the augmentations used for image data are class-invariant, the latter techniques often obtain very high similarity with the ground-truth classes. However, we should be careful when comparing clustering techniques only by inspecting their agreement with ground-truth classes, because the primary goal of clustering is to deliver information about the data, not to perform classification.

Most works in the area of deep clustering focus on producing flat partitions with a predefined number of groups. Although hierarchical clustering gained notable attention in classical machine learning and has been frequently applied to real-life problems Zou et al. (2020); Śmieja et al. (2014), its role has been drastically marginalized in the era of deep learning. In hierarchical clustering, the exact number of clusters does not have to be specified, because we can inspect the partition at various levels of the tree. Moreover, we can analyze the relationships between clusters, e.g. by finding superclusters or measuring the distance between groups in the hierarchy. These advantages make hierarchical clustering an excellent tool for analyzing complex data. However, to take full advantage of hierarchical clustering, it is necessary to create an appropriate image representation, which is possible thanks to deep neural networks. To the best of our knowledge, DeepECT Mautz et al. (2019; 2020) is the only hierarchical clustering model trained jointly with a neural network. Nevertheless, this method has not been evaluated on the large image datasets that appear in practical applications. To fill this gap, we introduce CoHiClust (Contrastive Hierarchical Clustering), which creates a hierarchy of clusters and can be applied to large image data.

CoHiClust uses a neural network to generate a high-level representation of the data, which is then distilled into the tree hierarchy by applying the projection head, see Figure 2. The whole framework is trained jointly in an end-to-end manner.
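A projection head that distills an embedding into a binary tree can be viewed as a soft decision tree: each internal node applies a sigmoid gate to the embedding, and the probability of reaching a leaf is the product of the gate decisions along its root-to-leaf path. The NumPy sketch below is illustrative only; the function and parameter names (`soft_tree_head`, `W`, `b`) are our assumptions, not the authors' implementation.

```python
import numpy as np

def soft_tree_head(z, W, b, depth):
    """Map embeddings z of shape (B, D) to a probability distribution
    over the 2**depth leaves of a complete binary tree.

    W has shape (D, 2**depth - 1) and b shape (2**depth - 1,): one
    sigmoid gate per internal node, stored level by level.
    Hypothetical sketch of a soft-tree projection head.
    """
    logits = z @ W + b                       # (B, 2**depth - 1)
    p_right = 1.0 / (1.0 + np.exp(-logits))  # probability of going right
    leaf = np.ones((z.shape[0], 1))          # all mass starts at the root
    for level in range(depth):
        start = 2 ** level - 1               # first node index of this level
        g = p_right[:, start:start + 2 ** level]
        # each current node splits its mass between its (left, right) children
        leaf = np.stack([leaf * (1 - g), leaf * g], axis=-1).reshape(z.shape[0], -1)
    return leaf                              # (B, 2**depth), each row sums to 1
```

Because the gates are differentiable, such a head can be trained jointly with the backbone network; hard cluster assignments are recovered at inference time by taking the most probable leaf.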

