CLUSTERING-FRIENDLY REPRESENTATION LEARNING VIA INSTANCE DISCRIMINATION AND FEATURE DECORRELATION

Abstract

Clustering is one of the most fundamental tasks in machine learning. Recently, deep clustering has become a major trend in clustering techniques. Representation learning often plays an important role in the effectiveness of deep clustering, and a poor representation can thus be a principal cause of performance degradation. In this paper, we propose a clustering-friendly representation learning method using instance discrimination and feature decorrelation. Our deep-learning-based representation learning method is motivated by the properties of classical spectral clustering. Instance discrimination learns similarities among data, and feature decorrelation removes redundant correlation among features. We utilize an instance discrimination method in which learning individual instance classes leads to learning similarity among instances. Through detailed experiments and analysis, we show that the approach can be adapted to learning a latent space for clustering. We design novel softmax-formulated decorrelation constraints for learning. In evaluations of image clustering using CIFAR-10 and ImageNet-10, our method achieves accuracies of 81.5% and 95.4%, respectively. We also show that the softmax-formulated constraints are compatible with various neural networks.



In order to learn representations for clustering, recent works utilize metric learning, which automatically learns similarity functions from data Chang et al. (2017); Wu et al. (2019). They assign pseudo-labels or pseudo-graphs to unlabeled data using similarity measures in a latent space, and learn discriminative representations to cluster data. These works improve clustering performance on real-world images such as CIFAR-10 and ImageNet-10, and indicate the impact of representation learning on clustering. Although features from learned similarity functions and pseudo-labels work well for clustering, the algorithms still seem heuristic; we instead design a novel algorithm grounded in knowledge from established clustering techniques. In this work, we exploit a core idea of spectral clustering, which uses eigenvectors derived from similarities. Spectral clustering has been investigated both theoretically and experimentally, and is known to outperform other traditional clustering methods Von Luxburg (2007). The algorithm involves constructing a similarity matrix, transforming the similarity matrix into a Laplacian, and performing eigendecomposition. Based on the eigenvectors, data points are mapped into a lower-dimensional representation that carries information about similarities and is preferable for clustering. We bring this idea of eigenvector representation into deep representation learning. We design the representation learning with two aims: 1) learning similarities among instances, and 2) reducing correlations within features. The first corresponds to the Laplacian, and the second corresponds to the feature orthogonality constraints in the spectral clustering algorithm. A learning process that integrates both corresponds to the eigendecomposition of the Laplacian matrix in spectral clustering. For the first aim, we adopt the instance discrimination method presented in Wu et al.
(2018), where each unlabeled instance is treated as its own distinct class, and discriminative representations are learned to distinguish between individual instance classes. This numerous-class discriminative learning enables learning of partial but important features, such as small foreground objects in natural images. Wu et al. (2018) showed that the representation features retain apparent similarity among images and improve the performance of image classification by the nearest neighbor method. We extend their work to clustering tasks. We clarify that their softmax formulation works like the similarity matrix in spectral clustering under the condition that the temperature parameter τ, which was underexplored in Wu et al. (2018), is set to a larger value. For the second aim, we introduce constraints that have the effect of making latent features orthogonal. Orthogonality is often an essential idea in dimension reduction methods such as principal component analysis, and it is preferable for latent features to be independent so that redundant information is reduced. Orthogonality is also essential to the connection between the proposed method and spectral clustering, as stated in Section 3.4. In addition to a simple soft orthogonal constraint, we design a novel softmax-formulated decorrelation constraint. Our softmax constraint is "softer" than the soft orthogonal constraint for learning independent feature spaces, but realizes stable improvement of clustering performance. Finally, we combine instance discrimination and feature decorrelation into representation learning to improve the performance of complex image clustering. For the CIFAR-10 and ImageNet-10 datasets, our method achieves accuracies of 81.5% and 95.4%, respectively. Our PyTorch Paszke et al. (2019) implementation of IDFD is available at https://github.com/TTN-YKK/Clustering_friendly_representation_learning.
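As a concrete illustration of the two ingredients described above, the softmax-formulated instance discrimination loss and a softmax-formulated feature decorrelation can be sketched in NumPy. This is our own simplified reading, not the paper's actual PyTorch implementation: the function names, the memory-bank handling, and the temperature values are illustrative assumptions.

```python
import numpy as np

def softmax_xent(logits, targets):
    # Numerically stable cross-entropy over rows of a logit matrix.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

def instance_discrimination_loss(z, memory, idx, tau=1.0):
    # z: (B, d) L2-normalized embeddings of the current batch;
    # memory: (N, d) normalized bank of all instance features;
    # idx: index of each batch instance in the bank.
    # Each instance is treated as its own class (Wu et al., 2018);
    # the temperature tau controls how peaked the softmax similarities are.
    return softmax_xent(z @ memory.T / tau, idx)

def feature_decorrelation_loss(z, tau2=2.0):
    # Transpose view: treat each of the d feature dimensions as its own
    # "class" and apply the same softmax form to feature-feature
    # similarities, pushing distinct dimensions toward independence.
    f = z.T / np.linalg.norm(z.T, axis=1, keepdims=True)   # (d, B)
    return softmax_xent(f @ f.T / tau2, np.arange(f.shape[0]))
```

In a training loop the two losses would simply be summed; minimizing the second term penalizes large off-diagonal entries of the feature correlation matrix, which is the "softer" counterpart of a hard orthogonality constraint.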
Our main contributions are as follows:
• We propose a clustering-friendly representation learning method combining instance discrimination and feature decorrelation based on spectral clustering properties.
• We adapt deep representation learning by instance discrimination to clustering and clarify the essential properties of the temperature parameter.
• We design a softmax-formulated orthogonal constraint for learning latent features and realize stable improvement of clustering performance.
• Our representation learning method achieves performance comparable to state-of-the-art levels for image clustering tasks with simple k-means.
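For reference, the classical spectral clustering pipeline that motivates our design (similarity matrix, Laplacian, eigendecomposition, then k-means on the eigenvector rows) can be sketched as follows; the Gaussian-kernel similarity and the function name are illustrative choices, not part of the proposed method.

```python
import numpy as np

def spectral_embedding(X, k, sigma=1.0):
    # 1) Similarity matrix from pairwise distances (Gaussian kernel).
    sq = ((X[:, None] - X[None, :]) ** 2).sum(-1)
    W = np.exp(-sq / (2 * sigma ** 2))
    # 2) Normalized graph Laplacian L = I - D^{-1/2} W D^{-1/2}.
    d = W.sum(1)
    L = np.eye(len(X)) - (W / np.sqrt(d)[:, None]) / np.sqrt(d)[None, :]
    # 3) Eigendecomposition: the k eigenvectors with the smallest
    #    eigenvalues give a low-dimensional, clustering-friendly
    #    representation; rows are the new data points for k-means.
    vals, vecs = np.linalg.eigh(L)
    return vecs[:, :k]
```

The rows of the returned matrix are the eigenvector-based representations; running simple k-means on them completes classical spectral clustering (Von Luxburg, 2007).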



Clustering is one of the most fundamental tasks in machine learning. Recently, deep clustering has become a major trend in clustering techniques. In a fundamental form, autoencoders are used for feature extraction, and classical clustering techniques such as k-means are serially applied to the extracted features. Recent deep clustering techniques integrate the learning processes of feature extraction and clustering, yielding high performance for large-scale datasets such as handwritten digits Hu et al. (2017); Shaham et al. (2018); Xie et al. (2016); Tao et al. (2018). However, those methods have fallen short when targets become more complex, as in the case of the real-world photograph dataset CIFAR-10 Krizhevsky et al. (2009). Several works report that powerful representation learning leads to improved clustering performance on complex datasets Chang et al. (2017); Wu et al. (2019). Learning representations is a key challenge in unsupervised clustering.
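In skeletal form, the serial pipeline described above is just "encode, then cluster". The sketch below uses a plain k-means and a stand-in `encoder` callable in place of a trained autoencoder; the names and the k-means details are illustrative assumptions, not taken from any cited implementation.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    # Plain k-means: assign each point to its nearest centroid,
    # then recompute centroids, repeating for a fixed number of steps.
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = ((X[:, None] - C[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (labels == j).any():
                C[j] = X[labels == j].mean(0)
    return labels

def two_stage_cluster(X, encoder, k):
    # Stage 1: map data into a low-dimensional latent space
    # (`encoder` stands in for a trained autoencoder's encoder network).
    Z = encoder(X)
    # Stage 2: apply a classical clustering method to the latent features.
    return kmeans(Z, k)
```

Integrated deep clustering methods instead couple the two stages, updating the encoder with a clustering-aware objective rather than training it independently.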

Deep clustering methods offer state-of-the-art performance in various fields. Most early deep clustering methods, such as Vincent et al. (2010); Tian et al. (2014), are two-stage methods that apply clustering after learning low-dimensional representations of data in a nonlinear latent space. The autoencoder method proposed in Hinton & Salakhutdinov (2006) is one of the most effective methods for learning representations. Recent works have simultaneously performed representation learning and clustering Song et al. (2013); Xie et al. (2016); Yang et al. (2017); Guo et al. (2017); Tao et al. (2018). Several methods based on generative models have also been proposed Jiang et al. (2016); Dilokthanakul et al. (2016). These methods outperform conventional methods, and sometimes offer performance comparable to that of supervised learning for simple datasets. Deep-learning-based unsupervised image clustering is also being developed Chang et al. (2017); Wu et al. (2019); Ji et al. (2019); Gupta et al. (2020); Van Gansbeke et al. (2020).

