DEEP CLUSTERING AND REPRESENTATION LEARNING THAT PRESERVES GEOMETRIC STRUCTURES

Abstract

In this paper, we propose a novel framework for Deep Clustering and multi-manifold Representation Learning (DCRL) that preserves the geometric structure of data. In the proposed DCRL framework, manifold clustering is done in the latent space guided by a clustering loss. To overcome the problem that clustering-oriented losses may deteriorate the geometric structure of embeddings in the latent space, an isometric loss is proposed for preserving the intra-manifold structure locally, and a ranking loss for preserving the inter-manifold structure globally. Experimental results on various datasets show that the DCRL framework achieves performance comparable to current state-of-the-art deep clustering algorithms, yet exhibits superior performance for manifold representation. Our results also demonstrate the importance and effectiveness of the proposed losses in preserving geometric structure, in terms of both visualization and performance metrics. The code is provided in the Supplementary Material.

1. INTRODUCTION

Clustering, a fundamental tool for data analysis and visualization, has been an essential research topic in data science and machine learning. Conventional clustering algorithms such as K-Means (MacQueen, 1967), Gaussian Mixture Models (GMM) (Bishop, 2006), and spectral clustering (Shi & Malik, 2000) perform clustering based on distance or similarity. However, handcrafted distance or similarity measures are rarely reliable for large-scale high-dimensional data, making effective clustering increasingly challenging. An intuitive solution is to transform the data from the high-dimensional input space to a low-dimensional latent space and then cluster the data in the latent space. This can be achieved with dimensionality reduction techniques such as PCA (Wold et al., 1987), t-SNE (Maaten & Hinton, 2008), and UMAP (McInnes et al., 2018). However, since these methods are not specifically designed for clustering tasks, some of their properties may run contrary to our expectations; e.g., two data points from different manifolds that are close in the input space may be drawn even closer in the latent space derived by UMAP. The first question, therefore, is: how do we learn a manifold representation that favors clustering? Multi-manifold representation learning has two main goals: Point (1), preserving the local geometric structure within each manifold, and Point (2), ensuring discriminability between different manifolds. Most previous work starts from the assumption that the label of each data point is known and designs the algorithm in a supervised manner, which greatly simplifies the problem of multi-manifold learning. In unsupervised settings, however, it is challenging to decouple complex crossover relations and ensure discriminability between different manifolds.
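The two-stage baseline sketched above (reduce dimensionality first, then cluster in the latent space) can be written in a few lines. The PCA-plus-K-Means pipeline below is a minimal illustrative example of that baseline, not the method proposed in this paper; the farthest-point initialisation is an assumption made to keep the toy example deterministic:

```python
import numpy as np

def pca(X, n_components=2):
    """Project X onto its top principal components via a centered SVD."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def kmeans(Z, k=2, n_iter=50):
    """Plain Lloyd's K-Means with farthest-point initialisation."""
    centers = [Z[0]]
    for _ in range(k - 1):
        # Next center: the point farthest from all centers chosen so far.
        d = np.min([np.linalg.norm(Z - c, axis=1) for c in centers], axis=0)
        centers.append(Z[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.linalg.norm(Z[:, None] - centers[None], axis=-1).argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = Z[labels == j].mean(axis=0)
    return labels, centers

# Two well-separated Gaussian blobs in a 50-dimensional input space.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 50)), rng.normal(8, 1, (100, 50))])
Z = pca(X, n_components=2)   # reduce 50-D inputs to a 2-D latent space ...
labels, _ = kmeans(Z, k=2)   # ... then cluster in the latent space
```

Because PCA optimizes reconstruction rather than cluster separability, this pipeline works on such well-separated blobs but offers no guarantee that the latent space it produces favors clustering, which is exactly the gap discussed above.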
One natural strategy is to achieve Point (2) by performing clustering in the input space to obtain pseudo-labels and then performing representation learning for each manifold. However, clustering is in fact contradictory to Point (1) (analyzed in detail in Sec. 3.3), so it is important to alleviate this contradiction such that clustering helps both Point (1) and Point (2). The second question, therefore, is: how do we cluster data in a way that favors learning manifold representations? To answer these two questions, some pioneering works have proposed to integrate deep clustering and representation learning into a unified framework by defining a clustering-oriented loss. Though promising performance has been demonstrated on various datasets, we observe that these works ignore a vital factor: the clustering-oriented loss may deteriorate the geometric structure of the latent space [1], which in turn hurts visualization, clustering generalization, and manifold representation. In this paper, we propose to jointly perform deep clustering and multi-manifold representation learning with geometric structure preservation. Inspired by Xie et al. (2016), the clustering centers are defined as a set of learnable parameters, and a clustering loss simultaneously guides the separation of data points from different manifolds and the learning of the clustering centers. To prevent the clustering loss from deteriorating the latent space, an isometric loss and a ranking loss are proposed to preserve the intra-manifold structure locally and the inter-manifold structure globally. Finally, we achieve the following three goals related to clustering, geometric structure, and manifold representation: (1) clustering helps to ensure inter-manifold discriminability; (2) local structure preservation can be achieved in the presence of clustering; (3) geometric structure preservation helps clustering.
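To make the role of learnable clustering centers concrete, the snippet below sketches the DEC-style soft assignment and self-sharpening target distribution that this line of work (Xie et al., 2016) builds on. It is an illustrative NumPy sketch of that prior objective only, not the full DCRL loss; the isometric and ranking losses are defined later in the paper:

```python
import numpy as np

def soft_assign(Z, centers):
    """Student's t soft assignment q_ij between embeddings and learnable centers."""
    d2 = ((Z[:, None] - centers[None]) ** 2).sum(-1)   # (n, k) squared distances
    q = 1.0 / (1.0 + d2)
    return q / q.sum(axis=1, keepdims=True)            # rows sum to 1

def target_distribution(q):
    """Sharpened targets p_ij ∝ q_ij^2 / f_j, where f_j is the soft cluster size."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def clustering_loss(q, p):
    """KL(P || Q): pushes the soft assignments toward the sharpened targets."""
    return float((p * np.log(p / q)).sum())

Z = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])     # toy embeddings
centers = np.array([[0.0, 0.0], [5.0, 5.0]])           # learnable parameters
q = soft_assign(Z, centers)
p = target_distribution(q)
loss = clustering_loss(q, p)
```

Minimizing this KL term drags points toward their nearest center, which is precisely why, without additional structure-preserving losses, it can distort within-manifold geometry.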
The contributions of this work are summarized as follows:
• We propose to integrate deep clustering and multi-manifold representation learning into a unified framework with local and global structure preservation.
• Unlike conventional multi-manifold learning algorithms that handle all point-pair relationships between different manifolds simultaneously, we set the clustering centers as a set of learnable parameters and achieve global structure preservation in a faster, more efficient, and easier-to-optimize manner by applying a ranking loss to the clustering centers.
• We analyze the contradiction between the two optimization goals of clustering and local structure preservation and propose an elegant training strategy to alleviate it.
• The proposed DCRL algorithm outperforms competing algorithms in terms of clustering performance, generalizability to out-of-sample data, and manifold representation quality.
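To illustrate the second contribution, here is a hypothetical sketch of a margin-based ranking penalty applied to the cluster centers. The paper's actual ranking loss is defined later, so the hinge form, the `margin` parameter, and the triple loop below are assumptions made purely for illustration:

```python
import numpy as np

def pairwise_dists(C):
    """All pairwise Euclidean distances between the k cluster centers."""
    return np.linalg.norm(C[:, None] - C[None], axis=-1)

def ranking_penalty(C_input, C_latent, margin=0.1):
    """Hypothetical hinge penalty: whenever centers (i, j) are farther apart
    than (i, l) in the input space, ask the same ordering (plus a margin) to
    hold among the latent-space centers."""
    Din, Dlat = pairwise_dists(C_input), pairwise_dists(C_latent)
    k = len(C_input)
    penalty = 0.0
    for i in range(k):
        for j in range(k):
            for l in range(k):
                if Din[i, j] > Din[i, l]:          # ordering in the input space
                    # Violated (or too tight) orderings in the latent space pay a cost.
                    penalty += max(0.0, margin + Dlat[i, l] - Dlat[i, j])
    return penalty
```

Because the penalty compares only the k clustering centers rather than all n data points, the number of ranking constraints scales with k³ instead of n³, which is what makes the global term cheap to optimize.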

2. RELATED WORK

Clustering analysis. As a fundamental tool in machine learning, clustering has been widely applied in various domains. One branch of classical clustering comprises K-Means (MacQueen, 1967) and Gaussian Mixture Models (GMM) (Bishop, 2006), which are fast, easy to understand, and applicable to a wide range of problems. However, limited by the Euclidean measure, their performance on high-dimensional data is often unsatisfactory. Spectral clustering and its variants (such as SC-Ncut (Shi & Malik, 2000)) extend clustering to high-dimensional data by allowing more flexible distance measures. However, limited by the computational cost of the full Laplacian matrix, spectral clustering is difficult to scale to large datasets.

[1] This claim was first made by IDEC (Guo et al., 2017), but they did not provide experiments to support it. In this paper, we show that the geometry of the latent space is indeed disrupted, via visualization of the learned embeddings (Fig. 4), visualization of the clustering process (Fig. A3), and statistical analysis (Fig. A5).

Deep clustering. The success of deep learning has contributed to the growth of deep clustering. One branch of deep clustering performs clustering after learning a representation through existing unsupervised techniques. For example, Tian et al. (2014) use an autoencoder to learn low-dimensional features and then run K-Means to get clustering results (AE+K-Means). Taking the geometric structure of the data into account, N2D (McConville et al., 2019) applies UMAP to find the best clusterable manifold of the obtained embedding and then runs K-Means to discover higher-quality clusters. The other category of algorithms optimizes clustering and representation learning jointly. The closest work to ours is Deep Embedded Clustering (DEC) (Xie et al., 2016), which learns a mapping from the input space to a low-dimensional latent space by iteratively optimizing a clustering-oriented objective. As a modified version of DEC, IDEC (Guo et al., 2017) claims to preserve the local structure of the data, but in reality its contribution amounts to adding a reconstruction loss. JULE (Yang et al., 2016b) unifies unsupervised representation learning with clustering on top of a CNN architecture to improve clustering accuracy, and can be considered a neural extension of hierarchical clustering. DSC (Yang et al., 2019) devises a dual autoencoder to embed data into the latent space, and then applies deep spectral clustering (Shaham et al., 2018) to obtain label assignments. ASPC-DA (Guo et al., 2019) combines data augmentation with self-paced learning to encourage the learned features to be cluster-oriented. Although both fields sometimes evaluate performance in terms of accuracy, we would like to highlight that deep clustering and visual self-supervised learning (SSL) are two different research fields. SSL typically uses more powerful CNN architectures (applicable only to image data) and sophisticated techniques such as contrastive learning (He

