A MIXTURE OF VARIATIONAL AUTOENCODERS FOR DEEP CLUSTERING

Anonymous

Abstract

In this study, we propose a deep clustering algorithm that utilizes the variational autoencoder (VAE) framework with a multi-encoder-decoder neural architecture. This setup enforces a complementary structure that guides the learned latent representations towards a more meaningful space arrangement. It differs from previous VAE-based clustering algorithms in that it employs a new generative model built on multiple encoder-decoder pairs. We show that this modeling yields both better clustering capabilities and improved data generation. The proposed method is evaluated on standard datasets and is shown to significantly outperform state-of-the-art deep clustering methods.

1. INTRODUCTION

Clustering is one of the most fundamental techniques used in unsupervised machine learning. It is the process of classifying data into several classes without using any label information. In the past decades, a plethora of clustering methods have been developed and successfully employed in various fields, including computer vision (Jolion et al., 1991), natural language processing (Ngomo & Schumacher, 2009), social networks (Handcock et al., 2007) and medical informatics (Gotz et al., 2011). The most well-known clustering approaches include the traditional k-means algorithm and the generative model approach, which assumes that the data points are generated from a Mixture-of-Gaussians (MoG) and learns the model parameters via the Expectation-Maximization (EM) algorithm. However, applying these methods to datasets of high-dimensional data is problematic since, in such vector spaces, the inter-point distances become less informative. As a result, deep learning methods have provided new opportunities for clustering (Min et al., 2018). These methods incorporate the ability to learn a (non-linear) mapping of the raw features into a low-dimensional vector space that hopefully allows a more feasible application of clustering methods. Deep learning methods are expected to automatically discover the most suitable non-linear representations for a specified task. However, a straightforward implementation of a "deep" k-means algorithm, which jointly learns the embedding space and applies clustering to the embedded data, leads to a trivial solution in which the data feature vectors collapse into a single point in the embedded space, and thus the k centroids collapse into a single spurious entity. For this reason, the objective function of many deep clustering methods is composed of both a clustering term computed in the embedded space and a regularization term in the form of a reconstruction error to avoid data collapsing.
One broad family of successful deep clustering algorithms, which has been shown to yield state-of-the-art results, is the generative model-based methods. Most of these methods are based on the Variational Autoencoder framework (Kingma & Welling, 2014), e.g., Gaussian Mixture Variational Autoencoders (GMVAE) (Dilokthanakul et al., 2016) and Variational Deep Embedding (VaDE). Instead of using an arbitrary prior on the latent variable, these algorithms propose using specific distributions that allow clustering at the bottleneck, such as MoG distributions. This design results in a VAE-based training objective composed of a dominant reconstruction term and a second parameter-regularization term, as discussed above. However, this objective seems to miss the clustering target, since the reconstruction term is not related to the clustering, and the actual clustering is associated only with the optimization of the regularization term. This can result in inferior clustering performance, a degenerate generative model, and stability issues during training. We propose a solution that alleviates the issues introduced by previous generative deep clustering models. To that end, we propose the k-Deep Variational Auto Encoders (dubbed k-DVAE). Our

