META-K: TOWARDS SELF-SUPERVISED PREDICTION OF NUMBER OF CLUSTERS

Abstract

Data clustering is a well-known unsupervised learning approach. Despite recent advances in clustering with deep neural networks, determining the number of clusters without any prior information about a dataset remains an open problem. Classical approaches based on data statistics exist, but they require a data scientist to manually analyze the data in order to estimate the probable number of clusters. In this work, we propose a new method for unsupervised prediction of the number of clusters in a dataset, given only the data without any labels. We evaluate our method extensively on randomly generated datasets created with the scikit-learn package and on multiple computer vision datasets, and show that it determines the number of classes in a dataset effectively without any supervision.

1. INTRODUCTION

Clustering is an important task in machine learning with a wide range of applications (Lung et al. (2004); Aminzadeh & Chatterjee (1984); Gan et al. (2007)). Clustering often consists of two steps: a feature extraction step and a clustering step. There have been numerous works on clustering (Xu & Tian (2015)), and among the proposed algorithms, K-Means (Bock (2007)) is renowned for its simplicity and performance. Despite its popularity, K-Means has several shortcomings, discussed in Ortega et al. (2009) and Shibao & Keyun (2007). In particular, K-Means' performance degrades as the dimensionality of the input data increases (Prabhu & Anbazhagan (2011)), a phenomenon known as the curse of dimensionality (Bellman (2015)). Dimensionality reduction and feature transformation methods have been used to mitigate this effect. These methods map the original data into a new feature space in which the transformed data points are easier to separate and cluster (Min et al. (2018)). Examples of such methods include PCA (Wold et al. (1987)), kernel methods (Hofmann et al. (2008)), and spectral methods (Ng et al. (2002)). Although effective, these methods can still be challenged by data with a highly complex latent structure (Saul et al., 2006; Min et al., 2018). Owing to recent advances in deep neural networks (Liu et al. (2017)) and their inherent capacity for non-linear transformations, such architectures have the potential to replace classical dimensionality reduction methods. In the research field of deep clustering, popularized by the seminal paper "Unsupervised Deep Embedding for Clustering Analysis" (Xie et al. (2016)), a deep neural network serves as the feature extractor and is combined with a clustering algorithm to perform the clustering task; a dedicated loss function is defined to update the model.
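The two-step pipeline described above can be sketched with scikit-learn. This is a minimal illustration of the classical approach (feature transformation followed by K-Means), not the method proposed in this paper; the dataset and hyper-parameter choices are illustrative assumptions.

```python
# Classical two-step clustering: dimensionality reduction (PCA), then K-Means.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# High-dimensional synthetic data with a known number of clusters.
X, y = make_blobs(n_samples=500, n_features=64, centers=4, random_state=0)

# Feature transformation step: project onto the leading principal components,
# where the points are easier to separate.
X_low = PCA(n_components=2, random_state=0).fit_transform(X)

# Clustering step on the transformed features.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X_low)
print(len(np.unique(labels)))  # 4 distinct cluster labels
```

Note that `n_clusters=4` is supplied by hand here; in deep clustering this same value is the hyper-parameter whose choice motivates the present work.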
Deep clustering methods typically take k, the number of clusters, as a hyper-parameter. In real-world scenarios, where datasets are not labeled, assigning a wrong value to this parameter can reduce the overall accuracy of the model. Meta-learning, a framework that allows a model to use information from its past tasks to learn a new task quickly or with little data, has been adopted by a handful of papers (Ferrari & de Castro (2012); Ferrari & De Castro (2015); Garg & Kalai (2018); Kim et al. (2019); Jiang & Verma (2019)) to improve the performance of clustering tasks. Closest to our work is the approach of Garg & Kalai (2018), which tries to predict the number of clusters for K-Means clustering using meta-information. To address the same issue, we propose Meta-k, a gradient-based method for finding the optimal number of clusters and a step towards a self-supervised approach to clustering. Our work is based on the observation that a network can take input points and learn parameters to predict the best number of clusters.
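For context, a common classical way to choose k without labels is to sweep candidate values and score each clustering, e.g. with the silhouette score. The sketch below shows this baseline; it is not the gradient-based Meta-k method proposed here, and the dataset and range of candidate k values are illustrative assumptions.

```python
# Classical model selection for k: sweep candidates and keep the one with
# the best silhouette score. Meta-k aims to replace this kind of search
# with a learned, gradient-based prediction.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=400, centers=5, random_state=1)

scores = {}
for k in range(2, 10):
    labels = KMeans(n_clusters=k, n_init=10, random_state=1).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(best_k)
```

Such sweeps require running the clustering once per candidate k, which becomes costly for deep clustering models; this motivates predicting k directly from the data.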

