MICE: MIXTURE OF CONTRASTIVE EXPERTS FOR UNSUPERVISED IMAGE CLUSTERING

Abstract

We present Mixture of Contrastive Experts (MiCE), a unified probabilistic clustering framework that simultaneously exploits the discriminative representations learned by contrastive learning and the semantic structures captured by a latent mixture model. Motivated by the mixture of experts, MiCE employs a gating function to partition an unlabeled dataset into subsets according to the latent semantics, and multiple experts to discriminate the distinct subsets of instances assigned to them in a contrastive learning manner. To solve the nontrivial inference and learning problems caused by the latent variables, we further develop a scalable variant of the Expectation-Maximization (EM) algorithm for MiCE and provide a proof of convergence. Empirically, we evaluate the clustering performance of MiCE on four widely adopted natural image datasets. MiCE achieves significantly better results than various previous methods and a strong contrastive learning baseline.

1. INTRODUCTION

Unsupervised clustering is a fundamental task that aims to partition data into distinct groups of similar instances without explicit human labels. Deep clustering methods (Xie et al., 2016; Wu et al., 2019) exploit the representations learned by neural networks and have recently made substantial progress on high-dimensional data. Often, such methods learn the representations for clustering by reconstructing data in a deterministic (Ghasedi Dizaji et al., 2017) or probabilistic manner (Jiang et al., 2016), or by maximizing certain mutual information (Hu et al., 2017; Ji et al., 2019) (see Sec. 2 for the related work).

Despite the recent advances, the representations learned by existing methods may not be discriminative enough to capture the semantic similarity between images. The instance discrimination task (Wu et al., 2018; He et al., 2020) in contrastive learning has shown promise in pre-training representations that transfer to downstream tasks through fine-tuning. Given that the literature (Shiran & Weinshall, 2019; Niu et al., 2020) shows improved representations can lead to better clustering results, we hypothesize that instance discrimination can improve clustering performance as well. A straightforward approach is to learn a classical clustering model, e.g., spherical k-means (Dhillon & Modha, 2001), directly on the representations pre-trained by the task. Such a two-stage baseline can achieve excellent clustering results (please refer to Tab. 1). However, because the two stages are independent, the baseline may not fully explore the semantic structures of the data when learning the representations, leading to a sub-optimal solution for clustering. To this end, we propose Mixture of Contrastive Experts (MiCE), a unified probabilistic clustering method that utilizes the instance discrimination task as a stepping stone to improve clustering.
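The two-stage baseline above can be sketched as follows: take embeddings from a contrastively pre-trained encoder, L2-normalize them, and run k-means on the unit sphere (a common way to approximate spherical k-means, since Euclidean k-means on unit vectors corresponds to clustering by cosine similarity). This is an illustrative sketch, not the paper's exact pipeline; the random features stand in for real pre-trained embeddings.

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in for pre-trained contrastive embeddings: 1000 images, 128-d features.
rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 128))

# Spherical k-means approximation: L2-normalize so Euclidean k-means on the
# unit sphere clusters by cosine similarity.
features = features / np.linalg.norm(features, axis=1, keepdims=True)

# Stage 2: classical clustering on the frozen representations.
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(features)
labels = kmeans.labels_  # one cluster id per image
```

Because the encoder is frozen in stage 2, nothing in the k-means objective feeds back into representation learning, which is exactly the independence the paragraph identifies as sub-optimal.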
In particular, to capture the semantic structure explicitly, we formulate a mixture of conditional models by introducing latent variables to represent cluster labels of the images, which is inspired by the mixture of experts (MoE) formulation. In MiCE, each of the conditional models, also called an expert, learns to discriminate a subset of instances, while an input-dependent gating function partitions the dataset into subsets according to the latent semantics by allocating weights among experts. Further, we develop a scalable variant of the Expectation-Maximization (EM) algorithm (Dempster et al., 
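The input-dependent gating described above can be illustrated with a minimal softmax gate over cluster prototypes: each image embedding receives a weight for every expert according to its similarity to that expert's prototype. The function name, the prototype parameterization, and the temperature `kappa` are assumptions for illustration, not MiCE's exact formulation.

```python
import numpy as np

def gating_weights(z, prototypes, kappa=10.0):
    """Softmax gating over K cluster prototypes (illustrative stand-in for an
    input-dependent MoE gate; `kappa` is an assumed temperature).

    z:          (n, d) unit-normalized image embeddings
    prototypes: (K, d) unit-normalized cluster prototypes
    Returns a (n, K) matrix of soft expert-assignment weights, each row
    summing to 1.
    """
    logits = kappa * (z @ prototypes.T)           # scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)   # subtract max for stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)
```

Each row of the output allocates one image's weight across the experts, so each expert effectively sees the subset of instances the gate routes toward it.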

