A GENERIC PARAMETERIZATION METHOD FOR UNSUPERVISED LEARNING

Abstract

We introduce a parameterization method called Neural Bayes which allows computing statistical quantities that are in general difficult to compute, and which opens avenues for formulating new objectives for unsupervised representation learning. Specifically, given an observed random variable x and a latent discrete variable z, we can express p(x|z), p(z|x) and p(z) in closed form in terms of a sufficiently expressive function (e.g., a neural network) using our parameterization, without restricting the class of these distributions. To demonstrate its usefulness, we develop two independent use cases for this parameterization: 1. Disjoint Manifold Separation: Neural Bayes allows us to formulate an objective which can optimally label samples from disjoint manifolds present in the support of a continuous distribution. This can be seen as a specific form of clustering where each disjoint manifold in the support is a separate cluster. We design clustering tasks that obey this formulation and empirically show that the model optimally labels the disjoint manifolds. 2. Mutual Information Maximization (MIM): MIM has become a popular means for self-supervised representation learning. Neural Bayes allows us to compute mutual information between observed random variables x and latent discrete random variables z in closed form. We use this for learning image representations and show its usefulness on downstream classification tasks.

1. INTRODUCTION

We introduce a generic parameterization called Neural Bayes that facilitates unsupervised learning from unlabeled data by categorizing it. Specifically, our parameterization implicitly maps samples from an observed random variable x to a latent discrete space z in which the distribution p(x) gets segmented into a finite number of arbitrary conditional distributions. Imposing different conditions on the latent space z through different objective functions results in learning qualitatively different representations. Our parameterization may be used to compute statistical quantities involving observed and latent variables that are in general difficult to compute (thanks to the discrete latent space), thus providing a flexible framework for unsupervised learning. To illustrate this aspect, we develop two independent use cases for this parameterization: disjoint manifold separation (DMS) and mutual information maximization (Linsker, 1988), as described in the abstract. For the manifold separation task, we run experiments on 2D datasets and their high-dimensional counterparts designed according to the problem formulation, and show that the proposed objective can optimally label disjoint manifolds. For the MIM task, we experiment with benchmark image datasets and show that the unsupervised representation learned by the network achieves performance on downstream classification tasks comparable with a closely related MIM method, Deep InfoMax (DIM; Hjelm et al., 2019). For both objectives, we design regularizations necessary to achieve the desired behavior in practice. All proofs can be found in the appendix.
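To make the parameterization concrete, the sketch below shows one way the closed-form quantities can be read off a softmax network with a discrete latent space: the j-th softmax output plays the role of p(z=j|x), the batch average of the outputs estimates the marginal p(z=j) = E_x[f_j(x)] (and, via Bayes' rule, p(x|z=j) = f_j(x) p(x) / p(z=j)), and the mutual information I(x;z) = H(z) − H(z|x) follows in closed form. The linear map standing in for the network, the batch size, and the dimensions are all hypothetical choices for illustration, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical stand-in for a neural network f: R^d -> k-simplex.
# In the Neural Bayes parameterization, the j-th output is read as p(z=j | x).
W = rng.normal(size=(5, 3))           # d = 5 input dims, k = 3 latent categories

def f(x):
    return softmax(x @ W, axis=-1)    # each row is the vector p(z | x)

x = rng.normal(size=(1000, 5))        # a batch standing in for samples of p(x)
p_z_given_x = f(x)                    # shape (1000, 3)

# Marginal p(z=j) = E_x[f_j(x)], estimated by the batch mean.
p_z = p_z_given_x.mean(axis=0)

# Mutual information I(x; z) = H(z) - H(z | x), both entropies available in
# closed form from the softmax outputs (natural log, so the units are nats).
H_z = -(p_z * np.log(p_z)).sum()
H_z_given_x = -(p_z_given_x * np.log(p_z_given_x)).sum(axis=1).mean()
mi_estimate = H_z - H_z_given_x
```

Maximizing `mi_estimate` over the network's parameters is the closed-form MIM objective referred to above; note that I(x;z) is bounded by log k, which here caps the estimate at log 3.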

2. RELATED WORK

Neural Bayes-DMS: Numerous recent papers have proposed clustering algorithms for unsupervised representation learning, such as Deep Clustering (Caron et al., 2018), information-based clustering (Ji et al., 2019), Spectral Clustering (Shaham et al., 2018), Associative Deep Clustering (Haeusser

