MIXTURE REPRESENTATION LEARNING WITH COUPLED AUTOENCODING AGENTS

Abstract

Jointly identifying a mixture of discrete and continuous factors of variability can help unravel complex phenomena. We study this problem by proposing an unsupervised framework called coupled mixture VAE (cpl-mixVAE), which utilizes multiple interacting autoencoding agents. The individual agents operate on augmented copies of training samples to learn mixture representations, while being encouraged to reach consensus on the categorical assignments. We provide theoretical justification to motivate the use of a multi-agent framework, and formulate it as a variational inference problem. We benchmark our approach on MNIST and dSprites, achieving state-of-the-art categorical assignments while preserving interpretability of the continuous factors. We then demonstrate the utility of this approach in jointly identifying cell types and type-specific, activity-regulated genes for a single-cell gene expression dataset profiling over 100 cortical neuron types.

1. INTRODUCTION

Complex phenomena can often be attributed to a mixture of discrete and continuous factors of variability. Understanding such complexity is crucial in a variety of contexts, from learning models for image datasets to identifying factors underlying neuronal identity. A common approach to studying these phenomena is clustering, which can produce representations that jointly capture the dependence on discrete and continuous factors. Learning such representations with generative models has recently received attention from the deep learning community. Deep Gaussian mixture models are among the first deep generative models to jointly represent discrete and continuous factors, in which a continuous representation is decomposed into discrete clusters (Johnson et al., 2016; Dilokthanakul et al., 2016; Jiang et al., 2017). However, such models have mainly focused on clustering without regard to interpretability. Adversarial and variational methods have been proposed to learn mixture representations that can identify interpretable continuous factors. While adversarial learning, e.g. InfoGAN (Chen et al., 2016), is susceptible to stability issues (Kim & Mnih, 2018; Dupont, 2018; Jeong & Song, 2019), variational approaches, e.g. JointVAE and CascadeVAE, have produced promising and more stable results (Dupont, 2018; Jeong & Song, 2019). However, such variational methods utilizing a single autoencoding agent rely either on a heuristic data-dependent embedding capacity or on solving a separate optimization problem for the discrete variable. Thus, learning interpretable and stable mixture representations remains challenging. We introduce a multi-agent variational framework to jointly infer discrete and continuous factors through collective decision making, while sidestepping the heuristic approaches used by single-agent frameworks.
Coupling of autoencoding agents has been previously studied in the context of multimodal recordings, where each agent learns a continuous latent representation for one of the data modalities (Feng et al., 2014; Gala et al., 2019). Here, we propose pairwise-coupled autoencoders to learn a mixture representation for a single data modality in an unsupervised fashion. Each autoencoding agent receives an augmented copy of the given sample with the same class label. To achieve this, we design a novel type-preserving augmentation that generates noisy copies of the data using within-class variabilities, while preserving class identity. Coupling across the agents is achieved by encouraging the categorical variables to be invariant under the augmentation, which regularizes the agents to learn interpretable representations. We demonstrate that such a coupled multi-agent architecture can increase inference accuracy and robustness by exploiting within-cluster variabilities, without requiring a prior distribution on the relative abundances of categories. Our contributions can be summarized as follows: (i) We first provide theoretical justification to motivate the advantage of collective decision making for more accurate categorical assignments,
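The coupling mechanism described above can be illustrated with a minimal sketch: each agent encodes its own augmented copy of a sample into a categorical posterior over clusters, and a symmetric divergence between the two posteriors penalizes disagreement. The linear "encoders" (`W_a`, `W_b`), the penalty form, and all dimensions here are illustrative assumptions, not the paper's actual architecture or loss.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def consensus_penalty(q_a, q_b, eps=1e-8):
    """Symmetric KL divergence between two agents' categorical posteriors.

    This penalizes disagreement on categorical assignments; the exact
    coupling term used by cpl-mixVAE may differ (illustrative choice).
    """
    kl_ab = np.sum(q_a * np.log((q_a + eps) / (q_b + eps)), axis=-1)
    kl_ba = np.sum(q_b * np.log((q_b + eps) / (q_a + eps)), axis=-1)
    return 0.5 * (kl_ab + kl_ba)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))                # a batch of samples
x_a = x + 0.1 * rng.normal(size=x.shape)    # augmented copy for agent A
x_b = x + 0.1 * rng.normal(size=x.shape)    # augmented copy for agent B

# Toy per-agent categorical encoders (5 hypothetical clusters)
W_a = rng.normal(size=(16, 5))
W_b = rng.normal(size=(16, 5))
q_a, q_b = softmax(x_a @ W_a), softmax(x_b @ W_b)

# Coupling term: small when the two agents agree on the cluster assignment
penalty = consensus_penalty(q_a, q_b).mean()
```

In training, this penalty would be added to each agent's variational objective, so gradient descent drives the agents toward consensus on the discrete variable while each retains its own continuous representation.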

