CONTINUAL UNSUPERVISED DISENTANGLING OF SELF-ORGANIZING REPRESENTATIONS

Abstract

Limited progress has been made in continual unsupervised learning of representations, especially in reusing, expanding, and continually disentangling learned semantic factors across data environments. We argue that this is because existing approaches treat continually arriving data independently, without considering how they are related through their underlying semantic factors. We address this with a new generative model describing a topologically-connected mixture of spike-and-slab distributions in the latent space, learned end-to-end in a continual fashion via principled variational inference. The learned mixture automatically discovers the active semantic factors underlying each data environment and accordingly accumulates their relational structure. This distilled knowledge can further be used for generative replay and for guiding continual disentangling of sequentially-arrived semantic factors. We tested the presented method on a split version of 3DShapes to provide a quantitative disentanglement evaluation of continually learned representations, and further demonstrated its ability to continually disentangle new representations and improve shared downstream tasks on benchmark datasets.

1. INTRODUCTION

Progress in continual learning has mostly been made for supervised discriminative learning, whereas continual unsupervised representation learning remains relatively under-explored (Ramapuram et al., 2020; Achille et al., 2018; Rao et al., 2019). The few existing works have primarily focused on combating catastrophic forgetting in the generative performance of a model: for instance, a common approach known as generative replay synthesizes past samples using a snapshot of the generative model trained on past data, and then continually trains the model to generate both new data and the synthesized past samples (Achille et al., 2018; Rao et al., 2019; Ramapuram et al., 2020). There is, however, another important yet under-explored question in continual unsupervised representation learning: how to reuse, expand, and continually disentangle latent semantic factors across different data environments? These abilities are inherent in the human learning process: while learning from new data (e.g., learning cars after bicycles), we naturally reuse shared semantic factors without re-learning (e.g., wheels), expand and disentangle new semantic factors (e.g., the shape of cars), and accumulate knowledge about the relationships among data environments based on these semantic factors (e.g., bicycles and cars both have wheels but differ in shape). Disentangled representation learning, as a long-standing research topic, has demonstrated various benefits in generative modeling and downstream tasks (Higgins et al., 2017; Kumar et al., 2017; Kim & Mnih, 2018; Liu et al., 2021; Rhodes & Lee, 2021; Horan et al., 2021). With increasing recent interest in unsupervised representation learning in a continual learning setting (Rao et al., 2019; Madaan et al., 2021), it is important to investigate the challenges of, and solutions for, achieving disentanglement of sequentially-arrived semantic factors in streaming data.
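To make the idea of environment-specific active factors concrete, the following is a minimal NumPy sketch (not the paper's implementation) of sampling from a factorized spike-and-slab latent distribution of the kind named in the abstract: each latent dimension is either drawn from a Gaussian "slab" or collapsed to zero (the "spike"). The two mixture components, their activation probabilities, and all names here are illustrative assumptions chosen to show how per-environment activation patterns expose shared versus new semantic factors.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_spike_and_slab(n_samples, latent_dim, active_probs, rng):
    """Sample from a factorized spike-and-slab distribution: each
    latent dimension is 'active' (unit-Gaussian slab) with the given
    per-dimension probability, and zero (the spike) otherwise."""
    active = rng.random((n_samples, latent_dim)) < active_probs  # Bernoulli mask
    slab = rng.standard_normal((n_samples, latent_dim))          # Gaussian slab
    return active * slab

# Two hypothetical mixture components ("data environments") that
# activate different, partially overlapping subsets of factors.
probs_env_a = np.array([0.9, 0.9, 0.05, 0.05])  # factors 0 and 1 active
probs_env_b = np.array([0.9, 0.05, 0.9, 0.05])  # factors 0 and 2 active

z_a = sample_spike_and_slab(1000, 4, probs_env_a, rng)
z_b = sample_spike_and_slab(1000, 4, probs_env_b, rng)

# Empirical fraction of samples in which each factor is active:
# the overlap on factor 0 is the structure shared across environments,
# while factors 1 and 2 are environment-specific.
print((np.abs(z_a) > 0).mean(axis=0).round(2))
print((np.abs(z_b) > 0).mean(axis=0).round(2))
```

In this toy view, comparing the activation patterns of the two environments directly yields the kind of relational knowledge the abstract refers to: a shared factor can be reused, while a newly active factor signals a semantic dimension to be disentangled.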
Reusing latent dimensions for learned semantic factors has mainly been attempted via a teacher-student approach, in which the student model is taught to infer and generate similarly to a snapshot of the past model (the teacher) on replayed data (Achille et al., 2018; Ramapuram et al., 2020). In Achille et al. (2018), this is further facilitated by explicitly masking out latent dimensions that are not actively used in a data environment. Such masks, however, have to be heuristically defined before training on

