CONTINUAL UNSUPERVISED DISENTANGLING OF SELF-ORGANIZING REPRESENTATIONS

Abstract

Limited progress has been made in continual unsupervised learning of representations, especially in reusing, expanding, and continually disentangling learned semantic factors across data environments. We argue that this is because existing approaches treat continually-arrived data independently, without considering how they are related through their underlying semantic factors. We address this with a new generative model describing a topologically-connected mixture of spike-and-slab distributions in the latent space, learned end-to-end in a continual fashion via principled variational inference. The learned mixture automatically discovers the active semantic factors underlying each data environment and accordingly accumulates their relational structure. This distilled knowledge can further be used for generative replay and for guiding continual disentangling of sequentially-arrived semantic factors. We tested the presented method on a split version of 3DShapes to provide a quantitative disentanglement evaluation of continually learned representations, and further demonstrated its ability to continually disentangle new representations and improve shared downstream tasks on benchmark datasets.

1. INTRODUCTION

Progress in continual learning has mostly been made in supervised discriminative learning, whereas continual unsupervised representation learning remains relatively under-explored (Ramapuram et al., 2020; Achille et al., 2018; Rao et al., 2019). The few existing works have primarily focused on battling catastrophic forgetting in the generative performance of a model: for instance, a common approach known as generative replay synthesizes past samples using a snapshot of the generative model trained on past data, and then continually trains the model to generate both new data and synthesized past samples (Achille et al., 2018; Rao et al., 2019; Ramapuram et al., 2020). There is, however, another important yet under-explored question in continual unsupervised representation learning: how to reuse, expand, and continually disentangle latent semantic factors across different data environments? These abilities are inherent in the human learning process: while learning from new data (e.g., learning cars after bicycles), we are naturally able to reuse shared semantic factors without re-learning (e.g., wheels), expand and disentangle new semantic factors (e.g., the shape of cars), while accumulating knowledge about the relationship among data environments based on these semantic factors (e.g., bicycles and cars both have wheels but differ in shape). Disentangled representation learning, as a long-standing research topic, has demonstrated various benefits in generative modeling and downstream tasks (Higgins et al., 2017; Kumar et al., 2017; Kim & Mnih, 2018; Liu et al., 2021; Rhodes & Lee, 2021; Horan et al., 2021). With increasing recent interest in unsupervised representation learning in a continual learning setting (Rao et al., 2019; Madaan et al., 2021), it is important to investigate the challenges of, and solutions for, achieving disentanglement of sequentially-arrived semantic factors in streaming data.
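The generative-replay strategy described above can be sketched as follows. All names here (ToyVAE, train_step, the toy environments) are illustrative stand-ins, not the implementation used by any of the cited works; the point is only the training loop, where a frozen snapshot of the model synthesizes "past" samples that are mixed with each new environment's data:

```python
# Minimal sketch of generative replay for continual unsupervised learning.
# A frozen snapshot of the model (the "teacher") synthesizes past samples,
# which are mixed with new data when training on each environment.
import copy
import random

class ToyVAE:
    """Stand-in generative model with a single scalar parameter."""
    def __init__(self):
        self.params = 0.0  # placeholder for real model parameters

    def sample(self, n):
        # Generate n synthetic samples (here: dummy scalars with noise).
        return [self.params + random.gauss(0, 1) for _ in range(n)]

    def train_step(self, batch):
        # Placeholder update: nudge params toward the batch mean.
        self.params += 0.1 * (sum(batch) / len(batch) - self.params)

def continual_train(model, environments, replay_size=32):
    snapshot = None  # frozen copy of the model from past environments
    for env_data in environments:
        replayed = snapshot.sample(replay_size) if snapshot else []
        # Train on new data mixed with generatively replayed samples.
        model.train_step(list(env_data) + replayed)
        snapshot = copy.deepcopy(model)  # freeze for the next environment
    return model
```

Note that the snapshot is refreshed after each environment, so the replayed samples always reflect everything the model has absorbed so far.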
Reusing latent dimensions for learned semantic factors has mainly been attempted by a teacher-student approach, where the student model is taught to infer and generate similarly to a snapshot of the past model (teacher) on replayed data (Achille et al., 2018; Ramapuram et al., 2020). In Achille et al. (2018), this is further facilitated by explicitly masking out latent dimensions that are not actively used in a data environment. Such masks, however, have to be heuristically defined before training on each data environment. Continually disentangling semantic factors, until now, has been limited to the native disentangling ability inherent in the VAE, or to promoting the reuse of shared semantic factors (Achille et al., 2018; Ramapuram et al., 2020). While the common strategy of generative replay teaches a model which latent dimensions to use for shared semantic factors on the replayed data (Achille et al., 2018; Ramapuram et al., 2020), no such guidance is available on new data. As a result, as we will show, none of the existing approaches can prevent newly-learned semantic factors from becoming entangled with re-used ones.

In this work, we show that the above limitations boil down to a fundamental bottleneck in existing continual unsupervised learning of representations: the learner is asked to treat continually-arrived data independently, without knowing how they are related based on the underlying semantic factors. To overcome this, we argue that the model needs to learn two critical pieces of knowledge: the latent dimensions explaining the active semantic factors underlying each data environment, and the relationship among the data environments based on these factors. We present Continual Unsupervised Disentangling of self-Organizing representations (CUDOS), which accumulates the relational structure of continually-arrived data based on their underlying active semantic factors, and exploits this knowledge to guide the disentangling of sequentially-arrived semantic factors.

As illustrated in Fig. 1, to accumulate the relational structure of the data, we model the latent representations with a topologically-connected mixture of distributions via Bayesian self-organizing maps (SOM) (Kohonen, 1990; Yin & Allinson, 2001). To automatically discover the active semantic factors underlying each data environment, we model each component of the SOM mixture with a spike-and-slab distribution (Titsias & Lazaro-Gredilla, 2011; Tonolini et al., 2020), such that the sparse spike variable identifies latent dimensions explaining active semantic factors. This results in a generative model with a self-organizing mixture of spike-and-slab distributions, where the distilled knowledge (the relational structure of data environments and their associated active semantic factors) supports 1) mixture-based generative replay and 2) continual disentangling of sequentially-arrived semantic factors. We evaluated CUDOS both on benchmark datasets for continual representation learning and on a split version of 3DShapes (Burgess & Kim, 2018) designed for quantitative evaluation of disentangling sequentially-arrived semantic factors. In comparison to existing works, we showed that CUDOS not only addressed catastrophic forgetting, but also improved, both quantitatively and qualitatively, continual disentanglement of latent semantic factors and thereby downstream discriminative tasks.
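As a rough illustration of the generative model's ingredients (not CUDOS's actual implementation), the following sketch pairs a small SOM grid with per-node spike-and-slab latent distributions. The grid size, latent dimension, and parameterization are all assumptions chosen for brevity: each node holds Bernoulli "spike" probabilities that select active latent dimensions and Gaussian "slab" parameters for their values, while a topological kernel gives nearby grid nodes larger weight, which is how the SOM encodes relational structure among data environments.

```python
# Illustrative sketch of a self-organizing mixture of spike-and-slab
# latent distributions. Each SOM node carries spike probabilities
# (which latent dims are active) and slab means (their values).
import math
import random

LATENT_DIM = 6
GRID = [(i, j) for i in range(3) for j in range(3)]  # 3x3 SOM grid

# Per-node parameters: spike probabilities and slab means (unit variance).
nodes = {
    k: {
        "spike_prob": [random.random() for _ in range(LATENT_DIM)],
        "slab_mean": [random.gauss(0, 1) for _ in range(LATENT_DIM)],
    }
    for k in GRID
}

def neighborhood(k, k_star, sigma=1.0):
    """SOM topological kernel: nodes near the winner get larger weight."""
    d2 = (k[0] - k_star[0]) ** 2 + (k[1] - k_star[1]) ** 2
    return math.exp(-d2 / (2 * sigma ** 2))

def sample_latent(node):
    """Spike-and-slab draw: inactive dimensions are pinned to zero."""
    z = []
    for p, mu in zip(node["spike_prob"], node["slab_mean"]):
        spike = 1 if random.random() < p else 0  # Bernoulli spike
        slab = random.gauss(mu, 1.0)             # Gaussian slab
        z.append(spike * slab)
    return z

# Sample from a winner node; the nonzero dims of z play the role of the
# active semantic factors of a (hypothetical) data environment.
winner = (1, 1)
z = sample_latent(nodes[winner])
active_dims = [i for i, zi in enumerate(z) if zi != 0.0]
```

In the actual model these parameters would be learned end-to-end via variational inference; the sketch only shows how the spike variable induces per-environment sparsity while the grid topology relates the mixture components.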



Figure 1: Through a self-organizing spike-and-slab mixture, CUDOS continually distills knowledge about the relational structure of data environments with their shared and distinct semantic factors.

