PROPERTY CONTROLLABLE VARIATIONAL AUTOENCODER VIA INVERTIBLE MUTUAL DEPENDENCE

Abstract

Deep generative models have made important progress towards modeling complex, high-dimensional data. Their usefulness is nevertheless often limited by a lack of control over the generative process or a poor understanding of the latent representation. To overcome these issues, attention is now focused on discovering latent variables correlated with the data properties and on manipulating these properties. This paper presents the Property-controllable VAE (PCVAE), where a new Bayesian model is proposed to inductively bias the latent representation using explicit data properties via novel group-wise and property-wise disentanglement terms. Each data property corresponds seamlessly to a latent variable, by enforcing invertible mutual dependence between them. This allows us to move along the learned latent dimensions to control specific properties of the generated data with great precision. Quantitative and qualitative evaluations confirm that the PCVAE outperforms existing models by up to 28% in capturing and 65% in manipulating the desired properties. The code for the proposed PCVAE is available at: https://github.com/xguo7/PCVAE.

1. INTRODUCTION

Important progress has been made towards learning the underlying low-dimensional representation and generative process of complex high-dimensional data such as images (Pu et al., 2016), natural languages (Bowman et al., 2016), chemical molecules (Kadurin et al., 2017; Guo et al., 2019) and geo-spatial data (Zhao, 2020) via deep generative models. In recent years, a surge of research has developed new ways to further enhance the disentanglement and independence of the latent dimensions, creating models with better robustness, improved interpretability, and greater generalizability, either with inductive bias (see Figures 1(a) and 1(b)) (Kingma et al., 2014; Kulkarni et al., 2015; Creswell et al., 2017) or without any bias (Higgins et al., 2017; Chen et al., 2018; Kumar et al., 2018). Although it is generally assumed that complex data is generated from the latent representations, the latent dimensions are typically not associated with physical meaning and hence cannot reflect real data generation mechanisms such as the relationships between structural and functional characteristics. A critical problem that remains unsolved is how best to identify and enforce the correspondence between the learned latent dimensions and key aspects of the data, such as the bio-physical properties of a molecule. Knowing such properties is crucial for many applications that depend on being able to interpret and control the data generation process with the desired properties. In an effort to achieve this, several researchers (Klys et al., 2018; Locatello et al., 2019b) have suggested methods that enforce that a subset of latent dimensions corresponds to targeted categorical properties, as shown in Figure 1(c). Though the initial results have been encouraging, critical challenges remain unsolved, such as: (1) Difficulty in handling continuous-valued properties.
The control imposed on data generation limits existing techniques to categorical (typically binary) properties, in order to keep model inference tractable and ensure sufficient coverage of the data. However, continuous-valued properties (e.g., the scale and light level of images) are also common in real-world data, and model inference over them can easily become intractable. Moreover, many applications require generating data with property values unseen during training, which conventional techniques such as conditional models cannot achieve without making strong assumptions on the model distributions. (2) Difficulty in efficiently enhancing mutual independence among latent variables relevant and irrelevant to the properties. This requires ensuring that each property is correlated only with its corresponding latent variable(s) and is independent of all the others. Directly enforcing such mutual independence between all pairs of latent variables incurs a quadratic number of optimization terms; hence an efficient alternative is imperative. (3) Difficulty in capturing and controlling correlated properties. Several independent latent variables can capture multiple independent properties, but when the properties are correlated, they can no longer be mapped one-to-one onto independent latent variables. Such correlated properties are commonly found in real-world data.
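To make the quadratic cost in challenge (2) concrete, the following sketch (with a hypothetical latent dimensionality K and property count P of our own choosing, not values from the paper) counts the penalty terms needed for pairwise independence versus a group-wise split into property-relevant and property-irrelevant blocks:

```python
from itertools import combinations

K = 8  # hypothetical number of latent variables
P = 3  # hypothetical number of properties (each tied to one latent variable)

# Pairwise independence: one penalty term per pair of latent variables.
pairwise_terms = list(combinations(range(K), 2))
assert len(pairwise_terms) == K * (K - 1) // 2  # 28 terms, grows as O(K^2)

# Group-wise alternative: one term between the property-relevant block w and
# the property-irrelevant block z, plus one property-wise term per property,
# for O(P) terms in total.
group_wise_terms = 1 + P  # 4 terms
print(len(pairwise_terms), group_wise_terms)
```

The counting itself is elementary; the point is that a single group-wise term between blocks replaces the quadratic set of pairwise constraints.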

To solve the above challenges, we propose a new model, the Property-controllable VAE (PCVAE), in which a new Bayesian model inductively biases the latent representation using explicit data properties via novel group-wise and property-wise disentanglement terms. Each data property is seamlessly linked to its corresponding latent variable by innovatively enforcing an invertible mutual dependence between them, as shown in Figure 1(d). Hence, when generating data, the corresponding latent variables can be manipulated to simultaneously control multiple desired properties without influencing the others. We have also further extended our model to handle inter-correlated properties. Our key contributions are summarized as follows:

• A new Bayesian model that inductively biases the latent representation using explicit real data properties. A variational inference strategy and inference model have been customized to ensure effective Bayesian inference.
• Group-wise and property-wise disentanglement terms that enhance the mutual independence among properties and their relevant and irrelevant latent variables.
• Invertible mutual dependence between each property-latent variable pair, achieved by enforcing an invertibility constraint over a residual-based decoder.
• Quantitative and qualitative evaluations revealing that our PCVAE outperforms existing methods by up to 28% in capturing and 65% in manipulating the desired properties.
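To illustrate the invertibility constraint in the third contribution, the minimal NumPy sketch below (the contraction g, its scale, and all constants are our own illustrative choices, not the paper's trained decoder) shows why a residual map f(w) = w + g(w) is invertible whenever g is a contraction (Lipschitz constant < 1), and how the inverse can be recovered by fixed-point iteration. An invertible residual-based decoder of this kind lets one set a latent variable directly from a target property value:

```python
import numpy as np

def g(w, scale=0.9):
    # tanh is 1-Lipschitz, so scaling by 0.9 makes g a contraction.
    return scale * np.tanh(w)

def f(w):
    # Residual map from latent variable w to property value y = w + g(w).
    return w + g(w)

def f_inverse(y, n_iter=200):
    # Fixed-point iteration w <- y - g(w); it converges to the unique
    # preimage because g is a contraction (Banach fixed-point theorem).
    w = np.copy(y)
    for _ in range(n_iter):
        w = y - g(w)
    return w

w = np.array([-1.5, 0.0, 2.0])   # latent values
y = f(w)                          # forward pass: predicted property values
w_rec = f_inverse(y)              # inverse pass: latents recovered from y
assert np.allclose(w, w_rec, atol=1e-6)
```

In this setting, manipulating a property amounts to choosing a target value y* and setting the corresponding latent variable to f_inverse(y*) before decoding, which is the mechanism that the invertible dependence makes possible.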

2. RELATED WORK

Disentangled representation learning. An important relevant area of research is disentangled representation learning (Alemi et al., 2017; Chen et al., 2018; Higgins et al., 2017; Kim & Mnih, 2018), which structures the latent space by minimizing the mutual information between all pairs of latent variables. The goal here is to learn representations that separate out the underlying explanatory factors responsible for variations in the data, as these have been shown to be relatively resilient with respect to the complex variants involved (Bengio et al., 2013; Ma et al., 2019; Guo et al., 2020), and thus can be used to enhance generalizability as well as improve robustness against adversarial attack. As noted by Locatello et al. (2019a), it is impossible for disentangled representation learning to capture the desired properties without supervision and inductive biases.

Learning latent representations via supervision. This line of work ensures that the latent variables capture the desired properties through supervision, generally by directly defining properties as latent variables in the model (Locatello et al., 2019b). Unfortunately, apart from providing an explicit variable for the labelled property, this yields no other easily interpretable structures, such as discovering latent variables that are correlated to the properties, as the model proposed in the current study does. This



Figure 1: While most existing models (e.g., Sub-figures (a) (Kingma et al., 2014; Kulkarni et al., 2015) and (b) (Creswell et al., 2017)) do not explicitly learn the correspondence between latent dimensions and data properties, some recent work (Sub-figures (c) (Klys et al., 2018) and (d)) has started to explore this. The generative model (right) and its model inference (left) are shown in each sub-figure. Dotted arrows represent the enforcement of independence and double arrows represent the invertible dependence between two variables. x refers to data, z and w refer to two subsets of latent variables, and y refers to the properties.

