GENERATIVE AUTO-ENCODER: CONTROLLABLE SYNTHESIS WITH DISENTANGLED EXPLORATION

Abstract

Autoencoders are a powerful information-compression framework, but they lack generative ability. We explore whether an autoencoder can gain generative ability without invoking GAN-based adversarial training, which is notoriously hard to train and control. VAEs provide one solution by imposing a Gaussian prior that can be sampled for synthesis, but they suffer from low sample quality. A naive alternative is direct exploration of the latent space, e.g., via interpolation; however, a randomly interpolated latent code may wander outside the distribution the decoder accepts, which leads to inferior output. Here we propose a new method, the Disentangled Exploration Autoencoder (DEAE), which uses disentangled representation and regularization to guarantee the validity of latent-space exploration and to achieve controllable synthesis. The encoder of DEAE first maps the input sample to a disentangled latent code, then explores the latent space through directed interpolation. To help the interpolated latent code yield a meaningful sample, we regularize the decoder's output by 'reusing' the encoder, forcing the resulting latent representation to remain perfectly disentangled, which implicitly improves the quality of the interpolated sample. Disentanglement and exploration boost each other, forming a positive loop that empowers DEAE's generative ability. Experiments demonstrate that DEAE can improve the performance of downstream tasks by synthesizing attribute-controllable augmented samples. We also demonstrate that DEAE can help eliminate dataset bias, offering a solution to fairness problems.
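The encode → interpolate → decode → re-encode loop described above can be sketched in a few lines. The following is a minimal toy illustration, not the paper's implementation: the linear encoder/decoder, the choice of latent dimension to interpolate, and the mean-squared consistency loss are all illustrative assumptions standing in for trained networks and the actual DEAE objective.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear encoder/decoder (stand-ins for trained networks).
D, Z = 8, 4                      # data dim, latent dim
W_enc = rng.normal(size=(Z, D))  # encoder: z = W_enc @ x
W_dec = np.linalg.pinv(W_enc)    # decoder: x_hat = W_dec @ z

def encode(x):
    return W_enc @ x

def decode(z):
    return W_dec @ z

# Two input samples mapped to (assumed disentangled) latent codes.
x1, x2 = rng.normal(size=D), rng.normal(size=D)
z1, z2 = encode(x1), encode(x2)

# Directed interpolation: vary only one semantic dimension (here index 0),
# keeping the remaining latent factors fixed at z1's values.
alpha = 0.5
z_interp = z1.copy()
z_interp[0] = (1 - alpha) * z1[0] + alpha * z2[0]

# Decode the interpolated code, then "reuse" the encoder on the output.
x_new = decode(z_interp)
z_cycle = encode(x_new)

# Latent-consistency regularizer: the re-encoded code should match the
# interpolated code, which implicitly keeps the synthesized sample valid.
consistency_loss = np.mean((z_cycle - z_interp) ** 2)
print(round(consistency_loss, 6))
```

In training, this consistency term would be minimized jointly with the reconstruction and disentanglement objectives, closing the positive loop between exploration and disentanglement.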

1. INTRODUCTION AND RELATED WORK

Autoencoders (Ballard (1987)) usually map samples from a high-dimensional space to a latent, low-dimensional space with minimal information loss and employ regularization to help downstream tasks. The compressed low-dimensional space provides a high-level, compact representation that helps in understanding the original dataset (Berthelot et al. (2018)). It is interesting to analyze the relationship between the latent space and the image space (Bengio et al. (2013)), especially how to endow the autoencoder with generative ability. Prior work exists that turns an autoencoder into a generative model; these methods can be divided into two types: adversarial training and VAEs. As shown in Fig. 1 (a), Makhzani et al. (2015) and Zhao et al. (2018) use adversarial learning in the latent space to force the encoder's output to follow a distribution similar to that of the real data. Berthelot et al. (2018) use an adversarial regularizer to improve interpolation in autoencoders. Other methods (Sainburg et al. (2018)) merge adversarial learning and autoencoders by adding a discriminator that guides the quality of synthesized samples. However, adversarial learning suffers from unstable training and synthesis that is hard to control (Gulrajani et al. (2017)). As shown in Fig. 1 (b), VAEs (Kingma & Welling (2014); Higgins et al. (2017); Chen et al. (2018)) provide another solution by imposing a Gaussian prior on the latent space and then sampling latent codes to synthesize new samples. Zhang et al. (2019) add a latent distribution constraint, which yields sharper outputs. However, VAEs suffer from low sample quality and convergence difficulty. Moreover, both types of methods struggle to achieve controllable semantic synthesis. Adversarial-based controllable synthesis is mostly achieved by interpreting the latent space and finding attribute boundaries (Shen et al. (2020a); Yang et al. (2019)), which is costly; most such controllable synthesis is restricted to binary attribute values, and one cannot precisely control the synthesized image without influencing the other attributes. VAE-based controllable synthesis, achieved by disentangling

