CI-VAE: a Class-Informed Deep Variational Autoencoder for Enhanced Class-Specific Data Interpolation

Abstract

We propose the Class-Informed Variational Autoencoder (CI-VAE) to enable interpolation between arbitrary pairs of observations of the same class. CI-VAE combines the general VAE architecture with a linear discriminator layer on the latent space, enforcing the construction of a latent space in which observations from different classes are linearly separable. In conventional VAEs, classes usually overlap in the latent space. In CI-VAE, by contrast, the enforced linear separability of classes allows robust latent-space linear traversal and data generation between any two observations of the same class. Class-specific data interpolation has extensive potential applications in science, particularly in biology, such as uncovering the biological trajectory of diseases or cancer. We used the MNIST dataset of handwritten digits as a case study to compare the performance of CI-VAE and VAE in class-specific data augmentation. We show that CI-VAE significantly improves class-specific linear traversal and data augmentation compared with VAE while maintaining comparable reconstruction error. In a study of colon cancer genomics data, we show that interpolation between normal cells and tumor cells using CI-VAE may enhance our understanding of cancer development.



Suppose we would like to know how one observation transforms into another. Through linear traversal on the latent space of a VAE, we can generate a trajectory and observe how this transformation may take place. A wide variety of applications can benefit from this type of solution. For instance, one may attempt to uncover the disease mechanism of Parkinson's by understanding how a neuronal cell in healthy brain tissue transforms into a neuronal cell with Parkinson's disease Hook et al. (2018); Blauwendraat et al. (2020). Such investigations are often intended for a specific subset/class of the data. In a Parkinson's disease study, for example, we may intend to investigate neuronal cells as one cell-type/class within the entire population of cells. Therefore, to perform linear traversal within neuronal cells, we need a latent space that is linearly separable among classes, ensuring that during linear traversal there is no overlap of classes (cell types in this example). With this motivation, we propose the Class-Informed Variational Autoencoder (CI-VAE), a novel deep learning model architecture that includes an additional linear discriminator applied on the latent space, extending the capabilities of VAEs to form a latent space where observations from different classes are linearly separable.
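As a concrete illustration, linear traversal between two latent codes is simply a convex combination of the endpoints; a decoder would then map each intermediate point back to data space. The NumPy sketch below shows this interpolation step in isolation (the function name `linear_traversal` and its arguments are illustrative, not from the paper).

```python
import numpy as np

def linear_traversal(z_start, z_end, n_steps=10):
    """Linearly interpolate between two latent vectors.

    Returns an array of shape (n_steps, latent_dim) whose rows move
    from z_start to z_end in equal increments.
    """
    alphas = np.linspace(0.0, 1.0, n_steps)[:, None]  # shape (n_steps, 1)
    return (1.0 - alphas) * z_start + alphas * z_end

# Example: traverse between two 2-D latent codes.
z_a = np.array([0.0, 0.0])
z_b = np.array([1.0, 2.0])
path = linear_traversal(z_a, z_b, n_steps=5)
```

Each row of `path` would be fed to the decoder to visualize one point along the trajectory; linear separability of classes in the latent space is what guarantees that no row crosses into another class's region.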

2. AUTOENCODERS

Autoencoders are unsupervised neural network models composed of an encoder, which provides a deterministic mapping of data $x$ into a lower-dimensional latent space $z$, followed by a decoder that reconstructs the data into its original form $\hat{x}$ (see Figure 1a). Applications of autoencoders include dimensionality reduction, data compression, and data denoising Simonyan & Zisserman (2014); Kramer (1991). The total cost function of an autoencoder is the reconstruction error, described by a distance between the original input $x$ and the reconstruction $\hat{x}$: $J_{AE} = \mathcal{L}(x, \hat{x})$. In Variational Autoencoders (VAEs), however, instead of mapping data $x$ to the latent variable $z$ directly, we map the input data $x$ to the posterior distribution $p(z|x)$. More specifically, under a Gaussian assumption for $p(z|x)$, the encoder deterministically maps the input $x$ to the mean $\mu_{z|x}$ and standard deviation $\Sigma_{z|x}$ that parameterize the posterior $p(z|x)$. The latent variable $z$ is then obtained by random sampling from the posterior $p(z|x)$. Finally, the decoder deterministically maps $z$ to the reconstructed input $\hat{x}$ An & Cho (2015); Kingma & Welling (2019). The total cost function of the variational autoencoder comprises 1) the reconstruction error and 2) a regularization term that pushes the latent distribution $p(z|x)$ toward the prior $p(z) = \mathcal{N}(0, I)$, encouraging orthogonality/independence of the $z$ axes and regularizing the network weights. This architecture offers the advantages of 1) projecting data into a lower-dimensional space, 2) approximating the posterior distribution $p(z|x)$, and 3) enabling the generation of new synthetic data.
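A minimal PyTorch sketch may help make these pieces concrete: the encoder outputs $\mu_{z|x}$ and a log-variance, $z$ is drawn via the standard reparameterization trick, and the loss sums the reconstruction error and the KL divergence against $\mathcal{N}(0, I)$. The layer sizes and names here are illustrative assumptions, not the architecture used in the paper.

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    """Minimal VAE: the encoder outputs mu and log-variance of p(z|x);
    z is drawn via the reparameterization trick; the decoder reconstructs x."""

    def __init__(self, x_dim=784, z_dim=16, h_dim=128):  # sizes are illustrative
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.mu = nn.Linear(h_dim, z_dim)
        self.logvar = nn.Linear(h_dim, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim), nn.Sigmoid())

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization: z = mu + sigma * eps, eps ~ N(0, I)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.dec(z), mu, logvar

def vae_loss(x, x_hat, mu, logvar):
    # Reconstruction term L(x, x_hat) plus KL(p(z|x) || N(0, I)).
    recon = nn.functional.binary_cross_entropy(x_hat, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```

The closed-form KL term above holds only under the diagonal-Gaussian posterior assumption stated in the text; binary cross-entropy is one common choice of reconstruction distance for inputs scaled to [0, 1], such as MNIST pixels.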



Variational Autoencoders (VAEs) have emerged as popular unsupervised probabilistic neural network models Kingma & Welling (2013); Dilokthanakul et al. (2016); Hsu et al. (2017), with a variety of applications in computer vision Hsieh et al. (2018); Vahdat & Kautz (2020); Tabacof et al. (2016); Huang et al. (2018), natural language processing Wu et al. (2019); Bahuleyan et al. (2017); Semeniuta et al. (2017), genomics and precision medicine Grønbech et al. (2020); Minoura et al. (2021), and many other domains. In VAEs, the encoder infers a probabilistic latent representation of the data in a lower-dimensional space. Beyond their many applications in dimensionality reduction, these probabilistic models have proven very effective at synthetic data generation. Although VAEs were initially designed for unsupervised learning, several variants have been developed for other settings such as supervised and semi-supervised learning Kameoka et al. (2019); Gómez-Bombarelli et al. (2018); Ehsan Abbasnejad et al. (2017); Xu et al. (2017); Wu et al. (2019); Sohn et al. (2015); Higgins et al. (2016); Zhao et al. (2019).

Figure 1: a. Autoencoders (AEs): Input data x is deterministically encoded to a lower-dimensional latent variable z and then passed through the decoder to reconstruct the original data x. b. Variational Autoencoders (VAEs): Input data is mapped to the parameters (mean and standard deviation) of the probability distribution of the lower-dimensional latent variable z; z is then sampled from this inferred distribution, and the decoder maps it back to the original form x. c. Class-Informed Variational Autoencoders (CI-VAEs): In addition to the VAE network, a linear discriminator network with trainable weights predicts the classes of observations on the latent space, and its classification loss is added to the total loss function of the model.
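The CI-VAE modification described in panel c can be sketched as follows: a single linear layer classifies latent codes, and its cross-entropy loss is added to the usual VAE objective. The weighting factor `gamma` and the layer sizes are hypothetical choices for illustration; the paper's exact loss weighting is not specified here.

```python
import torch
import torch.nn as nn

class LinearDiscriminator(nn.Module):
    """Single linear layer classifying latent codes z into classes.

    Because the discriminator has no hidden layers, minimizing its
    cross-entropy loss pushes the encoder toward a latent space in
    which classes are linearly separable."""

    def __init__(self, z_dim=16, n_classes=10):  # sizes are illustrative
        super().__init__()
        self.fc = nn.Linear(z_dim, n_classes)

    def forward(self, z):
        return self.fc(z)  # class logits

def ci_vae_loss(recon_loss, kl_loss, logits, labels, gamma=1.0):
    # Total loss = reconstruction + KL + gamma * discriminator cross-entropy.
    # gamma is a hypothetical trade-off hyperparameter.
    disc = nn.functional.cross_entropy(logits, labels)
    return recon_loss + kl_loss + gamma * disc
```

During training, the discriminator loss backpropagates through the encoder, so the encoder is jointly rewarded for reconstruction quality and for placing same-class observations in linearly separable regions of the latent space.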

