HOW DO VARIATIONAL AUTOENCODERS LEARN? INSIGHTS FROM REPRESENTATIONAL SIMILARITY

Abstract

The ability of Variational Autoencoders (VAEs) to learn disentangled representations has made them popular for practical applications. However, their behaviour is not yet fully understood. For example, the questions of when they can provide disentangled representations, and when they suffer from posterior collapse, are still areas of active research. Despite this, there are no layer-wise comparisons of the representations learned by VAEs, which would further our understanding of these models. In this paper, we thus look into the internal behaviour of VAEs using representational similarity techniques. Specifically, using the CKA and Procrustes similarities, we find that the encoders' representations are learned long before the decoders', and that this behaviour is independent of hyperparameters, learning objectives, and datasets. Moreover, the encoders' representations in all but the mean and variance layers are similar across hyperparameters and learning objectives.

1. INTRODUCTION

Variational Autoencoders (VAEs) are considered state-of-the-art techniques for learning unsupervised disentangled representations, and have been shown to be beneficial for fairness (Locatello et al., 2019a). As a result, VAEs producing disentangled representations have been extensively studied in the last few years (Locatello et al., 2019b; Mathieu et al., 2019; Rolinek et al., 2019; Zietlow et al., 2021), but they still suffer from poorly understood issues such as posterior collapse (Dai et al., 2020). While some work using explainability techniques has been done to shed light on the behaviour of VAEs (Liu et al., 2020), a comparison of the representations learned by different methods is still lacking (Zietlow et al., 2021). Moreover, the layer-by-layer similarity of the representations within models has yet to be investigated. Fortunately, deep representational similarity is an active area of research, and metrics such as SVCCA (Raghu et al., 2017; Morcos et al., 2018), the Procrustes distance (Schönemann, 1966), and Centred Kernel Alignment (CKA) (Kornblith et al., 2019) have proven very useful in analysing the learning dynamics of various models (Wang et al., 2019; Kudugunta et al., 2019; Raghu et al., 2019; Neyshabur et al., 2020), and even helped to design Unsupervised Disentanglement Ranking (UDR) (Duan et al., 2020), an unsupervised metric for model selection for VAEs. The observation that good models are more similar to each other than bad ones in the context of classification (Morcos et al., 2018) generalised well to UDR (Rolinek et al., 2019; Duan et al., 2020). However, such a generalisation may not always be possible, and without sufficient evidence, it would be wise to expect substantial differences between the representations learned by supervised and unsupervised models.
In this paper, our aim is to take a first step toward investigating the representational similarity of generative models by analysing the similarity scores obtained for a variety of VAEs learning disentangled representations, and by providing some insights into why various VAE-specific methods preventing posterior collapse (Bowman et al., 2016; He et al., 2019) or providing better reconstruction (Liu et al., 2021) are successful. Our contributions are as follows: (i) We provide the first experimental study of the representational similarity between VAEs, and have released more than 45 million similarity scores (https://t.ly/0GLe3)¹. (ii) We have released the library created for this experiment (https://t.ly/VMIm); it can be reused with other similarity metrics or models for further research in the domain. (iii) During our analysis, we found that (1) the encoder is learned before the decoder; (2) all the layers of the encoder, except the mean and variance layers, learn very similar representations regardless of the learning objective and regularisation strength used; and (3) linear CKA could be an efficient tool to track posterior collapse.

2. BACKGROUND

2.1 VARIATIONAL AUTOENCODERS

Variational Autoencoders (VAEs) (Kingma & Welling, 2014; Rezende & Mohamed, 2015) are deep probabilistic generative models based on variational inference. The encoder, q_φ(z|x), maps some input x to a latent representation z, which the decoder, p_θ(x|z), uses to attempt to reconstruct x. This can be optimised by maximising L, the evidence lower bound (ELBO):

L(θ, φ; x) = E_{q_φ(z|x)}[log p_θ(x|z)] − D_KL(q_φ(z|x) || p(z)),    (1)

where the first term is the reconstruction term, the second term is the regularisation term, and p(z) is generally modelled as a multivariate Gaussian distribution N(0, I) to permit closed-form computation of the regularisation term (Doersch, 2016). We refer to the regularisation term of Equation 1 as regularisation in the rest of the paper, and we do not tune any other forms of regularisation (e.g., L1, dropout). While our goal is not to study disentanglement, our experiments focus on a range of VAEs designed to disentangle (Higgins et al., 2017; Chen et al., 2018; Burgess et al., 2018; Kumar et al., 2018) because they possess useful properties: (1) posterior collapse situations are easy to create; and (2) these models are non-identifiable (Khemakhem et al., 2020), but we are interested in verifying whether the representations they learn still retain some linear relationship across models. We refer the reader to Appendix A for more details on these models.

Polarised regime and posterior collapse. The polarised regime, also known as selective posterior collapse, is the ability of VAEs to "shut down" superfluous dimensions of their sampled latent representations while providing high precision on the remaining ones (Rolinek et al., 2019; Dai et al., 2020). The existence of the polarised regime is a necessary condition for VAEs to provide a good reconstruction (Dai & Wipf, 2018; Dai et al., 2020).
However, when the weight on the regularisation term of the ELBO given in Equation 1 becomes too large, the representations collapse to the prior (Lucas et al., 2019a; Dai et al., 2020). Recently, Bonheme & Grzes (2021) have also shown that the passive variables, which are "shut down" during training, behave very differently in the mean and sampled representations (see Appendix B). This indicates that representational similarity could be a valuable tool in the study of posterior collapse.
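To make the quantities above concrete, the following is a minimal NumPy sketch of the ELBO of Equation 1 for a Gaussian encoder, together with a simple passive-variable check based on the per-dimension KL to the prior. The squared-error reconstruction term and the function names are illustrative choices of ours, not the implementation used in this paper.

```python
import numpy as np

def gaussian_kl_per_dim(mu, log_var):
    """Closed-form KL(q(z|x) || N(0, I)) for q = N(mu, diag(exp(log_var))),
    kept per latent dimension so individual dimensions can be inspected."""
    return 0.5 * (mu ** 2 + np.exp(log_var) - log_var - 1.0)

def elbo(x, x_recon, mu, log_var):
    """Per-example ELBO, with a squared-error stand-in for log p(x|z)."""
    recon = -np.sum((x - x_recon) ** 2, axis=-1)      # reconstruction term
    kl = gaussian_kl_per_dim(mu, log_var).sum(axis=-1)  # regularisation term
    return recon - kl

def passive_dimensions(mu, log_var, threshold=1e-2):
    """Flag dimensions whose batch-averaged KL is near zero, i.e. dimensions
    that have collapsed to the prior ("shut down" in the polarised regime)."""
    return gaussian_kl_per_dim(mu, log_var).mean(axis=0) < threshold
```

A dimension with mu ≈ 0 and log_var ≈ 0 contributes no KL and carries no information about x, which is exactly the selective collapse described above.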

2.2. REPRESENTATIONAL SIMILARITY METRICS

Representational similarity metrics aim to compare the geometric similarity between two representations. In the context of deep learning, these representations correspond to R^{n×p} matrices of activations, where n is the number of data examples and p is the number of neurons in a layer. Such metrics can provide various information on deep neural networks (e.g., the training dynamics of neural networks, or the common and specialised layers between models).

Centred Kernel Alignment. Centred Kernel Alignment (CKA) (Cortes et al., 2012; Cristianini et al., 2002) is a normalised version of the Hilbert-Schmidt Independence Criterion (HSIC) (Gretton et al., 2005). As its name suggests, it measures the alignment between the n × n kernel matrices of two representations, and works well with linear kernels (Kornblith et al., 2019) for the representational similarity of centred layer activations. We thus focus on the linear CKA, also known as
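For reference, linear CKA can be sketched in a few lines of NumPy by centring the activations and comparing the resulting Gram matrices, following the formulation of Kornblith et al. (2019). This is an illustrative sketch, not the released library:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between activation matrices X (n x p1) and Y (n x p2),
    where rows are the same n examples fed to two layers or models."""
    # Centre each feature (column) so that HSIC is computed on centred kernels.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # With a linear kernel, HSIC reduces to the squared Frobenius norm of X^T Y.
    hsic = np.linalg.norm(X.T @ Y, ord="fro") ** 2
    # Normalisation makes the score lie in [0, 1] and invariant to isotropic scaling.
    norm = np.linalg.norm(X.T @ X, ord="fro") * np.linalg.norm(Y.T @ Y, ord="fro")
    return hsic / norm
```

Note that the score is invariant to orthogonal transformations and isotropic scaling of either representation, which is what makes it suitable for comparing layers of different widths across independently trained models.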



¹ Due to their size and to preserve anonymity, the 300 models trained for this paper will be released after the review.

