RG-FLOW: A HIERARCHICAL AND EXPLAINABLE FLOW MODEL BASED ON RENORMALIZATION GROUP AND SPARSE PRIOR

Abstract

Flow-based generative models have become an important class of unsupervised learning approaches. In this work, we incorporate the key idea of the renormalization group (RG) and a sparse prior distribution to design a hierarchical flow-based generative model, called RG-Flow, which can separate information at different scales of images, with disentangled representations at each scale. We demonstrate our method mainly on the CelebA dataset and show that the disentangled representations at different scales enable semantic manipulation and style mixing of the images. To visualize the latent representations, we introduce receptive fields for flow-based models and find that the receptive fields learned by RG-Flow are similar to those of convolutional neural networks. In addition, we replace the widely adopted Gaussian prior distribution with a sparse prior distribution to further enhance the disentanglement of representations. From a theoretical perspective, the proposed method has O(log L) complexity for image inpainting, compared to the O(L^2) complexity of previous generative models, where L is the linear size of the image.

1. INTRODUCTION

One of the most important unsupervised learning tasks is to learn the data distribution and build generative models. Over the past few years, various types of generative models have been proposed. Flow-based generative models are a particular family of generative models with tractable distributions (Dinh et al., 2017; Kingma & Dhariwal, 2018; Chen et al., 2018b; 2019; Behrmann et al., 2019; Hoogeboom et al., 2019; Brehmer & Cranmer, 2020; Rezende et al., 2020; Karami et al., 2019). Yet in these models the latent variables are on an equal footing and mixed globally. Here, we propose a new flow-based model, RG-Flow, inspired by the idea of the renormalization group in statistical physics. RG-Flow imposes locality and hierarchical structure on its bijective transformations, so that latent variables at different locations give access to information at different scales of the original images, which offers better explainability. Combined with sparse priors (Olshausen & Field, 1996; 1997; Hyvärinen & Oja, 2000), we show that RG-Flow achieves hierarchically disentangled representations.

The renormalization group (RG) is a powerful tool for analyzing statistical mechanics models and quantum field theories in physics (Kadanoff, 1966; Wilson, 1971). It progressively extracts coarser-scale statistical features of the physical system and decimates irrelevant fine-grained statistics at each scale. Typically, the local transformations used in RG are designed by human physicists and are not bijective. Flow-based models, on the other hand, use cascaded invertible global transformations to progressively turn a complicated data distribution into a Gaussian distribution. Here, we combine the key ideas of RG and flow-based models: RG-Flow lets the machine learn the optimal RG transformation from data by composing local invertible transformations, and thereby builds a hierarchical generative model for the data distribution.
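The RG step described above can be sketched in a toy form. The sketch below is not the paper's learned bijector: as a stand-in, it uses a fixed orthogonal Haar transform on 2x2 blocks, so each local transformation is exactly invertible. Each block maps to one coarse value (passed to the next RG step) plus three fine components (decimated into latent variables at this scale).

```python
import numpy as np

# Fixed orthogonal transform on a 2x2 block, as a hand-designed stand-in
# for a learned local bijector. Rows: coarse average + three detail channels.
H = 0.5 * np.array([[ 1,  1,  1,  1],   # coarse (local average)
                    [ 1, -1,  1, -1],   # fine: horizontal detail
                    [ 1,  1, -1, -1],   # fine: vertical detail
                    [ 1, -1, -1,  1]])  # fine: diagonal detail
# H is orthogonal (H.T @ H = I), so the step is exactly invertible.

def rg_step(x):
    """One RG step: (L, L) image -> (L/2, L/2) coarse image + fine latents."""
    L = x.shape[0]
    # Gather non-overlapping 2x2 blocks as rows of shape (num_blocks, 4).
    blocks = x.reshape(L // 2, 2, L // 2, 2).transpose(0, 2, 1, 3).reshape(-1, 4)
    y = blocks @ H.T
    coarse = y[:, 0].reshape(L // 2, L // 2)  # input to the next RG step
    fine = y[:, 1:]                           # latents emitted at this scale
    return coarse, fine

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8))
coarse, fine = rg_step(x)
print(coarse.shape, fine.shape)  # (4, 4) (16, 3)
```

Repeating `rg_step` on the coarse image yields the hierarchy: fine latents are emitted at every scale, and all of them together with the final coarse values can be inverted to recover the image.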
Latent representations are introduced at different scales and capture the statistical features at the corresponding scales. Jointly, the latent representations of all scales can be inverted to generate the data. This approach was recently proposed in the physics community as NeuralRG (Li & Wang, 2018; Hu et al., 2020).

Our main contributions are two-fold. First, RG-Flow naturally separates the signal statistics of different scales in the input distribution, and represents the information at each scale in its latent variables z. These hierarchical latent variables live on a hyperbolic tree. Taking the CelebA dataset (Liu et al., 2015) as an example, the network finds not only high-level representations, such as the gender factor and the emotion factor for human faces, but also mid-level and low-level representations. To visualize representations at different scales, we adopt the concept of receptive fields from convolutional neural networks (CNNs) (LeCun, 1988; LeCun et al., 1989) and use it to visualize the hidden structures in RG-Flow. In addition, since the statistics are separated in a hierarchical fashion, we show that the representations can be mixed at different scales, which achieves an effect similar to style mixing.

Second, we introduce a sparse prior distribution for the latent variables. We find that the sparse prior helps to further disentangle the representations and make them more explainable. The widely adopted Gaussian prior is rotationally symmetric; as a result, individual latent variables in a flow model usually do not have a clear semantic meaning. Using a sparse prior instead, we demonstrate clear semantic meaning in the latent space.
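Because latents are grouped by scale, style mixing amounts to swapping the latent variables of chosen scales between two images and inverting the flow. The toy sketch below illustrates only the latent bookkeeping; the three-level hierarchy, the scale names, and `mix_latents` are hypothetical placeholders, not the paper's implementation.

```python
import numpy as np

def mix_latents(z_a, z_b, coarse_scales):
    """Keep z_a's latents, but take z_b's at the chosen (coarser) scales."""
    return {s: (z_b[s] if s in coarse_scales else z_a[s]) for s in z_a}

rng = np.random.default_rng(0)
scales = ["fine", "mid", "coarse"]            # hypothetical 3-level hierarchy
z_a = {s: rng.laplace(size=4) for s in scales}  # latents of image A
z_b = {s: rng.laplace(size=4) for s in scales}  # latents of image B

# Keep image A's fine texture, take image B's coarse semantics.
z_mix = mix_latents(z_a, z_b, coarse_scales={"coarse"})
print(np.allclose(z_mix["fine"], z_a["fine"]),
      np.allclose(z_mix["coarse"], z_b["coarse"]))  # True True
```

In the actual model, `z_mix` would then be pushed through the inverse flow to generate the mixed image.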

2. RELATED WORK

Some flow-based generative models also possess a multi-scale latent space (Dinh et al., 2017; Kingma & Dhariwal, 2018), and hierarchies of features have recently been utilized in Schirrmeister et al. (2020), where the top-level feature is shown to perform strongly on the out-of-distribution (OOD) detection task. Yet, previous models do not impose a hard locality constraint in their multi-scale structure. In Appendix C, we discuss the differences between globally connected multi-scale flows and RG-Flow, and we find that semantically meaningful receptive fields do not emerge in the globally connected case. Recently, other more expressive bijective maps have been developed (Hoogeboom et al., 2019; Karami et al., 2019; Durkan et al., 2019); those methods can be incorporated into the proposed structure to further improve the expressiveness of RG-Flow.

Some other classes of generative models rely on a separate inference model to obtain the latent representation. Examples include variational autoencoders (Kingma & Welling, 2014), adversarial autoencoders (Makhzani et al., 2015), InfoGAN (Chen et al., 2016), and BiGAN (Donahue et al., 2017; Dumoulin et al., 2017). These techniques typically do not use hierarchical latent variables, and their inference of latent variables is approximate. Notably, recent advances suggest that having hierarchical latent variables may be beneficial (Vahdat & Kautz, 2020). In addition, the coarse-to-fine fashion of the generation process has also been discussed in other generative models, such as the Laplacian pyramid of adversarial networks (Denton et al., 2015) and multi-scale autoregressive models (Reed et al., 2017).

Disentangled representations (Tenenbaum & Freeman, 2000; DiCarlo & Cox, 2007; Bengio et al., 2013) are another important aspect of understanding how a model generates images (Higgins et al., 2018).
In particular, disentangled high-level representations have been discussed and improved from information-theoretic principles (Cheung et al., 2015; Chen et al., 2016; 2018a; Higgins et al., 2017; Kipf et al., 2020; Kim & Mnih, 2018; Locatello et al., 2019; Ramesh et al., 2018). Apart from high-level representations, the distribution of natural images also possesses a multi-scale structure. If a model can separate information at different scales, its multi-scale representations can be used to perform other tasks, such as style transfer (Gatys et al., 2016; Zhu et al., 2017), face mixing (Karras et al., 2019; Gambardella et al., 2019; Karras et al., 2020), and texture synthesis (Bergmann et al., 2017; Jetchev et al., 2016; Gatys et al., 2015; Johnson et al., 2016; Ulyanov et al., 2016).

Typically, in flow-based generative models, a Gaussian distribution is used as the prior for the latent space. Due to the rotational symmetry of the Gaussian prior, an arbitrary rotation of the latent space leads to the same likelihood. Sparse priors (Olshausen & Field, 1996; 1997; Hyvärinen & Oja, 2000) were proposed as an important tool for unsupervised learning and lead to better explainability in various domains (Ainsworth et al., 2018; Arora et al., 2018; Zhang et al., 2019). To break the rotational symmetry of the Gaussian prior and further improve explainability, we introduce a sparse prior to flow-based models. Please refer to Figure 12 for a quick illustration of the difference between the Gaussian prior and the sparse prior, where the sparse prior leads to better disentanglement.

The renormalization group (RG) has a broad impact ranging from particle physics to statistical physics. Apart from analytical studies in field theories (Wilson, 1971; Fisher, 1998; Stanley, 1999), RG has also been useful in numerically simulating quantum states.
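The symmetry argument above can be checked numerically: an isotropic Gaussian prior assigns the same likelihood to any rotation of a latent vector, while a factorized Laplace prior (one common choice of sparse prior) does not, so it singles out a preferred set of axes. The 2D latent vector and rotation angle below are arbitrary illustrative choices.

```python
import numpy as np

z = np.array([1.0, 0.5])        # an arbitrary 2D latent vector
theta = 0.7                     # an arbitrary rotation angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

def gaussian_logp(z):
    return -0.5 * np.sum(z ** 2)   # standard Gaussian log-density, up to a constant

def laplace_logp(z):
    return -np.sum(np.abs(z))      # factorized Laplace log-density (scale 1), up to a constant

# Rotations preserve the norm, hence the Gaussian log-likelihood...
print(np.isclose(gaussian_logp(z), gaussian_logp(R @ z)))  # True
# ...but not the L1 norm, so the sparse prior is basis-dependent.
print(np.isclose(laplace_logp(z), laplace_logp(R @ z)))    # False
```

The basis dependence is precisely what lets a sparse prior pin down semantically meaningful latent axes instead of an arbitrary rotation of them.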
The multi-scale entanglement renormalization ansatz (MERA) (Vidal, 2008; Evenbly & Vidal, 2014) implements the hierarchical structure of RG in tensor networks to represent quantum states. The exact holographic mapping (EHM)

