ANALYZING THE LATENT SPACE OF GAN THROUGH LOCAL DIMENSION ESTIMATION
Anonymous authors
Paper under double-blind review

Abstract

The impressive success of style-based GANs (StyleGANs) in high-fidelity image synthesis has motivated research to understand the semantic properties of their latent spaces. Recently, a close relationship was observed between the semantically disentangled local perturbations and the local PCA components in W-space. However, understanding the number of disentangled perturbations remains challenging. Building upon this observation, we propose a local dimension estimation algorithm for an arbitrary intermediate layer in a pre-trained GAN model. The estimated intrinsic dimension corresponds to the number of disentangled local perturbations. From this perspective, we analyze the intermediate layers of the mapping network in StyleGANs. Our analysis clarifies the success of W-space in StyleGAN and suggests a method for finding an alternative. Moreover, the intrinsic dimension estimation opens the possibility of unsupervised evaluation of global-basis-compatibility and disentanglement for a latent space. Our proposed metric, called Distortion, measures the inconsistency of the intrinsic tangent space on the learned latent space. The metric is purely geometric and does not require any additional attribute information. Nevertheless, it shows a high correlation with the global-basis-compatibility and the supervised disentanglement score. Our work is the first step towards selecting the most disentangled latent space among the various latent spaces in a GAN without attribute labels.

1. INTRODUCTION

Generative Adversarial Networks (GANs) (Goodfellow et al., 2014) have achieved remarkable success in generating realistic high-resolution images (Karras et al., 2018; 2019; 2020b; 2021; 2020a; Brock et al., 2018). Nevertheless, understanding how GAN models represent the semantics of images in their latent spaces is still a challenging problem. To this end, several recent works investigated the disentanglement (Bengio et al., 2013) properties of the latent space in GANs (Goetschalckx et al., 2019; Jahanian et al., 2019; Plumerault et al., 2020; Shen et al., 2020). In this work, we concentrate on finding a disentangled latent space in a pre-trained model. A latent space is called (globally) disentangled if there is a bijective correspondence between each semantic attribute and each axis of the latent space when represented with the optimal basis. (See the appendix for details.)

The style-based GAN models (Karras et al., 2019; 2020b) have been popular in previous studies for identifying a disentangled latent space in a pre-trained model. First, the space of style vectors, called W-space, was shown to provide a better disentanglement property compared to the latent noise space Z (Karras et al., 2019). After that, several attempts have been made to discover other disentangled latent spaces, such as W+-space (Abdal et al., 2019) and S-space (Wu et al., 2020). However, their better disentanglement was assessed by manual inspection (Karras et al., 2019; Abdal et al., 2019; Wu et al., 2020) or by quantitative scores employing a pre-trained feature extractor (PPL (Karras et al., 2019)) or an attribute annotator (Separability (Karras et al., 2019) and the DCI metric (Eastwood & Williams, 2018; Wu et al., 2020)). Manual inspection is vulnerable to sample dependency, and the quantitative scores depend on the pre-trained models and the set of selected target attributes.
Therefore, we need an unsupervised quantitative evaluation scheme for the disentanglement of a latent space that does not rely on pre-trained models. In this paper, we investigate the semantic property of a latent space by analyzing its geometric properties. In this regard, we propose a local intrinsic dimension estimation scheme for a learned intermediate latent space in pre-trained GAN models. The local intrinsic dimension is the number of dimensions required to properly approximate the latent space locally (Fig 1a). We discover this intrinsic dimension by estimating the robust rank of the Jacobian of the subnetwork. The estimated dimension is interpreted as the number of disentangled local perturbations. Furthermore, the intrinsic dimension of the latent manifold leads to an unsupervised quantitative score for the global disentanglement property. The experiments demonstrate that our proposed metric shows a high correlation with the global-basis-compatibility and the supervised disentanglement score. (The global-basis-compatibility will be rigorously defined in Sec 4.)

Our contributions are as follows:

1. We propose a local intrinsic dimension estimation scheme for an intermediate latent space in pre-trained GAN models. The scheme is derived from a rank estimation algorithm applied to the Jacobian matrix of a subnetwork.
2. We propose a layer-wise global disentanglement score, called Distortion, that measures the inconsistency of the intrinsic tangent space. The proposed metric shows a high correlation with the global-basis-compatibility and the supervised disentanglement score.
3. We analyze the intermediate layers of the mapping network through the proposed Distortion metric. Our analysis elucidates the superior disentanglement of W-space compared to the other intermediate layers and suggests a criterion for finding a similar-or-better alternative.

2. RELATED WORKS

Style-based Generator. In a conventional GAN model (Karras et al., 2018), the generator synthesizes an image by transforming a latent noise with a sequence of convolutional layers. On the other hand, the style-based generator consists of two subnetworks: a mapping network f : Z → W and a synthesis network g : R^{n0} × W^L → X. The synthesis network is similar to conventional generators in that it is composed of a series of convolutional layers {g_i}_{i=1,...,L}. The key difference is that the synthesis network takes the learned constant feature y_0 ∈ R^{n0} at the first layer g_0, and then adjusts the output image by injecting the layer-wise styles w and noise (layer-wise noise is omitted for brevity): y_i = g_i(y_{i-1}, w) with w = f(z) for i = 1, ..., L, where the style vector w is obtained by transforming a latent noise z via the mapping network f.

Understanding Latent Semantics. The previous attempts to understand the semantic property of latent spaces in StyleGANs are categorized into two topics: (i) finding a more disentangled latent space in a model; (ii) discovering meaningful perturbation directions in a latent space corresponding to disentangled semantics. Several studies on (i) suggested various disentangled latent spaces in StyleGAN models, for example, W (Karras et al., 2019), W+ (Abdal et al., 2019), P_N (Zhu et al., 2020), and S-space (Wu et al., 2020). However, the superiority of each newly proposed latent space was demonstrated only through comparison with the previous latent spaces, not by selecting the best one among all candidates. Moreover, the comparison was conducted by manual inspection (Karras et al., 2019; Abdal et al., 2019; Wu et al., 2020) or by quantitative metrics relying on pre-trained models (Karras et al., 2019; Wu et al., 2020). The previous works on (ii) are classified into local and global methods. The local methods find sample-wise perturbation directions (Ramesh et al., 2018; Patashnik et al., 2021; Abdal et al., 2021; Zhu et al., 2021; Choi et al., 2022b). On the other hand, the global methods search for layer-wise perturbation directions that perform the same semantic manipulation over the entire latent space (Härkönen et al., 2020; Shen & Zhou, 2021; Voynov & Babenko, 2020). Throughout this paper, we refer to these local methods as local basis and these global methods as global basis. GANSpace (Härkönen et al., 2020) showed that the principal components obtained by PCA can serve as a global basis. SeFa (Shen & Zhou, 2021) suggested the singular vectors of the first weight parameter applied to the latent noise as a global basis. These global bases showed promising results, but they were successful only in a limited area. Depending on the sampled latent variables, these methods exhibited limited semantic factorization and a sharp degradation of image fidelity (Choi et al., 2022b;a). In this regard, Choi et al. (2022b) suggested the need for diagnosing the global-basis-compatibility of a latent space. Here, the global-basis-compatibility means how well the optimal global basis can perform on the target latent space.
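To make the PCA-based global basis concrete, the following sketch illustrates the GANSpace-style recipe on a toy stand-in for the mapping network. The network f, its weights, and all dimensions below are hypothetical; only the sample-then-PCA procedure reflects the method described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the mapping network f: Z -> W
# (a random nonlinear map; only used to produce style vectors).
W_map = rng.standard_normal((512, 64))
def f(z):
    return np.tanh(z @ W_map.T)          # (N, 64) -> (N, 512)

# 1. Sample latent noises z and compute their style vectors w = f(z).
Z = rng.standard_normal((10000, 64))
W = f(Z)

# 2. PCA on the style vectors: center them, then take the top-k
#    right singular vectors as candidate global perturbation directions.
W_centered = W - W.mean(axis=0, keepdims=True)
_, S, Vt = np.linalg.svd(W_centered, full_matrices=False)
k = 10
global_basis = Vt[:k]                    # (k, 512), rows are orthonormal

# 3. Edit a sample by moving along a principal direction in W-space.
w = f(rng.standard_normal((1, 64)))
w_edited = w + 3.0 * global_basis[0]
```

In the actual GANSpace setting, f would be the pre-trained StyleGAN mapping network and the edited w would be fed to the synthesis network; the sketch only shows where the basis comes from.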

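The local dimension estimation introduced in Sec 1 can be sketched as follows: compute the Jacobian of a subnetwork at a latent point, take its SVD, and count the numerically significant singular values. The subnetwork below is a hypothetical toy with a built-in 5-dimensional bottleneck, and the finite-difference Jacobian and relative tolerance are illustrative choices, not the paper's exact estimator.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy subnetwork whose Jacobian has rank at most 5, because the input
# is first projected through a 5-dimensional bottleneck.
# (Hypothetical stand-in for an intermediate layer of a mapping network.)
P = rng.standard_normal((5, 32))     # bottleneck projection
A = rng.standard_normal((128, 5))    # expansion to a higher-dimensional space
def subnet(z):
    return np.tanh(A @ np.tanh(P @ z))

def jacobian(fn, z, eps=1e-5):
    """Central finite-difference Jacobian of fn at z."""
    d_out = fn(z).shape[0]
    J = np.zeros((d_out, z.shape[0]))
    for i in range(z.shape[0]):
        dz = np.zeros_like(z)
        dz[i] = eps
        J[:, i] = (fn(z + dz) - fn(z - dz)) / (2 * eps)
    return J

def local_intrinsic_dim(fn, z, rel_tol=1e-6):
    """Robust rank of the Jacobian: the number of singular values
    exceeding rel_tol times the largest singular value."""
    s = np.linalg.svd(jacobian(fn, z), compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

z0 = rng.standard_normal(32)
print(local_intrinsic_dim(subnet, z0))   # bounded above by the bottleneck width, 5
```

The estimated dimension at z0 is then read as the number of disentangled local perturbations available around that point.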

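The style-based forward pass y_i = g_i(y_{i-1}, w) with w = f(z), described in the Related Works section, can be sketched in a few lines. Every layer below is a hypothetical linear stand-in (real synthesis layers are convolutional, and noise injection is omitted, as in the text).

```python
import numpy as np

rng = np.random.default_rng(2)
L, d_z, d_w, n0 = 4, 64, 128, 256        # hypothetical sizes

# Mapping network f: Z -> W (stand-in: a one-hidden-layer MLP).
M1 = rng.standard_normal((d_w, d_z))
M2 = rng.standard_normal((d_w, d_w))
def f(z):
    return M2 @ np.maximum(M1 @ z, 0.0)

# Synthesis layers g_i: each updates the running feature while
# injecting the shared layer-wise style w.
G = [rng.standard_normal((n0, n0)) for _ in range(L)]
S = [rng.standard_normal((n0, d_w)) for _ in range(L)]
def g(i, y, w):
    return np.tanh(G[i] @ y + S[i] @ w)

y = rng.standard_normal(n0)              # learned constant y_0 (here: fixed random)
z = rng.standard_normal(d_z)
w = f(z)                                 # style vector shared across all layers
for i in range(L):
    y = g(i, y, w)                       # y_i = g_i(y_{i-1}, w)
```

The point of the structure is visible even in this sketch: the latent noise z never enters the synthesis path directly; only the style vector w does, once per layer.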