ANALYZING THE LATENT SPACE OF GAN THROUGH LOCAL DIMENSION ESTIMATION

Anonymous authors
Paper under double-blind review

Abstract

The impressive success of style-based GANs (StyleGANs) in high-fidelity image synthesis has motivated research to understand the semantic properties of their latent spaces. Recently, a close relationship was observed between the semantically disentangled local perturbations and the local PCA components in the W-space. However, understanding the number of disentangled perturbations remains challenging. Building upon this observation, we propose a local dimension estimation algorithm for an arbitrary intermediate layer of a pre-trained GAN model. The estimated intrinsic dimension corresponds to the number of disentangled local perturbations. From this perspective, we analyze the intermediate layers of the mapping network in StyleGANs. Our analysis clarifies the success of the W-space in StyleGAN and suggests a method for finding an alternative. Moreover, the intrinsic dimension estimate opens the possibility of unsupervised evaluation of global-basis-compatibility and disentanglement for a latent space. Our proposed metric, called Distortion, measures the inconsistency of the intrinsic tangent space across the learned latent space. The metric is purely geometric and does not require any additional attribute information. Nevertheless, it shows a high correlation with global-basis-compatibility and a supervised disentanglement score. Our work is a first step towards selecting the most disentangled latent space among the various latent spaces of a GAN without attribute labels.

1. INTRODUCTION

Generative Adversarial Networks (GANs) (Goodfellow et al., 2014) have achieved remarkable success in generating realistic high-resolution images (Karras et al., 2018; 2019; 2020a;b; 2021; Brock et al., 2018). Nevertheless, understanding how GAN models represent the semantics of images in their latent spaces is still a challenging problem. To this end, several recent works have investigated the disentanglement (Bengio et al., 2013) properties of the latent space in GANs (Goetschalckx et al., 2019; Jahanian et al., 2019; Plumerault et al., 2020; Shen et al., 2020). In this work, we concentrate on finding a disentangled latent space in a pre-trained model. A latent space is called (globally) disentangled if, under the optimal basis, there is a bijective correspondence between each semantic attribute and each axis of the latent space (see the appendix for details).

The style-based GAN models (Karras et al., 2019; 2020b) have been popular in previous studies on identifying a disentangled latent space in a pre-trained model. First, the space of style vectors, called the W-space, was shown to provide better disentanglement than the latent noise space Z (Karras et al., 2019). Subsequently, several attempts were made to discover other disentangled latent spaces, such as the W+-space (Abdal et al., 2019) and the S-space (Wu et al., 2020). However, their better disentanglement was assessed either by manual inspection (Karras et al., 2019; Abdal et al., 2019; Wu et al., 2020) or by quantitative scores that employ a pre-trained feature extractor (PPL (Karras et al., 2019)) or an attribute annotator (Separability (Karras et al., 2019) and the DCI metric (Eastwood & Williams, 2018; Wu et al., 2020)). Manual inspection is vulnerable to sample dependency, and the quantitative scores depend on the pre-trained models and the set of selected target attributes.
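To make the PCA-based notion of local intrinsic dimension concrete, the following is a minimal, self-contained sketch: given samples perturbed around a latent point, it counts the number of principal components needed to explain a fixed fraction of the variance. The function name, the 95% variance threshold, and the toy data are illustrative assumptions, not the paper's actual estimator.

```python
import numpy as np

def local_intrinsic_dimension(samples, threshold=0.95):
    """Illustrative PCA-based estimate: the number of principal
    components needed to explain `threshold` of the total variance
    of the given local samples. (Hypothetical helper, not the
    paper's algorithm.)"""
    centered = samples - samples.mean(axis=0, keepdims=True)
    # Singular values of the centered data matrix give the PCA spectrum.
    s = np.linalg.svd(centered, compute_uv=False)
    explained = s**2 / np.sum(s**2)
    cumulative = np.cumsum(explained)
    # Index of the first component where cumulative variance reaches the threshold.
    return int(np.searchsorted(cumulative, threshold) + 1)

# Toy check: points on a 2-D plane embedded in 10-D, plus tiny noise.
rng = np.random.default_rng(0)
basis = rng.standard_normal((2, 10))       # two random directions in 10-D
coeffs = rng.standard_normal((500, 2))     # coordinates on the plane
points = coeffs @ basis + 1e-6 * rng.standard_normal((500, 10))
print(local_intrinsic_dimension(points))   # expected: 2
```

In the paper's setting, the "samples" would instead be local perturbations mapped through an intermediate layer of the generator, and the counted components correspond to the disentangled local directions discussed above.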
Therefore, we need an unsupervised quantitative evaluation scheme for the disentanglement of a latent space that does not rely on pre-trained models. In this paper, we investigate the semantic properties of a latent space by analyzing its geometric properties. In this regard, we propose a local intrinsic dimension estimation scheme for a learned

