FINDING THE GLOBAL SEMANTIC REPRESENTATION IN GAN THROUGH FRÉCHET MEAN

Abstract

An ideally disentangled latent space in a GAN admits a global representation of the latent space with semantic attribute coordinates. In other words, since this disentangled latent space is a vector space, there exists a global semantic basis in which each basis component describes one attribute of the generated images. In this paper, we propose an unsupervised method for finding this global semantic basis in the intermediate latent space of GANs. This semantic basis represents sample-independent meaningful perturbations that change the same semantic attribute of an image over the entire latent space. The proposed global basis, called the Fréchet basis, is derived by applying the Fréchet mean to the local semantic perturbations in a latent space. The Fréchet basis is discovered in two stages. First, the global semantic subspace is found as the Fréchet mean, in the Grassmannian manifold, of the local semantic subspaces. Second, the Fréchet basis is obtained by optimizing a basis of this semantic subspace via the Fréchet mean in the Special Orthogonal Group. Experimental results demonstrate that the Fréchet basis provides better semantic factorization and robustness than previous methods. Moreover, we suggest a basis refinement scheme for the previous methods. Quantitative experiments show that the refined basis achieves better semantic factorization while constrained to the same semantic subspace given by the previous method.
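As background, the Fréchet mean of points $x_1, \dots, x_N$ in a metric space $(M, d)$ is the minimizer of the sum of squared distances, $\bar{x} = \arg\min_{m \in M} \sum_i d(m, x_i)^2$. The following is a minimal illustrative sketch, not the paper's implementation, of computing this mean on the Special Orthogonal Group SO(n) by the standard iterative tangent-space (Karcher) averaging scheme; the function name `frechet_mean_so_n` is our own, and NumPy/SciPy are assumed available.

```python
import numpy as np
from scipy.linalg import logm, expm

def frechet_mean_so_n(rotations, iters=50, tol=1e-10):
    """Illustrative Frechet (Karcher) mean of rotation matrices on SO(n).

    Repeatedly lifts the samples to the tangent space at the current
    estimate, averages there, and steps back along the geodesic.
    """
    m = rotations[0].copy()
    for _ in range(iters):
        # Tangent vectors (skew-symmetric matrices) at the current estimate.
        tangents = [logm(m.T @ r) for r in rotations]
        delta = np.mean(tangents, axis=0)
        if np.linalg.norm(delta) < tol:
            break  # the tangent-space average vanished: m is the mean
        m = m @ expm(delta)  # geodesic step toward the average
    return np.real(m)
```

For rotations in SO(2) by angles 0.1, 0.2, and 0.3 radians, this converges to the rotation by the mean angle 0.2, as expected from the bi-invariant metric.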

1. INTRODUCTION

Generative Adversarial Networks (GANs, Goodfellow et al. (2014)) have achieved impressive success in high-fidelity image synthesis, e.g., ProGAN (Karras et al., 2018), BigGAN (Brock et al., 2018), and the StyleGAN family (Karras et al., 2019; 2020a;b; 2021). Interestingly, even when a GAN model is trained without any information about the semantics of the data, its latent space often represents the semantic properties of the data (Radford et al., 2016; Karras et al., 2019). To understand how GAN models represent semantics, several studies have investigated the disentanglement (Bengio et al., 2013) property of the latent space in GANs (Goetschalckx et al., 2019; Jahanian et al., 2019; Plumerault et al., 2020; Shen et al., 2020). Here, a latent space in a GAN is called disentangled if there exists an optimal basis of the latent space in which each basis coefficient corresponds to one disentangled semantic (generative factor). One approach to studying the disentanglement property is to find meaningful latent perturbations that induce disentangled semantic variations in generated images (Ramesh et al., 2018; Härkönen et al., 2020; Shen & Zhou, 2021; Choi et al., 2022b). This approach can be interpreted as investigating how the semantics are represented around each latent variable. We classify the previous works on meaningful latent perturbations into local and global methods, depending on whether the proposed perturbation is sample-dependent or sample-independent. In this work, we focus on the global methods (Härkönen et al., 2020; Shen & Zhou, 2021). If the latent space is ideally disentangled, the optimal semantic basis becomes a global semantic perturbation that represents a change in the same generative factor over the entire latent space. In this regard, these global methods are attempts to find the best possible semantic basis on the target latent space. Throughout this work, a semantic subspace denotes the subspace spanned by the corresponding semantic basis.
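To make the notion of a global (sample-independent) semantic perturbation concrete: a single direction $v$ in latent space edits every sample the same way, by decoding $G(w + \alpha v)$ for an edit strength $\alpha$. The sketch below uses a dummy linear stand-in for the generator so the sample-independence is exact; all names are hypothetical and a real GAN generator is a deep network, for which this property holds only approximately.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim, feature_dim = 8, 16

# Dummy linear stand-in for a GAN generator G: latent -> image features.
W = rng.standard_normal((feature_dim, latent_dim))

def generator(w):
    return W @ w

# A candidate global semantic direction (unit norm).
v = np.zeros(latent_dim)
v[0] = 1.0

w = rng.standard_normal(latent_dim)   # an arbitrary latent sample
alpha = 2.0                            # edit strength
edited = generator(w + alpha * v)
baseline = generator(w)

# For this linear generator the induced change alpha * (W @ v)
# is identical for every latent w, i.e., sample-independent.
change = edited - baseline
```

In an entangled space no single direction behaves this way globally, which is why the paper aggregates local semantic perturbations into one global basis.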

