FINDING THE GLOBAL SEMANTIC REPRESENTATION IN GAN THROUGH FRÉCHET MEAN

Abstract

An ideally disentangled latent space in a GAN admits a global representation whose coordinates correspond to semantic attributes. In other words, if this disentangled latent space is regarded as a vector space, there exists a global semantic basis in which each basis component describes one attribute of the generated images. In this paper, we propose an unsupervised method for finding this global semantic basis in the intermediate latent space of GANs. This semantic basis represents sample-independent meaningful perturbations, each of which changes the same semantic attribute of an image over the entire latent space. The proposed global basis, called Fréchet basis, is derived by applying the Fréchet mean to the local semantic perturbations in a latent space. Fréchet basis is discovered in two stages. First, the global semantic subspace is found as the Fréchet mean of the local semantic subspaces in the Grassmannian manifold. Second, Fréchet basis is obtained by optimizing a basis of the semantic subspace via the Fréchet mean in the Special Orthogonal Group. Experimental results demonstrate that Fréchet basis provides better semantic factorization and robustness than previous methods. Moreover, we propose a basis refinement scheme for the previous methods. Quantitative experiments show that the refined basis achieves better semantic factorization while being constrained to the same semantic subspace given by the previous method.

1. INTRODUCTION

Generative Adversarial Networks (GANs) (Goodfellow et al., 2014) have achieved impressive success in high-fidelity image synthesis, e.g., ProGAN (Karras et al., 2018), BigGAN (Brock et al., 2018), and StyleGANs (Karras et al., 2019; 2020a;b; 2021). Interestingly, even when a GAN model is trained without any information about the semantics of the data, its latent space often represents semantic properties of the data (Radford et al., 2016; Karras et al., 2019). To understand how GAN models represent semantics, several studies have investigated the disentanglement (Bengio et al., 2013) of latent spaces in GANs (Goetschalckx et al., 2019; Jahanian et al., 2019; Plumerault et al., 2020; Shen et al., 2020). Here, a latent space in a GAN is called disentangled if there exists an optimal basis of the latent space in which each basis coefficient corresponds to one disentangled semantic (generative factor). One approach to studying disentanglement is to find meaningful latent perturbations that induce disentangled semantic variations in generated images (Ramesh et al., 2018; Härkönen et al., 2020; Shen & Zhou, 2021; Choi et al., 2022b). This approach can be interpreted as investigating how semantics are represented around each latent variable. We classify previous works on meaningful latent perturbations into local and global methods depending on whether the proposed perturbation is sample-dependent or sample-independent. In this work, we focus on the global methods (Härkönen et al., 2020; Shen & Zhou, 2021). If the latent space is ideally disentangled, the optimal semantic basis provides global semantic perturbations, each representing a change in the same generative factor over the entire latent space. In this regard, these global methods are attempts to find the best possible semantic basis of the target latent space. Throughout this work, the semantic subspace denotes the subspace spanned by the corresponding semantic basis.
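To make the notion of a global (sample-independent) perturbation concrete, here is a minimal sketch of the two unsupervised global methods cited above, under simplifying assumptions: GANSpace-style PCA on sampled intermediate latents, and SeFa-style eigendecomposition of $A^\top A$ for the weight $A$ of an affine layer acting on the latent. The function names, the toy Gaussian data standing in for sampled latents, and the random weight matrix are hypothetical; the actual methods operate on a trained generator (for example, SeFa may combine weights from several layers), which is not shown here.

```python
import numpy as np


def ganspace_directions(w_samples: np.ndarray, k: int) -> np.ndarray:
    """GANSpace-style directions: top-k principal axes of sampled latent codes.

    w_samples: (N, d) array, e.g. intermediate latents w obtained from random z.
    Returns a (d, k) matrix with orthonormal columns.
    """
    centered = w_samples - w_samples.mean(axis=0, keepdims=True)
    # The right singular vectors of the centered data are the principal axes.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k].T


def sefa_directions(weight: np.ndarray, k: int) -> np.ndarray:
    """SeFa-style directions: top-k eigenvectors of A^T A for a weight A of shape (m, d)."""
    eigvals, eigvecs = np.linalg.eigh(weight.T @ weight)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]
    return eigvecs[:, order]


# Toy usage: random data stands in for a trained generator (hypothetical).
rng = np.random.default_rng(0)
w = rng.normal(size=(1000, 16)) * np.linspace(3.0, 0.5, 16)  # anisotropic latent samples
V = ganspace_directions(w, k=4)
D = sefa_directions(rng.normal(size=(32, 16)), k=4)

# A global direction is applied identically to every latent code: w + alpha * v.
alpha = 2.0
w_edit = w + alpha * V[:, 0]
```

In both cases each column $v$ is a single direction shared by all samples, which is what distinguishes these global directions from local, sample-dependent perturbations that are recomputed around each latent variable.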
In this paper, we propose an unsupervised method for finding the global semantic perturbations in a latent space of a GAN, called Fréchet basis. Fréchet basis is based on the Fréchet mean on a Riemannian manifold. The Fréchet mean generalizes the centroid to general metric spaces (Fréchet, 1948), and a Riemannian manifold is a metric space (Lee, 2013). Specifically, Fréchet basis is discovered in two steps (Fig. 1). First, we find the global semantic subspace $S_s$ of the latent space as the Fréchet mean of the intrinsic tangent spaces in the Grassmannian manifold (Boothby, 1986). Here, each intrinsic tangent space represents a local semantic subspace (Choi et al., 2022b). Second, Fréchet basis $B_s$ is discovered by selecting the optimal basis of $S_s$ via the Fréchet mean in the Special Orthogonal Group (Lang, 2012). Our experiments show that Fréchet basis provides better semantic factorization and robustness than previous unsupervised global methods. Moreover, the second step of finding Fréchet basis provides a basis refinement scheme for the previous global methods. In our experiments, the basis refinement achieves better semantic factorization than the previous methods while keeping the same semantic subspace. Our contributions are as follows:

1. We propose unsupervised global semantic perturbations, called Fréchet basis. Fréchet basis is discovered by applying the Fréchet mean to the local semantic perturbations.
2. We show that Fréchet basis achieves better semantic factorization and robustness than previous global approaches.
3. We propose a basis refinement scheme, which optimizes the semantic basis on a given semantic subspace. The previous global approaches can be refined by applying the basis refinement to their semantic subspaces.
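Both Fréchet means in the two-step procedure can be sketched with the standard Karcher-mean fixed-point iteration: map every point to the tangent space at the current estimate with the Riemannian log map, average there, and move along the geodesic with the exp map. The sketch assumes subspaces are given as $n \times k$ matrices with orthonormal columns and uses the canonical geodesic log/exp maps on the Grassmannian and the bi-invariant metric on SO(k) (via `scipy.linalg.logm` and `scipy.linalg.expm`). It is a generic illustration, not necessarily the optimization used for Fréchet basis; how the local semantic subspaces are estimated and how per-sample rotations in SO(k) are derived from them are not described in this section, so the inputs are synthetic and all function names are hypothetical.

```python
import numpy as np
from scipy.linalg import expm, logm


def grassmann_log(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Log map on Gr(k, n): tangent vector at span(x) pointing toward span(y).

    x, y are (n, k) matrices with orthonormal columns; assumes x.T @ y is invertible.
    """
    xty = x.T @ y
    m = (y - x @ xty) @ np.linalg.inv(xty)  # (I - x x^T) y (x^T y)^{-1}
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    return u @ np.diag(np.arctan(s)) @ vt


def grassmann_exp(x: np.ndarray, delta: np.ndarray) -> np.ndarray:
    """Exp map on Gr(k, n): follow the geodesic from span(x) along tangent delta."""
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    y = x @ vt.T @ np.diag(np.cos(s)) @ vt + u @ np.diag(np.sin(s)) @ vt
    q, _ = np.linalg.qr(y)  # re-orthonormalize against round-off
    return q


def grassmann_frechet_mean(bases, iters: int = 100, tol: float = 1e-10) -> np.ndarray:
    """Karcher mean of subspaces under the geodesic (arc-length) metric."""
    mean = bases[0]
    for _ in range(iters):
        step = sum(grassmann_log(mean, b) for b in bases) / len(bases)
        if np.linalg.norm(step) < tol:
            break
        mean = grassmann_exp(mean, step)
    return mean


def so_frechet_mean(rotations, iters: int = 100, tol: float = 1e-10) -> np.ndarray:
    """Karcher mean of rotation matrices under the bi-invariant metric on SO(k)."""
    mean = rotations[0]
    for _ in range(iters):
        # Average the relative rotations in the Lie algebra so(k) at the current estimate.
        step = sum(np.real(logm(mean.T @ r)) for r in rotations) / len(rotations)
        if np.linalg.norm(step) < tol:
            break
        mean = mean @ expm(step)
    return mean


# Toy usage with synthetic data (hypothetical stand-ins for local semantic subspaces).
rng = np.random.default_rng(0)
base, _ = np.linalg.qr(rng.normal(size=(8, 3)))
subspaces = [np.linalg.qr(base + 0.05 * rng.normal(size=(8, 3)))[0] for _ in range(10)]
s_mean = grassmann_frechet_mean(subspaces)  # (8, 3) orthonormal basis of the mean subspace
```

The iteration assumes the points are close enough for the log maps to be well defined (for the Grassmannian, $x^\top y$ must be invertible; for SO(k), no relative rotation by $\pi$), which is the usual regime when averaging nearby local subspaces or nearly aligned frames.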

2. RELATED WORKS AND BACKGROUND

Figure 1: Overview of Fréchet basis. The global semantic subspace $S_s$ is defined as the Fréchet mean of the intrinsic tangent spaces $T_{w_i}\mathcal{W}^{d_W}_{w_i}$ in the Grassmannian manifold $\mathrm{Gr}(d_W, \mathbb{R}^{d_{\mathcal{W}}})$. Fréchet basis $B_s$ is discovered by selecting the optimal basis of $S_s$ using the Fréchet mean in the Special Orthogonal Group.

Latent Perturbation for Image Manipulation The latent space of GANs often represents the semantics of data even when the model is trained without supervision of the semantic attributes. Early approaches to understanding the semantic properties of latent spaces showed that vector arithmetic in the latent space leads to semantic arithmetic in the image space (Radford et al., 2016; Upchurch et al., 2017). In this regard, a line of research has sought meaningful latent perturbations that perform image manipulation along disentangled semantics. We categorize the previous works into local and global methods according to sample dependency. The local methods find meaningful perturbations for each latent variable for one semantic attribute, e.g., Ramesh et al. (2018), Latent Mapper in StyleCLIP (Patashnik et al., 2021), Attribute-conditioned normalizing flow in StyleFlow (Abdal et al., 2021), Local image editing in Zhu et al. (2021), and Local Basis (Choi et al., 2022b). By contrast, the global methods offer sample-independent meaningful perturbations for each latent space, e.g., Global directions in StyleCLIP (Patashnik et al., 2021), GANSpace (Härkönen et al., 2020), and SeFa (Shen & Zhou, 2021). Among them, StyleCLIP is a supervised method that relies on text descriptions and the CLIP model (Radford et al., 2021), which is trained on image-text pairs. Throughout this paper, we investigate unsupervised methods such as GANSpace and

