FINDING THE GLOBAL SEMANTIC REPRESENTATION IN GAN THROUGH FRÉCHET MEAN

Abstract

The ideally disentangled latent space in a GAN involves the global representation of the latent space with semantic attribute coordinates. In other words, considering that this disentangled latent space is a vector space, there exists a global semantic basis in which each basis component describes one attribute of the generated images. In this paper, we propose an unsupervised method for finding this global semantic basis in the intermediate latent space in GANs. This semantic basis represents sample-independent meaningful perturbations that change the same semantic attribute of an image over the entire latent space. The proposed global basis, called Fréchet basis, is derived by introducing the Fréchet mean to the local semantic perturbations in a latent space. Fréchet basis is discovered in two stages. First, the global semantic subspace is discovered as the Fréchet mean in the Grassmannian manifold of the local semantic subspaces. Second, Fréchet basis is found by optimizing a basis of this semantic subspace via the Fréchet mean in the Special Orthogonal Group. Experimental results demonstrate that Fréchet basis provides better semantic factorization and robustness than the previous methods. Moreover, we suggest a basis refinement scheme for the previous methods. The quantitative experiments show that the refined basis achieves better semantic factorization while constrained to the same semantic subspace given by the previous method.

1. INTRODUCTION

Generative Adversarial Networks (GANs, Goodfellow et al., 2014) have achieved impressive success in high-fidelity image synthesis, e.g., ProGAN (Karras et al., 2018), BigGAN (Brock et al., 2018), and StyleGANs (Karras et al., 2019; 2020a;b; 2021). Interestingly, even when a GAN model is trained without any information about the semantics of data, its latent space often represents the semantic properties of data (Radford et al., 2016; Karras et al., 2019). To understand how GAN models represent the semantics, several studies investigated the disentanglement (Bengio et al., 2013) property of the latent space in GANs (Goetschalckx et al., 2019; Jahanian et al., 2019; Plumerault et al., 2020; Shen et al., 2020). Here, a latent space in a GAN is called disentangled if there exists an optimal basis of the latent space in which each basis coefficient corresponds to one disentangled semantics (generative factor). One approach to studying the disentanglement property is to find meaningful latent perturbations that induce disentangled semantic variations in generated images (Ramesh et al., 2018; Härkönen et al., 2020; Shen & Zhou, 2021; Choi et al., 2022b). This approach can be interpreted as investigating how the semantics are represented around each latent variable. We classify the previous works on meaningful latent perturbations into local and global methods depending on whether the proposed perturbation is sample-dependent or sample-ignorant. In this work, we focus on the global methods (Härkönen et al., 2020; Shen & Zhou, 2021). If the latent space is ideally disentangled, the optimal semantic basis becomes the global semantic perturbation that represents a change in the same generative factor on the entire latent space. In this regard, the global methods are attempts to find the best-possible semantic basis on the target latent space. Throughout this work, a semantic subspace denotes the subspace spanned by the corresponding semantic basis.
Figure 1: Fréchet basis $B_s$ is discovered by selecting the optimal basis of $S_s$ using the Fréchet mean in the Special Orthogonal Group.

In this paper, we propose an unsupervised method, called Fréchet basis, for finding global semantic perturbations in an intermediate latent space in GAN. Fréchet basis is based on the Fréchet mean on a Riemannian manifold. The Fréchet mean generalizes the centroid to a general metric space (Fréchet, 1948), and a Riemannian manifold is a metric space (Lee, 2013). In particular, Fréchet basis is discovered in two steps (Fig 1). First, we find the global semantic subspace $S_s$ of the latent space as the Fréchet mean in the Grassmannian manifold (Boothby, 1986) of the intrinsic tangent spaces. Here, each intrinsic tangent space represents a local semantic subspace (Choi et al., 2022b). Second, Fréchet basis $B_s$ is discovered by selecting the optimal basis of $S_s$ via the Fréchet mean in the Special Orthogonal Group (Lang, 2012). Our experiments show that Fréchet basis provides better semantic factorization and robustness compared to the previous unsupervised global methods. Moreover, the second step in finding Fréchet basis provides a basis refinement scheme for the previous global methods. In our experiments, the basis refinement achieves better semantic factorization than the previous methods while keeping the same semantic subspace. Our contributions are as follows:
1. We propose unsupervised global semantic perturbations, called Fréchet basis. Fréchet basis is discovered by introducing the Fréchet mean to the local semantic perturbations.
2. We show that Fréchet basis achieves better semantic factorization and robustness compared to the previous global approaches.
3. We propose the basis refinement scheme, which optimizes the semantic basis on a given semantic subspace. We can refine the previous global approaches by applying the basis refinement on their semantic subspaces.

2. RELATED WORKS AND BACKGROUND

Latent Perturbation for Image Manipulation The latent space of GANs often represents the semantics of data even when the model is trained without supervision of the semantic attributes. Early approaches to understanding the semantic property of latent space showed that vector arithmetic on the latent space leads to semantic arithmetic on the image space (Radford et al., 2016; Upchurch et al., 2017). In this regard, a line of research has been conducted to find meaningful latent perturbations that perform image manipulation in disentangled semantics. We categorize the previous works into local and global methods according to their sample dependency. The local methods find meaningful perturbations for each latent variable for one semantic attribute, e.g., Ramesh et al. (2018), the Latent Mapper in StyleCLIP (Patashnik et al., 2021), the attribute-conditioned normalizing flow in StyleFlow (Abdal et al., 2021), local image editing in Zhu et al. (2021), and Local Basis (Choi et al., 2022b). By contrast, the global methods offer sample-independent meaningful perturbations for each latent space, e.g., Global Directions in StyleCLIP (Patashnik et al., 2021), GANSpace (Härkönen et al., 2020), and SeFa (Shen & Zhou, 2021). Among them, StyleCLIP is a supervised method because it requires text descriptions of the semantic attributes and leverages the pre-trained CLIP model (Radford et al., 2021). Throughout this paper, we investigate the unsupervised methods, GANSpace and SeFa. GANSpace (Härkönen et al., 2020) suggested the principal components of the latent space obtained by PCA as global meaningful perturbations. SeFa (Shen & Zhou, 2021) proposed the singular vectors of the first weight matrix as global disentangled perturbations.

Unsupervised global disentanglement score The disentanglement of a latent space is expressed as the correspondence between the semantic attributes of data and the axes of the latent space.
Because the definition of disentanglement depends on attributes, most existing disentanglement metrics for latent spaces are supervised ones, e.g., the DCI score (Eastwood & Williams, 2018), the β-VAE metric (Higgins et al., 2017), and the FactorVAE metric (Kim & Mnih, 2018). They require attribute annotations of the generated images, which restricts the broad applicability of disentanglement evaluation on real datasets. To address this restriction, Choi et al. (2022a) proposed an unsupervised global disentanglement score, called Distortion. The Distortion metric measures the variation of the tangent space on the learned latent manifold $\mathcal{W}$. Hence, the Distortion metric relies purely on the geometric property of the latent space and does not require attribute labels.

Background Choi et al. (2022b) suggested a framework for analyzing the semantic property of an intermediate latent space by its local geometry. This analysis is performed on the learned latent manifold $\mathcal{W} = f(\mathcal{Z})$, where $f$ denotes the subnetwork from the input noise space $\mathcal{Z}$ to the target latent space. Here, we assume $\mathcal{Z}$ is the entire Euclidean space, i.e., $\mathcal{Z} = \mathbb{R}^{d_Z}$ for some $d_Z$; note that this is satisfied for the usual Gaussian prior $p(z) = N(0, I_{d_Z})$. Choi et al. (2022b) proposed a method for finding the $k$-dimensional local approximation $\mathcal{W}^k_w$ of $\mathcal{W}$ around $w = f(z) \in \mathcal{W}$.

This local approximation $\mathcal{W}^k_w$ is discovered by solving the low-rank approximation problem of $df_z$, whose solution is given by SVD. Then, $\mathcal{W}^k_w$ is given as follows: for the $i$-th singular vectors $u^z_i \in \mathbb{R}^{d_Z}$, $v^w_i \in \mathbb{R}^{d_W}$ and the $i$-th singular value $\sigma^z_i \in \mathbb{R}$ of $df_z$ with $\sigma^z_1 \ge \cdots \ge \sigma^z_m$ and $m = \min(d_Z, d_W)$, i.e., $df_z(u^z_i) = \sigma^z_i \cdot v^w_i$ for all $i$,

$$\text{Local Basis}(w = f(z)) = \{v^w_i\}_{1 \le i \le n}, \qquad (1)$$

$$\mathcal{W}^k_w = f\left( z + \sum_{i} t_i \cdot u^z_i \;\middle|\; t_i \in (-\epsilon_i, \epsilon_i) \text{ for } 1 \le i \le k \right), \qquad (2)$$

$$T_w \mathcal{W}^k_w = \mathrm{span}\{v^w_i : 1 \le i \le k\}. \qquad (3)$$

Choi et al. (2022b) showed that the codomain singular vectors, called Local Basis (Eq 1), serve as local semantic perturbations around $w$. In this respect, the tangent space at $w$ represents the local semantic subspace because it is spanned by the local semantic perturbations (Eq 3). Upon this framework, Choi et al. (2022a) proposed an intrinsic local dimension estimation scheme for the latent manifold $\mathcal{W}$ through the robust rank estimate (Kritchman & Nadler, 2008) of $df_z$. Geometrically, the intrinsic local dimension represents the number of dimensions required to properly approximate the denoised $\mathcal{W}$. Choi et al. (2022a) showed that this local dimension corresponds to the number of local semantic perturbations. Using this correspondence, Choi et al. (2022a) introduced the unsupervised disentanglement score called Distortion, defined as the normalized variation of the intrinsic tangent space on the latent manifold. The normalized variation is the ratio of the distance between the tangent spaces at two random $w \in \mathcal{W}$ (Eq 4) to the distance between the tangent spaces at two close $w$ (Eq 5). The distance between tangent spaces is measured by the dimension-normalized geodesic metric $d^k_{geo}$ (Choi et al., 2022a) in the Grassmannian manifold (Boothby, 1986).
Specifically, Distortion of $\mathcal{W}$ is defined as $D_\mathcal{W} = I_{rand}/I_{local}$ with

$$I_{rand} = \mathbb{E}_{z_i \sim p(z),\, w_i = f(z_i)} \left[ d^k_{geo}\left( T_{w_1}\mathcal{W}^{k}_{w_1},\, T_{w_2}\mathcal{W}^{k}_{w_2} \right) \right] \quad \text{for } k = \min(k_1, k_2), \qquad (4)$$

$$I_{local} = \mathbb{E}_{z_1 \sim p(z),\, |z_2 - z_1| = \epsilon} \left[ d^k_{geo}\left( T_{w_1}\mathcal{W}^{k}_{w_1},\, T_{w_2}\mathcal{W}^{k}_{w_2} \right) \right] \quad \text{for } k = \min(k_1, k_2), \qquad (5)$$

where $k_i$ denotes the local dimension estimate at $w_i = f(z_i)$. Interestingly, although the Distortion metric does not exploit any semantic information, it shows a strong correlation with the supervised disentanglement score and the global-basis-compatibility (Choi et al., 2022a).
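As a concrete illustration (not the authors' implementation), the pipeline of Eq 1–5 can be sketched in a few lines of NumPy: estimate the Jacobian of a toy subnetwork by finite differences, take its SVD to obtain Local Basis and the intrinsic tangent space, and compare tangent spaces with the dimension-normalized principal-angle metric. The toy map `f`, its dimensions, and the finite-difference Jacobian are illustrative assumptions; the paper computes $df_z$ on the actual mapping network.

```python
import numpy as np

def jacobian(f, z, eps=1e-5):
    """Finite-difference Jacobian df_z of f: R^dZ -> R^dW."""
    fz = f(z)
    J = np.zeros((fz.size, z.size))
    for i in range(z.size):
        dz = np.zeros(z.size)
        dz[i] = eps
        J[:, i] = (f(z + dz) - f(z - dz)) / (2 * eps)
    return J

def local_basis(f, z, k):
    """Top-k codomain singular vectors of df_z: an orthonormal basis of T_w W^k_w."""
    U, _, _ = np.linalg.svd(jacobian(f, z))
    return U[:, :k]

def d_geo_k(V1, V2):
    """Dimension-normalized geodesic metric between subspaces (principal angles)."""
    s = np.clip(np.linalg.svd(V1.T @ V2, compute_uv=False), -1.0, 1.0)
    return np.linalg.norm(np.arccos(s)) / np.sqrt(V1.shape[1])

# hypothetical toy subnetwork f: R^3 -> R^4 standing in for the mapping network
f = lambda z: np.array([np.tanh(z[0]), z[1] * z[2], np.sin(z[0] + z[1]), z[2] ** 2])
T1 = local_basis(f, np.array([0.1, 0.2, 0.3]), k=2)
T2 = local_basis(f, np.array([1.0, -1.0, 0.5]), k=2)
print(d_geo_k(T1, T1))  # ~0: identical tangent spaces
print(d_geo_k(T1, T2))  # > 0: the tangent space varies over the manifold
```

The large ratio between tangent-space distances at far versus near latent variables is exactly what the Distortion score $I_{rand}/I_{local}$ quantifies.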

3. FRÉCHET MEAN GLOBAL BASIS

In this section, we propose an unsupervised method, called Fréchet basis, for finding global linear perturbation directions that perform the same semantic manipulation on the entire latent space. Given such global meaningful perturbations, the vector space representation of the latent space along this semantic basis provides the global semantic representation of the model. The proposed scheme is based on finding the Fréchet mean (Fréchet, 1948; Karcher, 1977) of the local disentangled perturbations, and consists of two steps. First, the optimal subspace representing the global semantics is discovered as the Fréchet mean in the Grassmannian manifold (Boothby, 1986) of the intrinsic tangent spaces (Choi et al., 2022b) on the target latent space. Second, the optimal basis is obtained as the Fréchet mean in the Special Orthogonal Group (Lang, 2012) of the projected local disentangled perturbations.

3.1. METHOD

Notation Throughout this work, we follow the notation presented in Sec 2. Let $W = \mathbb{R}^{d_W}$ be the ambient target latent space where we want to find global semantic perturbations. We analyze the learned latent manifold $\mathcal{W} = f(\mathcal{Z}) \subset W$ embedded in this latent space, given as the image of the subnetwork from the input noise space $\mathcal{Z}$ to $W$.

Motivation We investigate the problem of discovering global semantic perturbations through the local geometry of the learned latent manifold $\mathcal{W}$. Recently, Choi et al. (2022a) discovered that the intrinsic tangent space $T_w\mathcal{W}^k_w$ at each $w \in \mathcal{W}$ represents the local semantic variation of the image generated from $w$. Specifically, the intrinsic local dimension at $w$, denoted as $k$ in $T_w\mathcal{W}^k_w$, corresponds to the number of local semantic perturbations. The top-$k$ components of Local Basis (Choi et al., 2022b) are these local semantic perturbations and are the basis vectors of $T_w\mathcal{W}^k_w$ (Eq 3). Hence, the intrinsic tangent space $T_w\mathcal{W}^k_w$ describes the local semantic variation of an image because it is spanned by the local meaningful perturbations. In this regard, we interpret the global semantic variation as the mean of these local semantic variations. One of the most popular methods for defining the mean of subspaces is through the Fréchet mean (Marrinan et al., 2014). The Fréchet mean generalizes the mean in a vector space, i.e., the minimizer of the sum of squared distances to each vector, to a general metric space: for $x_1, x_2, \ldots, x_n$ in a metric space $X$ with metric $d$,

$$\mu_{fr} = \arg\min_{\mu \in X} \sum_{1 \le i \le n} d(\mu, x_i)^2. \qquad (6)$$

In particular, a Riemannian manifold is an example of a metric space where we can introduce the Fréchet mean (Lou et al., 2020).
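To make the definition concrete, here is a minimal sketch of a Fréchet mean computed by gradient descent on the simplest curved metric space, the unit circle. This toy example is ours, not part of the proposed method; the learning rate and step count are arbitrary illustrative choices.

```python
import numpy as np

def geodesic_dist_circle(a, b):
    """Arc-length metric d(a, b) between two angles on the unit circle."""
    d = (a - b) % (2 * np.pi)
    return min(d, 2 * np.pi - d)

def frechet_mean_circle(angles, lr=0.1, steps=500):
    """Gradient descent on mu -> sum_i d(mu, x_i)^2 over S^1."""
    mu = angles[0]
    for _ in range(steps):
        grad = 0.0
        for x in angles:
            # signed geodesic offset from x to mu, wrapped to (-pi, pi]
            d = (mu - x + np.pi) % (2 * np.pi) - np.pi
            grad += 2 * d  # gradient of the squared distance
        mu = (mu - lr * grad / len(angles)) % (2 * np.pi)
    return mu

angles = [0.1, 0.2, 0.6]
mu = frechet_mean_circle(angles)
obj = lambda m: sum(geodesic_dist_circle(m, x) ** 2 for x in angles)
print(mu, obj(mu) <= obj(angles[0]))  # ~0.3, True
```

For angles confined to a small arc, the Fréchet mean reduces to the ordinary average; the definition only becomes interesting when the space is curved or the points are spread out, which is precisely the situation for the tangent spaces on the latent manifold.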
In this work, we utilize the Grassmannian manifold (Boothby, 1986) to find the subspace of latent space for the global semantic representation and the Special Orthogonal Group (Lang, 2012) to choose the optimal basis on it.

3.1.1. GLOBAL SEMANTIC SUBSPACE

Our goal is to find a Riemannian manifold in which we can embed the intrinsic tangent spaces $\{T_{w_i}\mathcal{W}^{k_i}_{w_i}\}_{1 \le i \le n}$ describing the local semantic variations at each $w_i$. The Grassmannian manifold $Gr(k, V)$ denotes the set of $k$-dimensional linear subspaces of a vector space $V$ (Boothby, 1986). Hence, these tangent spaces can be embedded into one Grassmannian manifold $Gr(d_{\mathcal{W}}, \mathbb{R}^{d_W})$ by matching their dimensions to the dimension $d_{\mathcal{W}}$ of the learned latent manifold. Specifically, we match the dimensions of the tangent spaces by refining or extending them to the subspaces spanned by the top-$d_{\mathcal{W}}$ components of Local Basis (Eq 3). This is equivalent to approximating $\mathcal{W}$ with the $d_{\mathcal{W}}$-dimensional local estimate $\mathcal{W}^{d_{\mathcal{W}}}_w$ at all $w \in \mathcal{W}$ (Eq 2). We estimate the layer-wise dimension $d_{\mathcal{W}}$ of the learned latent manifold by averaging the local dimensions $\{k_i\}_{1 \le i \le n}$ of $n$ i.i.d. samples:

$$T_{w_i}\mathcal{W}^{d_{\mathcal{W}}}_{w_i} = \mathrm{span}\{v^{w_i}_j : 1 \le j \le d_{\mathcal{W}}\} \in Gr(d_{\mathcal{W}}, \mathbb{R}^{d_W}), \qquad (7)$$

where $v^{w_i}_j$ denotes the $j$-th component of Local Basis at $w_i$ (Eq 1). Then, we define the global semantic subspace $S_s$ of $\mathcal{W}$ as the Fréchet mean on $Gr(d_{\mathcal{W}}, \mathbb{R}^{d_W})$ with the geodesic metric $d_{geo}$ (Ye & Lim, 2016):

$$S_s = \arg\min_{\mu \in Gr(d_{\mathcal{W}}, \mathbb{R}^{d_W})} \sum_{1 \le i \le n} d_{geo}\left(\mu,\, T_{w_i}\mathcal{W}^{d_{\mathcal{W}}}_{w_i}\right)^2. \qquad (8)$$

Here, the geodesic metric is defined as $d_{geo}(W, W') = \left(\sum_{i=1}^{k} \theta_i^2\right)^{1/2}$ for $W, W' \in Gr(k, \mathbb{R}^n)$, where $\theta_i$ denotes the $i$-th principal angle between $W$ and $W'$, i.e., $\theta_i = \cos^{-1}\left(\sigma_i\left(M_W^\top M_{W'}\right)\right)$ with $M_W \in \mathbb{R}^{n \times k}$ the column-wise concatenation of an orthonormal basis of $W$. For the optimization, we used the gradient descent algorithm in Pymanopt (Townsend et al., 2016).
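The paper solves Eq 8 with Pymanopt; as an illustrative alternative, the same kind of Fréchet mean can be approximated by a plain Karcher-flow iteration built from the standard log and exp maps of the Grassmannian. This is a hedged NumPy sketch, not the authors' code, and it assumes the input bases are orthonormal and close enough that $X^\top Y$ is invertible:

```python
import numpy as np

def gr_log(X, Y):
    """Grassmannian log map: tangent vector at span(X) pointing to span(Y).
    X, Y are n x k orthonormal bases with X^T Y invertible."""
    n = X.shape[0]
    A = (np.eye(n) - X @ X.T) @ Y            # component of Y orthogonal to span(X)
    M = np.linalg.solve((X.T @ Y).T, A.T).T  # (I - X X^T) Y (X^T Y)^{-1}
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.arctan(s)) @ Vt

def gr_exp(X, D):
    """Grassmannian exp map: follow the geodesic from span(X) along tangent D."""
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    G = X @ Vt.T @ np.diag(np.cos(s)) + U @ np.diag(np.sin(s))
    Q, _ = np.linalg.qr(G @ Vt)              # re-orthonormalize the basis
    return Q

def frechet_mean_grassmann(bases, steps=50):
    """Karcher flow: move the estimate along the average of the log maps."""
    mu = bases[0]
    for _ in range(steps):
        D = sum(gr_log(mu, X) for X in bases) / len(bases)
        mu = gr_exp(mu, D)
    return mu
```

At the Fréchet mean the averaged log maps vanish, which is the first-order optimality condition of Eq 8; in practice a Riemannian optimizer such as Pymanopt's is the more robust choice.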

3.1.2. GLOBAL SEMANTIC BASIS

The aim of this work is to find global meaningful perturbations that represent disentangled semantics. However, the global semantic subspace $S_s$ is discovered by solving an optimization problem in the Grassmannian manifold, i.e., in the set of subspaces. We need an additional step to find a specific basis of $S_s$. For this, we utilize the Fréchet mean on the Special Orthogonal Group.

Why the Special Orthogonal Group Let the columns of $M_S, M'_S \in \mathbb{R}^{d_W \times d_{\mathcal{W}}}$ be two distinct orthonormal bases of $S_s$, where $d_{\mathcal{W}}$ denotes the estimated dimension of the learned latent manifold (Sec 3.1.1). Then, there exists an orthogonal matrix $O \in \mathbb{R}^{d_{\mathcal{W}} \times d_{\mathcal{W}}}$, i.e., $O^\top O = OO^\top = I$, which satisfies $M'_S = M_S O$. Therefore, finding the optimal basis of $S_s$ is equivalent to finding the orthogonal matrix $O$ given an initial $M_S$. The Special Orthogonal Group $SO(n)$ consists of the $n \times n$ orthogonal matrices with determinant $+1$ (Lang, 2012); the determinant of an arbitrary orthogonal matrix is $+1$ or $-1$. We consider $SO(n)$ instead of the full set of orthogonal matrices for two reasons. First, our task is invariant to flipping ($(-1)$-multiplication of) each basis component: the positive perturbation along $v$ is identical to the negative perturbation along $-v$. Flipping a basis component of $M'_S$ flips the corresponding column of $O$, which multiplies the determinant of $O$ by $-1$. Therefore, without loss of generality, we may assume that $O$ is a special orthogonal matrix. Second, the Orthogonal Group is disconnected, while the Special Orthogonal Group is connected. Hence, the Orthogonal Group is inadequate for finding the Fréchet mean, which is optimized by gradient descent.

Basis Refinement We propose an optimization scheme for finding the global semantic basis from the global semantic subspace $S_s$. Here, we denote the column-wise concatenation of the local semantic basis at each $w_i$ as $M_{w_i} \in \mathbb{R}^{d_W \times d_{\mathcal{W}}}$, i.e., its columns are the top-$d_{\mathcal{W}}$ components of Local Basis at $w_i$.
Note that the column space of $M_{w_i}$ is the local semantic subspace $T_{w_i}\mathcal{W}^{d_{\mathcal{W}}}_{w_i}$. Likewise, $M_S$ refers to an initial orthonormal basis of $S_s$. Before the optimization below, we preprocess each local semantic basis at $w_i$, i.e., the columns of $M_{w_i}$, to be positively aligned with the corresponding columns of $M_S$: $\langle M_{w_i}[:, j],\, M_S[:, j] \rangle > 0$ for all $j$. The proposed scheme is then as follows: (i) First, we project each local semantic basis at $w_i$ onto the global semantic subspace $S_s$, i.e., $M_S^\top M_{w_i}$. (ii) Then, the matrix of the projected local semantic basis $M_S^\top M_{w_i} \in \mathbb{R}^{d_{\mathcal{W}} \times d_{\mathcal{W}}}$ is projected onto $SO(d_{\mathcal{W}})$. The projection $P_o$ onto the Orthogonal Group and the proposed projection $P_{so}$ onto the Special Orthogonal Group can both be obtained via SVD (see the appendix for proof): for an SVD $X = U\Sigma V^\top$ of $X \in \mathbb{R}^{d_{\mathcal{W}} \times d_{\mathcal{W}}}$,

$$P_{so}(X) = U \,\mathrm{diag}\left(1, 1, \ldots, 1, \det(P_o(X))\right) V^\top \quad \text{where } P_o(X) = UV^\top. \qquad (9)$$

(iii) Finally, we find the optimal orthogonal matrix $O$, which transforms the initial basis $M_S$ into the global semantic basis $B_s$, as the Fréchet mean of the projected local semantic bases $\{P_{so}(M_S^\top M_{w_i})\}_i \subset SO(d_{\mathcal{W}})$:

$$B_s = M_S O \quad \text{where} \quad O = \arg\min_{\mu \in SO(d_{\mathcal{W}})} \sum_{1 \le i \le n} d\left(\mu,\, P_{so}(M_S^\top M_{w_i})\right)^2. \qquad (10)$$

Here, the Riemannian metric on $SO(d_{\mathcal{W}})$ is defined as $d(X_1, X_2) = \| \mathrm{Skew}(\log(X_1^\top X_2)) \|_F$ for $X_1, X_2 \in SO(d_{\mathcal{W}})$, where $\log$ denotes the matrix logarithm and $\mathrm{Skew}$ refers to the skew-symmetric component of a matrix, i.e., $\mathrm{Skew}(X) = \frac{1}{2}(X - X^\top)$. As in the Grassmannian manifold, we utilized Pymanopt (Townsend et al., 2016) for the optimization.

3.2. FRÉCHET BASIS AS DISTORTION MINIMIZER

The Fréchet basis can be interpreted as the minimizer of the unsupervised disentanglement metric Distortion (Choi et al., 2022a). The Distortion metric is based on the inconsistency of intrinsic tangent spaces. Specifically, Distortion $D_\mathcal{W}$ is the ratio between the inconsistency at two random $w \in \mathcal{W}$ and at two close $w \in \mathcal{W}$, i.e., $D_\mathcal{W} = I_{rand}/I_{local}$ (Eq 4 and 5). Here, the intrinsic tangent space represents the local semantic variations.
From this point of view, a Distortion-based global semantic subspace $S_D$ would be a representative of these tangent spaces that minimizes the inconsistency to each tangent space:

$$S_D = \arg\min_{\mu \le \mathbb{R}^{d_W}} I_{global}(\mu) \quad \text{with} \quad I_{global}(\mu) = \mathbb{E}_{z \sim p(z),\, w = f(z)} \left[ d^k_{geo}\left(\mu^k,\, T_w\mathcal{W}^k_w\right) \right], \qquad (11)$$

where $\mu$ is a subspace of $\mathbb{R}^{d_W}$, $k$ refers to the local dimension at $w$, and $\mu^k$ denotes the $k$-dimensional refinement of $\mu$. The Fréchet basis assumes that the entire latent manifold $\mathcal{W}$ is approximated by the $d_{\mathcal{W}}$-dimensional local estimate at all $w \in \mathcal{W}$. Under this assumption, $I_{global}(\mu)$ becomes

$$I_{global}(\mu) = \frac{1}{\sqrt{d_{\mathcal{W}}}} \cdot \mathbb{E}_{z \sim p(z),\, w = f(z)} \left[ d_{geo}\left(\mu,\, T_w\mathcal{W}^{d_{\mathcal{W}}}_w\right) \right] \quad \text{for } \mu \in Gr(d_{\mathcal{W}}, \mathbb{R}^{d_W}). \qquad (12)$$

The comparison with Eq 8 shows that the global semantic subspace $S_s$ by Fréchet mean can be interpreted as an $L^2$-Distortion minimizer, i.e., with $d^2_{geo}$ instead of $d_{geo}$. Although the original $L^1$-Distortion was proven to provide high correlations with the global-basis-compatibility and the supervised disentanglement score, $L^2$-Distortion was not tested (Choi et al., 2022a). Therefore, we evaluated whether $L^2$-Distortion is also a meaningful metric to verify the validity of minimizing it. Following the experiments in Choi et al. (2022a), we assessed the global-basis-compatibility by the FID (Heusel et al., 2017) gap between Local Basis and GANSpace (Härkönen et al., 2020) under the same perturbation intensity. Also, the DCI score (Eastwood & Williams, 2018) is adopted as the supervised disentanglement score. We utilized 40 binary attribute classifiers pre-trained on CelebA (Liu et al., 2015) to annotate the 10k generated images. Figure 2 demonstrates that $L^2$-Distortion achieves correlations comparable to the original Distortion score in the global-basis-compatibility and DCI. These results validate our framework of minimizing $L^2$-Distortion by the Fréchet mean.
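The simplified $I_{global}$ above admits a direct Monte Carlo estimate from sampled tangent-space bases. A minimal sketch (our illustrative helper names, not the authors' code; `mu` and the tangent bases are assumed orthonormal):

```python
import numpy as np

def d_geo(V1, V2):
    """Geodesic metric on the Grassmannian via principal angles."""
    s = np.clip(np.linalg.svd(V1.T @ V2, compute_uv=False), -1.0, 1.0)
    return np.linalg.norm(np.arccos(s))

def I_global(mu, tangents, d_manifold):
    """Monte Carlo estimate of the simplified I_global: average geodesic
    distance from candidate subspace mu to the sampled tangent-space bases,
    normalized by sqrt(d_manifold) as in the dimension-normalized metric."""
    return np.mean([d_geo(mu, T) for T in tangents]) / np.sqrt(d_manifold)
```

A subspace close to the bulk of the sampled tangent spaces scores a small $I_{global}$, while an unrelated random subspace scores near the maximal value, which is the behavior the minimization in Eq 8 exploits.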

4. EXPERIMENTS

4.1. FRÉCHET BASIS AS GLOBAL SEMANTIC PERTURBATIONS

We evaluate Fréchet basis as global semantic perturbations on the intermediate layers of the mapping network in various StyleGAN models (Karras et al., 2019; 2020b). For each StyleGAN model, we used the 3rd to 8th layers because the local dimension estimate is rather unstable for the 1st and 2nd layers depending on the preprocessing hyperparameter $\theta_{pre}$ (Choi et al., 2022a). We chose these intermediate layers for evaluation because they are diverse and properly disentangled latent spaces. In this manner, Fréchet basis can be tested on six latent spaces for each pre-trained model. Figure 3 shows how the image changes as we perturb the latent variable along each global basis component. For a fair comparison, we took the annotated basis from GANSpace (Härkönen et al., 2020) and compared it with Fréchet basis on the W-space of three StyleGAN models. Because GANSpace performs layer-wise edits, we matched the set of synthesis-network layers that receive the perturbed latent variable to the annotated ones. The corresponding Fréchet basis component is selected by cosine similarity. Each subfigure shows three images traversed with the same global basis component, perturbation intensity, and set of perturbed layers, with the original image at the center; hence, these subfigures also show the semantic consistency of each global basis. In StyleGAN2 trained on FFHQ, GANSpace shows an image failure on the left side and semantic inconsistency in the third row (not representing hairy on the left) (Fig 3b). In StyleGAN2 trained on LSUN-cat (Yu et al., 2015), GANSpace presents an entangled semantic manipulation (Fig 3d): the latent traversal along GANSpace changes the light position as annotated, but also darkens the striped pattern of the cats. On the other hand, Fréchet basis achieves better semantic factorization without these problems (Fig 3a and 3c). (See the appendix G for additional examples of other attributes and datasets.
Also, since Fréchet basis is an average of Local Basis, we compared these two methods and GANSpace in the appendix F.) For the quantitative comparison of semantic factorization, we compared DCI as in Sec 3.2. DCI is a supervised disentanglement metric that assesses the axis-wise alignment of semantics. Hence, we measured the DCI of the latent space representation under the two global bases, GANSpace and Fréchet basis. Specifically, we represented each latent variable $w \in \mathbb{R}^{d_W}$ by its coefficients with respect to each global basis (Eq 13). The local dimension of each latent space is estimated with $\theta_{pre} = 0.01$ (Choi et al., 2022a). In all latent spaces except for the 5th layer in StyleGAN2-e, the latent space achieves a higher DCI score when represented with Fréchet basis. This quantitative result shows that Fréchet basis provides better semantic factorization along each basis component on the same latent space.

Robustness We tested the robustness of Fréchet basis by comparing the image fidelity under latent perturbations. For each global basis, we evaluated the FID (Heusel et al., 2017) of 50k i.i.d. latent-perturbed images. The perturbation direction is the 1st component for GANSpace; for Fréchet basis, we chose the component with the highest cosine similarity to the 1st component of GANSpace. The perturbation intensity is 2 in StyleGAN1, and 5 in StyleGAN2-e and StyleGAN2. The FID scores of the latent spaces in the three models are provided in Fig 5. (See the appendix for FID scores under various perturbation intensities.) We think this higher robustness is because the global semantic subspace $S_s$ is the Fréchet mean of the intrinsic tangent spaces of the learned latent manifold. The strong robustness of traversing along the tangent space at each latent variable was observed in Choi et al. (2022b). Because Fréchet basis is a basis of $S_s$, traversing along it can be interpreted as traversing along the mean of these locally robust perturbation directions.

4.2. BASIS REFINEMENT OF PREVIOUS GLOBAL METHODS

The second step of finding Fréchet basis (Sec 3.1.2) provides a basis refinement scheme for the previous global methods: given the semantic subspace $S_{global}$ obtained by a previous method, we find the refined basis $B_{global}$ in $S_{global}$ via Sec 3.1.2.
Table 1 shows the FID and DCI scores of Fréchet basis, GANSpace, GANSpace refinement, SeFa, and SeFa refinement on the W-space of StyleGAN2 trained on FFHQ. These two scores are evaluated in the same manner as in Sec 4.1, with perturbation intensity 2 for FID. Most importantly, Fréchet basis achieves the best FID and DCI. Also, the basis refinement consistently improves both scores of the two previous methods, GANSpace and SeFa. The comparison of each previous global method with its refinement proves the contribution of the basis optimization, and the superior performance of Fréchet basis over the two refinements shows the contribution of the subspace optimization.

4.3. GEODESIC INTERPOLATION ON GRASSMANNIAN MANIFOLD

We further investigate the optimality of the global semantic subspace $S_s$ by analyzing the interpolation from $S_s$ to the GANSpace subspace $S_{GS}$. Similar to Sec 4.2, this experiment examines the contribution of the first step of Fréchet basis. In the Grassmannian manifold $Gr(k, \mathbb{R}^n)$, there exists at least one length-minimizing geodesic between any two subspaces. Moreover, there is an explicit parametrization of this length-minimizing geodesic (Eq 14) (Chakraborty & Vemuri, 2015). For $\mathcal{X}, \mathcal{Y} \in Gr(k, \mathbb{R}^n)$, let $X, Y \in \mathbb{R}^{n \times k}$ be the column-wise concatenations of their orthonormal bases. Then, the length-minimizing geodesic $\Gamma(\mathcal{X}, \mathcal{Y}, t)$ from $\mathcal{X}$ to $\mathcal{Y}$ is defined as

$$\Gamma(\mathcal{X}, \mathcal{Y}, t) = \mathrm{span}\left\{ \left( XV\cos(\Theta t) + U\sin(\Theta t) \right) V^\top \right\} \quad \text{for } t \in [0, 1], \qquad (14)$$

where $X^\top Y$ is non-singular, $(Y - XX^\top Y)(X^\top Y)^{-1} = U\Sigma V^\top$ is the thin SVD, and $\Theta = \arctan\Sigma$. Note that the last $V^\top$ multiplication is unnecessary for the subspace itself; we added it to match the basis to $X$ at $t = 0$. We used this geodesic to perform the interpolation and extrapolation from $S_s$ to $S_{GS}$:

$$S_i = \Gamma\left(\mathcal{X}, \mathcal{Y}, (i-1)/n\right) \quad \text{for } i = 0, 1, \ldots, n+2, \qquad (15)$$

with $n = 6$. Note that $i = 0$ and $i = n+2$ represent the extrapolations at $t = -1/n$ and $t = 1 + 1/n$. Because the above interpolation is performed on the subspace scale, we conducted the basis refinement (Sec 4.2) for each interpolation subspace $S_i$ to find the interpolation basis $B_i$. The results are presented in Figure 6.
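The parametrization of Eq 14 translates directly into NumPy. The following sketch is illustrative (assuming orthonormal inputs with $X^\top Y$ non-singular) and returns a basis that matches $X$ exactly at $t = 0$, which is why the trailing $V^\top$ is kept:

```python
import numpy as np

def grassmann_geodesic(X, Y, t):
    """Length-minimizing geodesic Gamma(X, Y, t) on Gr(k, R^n) (Eq 14).
    X, Y: n x k orthonormal bases with X^T Y non-singular."""
    B = X.T @ Y
    M = (Y - X @ B) @ np.linalg.inv(B)                # (Y - X X^T Y)(X^T Y)^{-1}
    U, s, Vt = np.linalg.svd(M, full_matrices=False)  # thin SVD
    Theta = np.arctan(s)
    return (X @ Vt.T @ np.diag(np.cos(Theta * t)) + U @ np.diag(np.sin(Theta * t))) @ Vt
```

Evaluating at $t$ outside $[0, 1]$ gives the extrapolations used in Eq 15; at $t = 1$ the returned basis spans $\mathcal{Y}$.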

5. CONCLUSION

In this paper, we proposed an unsupervised global semantic basis on the intermediate latent space of a GAN, called Fréchet basis. Fréchet basis is discovered by utilizing the Fréchet mean on the Grassmannian manifold and the Special Orthogonal Group. Our experiments demonstrate that Fréchet basis achieves better semantic factorization and robustness than the previous unsupervised global methods. In addition, we suggested a basis refinement scheme using the Fréchet mean: given the same semantic subspace generated by a previous global method, the refined basis attains better semantic factorization and robustness.

A PROJECTION ONTO SPECIAL ORTHOGONAL GROUP

In this section, we provide proof for the projection of an invertible matrix A ∈ GL(n) onto the special orthogonal group SO(n). Formally, the set of invertible matrices GL(n), orthogonal group O(n), and special orthogonal group SO(n) are defined as follows: GL(n) := {A ∈ R n×n : det(A) ̸ = 0}, O(n) := {A ∈ R n×n : A ⊤ A = AA ⊤ = I}. ( ) SO(n) := {A ∈ R n×n : A ⊤ A = AA ⊤ = I, det(A) = 1}. For a non-invertible matrix, the projection onto SO(n) is not uniquely defined because of the subspace generated by singular vectors with σ = 0. However, the set of non-invertible matrices has measure-zero in the set of n × n matrices R n×n . Thus, this did not happen in practice during the Fréchet basis optimization. Theorem 1. The following optimization problem, i.e., the projection of A ∈ GL(n) onto the Special Orthogonal Group SO(n), arg min X∈SO(n) ∥X -A∥ F where A ∈ GL(n), has a solution P so (A) = X * = U diag 1, 1, . . . , 1, det U V ⊤ V ⊤ where A = U ΣV ⊤ is the Singular Value Decomposition (SVD) of A with the projection onto orthogonal group P o (A) = U V ⊤ . (The projection is unique if we assume σ 1 > σ 2 > . . . > σ n > 0 where {σ i } 1≤i≤n denote the singular values of A.) Proof. 1. If P o (A) = U V ⊤ ∈ SO(n), done. (∵ SO(n) ⊂ O(n) and det (P o (A))) = 1). 2. If P o (A) = U V ⊤ / ∈ SO(n), i.e., det(A) < 0, ∥X -A∥ 2 F = ∥I -X ⊤ A∥ 2 F = n -2 tr(X ⊤ A) + tr(A ⊤ A). For any skew-symmetric matrix K, define f (t) as f (t) = -2 tr(A ⊤ Xe tK ), for t ∈ R. Note that Xe tK ∈ SO(n) and tr(A ⊤ A) is given. Therefore, if X is the minimizer of ∥X -A∥ F , we have f ′ (0) = -2 tr(A ⊤ XK) = 0 for all skew-symmetric K. Thus, A ⊤ X is symmetric. Without loss of generality, we may assume that A = U 0 Σ 0 V ⊤ 0 , X = U 0 X ′ V ⊤ 0 for some U 0 , V 0 ∈ SO(n) where Σ 0 is the diagonal matrix as Σ in SVD but Σ 0 might have the negative elements. This decomposition can be obtained by flipping the singular vectors in U, V of SVD to make U 0 , V 0 ∈ SO(n) and letting X ′ = U ⊤ 0 XV 0 . 
Explicitly, from det(A) < 0, U 0 = U diag (1, . . . , 1, det (U )) , V 0 = V diag (1, . . . , 1, det (V )) , Σ 0 = diag (σ 1 , . . . , σ n-1 , -σ n ) . Then, since X ⊤ A = A ⊤ X ⊤ , X ⊤ A = V 0 (X ′ ) ⊤ Σ 0 V ⊤ 0 is symmetric and has negative determinant. ( ) Therefore, (X ′ ) ⊤ Σ 0 is also symmetric and thus diagonalizable. ∥X -A∥ 2 F = ∥I -X ⊤ A∥ 2 F = ∥I -V 0 (X ′ ) ⊤ Σ 0 V ⊤ 0 ∥ 2 F = ∥I -(X ′ ) ⊤ Σ 0 ∥ 2 F = i (1 -λ i ) 2 , ( ) where λ i denotes the i-th eigenvalue of (X ′ ) ⊤ Σ 0 with |λ 1 | ≥ |λ 2 | ≥ . . . ≥ |λ n |. Note that since X ′ = U ⊤ 0 XV 0 ∈ SO(n), the i-th singular value σ i of A satisfies σ i = |λ i |. Moreover, an odd number of signed singular values in Σ 0 is negative because det(A) < 0. Also, det((X ′ ) ⊤ Σ 0 ) = det(X ⊤ A) < 0 implies that an odd number of eigenvalues is negative. Hence, ∥X -A∥ F is minimized when λ i = σ i for 1 ≤ i ≤ n -1 and λ n = -σ n . If X ′ is the diagonal matrix satisfying this condition, X = U diag (1, 1, . . . , 1, -1) V ⊤ . Lemma 1 proves that when σ 1 > σ 2 > . . . > σ n is satisfied, the uniqueness of X ′ is guaranteed. Published as a conference paper at ICLR 2023 Lemma 1. Let X ′ ∈ SO(n) and Σ 0 = diag (σ 1 , . . . , σ n-1 , -σ n ) with σ 1 > . . . > σ n-1 > σ n > 0. If Y = (X ′ ) ⊤ Σ 0 is symmetric, X ′ is a diagonal matrix. Proof. Since Y is symmetric, it is diagonalizable with the orthogonal matrix P ∈ O(n). Y = (X ′ ) ⊤ Σ 0 = P DP t . ( ) We interpret these two decompositions Y = (X ′ ) ⊤ Σ 0 I = P DP t as two SVD-like representations of Y because (X ′ ) ⊤ ∈ SO(n) and I ⊤ = I. The ordered singular vectors in the domain of Y are uniquely determined as the basis for each eigenspace of Y ⊤ Y . Note that if the dimension of eigenspace is bigger than 1, there is a freedom of choosing a basis in it. From Σ 0 , the possible eigenvalues of Y are {±σ i } 1≤i≤n and the eigenvalues of Y ⊤ Y are {σ 2 i } 1≤i≤n . 
Therefore, since the standard basis {e_i}_{1≤i≤n} of R^n gives the domain singular vectors of Y from Y = (X′)^⊤ Σ_0 I, and Y is diagonalizable,

Y(e_i) = σ_i e_i for 1 ≤ i ≤ n−1 (∵ σ_1 > . . . > σ_{n−1} > 0), and Y(e_n) = −σ_n e_n (∵ σ_{n−1} > σ_n > 0).

Hence, the standard basis vectors are also the codomain singular vectors of Y, which implies that X′ is diagonal.
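For concreteness, the closed form P_so(A) = U diag(1, . . . , 1, det(U V^⊤)) V^⊤ from Theorem 1 can be sketched in NumPy as follows. This is an illustrative snippet, not part of the paper's implementation; the helper name `project_to_so` is ours.

```python
import numpy as np

def project_to_so(A):
    """Closed-form projection of an invertible A onto SO(n) (Theorem 1):
    P_so(A) = U diag(1, ..., 1, det(U V^T)) V^T, with A = U S V^T the SVD."""
    U, _, Vt = np.linalg.svd(A)
    d = np.ones(A.shape[0])
    d[-1] = np.sign(np.linalg.det(U @ Vt))  # flip the last axis iff det(U V^T) = -1
    return (U * d) @ Vt  # U * d multiplies the j-th column of U by d[j]

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))  # invertible with probability 1
X = project_to_so(A)

# X is a rotation: orthogonal with determinant +1.
assert np.allclose(X.T @ X, np.eye(4), atol=1e-8)
assert np.isclose(np.linalg.det(X), 1.0)

# Sanity check of optimality: no random rotation should be closer to A.
for _ in range(100):
    R = project_to_so(rng.standard_normal((4, 4)))
    assert np.linalg.norm(X - A) <= np.linalg.norm(R - A) + 1e-8
```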

B IMPLEMENTATION DETAIL

In this section, we summarize the hyperparameters for Fréchet basis presented in the experimental results in Section 4.

E ABLATION STUDY ON THE OTHER MEANS OF GRASSMANNIAN MANIFOLD

In this section, we conduct an ablation study on defining the global semantic subspace in the Grassmannian manifold. In particular, we compare the Fréchet mean (Fréchet, 1948; Marrinan et al., 2014) and the extrinsic mean (Srivastava & Klassen, 2002) of the Grassmannian manifold. The extrinsic mean μ_E is defined as the minimizer of the squared extrinsic metric d_E, i.e., for x_1, . . . , x_n ∈ Gr(k, R^n),

μ_E = argmin_μ Σ_{1≤i≤n} d_E(μ, x_i)², where d_E(μ, x_i) = d_Φ(Φ(μ), Φ(x_i)),

and Φ denotes an appropriate embedding of Gr(k, R^n). Following Marrinan et al. (2014), we set the embedding Φ to be the corresponding projection matrix, i.e., Φ(x_i) = P_{x_i} = M_{x_i} M_{x_i}^⊤, where M_{x_i} ∈ R^{n×k} denotes the column-wise concatenation of an orthonormal basis of x_i. Also, the Frobenius norm is adopted for d_Φ.

In this section, we introduce a new experiment to quantitatively compare the semantic factorization of Fréchet basis and Local Basis (Choi et al., 2022b). Intuitively, this experiment measures the average of local DCI scores. To be more specific, consider n samples {z_i}_{1≤i≤n} ⊂ Z of input Gaussian noise (we set n = 100). Then, take m samples from the neighborhood of each z_i and map them to the target latent space W = f(Z), where f denotes the subnetwork from Z to W:

w_{i,j} = f(z_i + ϵ_{i,j}) where ϵ_{i,j} ∼ N(0, σ²I), and w_i = f(z_i),

for sufficiently small σ > 0 (we set m = 1,000 and σ = 0.5). Then, we measure the DCI score for each neighborhood of w_i, i.e., {w_{i,j}}_{1≤j≤m}, after representing these latent variables with each semantic basis as in Eq 13.
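Under the projection embedding, the extrinsic mean has a simple spectral form: average the embedded projection matrices P_{x_i} and span the mean subspace by the top-k eigenvectors of the average. The following NumPy sketch illustrates this; the helper name is ours, not from the paper's code.

```python
import numpy as np

def extrinsic_mean_subspace(bases):
    """Extrinsic mean on Gr(k, R^n) under the projection embedding
    Phi(x_i) = M_i M_i^T with the Frobenius norm: average the projection
    matrices, then take the top-k eigenvectors of the average."""
    n, k = bases[0].shape
    P_bar = sum(M @ M.T for M in bases) / len(bases)  # mean of embeddings
    _, eigvecs = np.linalg.eigh(P_bar)                # ascending eigenvalues
    return eigvecs[:, -k:]                            # top-k eigenvectors

rng = np.random.default_rng(1)
# 50 small perturbations of a common 2-dimensional subspace of R^5.
base = np.linalg.qr(rng.standard_normal((5, 2)))[0]
samples = [np.linalg.qr(base + 0.05 * rng.standard_normal((5, 2)))[0]
           for _ in range(50)]
M_mean = extrinsic_mean_subspace(samples)

# The mean subspace stays close to the common subspace:
# the cosines of the principal angles are near 1.
cosines = np.linalg.svd(base.T @ M_mean, compute_uv=False)
assert np.all(cosines > 0.99)
```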






Figure 1: Overview of Fréchet basis. The global semantic subspace S_s is defined as the Fréchet mean of the intrinsic tangent spaces T_{w_i}W in the Grassmannian manifold Gr(d_W, R^{d_W}). Fréchet basis B_s is discovered by selecting the optimal basis of S_s using the Fréchet mean in the Special Orthogonal Group.

(i) Project each local semantic basis onto the d_W-dimensional global semantic subspace S_s. (ii) Project these projected local semantic bases onto SO(d_W). (iii) Find the Fréchet mean O in SO(d_W) and embed O back into the ambient space R^{d_W}.
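The three steps above can be sketched in NumPy as follows. Note that in step (iii) the paper optimizes the true Fréchet mean in SO(d_W) with Pymanopt; as a loud simplification, this sketch substitutes the projected arithmetic mean (the projection of the Euclidean average onto SO(d_W)), which is only a fast surrogate. All function names are illustrative.

```python
import numpy as np

def project_to_so(A):
    # Closed-form projection onto SO(n) via SVD (Theorem 1 in the appendix).
    U, _, Vt = np.linalg.svd(A)
    d = np.ones(A.shape[0])
    d[-1] = np.sign(np.linalg.det(U @ Vt))
    return (U * d) @ Vt

def frechet_basis_sketch(local_bases, S):
    """Steps (i)-(iii), with the SO(d) Frechet mean replaced by a projected
    arithmetic mean (a surrogate; the paper optimizes it with Pymanopt).

    local_bases: list of (n, d) local semantic bases.
    S: (n, d) orthonormal basis of the global semantic subspace S_s."""
    coords = [S.T @ B for B in local_bases]             # (i) coords in S_s
    rotations = [project_to_so(C) for C in coords]      # (ii) onto SO(d)
    O = project_to_so(sum(rotations) / len(rotations))  # (iii) mean in SO(d)
    return S @ O                                        # embed back into R^n

rng = np.random.default_rng(2)
S = np.linalg.qr(rng.standard_normal((8, 3)))[0]
local_bases = [np.linalg.qr(S + 0.1 * rng.standard_normal((8, 3)))[0]
               for _ in range(20)]
B = frechet_basis_sketch(local_bases, S)

# The result is an orthonormal basis lying inside the subspace spanned by S.
assert np.allclose(B.T @ B, np.eye(3), atol=1e-8)
assert np.allclose(S @ (S.T @ B), B, atol=1e-8)
```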

Figure 2: Correlations of L_2-Distortion (↓) to FID gap (↓) and DCI (↑) when θ_pre = 0.005. L_1-Distortion shows correlations of 0.98 to FID gap in StyleGAN2-cat and of −0.91 to DCI in StyleGAN2-e. (See the appendix for the full correlation results on six models.)

3.2 FRÉCHET BASIS AS L_2-DISTORTION MINIMIZER

Figure 3: Comparison of Semantic Factorization between Fréchet basis and GANSpace. (v_i, l_1-l_2) denotes the layer-wise edit along the i-th GANSpace component at the l_1-l_2 layers. The image traversals are performed on StyleGAN2-FFHQ (Fig 3a, 3b) and StyleGAN2-LSUN cat (Fig 3c, 3d).

StyleGAN model. In this section, all Fréchet bases are discovered using 1,000 i.i.d. samples of Local Basis with θ_pre = 0.01. The max iteration is set to 200 when optimizing the Fréchet mean with Pymanopt. The evaluation is performed on two properties: Semantic Factorization and Robustness. Fréchet basis is compared with GANSpace (Härkönen et al., 2020) and SeFa (Shen & Zhou, 2021) because these two methods are also unsupervised global basis methods (Sec 2). Note that GANSpace and Fréchet basis can be applied to an arbitrary latent space, but SeFa is only applicable to W-space (Karras et al., 2019), i.e., the last layer of the mapping network in StyleGANs.

Semantic Factorization. In Fig 3 and 4, we evaluated the semantic factorization of Fréchet basis. Figure 3 shows how the image changes as we perturb the latent variable along each global basis. For a fair comparison, we took the annotated basis in GANSpace (Härkönen et al., 2020) and compared it with Fréchet basis on the W-space of three StyleGAN models. Because GANSpace performs layer-wise edits, we matched the set of layers in the synthesis network, where the perturbed latent variable is fed, as annotated. The corresponding Fréchet basis component is selected by cosine similarity. Each subfigure shows three images traversed with the same global basis, perturbation intensity, and set of perturbed layers. The original image is placed at the center. Hence, these subfigures also show the semantic consistency of the global basis. In StyleGAN2 trained on FFHQ, GANSpace shows image failure on the left side and semantic inconsistency in the third row (not representing "hairy" on the left) (Fig 3b).
In StyleGAN2 trained on LSUN-cat (Yu et al., 2015), GANSpace presents entangled semantic manipulation (Fig 3d). The latent traversal along GANSpace changes the light position as annotated, but also darkens the striped pattern of the cats. On the other hand, Fréchet basis achieves better semantic factorization without these problems (Fig 3a and 3c). (See appendix G for additional examples of other attributes and datasets. Also, since Fréchet basis is an average of Local Basis, we compare these two methods and GANSpace in appendix F.)

Figure 4 presents the DCI results. StyleGAN2-e denotes the config-e model of StyleGAN2 (Karras et al., 2020b), and all three models are trained on FFHQ (Karras et al., 2019). The intrinsic dimension

Figure 6: Geodesic Interpolation from Fréchet basis (i = 0) to GANSpace (i = 7).

presents the geodesic metric d_geo in the Grassmannian manifold from each interpolation subspace to GANSpace and Fréchet basis (Fig 6a), and the DCI score evaluated at each interpolation basis B_i (Fig 6b) on the W-space of StyleGAN2-FFHQ. First, Fig 6a shows that S_i performs an interpolation from Fréchet basis at i = 0 to GANSpace at i = 7. This interpolation is linear in the Grassmannian metric d_geo. Second, the DCI score at each interpolation basis B_i is presented in Fig 6b. Fréchet basis at i = 0 achieves the best DCI score among the interpolation bases. Note that the DCI score of the original GANSpace without refinement is 0.312 (Tab 1), which is lower than the score after refinement at i = 7, as in Sec 4.2.
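The geodesic metric d_geo used above can be computed from the principal angles between two subspaces, which are recovered from the singular values (angle cosines) of M_1^⊤ M_2. A minimal NumPy sketch, with a function name of our choosing:

```python
import numpy as np

def grassmann_geodesic_distance(M1, M2):
    """Geodesic metric d_geo on the Grassmannian: the l2-norm of the
    principal angles between span(M1) and span(M2).

    M1, M2: (n, k) matrices with orthonormal columns."""
    cosines = np.linalg.svd(M1.T @ M2, compute_uv=False)
    angles = np.arccos(np.clip(cosines, -1.0, 1.0))  # clip for roundoff
    return np.linalg.norm(angles)

# Identical subspaces are at distance 0; orthogonal lines are at pi/2.
e1 = np.array([[1.0], [0.0], [0.0]])
e2 = np.array([[0.0], [1.0], [0.0]])
assert np.isclose(grassmann_geodesic_distance(e1, e1), 0.0)
assert np.isclose(grassmann_geodesic_distance(e1, e2), np.pi / 2)
```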

• Preprocessing hyperparameter θ_pre for local dimension estimation (Choi et al., 2022a): θ_pre = 0.01.
• Global Semantic Subspace Optimization
  - Number of samples: n = 1000.
  - Max iteration in Fréchet mean optimization using Pymanopt (Townsend et al., 2016): 1,000.
  - Max time in Fréchet mean optimization using Pymanopt: 2,000.
• Global Semantic Basis Optimization
  - Number of samples: n = 1000.
  - Max iteration in Fréchet mean optimization using Pymanopt: 200.
  - Max time in Fréchet mean optimization using Pymanopt: 10,000.

Figure 9: Quantitative Comparison of Robustness between GANSpace and Fréchet basis on StyleGAN1-FFHQ (Karras et al., 2019) for various perturbation intensities I. The perturbation intensity I is measured by the L_2-norm on each latent space. The image fidelity under the latent traversal is evaluated by FID (↓). The black line indicates where the two FIDs are equal.

Figure 12: Ablation study on the means of the Grassmannian manifold on StyleGAN2-FFHQ. Extrinsic basis is a variant of Fréchet basis where the global semantic basis is discovered by the extrinsic mean of the Grassmannian manifold. In both scores, our Fréchet basis outperforms Extrinsic basis in 5 out of 6 intermediate layers in the mapping network.

Figure 13: Comparison of Latent Traversals on StyleGAN2-FFHQ. We used the annotated GANSpace on the semantics of "Bald". The corresponding Fréchet Basis and Local Basis are chosen by the cosine similarity. The traversal images along GANSpace are more deteriorated than the other two bases. The red box indicates where the image deterioration occurred.

Figure 14: Comparison of Latent Traversals on StyleGAN2-LSUN Church. We used the annotated GANSpace on the semantics of "Clouds". The corresponding Fréchet Basis and Local Basis are chosen by the cosine similarity. Some traversal examples along Local Basis do not show the annotated semantic variations compared to the other two bases. In the yellow box, no clouds appeared. In the red box, clouds appeared in all images.

Figure 15: Comparison of Latent Traversals on StyleGAN2-LSUN Horse. We used the annotated GANSpace on the semantics of "White horse". The traversal images of GANSpace and Local Basis in the yellow box are less affected than those of Fréchet Basis.

Figure 16: Comparison of Latent Traversals on StyleGAN2-LSUN Car. We used the annotated GANSpace on the semantics of "Side to Front". The traversal images of Local Basis are not affected by perturbations.

Figure 18: Comparison of Semantic Factorization on StyleGAN2-FFHQ between Fréchet basis and GANSpace. (v_i, l_1-l_2) denotes the layer-wise edit along the i-th GANSpace component at the l_1-l_2 layers.

Figure 19: Comparison of Semantic Factorization between Fréchet basis and GANSpace. (v_i, l_1-l_2) denotes the layer-wise edit along the i-th GANSpace component at the l_1-l_2 layers. The image traversals are performed on StyleGAN2 (LSUN Car, Horse, Cat, and Church).

Figure 20: Comparison of Semantic Factorization between Fréchet basis and GANSpace. (v_i, l_1-l_2) denotes the layer-wise edit along the i-th GANSpace component at the l_1-l_2 layers. The image traversals are performed on StyleGAN1-FFHQ.

Figure 21: Comparison of the Supervised method (InterfaceGAN) and Unsupervised methods (GANSpace, Fréchet Basis). The first row shows the results of the supervised method (InterfaceGAN (Shen et al., 2020)), the second row shows the GANSpace results, and the last row shows our traversal results along Fréchet Basis.

Basis Refinement by Fréchet mean.

(The latent variables are represented with Fréchet basis and with Local Basis at w_i, respectively.) The averages of these local DCI scores are compared between Fréchet basis and Local Basis. Below, we include the evaluated scores on the W-space of StyleGAN2, StyleGAN2-config-e, and StyleGAN1 trained on FFHQ. Note that the overall DCI scores are higher than those in Fig 4 because it is easier for a semantic basis to satisfy semantic consistency on a local region than on the entire latent space. As in the qualitative examples, Fréchet basis outperforms Local Basis in the quantitative assessment.

Quantitative Comparison of Semantic Factorization by Local DCI (↑). Local DCI score is evaluated at W-space of each StyleGAN model.

ACKNOWLEDGMENTS

This work was supported by a KIAS Individual Grant [AP087501] via the Center for AI and Natural Sciences at Korea Institute for Advanced Study, the NRF grant [2012R1A2C3010887], and the MSIT/IITP ([1711117093], [2021-0-00077], [No. 2021-0-01343, Artificial Intelligence Graduate School Program(SNU)]).

