SORTED EIGENVALUE COMPARISON d_Eig: A SIMPLE ALTERNATIVE TO d_FID

Abstract

For i = 1, 2, let S_i be the sample covariance of Z_i with n_i p-dimensional vectors. First, we theoretically justify an improved Fréchet Inception Distance (d_FID) algorithm that replaces np.trace(sqrtm(S_1 S_2)) with np.sqrt(eigvals(S_1 S_2)).sum(). Motivated by the appearance of unsorted eigenvalues in the improved d_FID, we then propose sorted eigenvalue comparison (d_Eig) as a simple alternative: d_Eig(S_1, S_2)^2 = Σ_{j=1}^p (√λ_j^1 − √λ_j^2)^2, where λ_j^i is the j-th largest eigenvalue of S_i. Second, we present two main takeaways for the improved d_FID and proposed d_Eig. (i) d_FID: The error bound for computing non-negative eigenvalues of the diagonalizable S_1 S_2 is reduced to O(ε)∥S_1∥∥S_1 S_2∥, along with a ∼25% reduction in run time. (ii) d_Eig: The error bound for computing non-negative eigenvalues of the sample covariance S_i is further tightened to O(ε)∥S_i∥, with a ∼90% reduction in run time. Taking a statistical viewpoint (random matrix theory) on S_i, we illustrate the asymptotic stability of its largest eigenvalues, i.e., rigidity estimates at scale O(n^{-1/2+α}). Last, we discuss limitations and future work for d_Eig.

1. INTRODUCTION

import numpy as np
from scipy.linalg import eigvals, eigvalsh

# The square of the improved d_FID
def dFID(mean1, cov1, mean2, cov2):
    # eigvals() returns complex values; the imaginary parts are rounding
    # noise, since cov1 @ cov2 has real non-negative eigenvalues
    eigval = eigvals(cov1 @ cov2).real
    # Round computational errors (if they exist) that lead to
    # negative eigenvalues close to 0
    eigval[eigval < 0] = 0
    dif = mean1 - mean2
    res = dif.dot(dif) + np.trace(cov1 + cov2)
    return res - 2 * np.sqrt(eigval).sum()

# The square of the proposed d_Eig
def dEig(scm1, scm2):
    # Sorted eigenvalues
    eigval1 = eigvalsh(scm1)
    eigval1[eigval1 < 0] = 0
    eigval2 = eigvalsh(scm2)
    eigval2[eigval2 < 0] = 0
    dif = np.sqrt(eigval1) - np.sqrt(eigval2)
    return dif.dot(dif)

In the image domain, it is of great interest to analyze the distribution shift between two collections of data entries (Wiles et al., 2021; Borji, 2019). On the one hand, this is driven by the increasing awareness of the violation of the assumption of 'identical distribution' between training and (real-world) test datasets (Wu et al., 2022b). As illustrated, for instance, in the leaderboard of WILDS (Koh et al., 2021; Sagawa et al., 2021), many algorithms suffer from performance degradation and fail to generalize to heterogeneous testing settings. On the other hand, the importance of assessing distribution shift has been recognized with the rise of generative adversarial nets (GANs) (Goodfellow et al., 2014; Heusel et al., 2017). The rapid development of GAN variants (Karras et al., 2019; 2020b) calls for reliable and accurate metrics to assess the discrepancy between generated and real images (Borji, 2019). To objectively assess GAN models, researchers have proposed a plethora of evaluation scores including Inception Score (Salimans et al., 2016), Kernel Inception Distance (d_KID) (Bińkowski et al., 2018), and Precision/Recall (Kynkäänniemi et al., 2019; Sajjadi et al., 2018) (see also (Borji, 2019; 2022) for in-depth reviews).
Among various scores, the Fréchet Inception Distance (d_FID) (Heusel et al., 2017) is arguably the most widely-used metric for benchmarking GAN performance (Parmar et al., 2022). This is mainly due to the favorable theoretical property of being a mathematical metric (Dowson & Landau, 1982) and the practical property of being well-correlated with perceived image quality (Sajjadi et al., 2018). Meanwhile, Chong & Forsyth (2020) argued that d_FID is a biased estimator, and Kynkäänniemi et al. (2022) observed its undesirable sensitivity towards fringe features or classes. Despite these weaknesses, d_FID currently remains the 'gold standard' for GAN evaluation and continuously attracts broad attention. In a recent study, Mathiasen & Hvilshøj (2020) proposed to compute eigenvalues rather than the square root of a matrix as in d_FID. We view this as a promising simplification and improvement; nonetheless, a precise theoretical analysis has not been performed, which is therefore the starting point of this paper. The study of random matrix theory (RMT), with an emphasis on understanding the properties of (random) eigenvalues (Paul & Aue, 2014), has brought novel insights to the domain of deep learning (Liao & Couillet, 2018; Pastur, 2022; Baskerville et al., 2022), among which Seddik et al. (2020) analyzed deep learning representations of GAN-generated images through the lens of the eigenvalues of their sample covariance matrix (SCM). Driven by the need to efficiently quantify the distribution shift between two collections of heterogeneous data entries, we propose to compare sorted eigenvalues (d_Eig) as a simple alternative to d_FID. Our contributions are summarized as follows. For i = 1, 2, let S_i be the sample covariance of Z_i = (z_1^i, ..., z_{n_i}^i) with n_i p-dimensional vectors.
• (d_FID) We articulate the fact that S_1 S_2 is diagonalizable and has non-negative eigenvalues.
This allows us to theoretically justify an improved algorithm for d_FID, i.e., replacing the unique principal square root of a matrix with the element-wise square root of its eigenvalues. The error bound for computing its eigenvalues is thus reduced to O(ε)∥S_1∥∥S_1 S_2∥, reducing the run time by ∼25%.
• (d_Eig) Since S_i is symmetric positive semidefinite, the error bound for computing its non-negative eigenvalues is further tightened to O(ε)∥S_i∥, along with a ∼90% reduction in run time. From the viewpoint of random matrix theory (RMT), we demonstrate the asymptotically stable behavior of the largest eigenvalues (spikes).

2. THE IMPROVED d_FID (Linear Algebra)

Notation: Lower case Roman or Greek letters (e.g., s, ε, γ, λ) denote scalars, bold lower case letters (e.g., v, z, µ) denote vectors, and bold upper case letters (e.g., Q, S, U, Z, Λ) denote matrices. T denotes matrix transpose, ∥.∥ is the L2 norm, and ≲ denotes 'asymptotically less than'.

2.1. PRINCIPAL SQUARE ROOT OF A MATRIX

Without loss of accuracy, we discuss d_FID through the lens of linear algebra. More specifically, the scalars, vectors and matrices discussed in this section are deterministic, while a statistical viewpoint on these objects will be introduced later in the proposed d_Eig section (Sec. 3). For i = 1, 2, let Z_i = (z_1^i, ..., z_{n_i}^i) be a collection of n_i p-dimensional vectors. For simplicity, we assume the sample mean (1/n_i) Σ_{k=1}^{n_i} z_k^i = 0 throughout Sec. 2. Accordingly, S_i = (1/n_i) Z_i Z_i^T denotes the sample covariance matrix (SCM) of Z_i. We start the discussion by revisiting the standard definition(s) of d_FID (Givens & Shortt, 1984); then we elaborate the properties of the principal square root, the key computational challenge of d_FID.

2.1.1. Trace((S_1^{1/2} S_2 S_1^{1/2})^{1/2})

Definition 1. Let S_i be the SCM of Z_i and, w.l.o.g., let S_1 be non-singular. Then we define

d_FID(S_1, S_2)^2 = Trace(S_1 + S_2 − 2(S_1^{1/2} S_2 S_1^{1/2})^{1/2}).  (1)

To compute the Trace() of d_FID, we first need to clarify the exponent 1/2 in Eq. 1. As mentioned in (Dowson & Landau, 1982), 1/2 denotes the positive (or principal (Higham, 2008)) square root of a matrix S, such that S^{1/2} S^{1/2} = S, where 'principal' specifies the square root S^{1/2} with non-negative eigenvalues. In general, the square root of a matrix may neither exist nor be unique (Higham, 2008). Consider now the special case where S is symmetric positive semidefinite (PSD); then we have

Theorem 2. (Principal square root) Let a symmetric PSD S be decomposed as S = QΛQ^T, where Q is an orthogonal matrix and Λ is a diagonal matrix with non-negative eigenvalues. Then S^{1/2} := QΛ^{1/2}Q^T is the unique principal square root of S.

Based on the definition of S^{1/2} and the fact that S_1 and S_2 are symmetric PSD, we make the following claim.

Corollary 3. S_1^{1/2} S_2 S_1^{1/2} is a symmetric PSD matrix and therefore has a unique principal square root.

Claim. Accordingly, Trace((S_1^{1/2} S_2 S_1^{1/2})^{1/2}) (Eq. 1) can be computed after the eigenvalue decomposition suggested in Thm. 2. As a computational routine, this formulation is nevertheless undesirable, because we need to call the eigenvalue decomposition function twice, which potentially increases computational time and error risk. Instead, we seek another equivalent formulation of Eq. 1 (Givens & Shortt, 1984).

2.1.2. Trace((S_1 S_2)^{1/2})

Lemma 4. Following the specifications of S_i in Eq. 1, we have

d_FID(S_1, S_2)^2 = Trace(S_1 + S_2 − 2(S_1 S_2)^{1/2}).  (2)

Because S_1 is non-singular, it is not difficult to see that the eigenvalues of S_1^{1/2} S_2 S_1^{1/2} and S_1 S_2 are identical, and the corresponding eigenvectors are identical up to an invertible linear transformation S_1^{1/2} (or S_1^{−1/2}).
Since S_1^{1/2} S_2 S_1^{1/2} is symmetric PSD, we have

Corollary 5. S_1 S_2 is a diagonalizable matrix with non-negative eigenvalues and therefore has a unique principal square root.

Remark 6. Note that at least one of S_1 and S_2 should be non-singular, or the null space of S_1 should be contained in that of S_2 (Dowson & Landau, 1982). If S_1 is singular, the above discussion remains the same after switching the roles of S_1 and the non-singular S_2.

Claim. Consequently, the eigenvalues of (S_1 S_2)^{1/2} are mathematically equivalent to the element-wise square root of the eigenvalues of S_1 S_2. Since S_1 S_2 and S_1^{1/2} S_2 S_1^{1/2} have identical eigenvalues and the trace of a diagonalizable matrix is the sum of its eigenvalues, we have Trace((S_1 S_2)^{1/2}) = Trace((S_1^{1/2} S_2 S_1^{1/2})^{1/2}), which proves Lem. 4. Importantly, this rigorously justifies the workaround algorithm for d_FID that computes the element-wise square root of eigenvalues, bypassing the expensive computation of the square root of a matrix.
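Lemma 4 can be checked numerically. The following is a minimal sketch (random data; the sizes and seed are illustrative, not from the paper) comparing the two trace formulations with scipy:

```python
import numpy as np
from scipy.linalg import sqrtm

# Illustrative sizes; S1 is non-singular almost surely since n1 >= p
rng = np.random.default_rng(0)
p, n1, n2 = 5, 100, 120
Z1 = rng.standard_normal((p, n1))
Z2 = rng.standard_normal((p, n2))
S1 = Z1 @ Z1.T / n1
S2 = Z2 @ Z2.T / n2

# Eq. 1 route: Trace((S1^{1/2} S2 S1^{1/2})^{1/2}), two sqrtm calls
S1_half = sqrtm(S1)
t_eq1 = np.trace(sqrtm(S1_half @ S2 @ S1_half)).real

# Eq. 2 route (Lemma 4): Trace((S1 S2)^{1/2}), one sqrtm call on the
# diagonalizable (but non-symmetric) product S1 S2
t_eq2 = np.trace(sqrtm(S1 @ S2)).real

assert np.isclose(t_eq1, t_eq2)
```

The two routes agree up to rounding error, while the Eq. 2 route avoids one of the two matrix square roots.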

2.2. ELEMENT-WISE SQUARE ROOT OF EIGENVALUES

Before substituting the square root component of d_FID, let us take a step back and re-examine its widely-used implementation scipy.linalg.sqrtm() 1. In a nutshell, the underlying computational routine is a blocked Schur algorithm (Björck & Hammarling, 1983; Deadman et al., 2012), which includes two phases: Schur decomposition (schur()) and solving a (triangular) Sylvester equation. For computing the Trace() of the p × p diagonalizable matrix (S_1 S_2)^{1/2}, we show that the latter phase is redundant.

1 https://github.com/GaParmar/clean-fid/blob/main/cleanfid/fid.py

2.2.1. NUMERICAL ERROR BOUND

Corollary 7. (Schur decomposition) Let the diagonalizable matrix S_1 S_2 be decomposed as QUQ^T, where Q is an orthogonal matrix and U is an upper triangular matrix. Then Trace((S_1 S_2)^{1/2}) = Σ_{j=1}^p √u_jj, where u_11, ..., u_pp are the diagonal entries of U.

This equation follows from the fact that the diagonal entries of U are exactly the (non-negative) eigenvalues of S_1 S_2. As an immediate consequence, it suffices to compute schur() for obtaining Trace((S_1 S_2)^{1/2}). By default, schur() simultaneously computes U and Q. In our case, we only want to compute the diagonal entries of U. This leads to the speedier eigvals(), which shares the same core routine as schur(). Eventually, we replace the standard pythonic implementation np.trace(sqrtm()) with np.sqrt(eigvals()).sum() (see Fig. 1 for more details and Mathiasen & Hvilshøj (2020) for reference). This series of algorithmic simplifications allows us to propose a (strictly) tighter error bound compared to the original case w.r.t. sqrtm().

Error bound of eigvals(). As discussed in (Anderson et al., 1999), for the computed eigenvalue γ̂_j and the eigenvalue γ_j of S_1 S_2, we have |γ̂_j − γ_j| ≲ O(ε) s_j^{−1} ∥S_1 S_2∥, where ε is the machine epsilon. The remaining task is to bound s_j^{−1}. If v_j is the right eigenvector for γ_j, then the left eigenvector is v_j^T S_1^{−1}. Because s_j := |v_j^T S_1^{−1} v_j| = ∥S_1^{−1/2} v_j∥^2, we have s_j^{−1} ≤ ∥S_1^{1/2}∥^2 = ∥S_1∥. For j = 1, ..., p, the (asymptotic) error bound for computing the eigenvalue γ_j can thus be formulated as

|γ̂_j − γ_j| ≲ O(ε)∥S_1∥∥S_1 S_2∥.  (3)

Moreover, if we want to compute the eigenvalues of the p × p SCM S_i, which is symmetric PSD, we can utilize eigvalsh() with lower run time and obtain a tighter error bound.

Error bound of eigvalsh(). For j = 1, ..., p, the error bound for computing the eigenvalue λ_j^i of S_i can be formulated as (Anderson et al., 1999) |λ̂_j^i − λ_j^i| ≤ O(ε)∥S_i∥.
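The redundancy of the Sylvester-solve phase can be illustrated directly. In this sketch (arbitrary small sizes, not the paper's experiments), the three routines sqrtm(), schur() and eigvals() yield the same trace:

```python
import numpy as np
from scipy.linalg import schur, eigvals, sqrtm

rng = np.random.default_rng(1)
p, n = 4, 200
Z1 = rng.standard_normal((p, n))
Z2 = rng.standard_normal((p, n))
S1, S2 = Z1 @ Z1.T / n, Z2 @ Z2.T / n
A = S1 @ S2  # diagonalizable with non-negative eigenvalues (Cor. 5)

# Route 1: blocked Schur algorithm (Schur decomposition + Sylvester solve)
t_sqrtm = np.trace(sqrtm(A)).real

# Route 2 (Cor. 7): the Schur form alone already exposes the eigenvalues
# on the diagonal of U, so the Sylvester-solve phase is redundant here
U, Q = schur(A)
t_schur = np.sqrt(np.clip(np.diag(U), 0, None)).sum()

# Route 3: eigvals() shares the same core routine as schur()
t_eigvals = np.sqrt(np.clip(eigvals(A).real, 0, None)).sum()

assert np.allclose([t_sqrtm, t_schur], t_eigvals)
```

Note that the real Schur form of A is genuinely upper triangular here because all eigenvalues of S_1 S_2 are real.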

2.2.2. EIGENVALUE COMPARISON

Now, we discuss an important variant of d_FID for the case where S_1 and S_2 commute, i.e., S_1 S_2 = S_2 S_1.

Corollary 8. (Unsorted eigenvalue comparison) Let S_1, S_2 be two SCMs that are simultaneously diagonalizable by an orthogonal matrix Q. Then

d_FID(S_1, S_2)^2 = Trace((S_1^{1/2} − S_2^{1/2})^2)  (4)
                  = Σ_{j=1}^p (√λ̃_j^1 − √λ̃_j^2)^2,  (5)

where λ̃_j^i is the j-th eigenvalue of S_i w.r.t. Q. In this special case, where S_1 and S_2 share the same eigenbasis, d_FID reduces to the Euclidean distance between unsorted eigenvalues. Motivated by this reduction, we propose to compare sorted eigenvalues as a simple alternative to d_FID.

Definition 9. (Sorted eigenvalue comparison) Let S_1, S_2 be two SCMs. Then we define

d_Eig(S_1, S_2)^2 = Σ_{j=1}^p (√λ_j^1 − √λ_j^2)^2,  (6)

where λ_j^i is the j-th largest eigenvalue of S_i. Accordingly, d_Eig is a pseudometric on the set of SCMs of order p. Note that S_1 and S_2 in Eq. 6 do not necessarily commute. Instead of eigvals(), which is used for computing the eigenvalues of the non-symmetric S_1 S_2 (Eq. 2), d_Eig can be obtained with the more numerically stable and faster eigvalsh(), which is customized to compute the λ_j^i of the symmetric S_i. As a pseudometric, d_Eig satisfies non-negativity, symmetry and the triangle inequality, while SCMs need not be indistinguishable with regard to d_Eig. Following convention, the d_Eig and d_FID scores reported below are always the squares of d_Eig and d_FID, respectively.

Mathematical equivalence between d_Eig and d_FID. As a proof of principle, we conduct toy studies with multivariate Gaussian data. Concretely, we construct the non-negative diagonal entries of a p-dimensional covariance matrix Σ with np.abs(np.random.randn(p)), while keeping the off-diagonal entries zero. By multiplying Σ^{1/2} with X_i = np.random.randn(p, n_i), we obtain n_i Gaussian data entries Y_i = Σ^{1/2} X_i drawn from N(0, Σ). Then we compare S_1 = (1/n_1) Y_1 Y_1^T to the ground-truth Σ (Fig. 2(a)) and to S_2 = (1/n_2) Y_2 Y_2^T (Fig. 2(b)).
Following the above theoretical discussion, we instantiate Eq. 2 of d_FID with sqrtm(), schur() and eigvals(). Because S_i is symmetric PSD, we implement Eq. 6 of d_Eig with svdvals() and eigvalsh(). Throughout our experiments, we notice that the results of the implementation variants are identical up to very small rounding errors. Therefore, we experimentally confirm the validity of the improved d_FID and the equivalence between svdvals() and eigvalsh(). Because identical (sample) covariances are simultaneously diagonalizable, we have d_Eig = d_FID in theory. Since S_1 ≈ S_2 ≈ Σ with a sufficient amount of data, we expect d_Eig ≈ d_FID ≈ 0 in practice.

Numerical difference between d_Eig and d_FID. When comparing S_1 to Σ, Fig. 2(a) shows that d_Eig and d_FID have a comparable trend of decreasing scores with a growing number of data entries (5k → 50k). This indicates that both d_Eig and d_FID are meaningful metrics and can converge to their theoretical limit. When comparing S_1 to S_2, Fig. 2(b) illustrates that d_Eig is more resistant to the data size difference. In contrast to d_FID, a smaller amount of data suffices to achieve a good estimate of d_Eig. Arguably, d_Eig represents a more reliable score than d_FID because 1) d_Eig demonstrates favorable convergence curves that are overall closer to 0, and 2) compared with the standard d_FID (Eq. 2), d_Eig (Eq. 6) is a more faithful routine for approximating Eq. 5, the simplified d_FID for our toy setting.

Run time. When comparing different variants for implementing d_Eig and d_FID, Fig. 2(c) shows an 18%-32% reduction in run time when replacing sqrtm() with eigvals(), and a further 85%-94% reduction when utilizing eigvalsh(). As a result, it is beneficial to apply the improved d_FID and the proposed d_Eig for computing distribution shifts, especially in high-dimensional cases such as p = 16384. From now on, d_FID and d_Eig are computed with eigvals() and eigvalsh(), respectively, by default.
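The toy study can be reproduced in a few lines. Below is a sketch (smaller p and n than the paper's setting, arbitrary seed) computing both squared scores on zero-mean data drawn from N(0, Σ) with diagonal Σ; with S_1 ≈ S_2 ≈ Σ, both should be close to their theoretical limit of 0:

```python
import numpy as np
from scipy.linalg import eigvals, eigvalsh

rng = np.random.default_rng(2)
p, n1, n2 = 50, 20000, 20000

# Diagonal ground-truth covariance, as in the toy study
sigma_diag = np.abs(rng.standard_normal(p))
Y1 = np.sqrt(sigma_diag)[:, None] * rng.standard_normal((p, n1))
Y2 = np.sqrt(sigma_diag)[:, None] * rng.standard_normal((p, n2))
S1, S2 = Y1 @ Y1.T / n1, Y2 @ Y2.T / n2

# Improved dFID^2 (Eq. 2) via eigvals(), zero-mean case
ev = np.clip(eigvals(S1 @ S2).real, 0, None)
d_fid2 = np.trace(S1 + S2) - 2 * np.sqrt(ev).sum()

# dEig^2 (Eq. 6) via eigvalsh(), which returns sorted eigenvalues
dif = (np.sqrt(np.clip(eigvalsh(S1), 0, None))
       - np.sqrt(np.clip(eigvalsh(S2), 0, None)))
d_eig2 = dif.dot(dif)

# Both squared scores shrink towards 0 as n grows
print(round(d_fid2, 4), round(d_eig2, 4))
```

Shrinking n1 while keeping n2 fixed reproduces the qualitative gap of Fig. 2(b), with d_Eig degrading more slowly.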
3. THE PROPOSED d_Eig (Probability)

Notation: Sans serif lower case letters (e.g., x, y, z, λ) denote random variables, sans serif bold lower case letters (e.g., x, y, z, µ) denote random vectors, and sans serif bold upper case letters (e.g., X, Y, Z, Q, S) denote random matrices. ≍ denotes asymptotic equivalence, N+ is the set of positive integers, E is expectation and P is a probability distribution.

Following the computational analysis of deterministic eigenvalues, we here investigate the statistical facets of d_Eig from the viewpoint of RMT. Given an SCM, researchers have a keen interest in understanding the asymptotic behavior of the largest eigenvalues (spikes) (Izenman, 2021). This is motivated by the observation that spikes reflect the directions of largest variance and preserve the most critical information (Perry et al., 2018). Since spikes also dominate the computation of d_Eig, it is important to analyze their asymptotic behavior in our study.

3.1. ASYMPTOTICALLY STABLE BEHAVIOR

In this section, eigenvalues, vectors and matrices are non-deterministic and indicate random eigenvalues (variables), random vectors and random matrices unless stated otherwise. First, we recall the canonical case where X = (x_1, ..., x_n) is a random matrix with IID entries. For j = 1, ..., p and k = 1, ..., n, let n ≍ p and let x_jk be the IID entries of X satisfying E|x_jk|^2 = 1 and E|x_jk|^m ≤ c_m for all fixed m ∈ N+. Similar to Sec. 2, we further assume E x_jk = 0. As shown in one of the pioneering studies (Marčenko & Pastur, 1967), the asymptotic eigenvalue density of (1/n) XX^T can be well characterized via the limiting Stieltjes transform. However, the assumptions of IID entries and a diagonal covariance structure are very stringent and do not reflect real-world data statistics. Taking GAN assessment as a concrete example, samples drawn from x_j are usually representations obtained from the penultimate layer (pool3) of an Inception V3 model (Szegedy et al., 2016). Therefore, the entries x_1k, ..., x_pk of x_k can be dependent and have a more general covariance structure Σ. To close this gap, we assume Σ satisfies the stability condition as in (Bao et al., 2015, Condition 1.1 (iii)) and propose to investigate Y = Σ^{1/2} X, a linear transformation of X.

Theorem 10. (Case of zero expectation: Y = Σ^{1/2} X) Fix r and let λ̃_1 ≥ ... ≥ λ̃_r be the r largest eigenvalues of Q = (1/n) YY^T. Then for any j = 1, ..., r we can find deterministic θ̃_j such that for any (small) α > 0 and (big) β > 0, we have

P(|λ̃_j − θ̃_j| ≥ n^{−2/3+α}) ≤ c_{α,β} n^{−β}  (7)

for some constant c_{α,β} independent of n, p.

Discussion. Thm. 10 is a direct result of the local density law (Knowles & Yin, 2017). Note that (1/n) E YY^T = Σ holds true, and we impose linear dependency among the entries x_1k, ..., x_pk of x_k to approximate real-world scenarios. In the meantime, x_1, ..., x_n remain identically distributed, which reflects the key fact that data entries such as image representations of a GAN model are drawn from the same probability distribution.

As learned representations commonly have non-zero expectations m ≠ 0, we further introduce a deterministic rank-1 matrix M = m e^T to model this scenario. Here, e = (1, ..., 1)^T and m = d v satisfying d ≍ p ≍ n and ∥v∥ = 1. For Z = M + Σ^{1/2} X, we have

Lemma 11. (Case of non-zero expectation: Z = M + Σ^{1/2} X) Fix r and let λ_1 ≥ ... ≥ λ_r be the r largest eigenvalues of S = (1/n) ZZ^T. Then for any j = 1, ..., r we can find deterministic θ_j such that for any (small) α > 0 and (big) β > 0, we have

P(|λ_j − θ_j| ≥ n^{−1/2+α}) ≤ c_{α,β} n^{−β}  (8)

for some constant c_{α,β} independent of n, p.

Discussion. As shown in (Bai, 1999, Lemma 2.2), the eigenvalue counting functions of Q and S differ by at most 1/p. Thus, λ̃_{j+1} ≤ λ_j ≤ λ̃_{j−1} for j = 2, 3, ..., r. The rigidity estimate of λ_j for j = 2, 3, ..., r can then be obtained by setting θ_j := θ̃_j and applying Thm. 10. As for the case of j = 1, we consider Eq. 8 for λ_1 (≥ λ̃_1) as a conjecture and leave the proof for future work. Together with the discussion in Sec. 2, this illustrates both the numerical and asymptotic stability of d_Eig.

Remark 12. Similar to (Louart & Couillet, 2018, Remark 0.1), the above lemma suggests an (abusive) definition of the SCM S = (1/n) ZZ^T without subtracting the mean expectation. Taking S as the input of d_Eig, we show that it is feasible for d_Eig to quantify the distribution shift in the follow-up GAN studies.
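The rigidity phenomenon can be illustrated with a small simulation. This is a sketch under simplifying assumptions (a single-spike diagonal Σ and arbitrary sizes, rather than the paper's general setting): the sample fluctuation of the largest eigenvalue of Q = (1/n) YY^T shrinks as n grows:

```python
import numpy as np

rng = np.random.default_rng(3)

def top_eig_std(n, p=50, reps=40):
    # Spiked diagonal covariance: one dominant direction of variance
    sigma_diag = np.ones(p)
    sigma_diag[0] = 10.0
    tops = []
    for _ in range(reps):
        Y = np.sqrt(sigma_diag)[:, None] * rng.standard_normal((p, n))
        tops.append(np.linalg.eigvalsh(Y @ Y.T / n)[-1])
    return float(np.std(tops))

std_small_n, std_large_n = top_eig_std(200), top_eig_std(3200)
# The largest eigenvalue concentrates around a deterministic location,
# with fluctuations decaying as n grows (cf. the n^{-1/2+alpha} scale)
print(std_small_n, std_large_n)
assert std_large_n < std_small_n
```

Since d_Eig is a function of these stable spikes, its score inherits the same concentration behavior.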

3.2. GAN STUDIES: d_Eig IS A SIMPLE ALTERNATIVE TO d_FID

Recently, Parmar et al. (2022) discovered surprising subtleties of image pre-processing steps for downstream GAN evaluation. To faithfully benchmark the GAN performance of state-of-the-art (sota) models, the authors published new APIs to reproduce the evaluation results. Hence, the implementation of our GAN experiments is built on top of these APIs. Next, we summarize four key aspects of GAN evaluation that we examine in this study.

Scores. Similar to Parmar et al. (2022), we take two widely-used scores, d_FID and d_KID, as baselines. Then, we investigate two variants of the proposed metric: d_Eig(S_1, S_2)^2 = Σ_{j=1}^p (√λ_j^1 − √λ_j^2)^2 (Eq. 6) and d′_Eig(S_1, S_2)^2 = Σ_{j=1}^p (√λ̃_j^1 − √λ̃_j^2)^2 + ∥m_1 − m_2∥^2. The λ_j^i of the former are eigenvalues of S_i, and the λ̃_j^i of the latter are eigenvalues of S_i − (1/n_i) m_i m_i^T, where m_i is the sample mean. Differing from the toy settings of the Gaussian distribution N(0, Σ), which lead to d_Eig ≈ d_FID ≈ 0 with sufficient data, we do not have such a theoretical limit or ground-truth score in GAN studies. As a workaround, we consider d_FID to be the 'gold standard' score for analyzing d_Eig. Without loss of accuracy, we take d_KID × 10^3 and d_Eig × 10 for clearer comparisons.

Models. To illustrate the strength of d_Eig for challenging cases, we investigate three sota GAN models and probe their nuances when visual evaluations are non-trivial: StyleGAN2 with the recommended Config (Style2) (Karras et al., 2020a), and StyleGAN3 with the translation equivariance Config (Style3t) and the translation and rotation equivariance Config (Style3r) (Karras et al., 2021).

Interpolations. Following the practice of Parmar et al. (2022), we also present results that are influenced by different image interpolations such as Clean (Clean), PyTorch_legacy (Py_legacy) and TensorFlow_legacy (TF_legacy).

Datasets.
Lastly, we run thorough comparisons on commonly-used datasets including FFHQ, AFHQ, and LSUN (Horse, Church and Cat categories) for GAN model training. For each dataset, we generate 100k fake images and repeat each experiment 4 times by randomly sampling a given number of image entries from the 100k fake images. In the following, we discuss the main results of our GAN studies.

As displayed in Fig. 4, we report the eigenvalue and eigenvector behaviors for the 10 largest spikes. The reported cutoffs were determined by the dominant percentage (> 80%) taken by these spikes compared to the complete spectrum. Notably, the 10 largest spikes present small fluctuations (std) over four random seeds, which serves as complementary evidence for the theoretical rigidity estimates discussed in Thm. 10 and Lem. 11. Except for the few cases marked with *, the largest cosine similarity is mostly obtained with the i-th largest eigenvector for both GAN and real images. If we decompose the distribution shift into a scale shift (eigenvalue shift) and a rotation shift (eigenvector shift), such results suggest that the dominant eigenvector shift is only determined by the cosine of the angle between the eigenvectors, and is not influenced by eigenvector permutation. Weighing the estimation challenges of eigenvectors, d_Eig makes a meaningful trade-off that only takes eigenvalue differences into account.

(Mathiasen & Hvilshøj, 2020). When n_1 ≪ p, the researchers suggested computing the eigenvalues of the n_1 × n_1 matrix Z_1^T Z_2 Z_2^T Z_1 to supervise model training. Accordingly, the differences between our study and Fast d_FID lie in the fact that we do not assume n_1 ≪ p and we compute S_1 S_2 = (1/(n_1 n_2)) Z_1 Z_1^T Z_2 Z_2^T instead. Since a more precise justification w.r.t.
eigenvalues was not present in Mathiasen & Hvilshøj (2020), the key contributions of our theoretical analysis of d_FID come from articulating the unique principal square root and the diagonalizable S_1 S_2 with non-negative eigenvalues, and from proposing a tight asymptotic error bound.

Seddik et al. (2020).

From an RMT viewpoint, Seddik et al. (2020) studied the SCM of GAN image representations and argued that such representations behave asymptotically as if they were drawn from a Gaussian mixture. Following this insight, we further show that comparing sorted eigenvalues of SCMs is useful and efficient for measuring high-dimensional distribution shift, which is a novel and distinct contribution of our theoretical study.

5.1. EIGENVECTOR

In contrast to the improved d_FID, which implicitly takes the eigenvectors of S_i into account via the matrix multiplication S_1 S_2, the proposed d_Eig only measures the eigenvalue difference. Admittedly, the exclusion of eigenvectors from d_Eig is mainly due to their discouraging properties, such as a looser numerical error bound (Anderson et al., 1999) and stricter conditions for distribution estimation (Knowles & Yin, 2013). Nevertheless, eigenvectors carry plausibly critical information and should be carefully examined in subsequent work. Similar to existing measurements, d_Eig remains a scalar-valued score for measuring high-dimensional distribution shifts. A more comprehensive quantification is still missing for applications in both the natural and medical image domains. Due to the inherent data heterogeneity and critical implications for real-world applications, facilitating in-depth analysis of the distribution shifts underlying high-dimensional images (or representations) is of key importance to support the development and application of high-quality data science approaches, e.g., in the medical domain (Yue et al., 2020; Cios & Moore, 2002). In such scenarios, where inaccurate analysis can have severe consequences, existing scalar-valued scores, including d_Eig, are not sufficient. To resolve this issue, a direct follow-up on d_Eig is to individually compare the eigenvalue difference along each dimension. Naturally, the scalar-valued d_Eig is thereby decomposed into a multi-dimensional vector-valued measurement that enables a more complete overview of data heterogeneity. In addition, d_Eig builds a bridge between classical principal component analysis (PCA) (Abdi & Williams, 2010) and latent semantic understanding (Shen & Zhou, 2021; Härkönen et al., 2020).
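The proposed vector-valued follow-up admits a direct sketch. The helper below is hypothetical (our naming, not from any released code): it returns one entry per eigenvalue pair, and summing the vector recovers the scalar d_Eig squared:

```python
import numpy as np

def dEig_vector(scm1, scm2):
    # Hypothetical helper: per-dimension decomposition of dEig^2.
    # eigvalsh() returns ascending eigenvalues; reverse for "j-th largest".
    ev1 = np.clip(np.linalg.eigvalsh(scm1)[::-1], 0, None)
    ev2 = np.clip(np.linalg.eigvalsh(scm2)[::-1], 0, None)
    return (np.sqrt(ev1) - np.sqrt(ev2)) ** 2

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 100))
B = rng.standard_normal((6, 100))
S1, S2 = A @ A.T / 100, B @ B.T / 100

vec = dEig_vector(S1, S2)   # one non-negative entry per dimension
d_eig2 = vec.sum()          # summing recovers the scalar score
assert np.all(vec >= 0)
```

Inspecting `vec` entry-by-entry then reveals which principal directions carry the bulk of the shift.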

5.2. FUTURE STUDIES

Taking cancer studies as an example (Fremond et al., 2022; Wu et al., 2022a), fine-grained multi-dimensional analysis with d_Eig could pave a promising way towards precise risk stratification by validating well-established features, and proposing novel features, with prognostic importance in complex medical images. This can be concretely supported with biologically interpretable visualization examples generated by perturbing the largest eigenvalue(s)/eigenvector(s) in a given dataset of interest. This approach could thus be used to control for inherent variance in existing data repositories and to generate prototypical examples of disease states, such as highly aggressive tumors, in radiological or pathological time series. By combining comprehensive quantification and biologically meaningful visualization, d_Eig thus adds a valuable tool for future work in the natural and medical image domains.
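One way to realize such eigen-perturbation visualizations, sketched here under simplifying assumptions (raw feature vectors, a single leading eigenpair; the function name is ours, not from any released code):

```python
import numpy as np

def perturb_along_top_eigvec(X, scale=2.0):
    # X: rows are feature vectors. Move the dataset mean along the
    # leading eigenvector of the covariance, scaled by sqrt(eigenvalue),
    # to generate a "prototypical" perturbed example.
    mean = X.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
    lam, v = evals[-1], evecs[:, -1]  # leading eigenpair
    return mean + scale * np.sqrt(lam) * v

rng = np.random.default_rng(5)
# Synthetic data with one dominant direction of variance
X = rng.standard_normal((500, 8)) * np.array([5.0, 1, 1, 1, 1, 1, 1, 1])
proto = perturb_along_top_eigvec(X)
# The perturbation concentrates in the highest-variance coordinate
assert int(np.argmax(np.abs(proto - X.mean(axis=0)))) == 0
```

For images, the same recipe would be applied in a representation space and decoded back to pixels, which is beyond this sketch.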






Figure 1: Python code for the squares of the improved d_FID and the proposed d_Eig.

a. (Ground-truth comparison) The d_FID and d_Eig scores computed between the sample covariance of 50k, 40k, …, 5k data vs. the ground-truth covariance. b. (Real-world comparison) The d_FID and d_Eig scores computed between the sample covariance of 50k, 40k, …, 5k data vs. the sample covariance of 50k data.

Figure 2: The toy studies of the multivariate Gaussian distribution N(0, Σ). Here, all experiments are computed with an Intel(R) i9-9940X CPU @ 3.30GHz and repeated with four random seeds. Since the coefficient of variation std/mean < 0.01 for both d_Eig and d_FID, we only report the mean score in Plots a and b. Besides, sqrtm(), schur() and eigvals() achieve identical numerical results for computing d_FID up to negligible rounding error, as do svdvals() and eigvalsh() for computing d_Eig. Therefore, only two curves are presented in Plots a and b.

Figure 3: The main results of the GAN studies. Here, 70k, 15803 and 50k real images for the FFHQ, AFHQ and LSUN datasets, respectively, are used to compute the reported scores.

As displayed in Fig. 3 (a, b), d_Eig and d_FID show similar evaluation curves and correlate well with each other across different combinations of models and interpolations. When observing the convergence curve with an increasing amount of GAN-generated images in Fig. 3 (d, f), we observe behavior identical to the toy studies. That is, d_Eig is more favorable than d_FID in the sense that a small amount of image entries suffices to obtain a good estimate of d_Eig. Similar claims can be made for the LSUN dataset. As shown in Fig. 3(c), d_Eig exhibits comparably increasing scores from the Horse to the Cat category, indicating less satisfying GAN generation results for the Cat images. Meanwhile, the convergence speed remains faster for d_Eig compared to d_FID (Fig. 3(f)). With regard to d′_Eig, this variant of sorted eigenvalue comparison shows less consistency with the gold standard d_FID (see Fig. 3 (a, b)) and is less desirable in our GAN studies. Based on the investigations of the four key aspects and the theoretical advantages of d_Eig discussed above, the proposed d_Eig represents a simple alternative to d_FID. By applying d_Eig in GAN model evaluation, we take a critical step towards a more comprehensive analysis of the high-dimensional distribution shift between two collections of image entries.

Figure 4: The eigenvalue fluctuation and eigenvector similarity for the 10 largest spikes of d_Eig. Here, all experiments are obtained with Style2 under the Clean configuration. The eigenvalue fluctuation (standard deviation) is obtained by repeating experiments with 4 random seeds. Also, we report the largest cosine similarity between the i-th largest eigenvector of GAN images and its counterpart for real images. The * indicates that the largest cosine similarity is not obtained between the i-th largest eigenvectors of GAN and real images.

d_Eig MAY BE MORE COMPREHENSIVE AND INFORMATIVE THAN d_FID.



Figure 5: The nuance between d_FID and d_Eig. Here, all experiments are conducted with the FFHQ dataset.

Lastly, we report a nuanced case when comparing d_FID and d_Eig. Fig. 5 shows that the face generation performance w.r.t. d_Eig tends to improve from Style2 to Style3r and Style3t, which is not compatible with d_FID. By imposing translation and rotation equivariance in StyleGAN3, Karras et al. (2021) reported anti-aliasing improvements over StyleGAN2 by resolving the 'texture sticking' issue. Such clear visual improvements are supported by the decreasing d_Eig scores. However, due to the lack of ground truth, whether this correlation between visual improvements and d_Eig supports the effectiveness of d_Eig remains inconclusive.

Fast d_FID. The role of eigenvalues in d_FID was first noticed in the study of Fast Fréchet Inception Distance

