RELIABILITY OF CKA AS A SIMILARITY MEASURE IN DEEP LEARNING

Abstract

Comparing learned neural representations in neural networks is a challenging but important problem, which has been approached in different ways. The Centered Kernel Alignment (CKA) similarity metric, particularly its linear variant, has recently become a popular approach and has been widely used to compare representations of a network's different layers, of architecturally similar networks trained differently, or of models with different architectures trained on the same data. A wide variety of conclusions about the similarity and dissimilarity of these representations have been drawn using CKA. In this work we present analysis that formally characterizes CKA sensitivity to a large class of simple transformations, which can naturally occur in the context of modern machine learning. This provides a concrete explanation of CKA sensitivity to outliers, which has been observed in past works, and to transformations that preserve the linear separability of the data, an important generalization attribute. We empirically investigate several sensitivities of the CKA similarity metric, demonstrating situations in which it gives unexpected or counter-intuitive results. Finally, we study approaches for modifying representations to maintain functional behaviour while changing the CKA value. Our results illustrate that, in many cases, the CKA value can be easily manipulated without substantial changes to the functional behaviour of the models, and call for caution when leveraging activation alignment metrics.

1. INTRODUCTION

In the last decade, increasingly complex deep learning models have dominated machine learning and have helped us solve, with remarkable accuracy, a multitude of tasks across a wide array of domains. Due to the size and flexibility of these models, it has been challenging to study and understand exactly how they solve the tasks we use them on. A helpful framework for thinking about these models is that of representation learning, where we view artificial neural networks (ANNs) as learning increasingly complex internal representations as we go deeper through their layers. In practice, it is often of interest to analyze and compare the representations of multiple ANNs. However, the typical high dimensionality of ANN internal representation spaces makes this a fundamentally difficult task. To address this problem, the machine learning community has tried finding meaningful ways to compare ANN internal representations, and various representation (dis)similarity measures have been proposed (Li et al., 2015; Wang et al., 2018; Raghu et al., 2017; Morcos et al., 2018). Recently, Centered Kernel Alignment (CKA) (Kornblith et al., 2019) was proposed and shown to reliably identify correspondences between representations in architecturally similar networks trained on the same dataset but from different initializations, unlike past methods such as linear regression or CCA-based methods (Raghu et al., 2017; Morcos et al., 2018). While CKA can capture different notions of similarity between points in representation space by using different kernel functions, the original work empirically showed no real benefit to using CKA with a nonlinear kernel over its linear counterpart. As a result, linear CKA has been the preferred representation similarity measure of the machine learning community in recent years, and other similarity measures (including nonlinear CKA) are seldom used.
CKA has been utilized in a number of works to draw conclusions regarding the similarity between different models and their behaviours, such as wide versus deep ANNs (Nguyen et al., 2021) and transformer-based versus CNN-based ANNs (Raghu et al., 2021). It has also been used to draw conclusions about transfer learning (Neyshabur et al., 2020) and catastrophic forgetting (Ramasesh et al., 2021). Due to this widespread use, it is important to understand how reliable the CKA similarity measure is and in what cases it fails to provide meaningful results. In this paper, we study CKA sensitivity to a class of simple transformations and show how CKA similarity values can be directly manipulated without noticeable changes in the models' final output behaviour. In particular, our contributions are as follows: In Sec. 3 and with Thm. 1 we characterize CKA sensitivity to a large class of simple transformations, which can naturally occur in ANNs. With Cor. 3 and 4 we extend our theoretical results to cover CKA sensitivity to outliers, which has been empirically observed in previous work (Nguyen et al., 2021; Ding et al., 2021; Nguyen et al., 2022), and to transformations preserving linear separability of data, an important characteristic for generalization. Concretely, our theoretical contributions show how the CKA value between two copies of the same set of representations can be significantly decreased through simple, functionality-preserving transformations of one of the two copies. In Sec. 4 we empirically analyze CKA's reliability, illustrating our theoretical results and subsequently presenting a general optimization procedure that allows the CKA value to be heavily manipulated to be either high or low without significant changes to the functional behaviour of the underlying ANNs. We use this to revisit previous findings (Nguyen et al., 2021; Kornblith et al., 2019).

2. BACKGROUND ON CKA AND RELATED WORK

Comparing representations Let X ∈ R^{n×d_1} denote a set of ANN internal representations, i.e., the neural activations of a specific layer with d_1 neurons in a network, in response to n ∈ N input examples. Let Y ∈ R^{n×d_2} be another set of such representations generated by the same input examples, but possibly at a different layer of the same, or a different, deep learning model. It is standard practice to center these representations column-wise (feature- or "neuron"-wise) before analyzing them. We are interested in representation similarity measures, which aim to capture a certain notion of similarity between X and Y.

Quantifying similarity Li et al. (2015) have considered one-to-one, many-to-one and many-to-many mappings between neurons from different neural networks, found through activation correlation maximization. Wang et al. (2018) extended that work by providing a rigorous theory of neuron activation subspace match and algorithms to compute such matches between neurons. Alternatively, Raghu et al. (2017) introduced SVCCA, where singular value decomposition is used to identify the most important directions in activation space. Canonical correlation analysis (CCA) is then applied to find maximally correlated singular vectors from the two sets of representations, and the mean of the correlation coefficients is used as a similarity measure. In order to give less importance to directions corresponding to noise, Morcos et al. (2018) introduced projection-weighted CCA (PWCCA). The PWCCA similarity measure corresponds to the weighted sum of the correlation coefficients, assigning more importance to directions in representation space that contribute more to the output of the layer.
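As a small illustration of the setup above, the standard column-wise (per-neuron) centering of representations can be sketched as follows (a minimal NumPy sketch; the array names and shapes are our own example, not taken from any of the cited works):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d1, d2 = 100, 64, 32              # n input examples; layer widths d1, d2
X = rng.normal(size=(n, d1))         # stand-in activations of one layer
Y = rng.normal(size=(n, d2))         # stand-in activations of another layer/model

# Standard practice: center each column (feature/"neuron") to mean zero
# before applying any representation similarity measure.
Xc = X - X.mean(axis=0, keepdims=True)
Yc = Y - Y.mean(axis=0, keepdims=True)
```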
Many other representation similarity measures have been proposed, based on linear classifier probes (Alain & Bengio, 2016; Davari et al., 2022), the topology of fixed points of internal dynamics in recurrent neural networks (Sussillo & Barak, 2013; Maheswaranathan et al., 2019), solving the orthogonal Procrustes problem between sets of representations (Ding et al., 2021; Williams et al., 2021), and many more (Laakso & Cottrell, 2000; Lenc & Vedaldi, 2018; Arora et al., 2017). We also note that a large body of neuroscience research has focused on comparing neural activation patterns in biological neural networks (Edelman, 1998; Kriegeskorte et al., 2008; Williams et al., 2021; Low et al., 2021).

CKA Centered Kernel Alignment (CKA) (Kornblith et al., 2019) is another such similarity measure, based on the Hilbert-Schmidt Independence Criterion (HSIC) (Gretton et al., 2005), which was presented as a means to evaluate independence between random variables in a non-parametric way. For K_{i,j} = k(x_i, x_j) and L_{i,j} = l(y_i, y_j), where k and l are kernels, and for the centering matrix H = I − (1/n)11^⊤, HSIC can be written as:

HSIC(K, L) = tr(KHLH) / (n − 1)^2.

CKA can then be computed as:

CKA(K, L) = HSIC(K, L) / √(HSIC(K, K) HSIC(L, L)).

In the linear case, k and l are both the inner product, so K = XX^⊤, L = YY^⊤, and we use the notation CKA(X, Y) = CKA(XX^⊤, YY^⊤). Intuitively, HSIC computes the similarity structures of X and Y, as measured by the kernel matrices K and L, and then compares these similarity structures (after centering) by computing their alignment through the trace of KHLH.

Recent CKA results CKA has been used in recent years to make many claims about neural network representations. Nguyen et al. (2021) used CKA to establish that parameter initialization drastically impacts feature similarity and that the last layers of overparameterized (very wide or deep) models


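To make the HSIC and CKA formulas above concrete, linear CKA can be sketched in a few lines of NumPy (a minimal sketch for illustration, not the reference implementation of Kornblith et al. (2019); unbiased and minibatch HSIC estimators also exist and are not shown here):

```python
import numpy as np

def hsic(K, L):
    """Empirical HSIC: tr(KHLH) / (n - 1)^2, with H the centering matrix."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n   # H = I - (1/n) 1 1^T
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

def linear_cka(X, Y):
    """Linear CKA between representations X (n x d1) and Y (n x d2)."""
    K, L = X @ X.T, Y @ Y.T               # linear kernels: K = XX^T, L = YY^T
    return hsic(K, L) / np.sqrt(hsic(K, K) * hsic(L, L))
```

For instance, since the linear kernel matrix XX^⊤ is unchanged by right-multiplying X with an orthogonal matrix, linear CKA is invariant to orthogonal transformations and isotropic scaling of the representations, so linear_cka(X, c·XQ) = 1 for any scalar c > 0 and orthogonal Q.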