RELIABILITY OF CKA AS A SIMILARITY MEASURE IN DEEP LEARNING

Abstract

Comparing learned representations across neural networks is a challenging but important problem, which has been approached in different ways. The Centered Kernel Alignment (CKA) similarity metric, particularly its linear variant, has recently become a popular approach and has been widely used to compare representations of a network's different layers, of architecturally similar networks trained differently, or of models with different architectures trained on the same data. A wide variety of conclusions about the similarity and dissimilarity of these various representations have been drawn using CKA. In this work we present an analysis that formally characterizes CKA's sensitivity to a large class of simple transformations, which can naturally occur in the context of modern machine learning. This provides a concrete explanation of CKA's sensitivity to outliers, which has been observed in past works, and to transformations that preserve the linear separability of the data, an important generalization attribute. We empirically investigate several sensitivities of the CKA similarity metric, demonstrating situations in which it gives unexpected or counter-intuitive results. Finally, we study approaches for modifying representations to maintain functional behaviour while changing the CKA value. Our results illustrate that, in many cases, the CKA value can be easily manipulated without substantial changes to the functional behaviour of the models, and call for caution when leveraging activation alignment metrics.

1. INTRODUCTION

In the last decade, increasingly complex deep learning models have come to dominate machine learning and have helped us solve, with remarkable accuracy, a multitude of tasks across a wide array of domains. Due to the size and flexibility of these models, it has been challenging to study and understand exactly how they solve the tasks we use them on. A helpful framework for thinking about these models is that of representation learning, in which we view artificial neural networks (ANNs) as learning increasingly complex internal representations as we go deeper through their layers. In practice, it is often of interest to analyze and compare the representations of multiple ANNs, but the typically high dimensionality of ANN internal representation spaces makes this a fundamentally difficult task. To address this problem, the machine learning community has sought meaningful ways to compare ANN internal representations, and various representation (dis)similarity measures have been proposed (Li et al., 2015; Wang et al., 2018; Raghu et al., 2017; Morcos et al., 2018).

Recently, Centered Kernel Alignment (CKA) (Kornblith et al., 2019) was proposed and shown to reliably identify correspondences between representations in architecturally similar networks trained on the same dataset but from different initializations, unlike past methods based on linear regression or CCA (Raghu et al., 2017; Morcos et al., 2018). While CKA can capture different notions of similarity between points in representation space by using different kernel functions, the original work empirically showed no real benefit to using CKA with a nonlinear kernel over its linear counterpart. As a result, linear CKA has been the preferred representation similarity measure of the machine learning community in recent years, and other similarity measures (including nonlinear CKA) are seldom used. CKA has been utilized in a number of works to draw conclusions about the similarity between different models and their behaviours, such as wide versus deep ANNs (Nguyen et al., 2021) and transformer-based versus CNN-based ANNs (Raghu et al., 2021). It has also been used to draw conclusions about transfer learning (Neyshabur et al., 2020) and catastrophic forgetting (Ramasesh et al., 2021). Due to this widespread use, it is important to
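To ground the discussion, recall the definition of linear CKA from Kornblith et al. (2019): given activation matrices X (n x p1) and Y (n x p2) whose rows correspond to the same n examples and whose columns have been centred, linear CKA is ||Y'X||^2_F / (||X'X||_F ||Y'Y||_F). The following is a minimal NumPy sketch of this computation; the function name linear_cka and the toy data are ours, for illustration only. It also checks two well-known invariances of linear CKA: invariance to orthogonal transformations and to isotropic scaling.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two representation matrices.

    X: (n, p1) activations, one row per example.
    Y: (n, p2) activations for the same n examples.
    """
    # Centre each feature (column); CKA is defined on centred representations.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)

    # Cross- and self-similarity terms from Kornblith et al. (2019).
    cross = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    self_x = np.linalg.norm(X.T @ X, ord="fro")
    self_y = np.linalg.norm(Y.T @ Y, ord="fro")
    return cross / (self_x * self_y)

# Toy check with random activations (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 64))
Q, _ = np.linalg.qr(rng.normal(size=(64, 64)))  # random orthogonal matrix
print(linear_cka(X, X))            # ~1.0: identical representations
print(linear_cka(X, 2.0 * X @ Q))  # ~1.0: orthogonal transform + scaling
```

These invariances are exactly what made CKA attractive for comparing networks trained from different initializations; the sensitivities analyzed in this work concern transformations outside this invariance class.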

