CAUSAL TESTING OF REPRESENTATION SIMILARITY METRICS

Anonymous

Abstract

Representation similarity metrics are widely used to compare learned representations in neural networks, and an extensive literature investigates which metrics most accurately capture the information encoded in a network. However, capturing all of the information available in a network may have little to do with the information the network actually uses. One solution is to turn to causal measures of function: by ablating groups of units thought to carry information and observing whether those ablations affect network performance, we can focus on an outcome that causally links representations to function. In this paper, we systematically test representation similarity metrics for their sensitivity to causal functional changes induced by ablation. We use changes in network performance after ablation as a causal measure of the influence of a representation on function. These measures allow us to test how well similarity metrics capture changes in network performance versus changes in linear decodability: network performance indexes the information used by the network, while linear decoding indexes the information available in the representation. We show that all of the tested metrics are more sensitive to decodable features than to network performance. Among these metrics, Procrustes and CKA outperform regularized CCA-based methods on average; for AlexNet, however, Procrustes and CKA no longer outperform the CCA-based methods when sensitivity is measured against network performance. We thus provide causal tests of the utility of different representational similarity metrics. Our results suggest that interpretability methods will be more effective if they build on representational similarity metrics that have been evaluated with causal tests.
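The ablation procedure described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the paper's implementation: the function names and the `readout` callable (a stand-in for the rest of the network downstream of the ablated layer) are hypothetical.

```python
import numpy as np

def ablate_units(activations, unit_idx):
    """Zero out the activations of the given units, simulating ablation
    of that group of units in the network."""
    ablated = activations.copy()
    ablated[:, unit_idx] = 0.0
    return ablated

def performance_drop(activations, labels, readout, unit_idx):
    """Change in accuracy after ablating `unit_idx`: a causal measure of
    how much the network's function relies on those units.
    `readout` maps activations to predicted labels (hypothetical stand-in
    for the downstream layers of the network)."""
    base = np.mean(readout(activations) == labels)
    lesioned = np.mean(readout(ablate_units(activations, unit_idx)) == labels)
    return base - lesioned
```

Units whose ablation produces a large performance drop carry information the network actually uses; units whose ablation leaves performance unchanged may still carry linearly decodable information that the network ignores.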

1. INTRODUCTION

Neural networks already play a critical role in systems where understanding and interpretation are paramount, such as self-driving cars and the criminal justice system. To understand and interpret neural networks, representation similarity metrics have been used to compare learned representations within and across networks (Kornblith et al. (2019); Raghu et al. (2017); Morcos et al. (2018b); Wang et al. (2018); Li et al. (2015); Feng et al. (2020); Nguyen et al. (2020)). Using these similarity metrics, researchers evaluate whether networks trained from different random initializations learn the same information, whether different layers learn redundant or complementary information, and how different training data affect learning (Kornblith et al. (2019); Li et al. (2015); Wang et al. (2018)). Beyond helping to answer these fundamental questions, similarity metrics have the potential to provide a general-purpose measure over representations (Boix-Adsera et al. (2022)).

What it means for two representations to be similar, however, is not straightforward. Many similarity metrics have been proposed, with different underlying assumptions and strategies for comparing representation spaces. For example, some similarity metrics are invariant under linear transformations while others are not (see Kornblith et al. (2019) for a theoretical comparison). These different assumptions and strategies can lead to quantitatively different conclusions. For instance, Ding et al. (2021) show that certain metrics are insensitive to changes in the decodable information present in representations. In another study, Davari et al. (2022) demonstrate that the centered kernel alignment (CKA) metric assigns high similarity between random and fully trained representations. It is therefore
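To make the invariance properties discussed above concrete, here is a minimal NumPy sketch of linear CKA in the form popularized by Kornblith et al. (2019); the function names are ours, and this is an illustration rather than the implementation used in this paper.

```python
import numpy as np

def _center_gram(K):
    """Double-center a Gram matrix: HKH with H = I - (1/n) * ones."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def linear_cka(X, Y):
    """Linear CKA between two representation matrices of shape
    (n_examples, n_features). Returns a value in [0, 1]; invariant to
    orthogonal transformations and isotropic scaling of either input."""
    Kc = _center_gram(X @ X.T)
    Lc = _center_gram(Y @ Y.T)
    hsic = np.sum(Kc * Lc)  # Frobenius inner product of centered Grams
    return hsic / (np.linalg.norm(Kc) * np.linalg.norm(Lc))
```

Because linear CKA is invariant to orthogonal transformations and isotropic scaling but not to arbitrary invertible linear maps, it sits between stricter metrics such as Procrustes distance and fully linear-invariant metrics such as CCA-based measures.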

