CONTRASIM: A SIMILARITY MEASURE BASED ON CONTRASTIVE LEARNING

Abstract

Recent work has compared neural network representations via similarity-based analyses, shedding light on how different aspects (architecture, training data, etc.) affect models' internal representations. The quality of a similarity measure is typically evaluated by its success in assigning a high score to representations that are expected to be matched. However, existing similarity measures perform mediocrely even on standard benchmarks. In this work, we develop a new similarity measure, dubbed ContraSim, based on contrastive learning. In contrast to common closed-form similarity measures, ContraSim learns a parameterized measure using both similar and dissimilar examples. We perform an extensive experimental evaluation of our method, with both language and vision models, on the standard layer prediction benchmark and on two new benchmarks that we develop: the multilingual benchmark and the image-caption benchmark. In all cases, ContraSim achieves much higher accuracy than previous similarity measures, even when presented with challenging examples.

1. INTRODUCTION

Representation learning is a key capability of deep neural networks. But how can we assess the similarity of representations learned by two models? A recent line of work is concerned with developing similarity measures and using them to analyze the models' internal representations. Similarity-based analyses may shed light on how different datasets, architectures, etc., change the model's learned representations. For example, a similarity analysis showed that lower layers in different models are more similar to each other, while fine-tuning affects mostly the top layers (Wu et al., 2020). Various similarity measures have been proposed for comparing representations; the most popular ones are based on centered kernel alignment (CKA) (Kornblith et al., 2019) and canonical correlation analysis (CCA) (Hotelling, 1936; Morcos et al., 2018). They all share a similar methodology: given a pair of feature representations of the same input, they estimate the similarity between them, without considering other examples. However, they all perform mediocrely on standard benchmarks. Motivated by that, we propose a new learnable similarity measure. In this paper, we introduce ContraSim, a new similarity measure based on contrastive learning (CL) (Chen et al., 2020; He et al., 2020). In contrast to prior work, which defines closed-form general-purpose similarity measures, ContraSim is a task-specific learnable similarity measure that uses examples that have a high similarity (the positive set) and examples that have a low similarity (the negative set) to train an encoder that maps representations to the space where similarity is measured. In the projected space, we maximize the representation similarity with examples from the positive set, and minimize it with examples from the negative set.
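To make the training signal concrete, the following is a minimal numpy sketch of an InfoNCE-style contrastive loss computed over already-projected representations. The encoder itself, the temperature value, and all names here are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss (illustrative sketch).

    anchor, positive: (d,) encoder outputs for a matched pair.
    negatives: (k, d) encoder outputs for dissimilar examples.
    Minimizing this loss pushes the anchor's cosine similarity with the
    positive up, and its similarity with each negative down.
    """
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    pos = np.exp(cos(anchor, positive) / temperature)
    neg = sum(np.exp(cos(anchor, n) / temperature) for n in negatives)
    return -np.log(pos / (pos + neg))
```

In a full training loop, this loss would be backpropagated through the encoder so that matched representations cluster together in the projected space while mismatched ones are pushed apart.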
We experimentally evaluate ContraSim on one standard similarity measure benchmark and on two new benchmarks we introduce in this paper, and demonstrate its superiority compared to common similarity measures. First, we use the well-known layer prediction benchmark (Kornblith et al., 2019), which assesses whether high similarity is assigned to two architecturally corresponding layers in two models differing only in their weight initialization. Second, in our proposed multilingual benchmark, we assume a multilingual model and a parallel dataset of translations in two languages. A good similarity measure should assign a higher similarity to the (multilingual) representations of a sentence in language A and its translation in language B, compared to the similarity of the same sentence in language A and a random sentence in language B. Third, we design the image-caption benchmark, based on a similar idea. Given an image and its text caption, and correspondingly a vision model and a language model, a good similarity measure should assign a high similarity to representations of the image and its caption, compared to the similarity of the same image and a random caption. In both of our new benchmarks, we investigate a more challenging scenario, where instead of choosing a random sentence, we retrieve highly similar sentences as confusing examples, using the Facebook AI Similarity Search (FAISS) library (Johnson et al., 2019). While other similarity measures are highly affected by this change, our method maintains high accuracy with very small degradation. We attribute this to the highly separable representations that our method learns. Finally, in all benchmarks, we show that if we change the training procedure of the encoder to only maximize the similarity of similar examples, the projected representations have poor separation, indicating that the CL procedure is a crucial part of the method's success.
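The multilingual and image-caption benchmarks share the same evaluation recipe: a measure scores a matched pair correctly if it rates the true pair higher than every distractor pair. A minimal numpy sketch of this accuracy computation follows; the function and parameter names are hypothetical, and it samples random distractors, whereas the challenging variant described above retrieves hard distractors with FAISS:

```python
import numpy as np

def benchmark_accuracy(reps_a, reps_b, sim, num_distractors=5, seed=0):
    """Paired-benchmark accuracy for a similarity measure (illustrative).

    reps_a[i] and reps_b[i] are representations of matched inputs
    (e.g., a sentence and its translation). Pair i counts as correct if
    sim(reps_a[i], reps_b[i]) beats sim(reps_a[i], reps_b[j]) for every
    sampled distractor j != i.
    """
    rng = np.random.default_rng(seed)
    n, correct = len(reps_a), 0
    for i in range(n):
        true_score = sim(reps_a[i], reps_b[i])
        others = [j for j in range(n) if j != i]
        picks = rng.choice(others, size=min(num_distractors, len(others)),
                           replace=False)
        if all(true_score > sim(reps_a[i], reps_b[j]) for j in picks):
            correct += 1
    return correct / n
```

Any similarity measure with the signature `sim(x, y) -> float` (CKA, CCA, or a learned measure) can be plugged in and compared under the same protocol.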
In summary, this work makes the following contributions:
• We introduce a new similarity measure, ContraSim. Inspired by contrastive learning, it uses positive and negative sets to train an encoder that maps representations to the space where similarity is measured.
• We propose two new benchmarks for the evaluation of similarity measures: the multilingual benchmark and the image-caption benchmark.
• We show that ContraSim outperforms existing similarity measures on all benchmarks and maintains high accuracy even when faced with more challenging examples.

2. RELATED WORK

Comparing different models allows one to analyze how aspects like network architecture, training set, and model size affect a model's learned representations. For instance, Kornblith et al. (2019) showed that adding too many layers to a convolutional neural network trained for image classification hurts its performance. Using CKA, they found that more than half of the network's layers are very similar to the last one. They further found that two models trained on different image datasets (CIFAR-10 and CIFAR-100; Krizhevsky et al., 2009) learn representations that are similar in the shallow layers. Similar findings were noted for language models by Wu et al. (2020), who also evaluated the effect of fine-tuning and found that the top layers are most affected by it. Investigating the effect of layer width, Kornblith et al. (2019) and Morcos et al. (2018) found that increasing a model's layer width results in more similar representations between models, and that networks are generally more similar to networks with the same layer width than to networks with a larger width. Raghu et al. (2017) provided an interpretation of the learning process by comparing the similarity of representations at some layer during training with the final representations. They found that networks converge from bottom to top, i.e., layers closer to the input converge to their final representations faster than deeper layers. Based on that insight, they proposed freeze training, where lower layers are successively frozen during training and only the deeper layers are updated; freeze training leads to classifiers with better generalization. Cianfarani et al. (2022) used similarity measures to analyze the effect of adversarial training on deep neural networks trained for image classification. Using CKA, they compared representations of adversarially trained neural networks with those of regularly trained ones and discovered that adversarial examples have little effect on early layers. They further found that deeper layers overfit during adversarial training, and that representations of adversarial images generated under different threat models are highly similar.

All prior work computes similarity only between examples that are similar, using closed-form functional measures. In contrast, we utilize both positive and negative samples in a learnable similarity measure, which allows adaptation to specific tasks.

3. PROBLEM SETUP

Let $X = \{(x_1^{(i)}, x_2^{(i)})\}_{i=1}^{N}$ denote a set of $N$ examples, and let $A = \{(a_1^{(i)}, a_2^{(i)})\}_{i=1}^{N}$ denote the set of representations generated for the examples in $X$. A representation is a high-dimensional vector of neuron activations. Representations may be created by the same or different models, by different layers of the same model, etc.
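Several of the analyses above rely on linear CKA (Kornblith et al., 2019) as a closed-form measure between two representation matrices of the same examples. As a concrete reference point, here is a minimal numpy sketch (illustrative; not the original implementation):

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two representation matrices.

    X: (n, d1) and Y: (n, d2) hold representations of the same n
    examples, e.g., from two models or two layers. Returns a score
    in [0, 1], where 1 indicates maximally similar representations.
    """
    # Center each feature dimension.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den
```

Linear CKA is invariant to orthogonal transformations and isotropic scaling of either representation, which is one reason it became a standard baseline for cross-model comparisons.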

