SIMILARITY OF NEURAL ARCHITECTURES BASED ON INPUT GRADIENT TRANSFERABILITY

Abstract

In this paper, we aim to design a quantitative similarity function between two neural architectures. Specifically, we define model similarity using input gradient transferability. We generate adversarial samples for two networks and measure the average accuracy of each network on the adversarial samples of the other. If two networks are highly correlated, the attack transferability will be high, resulting in a high similarity score. Using the similarity score, we investigate two questions: (1) Which network components contribute to model diversity? (2) How does model diversity affect practical scenarios? We answer the first question through feature importance analysis and clustering analysis. We address the second question in two practical scenarios: model ensemble and knowledge distillation. We conduct a large-scale analysis of 69 state-of-the-art ImageNet classifiers. Our findings show that model diversity plays a key role when different neural architectures interact. For example, we find that more diversity leads to better ensemble performance. We also observe that the relationship between teacher-student similarity and distillation performance depends on the choice of the base architectures of the teacher and student networks. We expect our analysis tool to help provide a high-level understanding of the differences between various neural architectures, as well as practical guidance for working with multiple architectures.
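The transferability-based similarity described above can be sketched in a few lines. The following is a minimal NumPy toy example, not the paper's exact protocol: the linear softmax "models", the one-step FGSM attack, the epsilon value, and the `1 - mean cross-accuracy` normalization are all illustrative assumptions. Two identical classifiers transfer attacks to each other perfectly and thus score as highly similar; a classifier with a rotated decision boundary scores lower.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two Gaussian classes separated along the first axis.
n = 2000
x = np.vstack([rng.normal([+2.0, 0.0], 1.0, (n, 2)),
               rng.normal([-2.0, 0.0], 1.0, (n, 2))])
y = np.array([0] * n + [1] * n)

# Hypothetical "models": linear softmax classifiers (W, b).
model_a = (np.array([[2.0, -2.0], [0.0, 0.0]]), np.zeros(2))   # decides on axis 1
model_b = (np.array([[2.0, -2.0], [0.0, 0.0]]), np.zeros(2))   # identical to model_a
model_c = (np.array([[1.4, -1.4], [1.4, -1.4]]), np.zeros(2))  # rotated boundary

def accuracy(model, x, y):
    W, b = model
    return float(((x @ W + b).argmax(axis=1) == y).mean())

def fgsm(model, x, y, eps):
    """One-step FGSM attack on a linear softmax classifier (analytic gradient)."""
    W, b = model
    logits = x @ W + b
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    onehot = np.eye(W.shape[1])[y]
    grad = (p - onehot) @ W.T          # d(cross-entropy)/dx
    return x + eps * np.sign(grad)

def similarity(model_i, model_j, x, y, eps=3.0):
    """1 - mean cross-accuracy: higher attack transferability => higher score."""
    x_i = fgsm(model_i, x, y, eps)     # adversarial samples of model_i
    x_j = fgsm(model_j, x, y, eps)
    acc_ij = accuracy(model_j, x_i, y) # model_j evaluated on model_i's attacks
    acc_ji = accuracy(model_i, x_j, y)
    return 1.0 - 0.5 * (acc_ij + acc_ji)

sim_ab = similarity(model_a, model_b, x, y)  # identical models: high similarity
sim_ac = similarity(model_a, model_c, x, y)  # different boundary: lower similarity
```

In this sketch the score is symmetric by construction (it averages both attack directions), which mirrors the paper's use of mutual transferability rather than a one-directional attack success rate.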

1. INTRODUCTION

The last couple of decades have seen the great success of deep neural networks (DNNs) in real-world applications, e.g., image classification (He et al., 2016a) and natural language processing (Vaswani et al., 2017). Advances in DNN architecture design have played a key role in this success by making the learning process easier (e.g., normalization methods (Ioffe & Szegedy, 2015; Wu & He, 2018; Ba et al., 2016) or skip connections (He et al., 2016a)), enforcing human inductive biases in the architecture (e.g., convolutional neural networks (CNNs) (Krizhevsky et al., 2012; Simonyan & Zisserman, 2015)), or increasing model capacity with the self-attention mechanism (e.g., Transformers (Vaswani et al., 2017)). With different design principles and architectural elements, a number of different neural architectures have been proposed; however, designing a distinguishable architecture is expensive and requires heavy expertise. One reason for this difficulty is that little is known about the differences between two distinct neural architectures. In particular, if one can quantify the similarity between two models, then one can measure which design components actually contribute to the diverse properties of neural networks. Such a quantity could also be exploited for new model design (e.g., by neural architecture search (NAS) (Zoph & Le, 2017)).

In this paper, we aim to define a similarity between two networks that quantifies the difference and diversity between neural architectures. Existing studies have focused on dissecting each network component layer-by-layer (Kornblith et al., 2019; Raghu et al., 2021) or on providing a high-level understanding through visualization of the loss surface (Dinh et al., 2017), input gradients (Springenberg et al., 2015; Smilkov et al., 2017), or decision boundaries (Somepalli et al., 2022).
On the other hand, we aim to design an architecture-agnostic, quantitative score that measures the difference between two architectures. We focus in particular on input gradients, a widely used framework for understanding model behavior, e.g., how a model's predictions will change under local pixel perturbations (Sung, 1998; Simonyan et al., 2014; Springenberg et al., 2015; Smilkov et al., 2017; Sundararajan et al., 2017; Bansal et al., 2020; Choe et al., 2022). If two models are similar, then their input gradients are similar. However, because an input gradient is very noisy, directly measuring the difference between input

