CLIP-DISSECT: AUTOMATIC DESCRIPTION OF NEURON REPRESENTATIONS IN DEEP VISION NETWORKS

Abstract

In this paper, we propose CLIP-Dissect, a new technique to automatically describe the function of individual hidden neurons inside vision networks. CLIP-Dissect leverages recent advances in multimodal vision/language models to label internal neurons with open-ended concepts, without the need for any labeled data or human examples. We show that CLIP-Dissect provides more accurate descriptions than existing methods for last-layer neurons, where the ground truth is available, as well as qualitatively good descriptions for hidden-layer neurons. In addition, our method is very flexible: it is model-agnostic, can easily handle new concepts, and can be extended to take advantage of better multimodal models in the future. Finally, CLIP-Dissect is computationally efficient: it can label all neurons from five layers of ResNet-50 in just 4 minutes, which is more than 10× faster than existing methods. Our code is available at https://github.com/Trustworthy-ML-Lab/CLIPdissect.

1. INTRODUCTION

Deep neural networks (DNNs) have demonstrated unprecedented performance in various machine learning tasks spanning computer vision, natural language processing, and application domains such as healthcare and autonomous driving. However, due to their complex structure, it has been challenging to understand why and how DNNs achieve such great success across numerous tasks. Understanding how trained DNNs operate is essential for trusting their deployment in safety-critical tasks, and can help reveal important failure cases or biases of a given model. One way toward understanding DNNs is to inspect the functionality of individual neurons, which is the focus of our work. This includes methods based on manual inspection (Erhan et al., 2009; Zeiler & Fergus, 2014; Zhou et al., 2015; Olah et al., 2017; 2020; Goh et al., 2021), which provide high-quality explanations and understanding of the network but require large amounts of manual effort. To address this issue, researchers have developed automated methods to describe the functionality of individual neurons, such as Network Dissection (Bau et al., 2017) and Compositional Explanations (Mu & Andreas, 2020). In (Bau et al., 2017), the authors first created a new dataset named Broden with pixel labels associated with a pre-determined set of concepts, and then used Broden to find neurons whose activation pattern matches that of a pre-defined concept. In (Mu & Andreas, 2020), the authors further extended Network Dissection to detect more complex concepts that are logical compositions of the concepts in Broden.
Although these methods based on Network Dissection can provide accurate labels in some cases, they have a few major limitations: (1) They require a densely annotated dataset, which is expensive and requires a significant amount of human labor to collect; (2) They can only detect concepts from a fixed concept set, which may not cover the important concepts for some networks, and it is difficult to expand this concept set because each new concept requires corresponding pixel-wise labeled data. To address the above limitations, we propose CLIP-Dissect, a novel method to automatically dissect DNNs with unrestricted concepts without the need for any concept-labeled data. Our method is training-free and leverages pre-trained multimodal models such as CLIP (Radford et al., 2021) to efficiently identify the functionality of individual neuron units. We show that CLIP-Dissect (i) provides high-quality descriptions for internal neurons, (ii) is more accurate at labeling final-layer neurons where we know the ground truth, and (iii) is 10×-200× more computationally efficient than existing methods.
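The core matching step described above can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: it assumes we have already (a) recorded a neuron's activations over a probing image set and (b) computed a CLIP image-text similarity matrix between those images and a list of candidate concept strings. Here we use plain cosine similarity between the activation profile and each concept's similarity column as the matching score; the actual scoring function used by CLIP-Dissect may differ, and the synthetic data below stands in for real CLIP embeddings.

```python
import numpy as np

def describe_neuron(activations, clip_sims, concepts):
    """Return the concept whose per-image CLIP similarity profile best
    matches this neuron's per-image activation profile.

    activations: (n_images,) neuron activations over the probing set
    clip_sims:   (n_images, n_concepts) CLIP image-text similarities
    concepts:    list of n_concepts candidate concept strings
    """
    # Center and normalize the activation profile.
    a = activations - activations.mean()
    a = a / (np.linalg.norm(a) + 1e-8)
    # Center and normalize each concept's similarity column.
    P = clip_sims - clip_sims.mean(axis=0, keepdims=True)
    P = P / (np.linalg.norm(P, axis=0, keepdims=True) + 1e-8)
    # Cosine similarity between the neuron and every candidate concept.
    scores = P.T @ a
    return concepts[int(np.argmax(scores))], scores

# Synthetic demo: a neuron whose activations track the "dog" column.
rng = np.random.default_rng(0)
n_images = 50
clip_sims = rng.random((n_images, 3))  # stand-in for real CLIP scores
activations = 2.0 * clip_sims[:, 1] + rng.normal(0.0, 0.01, n_images)
label, scores = describe_neuron(activations, clip_sims, ["cat", "dog", "car"])
print(label)  # the neuron is described by its best-matching concept
```

Note that no concept-labeled images are needed: the probing set only has to contain images, and the candidate concept set is an arbitrary list of strings, which is what makes the approach open-ended and easy to extend.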

