DEMYSTIFYING BLACK-BOX DNN TRAINING PROCESSES THROUGH CONCEPT-MONITOR

Abstract

Despite the successes of deep neural networks (DNNs) on a broad range of tasks, little is understood about why and how they achieve such victories, due to their complex architectures and opaque black-box training processes. With the goal of unveiling this mystery, in this work we propose a general framework called Concept-Monitor to automatically uncover the black-box DNN training process for the first time. Our proposed Concept-Monitor enables human-interpretable visualization of the DNN training process and thus facilitates transparency as well as a deeper understanding of how DNNs function and operate along the training iterations. Using Concept-Monitor, we are able to observe and compare different training paradigms with ease, including standard training, fine-tuning, adversarial training, and network pruning for the Lottery Ticket Hypothesis, which brings new insights on why and how adversarial training and network pruning work and how they modify the network during training. For example, we find that the Lottery Ticket Hypothesis discovers a mask that makes neurons interpretable at initialization, without any fine-tuning, and we also find that adversarially robust models have more neurons relying on color compared to standard models trained on the same dataset.

1. INTRODUCTION

The unprecedented success of deep learning has led to its rapid application to a wide range of tasks; however, deep neural networks (DNNs) are also known to be black-box and non-interpretable. To deploy these DNN models in real-world applications, especially safety-critical ones such as healthcare and autonomous driving, it is imperative for us to understand what is going on behind the black box. There has been a proliferation of research efforts toward interpreting DNNs, and they can mainly be divided into two categories: the first approach attributes a DNN's prediction to the importance of individual inputs, identifying which pixels or features are important (Zhou et al., 2016; Selvaraju et al., 2019; Sundararajan et al., 2017; Smilkov et al., 2017), while the other approach investigates the functionality (known as the concept) of each individual neuron (Bau et al., 2017a; Mu & Andreas, 2020; Oikarinen & Weng, 2022). However, most of these methods only examine a DNN model after it has been trained, and therefore miss useful information that could be available during the training process. For example, for a deep learning researcher or engineer, it would be very useful to know:



What are the concepts learned by the DNN model, and how has the model learned these concepts along the training process?

The answer to the above question would be useful in two ways: (i) it can shed light on why and how DNNs achieve great success, which could help inspire new DNN training algorithms; (ii) it can also help to debug DNNs and prevent catastrophic failures if anything goes wrong. Motivated by the above question, the main goal of this work is to develop a novel framework, Concept-Monitor, which makes the black-box DNN training process transparent and human-understandable. Our proposed Concept-Monitor is scalable and automated, which is crucial to demystify the opaque DNN training process efficiently and to help researchers better understand the training dynamics of the model. More formally, in this paper we provide the following contributions:

• We propose a general framework, Concept-Monitor, which is the first automatic and efficient pipeline to make black-box neural network training transparent and interpretable. Our pipeline monitors and tracks the training progress with human-interpretable concepts, which provide useful statistics and insights into the DNN model being trained.

• We develop a novel universal embedding space which allows us to efficiently track how the neurons' concepts evolve and to visualize their semantic evolution throughout the training process, without the need to re-learn an embedding space as proposed in prior work.

• We provide four case studies that analyze various deep learning training paradigms, including training standard deep vision models, the mysterious Lottery Ticket Hypothesis, adversarially robust training, and fine-tuning on a medical dataset. With Concept-Monitor, we are able to discover new insights into the obscure training process that help explain some of the empirical observations and hypotheses about black-box deep learning through the lens of interpretability.
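To make the concept-tracking idea above concrete, the following is a minimal sketch of how a monitor of this kind could assign a concept label to one neuron at each training checkpoint. This is an illustration under our own simplifying assumptions, not the actual Concept-Monitor implementation: the function names (`label_neuron`, `monitor_checkpoints`) are hypothetical, and correlation between a neuron's activations on a probe set and precomputed per-concept presence scores (e.g., from a fixed text-image embedding model) stands in for the full embedding-space matching.

```python
import numpy as np

def label_neuron(neuron_acts, concept_scores, concept_names):
    """Assign the best-matching concept to a single neuron.

    neuron_acts:    (n_probe_images,) activations of the neuron on a probe set
    concept_scores: (n_concepts, n_probe_images) per-concept presence scores
                    for each probe image, assumed precomputed once
    concept_names:  list of n_concepts human-readable concept labels
    """
    # Standardize activations so the similarity is scale-invariant.
    a = (neuron_acts - neuron_acts.mean()) / (neuron_acts.std() + 1e-8)
    sims = []
    for scores in concept_scores:
        s = (scores - scores.mean()) / (scores.std() + 1e-8)
        sims.append(float(np.mean(a * s)))  # Pearson-style correlation
    best = int(np.argmax(sims))
    return concept_names[best], sims[best]

def monitor_checkpoints(acts_per_checkpoint, concept_scores, concept_names):
    """Track how one neuron's concept label evolves across training
    checkpoints, returning a list of (label, similarity) pairs."""
    return [label_neuron(acts, concept_scores, concept_names)
            for acts in acts_per_checkpoint]
```

Because the concept scores are computed once and reused, labeling at every checkpoint only requires a forward pass over the probe set, which is what makes this style of monitoring cheap enough to run throughout training.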

Figure 1: Our proposed Concept-Monitor is automated, scalable, training-free and makes DNN training process transparent and human understandable.

Figure 2: Visualizing the concept evolution of Neuron 256 (blue) and Neuron 479 (purple) in Layer 4 for standard training of a ResNet-18 model on the Places365 dataset using Concept-Monitor.

