DEMYSTIFYING BLACK-BOX DNN TRAINING PROCESSES THROUGH CONCEPT-MONITOR

Abstract

Despite the successes of deep neural networks (DNNs) on a broad range of tasks, little is understood about why and how they achieve such victories, due to their complex architecture and their opaque black-box training processes. With the goal of unveiling the mystery of DNNs, in this work we propose a general framework called Concept-Monitor to uncover the black-box DNN training process automatically for the first time. Our proposed Concept-Monitor enables human-interpretable visualization of the DNN training process and thus facilitates transparency as well as a deeper understanding of how DNNs function and operate along the training iterations. Using Concept-Monitor, we are able to observe and compare different training paradigms with ease, including standard training, finetuning, adversarial training, and network pruning for the Lottery Ticket Hypothesis, which brings new insights on why and how adversarial training and network pruning work and how they modify the network during training. For example, we find that the Lottery Ticket Hypothesis discovers a mask that makes neurons interpretable at initialization, without any finetuning, and we also find that adversarially robust models have more neurons relying on color as compared to standard models trained on the same dataset.

1. INTRODUCTION

The unprecedented success of deep learning has led to its rapid application to a wide range of tasks; however, deep neural networks (DNNs) are also known to be black-box and non-interpretable. To deploy these deep neural network (DNN) models into real-world applications, especially safety-critical ones such as healthcare and autonomous driving, it is imperative for us to understand what is going on behind the black box. There has been a proliferation of research efforts towards interpreting DNNs, and they can be mainly divided into two categories: the first approach focuses on attributing a DNN's prediction to the importance of individual inputs, identifying which pixels or features are important (Zhou et al., 2016; Selvaraju et al., 2019; Sundararajan et al., 2017; Smilkov et al., 2017), while the other approach investigates the functionality (known as the concept) of each individual neuron (Bau et al., 2017a; Mu & Andreas, 2020; Oikarinen & Weng, 2022). However, most of these methods only examine a DNN model after it has been trained, and therefore miss out on useful information that is available during the training process. For example, for a deep learning researcher or engineer, it would be very useful to know: what concepts does the DNN model learn, and how does it learn them along the training process? The answer to this question would be useful in two ways: (i) it can shed light on why and how DNNs achieve great success, which could help inspire new DNN training algorithms; (ii) it can also help to debug DNNs and prevent catastrophic failure if anything goes wrong. Motivated by this question, the main goal of this work is to develop a novel framework, Concept-Monitor, which makes the black-box DNN training process transparent and human-understandable.
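To make the neuron-level approach concrete, a minimal sketch of concept labeling is shown below. It assigns each neuron the concept whose presence pattern over a probe set best matches the neuron's activations, scored by cosine similarity. This is only an illustrative toy (the function name, the 0/1 concept matrix, and the data are hypothetical), not the actual method used by Concept-Monitor or the cited dissection works.

```python
import numpy as np

def label_neuron(neuron_acts, concept_matrix, concept_names):
    """Toy concept labeling: pick the concept whose 0/1 presence vector
    over the probe images is most cosine-similar to the neuron's
    activation vector over those same images."""
    # Normalize the neuron's activation vector over the probe images.
    a = neuron_acts / (np.linalg.norm(neuron_acts) + 1e-8)
    # Normalize each concept's presence vector (one row per concept).
    C = concept_matrix / (np.linalg.norm(concept_matrix, axis=1,
                                         keepdims=True) + 1e-8)
    sims = C @ a  # one cosine-similarity score per concept
    best = int(np.argmax(sims))
    return concept_names[best], float(sims[best])

# Hypothetical probe set of 6 images and two made-up concepts.
concepts = np.array([[1, 1, 1, 0, 0, 0],   # "stripes": in images 0-2
                     [0, 0, 0, 1, 1, 1]])  # "fur": in images 3-5
acts = np.array([0.9, 0.8, 1.0, 0.1, 0.0, 0.2])  # fires on striped images
name, score = label_neuron(acts, concepts, ["stripes", "fur"])
print(name)  # -> "stripes"
```

Running such a labeler at successive checkpoints, rather than only on the final model, is the kind of per-iteration view of neuron concepts that motivates this work.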
Our proposed Concept-Monitor is scalable and automated, which is crucial to demystify the opaque DNN training process efficiently and to help researchers better understand the training dynamics of the model. More formally, in this paper we provide the following contributions:

