MODEL INFORMATION AS AN ANALYSIS TOOL IN DEEP LEARNING

Abstract

Information-theoretic perspectives can provide an alternative dimension for analyzing the learning process and complement the usual performance metrics. Recently, several works have proposed methods for quantifying the information content of a model (which we refer to as "model information"). We demonstrate the use of model information as a general analysis tool for gaining insight into problems that arise in deep learning. By applying model information in different scenarios with different control variables, we adapt it to analyze the fundamental elements of learning, i.e., task, data, model, and algorithm. In each domain, we provide an example in which model information is used to offer a new solution to a problem or to gain insight into the nature of the particular learning setting. These examples illustrate the versatility and potential utility of model information as an analysis tool in deep learning.

1. INTRODUCTION

The ultimate goal of much deep learning research has been improving performance on specific datasets, for example, aiming for superior classification accuracy on the ILSVRC challenge. We have witnessed super-human performance on tasks in vision and language processing, but we are still far from understanding how the learning process works and whether it resembles human learning. This is partially because the metrics we use provide too little information about the dynamics of learning. Moreover, performance on the test set as the sole goal can sometimes even lead to undesirable outcomes or misleading conclusions (Lipton & Steinhardt, 2019).

Recently, several works have proposed using the description length of a model (or surrogate estimations thereof) to understand the behavior of learning. In this paper, we refer to such measures of the amount of information content in a model as model information. Blier & Ollivier (2018) first demonstrated efficiently encoding a deep neural network with the prequential coding technique. Zhang et al. (2020) then proposed an approximation of model information and used it to analyze the information content of a task. They also used model information to explain phenomena in transfer learning and continual learning, showing that it provides a perspective different from performance and directly characterizes the information transfer in the learning process. Voita & Titov (2020) used model information as a probe to analyze what kind of information is present in a text representation model, claiming that model information is more informative and stable than performance metrics when used as a probe.

Model information can thus provide an informational perspective on learning. It can potentially help answer questions about learning dynamics, such as how much information exists in a dataset or model, or how much information is transferred in a learning step.
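To make the notion of description length concrete: under prequential (online) coding, the cost of transmitting a dataset is the sum of the codelengths -log2 p(x_t | x_1, ..., x_{t-1}), where the predictor is refit on the prefix at each step. The sketch below illustrates this on a binary sequence with a Laplace-smoothed Bernoulli estimator; the estimator and the toy data are illustrative choices, not the setup used by the works cited above.

```python
# Hedged sketch of prequential coding on a binary sequence. The "model class"
# here is a Laplace-smoothed Bernoulli estimator updated online; deep-learning
# variants replace it with a network retrained on each growing prefix.
import math

def prequential_codelength(bits):
    """Total bits to encode `bits`, predicting each symbol from its prefix."""
    ones, total, codelength = 0, 0, 0.0
    for b in bits:
        p_one = (ones + 1) / (total + 2)   # Laplace (add-one) smoothing
        p = p_one if b == 1 else 1.0 - p_one
        codelength += -math.log2(p)        # bits to encode this symbol
        ones += b
        total += 1
    return codelength

data = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]      # outcomes of a biased coin
preq = prequential_codelength(data)
uniform = len(data)                        # 1 bit per symbol with no model
# The gap between the uniform code and the prequential code reflects how much
# structure the model class extracted from the data.
print(f"prequential: {preq:.2f} bits vs uniform: {uniform} bits")
```

On this biased sequence the prequential code comes in under the 10-bit uniform code, because the estimator's predictions sharpen as the prefix grows; that saving is the kind of quantity the description-length view of "model information" builds on.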
Furthermore, we can reformulate existing problems as problems about information, for example, similarity and capacity, as we will show in this paper. Compared with model performance, model information accounts not only for how well a model can perform but also for how fast it learns to perform well (in the sense of sample efficiency; a discussion is given by Yogatama et al. (2019)). The latter can be interpreted as related to the quantity of information transferred during model training. In this paper, we aim to illustrate that model information can provide a framework for analyzing and understanding phenomena in deep learning. In the next section, we provide a general definition of model information, independent of how it is estimated. We then unify the analysis of the fundamental elements of deep learning under the framework of model information. In the

