QUANTIFYING TASK COMPLEXITY THROUGH GENERALIZED INFORMATION MEASURES

Abstract

How can we measure the "complexity" of a learning task so that we can compare one task to another? From classical information theory, we know that entropy is a useful measure of the complexity of a random variable and provides a lower bound on the minimum expected number of bits needed for transmitting its state. In this paper, we propose to measure the complexity of a learning task by the minimum expected number of questions that need to be answered to solve the task. For example, the minimum expected number of patches that need to be observed to classify FashionMNIST images. We prove several properties of the proposed complexity measure, including connections with classical entropy and sub-additivity for multiple tasks. As the computation of the minimum expected number of questions is generally intractable, we propose a greedy procedure called "information pursuit" (IP), which selects one question at a time depending on previous questions and their answers. This requires learning a probabilistic generative model relating data and questions to the task, for which we employ variational autoencoders and normalizing flows. We illustrate the usefulness of the proposed measure on various binary image classification tasks using image patches as the query set. Our results indicate that the complexity of a classification task increases as signal-to-noise ratio decreases, and that classification of the KMNIST dataset is more complex than classification of the FashionMNIST dataset. As a byproduct of choosing patches as queries, our approach also provides a principled way of determining which pixels in an image are most informative for a task.
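The greedy "information pursuit" (IP) strategy described above can be illustrated on a toy discrete problem. The sketch below is purely illustrative and is not the paper's implementation (which relies on variational autoencoders and normalizing flows): it estimates information gain empirically from a table of samples, and all function names are our own. At each step it picks the unasked query whose answer most reduces the entropy of the label, then conditions on the answer actually observed.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (bits) of a probability vector."""
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def information_pursuit(samples, labels, x_obs, n_steps):
    """Greedy IP sketch: repeatedly ask the query (feature index) whose
    answer is most informative about the label, given answers so far.

    samples: (n, d) array of binary query answers for training examples
    labels:  (n,) array of integer class labels
    x_obs:   (d,) answers for the test input being classified
    """
    mask = np.ones(len(samples), dtype=bool)  # examples consistent with history
    asked = []
    for _ in range(n_steps):
        h_y = entropy(np.bincount(labels[mask]) / mask.sum())
        best_q, best_gain = None, -1.0
        for q in range(samples.shape[1]):
            if q in asked:
                continue
            # Expected posterior entropy of Y after observing query q.
            h_cond = 0.0
            for a in (0, 1):
                sel = mask & (samples[:, q] == a)
                if sel.sum() == 0:
                    continue
                h_cond += (sel.sum() / mask.sum()) * entropy(
                    np.bincount(labels[sel]) / sel.sum())
            gain = h_y - h_cond  # empirical mutual information I(Y; Q | history)
            if gain > best_gain:
                best_q, best_gain = q, gain
        if best_q is None:
            break
        asked.append(best_q)
        # Condition on the answer actually observed for the test input.
        mask = mask & (samples[:, best_q] == x_obs[best_q])
        if mask.sum() == 0:
            break
    return asked

# Toy demo: the label equals feature 1; features 0 and 2 are uninformative,
# so IP should ask query 1 first.
samples = np.array([[0, 0, 1], [1, 0, 0], [0, 1, 1], [1, 1, 0]])
labels = np.array([0, 0, 1, 1])
first_query = information_pursuit(samples, labels, x_obs=samples[2], n_steps=1)[0]
```

In the paper's setting the queries are image patches and the gain is computed under a learned generative model rather than by counting samples; the greedy structure, however, is the same.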

1. INTRODUCTION

Deep networks have shown remarkable progress in both simple and complex machine learning tasks. But how does one measure the "complexity" of a learning task? Is it possible to ascertain in a principled manner which tasks are "harder" to solve than others? How "close" is one task to another? Answers to these questions would have implications in many fields of machine learning, such as transfer learning, multi-task learning, un/semi/self-supervised learning, and domain adaptation. In classical information theory, the entropy of a random variable X is a useful measure of complexity for tasks such as compression and transmission, which essentially require reconstructing X. However, the entropy of X is insufficient for measuring the complexity of a supervised learning task T_{X,Y}, where the goal is to predict an output Y from an input X, i.e., to estimate the conditional p_{Y|X}(y | x) from a finite set of samples from p_{XY}(x, y), which we refer to as solving the learning task. Complexity measures provided by statistical learning theory, such as the VC-dimension or covering numbers, are also inadequate for this purpose because they ignore the dependence between X and Y for the particular task at hand. Information-theoretic measures such as mutual information, the information bottleneck (Tishby et al., 2000), and its variants (Strouse & Schwab, 2017) have been used to study the trade-off between model complexity and accuracy, but they were not designed to assess task complexity and can yield unsatisfactory results when comparing different tasks (see Section 5 for details). Measures based on Kolmogorov complexity (Li, 2006; Vereshchagin & Vitányi, 2004) could in principle be used to compare different tasks, but they are sensitive to permutations of the dataset and not easily computable.
Achille et al. (2019a) propose to quantify task complexity by measuring the information stored in a network's weights, but the resulting measure depends on the specific neural network architecture used for training. Tran et al. (2019) do not require trained models, but make restrictive assumptions that limit the broad applicability of their approach.

