ACTIVE LEARNING IN BAYESIAN NEURAL NETWORKS WITH BALANCED ENTROPY LEARNING PRINCIPLE

Abstract

Acquiring labeled data is challenging in many machine learning applications with limited budgets. Active learning gives a procedure to select the most informative data points and improve data efficiency by reducing the cost of labeling. The infomax learning principle of maximizing mutual information, as in BALD, has been successful and widely adopted in various active learning applications. However, this pool-based objective inherently introduces redundant selections and further requires a high computational cost for batch selection. In this paper, we design and propose a new uncertainty measure, Balanced Entropy Acquisition (BalEntAcq), which captures the information balance between the uncertainty of the underlying softmax probability and the label variable. To do this, we approximate each marginal distribution by a Beta distribution. The Beta approximation enables us to formulate BalEntAcq as a ratio between an augmented entropy and the marginalized joint entropy. The closed-form expression of BalEntAcq facilitates parallelization, since only two parameters need to be estimated for each marginal Beta distribution. BalEntAcq is a purely standalone measure that requires no relational computations with other data points. Nevertheless, BalEntAcq yields a well-diversified selection near the decision boundary with a margin, unlike other existing uncertainty measures such as BALD, Entropy, or Mean Standard Deviation (MeanSD). Finally, we demonstrate that our balanced entropy learning principle with BalEntAcq consistently outperforms well-known linearly scalable active learning methods, including the recently proposed PowerBALD, a simple but diversified version of BALD, on the MNIST, CIFAR-100, SVHN, and TinyImageNet datasets.
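The two-parameter estimation mentioned above can be realized by simple moment matching: given Monte Carlo samples of each class's softmax probability (e.g., from MC-dropout forward passes), the marginal Beta parameters follow from the sample mean and variance. The sketch below is an illustration of this standard moment-matching step, not the authors' implementation; the function name and numerical guards are our own.

```python
import numpy as np

def fit_marginal_betas(softmax_samples):
    """Moment-match a Beta(alpha, beta) distribution to each class's
    marginal softmax probability.

    softmax_samples: array of shape (n_mc, n_classes), each row a softmax
    output from one stochastic forward pass (rows sum to 1).
    Returns (alpha, beta), each of shape (n_classes,).
    """
    m = softmax_samples.mean(axis=0)
    v = softmax_samples.var(axis=0)
    # Guard against degenerate (near-zero-variance) marginals.
    v = np.clip(v, 1e-12, None)
    # Method of moments for Beta: with nu = m(1-m)/v - 1,
    # alpha = m * nu and beta = (1 - m) * nu.
    nu = m * (1.0 - m) / v - 1.0
    nu = np.clip(nu, 1e-6, None)
    return m * nu, (1.0 - m) * nu
```

Because only a mean and a variance per class are needed, this step is trivially vectorizable across the unlabeled pool, which is what makes a closed-form acquisition score attractive for parallel batch scoring.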

1. INTRODUCTION

Acquiring labeled data is challenging in many machine learning applications with limited budgets. As dataset sizes grow for training complex models, labeling data by humans becomes more expensive. Active learning gives a procedure to select the most informative data points and improve data efficiency by reducing the cost of labeling. The active learning problem is closely aligned with the subset selection problem of finding a minimal yet maximally informative subset of the data pool (Hochbaum, 1996; Nemhauser et al., 1978; Dvoretzky, 1961; Milman, 1971; Spielman & Teng, 2014; Spielman & Woo, 2009; Batson et al., 2009; Spielman & Srivastava, 2011). The difference is that active learning is typically an iterative process in which a model is trained and a collection of data points is selected to be labeled from an unlabeled data pool. It is well known that, in general, no active learning method can improve the label complexity over passive learning (random acquisition) (Vapnik & Chervonenkis, 1974; Kääriäinen, 2006; Castro & Nowak, 2008). Under some conditions on labels or models, however, it is possible to achieve exponential savings (Balcan et al., 2007; Hanneke, 2007; Dasgupta et al., 2005; Hsu, 2010; Dekel et al., 2012; Hanneke, 2014; Zhang & Chaudhuri, 2014; Krishnamurthy et al., 2017; Shekhar et al., 2021; Puchkin & Zhivotovskiy, 2021). Zhu & Nowak (2022b; a) recently proposed a provably



1 Code is available at https://github.com/jaeohwoo/BalancedEntropy

