SCALABLE BATCH-MODE DEEP BAYESIAN ACTIVE LEARNING VIA EQUIVALENCE CLASS ANNEALING

Abstract

Active learning has demonstrated data efficiency in many fields. Existing active learning algorithms, especially in the context of batch-mode deep Bayesian active learning, rely heavily on the quality of the model's uncertainty estimates and are often challenging to scale to large batches. In this paper, we propose Batch-BALANCE, a scalable batch-mode active learning algorithm that combines insights from decision-theoretic active learning, combinatorial information measures, and diversity sampling. At its core, Batch-BALANCE relies on a novel decision-theoretic acquisition function that facilitates differentiation among different equivalence classes. Intuitively, each equivalence class consists of hypotheses (e.g., posterior samples of deep neural networks) with similar predictions, and Batch-BALANCE adaptively adjusts the size of the equivalence classes as learning progresses. To scale up the computation of queries to large batches, we further propose an efficient batch-mode acquisition procedure, which aims to maximize a novel information measure defined through the acquisition function. We show that our algorithm can effectively handle realistic multi-class classification tasks and achieves compelling performance on several benchmark datasets for active learning under both small- and large-batch regimes. Reference code is released at https://github.com/zhangrenyuuchicago/BALanCe.

1. INTRODUCTION

Active learning (AL) (Settles, 2012) encompasses a collection of techniques that efficiently select data for training machine learning models. In the pool-based setting, an active learner selectively queries the labels of data points from a pool of unlabeled examples and incurs a certain cost for each label obtained. The goal is to minimize the total cost while achieving a target level of performance. A common practice in AL is to devise efficient surrogates, also known as acquisition functions, to assess the usefulness of unlabeled data points in the pool. A vast body of literature and empirical studies (Huang et al., 2010; Houlsby et al., 2011; Wang & Ye, 2015; Hsu & Lin, 2015; Huang et al., 2016; Sener & Savarese, 2017; Ducoffe & Precioso, 2018; Ash et al., 2019; Liu et al., 2020; Yan et al., 2020) suggests a variety of heuristics as potential acquisition functions for AL. Among these methods, Bayesian Active Learning by Disagreement (BALD) (Houlsby et al., 2011) has attained notable success in the context of deep Bayesian AL, while maintaining the expressiveness of Bayesian models (Gal et al., 2017; Janz et al., 2017; Shen et al., 2017). Concretely, BALD relies on a most informative selection (MIS) strategy, a classical heuristic that dates back to Lindley (1956), which at each iteration greedily queries the data point exhibiting the maximal mutual information with the model parameters. Although such heuristics are overwhelmingly popular owing to their algorithmic simplicity (MacKay, 1992; Chen et al., 2015; Gal & Ghahramani, 2016), the performance of these AL algorithms is sensitive to the quality of the underlying model's uncertainty estimates, and accurately quantifying model uncertainty remains an open problem in deep AL, owing to limited access to training data and the difficulty of posterior estimation.
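To make the MIS strategy behind BALD concrete, the following is a minimal sketch, not the reference implementation released with this paper. It assumes the posterior is approximated by K sampled models (e.g., MC-dropout passes) whose class probabilities on the pool are stacked into an array of shape (K, N, C). The mutual information between a label and the model parameters is then estimated as the entropy of the averaged prediction minus the average entropy of the individual predictions. The names `bald_scores` and `select_most_informative` and the toy array `probs` are hypothetical and introduced only for illustration.

```python
import numpy as np


def entropy(p, axis=-1, eps=1e-12):
    """Shannon entropy (in nats) along `axis`; `eps` guards against log(0)."""
    return -np.sum(p * np.log(p + eps), axis=axis)


def bald_scores(probs):
    """Estimate the mutual information between each label and the parameters.

    probs: array of shape (K, N, C) holding class probabilities from
           K posterior samples for N pool points and C classes (assumed input).
    Returns an array of shape (N,).
    """
    mean_probs = probs.mean(axis=0)                  # (N, C) marginal prediction
    predictive_entropy = entropy(mean_probs)         # H[y | x, D]
    expected_entropy = entropy(probs).mean(axis=0)   # E_w H[y | x, w]
    return predictive_entropy - expected_entropy


def select_most_informative(probs):
    """Greedy MIS step: index of the pool point with the largest score."""
    return int(np.argmax(bald_scores(probs)))


# Toy pool: 2 posterior samples, 3 pool points, 2 classes.
probs = np.array([
    [[0.5, 0.5], [1.0, 0.0], [1.0, 0.0]],   # predictions of sample 1
    [[0.5, 0.5], [0.0, 1.0], [1.0, 0.0]],   # predictions of sample 2
])
scores = bald_scores(probs)
chosen = select_most_informative(probs)
```

In this toy pool, the two samples agree on the first point (both are uncertain) and on the third (both are confident), so only the second point, where they confidently disagree, receives a large score. The score is only as reliable as the sampled predictions themselves, which is the sensitivity to uncertainty estimation noted above.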

