ON THE GEOMETRY OF DEEP BAYESIAN ACTIVE LEARNING

Anonymous authors
Paper under double-blind review

Abstract

We present geometric Bayesian active learning by disagreements (GBALD), a framework that performs BALD via its geometric interpretation while interacting with a deep learning model. GBALD has two main components: initial acquisitions based on core-set construction, and model uncertainty estimation seeded by those initial acquisitions. Our key innovation is to construct the core-set on an ellipsoid, rather than the typical sphere, preventing its updates from drifting toward the boundary regions of the data distribution. GBALD improves over BALD in two ways: it relieves the sensitivity to an uninformative prior, and it reduces the redundant information in model uncertainty estimation. To guarantee these improvements, our generalization analysis proves that, compared with the typical Bayesian spherical interpretation, geodesic search on an ellipsoid derives a tighter lower error bound and achieves a nearly zero error with higher probability. Experiments on acquisition under several scenarios demonstrate that GBALD, which suffers only slight perturbations from noisy and repeated samples, achieves significant accuracy improvements over BALD, BatchBALD and other baselines.

1. INTRODUCTION

Lack of training labels restricts the performance of deep neural networks (DNNs), even though the price of GPU resources keeps falling. Recently, leveraging the abundance of unlabeled data has become a potential solution to this bottleneck, whereby expert knowledge is invoked to annotate unlabeled data. In this setting, the deep learning community introduced active learning (AL) (Gal et al., 2017), which maximizes the model uncertainty (Ashukha et al., 2019; Lakshminarayanan et al., 2017) to acquire a set of highly informative or representative unlabeled data and then solicits experts' annotations. During this AL process, the learning model tries to achieve a desired accuracy with minimal data labeling. Recent treatments of model uncertainty in many fields, such as Bayesian neural networks (Blundell et al., 2015), Monte-Carlo (MC) dropout (Gal & Ghahramani, 2016), and Bayesian core-set construction (Sener & Savarese, 2018), show that new scenarios arise from deep Bayesian AL (Pinsler et al., 2019; Kirsch et al., 2019). Bayesian AL (Golovin et al., 2010; Jedoui et al., 2019) presents an expressive probabilistic interpretation of model uncertainty (Gal & Ghahramani, 2016). Theoretically, for a simple regression model such as linear, logistic, or probit regression, AL can derive closed forms for updating one sparse subset that maximally reduces the uncertainty of the posteriors over the regression parameters (Pinsler et al., 2019). However, for a DNN model, optimizing massive numbers of training parameters is not easily tractable. Bayesian approximation therefore provides alternatives, including importance sampling (Doucet et al., 2000) and Frank-Wolfe optimization (Vavasis, 1992). With importance sampling, a typical approach is to express the information gain in terms of the predictive entropy over the model; this approach is called Bayesian active learning by disagreements (BALD) (Houlsby et al., 2011).
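As a concrete illustration (our own sketch, not code from this paper), the BALD information gain can be estimated from T stochastic MC-dropout forward passes as the mutual information between predictions and model parameters; the function name and array layout below are our assumptions:

```python
import numpy as np

def bald_scores(probs, eps=1e-12):
    """Estimate the BALD mutual information per example.

    probs: array of shape (T, N, C) -- T stochastic (MC-dropout)
    forward passes, N examples, C classes. Returns N scores.
    """
    mean_p = probs.mean(axis=0)                                    # E_w p(y|x,w)
    entropy_of_mean = -(mean_p * np.log(mean_p + eps)).sum(-1)     # H[E_w p]
    mean_entropy = -(probs * np.log(probs + eps)).sum(-1).mean(0)  # E_w H[p]
    return entropy_of_mean - mean_entropy                          # mutual information

# Passes that disagree about one example yield a high score;
# passes that agree yield a score near zero.
agree = np.array([[[0.9, 0.1]], [[0.9, 0.1]]])
disagree = np.array([[[0.9, 0.1]], [[0.1, 0.9]]])
```

The score is high only when the individual passes are confident yet mutually inconsistent, which is exactly the "disagreement" that BALD acquires on.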
BALD has two interpretations: model uncertainty estimation and core-set construction. To estimate the model uncertainty, a greedy strategy selects the data that maximize the parameter disagreements between the current training model and its subsequent updates, as in (Gal et al., 2017). However, naively running BALD under an uninformative prior (Strachan & Van Dijk, 2003; Price & Manson, 2002), which can be constructed to reflect a balance among outcomes when no information is available, leads to unstable, biased acquisitions (Gao et al., 2020), e.g. when prior labels are insufficient. Moreover, the similarity or consistency of those acquisitions with previously acquired samples brings redundant information to the model and decelerates its training. Core-set construction (Campbell & Broderick, 2018) avoids the greedy interaction with the model by capturing characteristics of the data distribution. By modeling the complete data posterior over the distributions of parameters, BALD can be viewed as a core-set construction process on a sphere (Kirsch et al., 2019), which seamlessly solicits a compact subset to approximate the input data distribution, and efficiently mitigates the sensitivity to an uninformative prior and to redundant information. From the view of geometry, updates of the core-set are usually optimized along sphere geodesics, as in (Nie et al., 2013; Wang et al., 2019). Once the core-set is obtained, deep AL immediately seeks annotations from experts and starts training. However, data points located at the boundary regions of the distribution, which usually follow a near-uniform distribution, cannot be highly representative candidates for the core-set. Constructing the core-set on a sphere may therefore not be the optimal choice for deep AL.
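To make the core-set idea concrete, here is a minimal sketch of greedy k-center (farthest-point) selection in Euclidean space; the function name and interface are our assumptions, not this paper's implementation:

```python
import numpy as np

def k_center_greedy(X, k, first=0):
    """Greedy k-center core-set: repeatedly add the point farthest
    from the current core-set. X: (N, d) array; returns k indices."""
    chosen = [first]
    # distance of every point to its nearest chosen center
    dist = np.linalg.norm(X - X[first], axis=1)
    while len(chosen) < k:
        nxt = int(dist.argmax())  # the farthest point joins the core-set
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    return chosen
```

Because the rule always grabs the farthest remaining point, exactly this kind of update drifts toward boundary and outlier regions, which is the behavior the ellipsoidal construction discussed above is meant to temper.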
This paper presents a novel AL framework, geometric BALD (GBALD), built on the geometric interpretation of BALD: interpreting BALD as core-set construction on an ellipsoid, it initializes an effective representation to drive a DNN model. The goal is to achieve significant accuracy improvements in spite of an uninformative prior and redundant information. Figure 1 describes this two-stage framework. In the first stage, geometric core-set construction on an ellipsoid initializes effective acquisitions to start a DNN model regardless of the uninformative prior. Taking the core-set as the input features, the next stage ranks the batch acquisitions of model uncertainty according to their geometric representativeness, and then solicits some highly representative examples from the batch. With these representation constraints, the ranked acquisitions reduce the probability of sampling near the previous acquisitions, preventing redundant acquisitions. To guarantee the improvement, our generalization analysis shows that the lower bound on the generalization error of AL with the ellipsoid is provably tighter than that of AL with the sphere, and that AL with the ellipsoid achieves a nearly zero generalization error with higher probability. The contributions of this paper can be summarized from geometric, algorithmic, and theoretical perspectives.
• Geometrically, our key innovation is to construct the core-set on an ellipsoid, not the typical sphere, preventing its updates from drifting toward the boundary regions of the distribution.
• In terms of algorithm design, we propose, from a Bayesian perspective, a two-stage framework that sequentially introduces core-set representation and model uncertainty, strengthening their performance "independently". Moreover, different from typical BALD optimizations, we present geometric solvers to construct the core-set and estimate model uncertainty, which yield a different view of Bayesian active learning.
• Theoretically, to guarantee those improvements, our generalization analysis proves that, compared with the typical Bayesian spherical interpretation, geodesic search with an ellipsoid derives a tighter lower error bound and achieves a nearly zero error with higher probability. See Appendix B.

The rest of this paper is organized as follows. Section 2 reviews related work. Sections 3 and 4 elaborate BALD and GBALD, respectively. Experimental results are presented in Section 5. Finally, Section 6 concludes the paper.
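One intuition behind replacing the sphere with an ellipsoid can be illustrated with distances: an ellipsoidal (Mahalanobis-style) metric induced by the data covariance deflates distances along high-variance directions, so boundary outliers look less "far" and attract greedy updates less. The sketch below is our own illustration of this effect under stated assumptions, not the paper's exact construction:

```python
import numpy as np

def ellipsoid_dist(P, x, cov):
    """Distance from each row of P to x under an ellipsoidal metric
    induced by the data covariance (a Mahalanobis-style distance)."""
    inv = np.linalg.inv(cov)
    diff = P - x
    return np.sqrt(np.einsum('nd,de,ne->n', diff, inv, diff))

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) * np.array([5.0, 1.0])  # elongated cloud
cov = np.cov(X.T)
outlier = np.array([12.0, 0.0])  # far along the high-variance axis
side = np.array([0.0, 4.0])      # closer in Euclidean terms, but along the low-variance axis
```

Under the Euclidean metric the boundary outlier at (12, 0) dominates the selection; under the ellipsoidal metric its distance is deflated by the large variance along that axis, so core-set updates are less attracted to the boundary region.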

2. RELATED WORK

Model uncertainty. In the deep learning community, AL (Cohn et al., 1994) was introduced to improve the training of a DNN model by annotating unlabeled data, where the data that maximize the model uncertainty (Lakshminarayanan et al., 2017) are the primary acquisitions. For example, in ensemble deep learning (Ashukha et al., 2019), out-of-domain uncertainty estimation selects data that do not follow the same distribution as the input training data, while in-domain uncertainty draws data from the original input distribution, producing reliable probability estimates. Gal &



Figure 1: Illustration of the two-stage GBALD framework. BALD has two interpretations: model uncertainty estimation and core-set construction, where the deeper the color of a core-set element, the higher its representativeness; GBALD integrates them into a unified framework. Stage 1: core-set construction uses an ellipsoid, not the typical sphere, representing the original distribution to initialize the input features of the DNN. Stage 2: model uncertainty estimation with those initial acquisitions then derives highly informative and representative samples for the DNN.

