AN EMPIRICAL STUDY ON THE EFFICACY OF DEEP ACTIVE LEARNING TECHNIQUES

Anonymous authors
Paper under double-blind review

Abstract

Deep Active Learning (DAL) has been advocated as a promising method to reduce labeling costs in supervised learning. However, existing evaluations of DAL methods are based on different settings, and their results are contradictory. To tackle this issue, this paper comprehensively evaluates 19 existing DAL methods in a uniform setting, covering both traditional fully-supervised active learning (SAL) strategies and emerging semi-supervised active learning (SSAL) techniques. We have several non-trivial findings. First, most SAL methods cannot achieve higher accuracy than random selection. Second, semi-supervised training brings significant performance improvements over pure SAL methods. Third, performing data selection in the SSAL setting achieves a significant and consistent performance improvement, especially with abundant unlabeled data. Our findings yield the following guidance for practitioners: for better model performance, one should (i) apply SSAL as early as possible and (ii) collect more unlabeled data whenever possible. We will release our code upon acceptance.

1. INTRODUCTION

Training a well-performing Deep Neural Network (DNN) generally requires a substantial amount of labeled data. However, data collection and labeling can be quite costly, especially for tasks that require expert knowledge, e.g., medical image analysis (Hoi et al.) and malware detection (Nissim et al., 2014). Deep Active Learning (DAL) has thus long been advocated to mitigate this issue, wherein we proactively select and label the most informative training samples. That is, given a pool of unlabeled data, DAL iteratively performs data selection and training until the given labeling budget is exhausted, as shown in Figure 1.

Various DAL techniques have been proposed in the literature. Most of them are fully-supervised (SAL) and aim for a better data selection strategy¹. SAL strategies can be roughly grouped into three categories: 1) model-based selection; 2) data-distribution-based selection; and 3) hybrid selection. Model-based selection prefers annotating data that are most uncertain under the task model (Gal et al., 2017; Beluch et al., 2018). Data-distribution-based methods select data according to their density or diversity (Sener & Savarese, 2018; Sinha et al., 2019). Hybrid methods consider both task-model information and data distribution when selecting (Ash et al., 2020). By applying pseudo-labels (Arazo et al., 2019) to the unlabeled data or consistency regularization (Berthelot et al., 2019), semi-supervised learning (SSL) can improve model performance substantially. Consequently, it is attractive to apply active learning on top of SSL techniques, referred to as semi-supervised active learning (SSAL).

However, existing evaluations of these DAL methods are conducted under different settings, and their conclusions contradict each other. Several empirical studies have therefore been conducted to address this problem (Beck et al., 2021; Munjal et al., 2020). However, again, their observations conflict: Beck et al. (2021) claim that DAL methods bring little or no benefit over random selection (RS), while Munjal et al. (2020) show that DAL methods are much better than RS.
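To make the pool-based DAL loop concrete, the following is a minimal sketch. The names `probs_fn` and `label_fn` are hypothetical placeholders: the former stands in for retraining the task model on the current labeled set and returning class probabilities over the whole pool, the latter for querying the human oracle. Predictive entropy is used here purely as an illustrative uncertainty score; none of this corresponds to a specific method evaluated in the paper.

```python
import numpy as np

def entropy_scores(probs):
    """Predictive entropy per pool sample; higher means more uncertain."""
    eps = 1e-12
    return -np.sum(probs * np.log(probs + eps), axis=1)

def active_learning_loop(probs_fn, label_fn, batch_size, budget):
    """Pool-based DAL: score the pool, query the top-`batch_size` most
    uncertain samples, and repeat until `budget` labels are spent."""
    labeled = []
    while len(labeled) < budget:
        probs = probs_fn(labeled)            # (pool_size, n_classes) after "retraining"
        scores = entropy_scores(probs)
        scores[labeled] = -np.inf            # never re-query already-labeled data
        query = np.argsort(scores)[-batch_size:]
        for i in query:
            label_fn(int(i))                 # ask the oracle for this sample's label
        labeled.extend(int(i) for i in query)
    return labeled
```

In a real pipeline, `probs_fn` would retrain (or fine-tune) the task model each round, which is where the bulk of the computational cost lies.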
These inconsistencies motivate us to unify the experimental settings and conduct a thorough empirical study on the effectiveness of DAL techniques. Our contributions are summarized as follows:

• We re-implement and extensively evaluate 19 deep active learning methods on several popular image classification tasks, including MNIST, CIFAR-10, and GTSRB. To the best of our knowledge, our evaluation is the most comprehensive one to date: it covers not only most state-of-the-art SAL solutions but also various SSAL methods.

• Through extensive experiments, we conclude that SSAL techniques are preferred. Traditional SAL methods can hardly beat random selection, and no SAL method consistently outperforms the others. In contrast, SSAL methods easily outperform all the SAL methods by a large margin. More importantly, active sample selection plays an important role in SSAL, achieving significant and consistent performance improvements over random selection.

• We conduct an in-depth analysis of SSAL methods and provide two pieces of guidance to practitioners. First, one should conduct SSAL as early as possible. Second, one should seek more unlabeled data whenever possible to achieve better performance.

The rest of the paper is organized as follows. Section 2 presents the related work. Section 3 describes the experimental setup. Section 4 presents our empirical study on DAL performance. Further studies on SSAL methods are presented in Section 5. Section 6 concludes this paper.

2. RELATED WORKS

In this section, we introduce existing DAL methods and the empirical studies that evaluate them. As shown in Figure 1, existing DAL works can be roughly grouped into fully-supervised active learning (SAL) and semi-supervised active learning (SSAL), depending on whether they use unlabeled data to train the task model.

2.1. FULLY-SUPERVISED ACTIVE LEARNING (SAL)

We categorize the SAL strategies into three classes: uncertainty-based selection, diversity/representativeness-based selection, and hybrid selection, which combines the former two.
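The uncertainty-based class admits several common scoring rules. The sketch below shows three standard ones (least confidence, margin, and predictive entropy) over softmax outputs; these are generic textbook formulations given for illustration, not the exact scores used by any particular method surveyed here. All are written so that a higher score means a more informative (more uncertain) sample.

```python
import numpy as np

def least_confidence(probs):
    # Higher score = model is less sure about its top prediction.
    return 1.0 - probs.max(axis=1)

def margin(probs):
    # Small gap between the top two classes = ambiguous sample;
    # negate so that a higher score means more uncertain.
    part = np.sort(probs, axis=1)
    return -(part[:, -1] - part[:, -2])

def entropy(probs):
    # Full-distribution uncertainty; eps guards against log(0).
    eps = 1e-12
    return -np.sum(probs * np.log(probs + eps), axis=1)
```

All three rank a near-uniform prediction above a confident one; they differ in how much of the distribution they use (top-1, top-2, or all classes), which is why they can disagree on borderline samples.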



¹ Among the 19 investigated methods, 14 of them are fully-supervised ones.



Figure 1: The pool-based deep active learning process. In each iteration, we first train the task model. Based on the trained model and the available unlabeled data, we select a subset of the unlabeled data for labeling (marked as red circles or triangles). The process is iterated until a certain model accuracy is achieved or the labeling budget is used up. Existing methods mainly focus on the data selection strategy or the model training strategy.
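For the diversity/representativeness-based class, a representative selection rule is greedy k-center ("core-set") selection, which repeatedly queries the pool point farthest from the current labeled set so that queried samples cover the feature space. The sketch below is a simplification of the core-set idea of Sener & Savarese (2018): it operates on precomputed embeddings (e.g., penultimate-layer activations of the task model) and omits the robust mixed-integer refinement of the original method.

```python
import numpy as np

def kcenter_greedy(features, labeled_idx, n_query):
    """Greedy k-center selection over embedding vectors.
    `features`: (n, d) array of pool embeddings;
    `labeled_idx`: indices already labeled; returns `n_query` new indices."""
    # Distance from every pool point to its nearest labeled point.
    dists = np.min(
        np.linalg.norm(features[:, None] - features[labeled_idx][None], axis=2),
        axis=1)
    selected = []
    for _ in range(n_query):
        i = int(np.argmax(dists))            # farthest point = least covered
        selected.append(i)
        # The new point now also covers its neighborhood.
        dists = np.minimum(dists, np.linalg.norm(features - features[i], axis=1))
    return selected
```

Unlike the uncertainty scores, this rule never consults the model's predictions directly, which is exactly the distinction the taxonomy in §2.1 draws between the two classes.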

