AN EMPIRICAL STUDY ON THE EFFICACY OF DEEP ACTIVE LEARNING TECHNIQUES

Anonymous authors
Paper under double-blind review

Abstract

Deep Active Learning (DAL) has been advocated as a promising method to reduce labeling costs in supervised learning. However, existing evaluations of DAL methods are based on different settings, and their reported results often conflict. To tackle this issue, this paper comprehensively evaluates 19 existing DAL methods in a uniform setting, covering both traditional fully-supervised active learning (SAL) strategies and emerging semi-supervised active learning (SSAL) techniques. We have several non-trivial findings. First, most SAL methods cannot achieve higher accuracy than random selection. Second, semi-supervised training brings significant performance improvement compared to pure SAL methods. Third, performing data selection in the SSAL setting achieves a significant and consistent performance improvement, especially with abundant unlabeled data. Our findings yield the following guidance for practitioners: for better model performance, one should (i) apply SSAL as early as possible and (ii) collect more unlabeled data whenever possible. We will release our code upon acceptance.

1. INTRODUCTION

Training a well-performing Deep Neural Network (DNN) generally requires a substantial amount of labeled data. However, data collection and labeling can be quite costly, especially for tasks that require expert knowledge, e.g., medical image analysis (Hoi et al.) and malware detection (Nissim et al., 2014). Deep Active Learning (DAL) has long been advocated to mitigate this issue by proactively selecting and labeling the most informative training samples. That is, given a pool of unlabeled data, DAL iteratively performs data selection and training until the given labeling budget is reached, as shown in Figure 1.

Various DAL techniques have been proposed in the literature. Most of them are fully supervised (SAL) and aim for a better data selection strategy¹. SAL strategies can be roughly grouped into three categories: 1) model-based selection; 2) data-distribution-based selection; and 3) hybrid selection. Model-based selection prefers annotating data that the task model is most uncertain about (Gal et al., 2017; Beluch et al., 2018). Data-distribution-based methods select data according to their density or diversity (Sener & Savarese, 2018; Sinha et al., 2019). Hybrid methods consider both task-model information and the data distribution when selecting (Ash et al., 2020).

By applying pseudo-labels (Arazo et al., 2019) to the unlabeled data or consistency regularization (Berthelot et al., 2019), semi-supervised learning (SSL) can improve model performance substantially. Consequently, it is attractive to apply active learning on top of SSL techniques, referred to as semi-supervised active learning (SSAL). Song et al. (2019) incorporate the well-known SSL method MixMatch (Berthelot et al., 2019) during training. Gao et al. augment the unlabeled samples and enforce the model to make consistent predictions on the unlabeled samples and their corresponding augmentations. Gao et al. also develop a data selection strategy tailored to the SSL-based method, i.e., selecting samples with inconsistent predictions. In addition, WAAL formulates DAL as a distribution-matching problem and trains the task model with an additional loss evaluated from the distributional difference between labeled and unlabeled data (Shui et al., 2020).

Despite the effectiveness of existing methods, the results reported in previous works often contradict each other. For example, CoreSet (Sener & Savarese, 2018) and the DBAL method (Gal et al., 2017) are
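The pool-based DAL loop described above (iterate data selection and training until the labeling budget is exhausted) can be sketched as follows. This is an illustrative outline only, not any specific method evaluated in this paper: the nearest-centroid `predict_proba` model is a hypothetical stand-in for the task DNN, and `entropy_select` is one simple instance of model-based (uncertainty) selection.

```python
import numpy as np

def predict_proba(X_labeled, y_labeled, X, n_classes, tau=1.0):
    """Toy probabilistic model: softmax over negative distances to class
    centroids. A stand-in for the task DNN; any model exposing class
    probabilities can be plugged in here."""
    centroids = np.stack(
        [X_labeled[y_labeled == c].mean(axis=0) for c in range(n_classes)]
    )
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
    logits = -d / tau
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def entropy_select(probs, k):
    """Model-based selection: pick the k pool samples whose predictive
    distribution has the highest entropy (most uncertain)."""
    ent = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    return np.argsort(-ent)[:k]

def active_learning_loop(X, y, init_idx, budget, batch_size, n_classes):
    """Generic pool-based DAL loop: (re)fit on labeled data, score the
    unlabeled pool, query the oracle for the selected batch, repeat."""
    labeled = set(init_idx.tolist())
    while len(labeled) < budget:
        pool = np.array(sorted(set(range(len(X))) - labeled))
        L = np.array(sorted(labeled))
        probs = predict_proba(X[L], y[L], X[pool], n_classes)
        k = min(batch_size, budget - len(labeled))
        picked = pool[entropy_select(probs, k)]
        labeled.update(picked.tolist())  # "query the oracle" for labels
    return np.array(sorted(labeled))

# Usage on synthetic two-class Gaussian data; the seed labeled set must
# contain every class so the toy model's centroids are defined.
rng = np.random.default_rng(0)
n = 200
X = np.concatenate([rng.normal(-1.0, 1.0, (n, 2)), rng.normal(1.0, 1.0, (n, 2))])
y = np.array([0] * n + [1] * n)
init = np.array([0, 1, n, n + 1])
final = active_learning_loop(X, y, init, budget=40, batch_size=10, n_classes=2)
```

Swapping `entropy_select` for a random choice over the pool reproduces the random-selection baseline that, per our findings, most SAL strategies fail to beat.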
¹ Among the 19 investigated methods, 14 are fully supervised.

