PARETO OPTIMIZATION FOR ACTIVE LEARNING UNDER OUT-OF-DISTRIBUTION DATA SCENARIOS

Anonymous

Abstract

Pool-based Active Learning (AL) has achieved great success in minimizing labeling costs by sequentially selecting the most informative unlabeled samples from a large unlabeled data pool and querying their labels from oracles/annotators. However, existing AL sampling schemes might not work well under out-of-distribution (OOD) data scenarios, where the unlabeled data pool contains data samples that do not belong to the pre-defined categories of the target task. Achieving good AL performance under OOD data scenarios is challenging due to the natural conflict between AL sampling strategies and OOD sample detection: both highly informative in-distribution (ID) data and OOD data in the unlabeled pool may be assigned high informativeness scores (e.g., high entropy) during the AL process. In this paper, we propose a Monte-Carlo Pareto Optimization for Active Learning (POAL) sampling scheme, which selects optimal subsets of unlabeled samples with a fixed batch size from the unlabeled data pool. We cast the AL sampling task as a multi-objective optimization problem and utilize Pareto optimization based on two conflicting objectives: (1) the typical AL sampling score (e.g., maximum entropy) and (2) the confidence that a sample is not OOD. Experimental results show the effectiveness of POAL on classical Machine Learning (ML) and Deep Learning (DL) tasks.

1. INTRODUCTION

In real-life applications, huge amounts of unlabeled data are easily obtained, but labeling them is expensive (Shen et al., 2004). AL aims to solve this problem: it achieves greater accuracy with less training data by sequentially selecting the most informative instances and then querying their labels from oracles/annotators (Zhan et al., 2021b). Current AL methods have been tested on well-studied datasets (Kothawade et al., 2021) such as MNIST (Deng, 2012) and CIFAR10 (Krizhevsky et al., 2009). These datasets are simple and clean. However, in realistic scenarios, when collecting unlabeled data, unrelated data (i.e., out-of-domain data) might be mixed in with the task-related data, e.g., images of letters when the task is to classify images of digits (Du et al., 2021). Most AL methods are not robust to OOD data scenarios. For instance, Karamcheti et al. (2021) demonstrated empirically that collective outliers hurt AL performance on Visual Question Answering (VQA) tasks. Meanwhile, selecting and querying OOD samples that are invalid for the target model wastes labeling cost (Du et al., 2021) and makes the AL sampling process less effective. There is a natural conflict between AL and OOD data detection. Most AL methods, especially uncertainty-based measures, prefer data that are hardest for the current basic classifier to classify (e.g., data with high entropy of the predicted class probabilities). However, a basic learner (e.g., a neural network with softmax output) that performs poorly on ID data is also likely to produce non-informative predicted probabilities (i.e., close to uniform) on OOD data (Vaze et al., 2021). During the AL process, the basic learner is not well-trained due to insufficient labeled data, and, in the case of deep AL, insufficient training epochs. Therefore, the samples selected by AL may contain both highly informative ID samples and OOD samples.
For example, consider the Maximum Entropy (ENT) approach for AL, a classic uncertainty-based method (Lewis & Catlett, 1994; Shannon, 2001) that selects data whose predicted class probabilities have the largest entropy. Meanwhile, ENT is also a typical OOD detection criterion: high entropy of the predicted class distribution suggests that the input may be OOD (Ren et al., 2019). Fig. 1 (top) shows an example on the EX8 dataset (Ng, 2008) that further illustrates this conflict: a large percentage of the data with high entropy scores are OOD, and thus ENT-based AL will likely select OOD data for labeling. Additional measures are therefore needed to detect OOD samples so that they are not selected for AL; an example is shown in Fig. 1 (bottom).
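To make the conflict concrete, the following is a minimal sketch (not the paper's implementation) of ENT-style batch acquisition. The predicted probabilities are hypothetical; the point is that a genuinely uncertain ID sample and a near-uniform OOD sample receive almost identical entropy scores, so the entropy criterion alone cannot separate them.

```python
import numpy as np

def entropy_scores(probs):
    """Shannon entropy of each row of predicted class probabilities."""
    eps = 1e-12  # avoid log(0)
    return -np.sum(probs * np.log(probs + eps), axis=1)

def max_entropy_query(probs, batch_size):
    """Classic ENT acquisition: select the batch with the highest entropy."""
    scores = entropy_scores(probs)
    return np.argsort(-scores)[:batch_size]

# Hypothetical softmax outputs for three pool samples (3 classes):
probs = np.array([
    [0.98, 0.01, 0.01],   # confident ID sample  -> low entropy
    [0.40, 0.35, 0.25],   # uncertain ID sample  -> high entropy
    [0.34, 0.33, 0.33],   # OOD, near-uniform    -> highest entropy
])
print(max_entropy_query(probs, batch_size=2))  # -> [2 1]: the OOD sample is picked first
```

Because the OOD sample dominates the ranking, a second objective (confidence of being ID) is needed, which is exactly the trade-off POAL resolves via Pareto optimization.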

