LEARNING ACTIVE LEARNING IN THE BATCH-MODE SETUP WITH ENSEMBLES OF ACTIVE LEARNING AGENTS

Anonymous

Abstract

Supervised learning models perform best when trained on large amounts of data, but annotating training data is very costly in some domains. Active learning aims to choose only the most informative subset of unlabelled samples for annotation, thus saving annotation cost. Several heuristics for choosing this subset have been developed; they use fixed selection policies and are easy to understand and apply. However, no single heuristic performs optimally in all settings. This has led to the development of agents that learn the best selection policy from data. They formulate active learning as a Markov decision process and apply reinforcement learning (RL) methods to it. Their advantage is that they can use many features and adapt to the specific task. Our paper proposes a new approach combining the advantages of learning active learning and of heuristics: we propose to learn active learning using a parameterized ensemble of agents, where the parameters are learned using Monte Carlo policy search. As this approach can incorporate any active learning agent into its ensemble, it makes it possible to improve the performance of every active learning agent by learning how to combine it with others.

1. INTRODUCTION

Supervised machine learning systems perform best when trained on a large amount of training data. Obtaining this data through labelling can require substantial time and cost in some domains. Active learning in the selective scenario overcomes this bottleneck by selecting a subset of all unlabelled samples to be labelled such that the model trained on them learns as much as possible and achieves a high accuracy (Cohn et al., 1994). Heuristic active learning agents choose the samples to be labelled using a fixed policy. They have a known and predictable behaviour and their policy is easy to understand (Settles et al., 2008). However, they have two main disadvantages: first, they rarely combine different features; second, the best heuristic has been found to depend strongly on the dataset and the supervised learning model used (Lowell et al., 2018).

More recently, these shortcomings have been addressed by learning active learning directly from data (Konyushkova et al., 2017; 2018; Bachman et al., 2017; Fang et al., 2017; Liu et al., 2018b;a). The authors formulate active learning as a Markov decision process and apply reinforcement learning (RL) methods such as Q-learning and imitation learning to it. While this approach promises to overcome the disadvantages of heuristics, it introduces new problems: the credit assignment problem (Minsky, 1961), computationally very costly training (Amodei et al.), and results that are often not significant and hard to reproduce (Henderson et al., 2017).

Learning active learning with RL in the batch-mode setting has received little attention in the literature. One reason is that choosing a batch of samples instead of a single one makes the action space exponentially larger, so finding the action that maximizes a value function can no longer be done by iterating over all actions. Furthermore, it becomes harder to attribute the reward to a specific parameter of the policy.
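The selective (pool-based) loop described above can be sketched as follows. This is a minimal illustration under our own assumptions: the entropy scoring, the seed-set size, and the placeholder model interface are illustrative choices, not the setup of any specific cited work.

```python
import numpy as np

def entropy_scores(probs):
    """Predictive entropy per sample; higher means more uncertain."""
    return -np.sum(probs * np.log(probs + 1e-12), axis=1)

def active_learning_loop(X, y, fit_predict_proba, n_rounds=5, batch_size=2, seed=0):
    """Generic pool-based loop: repeatedly label the most uncertain batch.

    fit_predict_proba(X_lab, y_lab, X_pool) -> class probabilities for X_pool.
    """
    rng = np.random.default_rng(seed)
    labelled = list(rng.choice(len(X), size=2, replace=False))  # small seed set
    pool = [i for i in range(len(X)) if i not in labelled]
    for _ in range(n_rounds):
        probs = fit_predict_proba(X[labelled], y[labelled], X[pool])
        scores = entropy_scores(probs)
        # pick the batch_size most uncertain pool samples
        picked = [pool[i] for i in np.argsort(-scores)[:batch_size]]
        labelled += picked
        pool = [i for i in pool if i not in picked]
    return labelled
```

In the batch-mode setting discussed above, a greedy top-k choice like this sidesteps the exponential action space, at the cost of ignoring interactions between the samples within a batch.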
Our paper addresses these shortcomings by proposing an active learning agent that is a weighted ensemble of other agents. The weights are learned using Monte Carlo policy search and blackbox optimization. As several different agents can be used as part of the ensemble, many different features can be included. By learning the weight of each agent, the ensemble can adapt to the dataset, model and optimization metric. Nonetheless, the learning is very robust and global, as blackbox optimization over a small number of parameters is much easier than reinforcement learning. Furthermore, the policy is easy to understand and interpret. The ensemble should not be seen as an alternative to current approaches, but rather as an extension of them that learns how to combine several approaches to further increase their performance.

We evaluate our approach on active learning tasks from different domains and with random forests, CNNs and LSTMs as classifiers. The experiments show that the ensemble consistently performs at least as well as the best agent it includes and sometimes even outperforms it by a significant margin. Our main contributions are:

• We propose an approach to learning active learning using an ensemble of heuristics. It combines the advantages of heuristics and of approaches that learn active learning. The experiments show that the theoretical advantages also translate into strong empirical performance.

• We show experimentally that it is very important to train the ensemble on a task similar to the one it is evaluated on. This contradicts the assumption in earlier literature that a learning agent trained on a synthetic dataset works well in completely different domains.
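The core idea can be sketched in a few lines, under our own simplifying assumptions: each agent scores every pool sample, the ensemble combines the scores with learned weights, and the weights are found by blackbox random search over simulated episodes. The random search here is a simple stand-in for the full Monte Carlo policy search; the function names and hyperparameters are hypothetical.

```python
import numpy as np

def ensemble_select(agent_scores, weights, batch_size):
    """Combine per-agent scores (shape: n_agents x n_pool) with a
    weighted sum and return the indices of the top-scoring batch."""
    combined = weights @ agent_scores          # shape: (n_pool,)
    return np.argsort(-combined)[:batch_size]

def random_search_weights(evaluate, n_agents, n_samples=50, seed=0):
    """Blackbox optimization of the ensemble weights: sample candidate
    weight vectors on the simplex and keep the one with the best
    episode return reported by `evaluate`."""
    rng = np.random.default_rng(seed)
    best_w, best_r = None, -np.inf
    for _ in range(n_samples):
        w = rng.dirichlet(np.ones(n_agents))   # convex weights over agents
        r = evaluate(w)                        # e.g. final test accuracy of an episode
        if r > best_r:
            best_w, best_r = w, r
    return best_w, best_r
```

Because only a handful of weights are optimized, and each candidate is evaluated on whole episodes, the search is global and avoids the credit assignment problem mentioned above.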

2.1. ACTIVE LEARNING HEURISTICS

Heuristic frameworks are active learning frameworks relying on engineered, fixed policies. Their performance depends highly on the active learning task, with no heuristic able to outperform the others in all cases. There are three core ideas about which kinds of samples should be chosen for labelling. They can be clearly distinguished, as they rely on disjoint sets of features:

Informativeness sampling: One group of heuristics prefers informative samples, which can be expected to change the supervised learning model substantially when added to the labelled set. Their features for a sample can be computed from the current supervised learning model and the sample itself alone. The most popular heuristic in this group is uncertainty sampling (Lewis & Gale, 1994; Scheffer et al., 2001; Shannon, 1948), but many others exist.

Diversity sampling: This approach chooses samples that are dissimilar to already labelled samples. Its features are similarity metrics from a sample to the current labelled set. It is nearly always combined with uncertainty-based sampling.

Representative sampling: This approach, also called density-based sampling, chooses samples that are representative of the whole dataset; in particular, outliers should not be chosen. Like diversity sampling, it is usually combined with uncertainty sampling.

The combination of features or heuristics from these three disjoint groups is used by many researchers to increase the performance of active learning agents (Wang et al., 2017; Sener & Savarese, 2017; Zhu et al., 2009). There are many ways to combine heuristics: for example, one can first choose a subset of candidate points using one heuristic and then choose the final points to be labelled using another, or one can directly define a new heuristic combining two existing ones.
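The three feature families can each be illustrated with one common scoring function. The concrete choices below (predictive entropy, minimum Euclidean distance, mean RBF similarity) are standard instantiations picked for illustration, not the definitions used by any particular framework.

```python
import numpy as np

def informativeness(probs):
    """Uncertainty sampling via predictive entropy: depends only on the
    model's class probabilities for each pool sample."""
    return -np.sum(probs * np.log(probs + 1e-12), axis=1)

def diversity(X_pool, X_labelled):
    """Distance from each pool sample to its nearest labelled sample;
    large values mean the sample is unlike anything labelled so far."""
    d = np.linalg.norm(X_pool[:, None, :] - X_labelled[None, :, :], axis=2)
    return d.min(axis=1)

def representativeness(X_pool, gamma=1.0):
    """Mean RBF similarity of each pool sample to the rest of the pool;
    outliers receive low scores."""
    d2 = np.sum((X_pool[:, None, :] - X_pool[None, :, :]) ** 2, axis=2)
    return np.exp(-gamma * d2).mean(axis=1)
```

Note how the three functions take disjoint inputs (model probabilities, labelled set, unlabelled pool), which is exactly why the groups can be cleanly distinguished.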
While our approach with a learned ensemble is, to our knowledge, the first to learn the best combination of heuristics, the current literature contains many ideas that could improve the performance of the proposed ensemble even further: Wang et al. (2017) have added a diversity constraint to uncertainty sampling and use a quadratic programming approach to enforce it. Sener & Savarese (2017) have described active learning as a core-set selection problem; they therefore try to choose samples that are dissimilar both to each other and to the labelled set, and representative of the unlabelled set. Both approaches propose advanced non-greedy methods for choosing a batch of samples; a greedy choice is only a (1 - 1/e)-approximation algorithm, as already pointed out by Kirsch et al. (2019). Adapting our
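As one concrete batch-selection criterion in the core-set spirit, consider a k-center-greedy sketch: each new sample is the pool point farthest from all labelled and already-selected points, so the chosen batch is dissimilar both to the labelled set and within itself. The distance metric and interface here are our own assumptions, not the exact algorithm of the cited works.

```python
import numpy as np

def k_center_greedy(X_pool, X_labelled, batch_size):
    """Pick batch_size pool points that greedily cover the pool:
    at each step take the point whose distance to the nearest
    already-chosen (or labelled) point is largest."""
    # distance of every pool point to its closest labelled point
    dists = np.min(
        np.linalg.norm(X_pool[:, None, :] - X_labelled[None, :, :], axis=2), axis=1
    )
    picked = []
    for _ in range(batch_size):
        i = int(np.argmax(dists))
        picked.append(i)
        # the new center now also covers nearby pool points
        dists = np.minimum(dists, np.linalg.norm(X_pool - X_pool[i], axis=1))
    return picked
```

Unlike scoring each sample independently and taking the top-k, the coverage distances are updated after every pick, so near-duplicates of an already-chosen point are not selected again.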

