LEARNING ACTIVE LEARNING IN THE BATCH-MODE SETUP WITH ENSEMBLES OF ACTIVE LEARNING AGENTS Anonymous

Abstract

Supervised learning models perform best when trained on large amounts of data, but annotating training data is very costly in some domains. Active learning aims to select only the most informative subset of unlabelled samples for annotation, thus reducing annotation cost. Several heuristics for choosing this subset have been developed; they select samples according to fixed policies and are easy to understand and apply. However, no single heuristic performs optimally in all settings. This has led to the development of agents that learn the best selection policy from data: they formulate active learning as a Markov decision process and apply reinforcement learning (RL) methods to it. Their advantage is that they can use many features and adapt to the specific task.



Our paper proposes a new approach that combines these advantages of learned active learning and heuristics: we propose to learn active learning using a parameterized ensemble of agents, whose parameters are learned with Monte Carlo policy search. Since this approach can incorporate any active learning agent into its ensemble, it can improve the performance of every active learning agent by learning how to combine it with others.

1. INTRODUCTION

Supervised machine learning systems perform best when trained on a large amount of training data. In some domains, obtaining this data through labelling requires considerable time and cost. Active learning in the selective scenario overcomes this bottleneck by selecting the subset of all unlabelled samples whose labels let the trained model learn as much as possible and achieve a high accuracy (Cohn et al., 1994). Heuristic active learning agents choose the samples to be labelled using a fixed policy. Their behaviour is known and predictable, and their policy is easy to understand (Settles et al., 2008). However, they have two main disadvantages: first, they rarely combine different features; second, the best heuristic has been found to depend strongly on the dataset and the supervised learning model used (Lowell et al., 2018).

More recently, these shortcomings have been addressed by learning active learning directly from data (Konyushkova et al., 2017; 2018; Bachman et al., 2017; Fang et al., 2017; Liu et al., 2018b; a). The authors formulate active learning as a Markov decision process and apply reinforcement learning (RL) methods such as Q-learning and imitation learning to it. While this approach promises to overcome the disadvantages of heuristics, it introduces new problems: the credit assignment problem (Minsky, 1961), training that is computationally very costly (Amodei et al.), and results that are often not significant and hard to reproduce (Henderson et al., 2017).

Learning active learning with RL in the batch-mode setting has received little attention in the literature. One reason is that choosing a batch of samples instead of a single one makes the action space exponentially larger, so finding the action that maximizes a value function can no longer be done by iterating over all actions. Furthermore, it becomes harder to attribute the reward to a specific parameter of the policy.
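To make the notion of a fixed-policy heuristic concrete, the following sketch implements uncertainty sampling, one of the classic heuristics of this kind: it selects the pool samples whose most probable class has the lowest predicted probability. The function and variable names are illustrative assumptions, not notation from this paper.

```python
import numpy as np

def uncertainty_sampling(predict_proba, X_pool, batch_size=1):
    """Return the indices of the least-confident samples in X_pool.

    predict_proba: callable returning class probabilities of shape
    (n_samples, n_classes), as e.g. scikit-learn classifiers provide.
    """
    probas = predict_proba(X_pool)
    confidence = probas.max(axis=1)             # top-class probability per sample
    return np.argsort(confidence)[:batch_size]  # least confident samples first

# Toy example: three pool samples, two classes.
toy_probas = np.array([[0.9, 0.1],
                       [0.5, 0.5],
                       [0.7, 0.3]])
selected = uncertainty_sampling(lambda X: toy_probas, X_pool=None, batch_size=2)
# selected -> indices [1, 2]: the two samples closest to the decision boundary
```

Note that the policy is fixed: the same confidence ranking is applied regardless of dataset or model, which is exactly the rigidity that learned selection policies aim to overcome.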

