LEARNING TO ACTIVELY LEARN: A ROBUST APPROACH

Abstract

This work proposes a procedure for designing algorithms for specific adaptive data collection tasks like active learning and pure-exploration multi-armed bandits. Unlike traditional adaptive algorithms, which rely on concentration of measure and careful analysis to justify their correctness and sample complexity, our adaptive algorithm is learned via adversarial training over equivalence classes of problems derived from information-theoretic lower bounds. In particular, a single adaptive learning algorithm is learned that competes with the best adaptive algorithm learned for each equivalence class. Our procedure takes as input just the available queries, the set of hypotheses, the loss function, and the total query budget. This is in contrast to existing meta-learning work that learns an adaptive algorithm relative to an explicit, user-defined subset or prior distribution over problems, which can be challenging to define and may be mismatched to the instance encountered at test time. This work is particularly focused on the regime in which the total query budget is very small, such as a few dozen, which is much smaller than the budgets typically considered by theoretically derived algorithms. We perform synthetic experiments to justify the stability and effectiveness of the training procedure, and then evaluate the method on tasks derived from real data, including a noisy 20 Questions game and a joke recommendation task.

1. INTRODUCTION

Closed-loop learning algorithms use previous observations to inform what measurements to take next in order to accomplish inference tasks far faster than any fixed measurement plan set in advance. For example, active learning algorithms for binary classification have been proposed that, under favorable conditions, require exponentially fewer labels than passive, random sampling to identify the optimal classifier (Hanneke et al., 2014). And in the multi-armed bandits literature, adaptive sampling techniques have demonstrated the ability to identify the "best arm" that optimizes some metric with far fewer experiments than a fixed design (Garivier & Kaufmann, 2016; Fiez et al., 2019). Unfortunately, such guarantees often either require simplifying assumptions that limit robustness and applicability, or appeal to concentration inequalities that are very loose unless the number of samples is very large (e.g., web-scale). The aim of this work is a framework that achieves the best of both worlds: algorithms that learn through simulated experience to be as effective as possible with a tiny measurement budget (e.g., 20 queries), while remaining robust due to adversarial training. Our work fits into a recent trend sometimes referred to as learning to actively learn (Konyushkova et al., 2017; Bachman et al., 2017; Fang et al., 2017; Boutilier et al., 2020; Kveton et al., 2020), which tunes existing algorithms or learns entirely new active learning algorithms via policy optimization. Previous works in this area learn a policy by optimizing with respect to data observed through prior experience (e.g., meta-learning or transfer learning) or an assumed explicit prior distribution over problem parameters (e.g., the true weight vector for linear regression). In contrast, our approach makes no assumptions about which parameters are likely to be encountered at test time, and therefore produces algorithms that do not suffer from a potential mismatch of priors.
Instead, our method learns a policy that attempts to mirror the guarantees of frequentist algorithms with instance-dependent sample complexities: if the problem is hard you will suffer a large loss; if it is easy, you will suffer little.
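To make the adversarial-training idea concrete, the following is a minimal, self-contained sketch, not the paper's actual procedure: a toy adversary repeatedly selects the problem instance the current policy handles worst within a candidate class, and the learner takes a gradient step against that instance. The `simulate_loss` stand-in (a simple squared error in place of actually running an adaptive policy on a problem and measuring its final loss) and all names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_loss(policy, instance):
    # Hypothetical stand-in for rolling out the adaptive policy on a
    # problem instance and measuring the loss it incurs.
    return float(np.sum((policy - instance) ** 2))

def adversarial_train(instances, dim=3, steps=200, lr=0.1):
    """Min-max training sketch: at each step the adversary picks the
    instance in the class that the current policy handles worst, and
    the learner descends the loss on that worst-case instance."""
    policy = rng.normal(size=dim)
    for _ in range(steps):
        # Adversary: choose the hardest instance for the current policy.
        losses = [simulate_loss(policy, inst) for inst in instances]
        hard = instances[int(np.argmax(losses))]
        # Learner: gradient step on the worst-case (squared) loss.
        policy -= lr * 2.0 * (policy - hard)
    return policy

# With two toy "instances", the min-max solution is roughly equidistant
# from both, i.e., near their midpoint.
policy = adversarial_train([np.zeros(3), np.full(3, 2.0)])
```

In this toy setting the iterates settle near the midpoint of the two instances, the point minimizing the worst-case loss, which illustrates the sense in which adversarial training hedges against the hardest member of an equivalence class rather than a prior-weighted average.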

