PROBABILISTIC META-LEARNING FOR BAYESIAN OPTIMIZATION

Abstract

Transfer and meta-learning algorithms leverage evaluations on related tasks in order to significantly speed up learning or optimization on a new problem. For applications that depend on uncertainty estimates, e.g., in Bayesian optimization, recent probabilistic approaches have shown good performance at test time, but either scale poorly with the number of data points or under-perform with little data on the test task. In this paper, we propose a novel approach to probabilistic transfer learning that uses a generative model for the underlying data distribution and simultaneously learns a latent feature distribution to represent unknown task properties. To enable fast and accurate inference at test time, we introduce a novel meta-loss that structures the latent space to match the prior used for inference. Together, these contributions ensure that our probabilistic model exhibits high sample efficiency and provides well-calibrated uncertainty estimates. We evaluate the proposed approach and compare its performance to probabilistic models from the literature on a set of Bayesian optimization transfer-learning tasks.

1. INTRODUCTION

Bayesian optimization (BO) is arguably one of the most proven and widely used black-box optimization frameworks for expensive functions (Shahriari et al., 2015), with applications that include materials design (Frazier & Wang, 2016), reinforcement learning (Metzen et al., 2015), and automated machine learning (ML) (Hutter et al., 2019). In practical applications, BO is repeatedly used to solve variations of similar tasks. In these cases, sample efficiency can be further increased by not starting the optimization from scratch, but rather leveraging previous runs to inform and accelerate the latest one. Several approaches to this have emerged under the names of transfer learning (Weiss et al., 2016) and meta-learning (Vanschoren, 2018). Compared to early work by Swersky et al. (2013) and Golovin et al. (2017), recent publications leverage the representational flexibility of neural networks, which allows for more powerful models and impressive results (Gordon et al., 2019; Rusu et al., 2019; Garnelo et al., 2018a;b; Zintgraf et al., 2019). Despite these significant advances, only a small subset of algorithms offers the well-calibrated uncertainty estimates on which BO relies to guide its sampling strategy efficiently. Additionally, BO benefits greatly from a meaningful prior over tasks that quickly converges to the true function, as this yields the highest sample efficiency. Existing work mostly focuses on deterministic models and, for those providing uncertainty estimates, sample efficiency at test time is often a challenge.
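For readers unfamiliar with the mechanics, the BO loop described above can be sketched in a few lines. The quadratic objective, the crude kernel-weighted surrogate, and all names below are illustrative assumptions, not the model proposed in this paper; a practical implementation would use a Gaussian process surrogate, whose posterior variance supplies the calibrated uncertainty that drives the acquisition function.

```python
import math
import random

def objective(x):
    # Hypothetical expensive black-box function (illustration only).
    return -(x - 0.3) ** 2

def surrogate(x, X, Y, length_scale=0.2):
    # Toy stand-in for a GP posterior: the mean is a kernel-weighted
    # average of observed values, and the uncertainty proxy grows with
    # the distance to the nearest observation.
    weights = [math.exp(-((x - xi) / length_scale) ** 2) for xi in X]
    mean = sum(w * y for w, y in zip(weights, Y)) / sum(weights)
    std = math.sqrt(max(1e-12, 1.0 - max(weights)))
    return mean, std

def ucb(x, X, Y, beta=2.0):
    # Upper confidence bound: trades off exploitation (high mean)
    # against exploration (high uncertainty).
    mean, std = surrogate(x, X, Y)
    return mean + beta * std

def bayes_opt(n_init=3, n_iter=10, seed=0):
    rng = random.Random(seed)
    X = [rng.random() for _ in range(n_init)]      # initial design
    Y = [objective(x) for x in X]
    candidates = [i / 200 for i in range(201)]     # discretized domain [0, 1]
    for _ in range(n_iter):
        # Query the point that maximizes the acquisition function,
        # then update the surrogate's data set with the new evaluation.
        x_next = max(candidates, key=lambda x: ucb(x, X, Y))
        X.append(x_next)
        Y.append(objective(x_next))
    y_best, x_best = max(zip(Y, X))
    return x_best, y_best

x_best, y_best = bayes_opt()
```

The sketch highlights why calibrated uncertainty matters: if the surrogate's `std` is overconfident, the acquisition function stops exploring and the loop can stall far from the optimum, which is precisely the failure mode that well-calibrated probabilistic meta-models aim to avoid.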

Contributions

We set out to close this gap and introduce BAyesian optimization with Neural Networks and Embedding Reasoning (BaNNER), a flexible meta-learning method for BO. We go beyond the previous work of Perrone et al. (2018) and introduce a generative regression model explicitly conditioned on a low-dimensional latent representation of the tasks. This allows our model to (i) encode a meaningful prior over tasks and (ii) remain highly sample-efficient, since each new task only requires inference over a low-dimensional latent representation. To ensure robust training of our model, we introduce a novel loss function to regularize the latent distribution and optimize our model's hyper-parameters using the available meta-data. We evaluate BaNNER on a set of synthetic benchmarks and two meta-learning problems and compare it with the state of the art in the literature.

