MALIBO: META-LEARNING FOR LIKELIHOOD-FREE BAYESIAN OPTIMIZATION

Abstract

Bayesian Optimization (BO) is a popular method for optimizing expensive black-box functions. While BO typically optimizes only a single task, recent methods exploit knowledge from related tasks to warm-start BO and improve data efficiency. However, these methods are either not scalable or sensitive to heterogeneous value scales across tasks. We propose a novel approach that addresses these problems by combining meta-learning with a likelihood-free acquisition function. Specifically, our meta-learning model simultaneously learns the underlying (task-agnostic) data distribution and a latent feature representation for individual tasks, which serves as the acquisition function inside BO. The likelihood-free approach makes less stringent assumptions about the problem than regression-based methods and works with any classification algorithm, making it computationally efficient and robust to different scales across tasks. Finally, we use gradient boosting as a residual model on top to adapt to distribution drifts between new and prior tasks, which might otherwise weaken the usefulness of the meta-learned features. Experiments show that the meta-model learns an effective prior for warm-starting optimization algorithms, is cheap to evaluate, and is invariant under changes of scale across datasets.

1. INTRODUCTION

Bayesian Optimization (BO) is a widely used method to optimize expensive black-box functions (Shahriari et al., 2016) and has been successfully applied in different fields, including automated machine learning (ML) (Hutter et al., 2019). Given small amounts of data, traditional BO uses a Gaussian Process (GP) surrogate model together with an acquisition function to quickly optimize a black-box function. However, most BO techniques start from scratch for each new optimization problem instead of leveraging information from previous runs on similar tasks to further improve data efficiency. To warm-start BO, exploiting additional task information has been explored in the context of transfer learning (Weiss et al., 2016) and meta-learning (Vanschoren, 2018). Prior knowledge can be used to build informed surrogate models (Schilling et al., 2016; Wistuba et al., 2018; Feurer et al., 2018b; Perrone et al., 2018), to restrict the search space (Perrone et al., 2019), or to warm-start the optimization with configurations that generally score well (Feurer et al., 2014; Salinas et al., 2020). However, these approaches suffer from three important issues: (i) GPs scale poorly due to their cubic computational complexity (Rasmussen, 2004); (ii) the standard BO framework requires a surrogate model with well-calibrated and tractable predictive uncertainty, which is challenging in high-dimensional problems (Tiao et al., 2021; Song et al., 2022); and (iii) regression models, including GPs, struggle with different value scales and noise levels across tasks, which hurts warm-starting and optimization efficiency (Feurer et al., 2018a). We propose a new meta-learning BO approach that effectively transfers knowledge from related tasks and scales to large amounts of data.
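To make the standard setting concrete, the following is a minimal sketch of the traditional GP-based BO loop described above, using a toy 1-D objective and Expected Improvement as the acquisition function. All names and settings here (the objective `f`, kernel choice, grid size, iteration budget) are illustrative assumptions, not part of the paper's method:

```python
# Minimal GP-based BO loop (illustrative sketch, NOT the paper's method):
# fit a GP surrogate, maximize Expected Improvement over a candidate grid,
# evaluate the black-box function, and repeat.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def f(x):
    # Toy black-box objective (minimization); stands in for an expensive evaluation.
    return np.sin(3 * x) + 0.5 * x ** 2

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(4, 1))            # small initial design
y = f(X).ravel()
cand = np.linspace(-2, 2, 501).reshape(-1, 1)  # candidate grid for the acquisition

for _ in range(10):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5),
                                  alpha=1e-6, normalize_y=True)
    gp.fit(X, y)                               # cubic cost in the number of points
    mu, sigma = gp.predict(cand, return_std=True)
    best = y.min()
    z = (best - mu) / np.maximum(sigma, 1e-9)
    ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)  # Expected Improvement
    x_next = cand[np.argmax(ei)]               # next point to evaluate
    X = np.vstack([X, x_next])
    y = np.append(y, f(x_next))
```

Note that every iteration refits the GP on all observations, which is the cubic-complexity bottleneck (issue (i) above) that motivates replacing the surrogate altogether.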
Our method is inspired by the idea of likelihood-free BO (Bergstra et al., 2011; Tiao et al., 2021; Song et al., 2022), which replaces the surrogate model with a meta-learned classifier that directly balances exploration and exploitation without explicitly modeling the objective function.
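The core likelihood-free idea can be sketched as follows, in the style of BORE/LFBO (Tiao et al., 2021; Song et al., 2022) rather than MALIBO itself: observations are labeled "good" if their value falls below the γ-quantile, a classifier is trained on these labels, and its predicted class probability is used directly as the acquisition function. The function and variable names below are illustrative assumptions:

```python
# Likelihood-free acquisition via classification (hedged sketch of the
# BORE/LFBO-style idea, not MALIBO's actual implementation): the classifier's
# probability of the "good" class plays the role of the acquisition function.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def lfbo_acquisition(X, y, candidates, gamma=0.25):
    """Score candidates by the probability of beating the gamma-quantile (minimization)."""
    tau = np.quantile(y, gamma)              # threshold splitting good/bad observations
    labels = (y <= tau).astype(int)          # 1 = "good" point
    clf = GradientBoostingClassifier().fit(X, labels)
    return clf.predict_proba(candidates)[:, 1]  # acquisition = P(good | x)

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(30, 1))
y = (X.ravel() - 0.5) ** 2                   # toy objective, minimum at x = 0.5
cand = np.linspace(-2, 2, 401).reshape(-1, 1)
acq = lfbo_acquisition(X, y, cand)
x_next = cand[np.argmax(acq)]                # candidate most likely to be "good"
```

Because only a classifier is fit, any scalable classification algorithm can be plugged in, and the γ-quantile labeling depends only on the ranking of the observed values, which is what makes the approach invariant to the value scale of each task.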

