POISSON PROCESS FOR BAYESIAN OPTIMIZATION

Abstract

Bayesian Optimization (BO) is a sample-efficient, model-based method for optimizing black-box functions that can be expensive to evaluate. Traditionally, BO fits a probabilistic surrogate model, such as the Tree-structured Parzen Estimator (TPE), Sequential Model-based Algorithm Configuration (SMAC), or a Gaussian process (GP), to the exact observed values. However, compared to value responses, relative rankings are harder to disrupt with noise, which makes them more robust. Rankings are also more practical when exact value responses are intractable but information about candidate preferences can still be acquired. This work introduces an efficient BO framework, Poisson Process Bayesian Optimization (PoPBO), consisting of a novel ranking-based response surface based on the Poisson process and two acquisition functions tailored to the proposed surrogate model. We show empirically that PoPBO improves efficacy and efficiency on both simulated and real-world benchmarks, including HPO and NAS.

1. INTRODUCTION

Bayesian optimization (BO) (Mockus et al., 1978) is a popular black-box optimization paradigm and has achieved great success in a number of challenging fields, such as robotic control (Calandra et al., 2016), biology (González et al., 2015), and hyperparameter tuning for complex learning tasks (Bergstra et al., 2011). A standard BO routine usually consists of two steps: (1) learning a probabilistic response surface that captures the distribution of an unknown function f(x); (2) optimizing an acquisition function that suggests the most valuable points for the next query iteration. Popular response surfaces for the first step include Random Forest (SMAC) (Hutter et al., 2011), the Tree-structured Parzen Estimator (TPE) (Bergstra et al., 2011), the Gaussian Process (GP) (Snoek et al., 2012), and Bayesian Neural Networks (BNN) (Springenberg et al., 2016; Snoek et al., 2015). Acquisition functions for the second step include Expected Improvement (EI) (Mockus, 1994), Thompson Sampling (TS) (Chapelle & Li, 2011; Agrawal & Goyal, 2013), and the Upper/Lower Confidence Bound (UCB/LCB) (Srinivas et al., 2012), which are designed to trade off exploration and exploitation. Most existing BO methods (Bergstra et al., 2011; Hutter et al., 2011; Snoek et al., 2012) adopt absolute response surfaces [1] that attempt to fit the black-box function based on the observed absolute function values. However, such an absolute metric has the following disadvantages. 1) Absolute responses can be difficult to obtain or even unavailable in some practical scenarios, such as sports games and recommender systems, where only relative evaluations [2] can be provided by pairwise comparison (He et al., 2022). 2) Absolute responses can be sensitive to noise, as also pointed out by Rosset et al. (2005). This issue degrades the performance of BO in real-world scenarios, where absolute responses are usually noisy. 3) Absolute response surfaces can be challenging to transfer directly.
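The two-step routine described above can be sketched in a few lines of code. The following is a minimal illustrative loop, not the paper's method: it replaces a real GP with a simple kernel-weighted surrogate and uses an LCB-style acquisition on a 1-D toy objective. All function and parameter names here are hypothetical.

```python
import math
import random

def f(x):
    # black-box objective (1-D toy function, minimum at x = 0.3)
    return (x - 0.3) ** 2

def surrogate(x, X, Y, length=0.2):
    """Toy kernel-weighted mean/uncertainty estimate (stand-in for a GP)."""
    w = [math.exp(-((x - xi) / length) ** 2) for xi in X]
    s = sum(w)
    mu = sum(wi * yi for wi, yi in zip(w, Y)) / s
    sigma = 1.0 / (1.0 + s)  # uncertainty shrinks near observed points
    return mu, sigma

def lcb(x, X, Y, kappa=2.0):
    # Lower Confidence Bound: prefer low predicted mean or high uncertainty
    mu, sigma = surrogate(x, X, Y)
    return mu - kappa * sigma

random.seed(0)
X = [random.random() for _ in range(3)]   # initial design
Y = [f(x) for x in X]
grid = [i / 200 for i in range(201)]      # candidate pool
for _ in range(20):                        # BO iterations
    x_next = min(grid, key=lambda x: lcb(x, X, Y))  # step 2: acquisition
    X.append(x_next)                                 # query the black box
    Y.append(f(x_next))                              # step 1: update surrogate data
best = min(Y)
print(f"best observed value: {best:.4f}")
```

Swapping the surrogate for a ranking-based model (as PoPBO does) leaves this outer loop unchanged; only the response surface and acquisition change.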
In particular, multi-fidelity metrics usually yield different absolute responses for the same candidate, making it hard to use historical observations on a coarse-fidelity metric to warm up the training of surrogate models on a fine-grained one. Similarly, in hyperparameter optimization (HPO) and neural architecture search (NAS) tasks, the performance of the same hyperparameter configuration or neural architecture differs across datasets and is hard to transfer between them. Relative metrics can be an effective cure for the above issues. 1) Relative responses such as rankings are more practical when information about candidate preferences can be acquired more easily than raw values (González et al., 2017), a setting widely used in many prior works (Kahneman & Tversky, 2013; Brusilovsky et al., 2007; González et al., 2017). 2) Relative responses are more robust to noise than absolute responses, since relations such as rankings between candidates are harder to disrupt with noise, whereas absolute values are sensitive to it. In this work, we analyze the robustness of rankings in Sec. 3.1 under the common additive Gaussian noise assumption, showing that rankings are less sensitive to noise than absolute values. Similar conclusions about the advantage of ranking models have been drawn in other areas, e.g., by Rosset et al. (2005). 3) Relative responses such as rankings between candidates have better transferability, since they are usually comparable across multi-fidelity metrics or across evaluations of the same candidate on different datasets, as also demonstrated by Salinas et al. (2020); Nguyen et al. (2021); Feurer et al. (2018). Some Bayesian optimization methods also adopt relative responses and are related to our work. Preferential BO methods (González et al., 2017; Mikkola et al., 2020) attempt to capture relative preferences by comparing pairs of candidates.
However, they either rely on a computationally expensive soft-Copeland score (PBO) or must optimize EI via projective preferential queries (PPBO) to propose the next query (optimal candidate). Moreover, they ignore ties, which commonly arise in real scenarios. Nguyen et al. (2021) extend the above methods by comparing k samples. Specifically, they use a Gaussian process to model the absolute function values and leverage a multinomial logit model to build the evidence likelihood of the local ranking of k observations. Although this method overcomes the computational disadvantage and takes ties into account, it essentially models the absolute response and only captures the relationship (local ranking) among k candidates. In contrast to the above methods, we propose to capture the global ranking of each candidate in a feasible domain (search space) and model the relative response directly. On the one hand, we can search the optimum based on our relative response surface and obtain the next query without computationally expensive procedures (González et al., 2017; Mikkola et al., 2020). On the other hand, unlike Nguyen et al. (2021), who first build an absolute response surface and then derive the local ranking among k candidates as evidence, our method directly fits a ranking-based relative response surface. Moreover, due to the nature of ranking, our method can handle ties, where candidates share the same ranking. Specifically, we adopt the Poisson process (PP) to capture the global ranking, which is a natural fit since the ranking of a candidate can be obtained by counting the number of better candidates. Fig. 1 shows the superiority of our response surface in capturing the global ranking over the GP-based one. Specifically, we conduct experiments on the Forrester function with various degrees of additive Gaussian noise. The setting details can be found in Appendix C.1.
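The counting view behind the Poisson-process surrogate can be made concrete with a toy sketch (illustrative only, not the paper's estimator): the rank of a candidate among a finite set equals the number of strictly better candidates, and ties naturally share a rank.

```python
def rank_by_counting(values, query_idx):
    """Zero-based rank of values[query_idx] under minimization:
    the count of strictly better (smaller) candidates.
    Tied candidates receive the same rank."""
    return sum(v < values[query_idx] for v in values)

vals = [3.1, 0.7, 2.5, 0.7, 4.0]
ranks = [rank_by_counting(vals, i) for i in range(len(vals))]
print(ranks)  # → [3, 0, 2, 0, 4]; the two tied 0.7s share rank 0
```

A Poisson process models exactly this kind of count as a random quantity, which is what makes it a natural prior over global rankings.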
Our response surface is more robust to noise and better captures the global ranking. Furthermore, we derive two acquisition functions tailored to our response surface for a better exploitation-exploration trade-off. Finally, we propose a novel Bayesian optimization framework, named PoPBO, which achieves lower regret (better performance) at faster speed. Our contributions can be summarized as follows: 1) Ranking-based Response Surface based on the Poisson Process. Unlike prior absolute response surfaces (Bergstra et al., 2011; Snoek et al., 2012), or methods (Nguyen et al., 2021) that use a relative evidence likelihood built on absolute responses, this work is, to the best of our knowledge, the first to directly capture the global ranking over a feasible domain via a Poisson process. The robustness against noise is analyzed in Sec. 3.1 and illustrated in Fig. 1. 2) Tailored Acquisition Functions for the Ranking-based Response Surface. Two acquisition functions for our response surface, named R-LCB and ERI, are derived from the vanilla LCB and EI for a better exploitation-exploration trade-off. Gradients of the proposed acquisition functions w.r.t. candidates are also derived, so the next query can be optimized by SGD. 3) Computationally Efficient Bayesian Optimization Framework. The proposed ranking-based response surface and acquisition functions form a novel Bayesian optimization framework: Poisson Process Bayesian Optimization (PoPBO). Our framework is much faster than Gaussian process-based BO methods. Specifically, the computational complexity of PoPBO is O(N^2), compared to O(N^3) for GP, where N is the number of samples (see Fig. 3).

4) Extensive Empirical Study with Strong Performance. Our method achieves substantial improvements over many prior BO methods on simulated functions and on multiple benchmarks with real-world datasets, including hyperparameter optimization and neural architecture search.
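The claim that rankings are less sensitive to additive Gaussian noise than absolute values can be illustrated with a toy Monte Carlo simulation (a hedged sketch, not the analysis of Sec. 3.1; candidate spacing and noise level are assumptions chosen for illustration).

```python
import random

random.seed(1)
true_vals = [float(i) for i in range(10)]  # 10 candidates spaced 1.0 apart
sigma = 0.5                                # additive Gaussian noise level

def noisy(vals):
    return [v + random.gauss(0.0, sigma) for v in vals]

trials = 2000
pair_flips = 0   # pairwise orderings disrupted by noise
abs_err = 0.0    # accumulated mean absolute error of the values themselves
pairs = [(i, j) for i in range(10) for j in range(i + 1, 10)]
for _ in range(trials):
    obs = noisy(true_vals)
    abs_err += sum(abs(o - t) for o, t in zip(obs, true_vals)) / len(obs)
    pair_flips += sum((obs[i] > obs[j]) != (true_vals[i] > true_vals[j])
                      for i, j in pairs)
flip_rate = pair_flips / (trials * len(pairs))
print(f"mean absolute value error per candidate: {abs_err / trials:.2f}")
print(f"fraction of pairwise rankings flipped:   {flip_rate:.3f}")
```

Under this setup every absolute value is visibly corrupted (the mean error is a large fraction of the candidate spacing), yet only a small fraction of pairwise orderings flip, since a flip requires the noise to overcome the gap between two candidates.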



[1] In this work, the 'absolute evaluation (response)' of a query is defined as its exact black-box function value.
[2] In this work, the 'relative evaluation (response)' of a query is defined as its ranking, which can be computed by comparison with other candidates.

