LEARNING COLLISION-FREE LATENT SPACE FOR BAYESIAN OPTIMIZATION

Abstract

Learning and optimizing a blackbox function is a common task in Bayesian optimization and experimental design. In real-world scenarios (e.g., tuning hyperparameters for deep learning models, synthesizing a protein sequence, etc.), these functions tend to be expensive to evaluate and often rely on high-dimensional inputs. While classical Bayesian optimization algorithms struggle to handle the scale and complexity of modern experimental design tasks, recent works attempt to get around this issue by applying neural networks ahead of the Gaussian process to learn a (low-dimensional) latent representation. We show that such learned representations often lead to collisions in the latent space: two points with significantly different observations collide in the learned latent space. Collisions can be regarded as additional noise introduced by the neural network, leading to degraded optimization performance. To address this issue, we propose Collision-Free Latent Space Optimization (CoFLO), which employs a novel regularizer to reduce collisions in the learned latent space and to encourage the mapping from the latent space to the objective value to be Lipschitz continuous. CoFLO takes in pairs of data points and penalizes those that are too close in the latent space relative to their distance in the target space. We provide a rigorous theoretical justification for the regularizer by inspecting the regret of the proposed algorithm. Our empirical results further demonstrate the effectiveness of CoFLO on several synthetic and real-world Bayesian optimization tasks, including a case study for computational cosmic experimental design.

1. INTRODUCTION

Bayesian optimization is a classical sequential optimization method and is widely used in various fields, including recommender systems, scientific experimental design, and hyper-parameter optimization. Many of these applications involve evaluating an expensive blackbox function; therefore, the number of queries should be minimized. A common way to model the unknown function is via Gaussian processes (GPs) (Rasmussen and Williams, 2006). GPs have been extensively studied under the bandit setting and have proven to be an effective approach for a broad class of black-box function optimization problems. One of the key computational challenges for learning with GPs concerns optimizing the kernels used to model the covariance structure of the GP. As this optimization task depends on the dimension of the feature space, training a Gaussian process model is often prohibitively expensive for high-dimensional inputs. Meanwhile, Gaussian processes are not intrinsically designed to handle structured inputs with strong correlations among dimensions, e.g., graphs and time sequences. Therefore, dimensionality reduction algorithms are needed to speed up the learning process. Recently, it has become popular to investigate GPs in the context of latent space models. As an example, deep kernel learning (Wilson et al., 2016) simultaneously learns a (low-dimensional) data representation and a scalable kernel via an end-to-end trainable deep neural network. In general, the neural network is trained to learn a simpler latent representation with reduced dimension that already embeds the structural information for the Gaussian process. Such a combination of neural network and Gaussian process can improve the scalability and extensibility of classical Bayesian optimization, but it also poses new challenges for the optimization task (Tripp et al., 2020).
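To make this setup concrete, the following is a minimal PyTorch sketch of a deep kernel in the spirit of Wilson et al. (2016): a neural feature map composed with an RBF kernel evaluated on the latent features. The network architecture and latent dimension here are illustrative assumptions, not the specific model used in this paper.

```python
import torch


def rbf_kernel(z1, z2, lengthscale=1.0):
    """Squared-exponential (RBF) kernel evaluated on latent features."""
    sq_dist = torch.cdist(z1, z2).pow(2)
    return torch.exp(-0.5 * sq_dist / lengthscale ** 2)


class DeepKernel(torch.nn.Module):
    """Minimal deep kernel: a neural feature extractor composed with an
    RBF kernel. The latent representation produced by `self.net` is where
    collisions (distinct inputs mapped to nearby latent points) can arise.
    Architecture and sizes are hypothetical, for illustration only."""

    def __init__(self, in_dim, latent_dim=2):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(in_dim, 32),
            torch.nn.ReLU(),
            torch.nn.Linear(32, latent_dim),
        )

    def forward(self, x1, x2):
        # Map both input batches to the latent space, then apply the kernel.
        return rbf_kernel(self.net(x1), self.net(x2))
```

In practice this kernel would be plugged into a GP marginal-likelihood objective and trained end-to-end, so that the feature map and the GP hyperparameters are learned jointly.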
As we later demonstrate, one critical challenge brought by introducing the neural network is that the latent representation is prone to collisions: two points with significantly different observations can get mapped to nearby locations in the latent space. As illustrated in Figure 1, when passed through the neural network, data points with drastically different observations are mapped to close positions in the latent space. Such collisions can be regarded as additional noise introduced by the neural network. Although Bayesian optimization is known to be robust to mildly noisy observations, collisions in the latent space can be harmful to optimization performance, as it is non-trivial to explicitly model the collision in the acquisition function. In addition, the extra noise induced by the collision effect further loosens the regret bound for classical Bayesian optimization algorithms (Srinivas et al., 2010).

Overview of main results

To mitigate the collision effect, we propose a novel regularization scheme that can be applied as a simple plugin amendment to latent space-based Bayesian optimization models. The proposed algorithm, Collision-Free Latent Space Optimization (CoFLO), leverages a regularized regression loss function to periodically optimize the latent space for Bayesian optimization. Concretely, our regularizer is encoded by a novel pairwise collision penalty function defined jointly on the latent space and the output domain. To mitigate the risk of collision in the latent space (and consequently boost optimization performance), one can apply the regularizer uniformly over the latent space. However, in Bayesian global optimization tasks, we seek to prioritize the regions close to the possible optimum, as collisions in these regions are more likely to mislead the optimization algorithm. Based on this insight, we propose an optimization-aware regularization scheme that assigns a higher collision penalty weight to pairs of points closer to the optimum region in the latent space. This algorithm, which we refer to as dynamically-weighted CoFLO, is designed to dynamically assess the importance of a collision during optimization. Compared to a uniform collision penalty over the latent space, the dynamic weighting mechanism demonstrates drastic improvement over state-of-the-art latent space-based Bayesian optimization models. We summarize our key contributions below:

• We propose a novel regularization scheme, as a simple plugin amendment for latent space-based Bayesian optimization models. Our regularizer penalizes collisions in the latent space and effectively reduces the collision effect.

• We propose an optimization-aware dynamic weighting mechanism for adjusting the collision penalty to further improve the effectiveness of regularization for Bayesian optimization.

• We provide a theoretical analysis of the performance of Bayesian optimization on the regularized latent space.

• We conduct an extensive empirical study on four synthetic and real-world datasets, including a real-world case study for cosmic experimental design, and demonstrate strong empirical performance for our algorithm.
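The pairwise collision penalty and its dynamically-weighted variant can be sketched as follows. The exact functional form used by CoFLO is not reproduced here; this sketch assumes a hinge-style penalty on violations of a Lipschitz condition |y_i − y_j| ≤ L·‖z_i − z_j‖, and a softmax weighting by proximity to the incumbent best value. The constants `lipschitz_const` and `temperature` are illustrative assumptions.

```python
import torch


def collision_penalty(z, y, lipschitz_const=1.0):
    """Uniform pairwise collision penalty on a batch of latent points.

    z: (n, d) latent embeddings; y: (n,) observed objective values.
    Pairs that are too close in latent space relative to the gap in their
    observations are penalized, encouraging the latent-to-objective map
    to be Lipschitz continuous. (Sketch; the paper's exact form may differ.)
    """
    z_dist = torch.cdist(z, z)                         # (n, n) latent distances
    y_dist = (y.unsqueeze(0) - y.unsqueeze(1)).abs()   # (n, n) observation gaps
    # Hinge on the Lipschitz condition: |y_i - y_j| <= L * ||z_i - z_j||.
    violation = torch.relu(y_dist - lipschitz_const * z_dist)
    return violation.mean()


def weighted_collision_penalty(z, y, y_best, temperature=1.0, lipschitz_const=1.0):
    """Dynamically-weighted variant: pairs whose observations are close to
    the current best value y_best receive a larger penalty weight, so
    collisions near the optimum region are penalized more heavily."""
    z_dist = torch.cdist(z, z)
    y_dist = (y.unsqueeze(0) - y.unsqueeze(1)).abs()
    violation = torch.relu(y_dist - lipschitz_const * z_dist)
    # Per-point weight: softmax of (negative) distance to the incumbent best.
    w = torch.softmax(-(y_best - y).abs() / temperature, dim=0)   # (n,)
    pair_w = w.unsqueeze(0) * w.unsqueeze(1)                      # (n, n)
    return (pair_w * violation).sum()
```

During training, either penalty would simply be added to the regression loss of the latent space model (e.g., the GP marginal likelihood term), which is what makes it a plugin amendment.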



Figure 1: Illustration of the collision effect in latent space-based Bayesian optimization. Because the data points around the optimum collide severely, BO is misguided toward a sub-optimum.

