DESIGN-BENCH: BENCHMARKS FOR DATA-DRIVEN OFFLINE MODEL-BASED OPTIMIZATION

Anonymous

Abstract

Black-box model-based optimization (MBO) problems, where the goal is to find a design input that maximizes an unknown objective function, are ubiquitous in a wide range of domains, such as the design of drugs, aircraft, and robot morphology. Typically, such problems are solved by actively querying the black-box objective on design proposals and using the resulting feedback to improve the proposed designs. However, when the true objective function is expensive or dangerous to evaluate in the real world, we might instead prefer a method that can optimize this function using only previously collected data, for example from a set of previously conducted experiments. This data-driven offline MBO setting presents a number of unique challenges, but several recent works have demonstrated that viable offline MBO methods can be developed even for high-dimensional problems, using high-capacity deep neural network function approximators. Unfortunately, the lack of standardized evaluation tasks in this emerging new field has made tracking progress and comparing recent methods difficult. To address this problem, we present Design-Bench, a benchmark suite of offline MBO tasks with a unified evaluation protocol and reference implementations of recent methods. Our benchmark suite includes diverse and realistic tasks derived from real-world problems in biology, material science, and robotics that present distinct challenges for offline MBO methods. Our benchmarks, together with the reference implementations, are available at sites.google.com/view/design-bench. We hope that our benchmark can serve as a meaningful metric for the progress of offline MBO methods and guide future algorithmic development.

1. INTRODUCTION

Automatically synthesizing designs that maximize a desired objective function is one of the most important problems in many scientific and engineering domains. From protein design in molecular biology Shen et al. (2014) to superconducting material discovery in physics Hamidieh (2018), researchers have made significant progress in applying machine learning methods to such optimization problems over structured design spaces. Commonly, the exact form of the objective function is unknown, and the objective value of a novel design can only be evaluated by running either computer simulations or physical experiments in the real world. The process of optimizing an unknown function is known as black-box optimization, and is typically solved in an online iterative manner, where in each iteration the solver proposes new designs and queries the objective function for feedback in order to propose better designs in the next iteration Williams & Rasmussen (2006). In many domains, however, evaluating the objective function is prohibitively expensive, because it requires manually conducting experiments in the real world. In this setting, one cannot simply query the true objective function to gradually improve the design. Instead, a collection of past records of designs and their corresponding objective values might be available, and the optimization method must therefore leverage this available data to synthesize the best design possible. This is the setting of data-driven offline model-based optimization. Although online black-box optimization has been studied extensively, the offline MBO problem has received comparatively less attention, and only a small number of recent works study offline MBO in the setting with high-dimensional design spaces, where they utilize deep learning techniques Brookes et al. (2019); Kumar & Levine (2019); Fannjiang & Listgarten (2020).
This is partly due to the fact that methods for online design optimization cannot be easily applied in the offline MBO setting. However, even with only a few existing methods, it is still hard to compare them and track the progress of this field, as these methods are proposed and evaluated on different tasks with distinct evaluation protocols. To the best of our knowledge, there is no commonly adopted set of benchmarks for offline MBO. To address this problem, in this paper we introduce a suite of offline MBO benchmarks with standardized evaluation protocols. We include a realistic and diverse set of tasks that spans a wide range of application domains, from synthetic biology to robotics. The realism and diversity of the tasks is essential for the evaluation of offline data-driven model-based optimization methods, as it measures the generality of the methods being evaluated across multiple domains and verifies that they are not overfitting to a single task. Our benchmark tasks incorporate a variety of challenging factors, such as high-dimensional input spaces and sensitive, discontinuous objective functions, which help better identify the strengths and weaknesses of MBO methods. Along with the benchmark suite, we also present reference implementations of a number of existing offline MBO methods and baselines. We systematically evaluate them on all of the proposed benchmark tasks and report our findings. A surprising discovery from our findings is that with proper data normalization, the simple baseline method of learning an objective value predictor and performing gradient ascent on its input outperforms several prior MBO methods in our benchmark. We hope that our work can provide insight into the current progress of offline MBO methods and can also serve as a meaningful metric to galvanize research in this area.
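To make this simple baseline concrete, here is a minimal numpy sketch on a hypothetical one-dimensional task. The toy dataset, the quadratic proxy model, and all hyperparameters below are illustrative assumptions for exposition, not part of Design-Bench or its reference implementations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical offline dataset: 1-D designs and noisy objective measurements.
x_data = rng.uniform(-1.0, 1.0, size=200)
y_data = -(x_data - 0.5) ** 2 + 0.05 * rng.normal(size=200)

# Normalize objective values before fitting (the normalization step
# that the benchmark results suggest matters in practice).
y_norm = (y_data - y_data.mean()) / y_data.std()

# Learn a proxy of the objective; here, a simple degree-2 polynomial fit.
proxy = np.poly1d(np.polyfit(x_data, y_norm, deg=2))
proxy_grad = proxy.deriv()

# Gradient ascent on the proxy's input, starting from the best design seen.
x = x_data[np.argmax(y_data)]
for _ in range(100):
    x = x + 0.05 * proxy_grad(x)

print(round(float(x), 2))  # candidate design proposed by the baseline
```

The proxy here is deliberately low-capacity; with an expressive neural network predictor, the same loop applies but the out-of-distribution issues discussed later become much more pronounced.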

2. OFFLINE MODEL-BASED OPTIMIZATION PROBLEM STATEMENT

The goal in offline model-based optimization is to optimize an unknown (possibly stochastic) objective function f(x), provided access to a static dataset D = {(x_i, y_i)} of designs x_i and corresponding measurements of the objective value y_i. Similar to batch Bayesian optimization (BBO) González et al. (2016), each algorithm A is allowed to consume the dataset D, and is required to produce a set of K candidate designs A(D, K) = {x*_i : i ∈ {1, ..., K}}. These K candidates are evaluated under the ground-truth objective function f(x), and the best-performing design is reported as the final performance value. Abstractly, the objective for offline MBO is:

arg max_A P({f(x*) : x* ∈ A(D, K)}, N),

where P denotes the percentile function. Intuitively, this formulation ranks an offline MBO algorithm by the Nth-percentile objective value it obtains given a fixed evaluation budget of K samples. Common choices of N are 100, which represents the maximum objective value among the candidates, and 50, which represents the median.

What makes offline MBO especially challenging? The offline nature of the problem requires that the algorithm A not be tuned by peeking at the ground-truth objective f, and this makes offline MBO much more difficult than the online design optimization problem. One simple idea for tackling this problem is to learn an objective proxy from the dataset, and then convert the offline MBO problem into an online one by treating the learned proxy as the true objective. However, this idea may not work well, due to the intrinsically out-of-distribution nature of optimal designs. First of all, in a number of practical MBO problems, such as optimization over proteins or robot morphologies, the designs with the highest objective values in the dataset already lie in the tail of the data distribution, since they are better than most other designs.
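The evaluation protocol above can be sketched in a few lines. The ground-truth function and candidate set below are hypothetical stand-ins for f and the output of some algorithm A(D, K); only the percentile-based scoring reflects the protocol itself.

```python
import numpy as np

def evaluate_candidates(f, candidates, n_percentile):
    """Score K candidate designs by the Nth percentile of their
    ground-truth objective values, as in the offline MBO objective."""
    scores = np.array([f(x) for x in candidates])
    return np.percentile(scores, n_percentile)

# Hypothetical ground-truth objective, queried only at evaluation time.
f = lambda x: -np.sum((x - 1.0) ** 2)

# K = 5 candidate designs, as if produced by some algorithm A(D, K).
candidates = [np.array([0.0]), np.array([0.5]), np.array([0.9]),
              np.array([1.1]), np.array([2.0])]

best = evaluate_candidates(f, candidates, 100)   # N = 100: max over candidates
median = evaluate_candidates(f, candidates, 50)  # N = 50: median over candidates
print(best, median)
```

Reporting the 100th percentile rewards an algorithm for its single best candidate, while the 50th percentile measures whether the candidate set is reliably good rather than lucky.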
In order to improve upon the best designs in the dataset, an optimization method needs to produce designs that lie even further from the dataset distribution. For such out-of-distribution designs, it is impossible to guarantee that the learned objective proxy is accurate, and hence any powerful optimization method can easily "exploit" the learned proxy and produce falsely promising designs whose values are drastically overestimated by it. This conflict between the out-of-distribution nature of optimization and the in-distribution requirement of any learned model is the core challenge of offline MBO. The challenge is often exacerbated in real-world problems by the high dimensionality of the design space and the sparsity of the available data, as we show in our benchmark. A good offline MBO method must carefully balance the two sides of this conflict, producing optimized designs that improve upon the data without straying too far from the data distribution.
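This failure mode is easy to reproduce in a toy setting. The sketch below uses an illustrative one-dimensional objective and a deliberately over-flexible polynomial proxy (both assumptions for exposition, not taken from the benchmark) to show how a model that fits the offline data well can wildly overestimate values far from the data distribution.

```python
import numpy as np

# True objective, bounded above by 0 everywhere.
true_f = lambda x: -x ** 2

# Hypothetical offline dataset: 10 designs confined to [-1, 1], with
# small alternating measurement noise (chosen deterministically here).
x_data = np.linspace(-1.0, 1.0, 10)
noise = 0.1 * np.array([(-1.0) ** (i + 1) for i in range(10)])
y_data = true_f(x_data) + noise

# An over-flexible learned proxy: a degree-9 polynomial fit, which
# matches the training data essentially exactly in-distribution.
proxy = np.poly1d(np.polyfit(x_data, y_data, deg=9))

# A naive optimizer that queries the proxy outside the data distribution
# finds a "design" whose predicted value is drastically overestimated.
x_ood = 3.0
print(true_f(x_ood))   # true value: -9.0
print(proxy(x_ood))    # proxy value: an enormous overestimate
```

An unconstrained optimizer maximizing this proxy would race toward exactly such out-of-distribution inputs, which is why offline MBO methods must keep their optimized designs anchored near the data.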

