SAMPLE-EFFICIENT MULTI-OBJECTIVE MOLECULAR OPTIMIZATION WITH GFLOWNETS

Abstract

Many crucial scientific problems involve designing novel molecules with desired properties, which can be formulated as an expensive black-box optimization problem over the discrete chemical space. Computational methods have achieved initial success but still struggle with simultaneously optimizing multiple competing properties in a sample-efficient manner. In this work, we propose a multi-objective Bayesian optimization (MOBO) algorithm leveraging the hypernetwork-based GFlowNets (HN-GFN) as an acquisition function optimizer, with the purpose of sampling a diverse batch of candidate molecular graphs from an approximate Pareto front. Using a single preference-conditioned hypernetwork, HN-GFN learns to explore various trade-offs between objectives. Inspired by reinforcement learning, we further propose a hindsight-like off-policy strategy to share high-performing molecules among different preferences in order to speed up learning for HN-GFN. Through synthetic experiments, we illustrate that HN-GFN has adequate capacity to generalize over preferences. Extensive experiments show that our framework outperforms the best baselines by a large margin in terms of hypervolume in various real-world MOBO settings.

1. INTRODUCTION

Designing novel molecular structures with desired properties, also referred to as molecular optimization, is a crucial task with great application potential in scientific fields ranging from drug discovery to material design. Molecular optimization can be naturally formulated as a black-box optimization problem over the discrete chemical space, which is combinatorially large (Polishchuk et al., 2013). Recent years have witnessed the trend of leveraging computational methods, such as deep generative models (Jin et al., 2018) and combinatorial optimization algorithms (You et al., 2018; Jensen, 2019), to facilitate the optimization. However, the applicability of most prior approaches in real-world scenarios is hindered by two practical constraints: (i) realistic oracles (e.g., wet-lab experiments and high-fidelity simulations) require substantial costs to synthesize and evaluate molecules (Gao et al., 2022), and (ii) chemists commonly seek to optimize multiple properties of interest simultaneously (Jin et al., 2020b). For example, in addition to effectively inhibiting a disease-associated target, an ideal drug should be easily synthesizable and non-toxic.

Bayesian optimization (BO) (Jones et al., 1998; Shahriari et al., 2015) provides a sample-efficient framework for globally optimizing expensive black-box functions. The basic idea is to construct a cheap-to-evaluate surrogate model, typically a Gaussian process (GP) (Rasmussen, 2003), to approximate the true function (also known as the oracle) on the observed dataset. The core objective of BO is to optimize an acquisition function (built upon the surrogate model) in order to obtain informative candidates with high utility for the next round of evaluations. This loop is repeated until the evaluation budget is exhausted.
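The BO loop described above can be sketched in a few lines. The oracle, the RBF surrogate, and the UCB acquisition below are illustrative stand-ins over a toy one-dimensional pool, not the paper's actual models or search space:

```python
import numpy as np

# Illustrative sketch of one round of BO over a discrete candidate pool.
# `oracle`, the GP surrogate, and the UCB acquisition are placeholders.
rng = np.random.default_rng(0)

def oracle(x):                        # expensive black-box (stand-in)
    return np.sin(3 * x) + 0.5 * x

def rbf(a, b, ls=0.5):                # squared-exponential kernel
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

# Observed dataset D = {(x_i, y_i)}
X = rng.uniform(-2, 2, 8)
y = oracle(X)

# 1. Fit a cheap surrogate (exact GP posterior) on the observations
pool = np.linspace(-2, 2, 200)        # discrete search space
K = rbf(X, X) + 1e-6 * np.eye(len(X))
Ks = rbf(X, pool)
mu = Ks.T @ np.linalg.solve(K, y)                        # posterior mean
var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)  # posterior variance
sigma = np.sqrt(np.clip(var, 0.0, None))

# 2. Optimize an acquisition function (here UCB) over the pool
ucb = mu + 2.0 * sigma

# 3. Select a batch of top-scoring candidates for the next evaluation round
batch = pool[np.argsort(-ucb)[:4]]
```

In real batch BO the selected `batch` would be sent to the oracle, appended to the dataset, and the loop repeated until the budget is exhausted.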
Owing to the fact that a large batch of candidates can be evaluated in parallel in biochemical experiments, we perform batch BO (with large-batch and low-round settings (Angermueller et al., 2020)) to significantly shorten the entire cycle of optimization. As multi-objective optimization (MOO) problems are prevalent in scientific and engineering applications, MOBO has also received broad attention and achieved promising performance by effectively optimizing differentiable acquisition functions (Daulton et al., 2020). Nevertheless, it is less prominent in discrete problems, especially in batch settings. The difficulty lies in the fact that no gradients can be leveraged to navigate the discrete space for efficient and effective optimization of the acquisition function. Although most existing discrete molecular optimization methods can be adopted as the acquisition function optimizer to alleviate this issue, they suffer from the following limitations. (1) Most approaches do not explicitly address the diversity of the proposed candidates, which is a key consideration in batch settings because the surrogate model cannot exactly reproduce the oracle's full behavior. Therefore, we want not only to cover more high-scoring modes of the surrogate model but also to obtain candidates that bring additional information about the search space. (2) Most multi-objective methods (Xie et al., 2021; Fu et al., 2022) simply rely on a scalarization function, parameterized by a predefined preference vector reflecting the trade-off between objectives, to turn the MOO problem into a single-objective one. Unfortunately, an ideal trade-off is unclear before optimization (even with domain knowledge), and many potential trade-offs of interest are worth exploring. In principle, it is possible to independently train multiple optimization models, each conditioned on a distinct preference vector, to cover the objective space.
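As a concrete instance of the scalarization approach discussed above, a weighted-sum scalarization with a preference vector on the probability simplex collapses the objective vector into a single score. This is a minimal sketch with made-up objective values:

```python
import numpy as np

# Minimal sketch of weighted-sum scalarization: a preference vector lam
# on the probability simplex turns the objective vector f(x) into a
# scalar. The property scores below are made up for illustration.
def scalarize(objectives, lam):
    lam = np.asarray(lam, dtype=float)
    assert (lam >= 0).all() and np.isclose(lam.sum(), 1.0)
    return float(lam @ np.asarray(objectives))

f_x = np.array([0.8, 0.3])             # two competing property scores
balanced = scalarize(f_x, [0.5, 0.5])  # equal trade-off, ~0.55
skewed = scalarize(f_x, [0.9, 0.1])    # favors the first objective, ~0.75
```

A fixed `lam` commits to one trade-off before optimization, which is exactly the limitation pointed out above: a different choice of preference vector yields a different single-objective problem and a different optimum.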
Practically, this trivial strategy cannot efficiently scale with the number of objectives (Navon et al., 2021). The recently proposed GFlowNets (Bengio et al., 2021a) are a class of generative models over discrete objects (e.g., molecular graphs) that aim to learn a stochastic policy for sequentially constructing objects with probability proportional to a reward function (e.g., the acquisition function). Hence, GFlowNets excel at generating diverse and high-reward objects, which makes them appealing in the batch BO context, where exploration plays a significant role (Jain et al., 2022).

In this work, we present a MOBO algorithm based on GFlowNets for sample-efficient multi-objective molecular optimization. We propose a hypernetwork-based GFlowNet (HN-GFN) as the acquisition function optimizer within MOBO to sample a diverse batch of candidates from an approximate Pareto front. Instead of defining a fixed reward function as in past work (Bengio et al., 2021a), we train a unified GFlowNet on a distribution of reward functions (random scalarizations parameterized by preference vectors) and control the policy using a single preference-conditioned hypernetwork. While sampling candidates, HN-GFN flexibly explores various trade-offs between competing objectives by varying the input preference vector. Inspired by Hindsight Experience Replay (Andrychowicz et al., 2017) in RL, we further introduce a hindsight-like off-policy strategy to share high-performing molecules among different preferences and speed up learning for HN-GFN. In our experiments, we first evaluate HN-GFN on synthetic tasks to verify that it is capable of generalizing over preference vectors, and then apply the proposed framework to real-world scenarios. Remarkably, our framework outperforms the best baselines by 60% and 24% (relative improvement in terms of hypervolume) in the settings with two and four objectives, respectively.
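The idea of training on a distribution of reward functions can be sketched as follows: at each iteration a preference vector λ is drawn from a Dirichlet prior and defines the scalarized reward R_λ used to train the policy. This is a simplified illustration; the hypernetwork conditioning of the actual HN-GFN is omitted, and the objective scores are made up:

```python
import numpy as np

# Simplified illustration of training on a distribution of rewards:
# lam ~ Dir(alpha) is resampled per iteration, and R_lam is a
# weighted-sum scalarization of the surrogate's objective scores.
# The preference-conditioned hypernetwork of HN-GFN is not shown.
rng = np.random.default_rng(42)
alpha = np.ones(2)                  # Dirichlet concentration parameters

def reward(objective_scores, lam):  # R_lam: scalarized reward
    return float(lam @ objective_scores)

rewards = []
for _ in range(100):
    lam = rng.dirichlet(alpha)      # sample a trade-off per iteration
    scores = np.array([0.6, 0.4])   # surrogate scores of a sampled molecule
    rewards.append(reward(scores, lam))

# each reward is a convex combination of the two objective scores
assert 0.4 - 1e-9 <= min(rewards) and max(rewards) <= 0.6 + 1e-9
```

Because a single policy sees many sampled values of λ during training, it can, in principle, interpolate to unseen preference vectors at sampling time, which is the generalization property the synthetic experiments examine.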
Our key contributions are summarized below:

• We propose HN-GFN, a unified GFlowNet that can efficiently sample candidates from an approximate Pareto front using a single hypernetwork.



Figure 1: MOBO loop for molecular optimization using an evidential surrogate model M for uncertainty estimation and HN-GFN (a policy π_θ with parameters θ = (θ_mpnn, θ_pred)) for acquisition function optimization. In each round, the policy π_θ is trained with the reward function R_λ, where λ is sampled from Dir(α) per iteration. A new batch of candidates is sampled from the approximate Pareto front according to λ_target ∈ Λ.

