MARS: MARKOV MOLECULAR SAMPLING FOR MULTI-OBJECTIVE DRUG DISCOVERY

Abstract

Searching for novel molecules with desired chemical properties is crucial in drug discovery. Existing work focuses on developing neural models that generate either molecular sequences or chemical graphs. However, finding novel and diverse compounds that satisfy several properties at once remains a major challenge. In this paper, we propose MARS, a method for multi-objective drug molecule discovery. MARS generates chemical candidates by iteratively editing fragments of molecular graphs. To search for high-quality candidates, it employs Markov chain Monte Carlo (MCMC) sampling over molecules with an annealing scheme and an adaptive proposal. To further improve sample efficiency, MARS uses a graph neural network (GNN) to represent and select candidate edits, where the GNN is trained on-the-fly with samples from MCMC. Experiments show that MARS achieves state-of-the-art performance in various multi-objective settings where molecular bio-activity, drug-likeness, and synthesizability are considered. Remarkably, in the most challenging setting, where all four objectives are optimized simultaneously, our approach significantly outperforms previous methods across comprehensive evaluations. The code is available at https://github.com/yutxie/mars.
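The annealed MCMC search summarized above can be illustrated with a small, self-contained sketch. It is not the MARS implementation and makes several assumptions: the state is a toy bit string rather than a molecular graph, flipping one bit stands in for a fragment edit, the two objectives `objective_a` and `objective_b` are hypothetical, combining them by their mean is an illustrative choice, and a fixed uniform proposal replaces the adaptive GNN-based proposal. The sketch also assumes an annealed target of the form π_T(x) ∝ s(x)^(1/T), where s is the combined score and the temperature T is lowered over the run.

```python
import math
import random


def objective_a(bits):
    # Hypothetical objective 1: fraction of 1-bits (a stand-in for, e.g., bio-activity).
    return sum(bits) / len(bits)


def objective_b(bits):
    # Hypothetical objective 2: fraction of adjacent positions that differ
    # (a stand-in for, e.g., drug-likeness).
    alternations = sum(1 for i in range(len(bits) - 1) if bits[i] != bits[i + 1])
    return alternations / (len(bits) - 1)


def combined_score(bits):
    # Assumption: objectives are combined by their mean. A small floor keeps the
    # score positive so that log s(x) in the acceptance rule is always defined.
    return max(1e-6, (objective_a(bits) + objective_b(bits)) / 2)


def propose(bits, rng):
    # Symmetric proposal: flip one uniformly chosen bit (toy stand-in for a fragment edit).
    i = rng.randrange(len(bits))
    new = list(bits)
    new[i] = 1 - new[i]
    return new


def annealed_mh(n_bits=20, n_steps=5000, t_start=1.0, t_end=0.01, seed=0):
    """Annealed Metropolis-Hastings; returns the best state visited and its score."""
    rng = random.Random(seed)
    x = [0] * n_bits
    s_x = combined_score(x)
    best, best_score = list(x), s_x
    for step in range(n_steps):
        # Geometric annealing schedule from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(1, n_steps - 1))
        y = propose(x, rng)
        s_y = combined_score(y)
        # Acceptance for the target pi_T(x) ∝ s(x)^(1/T); the proposal is
        # symmetric, so the proposal ratio q(x|y)/q(y|x) cancels.
        log_accept = (math.log(s_y) - math.log(s_x)) / t
        if log_accept >= 0 or rng.random() < math.exp(log_accept):
            x, s_x = y, s_y
            if s_x > best_score:
                best, best_score = list(x), s_x
    return best, best_score
```

For example, `best, score = annealed_mh(seed=1)` returns the highest-scoring bit string visited during the run. Early, high-temperature steps accept many score-decreasing moves and explore broadly; as T falls, the chain behaves increasingly like a greedy search. Because the uniform bit-flip proposal is symmetric, the acceptance test needs only the score ratio; an asymmetric proposal, such as a learned one, would also require the Metropolis-Hastings correction q(x | x') / q(x' | x).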

1. INTRODUCTION

Drug discovery aims to find chemical compounds with desired target properties, such as high drug-likeness (QED; Bickerton et al., 2012). The problem is also referred to as molecular design, molecular generation, or molecular search. The space of drug-like chemicals is enormous, estimated at approximately 10^33 for realistic drugs that could ever be synthesized (Polishchuk et al., 2013). Searching such a vast space for high-quality molecules is therefore very challenging, and exhaustive enumeration is computationally infeasible. For a particular disease, finding the right candidates that target specific proteins further complicates the problem. Instead of enumerating or searching the immense chemical space, recent work uses deep generative models to generate candidate molecules directly (Schwalbe-Koda & Gómez-Bombarelli, 2020). However, most prior work focuses on generating molecules optimized for a single property, such as drug-likeness (QED) or the octanol-water partition coefficient (logP) (Jin et al., 2018; You et al., 2018; Popova et al., 2019; Shi et al., 2020; Zang & Wang, 2020). In practice, however, drug discovery typically requires considering multiple properties jointly (Nicolaou et al., 2012), for example, finding drug-like molecules that are easy to synthesize and exhibit high biological activity against the target protein. Naturally, multi-objective molecule design is much more challenging than the single-objective scenario (Jin et al., 2020).

This paper studies the problem of multi-objective molecule design for drug discovery. An ideal solution should be efficient and meet the following criteria. C1: It should satisfy multiple properties with high scores; C2: It should produce novel and diverse molecules; C3: Its generation process does not

