MOLECULE OPTIMIZATION BY EXPLAINABLE EVOLUTION

Abstract

Optimizing molecules for desired properties is a fundamental yet challenging task in chemistry, material science, and drug discovery. This paper develops a novel algorithm for optimizing molecular properties via an Expectation-Maximization (EM) like explainable evolutionary process. The algorithm is designed to mimic human experts in the process of searching for desirable molecules and to alternate between two stages: an explainable local search stage, which identifies rationales, i.e., critical subgraph patterns accounting for desired molecular properties, and a molecule completion stage, which explores the larger space of molecules containing good rationales. We test our approach against various baselines on a real-world multi-property optimization task where each method is given the same number of queries to the property oracle. We show that our evolution-by-explanation algorithm is 79% better than the best baseline in terms of a generic metric combining aspects such as success rate, novelty, and diversity. Human expert evaluation of the optimized molecules shows that 60% of the top molecules obtained from our method are deemed successful.

1. INTRODUCTION

The space of organic molecules is vast, with a size exceeding 10^60 (Reymond et al., 2010). Searching over this vast space for molecules of interest is a challenging task in chemistry, material science, and drug discovery, especially given that molecules are desired to meet multiple criteria, e.g., high potency and low toxicity in drug discovery. When human experts optimize molecules for better molecular properties, they first come up with rationales within desirable molecules. Typically, the rationales are subgraphs in a molecule deemed to contribute primarily to certain desired molecular properties. Once rationales are identified, chemists design new molecules on top of the rationales, hoping that the desired properties of the new molecules will be further enhanced by the presence of the rationale and by changes to the non-rationale parts. The cycle of identifying molecular rationales and redesigning new hypothetical molecules continues until molecules that meet certain property criteria are discovered. In this paper, we develop a novel algorithm that mimics the process of molecule optimization by human experts. Our algorithm finds new molecules with better properties via an EM-like explainable evolutionary process (Figure 1). The algorithm alternates between two stages. During the first stage, we use an explainable local search method to identify rationales within high-quality molecules that account for their high property scores. During the second stage, we use a conditional generative model to explore the larger space of molecules containing useful rationales. Our method is novel in that it uses explainable models to exploit useful patterns in the molecules while leveraging generative models to explore the molecule landscape.
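The alternation between the two stages can be illustrated with a small toy sketch. It is not the implementation used in this paper: molecules are stood in for by strings over a four-letter alphabet, the property oracle (toy_oracle), the end-trimming explainer (explain), and the random-extension completer (complete) are all hypothetical stand-ins for the explainable local search and the conditional generative model, and every name and parameter below is an assumption made for illustration only.

```python
import random

ALPHABET = "CNOS"  # toy "atoms"; real molecules would be graphs, not strings


def toy_oracle(mol: str) -> float:
    """Hypothetical property score in [0, 1]: full score needs two 'NC' motifs."""
    return min(1.0, mol.count("NC") / 2)


def explain(mol: str, oracle, threshold: float) -> str:
    """Toy local search for a rationale: trim characters from both ends for
    as long as the remaining fragment still scores at least `threshold`."""
    while len(mol) > 1 and oracle(mol[1:]) >= threshold:
        mol = mol[1:]
    while len(mol) > 1 and oracle(mol[:-1]) >= threshold:
        mol = mol[:-1]
    return mol


def complete(rationale: str, rng: random.Random, max_extra: int = 3) -> str:
    """Stand-in for a conditional generative model: grow a new 'molecule'
    around the rationale by adding random characters on either side."""
    left = "".join(rng.choice(ALPHABET) for _ in range(rng.randint(0, max_extra)))
    right = "".join(rng.choice(ALPHABET) for _ in range(rng.randint(0, max_extra)))
    return left + rationale + right


def evolve(seeds, oracle, rounds=3, samples=5, threshold=1.0, seed=0):
    """Alternate rationale extraction and completion for a fixed number of rounds."""
    rng = random.Random(seed)
    population = set(seeds)
    rationales = set()
    for _ in range(rounds):
        # Stage 1: extract rationales from molecules that meet the criterion.
        good = [m for m in population if oracle(m) >= threshold]
        rationales = {explain(m, oracle, threshold) for m in good}
        # Stage 2: explore molecules that contain those rationales.
        for r in sorted(rationales):
            for _ in range(samples):
                population.add(complete(r, rng))
    return population, rationales
```

Under these assumptions, explain("SSNCONCSS", toy_oracle, 1.0) returns "NCONC", the shortest end-trimmed fragment that still contains both motifs, and evolve(["SSNCONCSS"], toy_oracle) grows a population of strings built around such fragments. The sketch only conveys the exploit-then-explore structure of the loop; it does not model query budgets, novelty, or diversity.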
Compared with existing methods that directly learn a generative model using Reinforcement Learning or perform continuous optimization in the latent space of molecules (Olivecrona et al., 2017; You et al., 2018a; Dai et al., 2018b), our method is more sample-efficient and can generate more novel and unique molecules that meet the criteria. We evaluate our algorithm against several state-of-the-art methods on a molecule optimization task involving multiple properties. Compared with baselines, our algorithm is able to increase the success

