HOW MUCH SPACE HAS BEEN EXPLORED? MEASURING THE CHEMICAL SPACE COVERED BY DATABASES AND MACHINE-GENERATED MOLECULES

Abstract

Forming a molecular candidate set that contains a wide range of potentially effective compounds is crucial to the success of drug discovery. While most databases and machine-learning-based generation models aim to optimize particular chemical properties, there is limited literature on how to properly measure the coverage of the chemical space by the candidates they include or generate. This problem is challenging due to the lack of formal criteria for selecting good measures of the chemical space. In this paper, we propose a novel evaluation framework for measures of the chemical space based on two analyses: an axiomatic analysis with three intuitive axioms that a good measure should obey, and an empirical analysis of the correlation between a measure and a proxy gold standard. Using this framework, we identify #Circles, a new measure of chemical space coverage, which is superior to existing measures both analytically and empirically. We further evaluate how well existing databases and generation models cover the chemical space in terms of #Circles. The results suggest that many generation models fail to explore a larger chemical space than existing databases, which opens up new opportunities for improving generation models by encouraging exploration.

1. INTRODUCTION

To efficiently navigate the huge chemical space for drug discovery, machine learning (ML) based approaches have been widely developed and deployed, especially de novo molecular generation methods (Elton et al., 2019; Schwalbe-Koda & Gómez-Bombarelli, 2020; Bian & Xie, 2021; Deng et al., 2022). Such generation models learn to generate candidate drug designs by optimizing various molecular property scores, such as binding affinity scores. In practice, these scores can be obtained computationally using biological activity prediction models (Olivecrona et al., 2017; Li et al., 2018), which is the key to obtaining massive labeled training data for machine learning. However, high in silico property scores are far from sufficient, as there is usually a considerable misalignment between these scores and in vivo behaviors. Costly wet-lab experiments are still needed to verify potential drug hits, and only a limited number of drug candidates can be tested. In light of this cost constraint, it is critical to select or generate drug candidates that not only have high in silico scores, but also cover a large portion of the chemical space. As functional differences between molecules are closely related to their structural differences (Huggins et al., 2011; Wawer et al., 2014), better coverage of the chemical space will likely lead to a higher chance of hits in wet experiments. For this purpose, quantitative coverage measures of the chemical space become crucial. Such measures can be used both to evaluate and compare candidate libraries 1 , and to be incorporated into training objectives that encourage ML models to better explore the chemical space.

In this paper, we investigate the problem of quantitatively measuring the coverage of the chemical space by a candidate library. A few such coverage measures of chemical space already exist.
For example, richness counts the number of unique compounds in a molecular set, and it has been used to describe how well a model is able to generate unique structures (Shi & von Itzstein, 2019;  
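As a concrete illustration of the richness measure described above, the following minimal sketch counts unique compounds in a candidate set. It assumes molecules are given as canonical SMILES strings so that string equality coincides with structural identity; in practice a cheminformatics toolkit such as RDKit would be used to canonicalize the representations first.

```python
def richness(smiles_list):
    """Richness: the number of unique compounds in a molecular set.

    Assumes each molecule is a *canonical* SMILES string, so
    deduplication by string equality matches structural identity.
    """
    return len(set(smiles_list))

# Hypothetical candidate library; ethanol ("CCO") appears twice,
# so only 3 of the 4 entries are unique structures.
candidates = ["CCO", "c1ccccc1", "CCO", "CC(=O)O"]
print(richness(candidates))  # prints 3
```

Note that richness is insensitive to how the unique structures are distributed in chemical space: a set of three near-identical analogs and a set of three structurally diverse scaffolds have the same richness, which motivates the search for better coverage measures.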

