MULTIMODAL JOINT EMBEDDING TRANSFORMER FOR CONDITIONAL DE NOVO MOLECULAR DESIGN AND MULTI-PROPERTY OPTIMIZATION

Anonymous

Abstract

Multi-property constrained optimization of molecules using generative de novo design models is vital for the successful application of Artificial Intelligence (AI) towards materials and drug discovery. Yet there remains a gap between the reported performance of such models in the literature and their practical utility in real-world design scenarios. Furthermore, existing models are largely inaccessible to chemists without an extensive background in computer science. To address these challenges, we propose a generative foundation model, the Multimodal Joint Embedding Transformer (MOLJET), which performs conditional generation of desired molecular distributions based on human-interpretable chemistry prompts in a zero-shot manner. We assess MOLJET on the standard benchmarks available in the GuacaMol and MIMOSA evaluation frameworks. These include structure-based sampling tasks as well as a range of multi-property optimization tasks that probe a model's ability to design drug-like molecules given realistic property constraints. We demonstrate that with self-supervised pretraining, MOLJET outperforms 80% of task-optimized models while using zero-shot inference and beats all baselines after minimal supervision. Moreover, the performance of MOLJET on text-only conditioning tasks improves with the inclusion of property modalities during training, highlighting the importance of a multimodal approach to molecular design. MOLJET is the first example of text-based de novo molecular design using large-scale multimodal foundation models and should serve as a building block towards further improvements to accessible AI for chemists.

1. INTRODUCTION

Emerging crises in climate, disease and human health threaten to permanently disrupt global stability and must be actively met with creative solutions. Many such solutions are dependent on the rapid discovery of innovative functional materials or novel drug-like molecules with optimal properties. For instance, the viability of using redox-flow batteries (RFBs) for long-term and large-scale energy storage is contingent on finding stable redox species with fast electrochemical kinetics, a feasible redox potential and high solubility (Zhang et al., 2018). Due to the immense size and complexity of chemical phase space (Polishchuk et al., 2013), the search for suitable materials is far from trivial and traditional "direct" design approaches based on iterative modifications to existing chemical structures are often far too slow (Kuhn & Beratan, 1996). To address this issue, researchers have increasingly begun to look towards generative de novo design models to efficiently navigate the vast molecular phase space (Meyers et al., 2021). These models are evaluated on their ability to generate a diverse array of novel molecular structures while simultaneously biasing them towards a desired property distribution (Polykovskiy et al., 2020). Due to the ubiquity of string-based molecular representations (Weininger, 1988; Krenn et al., 2020), recent innovations in natural language modeling have been successfully applied to de novo molecular design. For instance, transformer architectures have achieved state-of-the-art results on property prediction tasks that require quantum-level accuracy (Ross et al., 2021) and have also been shown to increase the diversity of candidates sampled from machine-learned molecular distributions (Dollar et al., 2021).

