O-GNN: INCORPORATING RING PRIORS INTO MOLECULAR MODELING

Abstract

Cyclic compounds, which contain at least one ring, play an important role in drug design. Despite the recent success of molecular modeling with graph neural networks (GNNs), few models explicitly take the rings in compounds into consideration, which limits their expressiveness. In this work, we design a new GNN variant, ring-enhanced GNN (O-GNN), that explicitly models rings in addition to atoms and bonds in compounds. In O-GNN, each ring is represented by a latent vector, which contributes to and is iteratively updated by atom and bond representations. Theoretical analysis shows that O-GNN can distinguish two isomorphic subgraphs lying on different rings using only one layer, whereas conventional graph convolutional neural networks require multiple layers to do so, demonstrating that O-GNN is more expressive. In experiments, O-GNN performs well on 11 public datasets. In particular, it achieves state-of-the-art validation results on the PCQM4Mv1 benchmark (outperforming the previous KDDCup champion solution) and on the drug-drug interaction prediction task on DrugBank. Furthermore, O-GNN outperforms strong baselines (without modeling rings) on the molecular property prediction and retrosynthesis prediction tasks.

1. INTRODUCTION

Cyclic compounds, i.e., molecules that have at least one ring in their structure, occur naturally throughout chemical space. According to our statistics on 109M compounds from PubChem (Kim et al., 2019), a widely used chemical library, more than 90% of compounds have at least one ring. The rings can be small and simple (e.g., benzene is a six-membered carbon ring, and pentazole is a five-membered nitrogen ring) or large and complex (e.g., the molecule shown in Figure 1). Rings are important in drug discovery. For example: (1) Rings can reduce the flexibility of molecules, reduce the uncertainty when interacting with target proteins, and lock molecules into their bioactive conformation (Sun et al., 2012). (2) Macrocyclic compounds, which usually have a ring with more than 12 atoms, play important roles in antibiotics design (Venugopal & Johnson, 2011) and peptide drug design (Bhardwaj et al., 2022).

Recently, deep neural networks, especially graph neural networks (GNNs) (Kipf & Welling, 2017; Hamilton et al., 2017a), have been widely used in molecular modeling. A GNN takes a graph as input and passes messages between nodes along the edges. GNNs have achieved great success in scientific discovery: (1) Stokes et al. (2020) train a GNN to predict growth inhibition of Escherichia coli and find that Halicin is a broad-spectrum bactericidal antibiotic. (2) Shan et al. (2022) leverage GNNs to model the interactions between proteins and eventually obtain possible antibodies for SARS-CoV-2. In addition, GNNs are widely used in drug property prediction (Rong et al., 2020), drug-target interaction modeling (Torng & Altman, 2019), retrosynthesis

availability

//github.

