PREDICTING DRUG REPURPOSING CANDIDATES AND THEIR MECHANISMS FROM A BIOMEDICAL KNOWLEDGE GRAPH

Abstract

Computational drug repurposing is a cost- and time-efficient method for identifying new indications of approved or experimental drugs/compounds. It is especially critical for emerging and/or orphan diseases because it requires lower investment and a shorter research cycle than traditional wet-lab drug discovery approaches. However, the underlying mechanisms of action between repurposed drugs and their target diseases remain largely unknown, an issue that existing repurposing methods have not resolved. As a result, computational drug repurposing has not been widely adopted in clinical settings. In this work, based on a massive biomedical knowledge graph, we propose a computational drug repurposing framework that predicts not only the treatment probabilities between drugs and diseases but also path-based, testable mechanisms of action (MOAs) as their biomedical explanations. Specifically, we utilize the GraphSAGE model in an unsupervised manner to integrate each entity's neighborhood information and employ a Random Forest model to predict the treatment probabilities between pairs of drugs and diseases. Moreover, we train an adversarial actor-critic reinforcement learning model to predict the potential MOA for explaining drug repurposing. To encourage the model to find biologically reasonable paths, we utilize the curated molecular interactions of drugs and a PubMed-publication-based concept distance to extract potential drug MOA paths from the knowledge graph as "demonstration paths" to guide the model during path-finding. Comprehensive experiments and case studies show that the proposed framework outperforms state-of-the-art baselines in both the predictive performance of drug repurposing and the explanatory performance of recapitulating human-curated DrugMechDB-based paths.

1. INTRODUCTION

Traditional drug development is a time-consuming process (from initial chemical identification to clinical trials and finally to FDA approval) that takes around 10-15 years, requires billions of dollars of investment, and has a high failure rate (Berdigaliyev & Aljofan, 2020). Given the rapid pace at which novel diseases evolve, a more efficient and economical drug discovery method is urgently needed. Fortunately, it has been observed that a single drug can often be effective in treating multiple diseases. For example, thalidomide was originally used as an anti-anxiety medication (Miller, 1991) and was later found to have potential for the treatment of cancers (Singhal et al., 1999). Hence, drug repurposing, the identification of new uses for approved or experimental drugs/compounds, may address this urgent need with the advantages of a shorter research cycle, lower investment, and more pre-existing safety tests. Existing drug repurposing approaches can roughly be categorized into three groups: experimental-based approaches (e.g., binding affinity assays, phenotypic screening), clinical-based approaches (e.g., off-label drug use analysis), and computational-based approaches (e.g., network-based approaches) (Dhir et al., 2020). As techniques have advanced, more and more biomedical data have become freely accessible in public databases such as DrugBank (Wishart et al., 2017), ChEMBL (Gaulton et al., 2012), and HMDB (Wishart et al., 2018). This availability makes computational approaches comparatively cost-efficient, particularly when the goal is to prioritize repurposed targets for follow-up experimental investigation.
A computational drug repurposing approach commonly used in recent years integrates existing biomedical relations from databases or literature into a so-called biomedical knowledge graph (BKG), in which unknown drug-disease treatment relationships are predicted via different knowledge graph (KG)-based machine learning models (Himmelstein et al., 2017; Ioannidis et al., 2020b; Zhang et al., 2021; Zhang & Che, 2021). Although these KG-based models have demonstrated good predictive performance for drug repurposing, they struggle to explain, in an intuitive and easy-to-understand fashion, why a drug can be useful for treating a given disease. To address this "black-box" concern for drug repurposing prediction, some methods have been proposed that leverage KG-based paths as explanations, as illustrated in Figure 1. However, these existing models cannot be efficiently applied to a large and general BKG without additional weighted edge information (Sosa et al., 2020) or pre-defined meta-paths derived from domain experts or from inefficient computational methods (e.g., degree-weighted path count, "DWPC") (Liu et al., 2021). In this study, we customize a large and standardized biomedical knowledge graph and propose a computational drug repurposing framework that predicts not only the treatment probabilities between drugs and diseases but also the KG-based mechanism of action (MOA) (Davis, 2020) paths as biomedical explanations of the treatment predictions. For drug repurposing predictions, we first calculate an attribute embedding as the initial feature of each node and employ the GraphSAGE model in an unsupervised manner to further capture the neighborhood information of each node; a Random Forest model is then used to predict the treatment probability of drug-disease pairs based on their embeddings.
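As a rough illustration of the prediction pipeline just described, the sketch below enriches each node's initial feature vector with the mean of its neighbors' vectors and then builds one feature vector per drug-disease pair. It is a deliberately simplified stand-in, not the framework's implementation: real GraphSAGE layers apply learned weight matrices and nonlinearities and are trained (here, in an unsupervised manner), and this section does not state how the two embeddings are combined before the Random Forest, so the concatenation used here is an assumption. The toy graph and the function names `mean_aggregate` and `pair_features` are hypothetical.

```python
# Simplified sketch: untrained neighborhood mean aggregation (the core idea
# that GraphSAGE builds on) followed by drug-disease pair feature construction.
# All names and data below are illustrative assumptions.

def mean_aggregate(features, neighbors, node):
    """Return the node's own vector concatenated with the mean of its neighbors' vectors."""
    own = features[node]
    nbrs = neighbors.get(node, [])
    if nbrs:
        mean = [sum(features[n][i] for n in nbrs) / len(nbrs) for i in range(len(own))]
    else:
        # Isolated node: fall back to a zero vector for the neighborhood part.
        mean = [0.0] * len(own)
    return own + mean

def pair_features(embeddings, drug, disease):
    """Assumed pair representation: concatenation of the two node embeddings."""
    return embeddings[drug] + embeddings[disease]

# Toy knowledge graph: drug -- gene -- disease.
features = {"drug": [1.0, 0.0], "gene": [0.0, 1.0], "disease": [1.0, 1.0]}
neighbors = {"drug": ["gene"], "gene": ["drug", "disease"], "disease": ["gene"]}

embeddings = {node: mean_aggregate(features, neighbors, node) for node in features}
x = pair_features(embeddings, "drug", "disease")
```

A vector such as `x`, paired with a label for whether the drug is known to treat the disease, is the kind of input a Random Forest classifier could be trained on to output a treatment probability.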
To predict the MOA paths, we employ the ADversarial Actor-Critic (ADAC) reinforcement learning (RL) model (Zhao et al., 2020) to perform path-finding on the knowledge graph. To encourage the RL model to find biologically reasonable paths, we augment it with knowledge- and publication-based "demonstration paths", that is, paths that explain why a drug can treat a disease. Although the underlying mechanisms of action between repurposed drugs and their target diseases remain largely unclear, in this study we assume that a repurposed drug treats different diseases through molecular mechanisms similar to its known MOAs. Based on this assumption, we define demonstration paths using the known drug-target interactions from a curated drug database (e.g., DrugBank v5.1 (Wishart et al., 2017)) and a chemical-knowledge-centric data provider (e.g., Molecular Data Provider v1.2; URL given in the footnote), as well as an adjusted PubMed-publication-based version of Normalized Google Distance (NGD) (Cilibrasi & Vitanyi, 2007). In summary, our main contributions are as follows:
• We propose a novel computational framework that both accurately predicts how likely a drug is to treat a disease and predicts the corresponding knowledge graph-based mechanism of action path as the explanation of the predicted treatment.
• We introduce a knowledge-based and publication-based method to extract demonstration paths from a BKG and leverage them to guide the RL model toward biologically reasonable paths. Empirical results demonstrate the effectiveness of this approach.
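For reference, the Normalized Google Distance mentioned above can be sketched as follows. The code implements the standard definition from Cilibrasi & Vitanyi (2007), NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))). The "adjusted" PubMed-based variant used in this work is not specified in this section, so the sketch makes no attempt to reproduce it. Reading f(x) as the number of PubMed articles mentioning concept x, f(x, y) as the number mentioning both concepts, and N as the total number of indexed articles is an assumption, and all counts in the example are made up for illustration.

```python
import math

def normalized_google_distance(fx, fy, fxy, n_total):
    """Standard NGD computed from occurrence counts.

    fx, fy  : number of documents mentioning concept x / concept y (assumed PubMed counts)
    fxy     : number of documents mentioning both concepts
    n_total : total number of documents in the corpus
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        # Concepts that never occur, or never co-occur, are treated as infinitely distant.
        return math.inf
    if n_total <= min(fx, fy):
        raise ValueError("n_total must exceed the smaller of the two single-concept counts")
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n_total) - min(log_fx, log_fy))

# Made-up counts: two concepts that always appear together are at distance 0.
identical = normalized_google_distance(100, 100, 100, 10_000)
# Made-up counts: rarer co-occurrence yields a larger distance.
partial = normalized_google_distance(1_000, 100, 10, 1_000_000)
```

Smaller values indicate concepts that co-occur more often in the literature, which is the property that makes such a distance usable for filtering candidate paths between a drug and a disease.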
By comparing against existing popular KG-based models and evaluating the predicted paths with DrugMechDB (Mayers et al., 2020), an expert-curated, path-based drug MOA database, we show that the proposed framework outperforms state-of-the-art baseline models in both the predictive performance of drug repurposing and the explanatory performance of recapitulating the human-curated MOA paths provided by DrugMechDB. In further case studies, by comparing the model predictions with real regulatory networks, we show that the proposed framework is effective in identifying biologically reasonable KG-based paths for real-world applications.



https://github.com/NCATSTranslator/Translator-All/wiki/Molecular-Data-Provider



Figure 1: Drug repurposing prediction and path-based explanation.

