PREDICTING DRUG REPURPOSING CANDIDATES AND THEIR MECHANISMS FROM A BIOMEDICAL KNOWLEDGE GRAPH

Abstract

Computational drug repurposing is a cost- and time-efficient way to identify new indications for approved or experimental drugs/compounds. It is especially valuable for emerging and/or orphan diseases because it requires less investment and a shorter research cycle than traditional wet-lab drug discovery. However, the mechanisms of action linking repurposed drugs to their target diseases remain largely unknown, and existing repurposing methods do not address this gap. As a result, computational drug repurposing has not been widely adopted in clinical settings. In this work, we propose a computational drug repurposing framework, built on a massive biomedical knowledge graph, that predicts not only the treatment probabilities between drugs and diseases but also path-based, testable mechanisms of action (MOAs) that serve as their biomedical explanations. Specifically, we use the GraphSAGE model in an unsupervised manner to integrate each entity's neighborhood information, and we employ a Random Forest model to predict the treatment probabilities between pairs of drugs and diseases. Moreover, we train an adversarial actor-critic reinforcement learning model to predict the potential MOAs that explain drug repurposing. To encourage the model to find biologically reasonable paths, we use the curated molecular interactions of drugs and a PubMed-publication-based concept distance to extract potential drug MOA paths from the knowledge graph as "demonstration paths" that guide the model during pathfinding. Comprehensive experiments and case studies show that the proposed framework outperforms state-of-the-art baselines both in predictive performance for drug repurposing and in explanatory performance, measured by how well it recapitulates human-curated DrugMechDB-based paths.

1. INTRODUCTION

Traditional drug development is a time-consuming process (from initial chemical identification to clinical trials and finally to FDA approval) that takes around 10-15 years, requires billions of dollars of investment, and suffers from high failure rates (Berdigaliyev & Aljofan, 2020). Given the rapid pace at which novel diseases emerge, a more efficient and economical drug discovery method is urgently needed. Fortunately, it has been observed that a single drug can often be effective in treating multiple diseases. For example, thalidomide was originally used as an anti-anxiety medication (Miller, 1991) and was later found to have potential for the treatment of cancers (Singhal et al., 1999). Hence, drug repurposing, that is, the identification of new uses for approved or experimental drugs/compounds, offers a way to address this urgent need, with the advantages of a shorter research cycle, lower investment, and more pre-existing safety tests. Existing drug repurposing approaches can roughly be categorized into three groups: experimental-based approaches (e.g., binding affinity assays, phenotypic screening), clinical-based approaches (e.g., off-label drug use analysis), and computational-based approaches (e.g., network-based approaches) (Dhir et al., 2020). Thanks to technical advances, more and more biomedical data are freely accessible in public databases such as DrugBank (Wishart et al., 2017), ChEMBL (Gaulton et al., 2012), and HMDB (Wishart et al., 2018), which makes the computational approaches appear more cost-efficient, particularly when the goal is to prioritize repur-

