ONE TRANSFORMER CAN UNDERSTAND BOTH 2D & 3D MOLECULAR DATA

Abstract

Unlike vision and language data, which usually have a unique format, molecules can naturally be characterized using different chemical formulations. One can view a molecule as a 2D graph or define it as a collection of atoms located in 3D space. For molecular representation learning, most previous works designed neural networks only for a particular data format, making the learned models likely to fail on other data formats. We believe a general-purpose neural network model for chemistry should be able to handle molecular tasks across data modalities. To achieve this goal, in this work we develop a novel Transformer-based molecular model called Transformer-M, which can take molecular data in either 2D or 3D format as input and generate meaningful semantic representations. Using the standard Transformer as the backbone architecture, Transformer-M develops two separate channels to encode 2D and 3D structural information and incorporates them with the atom features in the network modules. When the input data is in a particular format, the corresponding channel is activated and the other is disabled. By training on 2D and 3D molecular data with properly designed supervised signals, Transformer-M automatically learns to leverage knowledge from different data modalities and correctly capture the representations. We conducted extensive experiments on Transformer-M. All empirical results show that Transformer-M can simultaneously achieve strong performance on 2D and 3D tasks, suggesting its broad applicability. The code and models will be made publicly available at https://github.com/lsj2408/Transformer-M.
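The two-channel mechanism described above can be illustrated with a minimal sketch (all function and field names below are hypothetical placeholders, not from the released codebase): each channel turns one modality's structural information into an additive attention bias, and only the channel whose input format is present contributes, while the other is disabled.

```python
import numpy as np

def spatial_bias_2d(shortest_path_dist, scale=0.1):
    """Hypothetical 2D channel: attention bias from graph shortest-path distances."""
    return -scale * shortest_path_dist

def spatial_bias_3d(coords, scale=0.1):
    """Hypothetical 3D channel: attention bias from pairwise Euclidean distances."""
    diff = coords[:, None, :] - coords[None, :, :]
    return -scale * np.linalg.norm(diff, axis=-1)

def attention_bias(mol):
    """Activate only the channel(s) matching the input modality.

    `mol` is a dict with "num_atoms" plus "shortest_path_dist" (2D input)
    and/or "coords" (3D input); an absent modality's channel stays disabled.
    """
    n = mol["num_atoms"]
    bias = np.zeros((n, n))
    if "shortest_path_dist" in mol:   # 2D graph structure available
        bias += spatial_bias_2d(mol["shortest_path_dist"])
    if "coords" in mol:               # 3D geometry available
        bias += spatial_bias_3d(mol["coords"])
    return bias                       # added to the attention logits
```

In the actual model the biases are learned functions of these structural inputs rather than fixed scalings, but the gating logic is the same: the per-pair bias matrix is simply added to the Transformer's attention scores, so a missing modality contributes nothing.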

1. INTRODUCTION

Deep learning approaches have revolutionized many domains, including computer vision (He et al., 2016), natural language processing (Devlin et al., 2019; Brown et al., 2020), and games (Mnih et al., 2013; Silver et al., 2016). Recently, researchers have started investigating whether the power of neural networks could help solve important scientific problems in chemistry, e.g., predicting the properties of molecules and simulating molecular dynamics from large-scale training data (Hu et al., 2020a; 2021; Zhang et al., 2018; Chanussot et al., 2020). One key difference between chemistry and conventional domains such as vision and language is the multimodality of data. In vision and language, a data instance is usually characterized in a particular form. For example, an image is defined as RGB values in a pixel grid, while a sentence is defined as tokens in a sequence. In contrast, molecules naturally have different chemical formulations. A molecule can be represented as a sequence (Weininger, 1988), a 2D graph (Wiswesser, 1985), or a collection of atoms located in 3D space. 2D and 3D structures are the most popular formulations, as many valuable properties and statistics can be obtained from them (Chmiela et al., 2017; Stokes et al., 2020). However, as far as we know, most previous works focus on designing neural network models for either 2D or 3D structures, so a model learned in one form fails to apply to tasks of the other form. We argue that a general-purpose neural network model in chemistry should at least be able to handle molecular tasks across data modalities. In this paper, we take the first step toward this goal by

