PHYSICS-EMPOWERED MOLECULAR REPRESENTATION LEARNING

Abstract

Estimating the energetic properties of molecular systems is a critical task in material design. Because of the trade-off between accuracy and computational cost, various methods have been used to predict the energy of materials, including recent neural-net-based models. However, most existing neural-net models are context-free (physics-ignoring) black-box models, which limits them to predicting energy only within the distribution of the training set and thus prevents them from being applied to molecular design in practice. Inspired by the physical mechanism of the interatomic potential, we propose a physics-driven energy prediction model using a Transformer. Our model is trained not only on energy regression over the training set, but also with conditions inspired by physical insights and with self-supervision based on Masked Atomic Modeling. This makes it adaptable to the optimization of molecular structures beyond the range observed during training, taking a step towards realizable molecular structure optimization.

1. INTRODUCTION

Material simulation is a vast research field that spans understanding a material's optimal structure, simulating microscopic dynamics as a function of time, temperature, and pressure beyond experimental resolution, and reducing trial-and-error loops in designing new materials. The foundation of such simulation is defining the energy at the atomic level while accounting for interactions between numerous atoms, the so-called many-body problem. Advances in theory and computational capability have made energy prediction more accurate than ever before. Despite these advances, however, many-body interactions between atoms are exponentially complex, so increasing raw computing power alone has fundamental limitations. Thus, effort must go into reducing computational cost as well as into improving prediction accuracy (e.g., with quantum mechanics), which has been a grand challenge in computational material simulations.

Quantum mechanical electronic structure simulation follows the direction of enhancing accuracy. Specifically, Density Functional Theory (DFT; Kohn & Sham, 1965; Parr, 1980) is one of the most successful methods in terms of accuracy, describing and predicting the structure of a material, temperature-dependent dynamics, phase changes, and chemical reactions. However, the number of atoms that DFT can practically handle is limited to a few thousand. It is therefore difficult to compute dynamics at a larger scale with DFT, which calls for an alternative method that scales up. Another direction is to reduce the computational cost by expediting computation through approximation while still considering many-body interactions. For systems of much larger scale, so-called classical force field approximations (Harrison et al., 2018) have been developed.
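
To make the notion of a classical force field concrete, the following minimal sketch evaluates a toy energy as a sum of simple analytic terms: a harmonic term for each bonded pair and a Lennard-Jones term for each non-bonded pair. The two functional forms are standard textbook choices, but the function names and all parameter values (`k`, `r0`, `epsilon`, `sigma`) are illustrative assumptions and are not taken from any force field cited in this paper.

```python
import math
from itertools import combinations


def harmonic_bond(r, k, r0):
    """Harmonic bond-stretch energy around the equilibrium length r0."""
    return 0.5 * k * (r - r0) ** 2


def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones energy for a non-bonded pair at distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def toy_force_field_energy(positions, bonds,
                           k=300.0, r0=1.0, epsilon=0.1, sigma=3.0):
    """Sum term-by-term energies over all atom pairs.

    positions: list of (x, y, z) tuples.
    bonds: list of (i, j) index pairs treated as bonded.
    All parameter defaults are hypothetical values for illustration only.
    """
    bonded = {frozenset(b) for b in bonds}
    energy = 0.0
    for i, j in combinations(range(len(positions)), 2):
        r = math.dist(positions[i], positions[j])
        if frozenset((i, j)) in bonded:
            energy += harmonic_bond(r, k, r0)
        else:
            energy += lennard_jones(r, epsilon, sigma)
    return energy


# Usage: three collinear atoms, with atoms 0 and 1 bonded.
atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
total = toy_force_field_energy(atoms, bonds=[(0, 1)])
```

In this sketch the bond list and the parameters are fixed inputs, so the energy expression itself has to be edited by hand whenever the bonding pattern or the chemical environment changes.
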
Various classical force field potentials have been proposed, sharing a standard scheme: formulating a term-by-term energy equation using chemical intuition and parameterizing it for the target system. This approach also has drawbacks. First, building a classical force field requires extensive human effort to define the energy terms, which demands an understanding of the target system, and to parameterize them. Second, the potential is described in terms of the bonding of the material, so its applicability is restricted to non-reactive materials within similar phases, since the parameters must be changed depending on the chemical environment. The computationally heavier ReaxFF (Senftle et al., 2016; Gomzi et al., 2021) exists, but it focuses on chemical reactions and requires parameters tailored to the specific reaction. Force fields struggle to secure accuracy despite a great deal of effort, and generalization to various chemical situations remains an open question. Nevertheless, because a force field is based on equations, it can mimic potentials with few parameters and thus scales better than quantum simulation, which requires energy optimization by self-consistent calculations.

Recently, machine learning has made an impact on material simulation. Specifically, machine learning (ML) surrogate potentials, which predict the energies of given molecules, have drawn much attention from the scientific community because they offer an alternative to the aforementioned trade-off between cost and accuracy of physics-based models. Several state-of-the-art ML surrogate potentials have achieved accuracy close to that of quantum mechanics while requiring significantly fewer computational resources than traditional DFT-based methods. One might argue that we can produce an infinite training set using DFT, and that a fully data-driven approach would eventually work well on this problem, just as on other regression problems.
However, considering the exponentially increasing complexity of this problem and limited resources, it is desirable to equip the surrogate potential model with domain knowledge from physics. In this paper, we propose a hybrid approach that combines the expressive power of Transformers (Vaswani et al., 2017) with classical force-field-style equations. In particular, our paper's main contributions are summarized as follows:

• We propose a physics-empowered molecular representation learning method that actually preserves the optimal structure instead of simply fitting a single energy value.
• Taking advantage of Transformers, our molecular representations are trained in a self-supervised manner by Masked Atomic Modeling, inspired by Masked Language Modeling.
• We conduct extensive experiments to evaluate the proposed model quantitatively and qualitatively, introducing several novel approaches to evaluate the model's ability to capture the optimal structure of a molecule.

Fixed descriptors. Behler & Parrinello (2007) introduced an ML potential model that uses atom-centered symmetry functions (descriptors) to describe the local environment of each atom and passes the descriptor values to a simple feed-forward neural network that maps them to the total energy. These descriptors process distance and angle information between paired atoms within a specific cutoff, and each produces a single value. The Behler-Parrinello Neural Network (BPNN; Behler & Parrinello, 2007) family is the representative practical example of increasing model complexity for high-dimensional Potential Energy Surfaces (PES) compared to previous kernel-based methods. BPNN was the first realistic attempt to decompose the total energy into a sum of individual atomic energies. A fundamental limitation of this approach is that fixed descriptors are insufficient to cover complex spatial patterns (e.g., ring structures, bond types, or chemical functional groups), limiting knowledge transferability between different molecules. Also, the original symmetry functions do not reflect the chemical environment outside the cutoff at all (Kulichenko et al., 2021). Despite these limitations, BPNN achieved accuracy that no previous classical force field had reached, and it has been shown to work for dense systems with many atoms and a few species (Behler, 2015; Kulichenko et al., 2021).

Recently, deep neural networks have been actively applied to construct surrogate potentials. Most models in this category allow chemical environment information to be transferred between atoms over a greater distance than traditional models do, providing a higher degree of freedom. As a specific example, ANI (Smith et al., 2017) extended BPNN by modifying its angular function. Gilmer et al. (2017) proposed Message Passing Neural Networks (MPNN), specialized in learning from graph-structured data by updating hidden node states with messages combined from adjacent nodes. Since then, various graph-based approaches (Schütt et al., 2018; Gasteiger et al., 2020; Unke & Meuwly, 2019) have been proposed. MPNNs significantly improved accuracy in molecule-related tasks on QM9 (Ruddigkeit et al., 2012; Reymond, 2015; Ramakrishnan et al., 2014), while the increased model capability carries a risk of overfitting (Hawkins,

