EFFICIENT APPROXIMATIONS OF COMPLETE INTER-ATOMIC POTENTIALS FOR CRYSTAL PROPERTY PREDICTION

Abstract

We study the problem of crystal material property prediction. A crystal structure consists of a minimal unit cell that is repeated infinitely in 3D space. How to accurately represent such repetitive structures in machine learning models remains unresolved. Current methods construct graphs by establishing edges only between nearby nodes, thereby failing to faithfully capture infinite repeating patterns and distant interatomic interactions. In this work, we propose several innovations to overcome these limitations. First, we propose to model physics-principled interatomic potentials directly instead of only using distances as in existing methods. These potentials include the Coulomb potential, London dispersion potential, and Pauli repulsion potential. Second, we propose to model the complete set of potentials among all atoms, instead of only between nearby atoms as in prior methods. This is enabled by our approximations of infinite potential summations with provable error bounds. We further develop efficient algorithms to compute the approximations. Finally, we propose to incorporate our computations of complete interatomic potentials into message passing neural networks for representation learning. We perform experiments on the JARVIS and Materials Project benchmarks for evaluation. Results show that the use of complete interatomic potentials leads to consistent performance improvements with reasonable computational costs.

1. INTRODUCTION

The past decade has witnessed a surge of interest and rapid development in machine learning for molecular analysis (Duvenaud et al., 2015). These initial studies mainly focus on prediction and generation problems for small molecules. To enable computational analyses, molecules need to be featurized in an appropriate mathematical representation. Recently, with the advances of graph neural networks (GNNs) (Gilmer et al., 2017; Battaglia et al., 2018), molecules are more commonly represented as graphs in which each node corresponds to an atom, and each edge corresponds to a chemical bond (Stokes et al., 2020; Wang et al., 2022c). A variety of molecular graph prediction (Stokes et al., 2020; Wang et al., 2022c) and generation (Shi et al., 2019; Jin et al., 2018; Luo et al., 2021) methods have been developed based on 2D molecular graph representations. A key limitation of 2D graph representations is that the 3D geometries of molecules are not captured, but such information may be critical in many molecular property prediction problems (Hu et al., 2021). To enable the encoding of 3D molecular geometries in GNNs, a series of 3D GNN methods have been developed for prediction (Schütt et al., 2017; Gasteiger et al., 2019; Liu et al., 2022b; Wang et al., 2022b) and generation (Liu et al., 2022a; Luo & Ji, 2022; Hoogeboom et al., 2022) problems. In these 3D graph representations, each node is associated with the corresponding atom's coordinates in 3D space. Geometric information, such as distances between nodes and angles between edges, is used during message passing in GNNs. Recently, these methods have been extended to learn representations for proteins (Jing et al., 2020; Wang et al., 2022a). Inspired by the success of GNNs on small molecules, Xie & Grossman (2018) developed the crystal graph convolutional neural network (CGCNN) for crystal material property prediction.
Unlike small molecules and proteins, crystal materials are typically modeled by a minimal unit cell (similar to a small molecule) that is repeated in 3D space along certain directions and with certain step sizes. In theory, the unit cell is repeated infinitely in 3D space, whereas any real-world material has finite size. However, given that our modeling is at the atomic level, modeling crystal materials as infinite repetitions of unit cells is approximately accurate. Therefore, a key challenge in crystal material modeling is how to accurately capture the infinite-range interatomic interactions resulting from the repetitions of unit cells in 3D space. Current GNN-based crystal property prediction methods construct graphs by creating edges only between atoms within a pre-specified distance threshold (Xie & Grossman, 2018; Chen et al., 2019; Louis et al., 2020; Schmidt et al., 2021; Choudhary & DeCost, 2021). Thus, they fail to capture interactions between distant atoms explicitly. In this work, we propose a new graph deep learning method, PotNet, with several innovations to significantly advance the field of crystal material modeling. First, we propose to model interatomic potentials directly as edge features in PotNet, instead of using distances as in prior methods. These potentials include the Coulomb potential (West, 1988), London dispersion potential (Wagner & Schreiner, 2015), and Pauli repulsion potential (Krane, 1991). Second, a distinguishing feature of PotNet is that it models the complete set of potentials among all atoms, instead of only between nearby atoms as in prior methods. This is enabled by our approximations of infinite potential summations with provable error bounds. We further develop efficient algorithms to compute the approximations. Finally, we propose to incorporate our computations of complete interatomic potentials into message passing neural networks for representation learning.
We performed comprehensive experiments on the JARVIS and Materials Project benchmarks to evaluate our methods. Results show that the use of complete interatomic potentials in our methods leads to consistent performance improvements with reasonable computational costs.

2.1. CRYSTAL REPRESENTATION AND PROPERTY PREDICTION

A crystal structure can be represented as periodic repetitions of unit cells in three-dimensional (3D) Euclidean space, where the unit cell contains the smallest repeatable structure of a given crystal. Specifically, let $n$ be the number of atoms in the unit cell; a crystal can then be represented as $M = (A, L)$. Here, $A = \{a_i\}_{i=1}^{n} = \{(x_i, p_i)\}_{i=1}^{n}$ describes one of the unit cell structures of $M$, where $x_i \in \mathbb{R}^{b}$ and $p_i \in \mathbb{R}^{3}$ denote the $b$-dimensional feature vector and the 3D Cartesian coordinates of the $i$-th atom in the unit cell, respectively. $L = [l_1, l_2, l_3] \in \mathbb{R}^{3 \times 3}$ is the lattice matrix describing how a unit cell repeats itself in 3D space. In the complete crystal structure, every atom in a unit cell repeats itself periodically in 3D space. Specifically, from an arbitrary integer vector $k \in \mathbb{Z}^{3}$ and the unit cell structure $A$, we can always obtain another repeated unit cell structure $A^{k} = \{a_i^{k}\}_{i=1}^{n} = \{(x_i^{k}, p_i^{k})\}_{i=1}^{n}$, where $x_i^{k} = x_i$ and $p_i^{k} = p_i + Lk$. Hence, the complete crystal structure $A$ of $M$ with all unit cells can be described as

$$A = \bigcup_{k \in \mathbb{Z}^{3}} A^{k}. \tag{1}$$

In this work, we study the problem of crystal property prediction. Our objective is to learn a property prediction model $f : M \to y \in \mathbb{R}$ that can predict the property $y$ of the given crystal structure $M$. We focus on predicting the total energy and other energy-related properties of crystals.
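As an illustration of the repetition rule p_i^k = p_i + Lk, the following minimal sketch enumerates a finite block of repeated unit cells. The function name `repeat_unit_cell`, the cutoff `k_max`, and the cubic example cell are hypothetical choices made only for this illustration; the complete structure in Eq. (1) ranges over all integer vectors k, which no finite enumeration can cover.

```python
from itertools import product


def repeat_unit_cell(positions, lattice_vectors, k_max):
    """Return {k: [p_i + L k for every atom i]} for all integer vectors k
    whose components satisfy |k_j| <= k_max.

    positions:       list of 3D Cartesian coordinates p_i of the unit cell atoms
    lattice_vectors: the three lattice vectors (l1, l2, l3), i.e. the columns of L
    k_max:           illustrative truncation of the infinite set Z^3 (an assumption)
    """
    l1, l2, l3 = lattice_vectors
    images = {}
    for k in product(range(-k_max, k_max + 1), repeat=3):
        # L k = k1 * l1 + k2 * l2 + k3 * l3, computed per Cartesian component
        shift = tuple(k[0] * l1[c] + k[1] * l2[c] + k[2] * l3[c] for c in range(3))
        images[k] = [tuple(p[c] + shift[c] for c in range(3)) for p in positions]
    return images


# Hypothetical example: a cubic cell with edge length 4.0 and two atoms.
lattice = [(4.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, 0.0, 4.0)]
atoms = [(0.0, 0.0, 0.0), (2.0, 2.0, 2.0)]
images = repeat_unit_cell(atoms, lattice, k_max=1)
print(len(images))           # number of repeated cells generated
print(images[(1, 0, -1)])    # atom positions in the cell shifted by k = (1, 0, -1)
```

The feature vectors x_i are omitted because they are identical in every repeated cell (x_i^k = x_i); only the positions change.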

2.2. CRYSTAL PROPERTY PREDICTION WITH INTERATOMIC POTENTIALS

Most classical crystal energy prediction methods are based on interatomic potentials. According to studies in physics (West, 1988; Daw et al., 1993; Brown, 2016), the total energy of a crystal structure can be approximated by the summation of interatomic potentials in the crystal. In particular, the following three categories of interatomic potentials are widely used for crystals, and they are considered sufficient for accurate energy approximation.

• Coulomb potential is caused by the electrostatic interaction of two atoms with charges. Coulomb potentials are closely related to ionic bonding and metallic bonding in crystals (West, 1988). For any two atoms $a$ and $b$, let $z_a$ and $z_b$ denote the number of charges in atoms $a$ and $b$, and let $d(a, b)$ be the Euclidean distance between them. The Coulomb potential $V_{\mathrm{Coulomb}}$ is defined as $V_{\mathrm{Coulomb}}(a, b) = -\frac{z_a z_b e^{2}}{4 \pi \epsilon_0 d(a, b)}$. Here $e$ is the elementary charge constant, and $\epsilon_0$ is the permittivity constant of free space.
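The pairwise Coulomb formula above can be evaluated directly. The sketch below is only an illustration: the function name `coulomb_potential` and the choice of units (distances in ångströms, energies in electronvolts) are assumptions not stated in the text, and the leading minus sign simply follows the formula as written. The constants are the standard SI values of the elementary charge and the vacuum permittivity.

```python
import math

E_CHARGE = 1.602176634e-19    # elementary charge e, in coulombs
EPSILON_0 = 8.8541878128e-12  # permittivity of free space, in F/m
ANGSTROM = 1e-10              # metres per angstrom (unit choice is an assumption)


def coulomb_potential(z_a, z_b, pos_a, pos_b):
    """Evaluate V_Coulomb(a, b) = -z_a z_b e^2 / (4 pi eps_0 d(a, b)).

    z_a, z_b:     number of charges on atoms a and b
    pos_a, pos_b: 3D Cartesian coordinates in angstroms
    Returns the potential in electronvolts, with the sign convention
    of the formula as written in the text.
    """
    d = math.dist(pos_a, pos_b)  # Euclidean distance d(a, b); Python 3.8+
    if d == 0.0:
        raise ValueError("atoms a and b must be at distinct positions")
    joules = -z_a * z_b * E_CHARGE**2 / (4.0 * math.pi * EPSILON_0 * d * ANGSTROM)
    return joules / E_CHARGE     # convert joules to electronvolts


# Two singly charged atoms one angstrom apart, then two angstroms apart.
v1 = coulomb_potential(1, 1, (0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
v2 = coulomb_potential(1, 1, (0.0, 0.0, 0.0), (2.0, 0.0, 0.0))
print(v1, v2)
```

Because the potential decays only as 1/d, contributions from distant repeated atoms fall off slowly, which is why restricting edges to nearby atoms discards a non-negligible part of the interaction.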

