EFFICIENT APPROXIMATIONS OF COMPLETE INTERATOMIC POTENTIALS FOR CRYSTAL PROPERTY PREDICTION

Abstract

We study the problem of crystal material property prediction. A crystal structure consists of a minimal unit cell that is repeated infinitely in 3D space. How to accurately represent such repetitive structures in machine learning models remains unresolved. Current methods construct graphs by establishing edges only between nearby nodes, thereby failing to faithfully capture infinite repeating patterns and distant interatomic interactions. In this work, we propose several innovations to overcome these limitations. First, we propose to model physics-principled interatomic potentials directly instead of only using distances as in existing methods. These potentials include the Coulomb potential, London dispersion potential, and Pauli repulsion potential. Second, we propose to model the complete set of potentials among all atoms, instead of only between nearby atoms as in prior methods. This is enabled by our approximations of infinite potential summations with provable error bounds. We further develop efficient algorithms to compute the approximations. Finally, we propose to incorporate our computations of complete interatomic potentials into message passing neural networks for representation learning. We perform experiments on the JARVIS and Materials Project benchmarks for evaluation. Results show that the use of complete interatomic potentials leads to consistent performance improvements with reasonable computational costs.

1. INTRODUCTION

The past decade has witnessed a surge of interest and rapid developments in machine learning for molecular analysis (Duvenaud et al., 2015). These initial studies mainly focus on the prediction and generation problems of small molecules. To enable computational analyses, molecules need to be featurized in an appropriate mathematical representation. Recently, with the advances of graph neural networks (GNNs) (Gilmer et al., 2017; Battaglia et al., 2018), molecules are more commonly represented as graphs in which each node corresponds to an atom and each edge corresponds to a chemical bond (Stokes et al., 2020; Wang et al., 2022c). A variety of molecular graph prediction (Stokes et al., 2020; Wang et al., 2022c) and generation (Shi et al., 2019; Jin et al., 2018; Luo et al., 2021) methods have been developed based on 2D molecular graph representations. A key limitation of 2D graph representations is that they do not capture the 3D geometries of molecules, but such information may be critical in many molecular property prediction problems (Hu et al., 2021). To enable the encoding of 3D molecular geometries in GNNs, a series of 3D GNN methods have been developed for prediction (Schütt et al., 2017; Gasteiger et al., 2019; Liu et al., 2022b; Wang et al., 2022b) and generation (Liu et al., 2022a; Luo & Ji, 2022; Hoogeboom et al., 2022) problems. In these 3D graph representations, each node is associated with the corresponding atom's coordinates in 3D space. Geometric information, such as distances between nodes and angles between edges, is used during message passing in GNNs. Recently, these methods have been extended to learn representations for proteins (Jing et al., 2020; Wang et al., 2022a). Inspired by the success of GNNs on small molecules, Xie & Grossman (2018) developed the crystal graph convolutional neural network (CGCNN) for crystal material property prediction.
Different from small molecules and proteins, crystal materials are typically modeled by a minimal unit cell (similar to a small molecule) that is repeated in 3D space along certain directions with certain step sizes. In theory, the unit cell is repeated infinitely in 3D space, while any real-world material has finite size; given that our modeling is at the atomic level, however, modeling crystal materials as infinite repetitions of unit cells is approximately accurate. Therefore, a key challenge in crystal material modeling is how to accurately capture the infinite-range interatomic interactions resulting from the repetition of unit cells in 3D space. Current GNN-based crystal property prediction methods construct graphs by creating edges only between atoms within a pre-specified distance threshold (Xie & Grossman, 2018; Chen et al., 2019; Louis et al., 2020; Schmidt et al., 2021; Choudhary & DeCost, 2021). Thus, they fail to capture interactions between distant atoms explicitly. In this work, we propose a new graph deep learning method, PotNet, with several innovations to significantly advance the field of crystal material modeling. First, we propose to model interatomic potentials directly as edge features in PotNet, instead of using distances as in prior methods. These potentials include the Coulomb potential (West, 1988), London dispersion potential (Wagner & Schreiner, 2015), and Pauli repulsion potential (Krane, 1991). Second, a distinguishing feature of PotNet is that it models the complete set of potentials among all atoms, instead of only between nearby atoms as in prior methods. This is enabled by our approximations of infinite potential summations with provable error bounds. We further develop efficient algorithms to compute the approximations. Finally, we propose to incorporate our computations of complete interatomic potentials into message passing neural networks for representation learning.
We performed comprehensive experiments on the JARVIS and Materials Project benchmarks to evaluate our methods. Results show that the use of complete interatomic potentials in our methods leads to consistent performance improvements with reasonable computational costs.

2.1. CRYSTAL REPRESENTATION AND PROPERTY PREDICTION

A crystal structure can be represented as periodic repetitions of unit cells in the three-dimensional (3D) Euclidean space, where the unit cell contains the smallest repeatable structure of a given crystal. Specifically, let $n$ be the number of atoms in the unit cell; then a crystal can be represented as $M = (A, L)$. Here, $A = \{a_i\}_{i=1}^n = \{(x_i, p_i)\}_{i=1}^n$ describes one of the unit cell structures of $M$, where $x_i \in \mathbb{R}^b$ and $p_i \in \mathbb{R}^3$ denote the $b$-dimensional feature vector and the 3D Cartesian coordinates of the $i$-th atom in the unit cell, respectively. $L = [l_1, l_2, l_3] \in \mathbb{R}^{3 \times 3}$ is the lattice matrix describing how a unit cell repeats itself in 3D space. In the complete crystal structure, every atom in a unit cell repeats itself periodically in 3D space. Specifically, for an arbitrary integer vector $k \in \mathbb{Z}^3$ and the unit cell structure $A$, we can always obtain a repeated unit cell structure $A_k = \{a_i^k\}_{i=1}^n = \{(x_i^k, p_i^k)\}_{i=1}^n$, where $x_i^k = x_i$ and $p_i^k = p_i + Lk$. Hence, the complete crystal structure $\hat{A}$ of $M$ with all unit cells can be described as
$$\hat{A} = \bigcup_{k \in \mathbb{Z}^3} A_k. \quad (1)$$
In this work, we study the problem of crystal property prediction. Our objective is to learn a property prediction model $f: M \rightarrow y \in \mathbb{R}$ that predicts the property $y$ of a given crystal structure $M$. We focus on predicting the total energy or other energy-related properties of crystals.
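The repetition rule $p_i^k = p_i + Lk$ of Eqn. (1) can be sketched in a few lines of numpy. The cubic lattice, the two-atom cell, and the row-vector lattice convention below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def repeated_cell(coords, lattice, k):
    """Translate unit-cell Cartesian coordinates by the lattice vector sum_i k_i * l_i.

    coords:  (n, 3) array, Cartesian positions p_i of the atoms in the unit cell
    lattice: (3, 3) array with rows l_1, l_2, l_3, so L k = k @ lattice
    k:       length-3 integer vector selecting the image cell
    """
    k = np.asarray(k, dtype=float)
    return coords + k @ lattice

# Toy cubic cell (edge length 2) with two atoms; values are illustrative only.
L = np.eye(3) * 2.0
A = np.array([[0.0, 0.0, 0.0],
              [1.0, 1.0, 1.0]])
image = repeated_cell(A, L, [1, 0, 0])   # the unit cell shifted one lattice step along l_1
```

Iterating over all $k \in \mathbb{Z}^3$ (in practice, a truncated range) enumerates the complete structure $\hat{A}$.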

2.2. CRYSTAL PROPERTY PREDICTION WITH INTERATOMIC POTENTIALS

Most of the classical crystal energy prediction methods are based on interatomic potentials. According to studies in physics (West, 1988; Daw et al., 1993; Brown, 2016), the total energy of a crystal structure can be approximated by the summation of interatomic potentials in the crystal. In particular, the following three categories of interatomic potentials are widely used for crystals, and they are considered sufficient for accurate energy approximation.

• Coulomb potential is caused by the electrostatic interaction of two charged atoms. Coulomb potentials are closely related to ionic bonding and metallic bonding in crystals (West, 1988). For any two atoms $a$ and $b$, let $z_a$ and $z_b$ denote their numbers of charges, and let $d(a, b)$ be the Euclidean distance between them. The Coulomb potential is defined as $V_{\text{Coulomb}}(a, b) = -\frac{z_a z_b e^2}{4\pi\epsilon_0 d(a, b)}$, where $e$ is the elementary charge constant and $\epsilon_0$ is the permittivity constant of free space.

• London dispersion potential describes the van der Waals interaction between atoms. It is often considered in energy estimation because its contribution is cumulative over the volume of crystals (Wagner & Schreiner, 2015) and is sometimes very strong in bulk crystals, such as crystals containing sulfur and phosphorus. The mathematical form of this potential is $V_{\text{London}}(a, b) = -\epsilon / d^6(a, b)$, where $\epsilon$ is a hyperparameter.

• Pauli repulsion potential results from the Pauli exclusion principle, which holds in all crystal structures. The Pauli exclusion principle forces any two atoms to be sufficiently far away from each other so that their electron orbitals do not overlap. Such exclusion interactions lead to the Pauli repulsion potential of the form $V_{\text{Pauli}}(a, b) = e^{-\alpha d(a, b)}$, where $\alpha$ is a hyperparameter (Buckingham, 1938; Slater, 1928).
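The three potential forms above are one-liners; a minimal sketch with the paper's sign conventions, where the constants $e$, $\epsilon_0$, $\epsilon$, and $\alpha$ are set to illustrative unit values rather than physical ones:

```python
import math

def coulomb(z_a, z_b, d, e=1.0, eps0=1.0):
    """V_Coulomb(a, b) = -z_a z_b e^2 / (4 pi eps0 d); units absorbed into e and eps0."""
    return -z_a * z_b * e ** 2 / (4 * math.pi * eps0 * d)

def london(d, eps=1.0):
    """V_London(a, b) = -eps / d^6."""
    return -eps / d ** 6

def pauli(d, alpha=1.0):
    """V_Pauli(a, b) = exp(-alpha * d)."""
    return math.exp(-alpha * d)
```

Note the very different decay rates: algebraic for the first two ($1/d$ and $1/d^6$) and exponential for Pauli repulsion, which is why their infinite lattice summations behave so differently later in the paper.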

2.3. CRYSTAL PROPERTY PREDICTION WITH DEEP LEARNING

While physics-based methods have long been used to predict crystal energies, they are usually crystal-specific, i.e., one method can only achieve accurate approximation for one specific type of crystal. Recently, thanks to advances in deep learning, many studies have sought to develop a general crystal property predictor covering a variety of crystals with powerful deep neural network models. Some studies (Wang et al., 2021; Jha et al., 2018; 2019; Goodall & Lee, 2020) represent crystals as chemical formulas and adopt sequence models to predict properties from these string representations. More recent studies instead treat crystals as 3D graphs and apply expressive 3D GNNs (Schütt et al., 2017; Klicpera et al., 2020b; Liu et al., 2022b), a family of deep neural networks specifically designed for 3D graph-structured data, to crystal representation learning. CGCNN (Xie & Grossman, 2018) is the first method that proposes to represent crystals with radius graphs and adopts a graph convolutional network to predict properties from the graph. Building on this pioneering exploration, many subsequent studies (Schmidt et al., 2021; Louis et al., 2020; Chen et al., 2019; Choudhary & DeCost, 2021; Batzner et al., 2022) propose various 3D GNN architectures for more effective crystal representation learning. In particular, by enhancing the input features with angle information, ALIGNN (Choudhary & DeCost, 2021) develops what is currently the most powerful 3D GNN architecture for crystals and achieves the best crystal property prediction performance.

3. METHOD

Although existing GNN-based methods have achieved impressive performance in crystal property prediction, they struggle to boost performance further because they approximate interatomic interactions with functional expansions based on distances and fail to capture complete interatomic interactions. In this section, we present PotNet, a novel crystal representation model that overcomes these limitations of prior methods. Based on the physical modeling of crystal energy, PotNet explicitly uses infinite potential summations as input features to capture complete interatomic interactions. The infinite potential summations are incorporated into the message passing mechanism of graph neural networks and efficiently approximated by a fast algorithm. To the best of our knowledge, PotNet is the first work that bridges the classical crystal energy computation methods based on potentials and the data-driven methods based on deep neural networks.

3.1. APPROXIMATING CRYSTAL ENERGY WITH COMPLETE INTERATOMIC POTENTIALS

According to density functional theory (DFT) in physics, for any crystal $M = (A, L)$ with the complete structure $\hat{A}$ defined in Eqn. (1), its total energy $E(M)$ can be accurately approximated by the embedded atom method (Daw & Baskes, 1984; Daw et al., 1993; Baskes, 1987; Lee et al., 2016; Riffe et al., 2018) in the form of
$$E(M) = \frac{1}{2} \sum_{a \in A} \sum_{b \neq a, b \in \hat{A}} V(a, b) + \sum_{a \in A} F(\rho_a), \quad (2)$$
where $V(a, b)$ denotes the interatomic potential between the atoms $a$ and $b$, capturing the magnitude of interactions; $\rho_a$ is the local electron density of the atom $a$, determined by the coordinate and number of charges of the atom $a$ according to the Hohenberg-Kohn theorem; and $F(\cdot)$ is a parametrized function that embeds the electron density $\rho_a$. In fact, existing studies (Jalkanen & Müser, 2015) show that $\rho_a$ can be considered mathematically as a function of $\sum_{b \neq a, b \in \hat{A}} V(a, b)$. Hence, Eqn. (2) can be rewritten in the following form:
$$E(M) = \sum_{a \in A} \left[ \frac{1}{2} \sum_{b \neq a, b \in \hat{A}} V(a, b) + G\left( \sum_{b \neq a, b \in \hat{A}} V(a, b) \right) \right], \quad (3)$$
where $G(\cdot)$ is a parametrized function. Eqn. (3) can be considered as a way to compute the energy from the complete interatomic potential summation $\sum_{b \neq a, b \in \hat{A}} V(a, b)$ of every atom $a$ in the unit cell $A$. In practice, however, the function $G$ is computationally expensive to evaluate, if not infeasible. Hence, more and more studies have turned to the powerful learning capability of modern deep neural networks to approximate it effectively.
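This energy form, a half-sum of pairwise potentials plus an embedding function $G$ of each atom's potential sum, can be sketched for a finite toy cluster. The pair potential, the quadratic stand-in for $G$, and the coordinates below are illustrative assumptions, not the paper's learned functions, and the infinite sum over $\hat{A}$ is replaced by a finite one:

```python
import numpy as np

def eam_energy(coords, V, G):
    """Energy of a finite cluster in the embedded-atom form:
    sum_a [ 0.5 * S_a + G(S_a) ], where S_a = sum_{b != a} V(d(a, b))."""
    n = len(coords)
    energy = 0.0
    for a in range(n):
        s = sum(V(np.linalg.norm(coords[a] - coords[b]))
                for b in range(n) if b != a)
        energy += 0.5 * s + G(s)
    return energy

# Illustrative choices: a Pauli-style pair term and a quadratic embedding stand-in for G.
coords = np.array([[0.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 1.2, 0.0]])
E = eam_energy(coords, V=lambda d: np.exp(-d), G=lambda s: -0.1 * s ** 2)
```

The point of the sketch is the dataflow: each atom's scalar potential sum is both halved into the pairwise energy and fed through $G$, which is exactly the quantity PotNet later exposes to the network.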

3.2. LIMITATIONS OF EXISTING DEEP LEARNING METHODS

Currently, most existing graph deep learning methods for crystals (Xie & Grossman, 2018; Chen et al., 2019; Louis et al., 2020; Choudhary & DeCost, 2021) use radius graph representations and distance-based features as inputs to predict the crystal energy in Eqn. (3). Specifically, for a crystal $M = (A, L)$, the radius graph is constructed by adding edges between any atom $a$ in the unit cell $A$ and any other atom $b$ in the complete crystal structure $\hat{A}$ whose distance to $a$ is smaller than a pre-specified threshold $r$. In addition, functional expansions of distances, e.g., radial basis functions (RBF), are used to model interatomic interactions and form the input edge features of 3D GNN models. Hence, letting $a = (x_a, p_a)$ and $b = (x_b, p_b)$, the crystal energy prediction $\hat{E}(M)$ of these methods can be generally described as
$$\hat{E}(M) = \sum_{a \in A} \sum_{b \in N_r(a)} H\left( \phi\left( \|p_a - p_b\|_2 \right) \right), \quad (4)$$
where $N_r(a)$ denotes the set of atoms within distance $r$ of $a$, $\phi(\cdot)$ is a functional expansion such as RBF, and $H(\cdot)$ is the function fitted by the GNN. However, we argue that predicting or approximating the energy with Eqn. (4) is a suboptimal solution. Compared with Eqn. (3), which is physics-principled, Eqn. (4) incurs non-negligible approximation errors. First, Eqn. (4) captures interatomic interactions through interatomic distances, while the energy can be more accurately approximated by a function of interatomic potentials as in Eqn. (3). Although, as described in Sec. 2.2, interatomic potentials themselves are also functions of distances, we argue that directly using functional expansions of distances is not the best solution for crystal energy prediction. The commonly used functional expansions in existing methods, such as RBF $\phi(\cdot)$, have mathematical forms different from the potentials defined in Sec. 2.2. Intuitively, this poses more challenges to 3D GNN models since they need to learn a mapping from $\phi(\cdot)$ to the energy $E$, while $E$ is not a direct function of $\phi(\cdot)$. Hence, we argue that directly employing the physics-principled potential functions instead of $\phi(\cdot)$ as input features is more suitable for crystal energy prediction.
Second, different from Eqn. (3), Eqn. (4) does not capture the complete set of interatomic interactions because the summation set $N_r(a)$ is constrained to atoms $b$ whose distances to the atom $a$ are smaller than $r$. This can lead to a significant approximation error due to ignoring long-range interatomic potentials, i.e., interatomic potentials between distant atoms. Different from molecules with finite structures, long-range interatomic potentials cannot be ignored for crystals with infinite structures. By first principles in physics, interatomic potentials decay algebraically as pairwise interatomic distances grow. Hence, for a finite structure like a molecule, the contribution from atoms far away from a given atom is limited and can be ignored. In an infinite crystal structure, however, long-range interatomic potentials have a significant influence on a given atom. Take Coulomb potentials as an example. Assume a 1D crystal structure in which a single atom repeats itself with a Euclidean spacing of 1, each atom carries one unit of charge, and the total energy is simply the sum of all interatomic Coulomb potentials. As defined in Sec. 2.2, the Coulomb potential $V(a, b)$ between atoms $a$ and $b$ satisfies $V(a, b) \propto 1/d$, where $d$ is the distance between them. Considering the Coulomb potentials between a given atom and all other atoms, their total potential $V$ satisfies $V \propto \sum_{n=1}^{\infty} 1/n$. If only the atom pairs within the distance threshold $r$ are considered, an infinite error is introduced into the energy calculation. Specifically, the smallest possible prediction error $\Delta V$ satisfies $\Delta V \propto \sum_{n=\lfloor r+1 \rfloor}^{\infty} 1/n$, which diverges. In other words, ignoring interatomic Coulomb potentials between atoms with distances larger than $r$ causes a significant prediction error in the total energy.
We can observe from this example that the failure to capture complete interatomic potentials due to the use of radius graphs is a key factor that prevents accurate energy prediction in existing GNN-based methods.
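The divergence in the 1D Coulomb example is easy to check numerically: the tail $\sum_{n \geq r+1} 1/n$ left out by any radius cutoff grows without bound (like the logarithm of the number of terms). A minimal sketch, where the cutoff $r = 10$ is an arbitrary illustrative choice:

```python
def truncated_tail(r, terms):
    """Partial sums of sum_{n >= r+1} 1/n, i.e. the Coulomb error ignored by a
    radius cutoff r in the 1D toy crystal (unit spacing, unit charges)."""
    total, out = 0.0, []
    for n in range(r + 1, r + 1 + terms):
        total += 1.0 / n
        out.append(total)
    return out

# The neglected tail keeps growing roughly like log(terms), so no finite cutoff r
# can bound the error: the partial sums never level off.
tail = truncated_tail(r=10, terms=100000)
```

After 100000 terms the partial sum is already above 9 and still climbing, illustrating that the truncation error is unbounded rather than merely large.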

3.3. MESSAGE PASSING WITH COMPLETE INTERATOMIC POTENTIALS

It follows from the analysis in Sec. 3.2 that the major limitations of existing deep learning methods for crystal representation learning lie in (1) not making predictions from physics-principled interatomic potentials, and (2) not considering complete interatomic interactions. To overcome these limitations, we propose to explicitly use complete interatomic potential summations in GNN models. Since our proposed method is tightly related to potentials, we name it PotNet. By reformulating Eqn. (3), PotNet incorporates the crystal energy computation with complete interatomic potentials into the message passing scheme of GNN models. For any material structure $M = (A, L)$, we can rewrite the definition of its complete structure $\hat{A}$ in Eqn. (1) as
$$\hat{A} = \bigcup_{k \in \mathbb{Z}^3} A_k = \bigcup_{k \in \mathbb{Z}^3} \bigcup_{b \in A} \{b^k\} = \bigcup_{b \in A} \bigcup_{k \in \mathbb{Z}^3} \{b^k\} = \bigcup_{b \in A} \hat{A}_b, \quad (5)$$
where $\hat{A}_b = \bigcup_{k \in \mathbb{Z}^3} \{b^k\}$ denotes the set containing the atom $b$ from the unit cell $A$ and all its periodically repeated duplicates in the complete crystal structure. With Eqn. (5), we can reformulate Eqn. (3) as
$$E(M) = \sum_{a \in A} \left[ \frac{1}{2} \sum_{b \in A} \sum_{c \neq a, c \in \hat{A}_b} V(a, c) + G\left( \sum_{b \in A} \sum_{c \neq a, c \in \hat{A}_b} V(a, c) \right) \right] = \sum_{a \in A} \left[ \frac{1}{2} \sum_{b \in A} S(a, b) + G\left( \sum_{b \in A} S(a, b) \right) \right], \quad (6)$$
where the infinite potential summation $S(a, b) = \sum_{c \neq a, c \in \hat{A}_b} V(a, c)$ denotes the sum of the interatomic potentials from the atom $b$ and all its periodic duplicates to the atom $a$. Eqn. (6) can be integrated into the message passing scheme of GNN models. Specifically, we can create a graph $G$ for $M = (A, L)$, where each atom in the unit cell $A$ corresponds to a node in the graph. Any two nodes $u, v$ in the graph are connected by an edge, and every node $u$ is also connected to itself by a self-loop edge. If we consider the infinite potential summation $S(a, b)$ as the feature of the edge from node $b$ to node $a$, we can use the message passing based non-linear neural network model in a GNN to fit the function $\frac{1}{2} \sum_{b \in A} S(a, b) + G\left( \sum_{b \in A} S(a, b) \right)$.
Based on this design of directly using interatomic potentials as edge features, our PotNet employs a GNN model with multiple message passing layers on the graph $G$ to predict the crystal energy of $M$. The computational process of the $\ell$-th message passing layer for the node $a$ can be described as
$$h_a^{(\ell)} = g_\phi\left( h_a^{(\ell-1)}, \sum_{b \in A} f_\theta\left( h_a^{(\ell-1)}, h_b^{(\ell-1)}, S(a, b) \right) \right), \quad (7)$$
where $h_a^{(\ell)}$ denotes the embedding vector of node $a$ generated by the $\ell$-th message passing layer, $h_a^{(0)}$ is initialized to the atom feature vector of the atom $a$, and $g_\phi(\cdot)$, $f_\theta(\cdot)$ are both neural network models with trainable parameters $\phi$ and $\theta$, respectively. Here, the model $f_\theta$ plays the role of capturing information from both atomic features and complete interatomic potentials. Detailed information about the model architectures of $f_\theta$ and $g_\phi$ is provided in Appendix D.1. Note that our PotNet is actually a 3D GNN model even though 3D geometric information is not explicitly involved in Eqn. (7). This is because the edge feature $S(a, b)$ is built from potential functions, which, by Sec. 2.2, are computed from interatomic distances. In other words, PotNet can be considered to encode 3D geometric information through potential functions, though our direct motivation for using potential functions comes from the physical modeling of crystal energy. Intuitively, the message passing process in Eqn. (7) over the graph $G$ can be considered as a general case of employing a radius graph where the distance threshold $r$ goes to infinity, i.e., $r \rightarrow +\infty$. In this case, as shown in Fig. 1(a), for any atom in the crystal, all other atoms in the complete crystal structure are included to interact with it. If we follow the radius graph construction process of previous methods (Xie & Grossman, 2018; Chen et al., 2019; Louis et al., 2020; Choudhary & DeCost, 2021), we obtain a multigraph in which infinitely many edges exist between every pair of nodes.
However, PotNet simplifies this complicated multigraph to the graph $G$, in which only one edge exists between every node pair. Specifically, PotNet directly models interatomic interactions as potentials, and for any two nodes in $G$, PotNet aggregates all edges between them into a single edge through the infinite potential summation $S(a, b)$ (see Fig. 1(b)). In other words, PotNet provides an effective solution that enables GNN models to capture complete interatomic interactions through the use of infinite potential summations.
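The dataflow of the update in Eqn. (7) can be sketched in plain numpy. The single-linear-layer-plus-tanh stand-ins for $f_\theta$ and $g_\phi$ and the random matrix standing in for the precomputed $S(a, b)$ values are illustrative assumptions; the actual PotNet architectures are those of Appendix D.1:

```python
import numpy as np

rng = np.random.default_rng(0)
n, dim = 4, 8                       # 4 atoms in the unit cell, 8-dim embeddings

# Hypothetical stand-ins for the learned networks f_theta and g_phi:
# one linear map plus tanh each, just to show the dataflow of the layer.
W_f = rng.normal(size=(2 * dim + 1, dim))
W_g = rng.normal(size=(2 * dim, dim))

def message_passing_layer(h, S):
    """One layer: for each node a, sum f(h_a, h_b, S(a,b)) over ALL nodes b
    (complete graph, including the self-loop), then update with g."""
    msgs = np.zeros_like(h)
    for a in range(n):
        for b in range(n):
            edge = np.concatenate([h[a], h[b], [S[a, b]]])
            msgs[a] += np.tanh(edge @ W_f)                     # f_theta
    return np.tanh(np.concatenate([h, msgs], axis=1) @ W_g)   # g_phi

h0 = rng.normal(size=(n, dim))      # h^(0): placeholder atom features
S = rng.normal(size=(n, n))         # placeholder for the summations S(a, b)
h1 = message_passing_layer(h0, S)
```

The key design point is visible in the inner loop: the neighborhood is the whole unit cell, not a radius ball, because all long-range structure is already folded into the scalar $S(a, b)$.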

3.4. EFFICIENT COMPUTATION OF INFINITE POTENTIAL SUMMATION

Although we have effectively incorporated infinite potential summations into the message passing based GNN models, computing these infinite potential summations is not easy. There are essentially two challenges in achieving accurate and efficient computation of the infinite potential summations. For accuracy, the computation algorithm needs to have provable error bounds. For efficiency, it needs to be fast enough for scalable GNN training and fast crystal property prediction. To tackle these two challenges, we derive a fast approximation algorithm for infinite potential summations based on the Ewald summation method (Ewald, 1921). Specifically, we unify the summations of the three infinite potentials between atom $a$ and all duplicated positions of atom $b$ into an integral form such that the Ewald summation method can be applied for efficient implementation in PotNet (Fig. 1(c)). The key idea of the Ewald summation is that a slowly converging summation in real space is guaranteed to be convertible into a quickly converging summation in Fourier space (Woodward, 2014). Based on this, the Ewald summation method divides a summation into two parts. One part converges more quickly in real space than the original summation. The other, slower-to-converge part is transformed into Fourier space, where it becomes quickly convergent. In our method, the Ewald summation method is applied to the infinite summations by dividing the integral into two parts, one converging fast in Fourier space and the other converging fast in real space, to obtain fast approximations with provable error bounds. To apply the fast approximation algorithm of infinite summations proposed by Ewald (1921), a unified integral view of infinite potential summations is needed. Following the notations in Sec. 2.1, Coulomb potentials from all atoms in $\hat{A}_b$ to the atom $a$ can be represented as $\left\{ -\frac{z_a z_b e^2}{4\pi\epsilon_0 d} \;\middle|\; d = \|p_b + Lk - p_a\|, k \in \mathbb{Z}^3 \right\}$.
Similarly, London dispersion potentials from all atoms in $\hat{A}_b$ to the atom $a$ can be represented as $\{ -\epsilon / d^6 \mid d = \|p_b + Lk - p_a\|, k \in \mathbb{Z}^3 \}$. It is worth noting that Coulomb potentials and London dispersion potentials can be represented in a unified view as $\{ \text{constant}/d^p \mid d = \|v_{ab} + Lk\|, v_{ab} = p_b - p_a, k \in \mathbb{Z}^3 \}$, where $p$ is a positive number. We represent Pauli potentials from all duplicated positions of atom $b$ to atom $a$ as $\{ e^{-\alpha d} \mid d = \|p_b + Lk - p_a\|, k \in \mathbb{Z}^3 \}$. We provide detailed proofs in Appendix C.1 that the summations of these three potentials can be unified in an integral form as
$$S(a, b) = D \sum_{k \in \mathbb{Z}^3} \int_0^\infty t^{C-1} e^{-A\pi |Lk + v_{ab}|^2 t - B/t} \, dt, \quad (8)$$
where $A, B, C, D$ are constants derived from the corresponding specific potential forms. We then apply the Ewald summation method (Ewald, 1921) to Eqn. (8) and split it into two parts as
$$S(a, b) = D \sum_{k \in \mathbb{Z}^3} \int_0^1 t^{C-1} e^{-A\pi |Lk + v_{ab}|^2 t - B/t} \, dt + D \sum_{k \in \mathbb{Z}^3} \int_1^\infty t^{C-1} e^{-A\pi |Lk + v_{ab}|^2 t - B/t} \, dt = S_{\text{Fourier}}(a, b) + S_{\text{direct}}(a, b), \quad (9)$$
where $S_{\text{direct}}$ denotes the part that converges fast in real space, and $S_{\text{Fourier}}$ denotes the other part, which converges quickly in Fourier space whenever the total summation converges, as shown by Ewald (1921). Based on this, we further show in Appendix C.2 that $S_{\text{direct}}$ and $S_{\text{Fourier}}$ can be expressed as summations of incomplete Bessel functions $K_\nu(x, y)$, and that the approximation error is bounded. Note that for London dispersion potentials and Pauli potentials, the transformed summations of incomplete Bessel functions can be approximated directly. For Coulomb potentials, however, the direct summation diverges. Concretely, for potentials of the form $\{ \text{constant}/d^p \mid d = \|v_{ab} + Lk\|, v_{ab} = p_b - p_a, k \in \mathbb{Z}^3 \}$, the corresponding potential summation converges when $p > 3$ and diverges when $p \leq 3$.
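For the power-law case (Coulomb with $p = 1$, London dispersion with $p = 6$, up to constants), the single-pair integral representation behind Eqn. (8) follows from the Gamma-function identity $1/d^p = \frac{\pi^{p/2}}{\Gamma(p/2)} \int_0^\infty t^{p/2-1} e^{-\pi d^2 t} \, dt$. The sketch below checks this identity numerically and evaluates the two pieces of the $t = 1$ split for a single distance; the values of $d$ and $p$ are illustrative only:

```python
import math
from scipy.integrate import quad

def power_potential_split(d, p):
    """Evaluate 1/d^p via (pi^{p/2}/Gamma(p/2)) * int_0^inf t^{p/2-1} e^{-pi d^2 t} dt,
    split at t = 1 into a [0, 1] piece and a [1, inf) piece, as in the Ewald scheme."""
    pref = math.pi ** (p / 2) / math.gamma(p / 2)
    f = lambda t: t ** (p / 2 - 1) * math.exp(-math.pi * d * d * t)
    fourier_part, _ = quad(f, 0, 1)          # counterpart of the S_Fourier piece
    direct_part, _ = quad(f, 1, math.inf)    # counterpart of the S_direct piece
    return pref * (fourier_part + direct_part)

# Both pieces are rapidly convergent integrals, and their sum recovers 1/d^p.
approx = power_potential_split(d=1.7, p=6.0)
exact = 1.7 ** -6.0
```

In the full method the same split is applied under the lattice sum over $k$, where the $[1, \infty)$ pieces decay fast in real space and the $[0, 1]$ pieces are resummed in Fourier space.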
To tackle this problem, we follow previous mathematical derivations (Harris, 2008; Slevinsky & Safouhi, 2010; Jones, 2007) and use the analytically continued incomplete Bessel functions to approximate infinite summations of Coulomb potentials, as shown in Appendix C.3. We then provide detailed mathematical proofs that the summations of incomplete Bessel functions are convergent and can be approximated with an error bounded by the Gaussian Lattice Sum in Appendix B.2. The detailed implementation of the proposed summation algorithm can be found in Appendix C.4. It is worth noting that PotNet is the first method to use the incomplete Bessel function to compute the potential summations and is able to compute the Pauli potential summation, which previous methods (Crandall, 1998; Lee & Cai, 2009; Nestler et al., 2015) cannot achieve. Our method can also compute other interatomic potential summations, including the Lennard-Jones potential, Morse potential, and screened Coulomb potential, as shown in Appendix C.5.

Materials Project Dataset. To make the comparisons fair, we follow the settings of the previous state-of-the-art (SOTA) ALIGNN (Choudhary & DeCost, 2021) for tasks including formation energy and band gap prediction, and retrain all other baselines using the same dataset setting for these two tasks. For bulk and shear moduli, we follow the dataset setting of GATGNN (Louis et al., 2020), which has the best prediction performance for these two tasks, and retrain all other baselines. Detailed configurations are shown in Appendix D.2. We present our results in Table 1, where PotNet consistently outperforms other SOTA methods on all four tasks by large margins.

JARVIS Dataset. We then evaluate PotNet on JARVIS, a newly released benchmark dataset proposed by Choudhary et al. (2020) with 55722 crystals. We evaluate PotNet on five crystal property prediction tasks: formation energy, bandgap (OPT), bandgap (MBJ), total energy, and Ehull.
We follow ALIGNN and use the same training, validation, and test sets for these tasks. Since there are missing results for some baseline methods, we retrain the corresponding baselines following the same settings as ALIGNN, as discussed in Appendix D.2. As shown in Table 2, PotNet achieves the best performance on all five tasks consistently and by significant margins. We also analyze the time cost of infinite summations, as shown in Table 4. We calculate the infinite summations during preprocessing. Unlike previous methods including ALIGNN, we need to compute the complete potential set in addition to constructing graphs, so we do require more computing time for preprocessing. However, for a single material, the preprocessing time of our method is at the level of milliseconds. Even with the additional computation of the infinite potential summations, the preprocessing time of our method and that of ALIGNN are still within the same order of magnitude. Considering both preprocessing time and model inference time for single-material screening, we achieve a faster inference speed than ALIGNN, as illustrated in the table. Overall, the computational cost of our method is reasonable.

4.3. ABLATION STUDIES

In this section, we demonstrate the importance of two core components of PotNet: interaction modeling using potentials, and infinite summation of potentials. We conduct experiments on the JARVIS formation energy task and use test MAE as the evaluation metric.

5. CONCLUSION

We study the problem of how to directly capture infinite-range interatomic potentials in crystal property prediction. As a radical departure from prior methods that only consider nearby atoms, we develop a new GNN, PotNet, whose message passing scheme uses efficient approximations to capture the complete set of potentials among all atoms. Experiments show that the use of complete potentials leads to consistent performance improvements. Altogether, our work provides a theoretically principled and practically effective framework for crystal modeling.

A GAUSSIAN LATTICE SUM

The Gaussian Lattice Sum (Bétermin et al., 2021) computes the summation of Gaussian functions centered at the points given by a shifted lattice, formally defined as
$$G_\Omega(L, v, c) = \sum_{k \in \mathbb{Z}^d} e^{-c|Lk + v|^2},$$
where $L \in \mathbb{R}^{d \times d}$ is the lattice matrix, $v \in \mathbb{R}^d$ is a vector inside a unit cell, and $c \in \mathbb{R}^+$ is a prefixed constant. A characteristic of the Gaussian Lattice Sum is that the term $e^{-c|Lk + v|^2}$ decays rapidly as $k$ becomes large, resulting in fast convergence of $G_\Omega(L, v, c)$. As shown by Deconinck et al. (2004), for $R \in \mathbb{R}^+$ we have
$$\sum_{k \in \mathbb{Z}^d, \, \sqrt{c}|Lk + v| \geq R} e^{-c|Lk + v|^2} \leq \frac{d}{2} \left( \frac{2}{\rho} \right)^d \Gamma\left( \frac{d}{2}, \left( R - \frac{\rho}{2} \right)^2 \right),$$
where $\Gamma(z, x) = \int_x^\infty t^{z-1} e^{-t} \, dt$ is the incomplete Gamma function and $\rho = \min\{ \sqrt{c}|Lk| \mid k \in \mathbb{Z}^d, k \neq 0 \}$. We can obtain an upper bound on the Gaussian Lattice Sum by setting $R = 0$, such that
$$\sum_{k \in \mathbb{Z}^d} e^{-c|Lk + v|^2} \leq \frac{d}{2} \left( \frac{2}{\rho} \right)^d \Gamma\left( \frac{d}{2}, \left( \frac{\rho}{2} \right)^2 \right).$$
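The fast convergence is easy to observe by brute force: truncating $G_\Omega$ over growing cubes of integer vectors, the partial sums stabilize after only a few shells. The lattice, shift vector, and $c$ below are illustrative choices:

```python
import itertools
import numpy as np

def gaussian_lattice_sum(L, v, c, kmax):
    """Truncated G_Omega(L, v, c) = sum_k exp(-c |L k + v|^2), with k ranging
    over the integer cube [-kmax, kmax]^d."""
    d = L.shape[0]
    total = 0.0
    for k in itertools.product(range(-kmax, kmax + 1), repeat=d):
        r = L @ np.array(k, dtype=float) + v
        total += np.exp(-c * (r @ r))
    return total

# Illustrative 3D example: the summand decays like exp(-|k|^2), so successive
# truncations agree to many digits almost immediately.
L = np.eye(3)
v = np.array([0.3, 0.1, -0.2])
vals = [gaussian_lattice_sum(L, v, c=1.0, kmax=m) for m in (1, 2, 3, 4)]
```

Every term is positive, so the truncated values increase monotonically toward the full sum, and the increments shrink at the Gaussian rate predicted by the tail bound above.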

B INCOMPLETE BESSEL FUNCTION

The incomplete Bessel function (Harris, 2008) is defined as
$$K_\nu(x, y) = \int_1^\infty t^{-\nu - 1} e^{-xt - y/t} \, dt,$$
where $x, y \in \mathbb{R}$, $x \geq 0$, $y \geq 0$, and $\nu > 0$. In existing studies (Harris, 2008; Slevinsky & Safouhi, 2010; Jones, 2007), $K_\nu(x, y)$ is analytically continued to $\nu \in \mathbb{R}$. In this work, we follow these studies and use the analytic continuation in our calculations, i.e., we also consider $\nu \in \mathbb{R}$.
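Since $K_\nu$ is a one-dimensional integral, a direct quadrature reference is straightforward to write. The sketch below checks it against the special case $K_0(x, 0) = \int_1^\infty t^{-1} e^{-xt} \, dt = E_1(x)$, the exponential integral, which follows immediately from the definition; the arguments are illustrative:

```python
import math
from scipy.integrate import quad
from scipy.special import exp1

def incomplete_bessel(nu, x, y):
    """K_nu(x, y) = int_1^inf t^{-nu-1} exp(-x t - y / t) dt, by direct quadrature.
    A slow reference implementation, not the fast G-transformation of the paper."""
    f = lambda t: t ** (-nu - 1) * math.exp(-x * t - y / t)
    val, _ = quad(f, 1, math.inf)
    return val

# Special-case check: K_0(x, 0) equals the exponential integral E_1(x).
k0 = incomplete_bessel(0.0, 2.0, 0.0)
reference = exp1(2.0)
```

This kind of quadrature is fine for spot checks but far too slow to evaluate inside a lattice sum for every atom pair, which is what motivates the fast approximation in the next subsection.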

B.1 FAST APPROXIMATION OF THE INCOMPLETE BESSEL FUNCTION

Computing the incomplete Bessel function is extremely challenging, as no explicit closed-form solution exists. In this work, we investigate a fast approximation of the incomplete Bessel function. Specifically, we adopt the algorithm of Slevinsky & Safouhi (2010), where the incomplete Bessel function $K_\nu(x, y)$ is approximated by the $G_n^{(m)}$ transformation with linear time complexity $O(n)$, where $n$ is the number of iterations. As shown by Levin & Sidi (1981), the approximation $G_n^{(m)}$ to $\int_0^\infty f(t) \, dt$ is given as the solution of
$$\frac{d^l}{dx^l} \left[ G_n^{(m)} - \int_0^x f(t) \, dt - \sum_{k=0}^{m-1} x^{\sigma_k} f^{(k)}(x) \sum_{i=0}^{n-1} \frac{\beta_{i,k}}{x^i} \right] = 0, \quad (14)$$
where $\beta_{i,k}$ and $G_n^{(m)}$ are unknowns, $\frac{d^l}{dx^l} G_n^{(m)} = 0$ for all $l > 0$, $\sigma_k = \min(s_k, k+1)$, and $s_k$ is the largest $s \in \mathbb{Z}$ such that $\lim_{x \to \infty} x^s f^{(k)}(x) = 0$ for $k = 0, 1, \cdots, m-1$. Slevinsky & Safouhi (2010) proved that the incomplete Bessel function satisfies Eqn. (14) with $m = 1$, through which we can obtain
$$G_n^{(1)} - \int_0^x f(t) \, dt = x^{\sigma_0} f(x) \sum_{i=0}^{n-1} \frac{\beta_{i,0}}{x^i}.$$
To eliminate all the unknowns $\beta_{i,0}$, Slevinsky & Safouhi (2010) applied the operator $\left( x^2 \frac{d}{dx} \right)$ $n$ times, such that
$$\left( x^2 \frac{d}{dx} \right)^n \left[ \frac{G_n^{(1)} - \int_0^x f(t) \, dt}{x^{\sigma_0} f(x)} \right] = 0.$$
By doing this, we obtain
$$G_n^{(1)} = \frac{ \left( x^2 \frac{d}{dx} \right)^n \left( \frac{\int_0^x f(t) \, dt}{x^{\sigma_0} f(x)} \right) }{ \left( x^2 \frac{d}{dx} \right)^n \left( \frac{1}{x^{\sigma_0} f(x)} \right) } = \frac{N_n(x)}{D_n(x)},$$
in which $N_n(x) = \left( x^2 \frac{d}{dx} \right) N_{n-1}(x)$ and $D_n(x) = \left( x^2 \frac{d}{dx} \right) D_{n-1}(x)$, with
$$N_0(x) = \frac{\int_0^x f(t) \, dt}{x^{\sigma_0} f(x)} \quad \text{and} \quad D_0(x) = \frac{1}{x^{\sigma_0} f(x)}.$$
This leads to a recursive algorithm for approximating $G_n^{(1)}$. To compute the incomplete Bessel function $K_\nu(x, y)$, Slevinsky & Safouhi (2010) investigated the following property:
$$K_\nu(x, y) + x^\nu \int_0^x t^{-\nu - 1} e^{-t - xy/t} \, dt = x^\nu \int_0^\infty t^{-\nu - 1} e^{-t - xy/t} \, dt,$$
in which the term $\int_0^\infty t^{-\nu - 1} e^{-t - xy/t} \, dt$ can be approximated by $G_n^{(1)}$.
Therefore, to approximate $K_\nu(x, y)$ we have
$$\tilde{G}_n^{(1)} = x^\nu \left( G_n^{(1)} - \int_0^x t^{-\nu-1} e^{-t - xy/t}\, dt \right) = x^\nu\, \frac{N_n(x) - D_n(x) \int_0^x t^{-\nu-1} e^{-t - xy/t}\, dt}{D_n(x)} = x^\nu\, \frac{\sum_{r=1}^n \binom{n}{r} D_{n-r}(x) \left(x^2 \frac{d}{dx}\right)^{r-1} \left( x^{-\nu+1} e^{-x-y} \right)}{D_n(x)} = \frac{\tilde{N}_n(x)}{D_n(x)}.$$
As a result, we obtain the approximation $\tilde{G}_n^{(1)}$ to $K_\nu(x, y)$ by recursively solving $\tilde{N}_n(x)$ and $D_n(x)$ (Gaudreau et al., 2012). The detailed expressions of $\tilde{N}_n(x)$ and $D_n(x)$ are given in Slevinsky & Safouhi (2010). In addition, we follow Nestler et al. (2015) to further optimize the approximation of the incomplete Bessel function when $\nu = 0$ and $x, y$ are both small, e.g., $x^2 + y^2 < 1$. In this case, the remainder of the Taylor expansion of $K_0(x, y)$ is small, and we can approximate $K_0(x, y)$ by the first $m$ terms of the Taylor series:
$$K_0(x, y) \approx \sum_{n=0}^m \frac{(-1)^n}{n!} x^n y^n\, \Gamma(-n, x).$$
The detailed error bound of this expansion is given in Nestler et al. (2015).
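The small-argument Taylor approximation can be checked numerically. Using $\Gamma(-n, x) = x^{-n} E_{n+1}(x)$, each term reduces to $(-y)^n E_{n+1}(x)/n!$, where $E_{n+1}$ is the generalized exponential integral available in SciPy. This is a verification sketch of ours, not the paper's Cython implementation:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import expn, factorial

def K0_quad(x, y):
    """K_0(x, y) by direct quadrature, for reference."""
    val, _ = quad(lambda t: np.exp(-x * t - y / t) / t, 1.0, np.inf)
    return val

def K0_taylor(x, y, m=12):
    """First m+1 Taylor terms; Gamma(-n, x) = x**(-n) * E_{n+1}(x)."""
    n = np.arange(m + 1)
    return np.sum((-y) ** n / factorial(n) * expn(n + 1, x))

x, y = 0.3, 0.2  # small arguments, x**2 + y**2 < 1
assert abs(K0_taylor(x, y) - K0_quad(x, y)) < 1e-7
```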

B.2 CONVERGENCE OF INCOMPLETE BESSEL FUNCTION SUMMATION

We define the summation of incomplete Bessel functions on a lattice as $\sum_{k \in \mathbb{Z}^d} K_\nu(\alpha|Lk+v|^2 + \gamma, \beta)$, where $\nu, \alpha, \beta, \gamma \in \mathbb{R}$ are constants, $\alpha > 0$, $\beta \ge 0$, $\gamma \ge 0$, $v \in \mathbb{R}^d$ is a vector inside a unit cell, and $L \in \mathbb{R}^{d \times d}$ is the lattice matrix. We aim to prove that the summation of incomplete Bessel functions is convergent and can be approximated with an error bounded by the Gaussian lattice sum introduced in Appendix A.

Proof. The incomplete Bessel function has the upper bound
$$K_\nu(x, y) = \int_1^\infty t^{-\nu-1} e^{-xt - y/t}\, dt \le \int_1^\infty t^{-\nu-1} e^{-xt}\, dt = \frac{\Gamma(-\nu, x)}{x^{-\nu}},$$
where $x > 0$ and $\Gamma$ is the upper incomplete Gamma function described in Appendix A. Based on this, we obtain
$$K_\nu(\alpha|Lk+v|^2 + \gamma, \beta) \le \frac{\Gamma(-\nu,\, \alpha|Lk+v|^2 + \gamma)}{(\alpha|Lk+v|^2 + \gamma)^{-\nu}},$$
where $\alpha|Lk+v|^2 + \gamma > 0$. As shown by Borwein & Chan (2009), $\left|\frac{\Gamma(z, x)}{x^z}\right|$ has the upper bound
$$\frac{\Gamma(z, x)}{x^z} = e^{-x} \int_0^\infty e^{-xs} (1+s)^{z-1}\, ds \le \begin{cases} \dfrac{e^{-x}}{x - z + 1}, & z > 1, \\[4pt] \dfrac{e^{-x}}{x}, & z \le 1, \end{cases}$$
where $z, x \in \mathbb{R}$ and $x > 0$. For a prefixed value $R \in \mathbb{R}$ with $R^2 > -\nu$, $R^2 > 1$, and $R^2 > \gamma$, and setting $z = -\nu$, any term with $x = \alpha|Lk+v|^2 + \gamma \ge R^2$ satisfies $x > \max(1, -\nu)$, so both branches of the bound are at most $e^{-x}$. We thus have
$$\epsilon(R) = \sum_{k \in \mathbb{Z}^d,\, \alpha|Lk+v|^2 + \gamma \ge R^2} K_\nu(\alpha|Lk+v|^2 + \gamma, \beta) \le \sum_{k \in \mathbb{Z}^d,\, \alpha|Lk+v|^2 + \gamma \ge R^2} \left| K_\nu(\alpha|Lk+v|^2 + \gamma, \beta) \right| \le \sum_{k \in \mathbb{Z}^d,\, \alpha|Lk+v|^2 + \gamma \ge R^2} e^{-\alpha|Lk+v|^2 - \gamma} \le e^{-\gamma}\, G_\Omega(L, v, \alpha),$$
where $G_\Omega(L, v, \alpha)$ is the Gaussian lattice sum described in Appendix A. Therefore, the incomplete Bessel function summation can be divided into two parts:
$$\sum_{k \in \mathbb{Z}^d} K_\nu(\alpha|Lk+v|^2 + \gamma, \beta) = \sum_{k \in \mathbb{Z}^d,\, \alpha|Lk+v|^2 + \gamma < R^2} K_\nu(\alpha|Lk+v|^2 + \gamma, \beta) + \epsilon(R),$$
where the first part is a finite sum inside an ellipsoid of radius $\sqrt{(R^2 - \gamma)/\alpha}$, and the second part $\epsilon(R)$ is bounded by the Gaussian lattice sum $G_\Omega(L, v, \alpha)$, which is convergent. Thus, the incomplete Bessel function summation is convergent. Consequently, to approximate the incomplete Bessel function summation, we evaluate the summation inside an ellipsoid of radius $\sqrt{(R^2 - \gamma)/\alpha}$ for a prefixed $R \in \mathbb{R}$ such that $R^2 > -\nu$, $R^2 > 1$, and $R^2 > \gamma$. The error $\epsilon(R)$ is then bounded by the Gaussian lattice sum $G_\Omega(L, v, \alpha)$.
We can further bound the error using inequality (11) for the Gaussian lattice sum introduced in Appendix A:
$$\epsilon(R) \le \sum_{k \in \mathbb{Z}^d,\, \sqrt{\alpha}|Lk+v| \ge \sqrt{R^2 - \gamma}} e^{-\alpha|Lk+v|^2} \le \frac{d}{2} \left( \frac{2}{\rho} \right)^d \Gamma\!\left( \frac{d}{2},\, \left( \sqrt{R^2 - \gamma} - \frac{\rho}{2} \right)^2 \right),$$
where $\rho = \min\{ \sqrt{\alpha}|Lk| \,:\, k \in \mathbb{Z}^d, k \ne 0 \}$. This completes the proof.
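A quick numerical illustration of this convergence in $d = 1$, with $L = 1$ and sample constants of our choosing: quadrature-based partial sums stabilize once the ellipsoid contains the first few lattice points, reflecting the Gaussian-like decay of the terms:

```python
import numpy as np
from scipy.integrate import quad

def K(nu, x, y):
    """Incomplete Bessel function K_nu(x, y) by direct quadrature."""
    val, _ = quad(lambda t: t**(-nu - 1.0) * np.exp(-x * t - y / t), 1.0, np.inf)
    return val

def bessel_lattice_sum(N, nu=1.0, alpha=1.0, beta=0.5, gamma=0.2, v=0.3):
    """d = 1, L = 1: partial sum of K_nu(alpha*(k+v)^2 + gamma, beta) over |k| <= N."""
    return sum(K(nu, alpha * (k + v) ** 2 + gamma, beta) for k in range(-N, N + 1))

s6, s12 = bessel_lattice_sum(6), bessel_lattice_sum(12)
# terms decay like exp(-alpha*(k+v)^2), so partial sums stabilize very quickly
assert abs(s12 - s6) < 1e-12
```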

C FAST ALGORITHM OF POTENTIAL SUMMATION

C.1 INTEGRAL TRANSFORMATION

We denote $G(L, v)$ as the potential summation and $U(L, v)$ as the potential function, with a lattice matrix $L \in \mathbb{R}^{d \times d}$ and a vector $v \in \mathbb{R}^d$ between two atoms inside a unit cell. For a potential summation $S(a, b)$ of a crystal with lattice matrix $L$, we have $S(a, b) = G(L, v_{ab})$. Based on these notations, we prove that the summation of the three introduced potentials can be transformed into the integral form
$$G(L, v) = D \sum_{k \in \mathbb{Z}^d} \int_0^\infty t^{C-1} e^{-A\pi|Lk+v|^2 t - B/t}\, dt,$$
where $A, B, C, D$ are constants derived from the corresponding specific potential forms.

Proof. We first prove that both potential forms $U(L, v) = 1/|Lk+v|^{2p}$ and $U(L, v) = e^{-\alpha|Lk+v|}$ can be written in the integral form
$$U(L, v) = D \int_0^\infty t^{C-1} e^{-A\pi|Lk+v|^2 t - B/t}\, dt.$$
1) For potentials of the form $U(L, v) = 1/|Lk+v|^{2p}$, we apply the Mellin transform such that
$$\mathcal{M}\{U\}(L, v) = \int_0^\infty t^{p-1} e^{-t|Lk+v|^2}\, dt = \frac{\Gamma(p)}{|Lk+v|^{2p}}.$$
Thus, we obtain
$$U(L, v) = \frac{1}{|Lk+v|^{2p}} = \frac{1}{\Gamma(p)} \int_0^\infty t^{p-1} e^{-t|Lk+v|^2}\, dt.$$
Apparently, we obtain $A = 1/\pi$, $B = 0$, $C = p$, and $D = 1/\Gamma(p)$ for the integral form of $U(L, v) = 1/|Lk+v|^{2p}$.
2) For potentials of the form $U(L, v) = e^{-\alpha|Lk+v|}$, we consider the inverse Laplace transform of $e^{-\alpha\sqrt{s}}$ as shown by Bateman (1954),
$$\mathcal{L}^{-1}\{e^{-\alpha\sqrt{s}}\} = \frac{\alpha}{2\sqrt{\pi}}\, t^{-3/2} e^{-\alpha^2/(4t)}.$$
Therefore, we can apply the Laplace transform to derive the integral form of $e^{-\alpha|Lk+v|}$:
$$U(L, v) = e^{-\alpha|Lk+v|} = \frac{\alpha}{2\sqrt{\pi}} \int_0^\infty t^{-3/2} e^{-|Lk+v|^2 t - \alpha^2/(4t)}\, dt = \frac{\alpha}{2\pi} \int_0^\infty t^{-3/2} e^{-\pi t|Lk+v|^2 - \alpha^2/(4\pi t)}\, dt \quad (t \leftarrow \pi t).$$
Apparently, we obtain $A = 1$, $B = \alpha^2/(4\pi)$, $C = -1/2$, and $D = \alpha/(2\pi)$ for the integral form of $U(L, v) = e^{-\alpha|Lk+v|}$. Finally, we conduct a summation of these two types of potentials $U(L, v)$ over $k \in \mathbb{Z}^d$ as $G(L, v) = \sum_{k \in \mathbb{Z}^d} U(L, v)$. This completes the proof.
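Both integral representations can be verified numerically for sample values (a check of the identities themselves, not of the paper's code):

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

r, p, a = 1.7, 2.5, 1.3  # sample values: distance |Lk+v|, exponent p, decay rate alpha

# Mellin transform: 1/r^{2p} = (1/Gamma(p)) * int_0^inf t^{p-1} exp(-t r^2) dt
mellin, _ = quad(lambda t: t**(p - 1.0) * np.exp(-t * r * r), 0.0, np.inf)
assert abs(mellin / gamma(p) - r**(-2.0 * p)) < 1e-6

# Inverse Laplace: exp(-a r) = (a/(2 sqrt(pi))) * int_0^inf t^{-3/2} exp(-r^2 t - a^2/(4t)) dt
laplace, _ = quad(lambda t: t**-1.5 * np.exp(-r * r * t - a * a / (4.0 * t)), 0.0, np.inf)
assert abs(a / (2.0 * np.sqrt(np.pi)) * laplace - np.exp(-a * r)) < 1e-6
```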
In fact, the summation of $U(L, v) = 1/|Lk+v|^{2p}$ is a special case of the multidimensional zeta function (Crandall & Buhler, 1987; Terras, 1973; Crandall, 1998), which is a generalization of the Riemann zeta function. The multidimensional zeta function (Crandall & Buhler, 1987) is defined as
$$Z_L(s; u, v) = \sum_{k \in \mathbb{Z}^d} \frac{e^{2\pi i u \cdot Lk}}{|Lk - v|^s},$$
where $s \in \mathbb{C}$, $L \in \mathbb{R}^{d \times d}$, and $u, v \in \mathbb{R}^d$. It is not hard to show that the summation of $U(L, v) = 1/|Lk+v|^{2p}$ can be expressed as $Z_L(2p; 0, -v)$, and therefore the functional properties of the multidimensional zeta function also hold for this summation. For instance, $Z_L(s; u, v)$ has an analytic continuation to the entire complex plane, except for simple poles at $s = 0$ and $s = d$ (Crandall & Buhler, 1987). As a result, summations of $U(L, v) = 1/|Lk+v|^{2p}$ are well-defined by analytic continuation for all $p \in \mathbb{C}$ except $p = 0$ and $p = d/2$. Moreover, the multidimensional zeta function can be written in the form of an integral summation by Eqn. 33 as
$$Z_L(s; u, v) = \sum_{k \in \mathbb{Z}^d} \frac{e^{2\pi i u \cdot Lk}}{|Lk - v|^s} = \sum_{k \in \mathbb{Z}^d} \frac{1}{\Gamma(s/2)} \int_0^\infty t^{s/2 - 1} e^{2\pi i u \cdot Lk - t|Lk - v|^2}\, dt.$$
Based on this, we can also split the integral and apply the Poisson summation of Eqn. 41 to obtain two summations of incomplete Bessel functions to evaluate this series. For more details on the multidimensional zeta function, we refer readers to Crandall & Buhler (1987); Terras (1973); Crandall (1998); Kirsten (1994); Selberg & Chowla (1967).

C.2 CALCULATING INTEGRAL SUMMATION

As shown in Sec. 3.4, $G(L, v)$ can be written as the summations in Euclidean space directly and then in Fourier space:
$$G(L, v) = D \sum_{k \in \mathbb{Z}^d} \int_0^\infty t^{C-1} e^{-A\pi|Lk+v|^2 t - B/t}\, dt = D \sum_{k \in \mathbb{Z}^d} \int_0^1 t^{C-1} e^{-A\pi|Lk+v|^2 t - B/t}\, dt + D \sum_{k \in \mathbb{Z}^d} \int_1^\infty t^{C-1} e^{-A\pi|Lk+v|^2 t - B/t}\, dt = G_{\text{Fourier}}(L, v) + G_{\text{direct}}(L, v),$$
where $G_{\text{Fourier}}(L, v) = D \sum_{k} \int_0^1 t^{C-1} e^{-A\pi|Lk+v|^2 t - B/t}\, dt$ denotes the summation in Fourier space, and $G_{\text{direct}}(L, v) = D \sum_{k} \int_1^\infty t^{C-1} e^{-A\pi|Lk+v|^2 t - B/t}\, dt$ denotes the summation in direct space. Apparently, $G_{\text{direct}}(L, v)$ already has the form of an incomplete Bessel function summation. We apply analytic continuation to expand the domain of $C$ in $G_{\text{direct}}(L, v)$, as detailed in Appendix C.3, and we have
$$G_{\text{direct}}(L, v) = D \sum_{k \in \mathbb{Z}^d} K_{-C}(A\pi|Lk+v|^2, B)$$
for any constant $C \in \mathbb{R}$. Below, we prove that $G_{\text{Fourier}}(L, v)$ also reduces to an incomplete Bessel function summation.

Proof. Inspired by the Ewald summation (Ewald, 1921; Crandall, 1998), we consider $G_{\text{Fourier}}(L, v)$ on the reciprocal lattice using the Poisson summation (Crandall, 1998):
$$\sum_{k \in \mathbb{Z}^d} e^{2\pi i w \cdot Lk - \pi t|Lk+v|^2} = \frac{t^{-d/2}\, e^{2\pi i w \cdot v}}{\det L} \sum_{k \in \mathbb{Z}^d} e^{2\pi i L'k \cdot v - \frac{\pi}{t}|L'k + w|^2}, \quad (41)$$
where $w \in \mathbb{R}^d$ is a vector with $w = 0$ in our case, and $L' = L(L^T L)^{-1}$ is the lattice matrix of the reciprocal lattice. As a result, we obtain
$$G_{\text{Fourier}}(L, v) = D \sum_{k \in \mathbb{Z}^d} \int_0^1 t^{C-1} e^{-A\pi|Lk+v|^2 t - B/t}\, dt = \frac{D}{A^C} \sum_{k \in \mathbb{Z}^d} \int_0^A t^{C-1} e^{-\pi|Lk+v|^2 t - AB/t}\, dt \quad \left(t \leftarrow \tfrac{t}{A}\right)$$
$$= \frac{1}{\det L} \frac{D}{A^C} \sum_{k \in \mathbb{Z}^d} \int_0^A t^{C - \frac{d}{2} - 1} e^{2\pi i L'k \cdot v - \frac{\pi}{t}|L'k|^2 - \frac{AB}{t}}\, dt \quad (\text{Eqn. (41)})$$
$$= \frac{1}{\det L} \frac{D}{A^{d/2}} \sum_{k \in \mathbb{Z}^d} \int_0^1 t^{C - \frac{d}{2} - 1} e^{2\pi i L'k \cdot v - \frac{\pi}{At}|L'k|^2 - \frac{B}{t}}\, dt \quad (t \leftarrow At)$$
$$= \frac{1}{\det L} \frac{D}{A^{d/2}} \sum_{k \in \mathbb{Z}^d} \int_1^\infty t^{\frac{d}{2} - C - 1} e^{2\pi i L'k \cdot v - \frac{\pi t}{A}|L'k|^2 - Bt}\, dt \quad \left(t \leftarrow \tfrac{1}{t}\right)$$
$$= \frac{1}{\det L} \frac{D}{A^{d/2}} \sum_{k \in \mathbb{Z}^d} e^{2\pi i L'k \cdot v}\, K_{C - \frac{d}{2}}\!\left( \frac{\pi|L'k|^2}{A} + B,\, 0 \right).$$
We also apply analytic continuation to $G_{\text{Fourier}}(L, v)$ in the last step. Apparently, $G_{\text{Fourier}}(L, v)$ reduces to an incomplete Bessel function summation, and this completes the proof.
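The Poisson summation identity with $w = 0$ can be checked numerically in $d = 1$ with $L = 1$ (so $L' = 1$ and $\det L = 1$); both sides agree to machine precision:

```python
import numpy as np

# d = 1, L = 1, w = 0: sum_k exp(-pi t (k+v)^2) = t^{-1/2} sum_k exp(2 pi i k v) exp(-pi k^2 / t)
t, v, N = 0.7, 0.3, 30
k = np.arange(-N, N + 1)
lhs = np.sum(np.exp(-np.pi * t * (k + v) ** 2))
# the imaginary parts cancel in pairs (k, -k), leaving a cosine sum
rhs = t ** -0.5 * np.sum(np.cos(2.0 * np.pi * k * v) * np.exp(-np.pi * k**2 / t))
assert abs(lhs - rhs) < 1e-12
```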
Therefore, both $G_{\text{direct}}(L, v)$ and $G_{\text{Fourier}}(L, v)$ can be expressed as incomplete Bessel function summations:
$$G(L, v) = G_{\text{direct}}(L, v) + G_{\text{Fourier}}(L, v) = D \sum_{k \in \mathbb{Z}^d} K_{-C}(A\pi|Lk+v|^2, B) + \frac{1}{\det L} \frac{D}{A^{d/2}} \sum_{k \in \mathbb{Z}^d} e^{2\pi i L'k \cdot v}\, K_{C - \frac{d}{2}}\!\left( \frac{\pi|L'k|^2}{A} + B,\, 0 \right).$$
As shown in Appendix B.2, the incomplete Bessel function summation $\sum_{k \in \mathbb{Z}^d} K_\nu(\alpha|Lk+v|^2 + \gamma, \beta)$ is convergent and can be approximated. Therefore, $G(L, v)$ can also be approximated.
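As an end-to-end check of this split (our illustrative setup, not the paper's Cython/GSL implementation), consider $d = 1$, $L = 1$, and $U = 1/|k+v|^{4}$, i.e., $p = 2$, so $A = 1/\pi$, $B = 0$, $C = 2$, $D = 1/\Gamma(2) = 1$; the two Bessel sums can be compared against a Hurwitz-zeta reference:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import zeta

def K(nu, x, y=0.0):
    """Incomplete Bessel function K_nu(x, y) by direct quadrature."""
    val, _ = quad(lambda t: t**(-nu - 1.0) * np.exp(-x * t - y / t), 1.0, np.inf)
    return val

# d = 1, L = 1 (so L' = 1, det L = 1), U = 1/|k+v|^{2p} with p = 2
p, v, N = 2, 0.3, 8
direct = sum(K(-p, (k + v) ** 2) for k in range(-N, N + 1))
fourier = np.sqrt(np.pi) * (
    K(p - 0.5, 0.0)  # k = 0 reciprocal-lattice term
    + sum(2.0 * np.cos(2.0 * np.pi * k * v) * K(p - 0.5, np.pi**2 * k**2)
          for k in range(1, N + 1))
)
G = direct + fourier

# reference: sum over all k in Z of 1/(k+v)^4 = HurwitzZeta(4, v) + HurwitzZeta(4, 1-v)
ref = zeta(2 * p, v) + zeta(2 * p, 1 - v)
assert abs(G - ref) < 1e-4
```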

C.3 ANALYTIC CONTINUATION OF POTENTIAL SUMMATIONS

To represent series that are not convergent, such as the inverse summation $\sum_{k \in \mathbb{Z}^d} \frac{1}{|Lk+v|}$ shown in Sec. 3.4, we need to investigate the analytic continuation of potential summations. As shown in Appendix C.2, the potential summation can be written as
$$G(L, v) = D \sum_{k \in \mathbb{Z}^d} \int_0^\infty t^{C-1} e^{-A\pi|Lk+v|^2 t - B/t}\, dt = D \sum_{k \in \mathbb{Z}^d} \int_0^1 t^{C-1} e^{-A\pi|Lk+v|^2 t - B/t}\, dt + D \sum_{k \in \mathbb{Z}^d} \int_1^\infty t^{C-1} e^{-A\pi|Lk+v|^2 t - B/t}\, dt.$$
These two parts reduce to two incomplete Bessel function summations by analytic continuation. Analytic continuation is a technique that extends the domain $P$ of a given analytic function $f(x)$: we say $\bar{f}(x)$ is an analytic continuation of $f(x)$ to a domain $Q \supset P$ if $\bar{f}(x)$ is analytic on $Q$ and $\bar{f}(x) = f(x)$ holds for all $x \in P$. As shown by Kung & Yang (2003), the analytic continuation is unique and satisfies the permanence of functional relationships, i.e., the equations holding for $f(x)$ also hold for $\bar{f}(x)$. In our case, we can expand the domain of the constant $C$ in $G(L, v)$ to $\mathbb{R}$ such that $G(L, v)$ is well-defined for any $C \in \mathbb{R}$. For example, for summations of potentials of the form $\sum_{k \in \mathbb{Z}^d} 1/|Lk+v|^{2p}$, we have $C = p$ as shown in Appendix C.1; analytic continuation enables us to compute the summation when $p = 0.5$, which is initially divergent. Formally, we assume that $C$ is originally defined on a domain $P \subset \mathbb{R}$. As derived in Appendix C.1, for $C \in P$ we have
$$G(L, v) = D \sum_{k \in \mathbb{Z}^d} \int_0^1 t^{C-1} e^{-A\pi|Lk+v|^2 t - B/t}\, dt + D \sum_{k \in \mathbb{Z}^d} \int_1^\infty t^{C-1} e^{-A\pi|Lk+v|^2 t - B/t}\, dt = D \sum_{k \in \mathbb{Z}^d} K_{-C}(A\pi|Lk+v|^2, B) + \frac{1}{\det L} \frac{D}{A^{d/2}} \sum_{k \in \mathbb{Z}^d} e^{2\pi i L'k \cdot v}\, K_{C - \frac{d}{2}}\!\left( \frac{\pi|L'k|^2}{A} + B,\, 0 \right).$$
Furthermore, we have $\nu = -C$ and $\nu = C - \frac{d}{2}$ in the incomplete Bessel functions, which are analytically continued to $\nu \in \mathbb{R}$ (Jones, 2007); that is, $P$ is contained in $\nu$'s domain $\mathbb{R}$. Therefore, the incomplete Bessel function summations are the analytic continuations of the corresponding potential summations.
Our approach ultimately aims to capture the total contribution of the potentials in the crystal system. Based on this, to explain the advantage of analytic continuation, we consider the total infinite potential summation inside a unit cell,
$$S = \sum_{a \in A} \sum_{b \ne a,\, b \in \hat{A}} V(a, b), \quad (46)$$
where $A$ and $\hat{A}$ are described in Sec. 2.1. If we use an analytically continued function $\bar{S}(a, b)$ to approximate a convergent summation $S(a, b)$, we can directly approximate $S$ by
$$\bar{S} = \sum_{a \in A} \sum_{b \in A} \bar{S}(a, b). \quad (47)$$
As for a non-convergent summation $S(a, b)$, due to the permanence of functional relationships of analytic continuation, we still obtain the same relation with $S(a, b)$ replaced by its analytic continuation. This implies that we can use analytically continued summations to approximate the total contribution of potentials even where individual potential summations are initially divergent. In addition, for a non-convergent summation $S(a, b)$, analytic continuation results in an unusual but finite value, such as a negative energy, instead of infinity. This is useful in practice, since infinities cause numerical explosions during training. A famous example of using analytic continuation for crystal energy prediction is the Madelung constant of NaCl, which is derived from the summation of Coulomb potentials among Na$^+$ and Cl$^-$ ions and is calculated via an absolutely convergent series obtained by analytic continuation (Borwein et al., 1985). We further show the energy calculation of NaCl by analytic continuation in Appendix E.

C.4 IMPLEMENTATION AND NUMERICAL EXAMPLES OF APPROXIMATION

Here we describe the implementation of our algorithm as in Eqn. (43). Considering that solving the inverse of the incomplete Gamma function is complicated, we instead provide a proper value $R$ and then calculate its corresponding error bound $\epsilon$ based on Eqn. (29). Given a lattice matrix $L \in \mathbb{R}^{d \times d}$, a vector $v \in \mathbb{R}^d$ inside a unit cell, and the constants $A, B, C, D$ derived from specific potential functions as described in Appendix C.1, we aim to evaluate the two parts $D \sum_{k \in \mathbb{Z}^d} K_{-C}(A\pi|Lk+v|^2, B)$ and $\frac{1}{\det L} \frac{D}{A^{d/2}} \sum_{k \in \mathbb{Z}^d} e^{2\pi i L'k \cdot v}\, K_{C - \frac{d}{2}}\!\left( \frac{\pi|L'k|^2}{A} + B,\, 0 \right)$.

To evaluate $G_{\text{direct}}(L, v) = D \sum_{k \in \mathbb{Z}^d} K_{-C}(A\pi|Lk+v|^2, B)$, we take the following steps.
Step 1: Determine a value $R$ such that $R^2 > C$ and $R^2 > 1$, and calculate the error bound $\epsilon = \frac{d}{2} \left( \frac{2}{\rho} \right)^d \Gamma\!\left( \frac{d}{2}, \left( R - \frac{\rho}{2} \right)^2 \right)$, where $\rho = \min\{ \sqrt{A\pi}|Lk| \,:\, k \in \mathbb{Z}^d, k \ne 0 \}$.
Step 2: Select the lattice points inside an ellipsoid, $P = \{ k \,:\, |Lk + v| \le R/\sqrt{A\pi} \}$.
Step 3: Evaluate the incomplete Bessel function summation $G_{\text{direct}}(L, v)$ by calculating every term $D K_{-C}(A\pi|Lk+v|^2, B)$ for $k \in P$, based on Appendix B.1.

To evaluate $G_{\text{Fourier}}(L, v)$, we proceed analogously.
Step 1: Determine a value $R$ such that $R^2 > \frac{d}{2} - C$, $R^2 > 1$, and $R^2 > B$, and calculate the error bound $\epsilon = \frac{d}{2} \left( \frac{2}{\rho} \right)^d \Gamma\!\left( \frac{d}{2}, \left( \sqrt{R^2 - B} - \frac{\rho}{2} \right)^2 \right)$, where $\rho = \min\{ \sqrt{\pi/A}\,|L'k| \,:\, k \in \mathbb{Z}^d, k \ne 0 \}$.
Step 2: Select the lattice points inside an ellipsoid, $P = \{ k \,:\, |L'k| \le \sqrt{A(R^2 - B)/\pi} \}$.
Step 3: Evaluate the incomplete Bessel function summation $G_{\text{Fourier}}(L, v)$ by calculating every term $\frac{1}{\det L} \frac{D}{A^{d/2}} e^{2\pi i L'k \cdot v}\, K_{C - \frac{d}{2}}\!\left( \frac{\pi|L'k|^2}{A} + B,\, 0 \right)$ based on Appendix B.1.

Our implementation is based on Cython and the GNU Scientific Library (Galassi et al., 2002), in which the native incomplete Gamma function and Bessel function are used to implement the incomplete Bessel function. We conduct numerical experiments on an Intel Xeon Gold 6258R CPU. We show evaluation examples in Table 6 with the corresponding error bounds and evaluation times. The running time is on the scale of milliseconds.
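The steps above can be sketched as follows for the direct-space part, using a hypothetical 1D setting ($L = 1$, $U = 1/|k+v|^4$, so $A = 1/\pi$, $B = 0$, $C = 2$, $D = 1$) and a quadrature-based $K_\nu$ in place of the paper's GSL implementation:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma, gammaincc

def K(nu, x, y=0.0):
    """Incomplete Bessel function K_nu(x, y) by direct quadrature."""
    val, _ = quad(lambda t: t**(-nu - 1.0) * np.exp(-x * t - y / t), 1.0, np.inf)
    return val

# hypothetical 1D setting: L = 1, U = 1/|k+v|^4, so A = 1/pi, B = 0, C = 2, D = 1
d, A, B, C, D, v = 1, 1.0 / np.pi, 0.0, 2.0, 1.0, 0.3

# Step 1: choose R with R^2 > C and R^2 > 1, then evaluate the error bound
R = 3.0
rho = np.sqrt(A * np.pi) * 1.0  # min over k != 0 of sqrt(A*pi)|Lk|
eps = (d / 2) * (2 / rho) ** d * gammaincc(d / 2, (R - rho / 2) ** 2) * gamma(d / 2)

# Step 2: lattice points inside the ellipsoid |Lk + v| <= R / sqrt(A*pi)
pts = [k for k in range(-60, 61) if abs(k + v) <= R / np.sqrt(A * np.pi)]

# Step 3: evaluate the truncated direct-space summation term by term
G_trunc = sum(D * K(-C, A * np.pi * (k + v) ** 2, B) for k in pts)

# sanity check against a much larger cutoff: the truncation error obeys the bound
G_ref = sum(D * K(-C, A * np.pi * (k + v) ** 2, B) for k in range(-60, 61))
assert 0.0 <= G_ref - G_trunc <= eps
```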

C.5 POTENTIAL SUMMATION EXTENSIONS

To highlight the generality of our potential summation method, in this section we introduce additional potentials that can be converted to our general integral form in Eqn. 31, including the Lennard-Jones potential, the Morse potential, and the screened Coulomb potential. These potentials are only used for specific types of materials, such as gases and fluids.

Lennard-Jones Potential (Lennard-Jones & Dent, 1928) is an intermolecular pair potential that is usually used for gases or organic materials. The commonly used expression for the Lennard-Jones potential is
$$U_{\text{LJ}}(L, v) = 4\epsilon \left[ \left( \frac{\sigma}{|Lk+v|} \right)^{12} - \left( \frac{\sigma}{|Lk+v|} \right)^{6} \right],$$
where $\epsilon$ and $\sigma$ are hyperparameters. The summation of $U_{\text{LJ}}(L, v)$ can be converted to two potential summations of type $1/|Lk+v|^{2p}$ with $p = 6$ and $p = 3$, such that
$$G_{\text{LJ}}(L, v) = \sum_{k \in \mathbb{Z}^d} U_{\text{LJ}}(L, v) = 4\epsilon \left[ \sigma^{12} \sum_{k \in \mathbb{Z}^d} \frac{1}{|Lk+v|^{12}} - \sigma^{6} \sum_{k \in \mathbb{Z}^d} \frac{1}{|Lk+v|^{6}} \right], \quad (50)$$
where the calculation of potential summations of type $1/|Lk+v|^{2p}$ is shown in Appendix C.1.

Morse Potential (Morse, 1929) is an interatomic potential of a diatomic molecule and can be used for simple molecular materials. The Morse potential has the mathematical form
$$U_{\text{Morse}}(L, v) = D_e \left[ e^{-2a(|Lk+v| - r_e)} - 2 e^{-a(|Lk+v| - r_e)} \right],$$
where $D_e$, $a$, and $r_e$ are hyperparameters. Similarly, the summation of $U_{\text{Morse}}(L, v)$ can be converted to two potential summations of type $e^{-\alpha|Lk+v|}$ with $\alpha = 2a$ and $\alpha = a$, such that
$$G_{\text{Morse}}(L, v) = \sum_{k \in \mathbb{Z}^d} U_{\text{Morse}}(L, v) = D_e \left[ e^{2ar_e} \sum_{k \in \mathbb{Z}^d} e^{-2a|Lk+v|} - 2 e^{ar_e} \sum_{k \in \mathbb{Z}^d} e^{-a|Lk+v|} \right], \quad (52)$$
where the calculation of potential summations of type $e^{-\alpha|Lk+v|}$ is shown in Appendix C.1.

Screened Coulomb Potential, or Yukawa potential (Yukawa, 1935), represents Coulomb interactions with damping of the electric field. It is an important potential reflecting the behavior of charge-carrying fluids or of particles in semiconductors. The screened Coulomb potential has the analytic form $V(a, b) = \frac{z_a z_b e^2}{d(a, b)} \exp(-\alpha\, d(a, b))$, where $d(a, b)$ is the distance between atoms $a$ and $b$, $z_a, z_b$ are the charges of atoms $a$ and $b$, $e$ is the elementary charge constant, and $\alpha$ is a scaling hyperparameter. Since $z_a$, $z_b$, $e$ are constants and can be extracted outside the summation, we obtain the simplified screened Coulomb potential
$$U_{\text{screened}}(L, v) = \frac{e^{-\alpha|Lk+v|}}{|Lk+v|}. \quad (53)$$
Considering the inverse Laplace transform of $e^{-\alpha\sqrt{s}}/\sqrt{s}$, obtained by a Bromwich contour with branch points, we have
$$\mathcal{L}^{-1}\{e^{-\alpha\sqrt{s}}/\sqrt{s}\} = \frac{1}{\sqrt{\pi t}}\, e^{-\alpha^2/(4t)}.$$
Therefore, we can apply the Laplace transform as in Eqn. 53 such that
$$U_{\text{screened}}(L, v) = \frac{e^{-\alpha|Lk+v|}}{|Lk+v|} = \frac{1}{\sqrt{\pi}} \int_0^\infty t^{-1/2} e^{-|Lk+v|^2 t - \alpha^2/(4t)}\, dt.$$
Then we obtain $A = 1/\pi$, $B = \alpha^2/4$, $C = \frac{1}{2}$, and $D = 1/\sqrt{\pi}$ in Eqn. 30 to fit the screened Coulomb potential into our potential summation method.
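Two quick numerical checks of this section's conversions, with sample hyperparameters of our choosing: the screened-Coulomb integral representation, and the split of a Lennard-Jones lattice sum into two inverse-power sums over a finite 1D lattice:

```python
import numpy as np
from scipy.integrate import quad

# screened Coulomb: exp(-a r)/r = (1/sqrt(pi)) int_0^inf t^{-1/2} exp(-r^2 t - a^2/(4t)) dt
r, a = 1.4, 2.0
val, _ = quad(lambda t: t**-0.5 * np.exp(-r * r * t - a * a / (4.0 * t)), 0.0, np.inf)
assert abs(val / np.sqrt(np.pi) - np.exp(-a * r) / r) < 1e-7

# Lennard-Jones lattice sum splits into two 1/r^{2p} sums (p = 6 and p = 3); finite 1D check
eps_lj, sigma, v = 0.5, 0.9, 0.3
rr = np.abs(np.arange(-200, 201) + v).astype(float)
lhs = np.sum(4.0 * eps_lj * ((sigma / rr) ** 12 - (sigma / rr) ** 6))
rhs = 4.0 * eps_lj * (sigma**12 * np.sum(rr**-12.0) - sigma**6 * np.sum(rr**-6.0))
assert abs(lhs - rhs) < 1e-6
```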

D MODEL IMPLEMENTATION

D.1 POTNET IMPLEMENTATION

The employed network architecture is shown in Fig. 2. Since our major contribution is to consider infinite interatomic potentials, we simply design our network architecture following commonly used settings. Specifically, existing methods for 3D graphs (Xie & Grossman, 2018; Schütt et al., 2017; Klicpera et al., 2020b;a; Liu et al., 2022b; Gasteiger et al., 2021; Schütt et al., 2021) share a similar architecture, which usually contains an input block, an interaction block, and an output block. Without loss of generality, we take the updating process for node $i$ as an example to illustrate the network.

The inputs contain atomic features and potentials. $z_i$ is the 92-dimensional atomic feature for atom $i$, following CGCNN (Xie & Grossman, 2018). Below we denote $d$ as the interatomic distance and describe our potential features. For our implementation of the infinite potential summations $S(a, b)$ in Eqn. (7), we add up the infinite summations of three potentials: the Coulomb potentials, London dispersion potentials, and Pauli repulsion potentials described in Sec. 2.2. We simplify the mathematical form of the Coulomb potential $V_{\text{Coulomb}}(a, b) = -\frac{z_a z_b e^2}{4\pi\epsilon_0 d(a, b)}$ to $V_{\text{Coulomb}}(a, b) = -\epsilon_1/d(a, b)$, where $\epsilon_1$ is a hyperparameter, because $e$, $\pi$, $\epsilon_0$ are all known constants and $z_a$, $z_b$ can be learned from atomic features. As explained in Sec. 4.1, $e_c = -\epsilon'_1/d$ denotes the Coulomb potential for the local crystal graph, and $s_c = \sum_d \epsilon_1/d$, $s_l = \sum_d \epsilon/d^6$, $s_p = \sum_d e^{-\alpha d}$ denote the summations of Coulomb potentials, London dispersion potentials, and Pauli potentials for the infinite crystal graph. We set $\epsilon'_1 = 4.0$ for $e_c$, $\epsilon_1 = -1.0$ for $s_c$, $\epsilon = 4.0$ for $s_l$, and $\alpha = 3.0$ for $s_p$, respectively. For simplicity, we apply them to the RBF embeddings with the same cutoff. Since $e_c$ and $s_l$ are negative and can have values exceeding the RBF cutoff, we use an exponential function to make their absolute values smaller.

The input block contains a Linear layer and an Embedding layer. For each node $i$, the Linear layer generates a 256-dimensional vector as the input node features to the first interaction layer. For each edge, the Embedding layer maps the Coulomb potentials and the summations of Coulomb potentials, London dispersion potentials, and Pauli repulsion potentials to 256-dimensional embeddings using 256 RBF kernels with centers from $-4.0$ to $4.0$. The interaction block contains several interaction layers; each layer updates the feature vector of node $i$ based on the features of the neighboring nodes and the potential embeddings of the connected edges. In particular, for any neighboring node $j$ of node $i$, the corresponding potential embeddings $e_c^{ij}$, $s_c^{ij}$, $s_l^{ij}$, and $s_p^{ij}$ for edge $ij$ are all produced by the Embedding layer. The readout block contains an AvgPooling layer and another Linear layer.
We first use the Avg-Pooling layer to aggregate features from all nodes in a graph and then use the Linear layer that maps the hidden dimension of 256 to the final output which is a scalar.
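As an illustration of the Embedding layer's featurization, here is a minimal NumPy sketch of the RBF expansion of scalar edge features; the kernel width (set from the center spacing) and the function name are our assumptions for illustration, not details specified in the paper:

```python
import numpy as np

def rbf_embed(values, num_kernels=256, low=-4.0, high=4.0):
    """Expand scalar edge features (potentials / potential sums) into RBF embeddings.

    The kernel width is set from the center spacing -- an assumption for illustration.
    """
    centers = np.linspace(low, high, num_kernels)
    width = centers[1] - centers[0]
    values = np.asarray(values, dtype=float).reshape(-1, 1)
    return np.exp(-((values - centers) ** 2) / (2.0 * width**2))

# example: embed a few Coulomb-potential edge features e_c = -eps1'/d with eps1' = 4.0
d = np.array([1.2, 2.5, 3.8])
e_c = -4.0 / d
emb = rbf_embed(e_c)
assert emb.shape == (3, 256)
assert np.all((emb >= 0.0) & (emb <= 1.0))
```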

D.2 CONFIGURATIONS OF RETRAINED MODELS

In this section, we show detailed configurations of the retrained models of CGCNN (Xie & Grossman, 2018), SchNet (Schütt et al., 2017), MEGNET (Chen et al., 2019), GATGNN (Louis et al., 2020), and ALIGNN (Choudhary & DeCost, 2021). If not specified, models are trained with a radius cutoff of 8.0 using the Adam (Kingma & Ba, 2014) optimizer with weight decay (Loshchilov & Hutter, 2017) and a one-cycle learning rate scheduler (Smith & Topin, 2019).

CGCNN (Xie & Grossman, 2018). We directly use the publicly available code from Xie & Grossman (2018) to build and train the CGCNN model. We build the model with 128 hidden dimensions and three message-passing layers and train the model for 1000 epochs with a batch size of 256 and an initial learning rate of 1e-2.

SchNet (Schütt et al., 2017). We directly adopt the SchNet model from PyTorch Geometric (Fey & Lenssen, 2019) with 128 hidden dimensions and six message-passing layers, following the original paper. We train SchNet with an initial learning rate of 5e-4 and a batch size of 64 for 500 epochs.

GATGNN (Louis et al., 2020). We directly use the publicly available code from Louis et al. (2020) to build and train the GATGNN model. We build the model with 128 hidden dimensions and three message-passing layers, following the original default settings. We train the model with an initial learning rate of 5e-3 and a batch size of 64 for 500 epochs.

MEGNET (Chen et al., 2019). We directly adopt the MEGNET model from the publicly available code of Chen et al. (2019). Following the original paper, we use three message-passing layers with the same feature dimensions as mentioned in the original paper and use the Set2Set readout function. We train MEGNET with an initial learning rate of 1e-3 and a batch size of 128 for 1000 epochs, following the configuration settings mentioned in the original paper.

ALIGNN (Choudhary & DeCost, 2021). We directly use the publicly available code from Choudhary & DeCost (2021).
We use the official best model configurations of ALIGNN to train ALIGNN models with an initial learning rate of 1e-3 and batch size of 64 for 500 epochs.

E LINEAR ENERGY MODELING USING INFINITE POTENTIAL SUMMATION

In this section, we provide examples of calculations where the true energies of materials can be directly approximated by linear combinations of our infinite potential sums. These are special cases of Eqn. 6 with a linear embedded function $G$. Specifically, we evaluate the total energy per atom of NaCl and of two other materials (MgO, LiF) whose crystal structures are similar to that of NaCl.



where $N_r(a) = \{ b : b \ne a,\, b \in A,\, \|p_a - p_b\|_2 < r \}$, $\phi(\cdot)$ denotes the functional expansions, and $H(\cdot)$ is a non-linear function based on 3D GNN models.

Figure 1: Schematic illustration of how complete interatomic interactions are captured in PotNet. Note that PotNet models 3D crystals, while we show a 2D illustration for simplicity. (a) An example crystal in which each unit cell contains two atoms $u$ and $v$. In PotNet, the potentials between all pairs of atoms are captured; for simplicity, we only show the potentials from all $v$ atoms to a $u$ atom. (b) The complete set of potentials in (a) can be grouped into four categories: $u \to v$, $v \to u$, $u \to u$, and $v \to v$. (c) We propose to compute an approximate summation for each category of potentials.

3.2 and 3.3, we denote the positions of the atoms in the set $A_b$ as $P_b = \{ p_b^k \mid p_b^k = p_b + Lk,\, k \in \mathbb{Z}^3 \}$. The Euclidean distances between atom $a$ and all atoms in $A_b$ can be represented as $\{ d \mid d = \|p_b + Lk - p_a\|,\, k \in \mathbb{Z}^3 \}$. As described in Sec. 3.3, we simplify the Coulomb potential function as $V_{\text{Coulomb}}(a, b) = -\epsilon_1/d(a, b)$, where $\epsilon_1$ is a hyperparameter scaling the potential. Based on this, we can represent the Coulomb potentials from all atoms in $A_b$ to atom $a$ as $\{ -\epsilon_1/d \mid d = \|p_b + Lk - p_a\|,\, k \in \mathbb{Z}^3 \}$.





Figure 2: The developed network architecture for PotNet.


Comparison between PotNet and other baselines in terms of test MAE on the Materials Project dataset. To make the comparison clear and fair, we retrain baseline methods using the same dataset settings. We also show results from the original papers in parentheses and mark missing results with "-". The best results are shown in bold and the second-best results are underlined.

Comparison between PotNet and other baselines in terms of test MAE on the JARVIS dataset. The best results are shown in bold and the second-best results are underlined.

We conduct experiments on two material benchmark datasets, the Materials Project and JARVIS. Baseline methods include CFID (Choudhary et al., 2018), SchNet (Schütt et al., 2017), CGCNN (Xie & Grossman, 2018), MEGNET (Chen et al., 2019), GATGNN (Louis et al., 2020), and ALIGNN (Choudhary & DeCost, 2021). Unless otherwise specified, we report the results given in the referenced papers or provided by the original authors. All PotNet models are trained using the Adam (Kingma & Ba, 2014) optimizer with weight decay (Loshchilov & Hutter, 2017) and a one-cycle learning rate scheduler (Smith & Topin, 2019), with a learning rate of 0.001, 500 training epochs, and a batch size of 64. We use PyTorch to implement our models. For all tasks on the two benchmark datasets, we use one NVIDIA RTX A6000 48G GPU for computing. Other detailed configurations of PotNet for different tasks are provided in Appendix D.1.

To capture the global infinite-range interactions without losing details of local interactions, PotNet uses both local and infinite crystal graphs. Specifically, for the local crystal graph, we use the radius crystal graph proposed by CGCNN but replace Euclidean distances with interatomic potentials as edge features. Concretely, because the influences of London dispersion potentials and Pauli potentials are limited and can be ignored when only considering nearby regions, we only use the Coulomb potentials in the radius crystal graph. The infinite crystal graph is constructed as described in Sec. 3.3, where Coulomb potentials, London dispersion potentials, and Pauli potentials are used. We first evaluate PotNet on the Materials Project-2018.6.1, which is a widely used large-scale material benchmark with 69239 crystals.
We follow previous works (Xie & Grossman, 2018; Chen et al., 2019; Choudhary & DeCost, 2021; Louis et al., 2020) and use four crystal properties: formation energy, band gap, bulk moduli, and shear moduli. We notice that previous works (Xie & Grossman, 2018; Chen et al., 2019; Choudhary & DeCost, 2021; Louis et al., 2020; Schütt et al., 2017) compare with each other using different splits of training, evaluation, and testing datasets with different random seeds. For instance, the original CGCNN paper only uses 28046 training samples for formation energy prediction, resulting in the original result of 0.039 as shown in Table

Model complexity and runtime compared with ALIGNN on JARVIS formation energy.

Efficiency of PotNet. Beyond its superior modeling capacity for crystals, PotNet is faster and more efficient than ALIGNN. To demonstrate the efficiency of PotNet, we compare PotNet with ALIGNN in terms of training time per epoch, total training time, and inference time on the JARVIS formation energy prediction task. As shown in Table 3, PotNet is four times faster than ALIGNN in terms of total training time and inference time.

Preprocessing time compared with ALIGNN on the JARVIS dataset. The first and second columns show the preprocessing time on the whole JARVIS dataset with 55722 crystals. The third column denotes the mean inference time considering both preprocessing and model time.

Ablation studies on the effects of adding Coulomb potentials and infinite summation.

Infinite Summation of Potentials. The importance of the infinite summation of potentials is demonstrated by comparing the previous base models with 'Base + Potential + Infinite', denoting the full PotNet model with infinite summation on the infinite crystal graph. As seen from Table 5, using the infinite crystal graphs introduced in Sec. 3.3 captures the global repeating patterns of crystal structures, resulting in a performance gain from 0.0318 to 0.0308 for formation energy prediction.

Numerical examples of our algorithm. Here, ζ(x) =


Since these are pure ionic crystals and Coulomb interactions dominate the system, we first consider the total electrostatic energy and the Coulomb potential summations. For a neutral system, the electrostatic energy is convergent because the Coulomb potentials cancel each other. We cannot derive the energy of a crystal directly due to its complexity; instead, we divide it into many individual infinite potential summations. Although these individual summations might originally be divergent, they are all well-defined under analytic continuation, and the total energy remains convergent. As shown in Appendix C.3, analytic continuation allows us to compute potential summations that are initially divergent, and also allows us to compute convergent summations via analytically continued potential summations. For example, we can calculate $\sum_{n \in \mathbb{Z},\, n \ne 0} (-1)^n/|n|$ by analytic continuation. Inspired by this, we can approximate the total energy per atom of NaCl crystals by analytically continued infinite potential summations. Due to the symmetry of the NaCl cell, we only involve atoms $a$, $b$, $c$, $d$ in our calculation. Specifically, as shown in Fig. 3, atom $a$ represents the body-center Na$^+$, atom $b$ a face-center Cl$^-$, atom $c$ an edge-center Na$^+$, and atom $d$ a corner Cl$^-$. Based on this, and assuming that the side length of the unit cell is 1, the total energy of Na$^+$ is approximated by the total Coulomb interactions with atom $a$ (Eqn. 57), where $N_A$ is the Avogadro constant, $d(a, u)$ is the distance between atoms $a$ and $u$, $r_0$ is the minimum distance between Na and Cl, $\hat{d} = d/r_0$ is the normalized distance, $A_a$ denotes the set of atoms containing atom $a$ and all its repetitions, and $S$ is the infinite potential summation, approximated by our infinite potential summation method in Sec.
3.4, $z_{\mathrm{Na}}$, $z_{\mathrm{Cl}}$ are the charges of Na$^+$ and Cl$^-$, $e$ is the elementary charge constant, $\epsilon_0$ is the permittivity constant of free space, and the coefficients $1$, $\frac{1}{2} \cdot 6$, $\frac{1}{4} \cdot 12$, $\frac{1}{8} \cdot 8$ denote the fractions of atoms in a unit cell. We finally obtain the constant $-1.7475646$ from our infinite potential summations. In fact, this constant is exactly the famous Madelung constant $M$ (Borwein et al., 1985). Moreover, by considering an additional repulsion term, we can extend the calculation in Eqn. 57 to the famous Born-Landé equation (Born, 1921), where $z^+$, $z^-$ are the charges of the cation and anion, $M$ is the Madelung constant computed from infinite Coulomb potential summations, and $n$ is the Born exponent measuring the effect of repulsion. Choosing $n = 9$, we derive an approximation of 7.84 eV for the total energy of NaCl. Similarly, we can apply Eqn. 58 to MgO and LiF. We further show these approximations in Table 7; they already give rough agreement with the ground-truth energies, which implies that our features can serve as a good starting point for machine learning models to learn the ground-truth energy. Apparently, previous methods cannot achieve this due to the lack of such informative features. It is worth noting that the Madelung constant is typically unknown a priori, because the coefficients of the infinite potential summations depend on the charge distribution of the system, which we do not know at the beginning. Also, these crystals are special cases of Eqn. 6 with a linear embedded function $G$, while $G$ is typically a nonlinear function (Daw & Baskes, 1984). Therefore, the network serves to learn those coefficients, and hence the Madelung constant, and to provide nonlinearity.
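The full 3D Madelung constant requires the lattice-sum machinery above, but the idea can be illustrated with the 1D analogue of NaCl (a simplified example of ours, not from the paper): a line of alternating unit charges with spacing 1 has Madelung constant $M = 2\sum_{n \ge 1} (-1)^{n+1}/n = 2\ln 2$. The conditionally convergent series can be resummed pairwise into an absolutely convergent one, mirroring the absolutely convergent representations used in the 3D case:

```python
import numpy as np

# 1D chain of alternating unit charges: M = 2 * sum_{n>=1} (-1)^{n+1} / n = 2 ln 2.
# Pairing consecutive terms gives the absolutely convergent series
# sum_{m>=1} 1/((2m-1)(2m)), which is safe to evaluate term by term.
m = np.arange(1, 1_000_000, dtype=float)
M = 2.0 * np.sum(1.0 / ((2.0 * m - 1.0) * (2.0 * m)))
assert abs(M - 2.0 * np.log(2.0)) < 1e-5
```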

