EFFICIENT APPROXIMATIONS OF COMPLETE INTERATOMIC POTENTIALS FOR CRYSTAL PROPERTY PREDICTION

Abstract

We study the problem of crystal material property prediction. A crystal structure consists of a minimal unit cell that is repeated infinitely in 3D space. How to accurately represent such repetitive structures in machine learning models remains unresolved. Current methods construct graphs by establishing edges only between nearby nodes, thereby failing to faithfully capture infinite repeating patterns and distant interatomic interactions. In this work, we propose several innovations to overcome these limitations. First, we propose to model physics-principled interatomic potentials directly instead of only using distances as in existing methods. These potentials include the Coulomb potential, London dispersion potential, and Pauli repulsion potential. Second, we propose to model the complete set of potentials among all atoms, instead of only between nearby atoms as in prior methods. This is enabled by our approximations of infinite potential summations with provable error bounds. We further develop efficient algorithms to compute the approximations. Finally, we propose to incorporate our computations of complete interatomic potentials into message passing neural networks for representation learning. We perform experiments on the JARVIS and Materials Project benchmarks for evaluation. Results show that the use of complete interatomic potentials leads to consistent performance improvements with reasonable computational costs.
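To make the three potential families named above concrete, the sketch below uses standard textbook functional forms: a Coulomb term decaying as 1/r, a London dispersion term decaying as 1/r^6, and a Pauli repulsion term modeled here with a short-range 1/r^12 term. The constants (`k`, `c6`, `b`) are illustrative placeholders; the paper's actual parameterizations are not reproduced here.

```python
def coulomb(q_i, q_j, r, k=1.0):
    """Coulomb potential between point charges q_i, q_j at distance r."""
    return k * q_i * q_j / r

def london_dispersion(c6, r):
    """Attractive London dispersion term, decaying as r^-6."""
    return -c6 / r**6

def pauli_repulsion(b, r):
    """Short-range Pauli repulsion, modeled here as an r^-12 term."""
    return b / r**12
```

Note that all three forms depend only on the interatomic distance r, which is why summing them over all periodic images reduces to evaluating (infinite) sums of inverse powers of distances.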

1. INTRODUCTION

The past decade has witnessed a surge of interest and rapid development in machine learning for molecular analysis (Duvenaud et al., 2015). These initial studies mainly focus on prediction and generation problems for small molecules. To enable computational analyses, molecules need to be featurized in an appropriate mathematical representation. Recently, with the advances of graph neural networks (GNNs) (Gilmer et al., 2017; Battaglia et al., 2018), molecules are more commonly represented as graphs in which each node corresponds to an atom and each edge corresponds to a chemical bond (Stokes et al., 2020; Wang et al., 2022c). A variety of molecular graph prediction (Stokes et al., 2020; Wang et al., 2022c) and generation (Shi et al., 2019; Jin et al., 2018; Luo et al., 2021) methods have been developed based on 2D molecular graph representations. A key limitation of 2D graph representations is that the 3D geometries of molecules are not captured, yet such information may be critical in many molecular property prediction problems (Hu et al., 2021). To enable the encoding of 3D molecular geometries in GNNs, a series of 3D GNN methods have been developed for prediction (Schütt et al., 2017; Gasteiger et al., 2019; Liu et al., 2022b; Wang et al., 2022b) and generation (Liu et al., 2022a; Luo & Ji, 2022; Hoogeboom et al., 2022) problems. In these 3D graph representations, each node is associated with the corresponding atom's coordinate in 3D space. Geometric information, such as distances between nodes and angles between edges, is used during message passing in GNNs. Recently, these methods have been extended to learn representations for proteins (Jing et al., 2020; Wang et al., 2022a). Inspired by the success of GNNs on small molecules, Xie & Grossman (2018) developed the crystal graph convolutional neural network (CGCNN) for crystal material property prediction.
Different from small molecules and proteins, crystal materials are typically modeled by a minimal unit cell (similar to a small molecule) that is repeated in 3D space along fixed directions with fixed step sizes, given by the lattice vectors. In theory, the unit cell is repeated infinitely in 3D space, although any real-world material has finite size.
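The repetition described above can be made concrete with a naive baseline: approximate an infinite per-cell potential sum by enumerating periodic images of the unit cell within a finite block. The sketch below does this for the absolutely convergent dispersion term (-C6/r^6); it is a hypothetical illustration of truncated lattice summation, not the paper's approximation scheme with provable error bounds.

```python
import itertools
import numpy as np

def truncated_dispersion_sum(frac_coords, lattice, c6, n_images=2):
    """Dispersion energy per unit cell, approximated by truncating the
    infinite lattice sum to images within a (2*n_images+1)^3 block.

    frac_coords: (N, 3) fractional atomic coordinates in the unit cell.
    lattice:     (3, 3) matrix whose rows are the lattice vectors.
    """
    cart = np.asarray(frac_coords) @ np.asarray(lattice)  # Cartesian positions
    energy = 0.0
    for shift in itertools.product(range(-n_images, n_images + 1), repeat=3):
        offset = np.asarray(shift, dtype=float) @ lattice  # image translation
        for i, ri in enumerate(cart):
            for j, rj in enumerate(cart):
                if shift == (0, 0, 0) and i == j:
                    continue  # no self-interaction within the central cell
                r = np.linalg.norm(rj + offset - ri)
                energy += 0.5 * (-c6 / r**6)  # 0.5 avoids double counting
    return energy
```

Because each outer shell of images contributes ever less (the integrand decays as r^-6), the truncated sum converges as `n_images` grows; the slowly decaying Coulomb term is precisely the case where such naive truncation is inadequate and more careful summation techniques are required.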

