A SELF-ATTENTION ANSATZ FOR AB-INITIO QUANTUM CHEMISTRY

Abstract

We present a novel neural network architecture using self-attention, the Wavefunction Transformer (Psiformer), which can be used as an approximation (or Ansatz) for solving the many-electron Schrödinger equation, the fundamental equation for quantum chemistry and materials science. This equation can be solved from first principles, requiring no external training data. In recent years, deep neural networks like the FermiNet and PauliNet have been used to significantly improve the accuracy of these first-principles calculations, but they lack an attention-like mechanism for gating interactions between electrons. Here we show that the Psiformer can be used as a drop-in replacement for these other neural networks, often dramatically improving the accuracy of the calculations. On larger molecules especially, the ground state energy can be improved by dozens of kcal/mol, a qualitative leap over previous methods. This demonstrates that self-attention networks can learn complex quantum mechanical correlations between electrons, and are a promising route to reaching unprecedented accuracy in chemical calculations on larger systems.

1. INTRODUCTION

The laws of quantum mechanics describe the nature of matter at the microscopic level, and underpin the study of chemistry, condensed matter physics and materials science. Although these laws have been known for nearly a century (Schrödinger, 1926), the fundamental equations are too difficult to solve analytically for all but the simplest systems. In recent years, tools from deep learning have been used to great effect to improve the quality of computational quantum physics (Carleo & Troyer, 2017). For the study of chemistry in particular, it is the quantum behavior of electrons that matters, which imposes certain constraints on the possible solutions. The use of deep neural networks for successfully computing the quantum behavior of molecules was introduced almost simultaneously by several groups (Pfau et al., 2020; Hermann et al., 2020; Choo et al., 2020), and has since led to a variety of extensions and improvements (Hermann et al., 2022). However, follow-up work has mostly focused on applications and iterative improvements to the neural network architectures introduced in the first set of papers.
At the same time, neural networks using self-attention layers, like the Transformer (Vaswani et al., 2017), have had a profound impact on much of machine learning. They have led to breakthroughs in natural language processing (Devlin et al., 2018), language modeling (Brown et al., 2020), image recognition (Dosovitskiy et al., 2020), and protein folding (Jumper et al., 2021). The basic self-attention layer is also permutation equivariant, a useful property for applications to chemistry, where physical quantities should be invariant to the ordering of atoms and electrons (Fuchs et al., 2020). Despite the manifest successes in other fields, no one has yet investigated whether self-attention neural networks are appropriate for approximating solutions in computational quantum mechanics.
In this work, we introduce a new self-attention neural network, the Wavefunction Transformer (Psiformer), which can be used as an approximate numerical solution (or Ansatz) for the fundamental equations of the quantum mechanics of electrons. We test the Psiformer on a wide variety of benchmark systems for quantum chemistry and find that it is significantly more accurate than existing neural network Ansatzes of roughly the same size. The increase in accuracy grows with the size of the system, reaching as much as 75 times the normal standard for "chemical accuracy", which suggests that the Psiformer is a particularly attractive approach for scaling neural network Ansatzes to larger, more challenging systems. In what follows, we provide an overview of the variational quantum Monte Carlo approach to computational quantum mechanics (Sec. 2), introduce the Psiformer architecture in detail (Sec. 3), present results on a wide variety of atomic and molecular benchmarks (Sec. 4) and conclude with a discussion of future directions (Sec. 5).

2. BACKGROUND

2.1 QUANTUM MECHANICS AND CHEMISTRY

The fundamental object of study in quantum mechanics is the wavefunction, which represents the state of all possible classical configurations of a system. If the wavefunction is known, then all other properties of a system can be calculated from it. While there are multiple ways of representing a wavefunction, we focus on the first quantization approach, where the wavefunction is a map from possible particle states to a complex amplitude. The state of a single electron $x \in \mathbb{R}^3 \times \{\uparrow, \downarrow\}$ can be represented by its position $r \in \mathbb{R}^3$ and spin $\sigma \in \{\uparrow, \downarrow\}$. Then the wavefunction for an $N$-electron system is a function $\Psi : (\mathbb{R}^3 \times \{\uparrow, \downarrow\})^N \to \mathbb{C}$. Let $\mathbf{x} \triangleq (x_1, \ldots, x_N)$ denote the set of all electron states. The wavefunction is constrained to have unit $\ell_2$ norm, $\int d\mathbf{x}\, |\Psi|^2(\mathbf{x}) = 1$, and $|\Psi|^2$ can be interpreted as the probability of observing a quantum system in a given state when measured. Not all functions are valid wavefunctions: particles must be indistinguishable, meaning $|\Psi|^2$ should be invariant to changes in ordering. Additionally, the Pauli exclusion principle states that the probability of observing any two electrons in the same state must be zero. This is enforced by requiring the wavefunction for electronic systems to be antisymmetric, that is, to change sign when the states of any two electrons are exchanged. In this paper, we will focus on how to learn an unnormalized approximation to $\Psi$ by representing it with a neural network.
The physical behavior of non-relativistic quantum systems is described by the Schrödinger equation. In its time-independent form, it is an eigenfunction equation $\hat{H}\Psi(\mathbf{x}) = E\Psi(\mathbf{x})$, where $\hat{H}$ is a Hermitian linear operator called the Hamiltonian and the scalar eigenvalue $E$ corresponds to the energy of that particular solution. In quantum chemistry, atomic units (a.u.) are typically used, in which the unit of distance is the Bohr radius ($a_0$), and the unit of energy is the Hartree (Ha). The physical details of a system are defined through the choice of Hamiltonian.
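The antisymmetry requirement described above can be illustrated numerically. The following is a minimal sketch, not the Psiformer itself: it assumes two electrons of the same spin (so only positions are exchanged) and uses a determinant of two hypothetical orbitals, chosen purely for illustration, as an unnormalized antisymmetric wavefunction. The names `orbitals` and `slater_wavefunction` are our own.

```python
import numpy as np


def orbitals(r):
    """Evaluate two illustrative (hypothetical) orbitals at positions r of shape (N, 3)."""
    d = np.linalg.norm(r, axis=-1)
    # Column 0: s-like orbital; column 1: p_z-like orbital. Result has shape (N, 2).
    return np.stack([np.exp(-d), r[:, 2] * np.exp(-d / 2)], axis=-1)


def slater_wavefunction(r):
    """Unnormalized antisymmetric wavefunction: determinant of the orbital matrix."""
    return np.linalg.det(orbitals(r))


rng = np.random.default_rng(0)
r = rng.normal(size=(2, 3))   # positions of two same-spin electrons, in Bohr
swapped = r[::-1]             # exchange electron 1 and electron 2

psi = slater_wavefunction(r)
psi_swapped = slater_wavefunction(swapped)

# Two electrons placed at the same position (and with the same spin).
coincident = np.stack([r[0], r[0]])
psi_coincident = slater_wavefunction(coincident)

print("sign flips under exchange:", np.isclose(psi_swapped, -psi))
print("|psi|^2 is unchanged:     ", np.isclose(psi_swapped**2, psi**2))
print("vanishes at coincidence:  ", np.isclose(psi_coincident, 0.0))
```

Exchanging two electrons swaps two rows of the orbital matrix, which flips the sign of the determinant while leaving $|\Psi|^2$ unchanged. Two identical rows give a zero determinant, consistent with the Pauli exclusion principle.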
For chemical systems, the only details which need to be specified are the locations and charges of the atomic nuclei. In quantum chemistry it is standard to approximate the nuclei as classical particles with fixed positions, known as the Born-Oppenheimer approximation, in which case the Hamiltonian becomes:
$$\hat{H} = -\frac{1}{2}\sum_i \nabla_i^2 + \sum_{i>j} \frac{1}{|r_i - r_j|} - \sum_{iI} \frac{Z_I}{|r_i - R_I|} + \sum_{I>J} \frac{Z_I Z_J}{|R_I - R_J|} \qquad (1)$$
where $\nabla_i^2 = \sum_{j=1}^{3} \frac{\partial^2}{\partial r_{ij}^2}$ is the Laplacian w.r.t. the $i$th particle and $Z_I$ and $R_I$, $I \in \{1, \ldots, N_{\mathrm{nuc}}\}$, are the charges and coordinates of the nuclei.
Two simplifications follow from this. First, since $\hat{H}$ is a real Hermitian operator, its eigenfunctions can always be chosen to be real-valued. Thus we can restrict our attention to real-valued wavefunctions. Second, since the spins $\sigma_i$ do not appear anywhere in Eq. 1, we can fix a certain number of electrons to be spin up and the remainder to be spin down before beginning any calculation (Foulkes et al., 2001). The appropriate number for the lowest energy state can usually be guessed by heuristics such as Hund's rules.
While the time-independent Schrödinger equation defines the possible solutions of constant energy, at the energy scales relevant for most chemistry the electrons are almost always found near the lowest energy state, known as the ground state. Solutions with higher energy, known as excited states, are relevant to photochemistry, but in this paper we will restrict our attention to ground states.
For a typical small molecule, the total energy of a system is on the order of hundreds to thousands of Hartrees. However, the relevant energy scale for chemical bonds is typically much smaller, on the order of 1 kilocalorie per mole (kcal/mol), or ∼1.6 mHa, which is less than one part in one hundred thousand of the total energy. Calculations within 1 kcal/mol of the ground truth are generally considered "chemically accurate". Mean-field methods are typically within about 0.5% of the true total energy. The difference between the mean-field energy and true energy is known as the correlation energy, and chemical accuracy is usually less than 1% of this correlation energy. For example, the binding energy of the benzene dimer (investigated in Sec. 4.5) is only ∼4 mHa.
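As a rough illustration of the quantities discussed in this subsection, the sketch below evaluates the three potential terms of Eq. 1 for fixed electron and nuclear positions, and expresses chemical accuracy in Hartree. The kinetic term is omitted because it requires derivatives of a specific wavefunction. The function name `potential_energy`, the two-electron, two-proton geometry, and the conversion factor of 627.5095 kcal/mol per Hartree (a standard value, not taken from this paper) are assumptions made for the example.

```python
import numpy as np

# Standard conversion factor (assumed here, not stated in the text).
HARTREE_IN_KCAL_PER_MOL = 627.5095


def potential_energy(r_el, R_nuc, Z):
    """Potential terms of Eq. 1 in Hartree, with all positions given in Bohr.

    r_el: (N, 3) electron positions; R_nuc: (M, 3) nuclear positions; Z: (M,) charges.
    """
    r_el, R_nuc, Z = (np.asarray(a, dtype=float) for a in (r_el, R_nuc, Z))

    # Electron-electron repulsion: sum over unordered pairs of 1 / |r_i - r_j|.
    i, j = np.triu_indices(len(r_el), k=1)
    v_ee = np.sum(1.0 / np.linalg.norm(r_el[i] - r_el[j], axis=-1))

    # Electron-nuclear attraction: -sum over (i, I) of Z_I / |r_i - R_I|.
    d_en = np.linalg.norm(r_el[:, None, :] - R_nuc[None, :, :], axis=-1)
    v_en = -np.sum(Z[None, :] / d_en)

    # Nuclear-nuclear repulsion: sum over unordered pairs of Z_I Z_J / |R_I - R_J|.
    a, b = np.triu_indices(len(R_nuc), k=1)
    v_nn = np.sum(Z[a] * Z[b] / np.linalg.norm(R_nuc[a] - R_nuc[b], axis=-1))

    return v_ee + v_en + v_nn


# Illustrative geometry: two protons on the z-axis, two electrons on the x-axis.
nuclei = [[0.0, 0.0, 1.0], [0.0, 0.0, -1.0]]
charges = [1.0, 1.0]
electrons = [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]

v = potential_energy(electrons, nuclei, charges)
chemical_accuracy_mha = 1000.0 / HARTREE_IN_KCAL_PER_MOL  # 1 kcal/mol in mHa

print(f"potential energy: {v:.6f} Ha")
print(f"1 kcal/mol = {chemical_accuracy_mha:.3f} mHa")
```

For this geometry the terms can be checked by hand: the electron-electron and nuclear-nuclear repulsions are each 1/2 Ha, and the four electron-nuclear distances are all √2 Bohr, so the total is 1 − 2√2 Ha. The conversion reproduces the ∼1.6 mHa figure quoted above.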

