D4FT: A DEEP LEARNING APPROACH TO KOHN-SHAM DENSITY FUNCTIONAL THEORY

Abstract

Kohn-Sham Density Functional Theory (KS-DFT) has traditionally been solved by the Self-Consistent Field (SCF) method. Behind the SCF loop is the physics intuition of solving a system of non-interacting single-electron wave functions under an effective potential. In this work, we propose a deep learning approach to KS-DFT. First, in contrast to the conventional SCF loop, we propose directly minimizing the total energy by reparameterizing the orthogonality constraint as a feed-forward computation. We prove that such an approach has the same expressivity as the SCF method, yet reduces the computational complexity from O(N^4) to O(N^3). Second, the numerical integration, which involves a summation over the quadrature grids, can be amortized over the optimization steps. At each step, stochastic gradient descent (SGD) is performed with a sampled minibatch of the grids. Extensive experiments are carried out to demonstrate the advantage of our approach in terms of efficiency and stability. In addition, we show that our approach enables us to explore more complex neural-based wave functions.

1. INTRODUCTION

Density functional theory (DFT) is the most successful quantum-mechanical method and is widely used in chemistry and physics for predicting electron-related properties of matter (Szabo & Ostlund, 2012; Levine et al., 2009; Koch & Holthausen, 2015). As scientists explore more complex molecules and materials, DFT methods are often limited in scale or accuracy by their computational complexity. On the other hand, deep learning (DL) has achieved great success over the past decade in function approximation (Hornik et al., 1989), optimization algorithms (Kingma & Ba, 2014), and systems (Bradbury et al., 2018). Many aspects of deep learning can be harnessed to improve DFT. Among them, data-driven function fitting is the most straightforward and often the first to be considered. It has been shown that models learned from a sufficient amount of data generalize well to unseen data, provided the models have the right inductive bias. The Hohenberg-Kohn theorem proves that the ground-state energy is a functional of the electron density (Hohenberg & Kohn, 1964a), but this functional is not available analytically. This is where data-driven learning can help DFT: the strong function approximation capability of deep learning gives hope of learning such functionals from data. There have already been initial successes in learning the exchange-correlation functional (Chen et al., 2020a;b; Dick & Fernandez-Serra, 2020). Furthermore, deep learning has shifted the mindsets of researchers and engineers towards differentiable programming. Implementing the derivative of a function incurs no extra cost if the primal function is implemented with a deep learning framework. Differentiation of functions appears frequently in DFT, e.g., when estimating the kinetic energy of a wave function, or when calculating a generalized gradient approximation (GGA) exchange-correlation functional.
Using modern automatic differentiation (AD) techniques eases the implementation greatly (Abbott et al., 2021). Despite the numerous efforts applying deep learning to DFT, there is still a vast space for exploration. For example, the most popular variant, Kohn-Sham DFT (KS-DFT) (Kohn & Sham, 1965), utilizes the self-consistent field (SCF) method for solving the parameters. At each SCF step, it solves a closed-form eigendecomposition problem, which finally leads to energy minimization. However, this method suffers from several drawbacks. Many computational chemists and material scientists criticize that optimizing via SCF is time-consuming for large molecules or solid cells, and that the convergence of SCF is not always guaranteed. Furthermore, DFT methods often use a linear combination of basis functions as the ansatz for the wave functions, which may not be expressive enough to approximate realistic quantum systems. To address the problems of SCF, we propose a deep learning approach to solving KS-DFT. Our approach differs from SCF in the following aspects. First, the eigendecomposition steps in SCF arise from the orthogonality constraints on the wave functions; we show in this work that the original objective function for KS-DFT can be converted into an unconstrained equivalent by reparameterizing the orthogonality constraints as part of the objective function. Second, we further amortize the integral in the objective function over the optimization steps, i.e., we use stochastic gradient descent (SGD), which is well motivated both empirically and theoretically for large-scale machine learning (Bottou et al., 2018). We demonstrate the equivalence between our approach and conventional SCF both empirically and theoretically. Our approach reduces the computational complexity from O(N^4) to O(N^3), which significantly improves the efficiency and scalability of KS-DFT. Third, gradient-based optimization treats all parameters equally.
We show that it is possible to optimize more complex neural-based wave functions instead of only the coefficients. In this paper, we instantiate this idea with a local scaling transformation as an example of how to construct neural-based wave functions for DFT.
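To make the differentiable-programming point concrete: with a toy 1D Gaussian wave function (a hypothetical stand-in chosen for illustration, not the paper's ansatz), the local kinetic energy -ψ''(r)/2ψ(r) falls out of two applications of `jax.grad` at no extra implementation cost:

```python
import jax
import jax.numpy as jnp

def psi(r):
    # Hypothetical normalized 1D Gaussian wave function (toy example only).
    return jnp.pi ** -0.25 * jnp.exp(-0.5 * r ** 2)

def local_kinetic(r):
    # Local kinetic energy -1/2 * psi''(r) / psi(r); the second derivative
    # is obtained by composing automatic differentiation twice.
    d2psi = jax.grad(jax.grad(psi))(r)
    return -0.5 * d2psi / psi(r)
```

For this Gaussian, psi''(r) = (r^2 - 1) psi(r), so `local_kinetic(0.0)` evaluates to 0.5, and a quadrature sum of psi(r) * (-1/2) psi''(r) recovers the kinetic energy 1/4 of the harmonic-oscillator ground state.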
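The constraint reparameterization described above can be sketched in a few lines of JAX. This is a minimal illustration under simplifying assumptions: the `energy` function is a hypothetical quadratic stand-in for the KS-DFT functional, and QR decomposition is used as one concrete way to map unconstrained parameters to orthonormal orbitals as a feed-forward computation.

```python
import jax
import jax.numpy as jnp

def orthogonalize(W):
    # QR decomposition maps an arbitrary coefficient matrix W to a matrix
    # with orthonormal columns, turning the orthogonality constraint into
    # a feed-forward computation differentiable end-to-end.
    Q, R = jnp.linalg.qr(W)
    # Fix the column-sign ambiguity of QR so the map is deterministic.
    return Q * jnp.sign(jnp.diag(R))

def energy(W, H):
    # Hypothetical stand-in energy: trace of a symmetric matrix H in the
    # orthonormalized orbital basis (the real objective would sum the
    # kinetic, external, Hartree and XC terms).
    C = orthogonalize(W)
    return jnp.trace(C.T @ H @ C)

key = jax.random.PRNGKey(0)
W = jax.random.normal(key, (6, 3))   # 6 basis functions, 3 orbitals
H = jnp.diag(jnp.arange(6.0))        # toy symmetric Hamiltonian
g = jax.grad(energy)(W, H)           # gradient w.r.t. unconstrained params
W = W - 0.1 * g                      # one gradient step, no projection needed
```

Because the constraint is baked into the forward computation, the update step is plain gradient descent on `W`; no eigendecomposition or explicit re-orthogonalization is required between steps.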

2. DFT PRELIMINARIES

Density functional theory (DFT) is among the most successful quantum-mechanical simulation methods for computing the electronic structure and all electron-related properties. DFT defines the ground-state energy as a functional of the electron density ρ : R^3 → R: E_gs = E[ρ]. The Hohenberg-Kohn theorem (Hohenberg & Kohn, 1964b) guarantees that such a functional E exists and that the ground-state energy is determined uniquely by the electron density. However, the exact definition of this functional has been a puzzling obstacle for physicists and chemists. Several approximations, including the famous Thomas-Fermi and Kohn-Sham methods, have been proposed and have since become the most important ab-initio calculation methods.

The Objective Function. One of the difficulties in finding a functional of the electron density is the lack of an accurate functional for the kinetic energy. The Kohn-Sham method resolves this issue by introducing an orthogonal set of single-particle wave functions {ψ_i} and rewriting the energy as a functional of these wave functions. The energy functional connects back to the Schrödinger equation. Without compromising the understanding of this paper, we leave the detailed derivation from the Schrödinger equation and the motivation of the orthogonality constraint to Appendix B.1. As far as this paper is concerned, we focus on the objective function of KS-DFT, defined as

E_gs = min_{ψ_i^σ} E[{ψ_i^σ}] = min_{ψ_i^σ} ( E_Kin[{ψ_i^σ}] + E_Ext[{ψ_i^σ}] + E_H[{ψ_i^σ}] + E_XC[{ψ_i^σ}] )   (3)

s.t.  ⟨ψ_i^σ | ψ_j^σ⟩ = δ_ij   (4)

where ψ_i^σ is a wave function mapping R^3 → C, and ψ_i^σ* denotes its complex conjugate. For simplicity, we use the bra-ket notation ⟨ψ_i^σ | ψ_j^σ⟩ = ∫ ψ_i^σ*(r) ψ_j^σ(r) dr, and δ_ij is the Kronecker delta. The superscript σ ∈ {α, β} denotes the spin.¹ E_Kin, E_Ext, E_H and E_XC are the kinetic, external potential (nuclear attraction), Hartree (Coulomb repulsion between electrons) and exchange-correlation energies, respectively.
¹ We omit the spin notation σ in the following sections for simplicity.
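Each energy term in the objective above is a real-space integral evaluated as a summation over quadrature grid points; the minibatch amortization mentioned in the introduction replaces the full summation with an unbiased stochastic estimate. A minimal sketch with a hypothetical 1D integrand and uniform grid weights (the actual quadrature scheme is not specified here):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1D stand-in for a real-space energy integral
# E = \int f(r) dr, discretized on a quadrature grid with weights w_i.
grid = np.linspace(-5.0, 5.0, 10_000)
weights = np.full_like(grid, 10.0 / grid.size)  # uniform quadrature weights
f = np.exp(-grid ** 2)                          # integrand values on the grid

full_sum = np.sum(weights * f)                  # O(N_grid) summation per step

def minibatch_estimate(batch_size):
    # Sample a minibatch of grid points and rescale by N/B so that the
    # estimator is unbiased: its expectation equals the full summation.
    idx = rng.choice(grid.size, size=batch_size, replace=False)
    return grid.size / batch_size * np.sum(weights[idx] * f[idx])

estimates = [minibatch_estimate(512) for _ in range(200)]
```

Each optimization step then touches only a batch of grid points rather than the whole grid, which is what allows the integral to be amortized across SGD steps.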

