AD-NEGF: AN END-TO-END DIFFERENTIABLE QUANTUM TRANSPORT SIMULATOR FOR SENSITIVITY ANALYSIS AND INVERSE PROBLEMS

Anonymous

Abstract

Quantum transport theory describes transport phenomena from first principles, which is essential for domains such as semiconductor fabrication. As a representative method, the Non-Equilibrium Green Function (NEGF) method offers high numerical accuracy. However, its computational cost makes it impractical for high-throughput simulation tasks such as sensitivity analysis and inverse design. In this work, we propose AD-NEGF, to the best of our knowledge the first Automatic Differentiation (AD) based quantum transport simulator. AD-NEGF calculates gradient information efficiently by utilizing automatic differentiation and implicit layer techniques, while guaranteeing the correctness of the forward simulation. Such gradient information enables accurate and efficient calculation of differential physical quantities and makes it possible to solve inverse problems that are intractable for traditional optimization methods.

1. INTRODUCTION

The strong and lasting demand for higher computing power and lower energy consumption drives the downscaling of semiconductor devices. Over the last 40 years, the microelectronics industry has scaled the transistor feature size from 10 µm to nearly 20 nm, a size at which quantum mechanical effects start to dominate (Anantram et al., 2008; Wang et al., 2008; Datta, 1997). Therefore, future-facing device simulators need to adopt a quantum-theory-oriented formulation, and NEGF is one of the most rigorous approaches among existing quantum transport methods (Jacoboni, 2010). Although NEGF is superior in simulation accuracy, it is also extremely expensive in time and computation.

Recently, many works have successfully integrated machine learning techniques to resolve the accuracy-efficiency dilemma of scientific simulations. A typical paradigm is to build learning-based surrogate models (e.g., a neural network) (Li et al., 2020; Bürkle et al., 2021; Pimachev & Neogi, 2021). By learning from data generated beforehand with highly accurate simulations, the surrogate model is expected to maintain first-principles accuracy while running much faster. A fatal problem of such methods is that there is no guarantee of prediction accuracy, especially for inputs outside the distribution of the training dataset. This drawback limits the application of machine learning based surrogates in quantum transport scenarios.

An alternative is to utilize automatic differentiation to make the computation process differentiable. In quantum transport simulations, practically useful information is often related to calculating derivatives. Examples include the thermoelectric property measured by the Seebeck coefficient, and the sub-threshold swing of a MOSFET, which is related to the derivative of the drain current I_D with respect to the applied gate voltage V_g.
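As a concrete reading of the two examples above, the conventional textbook definitions of these quantities can be written as follows. These are the standard definitions rather than formulas quoted from this paper, and the symbols SS, S, ΔV and ΔT are introduced here only for illustration.

```latex
\begin{align}
  % Sub-threshold swing: gate voltage needed to change the drain current by one decade
  \mathrm{SS} &= \left( \frac{\partial \log_{10} I_D}{\partial V_g} \right)^{-1}, \\
  % Seebeck coefficient: open-circuit voltage response to a small temperature difference
  S &= -\left. \frac{\Delta V}{\Delta T} \right|_{I = 0}.
\end{align}
```

Both are derivative-type quantities: a simulator that only returns I_D has to differentiate its output numerically to obtain them.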
Compared to traditional numerical differentiation, automatic differentiation avoids the trade-off between round-off error and truncation error when choosing the step size (Gautschi, 1997, Chap. 3), and can also be numerically more efficient when the input dimension is high. Moreover, in theoretical inverse problems, an end-to-end differentiable solver is extremely useful and, in fact, critical. The availability of gradients makes it possible to conduct efficient gradient-based optimization, which can outperform black-box optimization methods such as Bayesian optimization and genetic algorithms, and can operate at a scale that black-box methods cannot reach. Recent advances have also shown the value of applying differentiable programming in scientific computation scenarios, such as fluid dynamics (Holl et al., 2019), quantum chemistry (Kasim & Vinko, 2021), molecular dynamics (Schoenholz & Cubuk, 2020), and photonic crystal optimization (Minkov et al., 2020).

In this work, we propose AD-NEGF, to the best of our knowledge the first end-to-end differentiable quantum transport simulator. The entire numerical process of NEGF and Tight-Binding (TB) modeling is implemented in PyTorch, including the computation of the self-energy term, the Green function, the electrostatic potential, and the transport properties, as well as an optional Slater-Koster Tight-Binding (SKTB) module to generate the block tri-diagonal TB Hamiltonian (Klymenko et al., 2021), which we will introduce in detail in Section 3. The backward pass to compute the gradients is improved by utilizing implicit gradient techniques and the adjoint sensitivity method for Partial Differential Equations (PDE). To efficiently backpropagate through Poisson's equation in transport, we propose and implement the image charge gradient method, which can utilize the Fast Multipole Method (FMM) to reduce the backpropagation complexity of Poisson's equation from O(N^3) to O(N^{4/3}).
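The step-size trade-off of numerical differentiation mentioned above can be illustrated with a small, self-contained sketch. It is not taken from AD-NEGF, which is implemented in PyTorch. Instead it uses a hypothetical minimal `Dual` class (forward-mode automatic differentiation in pure Python) and a toy function, both of which are assumptions made here for illustration.

```python
import math


class Dual:
    """Dual number val + der*eps with eps**2 == 0 (forward-mode AD)."""

    def __init__(self, val, der=0.0):
        self.val = val
        self.der = der

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule carried alongside the value.
        return Dual(self.val * other.val,
                    self.val * other.der + self.der * other.val)

    __rmul__ = __mul__


def dual_exp(x):
    e = math.exp(x.val)
    return Dual(e, e * x.der)


def dual_sin(x):
    return Dual(math.sin(x.val), math.cos(x.val) * x.der)


def f_dual(x):
    # Toy function f(x) = exp(x) * sin(x) + 3x, evaluated on dual numbers.
    return dual_exp(x) * dual_sin(x) + 3.0 * x


def f_float(x):
    return math.exp(x) * math.sin(x) + 3.0 * x


def exact_derivative(x):
    return math.exp(x) * (math.sin(x) + math.cos(x)) + 3.0


def ad_derivative(x):
    # Seed the input with derivative 1 and read the derivative of the output.
    return f_dual(Dual(x, 1.0)).der


def forward_difference(x, h):
    return (f_float(x + h) - f_float(x)) / h


x0 = 1.0
exact = exact_derivative(x0)
ad_error = abs(ad_derivative(x0) - exact)
# Large h: truncation error dominates. Tiny h: round-off error dominates.
fd_errors = {h: abs(forward_difference(x0, h) - exact)
             for h in (1e-2, 1e-8, 1e-14)}

print("AD error:", ad_error)
for h, err in fd_errors.items():
    print(f"finite-difference error (h={h:g}):", err)
```

In this sketch the finite-difference error first shrinks and then grows again as `h` decreases, whereas the dual-number derivative has no step size to tune and matches the analytic derivative to near machine precision.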
We demonstrate the capability of AD-NEGF to efficiently and accurately compute differential physical properties by comparing it with numerical differentiation. We also show that, combined with a gradient-based optimizer, AD-NEGF can perform high-dimensional optimization at a scale that is not affordable with conventional optimization approaches. Furthermore, in a more practical scenario of material doping optimization, where we optimize the empirical SK parameters of injected atoms, our method shows significant improvements in convergence speed and solution quality compared with traditional black-box optimization methods.

Our contributions can be summarized as follows:

• We propose and implement AD-NEGF, as far as we know the first end-to-end differentiable quantum transport simulator, including the NEGF method, the Poisson's equation module for self-consistent electrostatic potential computation, and the SKTB module to generate the tight-binding Hamiltonian from the coordinates and properties of the system atoms.
• The efficiency of the backward gradient computation is improved by applying the implicit gradient method, the adjoint method for PDEs, as well as our newly proposed gradient computation for the image charge method.
• We validate the advantages of AD-NEGF in calculating differential transport quantities, high-dimensional parameter fitting, and device optimization, where AD-NEGF outperforms numerical differentiation and black-box optimization methods.
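The idea behind an implicit gradient can be sketched on a toy problem. One common use is to differentiate the solution of a self-consistent (fixed-point) equation x* = f(x*, θ) without backpropagating through the solver iterations, using the implicit function theorem: dx*/dθ = (∂f/∂θ) / (1 − ∂f/∂x) at x*. The scalar map `f`, the function names, and the parameter values below are assumptions chosen for illustration; this is not the AD-NEGF implementation.

```python
import math


def f(x, theta):
    # Toy contraction map (|df/dx| <= 0.5), so the iteration converges.
    return 0.5 * math.cos(x) + theta


def solve_fixed_point(theta, tol=1e-14, max_iter=1000):
    """Plain fixed-point iteration for x = f(x, theta)."""
    x = 0.0
    for _ in range(max_iter):
        x_new = f(x, theta)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    raise RuntimeError("fixed-point iteration did not converge")


def implicit_gradient(theta):
    """dx*/dtheta from the implicit function theorem, using only x*."""
    x_star = solve_fixed_point(theta)
    df_dx = -0.5 * math.sin(x_star)   # partial derivative of f w.r.t. x
    df_dtheta = 1.0                   # partial derivative of f w.r.t. theta
    return df_dtheta / (1.0 - df_dx)


theta0 = 0.3
x_star = solve_fixed_point(theta0)
grad_implicit = implicit_gradient(theta0)

# Reference: central finite difference through the whole solver.
h = 1e-6
grad_fd = (solve_fixed_point(theta0 + h) - solve_fixed_point(theta0 - h)) / (2 * h)

print("x*:", x_star)
print("implicit gradient:", grad_implicit)
print("finite-difference gradient:", grad_fd)
```

The gradient here needs only the converged solution and two local partial derivatives, so its cost does not grow with the number of solver iterations. That is the general appeal of implicit differentiation for iterative solvers.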

2. RELATED WORKS

NEGF. Originating from Keldysh (1964) and Kadanoff (2018), NEGF has been a well-received method in quantum transport theory, which describes a system under a finite bias voltage with contact interactions taken into consideration. Recently, NEGF-based computation methods have gained increasing popularity for the simplicity of their formulation and the ease of implementing them in software (Ferry & Goodnick, 1999; Taylor et al., 2001; Brandbyge et al., 2002; Fetter & Walecka, 2012), which makes NEGF one of the most widely applied methods in transport calculation. Several methods dedicated to improving its numerical stability and computational efficiency have been proposed (Sancho et al., 1985; Krstić et al., 2002; Rungger & Sanvito, 2008), some of which are widely implemented in modern quantum transport simulation software, including but not limited to Papior et al. (2017); Smidstrup et al. (2019); Steiger et al. (2011). On the other hand, despite its advantages, the NEGF method suffers from a heavy computational burden.

AI for Quantum Transport. There have been prior works applying machine learning techniques to quantum transport, mostly by training a neural network on data generated from first-principles simulations, so that the neural network can serve as an efficient surrogate model to predict transport properties such as conductance (Bürkle et al., 2021; Pimachev & Neogi, 2021; Li et al., 2020) and transport coefficients (Lopez-Bezanilla & von Lilienfeld, 2014). Most existing methods use relatively simple deep learning models such as multi-layer perceptrons (Župančić et al.) and convolutional networks (Han et al., 2021; Souma & Ogawa, 2021; 2020), while in some cases more advanced and specially designed models are utilized (Bürkle et al., 2021). However, as mentioned in Section 1, a dataset generated with ab-initio simulation is required, which is expensive to obtain. Moreover,

