LATTICE CONVOLUTIONAL NETWORKS FOR LEARNING GROUND STATES OF QUANTUM MANY-BODY SYSTEMS

Abstract

Deep learning methods have been shown to be effective in representing ground-state wave functions of quantum many-body systems. Existing methods use convolutional neural networks (CNNs) for square lattices due to their image-like structures. For non-square lattices, the existing method uses a graph neural network (GNN) in which structure information is not precisely captured, thereby requiring additional hand-crafted sublattice encoding. In this work, we propose lattice convolutions, in which a set of proposed operations are used to convert non-square lattices into grid-like augmented lattices on which regular convolution can be applied. Based on the proposed lattice convolutions, we design lattice convolutional networks (LCN) that use self-gating and attention mechanisms. Experimental results show that our method achieves performance on par with or better than the GNN method on the spin-1/2 J1-J2 Heisenberg model over the square, honeycomb, triangular, and kagome lattices, without using hand-crafted encoding.

1. INTRODUCTION

The study of quantum many-body problems is of fundamental interest in physics. It is crucial for theoretical modeling and simulation of complex quantum systems, materials, and molecules (Carleo et al., 2019). For instance, graphene, arguably the most famous 2D material, is made of carbon atoms on a honeycomb lattice. Solving quantum many-body problems remains very challenging because of the exponential growth of the Hilbert space dimension with the number of particles in the system; only approximate solutions are available in most cases. Tensor networks (White, 1992; Schollwöck, 2011; Orús, 2014; Biamonte and Bergholm, 2017) are among the popular techniques to model quantum many-body systems but suffer from entanglement limitations (Choo et al., 2018). Variational Monte Carlo (VMC) (McMillan, 1965) is a more general methodology that obtains quantum many-body wave functions by optimizing a compact parameterized variational ansatz with data sampled from the ansatz itself. But how to design a variational ansatz with high expressivity to represent real quantum states is still an open problem. Recently, traditional machine learning models, such as the restricted Boltzmann machine (RBM) (Smolensky, 1986), have been used as variational ansatz (Carleo and Troyer, 2017; Nomura et al., 2017; Choo et al., 2018; Kaubruegger et al., 2018; Choo et al., 2020; Nomura, 2021; Chen et al., 2022). Following this direction, some studies explore deep Boltzmann machines (Gao and Duan, 2017; Carleo et al., 2018; Pastori et al., 2019) and fully-connected neural networks (Saito and Kato, 2018; Cai and Liu, 2018; Saito, 2017; 2018) to represent quantum states. Most recent studies also use CNNs as variational ansatz for square lattice systems (Liang et al., 2018; Choo et al., 2019; Zheng et al., 2021; Liang et al., 2021; Roth and MacDonald, 2021), and GNNs have been applied to non-square lattices and random graph systems (Yang et al., 2020a; Kochkov et al., 2021).
In this work, we explore the potential of using CNNs as variational ansatz for non-square lattice quantum spin systems. We propose lattice convolutions that use a set of proposed operations to convert non-square lattices into grid-like augmented lattices on which any existing CNN architecture can be applied. Based on the proposed lattice convolution, we design highly expressive lattice convolutional networks (LCN) by leveraging self-gating and attention mechanisms. Experimental results show that our method achieves performance on par with or better than the GNN method over the square, honeycomb, triangular, and kagome lattice systems on the spin-1/2 J1-J2 Heisenberg model, a prototypical quantum many-body model of magnetic materials that captures the exchange interaction between spins. Novelty and Significance. Our work proposes the first pure deep learning approach that does not require any prior knowledge of quantum physics to solve quantum many-body problems on different types of lattice systems. Our approach overcomes the shortcomings of previous neural quantum state methods, which not only require extensive prior knowledge but are also designed for a specific lattice or even a specific regime. In contrast, our method can be seamlessly applied to different lattices while still achieving competitive or even better performance than existing methods without introducing prior knowledge. As a result, our method possesses great generalizability in practice, which makes it of great value in the study of quantum many-body problems. Relations with Prior Work. GNN (Kochkov et al., 2021) was proposed as the first generic method that can be applied to various lattice shapes. To this end, it is natural for LCN to use the same experimental setting as GNN. While GNN uses different hand-crafted sublattice encoding techniques for different lattice structures, LCN only needs to augment different lattices in a simple and principled way without any prior knowledge.
This significantly enhances the generalization capability of LCN in practice. Roth and MacDonald (2021) propose a general framework called Group-CNN. However, it can only be easily applied to square and triangular lattices. Moreover, it still needs to consider specific symmetry groups for different lattice systems as prior knowledge. Choo et al. (2019) apply a CNN to the square lattice, but it requires specific quantum physics knowledge such as point group symmetry and the Marshall sign rule, the known sign structure of the ground state. However, the Marshall sign rule only works for bipartite graphs (such as the square lattice) and non-frustrated regimes.

2. BACKGROUND AND RELATED WORK

In quantum mechanics, a quantum state is represented as a vector in a Hilbert space. This vector is a linear combination of observable system configurations {c_i}, known as a computational basis. In the context of spin-1/2 systems, each spin can be measured in two states, spin-up or spin-down, represented by ↑ and ↓, respectively. All combinations of spins form a basis: given N spins, there are in total 2^N configurations in the computational basis. Specifically, a state can be written as |ψ⟩ = Σ_{i=1}^{2^N} ψ(c_i)|c_i⟩, where |c_i⟩ represents an array of spin configurations of N spins, e.g., ↑↑↓···↓, and ψ(c_i) is the wave function value, which is in general a complex number. The summation is over all possible 2^N spin configurations. The squared norm |ψ(c_i)|² corresponds to the probability of the system collapsing to configuration c_i when measured, and Σ_{i=1}^{2^N} |ψ(c_i)|² = 1 due to normalization.
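The setup above can be made concrete with a minimal NumPy sketch (illustrative only, not the paper's code): enumerate the 2^N computational-basis configurations of a tiny system and normalize a random complex wave function over them.

```python
import numpy as np

N = 3  # number of spin-1/2 particles

# Each basis configuration c_i is a tuple of N spins (+1 for up, -1 for down),
# read off from the bits of the integer index i.
configs = [tuple(1 - 2 * ((i >> k) & 1) for k in range(N)) for i in range(2**N)]
assert len(configs) == 2**N

# A generic state assigns a complex amplitude psi(c_i) to every configuration.
rng = np.random.default_rng(0)
psi = rng.normal(size=2**N) + 1j * rng.normal(size=2**N)
psi /= np.linalg.norm(psi)  # enforce sum_i |psi(c_i)|^2 = 1

probs = np.abs(psi) ** 2    # probability of measuring configuration c_i
print(np.isclose(probs.sum(), 1.0))  # True
```

Note the exponential cost: even this toy enumeration is only feasible for small N, which is exactly why variational representations of ψ(c_i) are needed.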

2.1. GROUND STATES

The ground state of a quantum system is its lowest-energy state. Many physical properties can be determined from the ground state. Particle interactions within a given quantum many-body system are determined by a Hamiltonian, which is a Hermitian matrix H in the Hilbert space. The system energy and its corresponding quantum state are governed by the time-independent Schrödinger equation H|ψ⟩ = E|ψ⟩, which is an eigenvalue equation: the eigenenergy E is an eigenvalue of H and |ψ⟩ is the corresponding eigenvector. In principle, these can be obtained by eigenvalue decomposition of H. The lowest eigenvalue is called the ground state energy, and its associated eigenvector is called the ground state. The ground state and the ground state energy determine the properties of the quantum system at zero temperature.
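For a tiny system, the ground state can be read off directly from the eigendecomposition of H. A hedged sketch (not the paper's code): two spin-1/2 particles with Heisenberg coupling H = S_1 · S_2, using the standard spin-1/2 convention S = σ/2; the lowest eigenvalue is the well-known singlet energy -3/4.

```python
import numpy as np

# Pauli matrices
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)

# H = S_1 . S_2 = (1/4) sum_a sigma^a kron sigma^a, a 4x4 Hermitian matrix
H = sum(0.25 * np.kron(s, s) for s in (sx, sy, sz))

# Eigenvalue decomposition: lowest eigenvalue = ground state energy,
# its eigenvector = ground state.
evals, evecs = np.linalg.eigh(H)
E0, ground = evals[0], evecs[:, 0]
print(round(float(E0), 6))  # -0.75, the singlet ground-state energy
```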

2.2. VARIATIONAL PRINCIPLE IN QUANTUM MECHANICS

Given a system of size N, the dimension of the Hamiltonian matrix is 2^N × 2^N. Since the dimension grows exponentially with system size, it is intractable to use eigenvalue decomposition directly even for relatively small systems. State-of-the-art algorithms based on the Lanczos method, which exploits the sparsity of H, can obtain the ground state energy and the ground state for N up to ∼48. For larger systems, a common approach is to use the variational principle to approximately solve the Schrödinger equation. According to the variational principle, the energy of any given quantum state is greater than or equal to the ground state energy, so we can optimize parameterized wave functions to make the energy as low as possible. Specifically, we can approximate the ground state of a Hamiltonian H by minimizing the variational term E = ⟨ψ|H|ψ⟩ / ⟨ψ|ψ⟩ ≥ E_0, where E is the expectation value of the energy of a variational quantum state |ψ⟩ for a given Hamiltonian H, and E_0 is the true ground state energy. The state |ψ⟩ takes the form of the basis expansion given in Section 2. Given H, the expectation value E is determined by the wave function ψ(c_i), which can be any parameterized function. The goal is to find the optimal function ψ(c_i) that minimizes E. The success of the variational method relies on the expressivity of the parameterized function; therefore it is natural to explore neural networks as variational ansatz for the wave function.
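The variational bound E ≥ E_0 can be checked numerically with a small sketch (illustrative; the Hamiltonian here is just a random Hermitian matrix, not a physical model): the Rayleigh quotient of any trial state never goes below the lowest eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(16, 16)) + 1j * rng.normal(size=(16, 16))
H = (A + A.conj().T) / 2                 # a toy Hermitian "Hamiltonian"
E0 = np.linalg.eigvalsh(H).min()         # exact ground-state energy

def energy(psi):
    """Rayleigh quotient <psi|H|psi> / <psi|psi> of a trial state."""
    return (psi.conj() @ H @ psi).real / (psi.conj() @ psi).real

# Random trial states: all of them upper-bound E0, as the variational
# principle guarantees.
for _ in range(5):
    psi = rng.normal(size=16) + 1j * rng.normal(size=16)
    assert energy(psi) >= E0 - 1e-12
```

Minimizing this quotient over a parameterized family ψ_θ is exactly what the VMC training loop described later does, with the expectation estimated by sampling rather than computed exactly.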

2.3. RELATED WORK

Variational quantum many-body states with wave functions given by neural networks are called neural-network quantum states, initially studied by Carleo and Troyer (2017), who use an RBM to represent the many-body wave function. Subsequent studies (Choo et al., 2018; Saito, 2018; Cai and Liu, 2018) apply fully-connected neural networks as variational ansatz, which have been shown to be more effective than RBM methods. But these methods do not explicitly consider structure information when applied to two-dimensional systems (Cai and Liu, 2018). Motivated by the spatial symmetry of periodic quantum systems and the success of convolutional neural networks (CNNs) in computer vision (Krizhevsky et al., 2012), Liang et al. (2018), Choo et al. (2019), and Szabó and Castelnovo (2020) adopt CNNs as variational ansatz. CNNs are able to represent highly entangled quantum systems more effectively (Levine et al., 2019) than RBM-based representations (Deng et al., 2017), benefiting from their information reuse. However, CNNs cannot be naturally used on non-grid-like systems. Recently, graph neural networks (GNNs) have also been applied to represent wave functions (Yang et al., 2020a; Kochkov et al., 2021). GNNs can work with arbitrary geometric lattices or even random graph systems, but structure information is not precisely captured, so additional hand-crafted sublattice encoding is needed to augment system configurations in order to respect the underlying quantum symmetry (Kochkov et al., 2021). Wave functions are usually complex-valued; therefore it is necessary to predict both amplitudes and phases. Choo et al. (2019) use complex-valued weights and biases to predict amplitudes and phases simultaneously, and design a generalized activation function, whereas amplitudes and phases are predicted separately using real-valued networks in Szabó and Castelnovo (2020) and Kochkov et al. (2021).

3. LATTICE CONVOLUTIONAL NETWORKS

While GNN can be naturally applied to non-square lattices, its isotropic weights prevent it from capturing rich structure information; therefore auxiliary hand-crafted structure encoding is needed to augment the original spin configuration input on lattices. We argue that CNN is more suitable for modeling wave functions of lattice systems, which feature repetitive local patterns. In this section, we introduce LCN, a novel lattice convolutional network that has a strong capability to model wave functions for non-square lattice systems without using any extra structure encoding.

3.1. MOTIVATION AND OVERVIEW

In this work, we focus on designing CNNs on four types of lattice: square, triangular, honeycomb, and kagome, which are four of the most common lattice structures that describe two-dimensional materials. The lattice structures are shown in Figure 1. CNNs are known to be efficient feature extractors on regular structures; inside the network, each convolution layer applies a shared kernel at every spatial position. It is straightforward to apply CNNs on a square lattice due to its image-like structure. Triangular lattices can be viewed as sheared square lattices, where every unit cell of the square lattice undergoes the same affine transformation; therefore we can shear the convolution kernel accordingly to match the shape of the transformed unit cells. However, for lattices that cannot be converted into grids, such as honeycomb and kagome, the key is to define the shape of the convolution kernel and optimize the weight sharing across convolution sites. In the proposed LCN, we solve these challenges in a principled way by converting non-square lattices into grid-like augmented lattices through a set of operations such that regular convolution kernels can be applied.

3.2. AUGMENTED LATTICES

As mentioned in Section 3.1, triangular lattices can be seen as sheared square lattices, which implies that the local structure is the same everywhere on the lattice, hence regular square kernels can be naturally applied. By contrast, honeycomb and kagome lattices have multiple local structures, making it difficult to share kernel weights across different structures. Moreover, the different local structures are arranged in a staggered manner, which impedes information reuse among identical local structures. Critically, we make the key observation that honeycomb and kagome lattices can be obtained from triangular lattices by removing some vertices and edges. Conversely, honeycomb and kagome lattices can be turned into triangular lattices by augmenting the original lattices with virtual vertices. As shown in Figure 1 (c) and (d), virtual vertices (green dots) are inserted at the center of each hexagonal sub-structure. Through this augmentation, we can apply an identical kernel everywhere on the lattice regardless of the original local structures while still capturing different structure information. The advantages of this operation are twofold. First, during convolution, the virtual vertices participate in the same way as the original vertices, i.e., values of virtual vertices are also updated. By doing so, the virtual vertices can gather and distribute information from the original vertices, which helps increase the receptive field and boosts information exchange. Second, by adding virtual vertices, we can overlap the same convolution kernel in order to enable information reuse, which is crucial for capturing long-range spin correlations (Liang et al., 2018). To some extent, augmenting with virtual vertices enhances the wave function representation ability.
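A hedged sketch of the augmentation idea: the augmented lattice can be stored as a (sheared) square grid plus a boolean mask. Real vertices carry spin values; virtual vertices start at zero but are updated by convolutions like any other site. The indexing rule below (every site with (i + j) divisible by 3 is virtual, i.e., one of the three triangular sublattices) is an illustrative choice, not necessarily the paper's exact layout.

```python
import numpy as np

H, W = 6, 6

# Hypothetical sublattice of hexagon centers: one of three triangular
# sublattices of the sheared grid is marked as virtual.
virtual = np.zeros((H, W), dtype=bool)
for i in range(H):
    for j in range(W):
        if (i + j) % 3 == 0:
            virtual[i, j] = True

# Original vertices hold spins (+1/-1); virtual vertices contribute no input
# spin, so their initial value is zero.
spins = np.random.default_rng(2).choice([-1.0, 1.0], size=(H, W))
spins[virtual] = 0.0

print(int(virtual.sum()), "virtual vertices on a", H, "x", W, "grid")
```

After this step, a single regular kernel slides over the full grid; the mask is what distinguishes original from virtual sites.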

3.3. LATTICE CONVOLUTIONS

After augmentation, the lattice becomes either square or triangular, both of which are grid-structured. However, due to the characteristics of the input, additional processing steps are required in order to apply regular convolutions on the augmented lattice. Boundary alignment (BA). Lattices with finitely many vertices often have irregular boundaries, but in current deep learning libraries, convolutions on image-like data typically require the input grid to have an equal number of elements in each row. Therefore, we zero-pad the augmented lattices into parallelograms. In our experiments, square lattices already have regular boundaries, so this step is omitted. The aligned boundary is drawn with a solid line in Figure 2. Periodic padding (PP). For images, boundaries are commonly zero-padded before convolutions to preserve the size of feature maps. Finite quantum systems often assume periodic boundary conditions so that the lattice can be repeated to fill the entire space. To preserve this important structure information, after padding the aligned boundaries with zeros, we replace the padding values around the original lattice area with the values given by the periodic boundary condition. We can optionally apply the periodic padding to the virtual vertices as well. In Figure 2, the padded boundary is drawn with a dashed line, the original lattice area is marked with pink shadow, and the periodic padding is marked with green shadow. Mask. Finally, after each convolution, to clean up the artifacts introduced in the two previous steps, we reset all vertices used for the boundary alignment and the periodic padding to zero. We do not reset the virtual vertices, to allow information to pass through them; i.e., we only reset the vertices outside the original lattice area.
We also conduct an ablation study of the mask operation in Appendix K.1. To summarize, the proposed lattice convolution applied on an input lattice U is defined as LatticeConv(U; W) = Mask(W * PP(BA(Aug(U)))), where W is the convolution weight matrix; PP, BA, and Mask stand for the three processing steps described above, and Aug stands for the augmentation step defined in Section 3.2. The symbol * denotes the regular convolution, defined as (W * U′)_{i,j} = Σ_{m=-s}^{s} Σ_{n=-s′}^{s′} W_{mn} U′_{i-m,j-n}, where U′ ∈ R^{H×W×d} denotes an H × W feature map with d input channels. W has shape (2s+1) × (2s′+1) × d′ × d, where d′ is the number of output channels and (2s+1) × (2s′+1) is the size of the receptive field. Square. As shown in Figure 1 (a), for square lattices, the 3 × 3 kernel receptive field contains all four nearest neighbors and four second nearest neighbors (also called next nearest neighbors in some references), defined by Euclidean distance instead of connectivity.
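The PP → convolve → Mask pipeline can be sketched in a few lines of NumPy (a single-channel, single-sample toy; the paper's implementation is in PyTorch, and the particular boundary mask here is illustrative). Periodic padding is wraparound padding; the mask zeroes everything outside the original lattice area after the convolution.

```python
import numpy as np

def lattice_conv(u, weight, inside):
    """PP -> regular 3x3 cross-correlation -> Mask, on an augmented H x W grid."""
    h, w = u.shape
    padded = np.pad(u, 1, mode="wrap")           # PP: periodic (wraparound) padding
    out = np.zeros_like(u)
    for i in range(h):                           # plain sliding-window loop
        for j in range(w):
            out[i, j] = np.sum(weight * padded[i:i + 3, j:j + 3])
    return out * inside                          # Mask: reset sites outside lattice

u = np.random.default_rng(3).normal(size=(6, 6))
w = np.ones((3, 3)) / 9.0                        # toy averaging kernel
inside = np.ones((6, 6))
inside[0, :] = 0.0                               # e.g. a row added by boundary alignment
out = lattice_conv(u, w, inside)
assert out.shape == (6, 6) and np.allclose(out[0], 0.0)
```

In practice the wraparound padding maps to circular padding and the loop to a standard conv2d in any deep learning framework.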

3.4. INSTANTIATIONS

Triangular. For triangular lattices, the receptive field of a 3 × 3 convolution kernel includes the six nearest neighbors and two second nearest neighbors of the center vertex, as shown in Figure 1 (b). This structure pattern is the same at all positions across the lattice. Honeycomb. For honeycomb lattices, the 3 × 3 convolution kernel centered on an original vertex captures all three nearest neighbors and two second nearest neighbors. As shown in Figure 1 (d), depending on the local structure, the mapping between the kernel weights and the original vertices can take two different forms. The kernel centered on a virtual vertex captures the six original vertices around it. For the honeycomb lattice, we also apply the periodic padding to the virtual vertices. Kagome. For kagome lattices, as shown in Figure 1 (c), the 3 × 3 kernel captures all nearest neighbors and some next nearest neighbors. A virtual vertex receives information from the eight original vertices around it. There are three local structures centered on original vertices. We only apply the periodic padding to the virtual vertices on kagome lattices of size 36.

3.5. CNN VERSUS GNN

The convolution applies different weights to neighbors at different relative positions, which we argue is critical for capturing structure information. This is the opposite of graph neural networks, where the same weights are used for all one-hop neighbors on a graph. For example, we can define a graph on the square lattice by taking the spatially nearest neighbors as edges. For quantum many-body systems, the spatially second nearest neighbors play an important role in defining the energy, but the two-hop neighbors on the graph also include the third nearest neighbors. To capture structure information with GNNs, Kochkov et al. (2021) propose to augment the vertices with sublattice encoding to explicitly provide structure information at the input. Our experiments show that such encoding is indeed critical for GNNs. On the contrary, our lattice convolution accurately learns the ground state without any additional input. To some extent, the proposed lattice convolution learns the structure encoding automatically in the kernel space.
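The claim about hop counts can be verified with a small sketch (illustrative): on a square-lattice graph with nearest-neighbor edges, the two-hop neighborhood of a site contains not only the diagonal second nearest neighbors but also the third nearest neighbors at distance 2, which a 3 × 3 convolution kernel would exclude.

```python
# Square lattice of side L with periodic wraparound; edges = nearest neighbors.
L = 5

def nbrs(i, j):
    """One-hop (nearest) neighbors of site (i, j)."""
    return [((i + 1) % L, j), ((i - 1) % L, j), (i, (j + 1) % L), (i, (j - 1) % L)]

center = (2, 2)
two_hop = {q for p in nbrs(*center) for q in nbrs(*p)} - {center}

assert (3, 3) in two_hop   # second nearest neighbor (diagonal, distance sqrt(2))
assert (4, 2) in two_hop   # third nearest neighbor (distance 2) is also included
```

So a GNN aggregating over two hops cannot, by weight structure alone, distinguish the J_2-coupled diagonal neighbors from the uncoupled distance-2 neighbors, whereas a 3 × 3 kernel assigns each relative position its own weight.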

4. NETWORK ARCHITECTURE AND TRAINING

After constructing the grid-like input and defining the convolution operation, any existing convolutional neural network architecture can be applied. We aim to design a variational ansatz with powerful expressivity that captures spatial long-range spin correlations. To this end, our model builds on recent deep learning modules and attention mechanisms. The main components are described below. Squeeze-and-Excitation Block. The Squeeze-and-Excitation (SE) block (Hu et al., 2018) can improve the quality of representations produced by a network by explicitly modeling channel-wise interdependencies. It uses a squeeze operation to aggregate channel-wise global spatial information and an excitation operation to capture channel-wise dependencies through self-gating recalibration. Details of the formulation can be found in Appendix I. Non-Local Block. We find it useful to incorporate global spin-spin interactions beyond the local interactions defined by lattice edges. The non-local operation (Wang et al., 2018) is designed to capture long-range dependencies; specifically, it makes features at one position attend to the features at all other positions. Details of the formulation can be found in Appendix I. In our model, we predict amplitudes and phases separately with real-valued networks, following Kochkov et al. (2021). We first use lattice convolution to transform the original spin configuration input into an embedding space and then stack multiple SE-Non-Local layers to obtain the final latent representation. Finally, we flatten the feature maps into a vector to keep the information on each lattice vertex and use an MLP to obtain the log amplitude and the argument of the wave function value. The overall architecture of our variational ansatz is shown in Figure 3. We also conduct an ablation study on the network choice in the appendix.
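The squeeze/excitation mechanism can be summarized in a short NumPy sketch following the standard formulation of Hu et al. (2018) (weights are random placeholders; the paper's exact layer sizes are in its Appendix I): global average pooling per channel, a bottleneck MLP with sigmoid self-gating, then channel-wise rescaling.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(u, w1, w2):
    """Squeeze-and-Excitation on a (d, H, W) feature map."""
    z = u.mean(axis=(1, 2))                    # squeeze: global average pool -> (d,)
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))  # excitation: bottleneck MLP + gating
    return u * s[:, None, None]                # recalibrate: rescale each channel

d, r = 8, 2                                    # channels and reduction ratio
rng = np.random.default_rng(4)
u = rng.normal(size=(d, 5, 5))
w1 = rng.normal(size=(d // r, d))              # d -> d/r
w2 = rng.normal(size=(d, d // r))              # d/r -> d
out = se_block(u, w1, w2)
assert out.shape == u.shape
```

Because the gate s lies in (0, 1), the block can only attenuate channels, letting the network learn which channels matter globally; the non-local block plays the complementary role across spatial positions.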


Training. As discussed in Section 2.2, we use the variational Monte Carlo framework to optimize the variational ansatz iteratively by minimizing the statistical expectation of the system energy ⟨E⟩. The data (spin configurations) are sampled from the probability distribution defined by the variational ansatz.

5. EXPERIMENTS

In this section, we evaluate the proposed LCN on learning ground states of the spin-1/2 J1-J2 Heisenberg model, where the input vertices represent quantum spins. We show that our model can accurately approximate the ground state energies and achieve results on par with or better than GNN models. Lattice. Following Kochkov et al. (2021), four kinds of lattice are used in our experiments: square, honeycomb, triangular, and kagome. Periodic boundary conditions are used so that the neighborhood patterns of the boundary vertices are the same as those of the internal vertices. All the lattice geometries we use in the experiments can be found in Kochkov et al. (2021, Appendix A.5.a). J1-J2 Quantum Heisenberg model. The J1-J2 quantum Heisenberg model is the prototypical model for studying the magnetic properties of quantum materials. Its Hamiltonian matrix is given by H = Σ_{⟨i,j⟩} S_i · S_j + J_2 Σ_{⟨⟨i,j⟩⟩} S_i · S_j, where S_i = (S^x_i, S^y_i, S^z_i) are the spin-1/2 operators of the i-th vertex. The spin operator S^α_i is a Hermitian matrix of dimension 2^N, defined as S^α_i = I^{⊗(i-1)} ⊗ σ^α ⊗ I^{⊗(N-i)}, where ⊗ stands for the Kronecker product, I is the two-by-two identity matrix, and σ^α is the two-by-two Pauli matrix for α = x, y, z. The term S_i · S_j describes the antiferromagnetic exchange between the spins on sites i and j. ⟨•, •⟩ denotes the nearest neighbors and ⟨⟨•, •⟩⟩ denotes the second nearest neighbors, both in terms of Euclidean distance; J_2 controls the interaction strength between next nearest neighbors. The interaction strength between nearest neighboring spins is set to 1 as the unit. Setup. We test our models on the four kinds of lattice with various sizes and J_2 values. We compare the proposed LCN with GNN (Kochkov et al., 2021) as well as reference energies. We use the same references as in Kochkov et al. (2021). We also choose specific J_2 values under which the ground state is much harder for the GNN model to learn.
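The Hamiltonian above can be built explicitly for a tiny system by Kronecker products and diagonalized exactly, which is how the small-system reference energies are obtained. A hedged sketch (not the paper's code), using the spin-1/2 convention S^α = σ^α/2 on a 4-site periodic chain with J_2 = 0, a toy stand-in for the 2D lattices; its exact ground-state energy per site is -0.5.

```python
import numpy as np
from functools import reduce

pauli = {
    "x": np.array([[0, 1], [1, 0]], dtype=complex),
    "y": np.array([[0, -1j], [1j, 0]]),
    "z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def spin_op(a, i, N):
    """S^a_i = I kron ... kron (sigma^a / 2) kron ... kron I, acting on site i."""
    mats = [np.eye(2, dtype=complex)] * N
    mats[i] = pauli[a] / 2
    return reduce(np.kron, mats)

N = 4
bonds = [(i, (i + 1) % N) for i in range(N)]   # nearest-neighbor bonds, PBC
H = sum(spin_op(a, i, N) @ spin_op(a, j, N) for (i, j) in bonds for a in "xyz")

E0 = np.linalg.eigvalsh(H).min()               # exact ground-state energy
print(round(float(E0) / N, 6))                 # -0.5 per site for the 4-ring
```

Adding the J_2 term amounts to summing the same S_i · S_j expression over second-neighbor bonds with prefactor J_2; the 2^N scaling of H is what limits this to N ≲ 48 with sparse methods.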
The energy per site of the learned wave function is used as the evaluation metric, where the total energy of the system is divided by the number of spins (vertices). A lower energy per site indicates a more accurate approximation of the ground state. More implementation details can be found in Appendix D. Since we are proposing a generic CNN that fits multiple kinds of lattice without prior knowledge, we do not include other customized physics-incorporated networks for the square lattice as baselines. For completeness, however, we conduct experiments over the full range of J_2 on the square lattice to compare with other state-of-the-art models in Appendix L. Results. The experimental results are summarized in Table 1. For small systems (N = 32, 36), the reference energies are the exact ground state energies computed by directly diagonalizing the Hamiltonian matrix (Schulz et al., 1996; Albuquerque et al., 2011; Iqbal et al., 2016; Changlani et al., 2018). For large systems (N = 98, 108), exact diagonalization is computationally infeasible. Reference energies are calculated with quantum Monte Carlo (QMC) (Sandvik, 1997) and RBM+PP (pair-product states) (Nomura and Imada, 2021) for the square lattice with J_2 = 0.0 and J_2 = 0.5, respectively, and with density-matrix renormalization group (DMRG) (Iqbal et al., 2016; Yan et al., 2011) for triangular and kagome. In some cases, estimates for the infinite-size lattice are used as reference energies, extrapolated from exact diagonalization for honeycomb or from DMRG for triangular. These reference energies for large systems can be seen as very tight upper bounds on the true ground state energy.
Table 1: Estimated energy per site of the learned wave function (with error bars in parentheses). Lower is better. Four kinds of lattice with various sizes are used for comparison. The J_2 value controls the next nearest neighboring interaction on the lattice, resulting in differences in the ground states. The GNN-2 model doubles the parameters and computation by using separate branches for predicting the amplitude and argument of the wave function. The best result in each row is denoted in bold (if two are the same, we only bold-face LCN for clarity), and LCN results that are worse than GNN-2 but better than or equal to GNN are underlined. We use * to denote results measured from plots in the reference papers. For the reference ground state energies we also annotate the employed methods: † for exact diagonalization, ∞ for infinite-size estimates, ‡ for RBM+PP, § for QMC, and ¶ for DMRG.


Results show that the wave functions learned by the proposed LCN consistently give energies close to the reference ground state energies on both small and large systems. For small systems, LCN gives ground state energies similar to GNN on square and honeycomb lattices while achieving better ground state energies on triangular and kagome lattices. For large systems at J_2 = 0, LCN achieves ground state energies similar to GNN on the square lattice while falling behind on triangular and kagome lattices. For large systems at J_2 ≠ 0, LCN outperforms GNN on all tested lattices, including square, honeycomb, and triangular. This good performance demonstrates that the proposed LCN is able to accurately represent the ground states of quantum many-body systems without explicit structure encoding. Quantum State and Energy. The quality of a variational result is measured by the ratio (E - E_0)/∆, where ∆ is the gap of the Hamiltonian, i.e., the difference between the energies of the first excited state and the ground state. Usually, for gapless models, the exact gap ∆ is unknown. However, for the Heisenberg J1-J2 model, an improvement of order 1/N² in the energy per spin can be considered significant in approaching the true ground state. As listed in Table 1, our results achieve this improvement in most cases. Even when the energy improvement is slightly less than order 1/N², there can still be a dramatic improvement in approaching the true ground state; we conduct an experiment on a 12-node kagome lattice in Appendix G to show this. Kernel Design Comparison. Apart from applying regular convolution kernels on augmented lattices, we also design special convolution kernels for the original honeycomb, triangular, and kagome lattices, which only capture the nearest neighbors of each center vertex. Details of these special kernel designs can be found in Appendix E.
We compare the performance of these two categories of kernel design on small systems over honeycomb, triangular, and kagome lattices, as shown in Table 2. Experimental results show that regular convolution kernels consistently outperform special kernels. We argue the reasons are twofold. First, the regular kernel directly captures interactions between some second nearest neighbors, which is more helpful when J_2 is not zero. Second, information reuse is hindered to some extent for special kernels. For example, when using special kernels for original kagome lattices, three distinct kernel shapes need to be used to capture the different local structures on the lattice, and occurrences of the same local structure have little overlap with one another, resulting in less overlap between identical kernels. This impedes the information reuse among identical local structures that is crucial for capturing long-range spin correlations (Liang et al., 2018). So we conclude that lattice augmentation together with regular kernels is necessary for processing quantum lattice systems. On one hand, lattice augmentation is general across these four kinds of lattice and does not need any prior domain knowledge, in contrast to GNN's sublattice encoding. On the other hand, with virtual vertices added, regular kernels can be applied, which has many advantages over special kernels, such as boosting information exchange and reuse, as described in Section 3.2. Different Kernels for Honeycomb. For honeycomb lattices, the special kernel has about half as many parameters as the regular kernel. So for a fair comparison, we enlarge the special kernel to capture all next nearest neighbors, making its number of parameters similar to that of the regular kernel. As shown in Table 3, the performance of the 2-hop special kernel is still worse than that of the regular kernel with virtual vertices added.
Another observation is that the 2-hop special kernel performs even slightly worse than the 1-hop special kernel, which implies that enlarging the receptive field of the special kernel does not help capture further useful spin-spin correlations. Network Architecture Comparison. We conduct experiments on the kagome lattice to show the effect of the Squeeze-and-Excitation (SE) block and the non-local block, as shown in Table 4. Adding the SE block improves performance over using residual connections only. We hypothesize that SE blocks use the squeeze operation to aggregate global spatial information and rescaling to recalibrate the importance of channels, capturing channel-wise dependencies that implicitly capture long-range spin correlations. The non-local block makes the spin at one position attend to all other spins, which explicitly captures global spin-spin interactions, so using the non-local block further improves performance.

6. CONCLUSION

We propose lattice convolutions to process non-square lattices by converting them into grid-like augmented lattices through a set of operations, so that regular convolution can be applied without the hand-crafted structure encoding needed by the previous GNN method. We design lattice convolutional networks that use self-gating and attention mechanisms to capture channel-wise interdependencies and spatial long-range spin correlations, which contribute to the high expressivity of the variational wave functions. We experimentally demonstrate the effectiveness of lattice convolutional network wave functions and achieve performance on par with or better than existing methods.

A SUMMARY OF TEST LATTICES

C VARIATIONAL MONTE CARLO

As discussed in Section 2.2, we use the variational Monte Carlo approach to optimize the variational ansatz by minimizing the statistical expectation of the system energy:

E = ⟨ψ|H|ψ⟩ / ⟨ψ|ψ⟩
  = Σ_{i,j} ⟨ψ|c_i⟩ ⟨c_i|H|c_j⟩ ⟨c_j|ψ⟩ / ⟨ψ|ψ⟩
  = Σ_i |ψ(c_i)|² [Σ_j H_{ij} ψ(c_j)/ψ(c_i)] / Σ_i |ψ(c_i)|²
  = E_{c_i∼D} [Σ_j H_{ij} ψ(c_j)/ψ(c_i)],

where the energy expectation is computed over the probability distribution D(c_i) = |ψ(c_i)|² / Σ_i |ψ(c_i)|², and Σ_j H_{ij} ψ(c_j)/ψ(c_i) is referred to as the local energy. Specifically, at each optimization step, we first use the Markov-chain Monte Carlo (MCMC) method to sample system configurations c_i from the target distribution, which is given by the neural network, i.e., the variational wave function. Then we stochastically estimate the gradient ∇_w⟨E⟩ (Kochkov et al., 2021) to update the network parameters w. Meanwhile, we evaluate the energy expectation at every optimization step by averaging the local energies associated with the sampled configurations. The matrix multiplication between H and |ψ⟩ can be performed efficiently because of the sparsity of the Hamiltonian, which is determined by the lattice topology and the specific quantum system. Given a quantum system of N spin-1/2 particles, the dimension of H is 2^N; however, the typical number of nonzero values in each row is only of order N.
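The estimator can be illustrated with a hedged toy sketch (not the paper's code): on a small, fully enumerable "system" with a random symmetric H and a positive toy wave function, sampling configurations from D and averaging local energies reproduces the exact Rayleigh quotient. For clarity we sample from the exact distribution; in practice Metropolis-Hastings MCMC is used because the basis is exponentially large.

```python
import numpy as np

rng = np.random.default_rng(5)
dim = 32
A = rng.normal(size=(dim, dim))
Hm = (A + A.T) / 2                         # toy real symmetric "Hamiltonian"
psi = rng.uniform(0.5, 1.5, size=dim)      # toy wave function, bounded away from 0

# Sampling distribution D(c_i) = |psi(c_i)|^2 / sum_i |psi(c_i)|^2
probs = psi**2 / np.sum(psi**2)
samples = rng.choice(dim, size=20000, p=probs)

# Local energy E_loc(c_i) = sum_j H_ij psi(c_j) / psi(c_i) for every basis state
e_loc = (Hm @ psi) / psi
estimate = e_loc[samples].mean()           # Monte Carlo estimate of E

exact = psi @ Hm @ psi / (psi @ psi)       # exact Rayleigh quotient for comparison
```

Note that e_loc is computed here for all basis states only because dim is tiny; in real VMC, each sampled configuration's local energy touches only the O(N) nonzero entries of its Hamiltonian row.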

D IMPLEMENTATION DETAIL

We use the same CNN architecture for all lattices; the convolution operation in each layer is replaced with the proposed lattice convolution according to the lattice type. Pre-activation is not used for triangular lattices. For the kagome lattice of size 108, mask processing after convolution is not used. We implement our models and VMC procedures in PyTorch (Paszke et al., 2019), and all models were trained on an NVIDIA RTX A6000 GPU. For efficiency, we implement the VMC procedures with GPU parallelization: a batch of configurations is sampled in parallel at each step. Samples are kept at intervals equal to the size of the system to minimize the correlation between consecutive samples, and we initialize each MCMC chain with a different random seed to maximize independence between chains. During training, we use the stochastic gradient estimated from each mini-batch of samples and optimize the models with the Adam optimizer (Kingma and Ba, 2015). The optimization generally converges within 30,000 steps. We save the model with the lowest stable training energy for testing, where stable means the energies in neighboring steps have small variance. During testing, the energy is estimated from 200,000 configurations sampled from equilibrated Markov chains. Hyperparameters for optimization can be found in Appendix J. To estimate the stochastic error bars, we group the Markov chains into 100 bins; the error bar is computed as the standard deviation of the average energies of the bins.
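The binning estimate described above can be sketched as follows (the `n_bins=100` default mirrors the text; `binned_error_bar` is an illustrative name, not the paper's code):

```python
import numpy as np

def binned_error_bar(energies, n_bins=100):
    """Stochastic error bar via binning: split the samples into bins,
    average within each bin, and report the standard deviation of the
    bin means, as described in the text."""
    bins = np.array_split(np.asarray(energies), n_bins)
    bin_means = np.array([b.mean() for b in bins])
    return bin_means.std(ddof=1)
```

Binning groups correlated consecutive samples together, so the spread of bin means gives a more honest error estimate than the naive standard error over raw samples.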

E SPECIAL KERNEL DESIGN

We design special convolution kernel structures based on the repetitive local patterns of different lattices, where each kernel captures exactly the nearest neighbors of its center vertex. Honeycomb. We consider a convolution kernel that covers the nearest neighbors. For every vertex in the honeycomb lattice, the nearest neighbors form an equilateral triangle; however, for two adjacent vertices, the orientations of these triangles differ. Specifically, the two triangles are related by a 180-degree rotation. As a result, as shown in Figure 4(a), we rotate the convolution kernel by 180 degrees when moving from a vertex to one of its neighbors. Triangular. For each center vertex, the nearest neighbors form a hexagon, and this pattern repeats across the whole lattice. We therefore design a hexagon-shaped convolution kernel to capture this pattern, as shown in Figure 4(b). In practice, we implement this kernel by masking the two diagonally opposite corner positions of a square kernel. Kagome. The kagome lattice has three different repetitive local patterns, each of which captures the nearest neighbors of its center vertex. We therefore design three kernels corresponding to these three patterns, as shown in Figure 4(c); the three kernels do not share weights with each other.
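The hexagonal kernel for the triangular lattice can be sketched as a masked square kernel (a NumPy illustration; which pair of opposite corners is masked depends on the shear direction, and by the lattice-symmetry note in Figure 1 either choice is equivalent):

```python
import numpy as np

def hexagonal_kernel_mask(k=3):
    """Mask for the triangular-lattice kernel: a k x k square kernel
    with two diagonally opposite corners zeroed out, leaving the
    hexagonal neighborhood of the center vertex (7 taps for k=3)."""
    mask = np.ones((k, k))
    mask[0, 0] = 0.0    # one corner of the diagonal
    mask[-1, -1] = 0.0  # the opposite corner
    return mask

# The masked weights are used in place of the full square kernel:
weight = np.random.randn(3, 3)
masked_weight = weight * hexagonal_kernel_mask()
```

In a PyTorch module the same mask would simply multiply the convolution weight tensor before each forward pass, so the masked positions never contribute and receive no gradient signal.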

F PERIODIC PADDING VERSUS ZERO PADDING

As discussed in Section 3.3, periodic padding is applied before convolution to account for the periodic boundary condition of the lattices. We compare periodic padding against zero padding, as shown in Table 6, which shows that periodic padding is essential for achieving better results.
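For illustration, the two padding schemes differ only in the mode of the padding call (a NumPy sketch; the paper's implementation uses PyTorch):

```python
import numpy as np

def periodic_pad(x, pad=1):
    """Pad a 2D feature map by wrapping around the lattice boundary,
    so convolution respects the periodic boundary condition."""
    return np.pad(x, pad, mode="wrap")

def zero_pad(x, pad=1):
    """Plain zero padding, which ignores the periodicity."""
    return np.pad(x, pad, mode="constant")
```

With periodic padding, a kernel centered on a boundary vertex sees the spins on the opposite edge of the lattice, exactly as the physical periodic boundary condition demands; zero padding instead feeds it fictitious zero spins.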

G QUANTUM STATE OVERLAP AND ENERGY

As described in the main experimental results, our goal is to identify the ground state. There can be a dramatic improvement in how closely the ansatz approaches the true ground state even when the corresponding improvement in energy is small. To clarify this, we conduct an experiment on a small kagome lattice with 12 nodes, for which the true ground state can be obtained exactly. As shown in Figure 5, even though the energy improves by only 0.9% (from -0.4492 to -0.4533), the ground-state overlap (up to a maximum of 1) increases by 10% (from 0.9087 to 0.9971).
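On a lattice this small the overlap can be computed directly from the full amplitude vectors; a minimal sketch (the function name `state_overlap` is ours):

```python
import numpy as np

def state_overlap(psi_a, psi_b):
    """Overlap |<psi_a|psi_b>| between two (possibly unnormalized)
    state vectors given as amplitude arrays over the full basis."""
    num = np.abs(np.vdot(psi_a, psi_b))
    return num / (np.linalg.norm(psi_a) * np.linalg.norm(psi_b))
```

Normalizing by both vector norms makes the overlap insensitive to the arbitrary overall scale of a variational wave function.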

H MODEL CAPACITY COMPARISON

In this section, we compare the number of model parameters between Lattice Convolutional Networks (LCN) and the GNN model (Kochkov et al., 2021) over different lattices, as shown in Table 7. For triangular and kagome lattices with 36 nodes, LCN with only 0.29M and 0.49M trainable parameters achieves better performance than the GNN model with 0.86M parameters. On honeycomb and square lattices, LCN with 0.36M and 0.28M parameters yields the same results as the GNN model. We therefore argue that LCN is more expressive and parameter-efficient than the GNN model on various small lattices. For lattices with more nodes, the number of parameters in LCN increases, which is caused by the final MLP layer and the boundary alignment operation on the augmented lattices. Since we flatten the final feature maps into a vector that is then processed by an MLP layer, the number of parameters grows with the size of the network input. Moreover, we need to zero-pad the augmented lattices into parallelograms due to the irregular boundaries of the original lattice geometries, which further increases the size of the input feature map.
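The growth of the final MLP's parameter count with the flattened input can be illustrated with a small helper (hypothetical; the hidden width and output dimension are placeholders, not the paper's actual architecture):

```python
def final_mlp_params(H, W, C, hidden, out=1):
    """Weights + biases of a one-hidden-layer MLP applied to a
    flattened C x H x W feature map. The first term dominates and
    scales linearly with the augmented-lattice area H * W."""
    flat = H * W * C
    return (flat * hidden + hidden) + (hidden * out + out)
```

Doubling each lattice side quadruples `H * W` and hence roughly quadruples the MLP's contribution, which matches the parameter growth for larger lattices noted above.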

I NETWORK COMPONENTS

Squeeze-and-Excitation Block. The SE block can be written as (Hu et al., 2018):

$$z_c = F_{sq}(\mathbf{u}_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j), \quad \mathbf{s} = F_{ex}(\mathbf{z}, \mathbf{W}) = \sigma(g(\mathbf{z}, \mathbf{W})) = \sigma(\mathbf{W}_2\, \delta(\mathbf{W}_1 \mathbf{z})), \quad \tilde{x}_c = F_{scale}(\mathbf{u}_c, s_c) = s_c\, \mathbf{u}_c, \tag{8}$$

where $\mathbf{u}_c \in \mathbb{R}^{H \times W}$ denotes the $c$-th output feature map of the convolution operator, and $z_c$ denotes the $c$-th element of the channel descriptor $\mathbf{z}$ squeezed from $\mathbf{u}_c$. $\delta$ refers to the ReLU (Nair and Hinton, 2010) activation function and $\sigma$ refers to the sigmoid function. $\mathbf{W}_1 \in \mathbb{R}^{\frac{C}{r} \times C}$ and $\mathbf{W}_2 \in \mathbb{R}^{C \times \frac{C}{r}}$, where $r$ is the dimension reduction factor. The final output $\tilde{x}_c$ is computed by channel-wise multiplication between $s_c$ and $\mathbf{u}_c$.

SE-Non-Local Layer. Based on the SE block and the non-local block, we propose the SE-Non-Local Layer by further introducing skip connections and pre-activation. The resulting layer consists of the following components: Normalization → Activation → LatticeConv → SE Block → Addition → Non-local Block. We apply LayerNorm (LN) (Ba et al., 2016) as the normalization and ReLU as the activation function.

Figure 1: Structures of the four lattices studied in this work. Shaded areas denote the regular convolution kernels. Red dots represent original lattice vertices, and green dots represent padded virtual vertices. In (c) and (d), dashed lines denote the nearest neighbors of virtual vertices. On the right are the different substructures captured by convolution. Note that the shear direction does not affect the result due to lattice symmetry. Although the lattices shown are finite, they are periodically repeated over the whole space, which is realized by applying the periodic boundary condition.

Figure 2: Boundary alignment and periodic padding.

Figure 4: Special kernel design.

Figure 5: Improvement of state overlap on the 12-node kagome lattice.
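A minimal NumPy sketch of the SE-block forward pass in Eq. (8) (biases are omitted and the weights `W1`, `W2` are passed in explicitly for illustration; the actual model is implemented in PyTorch):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(u, W1, W2):
    """Squeeze-and-Excitation forward pass.

    u:  (C, H, W) feature maps from the convolution.
    W1: (C//r, C) reduction projection; W2: (C, C//r) expansion.
    """
    z = u.mean(axis=(1, 2))                  # squeeze: z_c = mean over H x W
    s = sigmoid(W2 @ np.maximum(W1 @ z, 0))  # excitation: sigma(W2 ReLU(W1 z))
    return u * s[:, None, None]              # scale: x_c = s_c * u_c
```

The gate `s` depends only on channel-wise statistics, so the block captures channel interdependencies at negligible cost compared with the convolution itself.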



Performance comparison between regular kernel and special kernel on honeycomb, triangular and kagome lattices.

Experiments on honeycomb with different kernel design.

Network architecture selection


Test lattice settings with different lattice types, system sizes, and J_2 values.

Training Algorithm of Lattice Convolutional Networks
1: Input: lattice structure L, number of spin sites N, lattice convolutional network ψ_θ with trainable parameters θ, learning rate α, Markov-chain batch size B, initial annealing steps s, measure steps m
2: Set a different random seed for each Markov chain
3: Randomly initialize spin configurations C ∈ {+1, -1}^{B×N} with equal numbers of +1 and -1
4: Ĉ ← Metropolis-Hastings(ψ_θ, C, s)
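A Metropolis-Hastings update consistent with the initialization above can be sketched as follows (a simplified single-chain version in plain NumPy; the swap proposal that preserves the equal numbers of +1 and -1, and the names `log_psi_fn` and `metropolis_exchange_step`, are illustrative assumptions, not the paper's code):

```python
import numpy as np

def metropolis_exchange_step(c, log_psi_fn, rng):
    """One Metropolis-Hastings update that swaps a random +1/-1 pair,
    preserving total magnetization. log_psi_fn maps a configuration to
    log|psi(c)|; acceptance uses the ratio |psi(c')/psi(c)|^2."""
    up = np.flatnonzero(c == +1)
    down = np.flatnonzero(c == -1)
    i, j = rng.choice(up), rng.choice(down)
    c_new = c.copy()
    c_new[i], c_new[j] = c[j], c[i]
    log_ratio = 2.0 * (log_psi_fn(c_new) - log_psi_fn(c))
    if np.log(rng.random()) < log_ratio:
        return c_new
    return c
```

Because the proposal only exchanges opposite spins, every chain stays in the zero-magnetization sector fixed by step 3 of the algorithm.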

Table 6: Performance comparison between periodic padding and zero padding.

Table 7: Model parameters of Lattice Convolutional Networks and GNN. For the square lattice of size 100, the number for LCN is counted with 4 SE-Non-Local layers. We also use a 3-layer model, which is slightly smaller and contains 1.0M parameters. See Appendix J for details.


Non-Local Block. The response at a position, denoted $\mathbf{y}_i$, is computed as a weighted sum of the features at all positions. The output of each position is formulated as (Wang et al., 2018):

$$\mathbf{y}_i = \frac{1}{C(\mathbf{x})} \sum_{\forall j} f(\mathbf{x}_i, \mathbf{x}_j)\, g(\mathbf{x}_j),$$

where $g$ is a linear embedding that transforms the input feature map $\mathbf{x}$: $g(\mathbf{x}_i) = \mathbf{W}_g \mathbf{x}_i$, and $f(\mathbf{x}_i, \mathbf{x}_j) = e^{\theta(\mathbf{x}_i)^\top \phi(\mathbf{x}_j)}$ is the embedded Gaussian, in which $\theta(\mathbf{x}_i) = \mathbf{W}_\theta \mathbf{x}_i$ and $\phi(\mathbf{x}_j) = \mathbf{W}_\phi \mathbf{x}_j$ are two embeddings used to compute the dot-product similarity. $C(\mathbf{x}) = \sum_{\forall j} f(\mathbf{x}_i, \mathbf{x}_j)$ is the normalization factor. The final output of the block at position $i$ is $\mathbf{z}_i$, obtained by adding a residual connection.

Mask Operation. As described in Section 3.3, we reset all vertices used for boundary alignment and periodic padding to zero after each convolution. The vertices used for boundary alignment belong to neither the original lattice area nor the periodic padding; they are only used to transform the original lattice area into a regular shape so that a regular square kernel can be applied at the boundary. Likewise, the vertices used for periodic padding do not belong to the original lattice area. This motivates the mask operation. In practice, we found the mask operation to be optional, but it is beneficial in certain cases; for example, it improves performance on the kagome lattice of size 36, as shown in Table 9.

Results for CNN are taken from Choo et al. (2019). Results for RBM+PP are taken from Nomura and Imada (2021). Results for RBM+Lanczos are taken from Chen et al. (2022) (only J_2 = 0.5, 0.55, 0.6 use one Lanczos step). Our method is the only one without any prior physical knowledge incorporated in the model.
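The embedded-Gaussian non-local block can be sketched over flattened lattice positions as follows (a NumPy illustration; the residual projection `W_z` and the flattening to an `(N, C)` array follow the standard formulation of Wang et al. (2018), not code from this paper):

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def non_local_block(x, W_theta, W_phi, W_g, W_z):
    """Embedded-Gaussian non-local block.

    x: (N, C) features at N lattice positions. Normalizing
    f(x_i, x_j) = exp(theta(x_i) . phi(x_j)) by C(x) amounts to a
    softmax over positions j; a residual connection gives z = W_z y + x.
    """
    theta, phi, g = x @ W_theta.T, x @ W_phi.T, x @ W_g.T
    attn = softmax(theta @ phi.T, axis=-1)  # (N, N) pairwise weights
    y = attn @ g                            # weighted sum over all positions
    return y @ W_z.T + x                    # residual connection
```

Because every position attends to every other position in a single step, the block can represent the long-range spin correlations that a stack of local convolutions would need many layers to propagate.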

J HYPERPARAMETERS

Exact results (Schulz et al., 1996). For completeness, we compare with other physics-incorporated methods targeting the square lattice. We conduct experiments over the full range of J_2 on the square lattice of size 36, as shown in Table 10. Compared with CNN (Choo et al., 2019), our method consistently performs well over the full range of J_2, especially in the frustrated regime (J_2 ≈ 0.5) and at small J_2, without using any prior physical knowledge. For large J_2, CNN is better than ours, possibly because it enforces C_4 symmetry, which is important in the striped order phase at large J_2 (Choo et al., 2019). Besides, CNN incorporates the Marshall sign rule, which defines the sign structure of the ground-state wave function for the square lattice at J_1 = 0 or J_2 = 0. The Marshall sign rule only works for bipartite graphs (such as the square lattice) and non-frustrated regimes; when the prior sign rule is violated (J_2 = 0.5), the CNN result is worse than ours, as expected. The latest state-of-the-art methods, RBM+PP (Nomura and Imada, 2021) and RBM+Lanczos (Chen et al., 2022), obtain very good results on the square lattice. RBM+PP is a hybrid method that relies heavily on physical knowledge (GNN (Kochkov et al., 2021) uses some results from RBM+PP as reference energies). RBM+PP combines the restricted Boltzmann machine with the pair-product wave function, which has long been used in physics and is known to capture the ground state of the Heisenberg model on the square lattice excellently; the combination is therefore very effective on square lattices, but it is unclear how it performs on other lattices. RBM+Lanczos incorporates spin-flip, translation, and lattice point-group symmetries for the square lattice in the network, and uses Lanczos iterations to further improve the results. However, the computational cost of the Lanczos method increases dramatically with the number of Lanczos steps, and the Lanczos correction becomes smaller for larger systems.

M GNN WITH AND WITHOUT SUBLATTICE ENCODING.

To illustrate the necessity of the hand-crafted sublattice encoding in the GNN model, we implement the GNN model based on the details provided in Kochkov et al. (2021) and reproduce its results. We conduct an ablation study on the sublattice encoding and summarize the results in Table 11. For triangular and kagome lattices, we cannot obtain valid results without the sublattice encoding (the estimated energies are either far too large or far too small); one possible reason is that the distribution of the learned wave function is very hard to sample accurately. On square and honeycomb lattices, the sublattice encoding is essential for achieving better results. We therefore conclude that the hand-crafted sublattice encoding is important for the GNN model.

