NEURALPCG: LEARNING PRECONDITIONERS FOR SOLVING PARTIAL DIFFERENTIAL EQUATIONS WITH GRAPH NEURAL NETWORKS

Abstract

Fast and accurate partial differential equation (PDE) solvers empower scientific and engineering research. Classic numerical solvers provide unparalleled accuracy but often require extensive computation time. Machine learning solvers are significantly faster but lack convergence and accuracy guarantees. We present Neural-Network-Preconditioned Conjugate Gradient, or NeuralPCG, a novel linear second-order PDE solver that combines the benefits of classic iterative solvers and machine learning approaches. Our key observation is that both neural-network PDE solvers and classic preconditioners excel at obtaining fast but inexact solutions. NeuralPCG proposes to use neural network models to precondition PDE systems in classic iterative solvers. Compared with neural-network PDE solvers, NeuralPCG achieves convergent and accurate solutions (e.g., 1e-12 precision) by construction. Compared with classic solvers, NeuralPCG is faster via data-driven preconditioners. We demonstrate the efficacy and generalizability of NeuralPCG by conducting extensive experiments on various 2D and 3D linear second-order PDEs.

1. INTRODUCTION

Partial differential equations (PDEs) are fundamental mathematical models with broad applications in science and engineering, for example, the Navier-Stokes equations in fluid dynamics, Poisson's equation in computational geometry, and the Black-Scholes equation in mathematical finance. Despite their powerful modeling ability and wide applications, it is notoriously difficult to find analytical solutions to a general PDE. Therefore, numerical solvers have long been the mainstay of solving PDEs. Classic PDE solvers provide accurate solutions to well-understood PDEs, but typically at the cost of long computation time. Speeding up these classic solvers is non-trivial and often requires complex numerical techniques, e.g., multigrid methods (Briggs et al., 2000), domain decomposition (Smith, 1997), and model reduction (Holmes et al., 2012). Recently, several pioneering works (Li et al., 2018; Sanchez-Gonzalez et al., 2020) introduced machine learning techniques for solving PDEs, particularly in the field of physics simulation. While this line of methods typically outperforms classic solvers in speed by a large margin, it struggles to converge to a highly precise solution (e.g., 1e-12). The lack of theoretical analysis on convergence and accuracy inhibits neural-network PDE solvers' applications in mechanical engineering, structural analysis, and aerodynamics, where precise PDE solutions have a higher priority than fast yet inexact results.

This work proposes NeuralPCG, a novel hybrid method that combines the benefits of classic and machine-learning PDE solvers. Our key observation is that neural-network solvers are fast at estimating PDE solutions with low to moderate accuracy. This property aligns with the preconditioning technique in numerical methods, which uses an easy-to-solve approximation of the original PDE system to speed up numerical solvers. Based on this intuition, NeuralPCG proposes to learn a neural network that preconditions a classic iterative solver.
This is in sharp contrast to existing neural-network PDE solvers that replace the numerical solver entirely with a neural network. The backbone of NeuralPCG remains a classic solver, so it inherits convergence and accuracy guarantees. Moreover, the learned preconditioner even outperforms classic preconditioners because it adapts to data distributions from target PDE applications. To demonstrate the efficacy of NeuralPCG, we evaluate it on a set of representative 2D and 3D linear second-order PDEs. We compare its performance with two baselines of very different natures: (1) MeshGraphNet (MGN), a state-of-the-art neural-network PDE solver (Pfaff et al., 2020); (2) the preconditioned conjugate gradient (PCG) algorithm with classic preconditioners. Our experiments show that NeuralPCG has unique advantages over both baselines: Compared with MGN and similar machine-learning solvers, NeuralPCG generates convergent and accurate solutions by construction and avoids error accumulation. On the other hand, NeuralPCG also outperforms classic PCG solvers in speed because our method learns a preconditioner tailored to the training data distribution, which classic preconditioners typically do not exploit. Finally, NeuralPCG generalizes to physics parameters or meshes with moderate differences (5σ) from the training data and is robust to outliers by design. In summary, our work makes the following contributions:

1. We propose NeuralPCG, a novel and efficient solver for linear second-order PDEs that combines the benefits of classic and machine-learning PDE solvers.
2. We present a framework for studying the performance of preconditioners in classic iterative solvers, including a benchmark problem set with carefully designed performance metrics.
3. We conduct extensive experiments to evaluate the efficacy and generalizability of NeuralPCG and demonstrate its advantages over existing methods.

2. RELATED WORK

Numerical PDE solvers. Classic PDE solvers discretize the continuous PDEs into a numerical system. Popular discretization schemes include finite differences (Strikwerda, 2004), finite elements (Hughes, 2012), and finite volumes (LeVeque et al., 2002). Once the discretized numerical system is formulated, one often needs to solve a linear system, and solving these linear systems accurately and efficiently is the key to a robust PDE solver. Previous work has studied numerical linear algebra extensively (Trefethen & Bau III, 1997; Golub & Van Loan, 2013). These solvers form the backbone of many successful, mature numerical PDE software packages, e.g., ANSYS (Madenci & Guven, 2015), Abaqus (Helwany, 2007), and COMSOL (Pryor, 2009). These solvers are crucial for science and engineering, but extensive wall-clock computation time limits their wider deployment. The time bottleneck typically comes from (repeatedly) solving a linear system, and the problem becomes even more severe as the size of the problem scales up. To accelerate linear solvers, prior work has constructed efficient preconditioners for iterative linear solvers via matrix decomposition (Khare & Rajaratnam, 2012), sparsity control (Liu, 1990; Davis & Hager, 1999; Schäfer et al., 2021), and multiscale techniques (Chen et al., 2021).

Machine learning (ML) methods for solving PDEs. Researchers have attempted to incorporate machine learning techniques into solving PDEs. A particular line of work focuses on numerical simulation problems, e.g., solving the underlying PDEs for simulating fluid (Kim et al., 2019), cloth (Pfaff et al., 2020), and elastoplastic solids (Sanchez-Gonzalez et al., 2020). Typically, these approaches aim to train a surrogate neural network model that replaces the PDE solving process entirely (Brandstetter et al., 2022b;a). Such methods enjoy a major speedup over classic numerical methods thanks to the fast inference time of a trained network model.
However, while the network models are capable of producing visually plausible results, they offer no guarantee on the accuracy of the solution. As such, these methods often suffer from error accumulation, inhibiting their wider usage in applications where accuracy is a priority.

Combining numerical methods with networks. Given the pros and cons of classic and machine learning methods, researchers have been combining classical numerical solvers with neural networks (Belbute-Peres et al., 2020; Um et al., 2020; Li et al., 2020a;b), with physics-informed neural networks being a particularly notable example (Raissi et al., 2019; Karniadakis et al., 2021). Since neural networks serve as the backbones of these methods, they often have a hard time generating quantitatively accurate solutions (e.g., 1e-12 precision) and enforcing hard constraints (Márquez-Neila et al., 2017), both of which are relatively simple with classic numerical solvers. Additionally, generalization can be an issue when faced with data outside the distribution seen in the training set (Kim et al., 2021). Our method avoids these issues by construction because we use a classic solver as the backbone and fine-tune its preconditioner with neural networks. Therefore, convergence and accuracy of the solutions are guaranteed even for unseen data: the performance may degrade, but the solving process will not fail. Azulay & Treister (2022) also model preconditioners with neural networks, but their convolutional neural network (CNN) architecture only supports voxel-grid discretization and has not been shown to obtain a performance gain. Our task and method are perhaps most similar to previous works that use CNNs to learn preconditioners (Sappl et al., 2019; Ackmann et al., 2020). However, our work differs from these in three ways. First, because the CNN architecture only supports grid-based data domains, these methods cannot be directly applied to mesh-based data, whereas we focus on mesh-based problem domains, which have several advantages over grid-based data, such as sharp interface handling. Second, we propose a novel loss function that exploits the data distribution of field values, whereas previous methods use the condition number as the loss, which only considers the system matrix. Finally, our method is more generic: it is designed and tested on all second-order linear PDEs, whereas previous methods use a hard-coded threshold to satisfy the constraints of a specific problem and are not applicable to other problem settings.

3.1. PROBLEM SETUP

In this work, we focus on solving linear second-order PDE problems, which cover some of the most common PDEs in scientific and engineering applications, e.g., the heat equation in thermodynamics, the wave equation in acoustics, and Poisson's equation in electrostatics.

Linear second-order PDEs. Formally speaking, we write linear second-order PDEs in the following form:

½ ∇·(A∇f(x)) + b·∇f(x) = c(x), ∀x ∈ Ω.

Here, Ω ⊂ R^d is the problem domain, f: Ω → R is the function to be solved, A ∈ R^{d×d} and b ∈ R^d are constants, and c: Ω → R is a user-specified function. Without loss of generality, we can safely assume A to be a symmetric matrix. The eigenvalue signs of A classify linear second-order PDEs into elliptic (e.g., the Poisson or Laplace equation), hyperbolic (e.g., the wave equation), and parabolic (e.g., the heat equation) equations. This work studies model PDEs from each of these categories.
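To make the classification concrete, the eigenvalue test can be sketched in a few lines (the helper name `classify_pde` is ours, for illustration):

```python
import numpy as np

# Classify the PDE above by the eigenvalue signs of its symmetric matrix A.
def classify_pde(A):
    eig = np.linalg.eigvalsh(A)                  # A is symmetric by assumption
    if np.all(eig > 0) or np.all(eig < 0):
        return "elliptic"      # e.g. Poisson / Laplace equation
    if np.any(np.isclose(eig, 0.0)):
        return "parabolic"     # degenerate A, e.g. heat equation
    return "hyperbolic"        # mixed signs, e.g. wave equation

assert classify_pde(np.eye(2)) == "elliptic"               # Laplace: A = I
assert classify_pde(np.diag([1.0, 0.0])) == "parabolic"    # heat (space-time form)
assert classify_pde(np.diag([1.0, -1.0])) == "hyperbolic"  # wave
```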

Boundary conditions

We consider a mixture of Neumann and Dirichlet boundary conditions:

∂f(x)/∂n = N(x), ∀x ∈ ∂Ω_N,    f(x) = D(x), ∀x ∈ ∂Ω_D.

Numerical problems after discretization. The PDE defined above is a continuous problem that one needs to discretize before applying a numerical solver. Discretization approximates gradient operators with numerical computation and converts the continuous PDE into a numerical system to be solved. Popular discretization schemes include finite differences, finite elements, and finite volumes. In this work, we adopt the standard Galerkin method from finite element theory (Johnson, 2012), resulting in a linear system of equations:

Kf = c,

where K ∈ R^{n×n}, with n being the number of degrees of freedom after discretization, is the stiffness matrix of the discretized PDE system, which is large, sparse, and symmetric positive definite (SPD). The vector c ∈ R^n discretizes the function c on the domain and typically also fuses information from the (discretized) boundary conditions. The goal is to solve for f ∈ R^n, the values of f at each degree of freedom after discretization, from which we can reconstruct the final solution f through numerical interpolation. This linear system (K, c) becomes the input to our method and the other baseline algorithms discussed in this paper.
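As a minimal illustration of the discretized system's stated properties, the following sketch assembles the classic tridiagonal stiffness matrix of a 1D Poisson problem (a toy stand-in for the paper's 2D/3D environments; the setup and all names are ours) and checks that K is sparse and SPD:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Toy Galerkin discretization (ours): -f'' = c on (0, 1), f(0) = f(1) = 0,
# linear elements on a uniform mesh give K = (1/h) * tridiag(-1, 2, -1),
# which is sparse and SPD, as stated above.
n = 100
h = 1.0 / (n + 1)
K = sp.diags([-np.ones(n - 1) / h, 2 * np.ones(n) / h, -np.ones(n - 1) / h],
             [-1, 0, 1], format="csr")
c = h * np.ones(n)                     # lumped load for c(x) = 1

f = spla.spsolve(K.tocsc(), c)         # reference direct solve

assert (K != K.T).nnz == 0                          # symmetric
assert np.all(np.linalg.eigvalsh(K.toarray()) > 0)  # positive definite
print(np.linalg.norm(K @ f - c))                    # residual near machine precision
```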

3.2. MOTIVATION

Our strategy is to build a hybrid PDE solver that combines the advantages of both machine learning approaches and classic numerical solvers. Traditional numerical methods for solving Kf = c fall into two main categories: direct solvers and iterative solvers. Direct solvers factorize K into matrices that are easier to solve (e.g., triangular matrices) and are most useful only if the left-hand-side matrix K remains fixed. We are more interested in iterative solvers because they are more suitable for varying K and c. Iterative solvers, e.g., the conjugate gradient (CG) method, repeatedly apply matrix-vector products to refine an estimate of the solution until convergence. The runtime of an iterative solver largely depends on the condition number of the stiffness matrix K, motivating the preconditioning techniques in numerical methods. At a high level, preconditioning a numerical system means working on a modified system that is similar to the original problem but faster to solve. It uses the solution of the modified system to bias the iterative solver towards solving a system with better conditioning, e.g., by clustering the eigenvalues of the stiffness matrix (Solomon, 2015). Our method is motivated by the observation that the preconditioners described above share remarkable similarities with neural-network PDE solvers (Sanchez-Gonzalez et al., 2020): they both generate inexact solutions with moderate errors in a short computation time. This observation inspires us to propose the Neural-Network-Preconditioned Conjugate Gradient method, or NeuralPCG, which uses CG as the backbone but with a trained neural network as its preconditioner. We see this combination as mutually beneficial for both neural-network methods and classic solvers.
On the network side, instead of replacing the whole PDE solver with a neural network as in many previous works (Li et al., 2018; Sanchez-Gonzalez et al., 2020; Pfaff et al., 2020), using a network only to replace the preconditioner ensures convergence, accuracy, and robustness of the solution. It also enables us to generalize the network method to problems that favor precision over speed and to unseen data. NeuralPCG also benefits CG solvers by proposing a new perspective for designing preconditioners. Designing a high-performance preconditioner is challenging because its two desired properties, fast derivation speed and high similarity to the original system, often conflict; if they did not, the stiffness matrix K itself would have been easy to solve in the first place. The design of classic preconditioners, e.g., Incomplete Cholesky or Symmetric Successive Over-Relaxation (SSOR) (Golub & Van Loan, 2013), is defined on the left-hand-side matrix K only. We argue that this design decision unnecessarily limits the full power of preconditioners because it overlooks the right-hand-side vector c and its distribution among actual PDE problem instances. In contrast to these classic preconditioners, we propose to learn a neural-network preconditioner from both the left-hand-side matrices and the right-hand-side vectors in the training data. We expect higher performance due to such exploitation of data distributions.
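The preconditioned CG iteration that serves as our backbone can be sketched as follows (a minimal textbook implementation, not the paper's code; `M_solve` applies the inverse of the preconditioner M, here instantiated with Jacobi preconditioning on a toy SPD system):

```python
import numpy as np

def pcg(K, c, M_solve, tol=1e-12, max_iter=10_000):
    """Minimal preconditioned conjugate gradient sketch (names ours)."""
    f = np.zeros_like(c)
    r = c - K @ f                      # initial residual
    z = M_solve(r)                     # preconditioned residual
    p = z.copy()
    rz = r @ z
    for it in range(max_iter):
        Kp = K @ p
        alpha = rz / (p @ Kp)
        f += alpha * p
        r -= alpha * Kp
        if np.linalg.norm(r) <= tol * np.linalg.norm(c):
            return f, it + 1
        z = M_solve(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return f, max_iter

# Jacobi preconditioning: M = diag(K), so M_solve divides by the diagonal.
rng = np.random.default_rng(0)
B = rng.standard_normal((50, 50))
K = B @ B.T + 50 * np.eye(50)          # well-conditioned SPD test matrix
c = rng.standard_normal(50)
d = np.diag(K)
f, iters = pcg(K, c, lambda r: r / d)
```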

3.3. THE NEURAL-NETWORK PRECONDITIONER

Network design. Given a linear system Kf = c, our network preconditioner takes (K, c) as input and outputs a preconditioner P that CG, a standard iterative solver, can use seamlessly to solve the system. The representation of K depends on the discretization scheme of the PDE. For example, a grid discretization leads to a K matrix conveniently stored as a grayscale image on the grid, and a triangulated PDE results in a K stored as node and edge values on a graph. Our work demonstrates the preconditioner on triangle and tetrahedron finite elements and uses GNNs as the network model. However, we believe our core idea is agnostic to discretization schemes. More concretely, consider a PDE problem on Ω discretized as a triangle mesh in 2D or a tetrahedron mesh in 3D. We construct a GNN whose nodes and edges are mesh vertices and edges, respectively. We store the input K as a one-dimensional edge feature: if K_ij is a nonzero entry in K, we add it as an edge feature on the edge from node i to node j. Similarly, we store the vector c as a one-dimensional node feature on the graph. We then apply the same encoder-message-passing-decoder network architecture as in previous work (Pfaff et al., 2020) to predict a one-dimensional feature on each edge, which serves as the output of our network. We leave more details about our network to the appendix. The last step in our network preconditioner is constructing a valid preconditioner from the network output. Unfortunately, directly assembling the predicted edge features into a matrix often fails to yield a valid preconditioner because there is no guarantee of its symmetry or positive definiteness. Therefore, we first construct a lower-triangular matrix L(K, c) from the network output and use LL⊤ as our preconditioner; this construction ensures its symmetry and positive definiteness.
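Assuming per-edge predictions indexed by node pairs, the scatter-into-L construction and the symmetry/positive-definiteness guarantee might look like this sketch (function names and the toy graph are ours, not the paper's code):

```python
import numpy as np
import scipy.sparse as sp

# Scatter the network's per-edge predictions into a sparse matrix with the
# lower-triangular sparsity of K, then form M = L L^T. The product is
# symmetric by construction and positive definite whenever L is nonsingular,
# i.e. whenever its diagonal entries are nonzero.
def build_factor(rows, cols, edge_pred, n):
    mask = rows >= cols                     # keep lower-triangular entries only
    return sp.csr_matrix((edge_pred[mask], (rows[mask], cols[mask])),
                         shape=(n, n))

# Hypothetical predictions on a 4-node chain graph (self-loops carry diagonals)
rows = np.array([0, 1, 1, 2, 2, 3, 3])
cols = np.array([0, 0, 1, 1, 2, 2, 3])
pred = np.array([1.4, -0.5, 1.4, -0.5, 1.4, -0.5, 1.4])
L = build_factor(rows, cols, pred, n=4)

M = (L @ L.T).toarray()
assert np.allclose(M, M.T)                  # symmetric
assert np.all(np.linalg.eigvalsh(M) > 0)    # positive definite
```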
Loss function. Designing many classic preconditioners can be cast as a problem of minimizing their discrepancy to the given linear system over a set of easy-to-compute matrices:

min_{P ∈ 𝒫} L(P, K),    (5)

where K is the stiffness matrix defined above, i.e., the system we want to precondition, 𝒫 is the feasible set of preconditioners, and L(·, ·) is a loss function defined on the difference between the two input matrices. A careful choice of 𝒫 ensures the preconditioners remain fast to compute. For example, one can obtain the Jacobi preconditioner by choosing 𝒫 as the set of all diagonal matrices and L as the difference between the diagonals of the two matrix inputs. Following the idea described above, it is now tempting to consider the following loss function for our neural-network preconditioner:

min_θ Σ_{(K_i, f_i, c_i)} ∥L_θ(K_i, c_i) L_θ(K_i, c_i)⊤ − K_i∥²_F,    (6)

where θ denotes the network parameters to be optimized, L_θ(K_i, c_i) is the lower-triangular matrix assembled from the network's output, and ∥·∥²_F represents the squared Frobenius norm. The index i loops over training data tuples (K_i, f_i, c_i). This definition closely resembles the goal of the famous Incomplete Cholesky preconditioner, especially since L shares the same sparsity pattern as the lower-triangular part of K. However, a closer look at the loss function reveals a potential inefficiency in its design:

L := Σ_i ∥LL⊤ − K_i∥²_F    (7)
   = Σ_i ∥(LL⊤ − K_i) I∥²_F    (8)
   = Σ_i Σ_j ∥LL⊤ e_j − K_i e_j∥²₂,    (9)

where e_j stands for the one-hot vector with one at the j-th entry and zeros elsewhere. This derivation shows that this loss encourages a well-rounded preconditioner with uniformly small errors in all directions e_j, regardless of the actual data distribution in the training data (K_i, f_i, c_i). Therefore, we consider a new loss function instead:

L := Σ_i ∥LL⊤ f_i − K_i f_i∥²₂    (10)
   = Σ_i ∥LL⊤ f_i − c_i∥²₂.    (11)
Comparing these two losses, we see that the new loss replaces e_j with f_i from the training data. Therefore, the new loss encourages the preconditioner to resemble K not uniformly in all directions but along directions frequently seen in the training set. Essentially, this new loss trades some generality of the preconditioner for better performance on frequently seen data.

Remark. It is worth revisiting the competing factors in designing high-performance preconditioners and clarifying how our approach handles them. Traditionally, designing a preconditioner strikes a balance between fast computation time and similarity to the underlying system, which often conflict with each other. For example, the Jacobi preconditioner is extremely fast to compute, but it only approximates the diagonal of the system. On the other hand, the Incomplete Cholesky preconditioner approximates the nonzero entries of the system well but requires a much longer computation time. Our neural-network preconditioner resolves the conflict between these factors by inheriting the speed of neural networks and achieving high approximation accuracy by targeting the data distributions learned from the training set.
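The contrast between the two losses can also be checked numerically: with f_i ranging over all one-hot vectors e_j, the data-dependent loss of Eqn (10) reduces exactly to the Frobenius loss of Eqn (7), which is what the derivation in Eqns (7)-(9) states. A toy numpy sketch (names ours; in practice the loss is minimized over network parameters with automatic differentiation):

```python
import numpy as np

def frobenius_loss(L, K):
    # Eqn (7): uniformly small error in every direction e_j
    return np.linalg.norm(L @ L.T - K, "fro") ** 2

def data_loss(L, Ks, fs):
    # Eqn (10): error measured only along directions f_i seen in training
    return sum(np.linalg.norm(L @ L.T @ f - K @ f) ** 2 for K, f in zip(Ks, fs))

rng = np.random.default_rng(0)
K = np.eye(3) + 0.1 * np.ones((3, 3))      # toy SPD system matrix
L = np.tril(rng.standard_normal((3, 3)))   # candidate lower-triangular factor

# The Frobenius loss equals the data loss with f_i ranging over all one-hot
# vectors e_j, i.e. it ignores the actual data distribution.
ej = [np.eye(3)[:, j] for j in range(3)]
assert np.isclose(frobenius_loss(L, K), data_loss(L, [K] * 3, ej))
```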

4. EXPERIMENTS

Our experiments aim to answer the following questions:

1. How does the proposed method compare with end-to-end network models in speed and accuracy?
2. How does the proposed method compare with classical preconditioners in speed and accuracy?
3. Is the data-dependent loss introduced in the paper effective?
4. Does our approach generalize well to unseen inputs?

We introduce the experiment setup in Sec. 4.1, followed by answering the four questions from Sec. 4.2 to Sec. 4.5.

4.1. EXPERIMENT SETUP

Environments. We provide three environments that study the representative linear second-order PDEs: the heat environment studies the heat equation, a parabolic PDE; the wave environment studies the wave equation, a hyperbolic PDE; and the poisson environment studies the Poisson equation, an elliptic PDE. Each environment is associated with a 2D triangle mesh and/or a 3D tetrahedron mesh for discretization purposes (Fig. 1). More details can be found in the appendix.

Baselines. We consider two sets of baselines: neural networks and classic numerical methods. As the neural network baseline, we consider the state-of-the-art MeshGraphNet (MGN) (Pfaff et al., 2020), which takes the input (K, c) and outputs a prediction f. For classic solvers, we compare our approach with PCG solvers using two standard preconditioners (Jacobi and IC), which we call pcg-jacobi and pcg-ic, respectively. We use an all-zero vector as the initial guess for the PCG solvers. More details about the baselines can be found in the appendix.

Our method. The neural network architecture for our preconditioner is based on MeshGraphNet. We modify the last layer to generate the LL⊤ matrix (see Section 3.3) and pass the learned preconditioner into the PCG solver. The only difference between our method and pcg-jacobi and pcg-ic is the preconditioner; therefore, any performance difference can be attributed to preconditioner quality. More details can be found in the appendix.

Evaluation Metrics

We consider all baselines and our method as linear solvers: given a pair of K and c, every method needs to output f with the goal of satisfying Kf = c. The performance of our neural network baseline MGN can be quantified by checking the residual error ∥Kf − c∥. For the PCG baselines and our method, we quantify performance by comparing the total wall-clock time each preconditioner needs to reach the desired accuracy levels. To ensure a fair comparison between all methods, we summarize the performance of PCG solvers not in a single number but with the following values: (a) the time spent on precomputing the preconditioner for the given K and c; (b) the iterations and (c) the total time (including the precomputing time) to reach different precision thresholds (e.g., 1e-12). We then compare the performance between iterative solvers and end-to-end networks (e.g., MGN) by checking the wall-clock time for the iterative solver to achieve the same residual error as MGN.
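A minimal harness for these metrics might look as follows (a sketch with our own names, using SciPy's CG on a toy system rather than the paper's environments; it records the residual norm, the iteration count via a callback, and wall-clock time):

```python
import time
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def evaluate_solver(K, c, M=None):
    """Run (P)CG, counting iterations and wall-clock time; the residual norm
    ||Kf - c|| is the common yardstick shared with end-to-end networks."""
    iters = 0
    def count(_xk):
        nonlocal iters
        iters += 1
    t0 = time.perf_counter()
    f, info = spla.cg(K, c, M=M, callback=count)
    elapsed = time.perf_counter() - t0
    residual = np.linalg.norm(K @ f - c)
    return f, info, iters, elapsed, residual

# Toy SPD system: 1D Laplacian
n = 200
K = sp.diags([-np.ones(n - 1), 2 * np.ones(n), -np.ones(n - 1)], [-1, 0, 1],
             format="csr")
c = np.ones(n)
f, info, iters, elapsed, residual = evaluate_solver(K, c)
```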

4.2. COMPARISONS WITH END-TO-END NETWORK MODELS

In the first experiment, we demonstrate one distinct advantage of our approach over end-to-end network methods: we ensure accurate solutions, while end-to-end networks accumulate errors over time. To show this quantitatively, we solve the wave-2d equation for 100 consecutive time steps using our approach and MGN. Fig. 2 shows that while our approach agrees with the ground truth up to 1e-10 precision, MGN deviates from the ground truth over time; at time step 100, MGN has an error of 517.4%. We can expect MGN to be faster in wall-clock time, as MGN only needs to run network inference once, while we need to run network inference to compute the preconditioner and then run the PCG solver. Please refer to Sec. A.4.1 in the appendix for details. To summarize, MGN is good at estimating solutions rapidly, while our approach has the flexibility of achieving arbitrary solution precision, just like a standard CG solver. Therefore, network methods are suitable for applications where speed dominates accuracy, while our approach is better for applications that require high precision, e.g., scientific computing and engineering design.

4.3. COMPARISON WITH CLASSIC SOLVERS

Next, we compare our approach with the numerical preconditioners pcg-jacobi and pcg-ic. Table 8 summarizes the time cost and iteration counts of PCG solvers using different preconditioners up to convergence thresholds 1e-2, 1e-4, 1e-6, 1e-8, 1e-10, and 1e-12. pcg-jacobi conducts simple diagonal preconditioning, so it has little precomputation overhead; however, the quality of the preconditioner is mediocre, and PCG takes many iterations to converge. pcg-ic takes fewer iterations to converge, but its precomputation process is sequential and therefore expensive. By contrast, our approach features an easily parallelizable precomputation stage (like pcg-jacobi) and produces a preconditioner with a quality close to pcg-ic. Therefore, in terms of total computing time, our approach outperforms both pcg-jacobi and pcg-ic across a wide range of precision thresholds. We also show an additional condition number comparison between our method and previous methods in the appendix.

Table 2 reports the performance of our loss function, Eqn (10), and the naive loss function, Eqn (6), on the heat-2d environment. We observe that the preconditioner trained with Eqn (10) converges in fewer iterations than with Eqn (6). As such, we conclude that enforcing data-distribution dependence during training allows us to achieve better in-distribution inference at test time.

4.5. GENERALIZATION

Physics parameters. First, we consider generalizing the PDEs over their physics parameters, which govern the system K. We use poisson-2d as an example. Our method is trained on a fixed density distribution, and we test its performance on distributions that gradually deviate from the training distribution. Table 3 reports the performance of our approach on these parameters. Since changing physics parameters does not affect the matrix sparsity, the precomputation time remains largely unchanged across different data distributions (see Table 3, Column 1). Even on the challenging out-of-distribution datasets, our approach still maintains reasonable performance, achieving high precision while using less total time than pcg-ic and pcg-jacobi.

Geometry. We also conduct generalization tests on mesh models, i.e., training network models on one mesh and testing their performance on unseen meshes. We demonstrate this on the poisson-3d environment, where we train the network preconditioner on a "connector" mesh (Fig. 1). We then test the trained network model on a new mesh, "armadillo" (Fig. 3). By comparing the bottom and middle rows in Fig. 3, we first see that the end-to-end network method (MGN) struggles to generate accurate solutions when deployed on the unseen mesh (0.5315 error), whereas our approach achieves arbitrary accuracy by construction (1e-9 error threshold here). We also report the time and iteration cost of our approach and classic preconditioners on poisson-3d with the armadillo mesh (Table 4). Even in this challenging case of solving a PDE on an unseen test mesh, our method still converges in less time than pcg-ic and pcg-jacobi across different precision requirements.

5. CONCLUSIONS, LIMITATIONS, AND FUTURE WORK

This work presented NeuralPCG, a hybrid ML-numeric PDE solver. While prior ML PDE solvers often do not have precision guarantees, our approach attains arbitrary precision (up to floating-point error) by construction. Our key observation is that the preconditioner for classic iterative solvers does not require exact precision and is, therefore, an ideal candidate for neural network approximation. NeuralPCG approximates the preconditioner with a graph neural network and embeds this preconditioner into a classic iterative conjugate gradient solver. Compared to end-to-end ML approaches, our approach takes a longer wall-clock time but reduces the error level from 1e-1 to 1e-12. Compared to classic preconditioners, our approach is faster while achieving the same accuracy. Currently, our approach is limited to linear PDEs. Future work may consider extending it to more complex PDEs, such as the elastodynamics equations and the Navier-Stokes equations shown in prior end-to-end ML approaches (Sanchez-Gonzalez et al., 2020). Additionally, we enforce the preconditioner's sparsity to be the lower-triangular sparsity of K. We consider relaxing this constraint and exploring more effective sparsity control (Schäfer et al., 2021) as an exciting future research direction.

Table 4: Our approach generalizes to the unseen armadillo mesh in the poisson-3d environment and outperforms both pcg-ic and pcg-jacobi in total wall-clock time.

A APPENDIX

A.1 ENVIRONMENT DETAILS 

A.2 IMPLEMENTATION DETAILS AND HYPERPARAMETER

In this Section, we describe the implementation details of each method.

A.2.1 BASELINES

MeshGraphNet. To ensure a fair comparison, we re-implement MeshGraphNet in PyTorch, the framework we use for our method, based on the paper and released code. We have also manually tuned its hyperparameters on our dataset to ensure reasonably good performance from MeshGraphNet. We follow the Encoder-Processor-Decoder architecture.

We compare our proposed method with a learning-based preconditioner (Sappl et al., 2019), the NN baseline, in Table 9. Compared to this previous learning-based preconditioner, our method shows stable convergence across various settings with different second-order linear PDEs. The NN baseline (Sappl et al., 2019) was not originally designed to be a generic method that works in different problem domains. For example, the hard-coded diagonal clipping threshold of ϵ = 1e-3 was carefully picked by the authors for the urban water problem, and there is no general principle for picking such a threshold in other problem settings. We can see that the NN baseline (Sappl et al., 2019) works on par with our method on the wave-2d setting, but it does not work in many other settings, e.g., heat-2d and poisson-2d. For the poisson-2d setting, the resulting convergence iteration count is even worse than that of pcg-jacobi. Additionally, using the condition number as the loss is not computationally efficient: computing the condition number of a system requires a full eigendecomposition, which is O(n³), so the computation scales cubically with problem size. Our poisson-3d setting uses a mesh of 23,300 nodes. We trained the NN baseline for three days (72 hours), and it made it through less than one percent of the training set using the same hardware setup. Empirically, we found that using the condition number as the loss is computationally infeasible for problems with more than 10,000 nodes.



Supplementary videos are available on the project webpage: https://sites.google.com/view/neuralpcg



Here, ∂Ω N and ∂Ω D are a partition of the domain's boundary ∂Ω, and N : ∂Ω N → R and D : ∂Ω D → R are two user-specified functions. The Neumann boundary condition defines the behavior of the directional gradient of f along the normal direction n at the boundary, and the Dirichlet boundary condition sets the target value of f at the boundary explicitly.

Figure 1: Environment Overview. Left to right: heat-2d, wave-2d, poisson-2d, and poisson-3d.

3.4. THE FULL NEURALPCG SOLVER

We are now ready to describe the full NeuralPCG solver at test time. For any (K, c) input from a PDE instance after discretization and proper boundary handling, we first send (K, c) to the neural network preconditioner to obtain its LL⊤ factorization. Next, we plug the LL⊤ preconditioner into the standard CG solver and run it until the solution reaches a target precision, concluding the full NeuralPCG algorithm. Note that the neural network model only affects the beginning of the CG algorithm by generating the LL⊤ factorization, after which the CG solver no longer needs to access the network. This framework enables a direct and fair comparison between classic CG solvers and NeuralPCG when evaluating their performance: any performance difference can only come from the difference between preconditioners.
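The test-time pipeline can be sketched as follows, with the lower triangle of K standing in for the network-predicted factor L(K, c) (a hypothetical stand-in; any lower-triangular L with a nonzero diagonal is valid). M = LL⊤ is applied inside CG through two triangular solves:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Toy SPD system (1D Laplacian), standing in for a discretized PDE.
n = 50
K = sp.diags([-np.ones(n - 1), 2.0 * np.ones(n), -np.ones(n - 1)],
             [-1, 0, 1], format="csr")
c = np.ones(n)

L = sp.tril(K, format="csr")           # stand-in for the network's L(K, c)
LT = sp.csr_matrix(L.T)

def apply_M_inverse(r):
    # (L L^T)^{-1} r via forward then backward triangular substitution
    y = spla.spsolve_triangular(L, r, lower=True)
    return spla.spsolve_triangular(LT, y, lower=False)

# Plug M into the standard CG solver as a linear operator, as described above.
M = spla.LinearOperator((n, n), matvec=apply_M_inverse)
f, info = spla.cg(K, c, M=M)
assert info == 0                        # converged to SciPy's default tolerance
```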

Figure 2: MGN error accumulation. Field values vs. time step (wave-2d): the ground-truth field solved using PCG with a 1e-10 convergence threshold (top), the field predicted by MGN (middle), and their difference (bottom), all evaluated at time steps 1, 5, 30, 70, and 100 (left to right).

Figure 3: Generalization to an unseen mesh: training mesh (top), MGN (middle), our method (bottom). Left to right: solution fields at different cross-section heights.

Figure 4 shows examples demonstrating varying boundary conditions. For heat-2d and wave-2d, mesh boundary nodes of varying length and position are selected as Dirichlet boundaries. For the poisson-2d equation, we use the inviscid Euler fluid equations as a demonstration. All solvers are only responsible for solving for the pressure that makes the velocity field incompressible, which is a Poisson equation; the advection and external-force steps are then applied to generate the data visualization. For poisson-2d, two sets of outer-border boundary nodes of varying length and position are selected as the influx and Dirichlet boundaries. The remaining mesh border nodes, including the remaining outer border and all nodes on the inner border, are obstacle boundaries.

Comparison between preconditioners for PCG. We report precompute time, total time (PCG incl. precompute time) for each precision level, and the corresponding PCG iterations (in parentheses). The best value in each category is in bold. ↓: lower is better.

We show the condition number comparison between our method and previous methods in Appendix A.4.2.

4.4 ABLATION ON THE LOSS FUNCTION

To highlight the value of our loss function targeting data distributions on the training set, we train the network with the losses in Eqn. (6) and Eqn. (10), respectively. Compared to Eqn. (6), Eqn. (10) exposes the data distribution of c to the training process.

Wall-clock time and iterations: our method with two different loss functions on heat-2d.

Generalization: physpcg-ics parameters. We test the PCG approaches on testing datasets increasingly deviating from the training data distribution. σ stands for the standard deviation of the training set; test-σ, test-3σ, and test-5σ mean 1, 3, and 5 standard deviations away from the training distribution, respectively.

Environment details for the experiments in section 4.2, section 4.3, section 4.4, and section 4.5.

Wall-clock time of MGN and of our method, with our method stopped at the same residual error produced by MGN.

Condition number comparison between various methods.

A.4.3 ADDITIONAL COMPARISON WITH CLASSICAL PRECONDITIONERS

Here we show additional experimental results on the four equations, heat-2d, wave-2d, poisson-2d, and poisson-3d, under other physical parameters or other meshes.

Additional experimental results comparing with classical preconditioners.

A.4.4 ADDITIONAL COMPARISON WITH LEARNING-BASED PRECONDITIONER

Comparison between preconditioners for PCG. We report precompute time, total time (PCG incl. precompute time) for each precision level, and the corresponding PCG iterations (in parentheses). The best value in each category is in bold. ↓: lower is better.


Classical Methods PCG-IC is implemented in PyTorch and is fully vectorized for optimized computation. This implementation follows Golub & Van Loan (2013). PCG-Jacobi is also implemented in PyTorch and fully vectorized. The Jacobi, or diagonal, preconditioner is described in Axelsson (1996).
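For concreteness, the Jacobi preconditioner only needs the diagonal of K, so applying M^{-1} is a single elementwise division. This is a minimal sketch of that one step (our illustrative code; the function name is ours):

```python
import torch

def jacobi_precondition(K: torch.Tensor, r: torch.Tensor) -> torch.Tensor:
    """Apply the Jacobi (diagonal) preconditioner: M^{-1} r = r / diag(K).

    Fully vectorized: no loops, no factorization, O(n) per application.
    """
    return r / torch.diagonal(K)
```

Its negligible setup cost is why pcg-jacobi serves as a natural low-overhead baseline in our timing tables.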

A.2.2 OUR METHOD

Technical Details and Justification We adopt the diagonal decomposition LDL ⊤ as a form of lower-triangular decomposition of the original system K; it is easy to see that this diagonal decomposition is equivalent to a lower-triangular decomposition. The diagonal decomposition LDL ⊤ has several advantages: like the lower-triangular decomposition LL ⊤, it is easy to invert and guarantees symmetry. Additionally, we fix the diagonal matrix D to the diagonal elements of the original system K. This way, we can enforce the value and gradient range of the lower-triangular matrix L θ ′ to ensure positive definiteness of the learned decomposition.

Implementation Details We follow the same Encoder-Processor-Decoder architecture as MeshGraphNet. A two-layer MLP encoder and a two-layer MLP decoder are used for the heat-2d, wave-2d, and poisson-2d equations, with a processor module composed of 5 message-passing iterations. For the poisson-3d equation, we use a two-layer encoder, a two-layer decoder, and 3 message-passing iterations in the processor module. All MLPs have a hidden dimension of 16.
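A minimal sketch of assembling the preconditioner factor from a raw network prediction (our illustrative code; `raw` stands in for the GNN decoder output, and the assembly details here are a simplified version of the construction described above):

```python
import torch

def assemble_ldl(K: torch.Tensor, raw: torch.Tensor) -> torch.Tensor:
    """Build the factor L*sqrt(D) from a raw lower-triangular prediction.

    D is fixed to diag(K) (positive for an SPD system), and the
    unit-lower-triangular L carries the learned off-diagonal entries,
    so L D L^T is symmetric positive definite by construction.
    """
    n = K.shape[0]
    # Keep only the strictly lower part of the prediction; unit diagonal.
    L = torch.tril(raw, diagonal=-1) + torch.eye(n, dtype=K.dtype)
    sqrt_d = torch.sqrt(torch.diagonal(K))
    # Column-scale by sqrt(D): (L sqrt(D)) (L sqrt(D))^T = L D L^T.
    return L * sqrt_d
```

The returned factor plugs directly into a standard PCG routine in place of an incomplete-Cholesky factor, since (L√D)(L√D)⊤ = LDL⊤.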

A.3 TRAINING SPECIFICATION

We use a workstation equipped with a 64-core AMD 3995W CPU and an NVIDIA RTX A6000 GPU for all our experiments. We adopt an end-to-end training scheme with the loss functions described in Section 3. We train our method for 6 hours each on the heat-2d, wave-2d, and poisson-2d environments, and for 8 hours on poisson-3d. We train the NN baseline for three days (72 hours) for the wave-2d, poisson-2d, and heat-2d experiments.

A.4.1 TIME COMPARISON WITH END-TO-END NEURAL NETWORK

We first report in Table 6 the time cost of the network methods and that of our method achieving the same residual error. Specifically, for each environment, we first run MGN on the test set. We then record the wall-clock time and the residual error for each pair (K, c). Next, we use that residual error as the convergence threshold in NeuralPCG and record its wall-clock time. It is unsurprising to see from the table that MGN is substantially faster than our approach: for a given pair of K and c, MGN only needs to run network inference once, while we need to run network inference to compute the preconditioner and then run the PCG solver. We note that our method usually converges to the MGN residual error within one iteration.

A.4.2 CONDITION NUMBER

We show the condition number comparison in Table 7 .

