IMPLICIT NEURAL SPATIAL REPRESENTATIONS FOR TIME-DEPENDENT PDES

Abstract

Numerically solving partial differential equations (PDEs) often entails spatial and temporal discretizations. Traditional methods (e.g., finite difference, finite element, smoothed-particle hydrodynamics) frequently adopt explicit spatial discretizations, such as grids, meshes, and point clouds, where each degree of freedom corresponds to a location in space. While these explicit spatial correspondences are intuitive to model and understand, these representations are not necessarily optimal for accuracy, memory usage, or adaptivity. In this work, we explore implicit neural representation as an alternative spatial discretization, where spatial information is implicitly stored in the neural network weights. With implicit neural spatial representation, PDE-constrained time-stepping translates into updating neural network weights, which naturally integrates with commonly adopted optimization time integrators. Our approach requires neither training data nor training/testing separation. Our method is the solver itself, just like the classical PDE solver. We validate our approach on a variety of classic PDEs with examples involving large elastic deformations, turbulent fluids, and multi-scale phenomena. While slower to compute than traditional representations, our approach exhibits higher accuracy, lower memory consumption, and dynamically adaptive allocation of degrees of freedom without complex remeshing.

1. INTRODUCTION

Many science and engineering problems can be formulated as spatiotemporal partial differential equations (PDEs),

\[ \mathcal{F}(f, \nabla f, \nabla^2 f, \ldots, \dot{f}, \ddot{f}, \ldots) = 0, \qquad f(x, t) : \Omega \times \mathcal{T} \to \mathbb{R}^d, \tag{1} \]

where $\Omega \subset \mathbb{R}^m$ and $\mathcal{T} \subset \mathbb{R}$ are the spatial and temporal domains, respectively. Examples include the inviscid Navier-Stokes equations for fluid dynamics and the elastodynamics equation for solid mechanics. To numerically solve these PDEs, we often introduce temporal discretizations, $\{t_n\}_{n=0}^{T}$, where $T$ is the number of temporal discretization samples and $\Delta t = t_{n+1} - t_n$ is the time step size. The solution to Equation (1) then becomes a list of spatially dependent vector fields: $\{f^n(x)\}_{n=0}^{T}$. Traditional approaches represent these spatially dependent vector fields using grids, meshes, or point clouds. For example, the grid-based linear finite element method (Hughes, 2012) defines a shape function $N_i$ on each grid node and represents the spatially dependent vector field as $f^n(x) = \sum_{i=1}^{P} f_i^n N_i$, where $P$ is the number of spatial samples. While widely adopted in scientific computing applications, these traditional spatial representations are not without drawbacks:

1. Spatial discretization errors abound in fluid simulations as artificial numerical diffusion (Lantz, 1971), dissipation (Fedkiw et al., 2001), and viscosity (Roache, 1998). These errors also appear in solid simulations as inaccurate collision resolution (Müller et al., 2015) and numerical fractures (Sadeghirad et al., 2011).
2. Memory usage spikes with the number of spatial samples $P$ (Museth, 2013).
3. Adaptive meshing (Narain et al., 2012) and data structures (Setaluri et al., 2014) can reduce memory footprints but are often computationally expensive and challenging to implement.

Figure 1: 1D advection example. A Gaussian-shaped wave initially centered at x = -1.5 moves rightward with a constant velocity of 0.25. From left to right, we show the mean absolute error over time and the solutions at t = 0s, t = 3s, and t = 12s, respectively. The solution from the grid-based finite difference method (green) tends to diffuse over time. PINN (yellow), trained within the temporal range 0 ∼ 3s, fails to generalize to t = 12s. Our solution (blue) best approximates the ground truth (grey) over time. All three representations have the same memory footprint: our approach and PINN (Raissi et al., 2019) both use $\alpha = 2$ hidden layers of width $\beta = 20$, and the finite difference grid resolution is 901.

We alleviate these limitations by exploring implicit neural representation (Park et al., 2019; Chen & Zhang, 2019; Mescheder et al., 2019) as an alternative spatial representation for PDE solvers. Unlike traditional representations that explicitly discretize the spatial vector field via spatial primitives (e.g., points), neural spatial representations implicitly encode the field through neural network weights. In other words, the field is parameterized by a neural network (typically a multilayer perceptron), i.e., $f^n(x) = f_{\theta^n}(x)$ with $\theta^n$ being the network weights. As such, the memory usage for storing the spatial field is independent of the number of spatial samples; instead, it is determined by the number of neural network weights. We show that under the same memory constraint, implicit neural representations indeed achieve higher accuracy than traditional discrete representations. Furthermore, implicit neural representations are adaptive by construction (Xie et al., 2021), allocating the network weights to resolve field details at any spatial location without changing the network architecture. Viewed through the lens of optimization-based time integrators, our PDE solver seeks neural network weights that optimize an incremental potential over time (Kane et al., 2000b).
Our solver does not employ the so-called training/testing split commonly appearing in many neural-network-based PDE approaches (Sanchez-Gonzalez et al., 2020; Li et al., 2020b) . Our approach is the solver itself and does not require training in the machine learning sense. As such, we avoid using the word "training" in the exposition but rather use "optimizing". We employ exactly the same "optimization" integrator formulation as the classical solvers (e.g., finite element method (Bouaziz et al., 2014) ). We compare the proposed solver to grid, mesh, and point cloud representations on time-dependent PDEs from various disciplines, and find that our approach trades wall-clock runtime in favor of three benefits: lower discretization error, lower memory usage, and built-in adaptivity.

2. RELATED WORKS

Many prior works have explored representing continuous vector fields with neural networks. Here we highlight two lines of work: implicit neural representation and physics-informed neural network.

Implicit Neural Representation uses neural networks to parameterize spatially dependent functions. It has successfully captured radiance fields (Mildenhall et al., 2020) and signed distance fields (Park et al., 2019) in computer vision and graphics settings. It has also captured the solutions of strictly spatially dependent PDEs from elastostatics (Zehnder et al., 2021), elliptic PDEs (Chiaramonte et al., 2013), and geometry processing (Yang et al., 2021).

Physics-Informed Neural Network (PINN) (Raissi et al., 2019) has been shown to excellently model forward simulation (Shin et al., 2020; Hennigh et al., 2021; Lu et al., 2021a; Krishnapriyan et al., 2021), inverse design (Raissi et al., 2020; Mao et al., 2020; Mishra & Molinaro, 2022), optimal control (Mowlavi & Nabi, 2021), and uncertainty quantification (Lye et al., 2020). PINN has found success in a wide range of application domains, including turbulence (Hennigh et al., 2021), elasticity (Rao et al., 2020), acoustics (Sitzmann et al., 2020), and topology optimization (Zehnder et al., 2021). Due to its mesh-free nature, PINN can robustly handle high-dimensional PDEs. The recent review by Karniadakis et al. (2021) offers more details.

When using neural networks to represent vector fields, we face an important design choice as to which dimension of the vector field is represented through the network. In the case of spatiotemporal vector fields, we can represent both the spatial and temporal dimensions via neural networks; we can also represent just the spatial dimension or just the temporal dimension with a network. We observe that the spatial variable x is oftentimes bounded, e.g., a fixed geometry with well-defined boundaries.
However, the temporal variable t can be unbounded, e.g., in a virtual reality application where the user interacts with a physical environment indefinitely (Sun et al., 2018) . Modeling the additional temporal dimension also puts extra burden on the network. Motivated by these observations, we opt to treat the spatial and the temporal dimensions differently. In particular, we use the neural network strictly as a spatial representation and do not consider the temporal dimension as an input to the network. We then evolve this spatial representation by updating the network weights θ n (See Figure 2 ), potentially for an indefinite amount of time. Such an approach is different from standard PINN that takes both the spatial dimension x and the temporal dimension t as an input to the network (Raissi et al., 2019; Karniadakis et al., 2021) which cannot resolve PDE solution outside a pre-defined temporal range (Kim et al., 2021) (See Figure 1 ). Du & Zaki (2021); Bruna et al. (2022) ; Krishnapriyan et al. (2021) also explore evolution of neural network weights over time, with the goal of resolving PINN's limited time range as well as solving high-dimensional problems that classical solvers often suffer. Our work differs from these works by focusing on low-dimensional settings (1D-3D) that heavily rely on classical solvers (e.g., finite element method). Our primary goal is to understand if we only replace classical solver's spatial representation with a neural network, while keeping the rest unchanged (e.g., time integrator, boundary condition), what tradeoffs do we get? Optimization Time Integrators. Since our approach only replaces the spatial representations of traditional numerical solvers with neural networks while keeping the rest of the solver intact, it is compatible with any classical time integration schemes (e.g., implicit Euler). 
In particular, we formulate time integration as an energy minimization problem (Radovitzky & Ortiz, 1999; Kane et al., 2000b; Marsden & West, 2001; Kharevych et al., 2006). These integrators find wide applications in PDE solvers on traditional representations, such as grids (Batty et al., 2007), tetrahedral meshes (Bouaziz et al., 2014), and point clouds (Gast et al., 2015). In the case of neural spatial representations, time integration translates into optimizing the neural network weights at every time step.

Machine Learning for PDEs is an emerging field with exciting techniques, such as graph neural networks (Sanchez-Gonzalez et al., 2020), neural operators (Li et al., 2020b; c), and DeepONet (Lu et al., 2019). These techniques usually train on a dataset and are then validated on a test dataset. However, because they are learned from data, these methods' time-stepping schemes neither enforce PDE constraints at test time (Pfaff et al., 2020) nor generalize to scenarios (e.g., initial conditions, boundary conditions) drastically different from the training cases (Wang & Perdikaris, 2021). As a major point of departure, our approach does not employ any training data. There is no training/inference separation in our approach. Our method is the solver itself, just like the classical solvers (e.g., FEM). As such, we inherit the classical solvers' generalizability and explicit PDE constraints. See Table 1 for a comparison of these techniques. Relatedly, Wandel et al. (2020) also propose a data-free approach but still employ a training/testing split.

3. METHOD: TIME-STEPPING ON NEURAL SPATIAL REPRESENTATIONS

Our goal is to solve time-dependent PDEs on neural-network-based spatial representations. In Section 3.1, we first discuss representing spatial vector fields with neural networks. Afterward, we describe our time-stepping technique that evolves from one neural spatial representation to another.

3.1. NEURAL NETWORKS AS SPATIAL REPRESENTATIONS

We parameterize each of the time-discretized spatial vector fields with a neural network: f n = f θ n , where θ n are the neural network weights at time t n . Specifically, the field quantity at an arbitrary spatial location x ∈ Ω can be queried via network inference f θ n (x). Traditional representations explicitly discretize the spatial vector field using primitives such as points, tetrahedra, or voxels. These primitives explicitly correspond to spatial locations due to their compactly supported basis functions (Hughes, 2012) . By contrast, neural spatial representations implicitly encode the vector field via neural network weights. These weights do not directly correspond to specific spatial locations. Instead, each weight affects the vector field globally. Such global support is also an attribute of spectral methods (Canuto et al., 2007a; b) . Compared to spectral methods, our approach does not need to know the required complexity ahead of time in order to determine the ideal basis functions (Xie et al., 2021) . Our neural representation automatically optimizes its parameters to where field detail is present. Whereas memory consumption of traditional explicit representations scales poorly with the number of spatial samples, memory consumption for implicit neural representations is independent of the number of spatial samples (Xie et al., 2021) . Rather, memory use is determined by the number of neural network weights. Network Architecture Following the implicit neural representation literature, we adopt a multilayer perceptron (MLP) architecture with SIREN activation function for its accuracy and quick convergence speed advantages (Sitzmann et al., 2020) . Each MLP has a total of α hidden layers, each layer of width β. The specific choice of these hyper-parameters will be described in Section 4. Spatial Gradients Traditional spatial representations (e.g., the finite element method) compute spatial gradients via basis functions. 
Higher-order gradients require higher-order basis functions. By contrast, a neural spatial representation is $C^\infty$ by construction. We evaluate its gradients via computation-graph-based auto-differentiation with respect to the input (not the weights).
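The claim that memory use is set by the number of network weights can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the layer layout (one input layer, then α hidden layers of size β × β, then one output layer, all with biases) is an assumption on our part rather than something stated in the text. Under that assumption, the 1D advection network (α = 2, β = 20, scalar input and output) has 901 parameters, which coincides with the grid resolution of 901 quoted for the equal-memory comparison. The 4-byte (float32) storage size is likewise an assumption.

```python
def mlp_param_count(in_dim: int, out_dim: int, alpha: int, beta: int) -> int:
    """Count weights and biases of an MLP.

    ASSUMED layout (not stated in the text): an input layer (in_dim -> beta),
    alpha hidden layers (beta -> beta), and an output layer (beta -> out_dim),
    each with a bias vector.
    """
    first = in_dim * beta + beta
    hidden = alpha * (beta * beta + beta)
    last = beta * out_dim + out_dim
    return first + hidden + last


def footprint_kib(n_params: int, bytes_per_param: int = 4) -> float:
    """Storage in KiB, assuming float32 parameters (4 bytes each)."""
    return n_params * bytes_per_param / 1024.0


# 1D advection network: scalar input x, scalar output u.
advection_params = mlp_param_count(in_dim=1, out_dim=1, alpha=2, beta=20)
print(advection_params)  # 901

# 3D network with alpha = 3, beta = 128 (3D input, 3D output).
statue_params = mlp_param_count(in_dim=3, out_dim=3, alpha=3, beta=128)
print(round(footprint_kib(statue_params)))  # 197
```

Note that the count depends only on the input dimension, the output dimension, α, and β; it does not change with the number of spatial samples used during optimization.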

3.2. TEMPORAL EVOLUTION

Given previous-time spatial vector fields $\{f^k(x)\}_{k=0}^{n}$, optimization-based time integrators compute the next time-step ($t_{n+1}$) vector field by optimizing

\[ f^{n+1} = \operatorname*{argmin}_{f^{n+1}} \sum_{x \in \mathcal{M} \subset \Omega} \mathcal{I}(\Delta t, \{f^k\}_{k=0}^{n+1}, \{\nabla f^k\}_{k=0}^{n+1}, \{\nabla^2 f^k\}_{k=0}^{n+1}, \ldots). \tag{2} \]

Traditional time integrators, whether explicit or implicit, can be expressed in optimization form (Kharevych et al., 2006). Furthermore, this optimization formulation applies to any spatial representation and has been explored thoroughly for traditional discretizations (Batty et al., 2007; Bouaziz et al., 2014; Gast et al., 2015), where it is defined over a finite number of spatial integration samples $\mathcal{M} := \{x_j \in \Omega \mid 1 \le j \le |\mathcal{M}|\}$, e.g., grids or meshes. Applying this formulation to a neural spatial representation, we optimize for

\[ \theta^{n+1} = \operatorname*{argmin}_{\theta^{n+1}} \sum_{x \in \mathcal{M} \subset \Omega} \mathcal{I}(\Delta t, \{f_{\theta^k}\}_{k=0}^{n+1}, \{\nabla f_{\theta^k}\}_{k=0}^{n+1}, \{\nabla^2 f_{\theta^k}\}_{k=0}^{n+1}, \ldots), \tag{3} \]

where $\{\theta^k\}_{k=0}^{n}$ are the (fixed, not variable) neural network weights from previous time steps. Figure 2 illustrates our time integration process. The particular choice of the objective function $\mathcal{I}$ depends on the PDE of interest. In all the examples presented in this work, we solve this time-integration optimization problem via Adam (Kingma & Ba, 2014), a first-order stochastic gradient descent method.

Spatial Sampling. Explicit spatial representations (e.g., tetrahedral meshes) are often tied to a particular spatial sampling; remeshing is sometimes possible, but can also have drawbacks, especially in higher dimensions (Alliez et al., 2002; Narain et al., 2012). By contrast, implicit spatial representations allow for arbitrary spatial sampling by construction (Equation (3)). Following Sitzmann et al. (2020), we dynamically sample $\mathcal{M}$ during optimization. For every gradient descent iteration in every time step, we use a stochastic sample set $\mathcal{M}$ from the spatial domain $\Omega$; $\mathcal{M}$ corresponds to the "mini-batch" in stochastic gradient descent, with batch size $|\mathcal{M}|$.
By directly drawing samples from the entire spatial domain $\Omega$, our approach is reminiscent of mesh-free Monte Carlo methods (Sawhney & Crane, 2020).

Boundary Condition. PDEs are typically accompanied by spatial (e.g., Dirichlet or Neumann) boundary conditions, which we formulate as additional penalty terms in the objective of Equation (3),

\[ \theta^{n+1} = \operatorname*{argmin}_{\theta^{n+1}} \sum_{x \in \mathcal{M} \subset \Omega} \mathcal{I}(\Delta t, \{f_{\theta^k}\}_{k=0}^{n+1}, \{\nabla f_{\theta^k}\}_{k=0}^{n+1}, \{\nabla^2 f_{\theta^k}\}_{k=0}^{n+1}, \ldots) + \lambda \sum_{x_b \in \mathcal{M}_b \subset \partial\Omega} C(f_{\theta^{n+1}}, \nabla f_{\theta^{n+1}}, \nabla^2 f_{\theta^{n+1}}, \ldots), \tag{4} \]

where $\lambda$ is the weighting factor and $\partial\Omega$ is the boundary of the spatial domain. The particular choice of the boundary constraint function $C$ depends on the problem of interest.

Initial Condition

The neural network is initialized using the given initial condition, i.e., the field value at time t = 0, by optimizing

\[ \theta^{0} = \operatorname*{argmin}_{\theta^{0}} \sum_{x \in \mathcal{M} \subset \Omega} \| f_{\theta^0}(x) - \hat{f}^0(x) \|_2^2, \]

where $\hat{f}^0$ is the given initial condition. Similar to Equation (3), we solve this optimization problem using Adam (Kingma & Ba, 2014) and stochastically sample $\mathcal{M}$ at each gradient descent iteration.

4. EXPERIMENTS

In this section, we evaluate our method on three classic time-dependent PDEs: the advection equation, the inviscid Navier-Stokes equations, and the elastodynamics equation. For each problem, we first discuss the continuous PDE and the specific objective function $\mathcal{I}$ for temporal evolution (recall Equation (3)). Then we demonstrate the advantages of our approach by comparing with baselines using discrete spatial representations (i.e., a grid, tetrahedral mesh, or point cloud). We refer readers to Appendices B and C for other implementation details (e.g., initial and boundary conditions) and additional results. The temporal evolutions of the PDEs are best illustrated by the supplementary video.

4.1. ADVECTION EQUATION

The advection equation reads

\[ \frac{\partial u}{\partial t} + (a \cdot \nabla) u = 0, \]

where $a$ is the advection velocity, and the vector field of interest is the advected quantity $f = u$. It is well known that traditional spatial representations, such as grid-based finite differences, exhibit numerical dissipation for the advection equation (Courant et al., 1952; Selle et al., 2008).

Time Integration

We adopt the same time integration scheme in both the traditional representation and ours. Choosing the energy-preserving midpoint method (Mullen et al., 2009) yields the time integration operator

\[ \mathcal{I} = \left\| \frac{u^{n+1}(x) - u^{n}(x)}{\Delta t} + (a \cdot \nabla) \left( \frac{u^{n+1}(x) + u^{n}(x)}{2} \right) \right\|_2^2 . \]

Results. Figure 1 compares our results with those of grid-based finite differences and PINN (Raissi et al., 2019), subject to equal memory usage of the three methods. A Gaussian-shaped wave moves with constant velocity $a = 0.25$. Our approach uses $\alpha = 2$ hidden layers of width $\beta = 20$, and the finite difference grid resolution is 901. We set PINN to use the same network architecture (with SIREN activation) as ours. For ours and grid-based methods, we set $\Delta t = 0.05$. PINN does not require $\Delta t$ but needs a pre-specified temporal range for training. For this temporal range, we use [0, 3]. As shown in Figure 1, the solution from the grid-based method diffuses over time due to its spatial discretization. While PINN can accurately capture the result up to t = 3s, it fails to produce meaningful solutions beyond its trained temporal range (see t = 12s) (Kim et al., 2021). By contrast, our solution does not suffer from numerical dissipation and agrees well with the ground truth at all frames (see the error plot in Figure 1).
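To make the midpoint operator concrete, the following sketch evaluates it in 1D for hand-written fields instead of a neural network. It is an illustration under stated assumptions, not the implementation used for Figure 1: the spatial derivative is supplied analytically (the method itself uses auto-differentiation), the samples are a fixed uniform set rather than a stochastic mini-batch, the objective is averaged rather than summed (same minimizer), and all function names are hypothetical. The constants are the ones quoted in the text and appendix.

```python
import math

A = 0.25               # advection velocity a (from the text)
DT = 0.05              # time step size (from the text)
MU, SIGMA = -1.5, 0.1  # Gaussian center and width (from the appendix)


def gaussian(x, t):
    """Exact solution: the initial Gaussian translated by A * t."""
    z = (x - A * t - MU) / SIGMA
    return math.exp(-0.5 * z * z)


def gaussian_dx(x, t):
    """Analytic spatial derivative of `gaussian` (stand-in for autodiff)."""
    z = (x - A * t - MU) / SIGMA
    return -(z / SIGMA) * math.exp(-0.5 * z * z)


def midpoint_objective(u_next, du_next, u_prev, du_prev, samples):
    """Mean over samples of the squared midpoint residual
    (u_next - u_prev) / DT + A * d/dx[(u_next + u_prev) / 2]."""
    total = 0.0
    for x in samples:
        time_term = (u_next(x) - u_prev(x)) / DT
        space_term = A * 0.5 * (du_next(x) + du_prev(x))
        total += (time_term + space_term) ** 2
    return total / len(samples)


# Fixed uniform samples on the domain [-2, 2] (the method samples randomly).
samples = [-2.0 + 4.0 * j / 400 for j in range(401)]

u_prev = lambda x: gaussian(x, 0.0)
du_prev = lambda x: gaussian_dx(x, 0.0)
u_next = lambda x: gaussian(x, DT)
du_next = lambda x: gaussian_dx(x, DT)

# Candidate 1: the exactly translated wave. Candidate 2: a wave that did not move.
exact_step = midpoint_objective(u_next, du_next, u_prev, du_prev, samples)
frozen_step = midpoint_objective(u_prev, du_prev, u_prev, du_prev, samples)
print(exact_step, frozen_step)
```

The translated wave yields an objective several orders of magnitude below that of the frozen wave, which is the behavior an optimizer over network weights would exploit when searching for the next-step field.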

4.2. INVISCID NAVIER-STOKES EQUATIONS

In the incompressible and inviscid Navier-Stokes equations

\[ \rho_f \left( \frac{\partial u}{\partial t} + u \cdot \nabla u \right) = -\nabla p + \rho_f g, \qquad \nabla \cdot u = 0, \]

the vector field of interest is the fluid velocity field $f = u$; $p$ is the pressure, $g$ is the external force, and $\rho_f$ is the fluid density. In our experiments, we consider $\rho_f = 1$ and $g = 0$. The pressure field $p$ is represented with another MLP network.

Time Integration

We apply the Chorin-style operator splitting scheme (Chorin, 1968; Stam, 1999) to both the neural spatial and finite-difference grid representations. The scheme involves three sequential steps: advection (adv), pressure projection (pro), and velocity correction (cor).

Advection uses a semi-Lagrangian method (Staniforth & Côté, 1991), encoded by the operator

\[ \mathcal{I}_{adv} = \| u^{n+1}_{adv}(x) - u^{n}(x_{backtrack}) \|_2^2, \]

whose optimization yields the advected velocity $u^{n+1}_{adv}$. The backtracked location is given by $x_{backtrack} = x - \Delta t\, u^{n}(x)$. While traditional spatial representations compute the backtracked velocity using interpolation (e.g., linear basis functions), our approach requires no interpolation, only direct evaluation via network inference at the location $x_{backtrack}$.

Pressure projection is encapsulated by the operator

\[ \mathcal{I}_{pro} = \| \nabla^2 p^{n+1}(x) - \nabla \cdot u^{n+1}_{adv}(x) \|_2^2 . \]

Plugging $\mathcal{I}_{pro}$ into the optimization solver, we obtain the pressure $p^{n+1}$ that enforces incompressibility. Note that the MLP that represents the velocity field $u^{n+1}_{adv}$ is kept fixed in this step.

Velocity correction is formulated by the operator

\[ \mathcal{I}_{cor} = \| u^{n+1}(x) - (u^{n+1}_{adv}(x) - \nabla p^{n+1}(x)) \|_2^2, \]

which subtracts the pressure gradient from the advected velocity, yielding the incompressible velocity $u^{n+1}$.
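The advection step can be sketched without any grid interpolation, since the previous velocity field only needs to be evaluated at the backtracked location. The snippet below is a minimal illustration with hypothetical function names: a closed-form Taylor-Green field stands in for the previous-step network $u^n$, and the objective is averaged over the samples rather than summed. It is not the authors' implementation.

```python
import math


def taylor_green(x, y):
    """Closed-form velocity field used here as a stand-in for u^n."""
    return (math.sin(x) * math.cos(y), -math.cos(x) * math.sin(y))


def backtrack(velocity, x, y, dt):
    """x_backtrack = x - dt * u^n(x)."""
    ux, uy = velocity(x, y)
    return (x - dt * ux, y - dt * uy)


def advection_target(velocity, x, y, dt):
    """u^n evaluated directly at the backtracked location (no interpolation)."""
    xb, yb = backtrack(velocity, x, y, dt)
    return velocity(xb, yb)


def advection_objective(candidate, velocity, samples, dt):
    """Mean squared mismatch between a candidate advected field and the target."""
    total = 0.0
    for x, y in samples:
        cx, cy = candidate(x, y)
        tx, ty = advection_target(velocity, x, y, dt)
        total += (cx - tx) ** 2 + (cy - ty) ** 2
    return total / len(samples)


dt = 0.05
xb, yb = backtrack(taylor_green, math.pi / 2, 0.0, dt)
target = advection_target(taylor_green, math.pi / 2, 0.0, dt)
print(xb, yb, target)
```

At the point (π/2, 0) the velocity is (1, 0), so the backtracked location is (π/2 - 0.05, 0) and the target velocity is (cos 0.05, 0). With a network in place of `taylor_green`, the call to `velocity(xb, yb)` would simply be a forward pass.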

Results

We first test our method on the 2D Taylor-Green vortex with zero viscosity (Taylor & Green, 1937; Brachet et al., 1983). The closed-form analytical solution is given by $u(x, t) = (\sin x \cos y, -\cos x \sin y)$ for $x \in [0, 2\pi] \times [0, 2\pi]$. To compare under the same memory usage (for storing the velocity field), we use $\alpha = 3$ hidden layers of width $\beta = 32$ for our MLP and set the grid resolution to 48 for the grid-based projection method. We set $\Delta t = 0.05$ and execute both methods for 100 timesteps. In Figure 3, we show the mean squared error of the solved velocity field over time. This example demonstrates that our method preserves a stationary solution well. Compared to the grid-based method, our method has less diffusion and achieves higher accuracy.

For discrete grid representations, efficiently capturing multi-scale details usually requires difficult-to-implement adaptive data structures (Setaluri et al., 2014). Instead, implicit neural representations are adaptive by construction (Xie et al., 2021) and enable us to capture more details under the same memory storage. We set up an example where the initial velocity field is composed of two Taylor-Green vortices of different scales (see Figure 8 for illustration). We compare our approach with PINN and the grid-based projection method under the same memory constraint for storing the spatial representations. Specifically, our approach and PINN use an MLP with $\alpha = 3$ hidden layers of width $\beta = 32$, and the grid-based projection method uses resolution 48. We execute our approach and the grid-based method for 50 timesteps with $\Delta t = 0.05$, and train PINN with the same temporal range of 2.5 seconds. Using the solved velocity field, we advect a density field to visualize the amount of fine detail captured by the different representations. As shown in Figure 4, we are able to capture the fine details of the smaller vortex and best approximate the reference solution.
The grid-based method (resolution 48) suffers from severe dissipation and fails to capture the vorticity. PINN is unable to correctly capture this two-vortices field and we found its training loss remains high (∼ 1e -3 ) after convergence. This is in agreement with previous findings (Chuang & Barba, 2022) that suggest PINN approaches have difficulty solving inviscid Navier-Stokes equations for non-trivial examples involving turbulence.

4.3. ELASTODYNAMICS EQUATION

In the third experiment, we study the elastodynamics equations

\[ \rho_0 \ddot{\phi} = \nabla \cdot P(F) + \rho_0 b \tag{12} \]

that describe the motions of deformable solids (Gonzalez & Stuart, 2008). The vector field of interest is the deformation map $f = \phi$. Here $\rho_0$ is the density in the reference space, $P$ is the first Piola-Kirchhoff stress, $F = \nabla\phi$ is the deformation gradient, $\dot{\phi}$ and $\ddot{\phi}$ are the velocity and acceleration, and $b$ is the body force. We assume a hyperelastic constitutive law, i.e., $P = \partial\Psi / \partial F$, where $\Psi$ is the energy density function. In particular, we assume a variant of the stable Neo-Hookean energy (Smith et al., 2018)

\[ \Psi = \frac{\lambda}{2} \mathrm{tr}^2(\Sigma - I) + \mu (\det(F) - 1)^2, \]

where $\lambda$ and $\mu$ are the first and second Lamé parameters, $\Sigma$ are the singular values of the deformation gradient $F$, and $\det(F)$ is the determinant of the deformation gradient $F$. When $\mu = 0$, the elastic energy recovers the As-Rigid-As-Possible energy (Sorkine & Alexa, 2007).

Our method handles large deformations matching the mesh-based finite element method (FEM), while the point-cloud-based material point method (MPM) suffers from incorrect numerical fracture.

Time Integration. We apply the implicit Euler time integration scheme (Gast et al., 2015; Kane et al., 2000a) to the (1) tetrahedral finite element, (2) material point method, and (3) our neural representation, using the operator

\[ \mathcal{I} = \underbrace{\tfrac{1}{2} \rho_0 (\dot{\phi}^{n+1} - \dot{\phi}^{n})^\top (\dot{\phi}^{n+1} - \dot{\phi}^{n})}_{\text{kinematic energy}} + \underbrace{\Psi(\phi^{n+1})}_{\text{elastic energy}} - \underbrace{\rho_0 b^\top \phi^{n+1}}_{\text{external force potential}}, \]

where $\dot{\phi}^{n+1} = (\phi^{n+1} - \phi^{n}) / \Delta t$, $\rho_0$ is the density, and $b$ is the external force. We can also incorporate boundary conditions, e.g., positional and contact constraints, by introducing additional energy terms (Bouaziz et al., 2014; Li et al., 2020a) (see Appendix B.4).
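As a small worked example of the energy density above, the sketch below evaluates Ψ for a 2×2 deformation gradient. Two assumptions are made and should be read as such: the term tr²(Σ - I) is interpreted as the sum of squared deviations of the singular values from one, which is the reading under which µ = 0 gives an As-Rigid-As-Possible-style energy; and the singular values are taken unsigned, so inverted configurations are not treated specially. The function names are hypothetical.

```python
import math


def singular_values_2x2(F):
    """Singular values of a 2x2 matrix from its Frobenius norm and determinant.

    Uses sigma1^2 + sigma2^2 = ||F||_F^2 and sigma1 * sigma2 = |det F|.
    """
    (a, b), (c, d) = F
    s = a * a + b * b + c * c + d * d
    det = a * d - b * c
    disc = math.sqrt(max(s * s - 4.0 * det * det, 0.0))  # clamp round-off
    sigma1 = math.sqrt((s + disc) / 2.0)
    sigma2 = math.sqrt(max((s - disc) / 2.0, 0.0))
    return sigma1, sigma2


def energy_density(F, lam, mu):
    """Psi = lam/2 * sum_i (sigma_i - 1)^2 + mu * (det F - 1)^2.

    ASSUMPTION: tr^2(Sigma - I) is read as sum_i (sigma_i - 1)^2.
    """
    (a, b), (c, d) = F
    det = a * d - b * c
    sigma1, sigma2 = singular_values_2x2(F)
    stretch_term = (sigma1 - 1.0) ** 2 + (sigma2 - 1.0) ** 2
    return 0.5 * lam * stretch_term + mu * (det - 1.0) ** 2


identity = ((1.0, 0.0), (0.0, 1.0))
angle = math.pi / 6
rotation = ((math.cos(angle), -math.sin(angle)),
            (math.sin(angle), math.cos(angle)))
uniform_stretch = ((2.0, 0.0), (0.0, 2.0))

print(energy_density(identity, 1.0, 1.0))         # 0.0
print(energy_density(rotation, 1.0, 1.0))         # ~0, rigid motions cost nothing
print(energy_density(uniform_stretch, 1.0, 1.0))  # 10.0
```

For the uniform stretch, both singular values equal 2 and det(F) = 4, so Ψ = 0.5 · 2 + 9 = 10 with λ = µ = 1. Rigid rotations leave Ψ at zero up to round-off, as expected of an elastic energy.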

Results

We first compare our implicit neural representation to the traditional tetrahedral mesh representation (Finite Element Method, FEM (Hughes, 2012; Reddy, 2019)) and the point cloud representation (Material Point Method, MPM (Sulsky et al., 1995; Jiang et al., 2016)). We use $\alpha = 3$ hidden layers of width $\beta = 68$ for our MLP, which takes the same memory as the FEM mesh (0.8K vertices, 1.5K faces) and MPM point cloud (1.7K points). As shown in Figure 5 and Figure 11, our method is capable of handling the large elastic deformations and matches the result of the traditional mesh-based method (FEM), while the point-cloud-based method (MPM) suffers from incorrect numerical fracture due to its meshless nature. To avoid these fractures, meshless methods require sophisticated modifications of the underlying kernel and basis functions (Gray et al., 2001; Su et al., 2022).

By virtue of its implicit nature, our representation is able to represent more intricate details than the traditional explicit representations under the same memory usage. In Figure 6, we show that our implicit neural representation allows the deformed square to gracefully fit the boundary of the sphere during the non-trivial collision. In contrast, the traditional mesh-based representation struggles to produce a smooth result due to its insufficient mesh resolution. To alleviate such artifacts, the traditional mesh-based representation needs to either increase its resolution, thereby incurring a higher memory cost, or conduct complex remeshing (Narain et al., 2012). In Figure 7, our implicit neural representation allows for more complex dynamics and finer geometric details compared to the traditional tetrahedral mesh representation. Note that we adopt the same collision detection and handling strategy for both the neural representation and the mesh-based representation (FEM).
Specifically, we use a spring-like penalty force and the corresponding energy to move the collided point out of its collision surface, similar to McAdams et al. (2011) and Xian et al. (2019). Since our approach and FEM share the same time integration scheme and the same collision handling method, the differences reported in Figure 6 and Figure 7 stem strictly from the underlying spatial representations. These advantages extend to other complex 3D simulations. Figure 2 and Figure 13 depict a cow and a statue deforming as they collide with the ground, exhibiting complex geometry and rich contact-induced deformations.

5. DISCUSSION AND CONCLUSION

In this work, we explore implicit neural representations as spatial representations for numerically modeling time-dependent PDEs. This representation naturally integrates with widely adopted optimization-based time integrators. PDE solvers with neural spatial representations offer improved accuracy, reduced memory, and automatic adaptivity compared to traditional explicit representations such as a mesh, grid, or point cloud. While offering important benefits, neural-spatial-representation-based PDE time-stepping requires longer wall-clock computation time than existing methods (see also Table 1 by Zehnder et al. (2021) and Section 7 by Yang et al. (2021)). Optimizing neural network weights takes longer than optimizing grid values even if there are fewer neural network weights than grid nodes. For instance, for the bunny example (Figure 7), our neural network optimization takes around 30 minutes per timestep while the corresponding FEM simulation takes less than 1 minute. Future work therefore lies in exploring advanced training techniques that reduce training time (Liu et al., 2020; Martel et al., 2021; Takikawa et al., 2021). In particular, Müller et al. (2022) offer a promising direction where they show that implicit neural representation training time can be reduced from hours to seconds via advanced data structures and optimized implementation. Our work demonstrates the effectiveness of neural spatial representations in solving time-dependent PDEs and observes empirical convergence under refinement (see Figure 12). Future work should consider theoretical analysis (Mishra & Molinaro, 2022) of convergence and stability. More challenging physical phenomena, such as turbulence (Wilcox et al., 1998), intricate contacts (Johnson & Johnson, 1987), and thin shells (Pfaff et al., 2020), are also important future directions. Currently, our work enforces "soft" boundary conditions.
Enforcing "hard" boundary conditions on a neural architecture is another exciting direction (Lu et al., 2021b) .

A COMPARISON OF DIFFERENT PDE SOLVERS

In the table below, we compare other ML-PDE solvers including MeshGraphNet (Pfaff et al., 2020), GraphNetworkSim (Sanchez-Gonzalez et al., 2020), DeepONet (Lu et al., 2019), and Fourier Neural Operator (Li et al., 2020b).

We solve our time-integration optimization problem (Equation (3)) with the Adam optimizer (Kingma & Ba, 2014). For all examples in our experiments, we set an initial learning rate $lr_0$ and reduce it by a factor of 0.1 if the loss value does not decrease for $iter_p$ iterations. We stop the optimization process when the learning rate is smaller than $lr_{min}$ or when it reaches a maximum of $iter_{max}$ iterations. Specific values of these hyper-parameters are described for each example below. We implement our method using the PyTorch library and perform our experiments on an NVIDIA GeForce RTX 3090 GPU.

For the 1D advection example shown in Figure 1, the spatial domain is $\Omega = [-2, 2]$. We consider the Dirichlet boundary condition, i.e., the advected quantity at the boundaries equals zero. Hence we set the boundary constraint term in Equation (4) as $C = \|u^{n+1}(x)\|_2^2$, with the weighting factor $\lambda = 1$. The initial condition for this example is

\[ \hat{u}^0(x) = e^{-\frac{(x - \mu)^2}{2\sigma^2}}, \]

with $\mu = -1.5$ and $\sigma = 0.1$. We set the optimization hyper-parameters $lr_0 = 1e{-}4$, $lr_{min} = 1e{-}8$, $iter_p = 500$ and $iter_{max} = 20000$. For each gradient descent iteration, we randomly sample $|\mathcal{M}| = 5000$ points within the spatial domain $[-2, 2]$. For this example, our method takes ∼ 80s to compute per timestep, while the grid-based method (using the same memory) takes ∼ 4e-3s.

[Figure residue: left panel, field of Equation (19); right panel, density field (Equation (20)).]
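The learning-rate and stopping rule described above can be sketched independently of any deep learning framework. The function below replays the rule on a recorded sequence of loss values; it is a hypothetical helper written for illustration, and details the text leaves open (for example, that "does not decrease" is measured against the best loss seen so far, and that the stall counter resets after each reduction) are assumptions.

```python
def replay_schedule(losses, lr0, lr_min, iter_p, iter_max, factor=0.1):
    """Replay the plateau-based schedule on a sequence of loss values.

    Returns (iterations_run, final_learning_rate).
    ASSUMPTIONS: improvement is measured against the best loss so far, and
    the stall counter resets whenever the learning rate is reduced.
    """
    lr = lr0
    best = float("inf")
    stalled = 0
    iterations = 0
    for loss in losses:
        # Stop when the learning rate falls below lr_min or iter_max is reached.
        if iterations >= iter_max or lr < lr_min:
            break
        iterations += 1
        if loss < best:
            best = loss
            stalled = 0
        else:
            stalled += 1
            if stalled >= iter_p:
                lr *= factor  # reduce by a factor of 0.1 by default
                stalled = 0
    return iterations, lr


# A loss that never improves after the first iteration triggers one
# reduction every iter_p iterations until lr drops below lr_min.
flat_run = replay_schedule([1.0] * 100, lr0=1e-4, lr_min=5e-8,
                           iter_p=5, iter_max=1000)

# A steadily decreasing loss is only stopped by the iteration cap.
improving = [1.0 / (k + 1) for k in range(50)]
capped_run = replay_schedule(improving, lr0=1e-4, lr_min=5e-8,
                             iter_p=5, iter_max=30)
print(flat_run, capped_run)
```

In the flat case the learning rate is cut at iterations 6, 11, 16, and 21, after which it is below the threshold and the loop stops. In the improving case the learning rate is never reduced and the run ends at the iteration cap.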

B.3 INVISCID NAVIER-STOKES EQUATIONS

For our 2D fluid examples, the spatial domain is $\Omega = [-1, 1] \times [-1, 1]$. We consider solid boundary conditions, i.e., the fluid cannot go through the boundaries. Recall that we adopt the operator splitting scheme. Therefore, the boundary constraint terms for the three sequential steps are

\[ C_{adv} = \| u^{n+1}_{adv,\perp}(x) \|_2^2, \qquad C_{pro} = \| \nabla_{\perp} p^{n+1}(x) \|_2^2, \qquad C_{cor} = \| u^{n+1}_{\perp}(x) \|_2^2, \tag{17} \]

where $\perp$ indicates the direction perpendicular to the boundary. The weighting factor is $\lambda = 1$.

2D Taylor-Green vortex. The standard 2D Taylor-Green vortex is originally defined on the domain $[0, 2\pi] \times [0, 2\pi]$. We translate and scale the domain to $[-1, 1] \times [-1, 1]$ such that the input range fits our MLP with the SIREN activation (Sitzmann et al., 2020). Therefore, the initial condition for the velocity field becomes

\[ \hat{u}^0(x) = \left( \frac{1}{\pi} \sin[\pi(x + 1)] \cos[\pi(y + 1)],\ -\frac{1}{\pi} \cos[\pi(x + 1)] \sin[\pi(y + 1)] \right). \]

After the simulation, we convert it back to the domain $[0, 2\pi] \times [0, 2\pi]$ for evaluation and comparison. We set the optimization hyper-parameters $lr_0 = 1e{-}5$, $lr_{min} = 1e{-}8$, $iter_p = 500$ and $iter_{max} = 20000$. The size of the sample set is $|\mathcal{M}| = 256^2$. For this example, our method takes ∼ 10min to compute per timestep, while the grid-based method (using the same memory) takes ∼ 0.03s.

Two vortices of different scale. For the example shown in Figure 4, the initial condition for the velocity field is

\[ \hat{u}^0(x) = \begin{cases} (\sin[2\pi(x + 1)] \cos[2\pi(y + 1)],\ -\cos[2\pi(x + 1)] \sin[2\pi(y + 1)]) & x \in [-1, 0]^2 \\ (\sin[8\pi(x - \tfrac{7}{4})] \cos[8\pi(y - \tfrac{7}{4})],\ -\cos[8\pi(x - \tfrac{7}{4})] \sin[8\pi(y - \tfrac{7}{4})]) & x \in [\tfrac{7}{4}, 1]^2 \\ (0, 0) & \text{otherwise.} \end{cases} \tag{19} \]

The density field that we advect is initialized as

\[ d^0(x) = \begin{cases} 1 & \|2x + 1\| \le 0.5 \ \text{or} \ \|8x + 7\| \le 0.5 \\ 0 & \text{otherwise.} \end{cases} \tag{20} \]

... $\dot{\phi}^0(x) = (0, 0)$ (2D), $\dot{\phi}^0(x) = (0, 0, 0)$ (3D).

The boundary constraint for elasticity examples involves positional constraints or collision constraints.
Positional constraints, or Dirichlet boundary conditions, can be realized by defining the position of the constraint set $\partial\Omega$ as the desired goal positions $\phi_{\partial\Omega}$:

$$I_{pos} = \|\phi^{n+1}_{\partial\Omega} - \phi_{\partial\Omega}\|_2^2.$$

Collision constraints can be handled by adding unilateral constraints dynamically and viewing the collision penalty force as an external force. Specifically, for a colliding point $q_c$, we first find the closest surface point $b_c$ with normal $n_c$, and define our spring-like collision penalty force as

$$f_{col} = k_{col}\left((b_c - q_c)^\top n_c\right) n_c,$$

where $k_{col}$ is the coefficient of the collision penalty force. The corresponding collision energy can be defined as the work exerted by the collision force:

$$I_{col} = \rho_0 f_{col}^\top \phi^{n+1}.$$

Experiment Setup. For all 2D comparisons under the same memory usage, we use $\alpha = 3$ hidden layers of width $\beta = 68$ with the SIREN activation function (Sitzmann et al., 2020) for our MLP, which takes the same memory (57 KB) as the FEM mesh (0.8K vertices, 1.5K faces) and the MPM point cloud (1.7K points) in use. We initialize the 2D deformation field of the network to be zero using $|\mathcal{M}| = 1000^2$ uniform and random samples. Then we train the network using $|\mathcal{M}| = 100^2$ uniform and random samples at each training iteration. We use Bartels (Levin, 2020) and Taichi (Hu et al., 2019) to perform the FEM and MPM simulations, respectively. We run our FEM and MPM comparisons on the CPU of a MacBook Pro with an Apple M2 processor and 24 GB of RAM.

Figure 12: Sampling convergence test. Quasi-static elasticity simulation using different numbers of samples. We train the implicit neural representation with different numbers of samples and visualize the result using $|\mathcal{M}| = 50^3$ samples (left). We further compute the error with respect to the $|\mathcal{M}| = 50^3$ result for each number of samples (right). As the number of training samples increases, the deformation field converges to the result trained at the highest resolution.
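The collision penalty force $f_{col}$ and energy $I_{col}$ defined above can be evaluated for a single colliding point as in the following sketch. It is a plain-Python illustration under stated assumptions: the helper names `collision_force` and `collision_energy` are hypothetical, points are plain tuples, and the closest surface point $b_c$ and normal $n_c$ are assumed to be given rather than found by a closest-point query.

```python
def collision_force(q_c, b_c, n_c, k_col):
    """Spring-like penalty force f_col = k_col * ((b_c - q_c)^T n_c) * n_c."""
    # Signed offset of the colliding point from the surface, measured along the normal
    offset = sum((b - q) * n for b, q, n in zip(b_c, q_c, n_c))
    return tuple(k_col * offset * n for n in n_c)


def collision_energy(f_col, phi_next, rho0):
    """Collision energy I_col = rho0 * f_col^T phi^{n+1} for one point."""
    return rho0 * sum(f * p for f, p in zip(f_col, phi_next))


# Example: a point 0.1 below a flat floor at y = 0 whose normal points up
force = collision_force(q_c=(0.0, -0.1), b_c=(0.0, 0.0), n_c=(0.0, 1.0), k_col=100.0)
energy = collision_energy(force, phi_next=(0.0, -0.1), rho0=1.0)
```

In this example the force points along the normal, pushing the point back toward the surface, and its magnitude grows linearly with the offset along the normal.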
For the 3D comparisons under the same memory usage, the bunny example (Figure 7) uses $\alpha = 3$ hidden layers of width $\beta = 66$ with the SIREN activation function for our MLP, which takes the same memory (53 KB) as the FEM mesh (0.5K vertices, 1.5K tetrahedra) in use. For the statue example (Figure 13), we use $\alpha = 3$ hidden layers of width $\beta = 128$ with the SIREN activation function for our MLP, which takes the same memory (197 KB) as the FEM mesh (2.0K vertices, 7.0K tetrahedra) in use. We initialize the 3D deformation field of the network to be zero using $|\mathcal{M}| = 100^3$ uniform and random samples. Then we train the network using $|\mathcal{M}| = 20^3$ uniform and random samples at each training iteration. Here, for simplicity, we use the mesh vertices as the uniform samples. We report all the parameters and the experiment setup in Table 2. In addition, we set the hyper-parameters $iter_p$ = 800 and $lr_{min}$ = 1e-8 for all elasticity examples.

To sample shapes with irregular geometry, for simplicity we use a triangle or tetrahedral mesh and sample within it. An ideal alternative would be to adopt an implicit representation of the surface and perform rejection sampling based on it. For rendering, we sample a sufficient number of points from the undeformed shape and evaluate the trained model at time t at the sample positions to predict their deformation. In the 3D cases, we sample only the surface of the shape. We then render the shape as a dense point cloud.
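The rejection-sampling alternative mentioned above can be sketched as follows. This is only an illustration, not what the experiments use (they sample within a mesh): the shape is a hypothetical disc described by a signed distance function `circle_sdf`, and the names `rejection_sample`, `bounds`, and `max_tries` are assumptions introduced for the example.

```python
import math
import random


def circle_sdf(p, center=(0.0, 0.0), radius=0.5):
    """Signed distance to a disc: negative inside, positive outside (hypothetical shape)."""
    return math.hypot(p[0] - center[0], p[1] - center[1]) - radius


def rejection_sample(sdf, bounds, n_samples, seed=0, max_tries=1_000_000):
    """Draw points uniformly in the bounding box and keep those with sdf(p) <= 0."""
    rng = random.Random(seed)
    samples = []
    tries = 0
    while len(samples) < n_samples and tries < max_tries:
        tries += 1
        p = tuple(rng.uniform(lo, hi) for lo, hi in bounds)
        if sdf(p) <= 0.0:   # point lies inside (or on) the implicit surface
            samples.append(p)
    return samples


# Usage: 200 interior samples of the disc, drawn from the box [-1, 1]^2
points = rejection_sample(circle_sdf, [(-1.0, 1.0), (-1.0, 1.0)], 200)
```

The accepted points are uniformly distributed inside the shape, and the acceptance rate equals the ratio of the shape's area (or volume) to that of the bounding box, so a tight bounding box keeps the number of rejected draws small.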

C.1 ELASTODYNAMICS EQUATION

We validate the physical plausibility of our method using a small 2D patch test (Figure 9) and a 3D twisting example (Figure 10). We show that our implicit neural representation exhibits volume-preserving behavior under stretching, compression, and twisting.



Figure 2: Time integration. We represent the field of interest using a neural network f θ n , whose weights θ n are updated at each timestep via an optimization problem (Equation (3)). In this case, the spatial domain Ω is the interior of the initial object and the represented field f is the deformation map. The governing PDE is the elastodynamics equation (see Section 4.3).

Figure 3: 2D Taylor-Green vortex simulation. Left: mean squared error of the velocity field for 100 timesteps. Right: velocity magnitude of solutions from the ground truth, ours, and the grid-based method at timestep n = 100. Under the same memory usage (for storing the spatial representation), our solution has a significantly smaller error than the grid-based method.

4.1 ADVECTION EQUATION

Consider the classic 1D advection equation,

$$\frac{\partial u}{\partial t} + (a \cdot \nabla) u = 0, \tag{6}$$
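For a constant velocity $a$ and away from the boundaries, Equation (6) has the closed-form solution $u(x, t) = u^0(x - at)$: the initial profile is transported without changing shape. The sketch below evaluates this reference solution for the Gaussian wave of Figure 1, using the values $a = 0.25$, $\mu = -1.5$ and $\sigma = 0.1$ stated for that example. The function names are hypothetical, and the boundary condition is ignored, which is a reasonable approximation only while the wave stays well inside $[-2, 2]$.

```python
import math


def gaussian_initial(x, mu=-1.5, sigma=0.1):
    """Initial Gaussian profile u^0(x) = exp(-(x - mu)^2 / (2 sigma^2))."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def exact_advection(x, t, a=0.25):
    """Reference solution u(x, t) = u^0(x - a t) for constant velocity a (boundaries ignored)."""
    return gaussian_initial(x - a * t)


# The peak starts at x = -1.5 and is centered at x = -1.5 + 0.25 * t at time t
peak_at_3s = exact_advection(-0.75, 3.0)
peak_at_12s = exact_advection(1.5, 12.0)
```

A numerical solver that diffuses the wave will show a peak value below 1 at later times, which is one way to read the error curves in Figure 1.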

Figure 4: Two vortices of different scales. We show the advected density field after 2.5 seconds from the reference (top-left), our method (top-right), the grid-based method of resolution 48 (bottom-left) and PINN (bottom-right). The reference is obtained by running the high-resolution grid-based method (we use resolution 1024). Our MLP ($\alpha = 3$, $\beta = 32$) has the same memory footprint as a grid of resolution 48. PINN uses the same MLP network as ours. Under the same memory constraint, our approach suffers from less dissipation, captures more vorticity, and best resembles the reference solution, whose grid takes ∼450× the memory of our network. See Figure 8 for the initial condition of this example.

Figure 6: When an elastic square collides with a circle, the finite element mesh (top) conforms poorly at the interface compared to the neural representation (bottom), for equal memory usage.

Figure 8: Initial condition for the example in Figure 4. Left: velocity field (Equation (19)). Right: density field (Equation (20)).

Figure 8 visually illustrates the above initial conditions. After the simulation, we convert it back to the domain $[0, 2\pi] \times [0, 2\pi]$ for evaluation and comparison. We set the optimization hyper-parameters $lr_0$ = 1e-5, $lr_{min}$ = 1e-8, $iter_p$ = 500 and $iter_{max}$ = 20000. The size of the sample set is $|\mathcal{M}| = 128^2$. For this example, our method takes ∼10 min to compute per timestep.
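The change of variables between $[-1, 1]^2$ and $[0, 2\pi]^2$ used in the fluid examples can be sketched as follows, using the rescaled Taylor-Green initial condition from Appendix B.3. The function names are hypothetical. The factor $\pi$ applied to velocities when converting back is an assumption that follows from the chain rule for the map $X = \pi(x + 1)$; the text only states that results are converted back to the original domain.

```python
import math


def to_periodic_domain(x):
    """Map a coordinate from [-1, 1] to [0, 2*pi]."""
    return math.pi * (x + 1.0)


def to_unit_domain(X):
    """Map a coordinate from [0, 2*pi] back to [-1, 1]."""
    return X / math.pi - 1.0


def taylor_green_scaled(x, y):
    """Initial velocity on [-1, 1]^2: (sin X cos Y, -cos X sin Y) / pi."""
    X, Y = to_periodic_domain(x), to_periodic_domain(y)
    return (math.sin(X) * math.cos(Y) / math.pi,
            -math.cos(X) * math.sin(Y) / math.pi)


def velocity_to_periodic(u):
    """Rescale a velocity from the [-1, 1]^2 domain to the [0, 2*pi]^2 domain."""
    return tuple(math.pi * c for c in u)


# At (x, y) = (-0.5, -1) we have (X, Y) = (pi/2, 0), so the standard field is (1, 0)
u_unit = taylor_green_scaled(-0.5, -1.0)
u_periodic = velocity_to_periodic(u_unit)
```

Dividing the velocity by $\pi$ on the way in and multiplying by $\pi$ on the way out keeps the physical time scale of the vortex unchanged while the network only ever sees inputs in $[-1, 1]^2$.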

Figure 11: Error of the elastic tension test. We visualize the L2 position error e of our result and the low-resolution FEM result (0.8K vertices) in Figure 5 with respect to the high-resolution FEM ground truth (3.2K vertices). Under the same memory footprint (for storing the spatial representation), our method ($\|e\|_\infty$ = 8.89e-2) is closer to the high-resolution ground truth than the mesh-based method ($\|e\|_\infty$ = 1.99e-1).

and point clouds, scale poorly with spatial resolution. Adaptive discretizations can reduce memory, but generating them is expensive. By contrast, neural representations are adaptive by construction and can use their representation capacity at arbitrary locations of interest without increasing memory or altering data structures. We refer to the recent review by Xie et al. (2021) for additional context.

Comparison with other ML-PDE solvers.


We demonstrate qualitative and quantitative convergence of our method as the number of training samples increases. In Figure 12, we compare the quasi-static stretching results, visualized using $|\mathcal{M}| = 50^3$ uniform samples, for different numbers of training samples ($|\mathcal{M}| = 5^3, 10^3, 20^3, 30^3, 40^3, 50^3$), and report the error with respect to the high-resolution trained result (both trained and visualized with $|\mathcal{M}| = 50^3$).

Finally, we provide an additional elasticity example involving complex contact-induced deformations in Figure 13. Our implicit neural representation maintains more intricate geometric details than the traditional tetrahedral mesh representation under the same memory usage.

D QUANTITATIVE RESULTS

We present the quantitative results (error, runtime and memory usage) for each of our tested examples in separate tables. 

