CROM: CONTINUOUS REDUCED-ORDER MODELING OF PDES USING IMPLICIT NEURAL REPRESENTATIONS

Abstract

The long runtime of high-fidelity partial differential equation (PDE) solvers makes them unsuitable for time-critical applications. We propose to accelerate PDE solvers using reduced-order modeling (ROM). Whereas prior ROM approaches reduce the dimensionality of discretized vector fields, our continuous reduced-order modeling (CROM) approach builds a low-dimensional embedding of the continuous vector fields themselves, not their discretization. We represent this reduced manifold using continuously differentiable neural fields, which may train on any and all available numerical solutions of the continuous system, even when they are obtained using diverse methods or discretizations. We validate our approach on an extensive range of PDEs with training data from voxel grids, meshes, and point clouds. Compared to prior discretization-dependent ROM methods, such as linear subspace proper orthogonal decomposition (POD) and nonlinear manifold neural-network-based autoencoders, CROM features higher accuracy, lower memory consumption, dynamically adaptive resolutions, and applicability to any discretization. For equal latent space dimension, CROM exhibits 79× and 49× better accuracy, and 39× and 132× smaller memory footprint, than POD and autoencoder methods, respectively. Experiments demonstrate 109× and 89× wall-clock speedups over unreduced models on CPUs and GPUs, respectively. Videos and codes are available on the project page: https://crom-pde.github.io.

1. INTRODUCTION

Many scientific and engineering models are posed as partial differential equations (PDEs) of the form

F(f, ∇f, ∇²f, …, ḟ, f̈, …) = 0,  f(x, t) : Ω × T → R^d,  (1)

subject to initial and boundary conditions. Here f is a spatiotemporally dependent, multidimensional continuous vector field, such as temperature, velocity, or displacement; ∇ and (˙) denote spatial and temporal gradients; Ω ⊂ R^m and T ⊂ R are the spatial and temporal domains, respectively. We may solve for f by discretizing in space,

f(x, t) ≈ f_P(x, t) = Σ_{i=1}^{P} a_i(t) N_i(x),

transforming the continuous spatial representation into a (P·d)-dimensional vector whose coefficients a_i(t) : T → R^d, together with the corresponding basis functions N_i(x) : Ω → R (e.g., polynomial basis, Fourier basis), approximate the continuous solution. For instance, if N_i is the linear finite element basis, the coefficients a_i(t) = f(x_i, t) are field values at spatial samples x_i (Hughes, 2012). After introducing temporal samples {t^n}_{n=0}^{T}, we temporally evolve the solution by solving for the P unknowns {a_i(t^{n+1})} given the previous state {a_i(t^n)}. Unfortunately, when P is large, the processing and memory costs of these full-order solves become intractable. To alleviate this computational burden, prior model reduction techniques (Berkooz et al., 1993; Willcox & Peraire, 2002; Benner et al., 2015) construct a manifold-parameterization function g_P : R^r → R^{Pd}, with r ≪ Pd, such that every low-dimensional latent space vector q(t) ∈ R^r maps to a discrete field g_P(q) → (a_1, …, a_P)^T. For instance, for linear finite elements (Barbič & James, 2005), g_P(q) → (f(x_1, t), …, f(x_P, t))^T, as depicted in Figure 1a.

Figure 1: Model reduction solves PDEs via temporal evolution of the low-dimensional latent space vector q(t). (a) Prior work assumes that the low-dimensional representation g_P is built for the already-discretized vector field; (b) our approach constructs the manifold-parameterization function g directly for the continuous vector field itself. In this case, the vector field f represents the twisting material governed by the elastodynamics equation.

ROM saves computation because it requires evolving only r ≪ Pd latent space variables. Since existing ROM approaches apply to already-discretized fields, model training and PDE solving are tied to the dimension and discretization type of the training data, causing key limitations:

Discretization dependence. If we alter the training simulation resolution (P) or the discretization type (e.g., meshes to point clouds), we must also alter the architecture and number of parameters.

Memory scaling. The memory footprint grows with the discretization resolution P.

Fixed discretization. We cannot dynamically adapt the spatial resolution P, discretization type, or basis functions N_i during latent-space-PDE solves, e.g., via dynamic remeshing (Peraire et al., 1987).

Altogether, these problems arise because the architecture of g_P(q) is tied to the discretization (a_1, …, a_P)^T.

Introducing a discretization-independent architecture. In an alternative point of departure, we train a manifold-parameterization function g(x, q) ≈ f(x, t) to approximate the continuous field itself, not its discretization (see Figure 1b). Note that the domain and co-domain of g are continuous: they do not depend on the choice of discretization(s) used at any stage of the process, i.e., neither during preparation of training data nor during latent-space-PDE solving. In this sense, the manifold-parameterization architecture is discretization independent.
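To make the full-order setting concrete, the following minimal sketch (our illustration, not the paper's code) implements the discretization f_P(x, t) = Σ_i a_i(t) N_i(x) with linear finite-element "hat" basis functions, so that the coefficients a_i are simply nodal values:

```python
# Sketch (our illustration): discretizing a continuous field as
# f_P(x) = sum_i a_i * N_i(x) with linear finite-element "hat" basis
# functions on a uniform 1D grid, so a_i = f(x_i) at the nodes.
import numpy as np

def hat_basis(x, nodes, i):
    """Linear hat function N_i centered at nodes[i] (uniform spacing assumed)."""
    h = nodes[1] - nodes[0]
    return np.clip(1.0 - np.abs(x - nodes[i]) / h, 0.0, None)

def interpolate(x, nodes, coeffs):
    """Evaluate f_P(x) = sum_i coeffs[i] * N_i(x) at any x in the domain."""
    return sum(coeffs[i] * hat_basis(x, nodes, i) for i in range(len(nodes)))

nodes = np.linspace(0.0, 1.0, 11)      # P = 11 spatial samples
coeffs = np.sin(np.pi * nodes)         # a_i = f(x_i): nodal field values
# At a node, the interpolant reproduces the nodal value exactly:
assert np.isclose(interpolate(nodes[3], nodes, coeffs), coeffs[3])
```

Evolving such a system full-order means updating all P coefficients per time step, which is exactly the cost that model reduction avoids.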
In our implementation, g is embodied as an implicit neural representation (Park et al., 2019; Chen & Zhang, 2019; Mescheder et al., 2019) , also known as a neural field, yielding a smooth and analytically-differentiable manifold-parameterization. This representation's memory footprint depends on the complexity of fields produced by the PDE, not the discretization resolution. After training, we evolve the latent variables, as governed by the PDE, for previously-unexplored parameters. Unlike approaches that discard the PDE after training, we evaluate the original PDE at a small number of domain points at every time integration step. We validate our approach on classic PDEs with discretized data from voxels, meshes, and point clouds. In comparison to the full-order model, our approach reduces the number of spatial degrees of freedom, memory, and computational cost. In comparison to prior linear and nonlinear discretization-dependent model reduction methods, our method exhibits higher accuracy and consumes less memory. To highlight another benefit of being discretization-agnostic, we demonstrate an elasticity simulation that readily adapts mesh resolution.
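A minimal sketch of what a manifold-parameterization network g(x, q) might look like; the architecture, width, and weights below are our illustrative assumptions (the paper's actual networks are described in its Appendix D):

```python
# Sketch (our assumptions): g(x, q) as a small MLP taking a spatial
# coordinate x and a latent vector q. Weights are random placeholders.
import numpy as np

rng = np.random.default_rng(0)

class NeuralField:
    def __init__(self, dim_x=1, dim_q=3, width=32, dim_out=1):
        self.W1 = rng.normal(size=(width, dim_x + dim_q)) * 0.1
        self.b1 = np.zeros(width)
        self.W2 = rng.normal(size=(dim_out, width)) * 0.1
        self.b2 = np.zeros(dim_out)

    def __call__(self, x, q):
        z = np.concatenate([np.atleast_1d(x), q])  # input is [x; q]
        h = np.tanh(self.W1 @ z + self.b1)         # smooth activation, so g is
        return self.W2 @ h + self.b2               # differentiable in x and q

g = NeuralField()
q = np.zeros(3)
# g can be queried at ANY spatial location, independent of any training grid:
vals = [g(x, q) for x in (0.0, 0.123, 0.9)]
assert all(v.shape == (1,) for v in vals)
```

The key property is that the query location x is an input, so the network's size and memory are decoupled from any discretization resolution.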

2. RELATED WORK

Reduced-Order Modeling for PDEs. Early works on identifying a low-dimensional latent space focused on linear methods (Berkooz et al., 1993; Holmes et al., 2012), e.g., proper orthogonal decomposition (POD) or principal component analysis (PCA). Recent nonlinear manifolds (Fulton et al., 2019; Lee & Carlberg, 2020), often constructed via autoencoder neural networks, have been shown to significantly outperform their linear counterparts on slowly decaying Kolmogorov n-width problems (Peherstorfer, 2022). Most of these prior works focus exclusively on building a latent space for the already-discretized vector fields. Chen et al. (2023), Pan et al. (2023), and our work attempt to construct the latent space for the continuous vector fields themselves. However, the treatment of Chen et al. (2023) specializes in the material point method (discretization) and elasticity (PDE); our general treatment is both discretization and PDE agnostic. Likewise, Pan et al. (2023) train a discretization-agnostic latent space for PDE data; we further solve the PDEs in the reduced space via rapid latent space traversal. Specifically, we evolve the latent space in a PDE-constrained manner, while Yin et al. (2023) take a data-driven approach. Additional references to ROM are listed in Appendix A.

Implicit neural representations use (fully-connected) neural networks to represent arbitrary vector fields. While the Euclidean spatial coordinates always form part of the input to the network, it is also common for a latent space vector to complement the rest of the input. Different latent space vectors correspond to different states of the continuous vector field (see Figure 1b), e.g., different geometries (Park et al., 2019; Chen & Zhang, 2019; Mescheder et al., 2019) or different radiance fields (Mildenhall et al., 2020). A key contribution of our work is nonlinearly traversing the latent space of neural representations under an explicit PDE constraint.

Machine learning (ML) for PDEs.
Physics-informed neural networks (PINNs) (Raissi et al., 2019; Sitzmann et al., 2020b) demonstrate that PDEs can be accurately solved via neural representations. Notably, PINNs enable prediction and discovery from incomplete models and incomplete data (Karniadakis et al., 2021). However, the degrees of freedom involved in these approaches are still P·d, and the underlying gradient-descent-based solver is often computationally more expensive than traditional solvers (see Table 1 of Zehnder et al. (2021)). By contrast, our goal is to build a computationally more efficient solution that solves for only r degrees of freedom (r ≪ P·d). In fact, our approach can be viewed as an extension of PINNs to model reduction: setting the latent space vector q(t) in our formulation (see Figure 1b) to the time variable t recovers the exact PINN formulation. Beyond PINNs, Sanchez-Gonzalez et al. (2020) show that graph neural network (GNN) architectures are also capable of learning PDEs. Yet, like PINNs, GNNs do not offer dimension reduction.

3. METHOD: OVERVIEW AND MANIFOLD CONSTRUCTION

Overview. Our goal is to efficiently obtain the solution of Equation (1). We begin by constructing a low-dimensional embedding and the corresponding manifold-parameterization function (see below), after which we solve PDEs by time-integrating the dynamics of the manifold's latent space vector (see Section 4). As we demonstrate in examples from various scientific disciplines (see Section 5), this general method is applicable regardless of the discretization of the training data (e.g., voxel grids, meshes, point clouds) or the discretization deemed most useful for evaluating the gradients (e.g., physical forces) when solving PDEs on the constructed manifold.

Manifold Construction. As depicted in Figure 1b, we seek a manifold-parameterization function g(x, q) satisfying

g(x, q(t; µ)) ≈ f(x, t; µ), ∀x ∈ Ω, ∀t ∈ T, ∀µ ∈ D,  (2)

i.e., one that well approximates the continuous field f(x, t; µ) throughout the spatiotemporal domain Ω × T and for a workable range of problem parameters µ ∈ D. For ease of exposition, we omit the dependencies of q and f on the problem parameters µ. Here D is an arbitrary parameter space (e.g., material properties, external forces, user settings). For scenarios that do not feature trivial parameterizations, e.g., external force via crowd-sourcing (Barbič & James, 2005), D can also be implicitly defined. What distinguishes our approach from typical model reduction is that g takes the position x ∈ Ω as an input. Thus, unlike prior approaches that infer only discrete coefficients, our approach infers field values at arbitrary domain positions x. We parameterize g with a neural network g_{θ_g} whose weights θ_g satisfy the minimization problem

min_{θ_g} Σ_{i=1}^{P} Σ_{n=0}^{T} Σ_{µ∈D_train} ‖g_{θ_g}(x_i, q(t^n)) − f(x_i, t^n)‖²₂,  (3)
Figure 2: Constructing the manifold-parameterization function as a neural network trained via supervised learning. We pass each snapshot (time step) from the training dataset into an encoder to obtain a latent space vector q. We then concatenate q with the spatial coordinates and pass the result into the manifold-parameterization function, with the goal of reconstructing f for each individual spatial sample. The same q is shared among all spatial samples in this time step.

Here D_train ⊂ D is the training set, and q(t^n) is the latent space vector shared among all spatial samples. This objective aims to reproduce all the field values present in the training data, generated via full-order PDE solutions. Notably, our approach imposes no limit on the discretization strategy of the PDE solver. For instance, this framework is applicable to training data from both finite difference and finite element methods, as well as from both voxel grids and meshes. There are commonly two approaches to defining the latent space vector q: the auto-decoder approach (Park et al., 2019), which trains q along with g_{θ_g}, and the encoder approach (Chen & Zhang, 2019; Mescheder et al., 2019), which trains a separate network to output q. While both approaches work for our application, we adopt the latter. The encoder network e_{θ_e} with weights θ_e takes an input vector constructed by concatenating all the discrete degrees of freedom from the training data and outputs a latent space vector (see Figure 2):

e_{θ_e}(f⃗(t)) = q(t), where f⃗(t) = (f(x_1, t), …, f(x_i, t), …, f(x_P, t))^T.

We emphasize that this discretization-dependent encoder (Xie et al., 2021) is a tool for training the smoothly varying latent space and for determining the initial latent space vector from the discretized training data. Otherwise, the implicit neural representation g_{θ_g} remains a discretization-agnostic architecture.
Adding the encoder, Equation (3) now becomes

min_{θ_g, θ_e} Σ_{i=1}^{P} Σ_{n=0}^{T} Σ_{µ∈D_train} ‖g_{θ_g}(x_i, e_{θ_e}(f⃗(t^n))) − f(x_i, t^n)‖²₂.  (4)

Figure 2 illustrates the training pipeline. Please refer to Appendix D for network and training details and Appendix F for hyperparameter selection.
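The encoder-decoder training objective above can be sketched as follows. The encoder (a linear map) and decoder (a polynomial in x) are toy stand-ins of our own, not the paper's architectures, but the loss structure is the same: one q shared by all spatial samples of a snapshot, and a pointwise reconstruction error summed over samples and snapshots.

```python
# Sketch (toy stand-ins): the supervised reconstruction loss over snapshots.
import numpy as np

rng = np.random.default_rng(1)
P, r = 16, 3                              # spatial samples, latent dimension
xs = np.linspace(0.0, 1.0, P)

def encoder(f_vec, E):
    return E @ f_vec                      # e_{theta_e}: R^P -> R^r (toy: linear)

def decoder(x, q):
    # stand-in for g_{theta_g}(x, q): queryable at any x, parameterized by q
    return q[0] + q[1] * x + q[2] * x**2

def loss(E, snapshots):
    # sum over snapshots (time steps) and spatial samples
    total = 0.0
    for f_vec in snapshots:
        q = encoder(f_vec, E)             # one q shared by all samples of this step
        total += sum((decoder(x, q) - f) ** 2 for x, f in zip(xs, f_vec))
    return total

E = rng.normal(size=(r, P)) * 0.01
snapshots = [np.sin(np.pi * xs * (1 + 0.1 * n)) for n in range(5)]
assert loss(E, snapshots) >= 0.0
```

A real implementation would minimize this loss over both networks' weights with an autodiff library; here only the objective itself is shown.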

4. METHOD: LATENT SPACE DYNAMICS

After the manifold is constructed, we compute the low-dimensional latent space dynamics (q^n → q^{n+1}) in three steps (see Figure 3 and Appendix Q): (1) network inference, (2) PDE time-stepping, and (3) network inversion. The neural network strictly serves as a kinematic spatial representation in the maps from and to the latent space (steps 1 and 3, respectively). The time integration (step 2) itself uses the original PDE, not a neural network approximation thereof. The ROM community has demonstrated that these three steps can yield strong long-time stability even on stiff and chaotic dynamical systems (Carlberg et al., 2013; 2017). Appendices O and P demonstrate smooth latent space trajectories and empirical stability analysis using this approach. Common to all three steps are "integration samples", a finite set of spatial domain points M := {y_j ∈ Ω | 1 ≤ j ≤ |M|} chosen at the user's discretion (see Section 4.4). These samples need not coincide with the previously mentioned full-order finite element discretization samples {x_i}_{i=1}^{P}.

Figure 3: Latent space dynamics: (1) network inference of f^n, ∇f^n, …, ḟ^n, … for all y ∈ M; (2) PDE time-stepping with boundary conditions, yielding f^{n+1} for all y ∈ M; (3) network inversion, mapping q^n to q^{n+1}.
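The three-step update can be sketched end-to-end with toy ingredients of our own choosing: a decoder that is linear in q (g(y, q) = Φq) and simple exponential decay ḟ = −f standing in for the PDE right-hand side, so that Step 3's least-squares projection is exactly solvable.

```python
# Sketch (toy decoder and toy dynamics): one latent-space time step.
import numpy as np

ys = np.array([0.1, 0.4, 0.7, 0.9])       # integration samples M, |M| = 4 >= r
Phi = np.stack([np.ones_like(ys), ys, ys**2], axis=1)  # g(y, q) = Phi @ q

def step(q, dt=0.01):
    f = Phi @ q                            # Step 1: network inference at y in M
    f_new = f + dt * (-f)                  # Step 2: explicit PDE time-stepping
    q_new, *_ = np.linalg.lstsq(Phi, f_new, rcond=None)  # Step 3: inversion
    return q_new

q = np.array([1.0, -0.5, 0.2])
q1 = step(q)
# With a linear decoder, the projection is exact: q_new = (1 - dt) * q.
assert np.allclose(q1, 0.99 * q)
```

For a nonlinear neural decoder, Step 3 becomes the Gauss-Newton solve of Section 4.3 rather than a single linear least-squares solve, but the loop structure is the same.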

4.1. STEP 1: NETWORK INFERENCE

We first gather all the full-space spatiotemporal information (∀y ∈ M) necessary for PDE time integration. The function value f itself is evaluated via inference of the neural network: f(y, t^n) = g_{θ_g}(y, q^n). The spatial and temporal gradients are computed either by differentiating the network, ∇f(y, t^n) = ∇_y g_{θ_g} and ḟ(y, t^n) = (∂g_{θ_g}/∂q) q̇^n, respectively, or by numerical approximation. Higher-order gradients are obtained in a similar manner. Further details on gradient computation are listed in Appendix E.
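When analytic or automatic differentiation of g is unavailable, the numerical-approximation route mentioned above can be sketched with central finite differences (the function g below is a smooth toy stand-in, not a trained network):

```python
# Sketch (toy g): finite-difference approximation of the network's gradients.
import numpy as np

def g(x, q):
    return np.sin(q[0] * x) + q[1]        # placeholder smooth field

def grad_x(g, x, q, eps=1e-5):
    # central difference for the spatial gradient (nabla_x g)
    return (g(x + eps, q) - g(x - eps, q)) / (2 * eps)

def dg_dq(g, x, q, eps=1e-5):
    # central difference for each latent direction (dg/dq_k)
    out = np.zeros_like(q)
    for k in range(len(q)):
        dq = np.zeros_like(q); dq[k] = eps
        out[k] = (g(x, q + dq) - g(x, q - dq)) / (2 * eps)
    return out

q, x = np.array([2.0, 0.5]), 0.3
assert np.isclose(grad_x(g, x, q), q[0] * np.cos(q[0] * x), atol=1e-6)
# temporal derivative via the chain rule: f_dot = (dg/dq) @ q_dot
q_dot = np.array([0.1, -0.2])
f_dot = dg_dq(g, x, q) @ q_dot
```

With an autodiff framework, grad_x and dg_dq would instead come directly from differentiating the trained network.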

4.2. STEP 2: PDE TIME-STEPPING

We now evolve time from t^n to t^{n+1}. Unlike end-to-end learning-based latent space dynamics methods (Lusch et al., 2018), we evaluate time derivatives using the exact PDE (1), not a learned surrogate. At each integration point y, we evaluate the temporal derivative by solving the PDE (1) for ḟ^{n+1}(y):

F(f^n, ∇f^n, …, ḟ^{n+1}, …) = 0.  (5)

We then evolve the configuration to time t^{n+1} = t^n + Δt using a chosen explicit time integration method I_F, subject to the given boundary conditions, e.g., Runge-Kutta methods (Dormand & Prince, 1980):

f^{n+1} = I_F(Δt, f^n, ḟ^{n+1}, …), ∀y ∈ M.  (6)

While this work focuses on explicit time integration, the framework can also be extended to implicit time integration (Carlberg et al., 2017).
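For the 1D heat equation, this step can be sketched as follows. We assume the field value and its second spatial derivative at each integration sample were already obtained in Step 1 (here replaced by analytic stand-ins), solve Eq. (5) for ḟ, and apply a forward-Euler update:

```python
# Sketch (toy data): Step 2 for the 1D heat equation f_dot - nu * f_xx = 0.
import numpy as np

def heat_rhs(f_xx, nu):
    # Eq. (5) rearranged for the temporal derivative: f_dot = nu * f_xx
    return nu * f_xx

def euler_step(f, f_dot, dt):
    # a simple explicit integrator I_F(dt, f, f_dot) = f + dt * f_dot
    return f + dt * f_dot

# toy values at |M| = 3 integration samples, for f(x) = sin(pi x):
ys = np.array([0.25, 0.5, 0.75])
f = np.sin(np.pi * ys)
f_xx = -(np.pi**2) * np.sin(np.pi * ys)   # analytic stand-in for Step 1's gradients
f_next = euler_step(f, heat_rhs(f_xx, nu=0.01), dt=1e-3)
assert np.all(f_next < f)                 # diffusion shrinks this positive profile
```

In the full method, f and f_xx at each y would come from evaluating and differentiating the neural field, and a higher-order Runge-Kutta scheme could replace forward Euler.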

4.3. STEP 3: NETWORK INVERSION

We project back onto the reduced manifold by finding the input q^{n+1} that best matches the evolved configuration f^{n+1} in a least-squares sense (Quarteroni et al., 2014):

min_{q^{n+1} ∈ R^r} Σ_{y∈M} ‖g_{θ_g}(y, q^{n+1}) − f(y, t^{n+1})‖²₂.  (7)

The objective is similar to the training loss found in Equation (3), but with two dimensions significantly reduced: the dimension of the unknown q^{n+1}, and the summation bound |M|. Consequently, instead of using a stochastic-gradient-descent-type method, such as the auto-decoder scheme of Park et al. (2019), we achieve rapid inversion using the Gauss-Newton algorithm (Nocedal & Wright, 2006) with conditionally quadratic convergence. Further details are listed in Appendix C.
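A hedged sketch of such a Gauss-Newton inversion (without the backtracking line search of Appendix C). The decoder g below is a toy function of our own that happens to be linear in q, so the iteration converges immediately, but the loop structure matches the general nonlinear case:

```python
# Sketch (toy decoder): Gauss-Newton for min_q sum_y ||g(y, q) - f_target(y)||^2.
import numpy as np

def g(y, q):
    return q[0] * np.sin(y) + q[1] * y    # stand-in for g_{theta_g}(y, q)

def jac(y, q):
    return np.array([np.sin(y), y])       # dg/dq at one sample (here q-independent)

def gauss_newton(ys, f_target, q0, iters=20):
    q = q0.copy()
    for _ in range(iters):
        r = np.array([g(y, q) for y in ys]) - f_target   # residual vector
        J = np.stack([jac(y, q) for y in ys])            # |M| x r Jacobian
        dq, *_ = np.linalg.lstsq(J, -r, rcond=None)      # GN step: solve J dq = -r
        q = q + dq
    return q

ys = np.array([0.2, 0.7, 1.3, 2.1])
q_true = np.array([1.5, -0.4])
f_target = np.array([g(y, q_true) for y in ys])
q_rec = gauss_newton(ys, f_target, q0=np.zeros(2))
assert np.allclose(q_rec, q_true, atol=1e-8)
```

For a real neural decoder, jac would be the network's Jacobian with respect to q (from autodiff), and a line search guards each step.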

4.4. SPATIAL SAMPLE REDUCTION

A necessary condition for the well-posedness of the least squares formulation of Equation (7) is r ≤ d|M|. In practice, we choose r/d ≤ |M| ≪ P.

Figure 4: Left: greedily selecting the samples (blue) allows us to use significantly fewer degrees of freedom than the full-order simulation (purple) while achieving much higher accuracy than naive uniform sampling (red). The depicted field is temperature governed by the heat equation after 100 time steps. Right: as dictated by the PDE, the reduced system is conservative: total energy (stored thermal energy plus cumulative flux at the boundary) is conserved.

To obtain the next-time-step values f(y, t^{n+1}), ∀y ∈ M, necessary for the least squares solves, we only require PDE updates (Section 4.2) and spatiotemporal data (Section 4.1) at these |M| samples. As such, the entire latent space dynamics framework (Figure 3) requires only |M| samples, compared to the full-order solver's P samples. Hyper-reduction approaches like this have captured a wide range of real-world scenarios, including massive elastic deformations (Fulton et al., 2019) and large turbulent flows (Grimberg et al., 2021). Unlike our discretization-independent approach, prior methods only support hyper-reduction samples that coincide with the full-order discretization. A naive selection of the integration samples can lead to inaccurate latent space dynamics, even if |M| ≥ r/d; refer to Figure 4 for the failure case of uniform sampling. As noted in the hyper-reduction literature, stochastic sampling can eliminate such errors (Carlberg, 2011). To better control hyper-reduction error, we draw inspiration from the cubature approach of An et al. (2008) and propose a greedy algorithm that augments the sample set to meet a target residual (see Appendix B). This strategy iteratively adds spatial samples to the sampling set until the target residual is met. In this case, the user decides the target accuracy, not the sample set size |M|.
Alternatively, the user can terminate the algorithm early once it reaches a chosen sample set size |M|. Results of our sampling approach are shown in Figures 4, 9 and 20.

5. EXPERIMENTS

We analyze the proposed framework on classic PDEs, with training data produced using a variety of discretizations (voxel grids, meshes, and point clouds). Unless otherwise noted, for each PDE we delineate a testing set D_test ⊂ D with D_train ∩ D_test = ∅. We construct the manifold (Section 3) with data from D_train, and then validate the latent space dynamics (Section 4) on D_test. We compare our approach with prior discretization-dependent ROM methods, including POD (Berkooz et al., 1993; Holmes et al., 2012) and neural-network-based autoencoder approaches (Fulton et al., 2019; Lee & Carlberg, 2020; Shen et al., 2021), under the same latent space dimensions. Additional implementation and reproducibility details are listed in Appendices G to L. Experiment statistics are summarized in Table 1. The temporal evolutions of the PDEs are best illustrated in the supplementary video.

Thermodynamics, ∂u/∂t − ν(x) ∂²u/∂x² = 0. Temperature u is governed by a one-dimensional heat equation; ν describes the spatially varying diffusion speed. Figure 4 displays our approach's ability to use very few integration samples and to capture conservation of energy.

Image Processing, ∂u/∂t − ν(x)∇²u = 0. We model image blurring with the 2D diffusion equation (Perona & Malik, 1990). Figure 5 shows that under the same latent space dimension (r = 3), our method is more accurate than POD (Berkooz et al., 1993; Holmes et al., 2012), both visually and quantitatively. With an architecture independent of pixel count, CROM uses an order of magnitude less memory than POD (see Figure 5c).

Transport-dominated systems. Next, we examine two transport-dominated, slowly decaying Kolmogorov n-width problems, the Advection Equation and Burgers' Equation, where classic model reduction techniques often struggle (Peherstorfer, 2022).

Advection Equation, ∂u/∂t + (a · ∇)u = 0. Here u is the advected quantity and a is the advection velocity.
Figure 6a depicts CROM's favorable trajectory tracking relative to both POD and the convolutional autoencoder (CAE) of Lee & Carlberg (2020), for equal latent space dimension (also see Figure 21).

Figure 5: (a) Visually, our approach better captures the sharp initial state and the smoothed final state than POD. (b) Quantitatively, CROM obtains a higher PSNR than POD (↑ higher is better). (c) Our approach also uses an order of magnitude less memory than POD (↓ lower is better). Both POD and our approach use the same latent space dimension (r = 3).

For both cases, we use the intrinsic solution-manifold dimension (Lee & Carlberg, 2020), the lower bound of r, as the latent space dimension r. To isolate the source of error, no hyper-reduction method is applied.

Burgers' Equation, ∂w/∂t + ∂(0.5w²)/∂x = 0.02 e^{µx}. Figure 6b shows that CROM captures the nonlinear dynamics more accurately than both POD and CAE (also see Figure 22). Additionally, CROM uses 12× less memory than CAE (see Figure 27). These accuracy and memory advantages are consistent with the implicit neural representation literature (Chen & Zhang, 2019). Whereas CAE is applicable only to voxel grid data, CROM is applicable to data from any discretization, as highlighted by the tetrahedral mesh and point cloud examples (see Figure 9).

Figure 8: (a, b) Prior ROM methods (Barbič & James, 2005; Fulton et al., 2019; Shen et al., 2021) only support reduced-order dynamics with fixed discretizations, forcing a tough choice between accuracy and computational cost; (c) our approach allows dynamic mesh adaptivity to balance accuracy and cost as problem difficulty varies over time; (d) our resolution-independent encoding of the deformation map allows us to visualize the field using any method (e.g., a high-resolution mesh) without regard to the computational mesh.

CROM's speedup over prior discretization-dependent ROM methods is observed for 1-32 threads, and is emphasized in the case of limited computational resources.
To ensure representativeness across prior ROM architectures, we report timing strictly for the PDE time-stepping stage (Step 2), which is shared among all prior ROM architectures.

Figure 9: A POD simulation (Barbič, 2012) of the same latent space dimension consumes 26× more memory while suffering from volume-gain artifacts (5.76% error). (d) These simulations adopt a tetrahedral discretization (d, pink mesh, P = 66,608). Instead of using the expensive high-resolution mesh, our approach computes dynamics using very few integration samples (d, colorful spheres, |M| = 40). (e) Naive down-resolution of the ground-truth simulation yields a similar runtime but significantly worse quality. (f) After training, our model can capture a wide range of material properties. (g) The same manifold-parameterization function architecture (with different network weights) can also be used to model-reduce point-cloud-based simulation (reproduced from the work of Chen et al. (2023)). The speedup plot on the right demonstrates the effectiveness of our approach on diverse computing platforms. Disclaimer: the authors do not support animal cruelty.

Solid Mechanics, ρ₀ϕ̈ = ∇ · P(∇ϕ) + ρ₀B. We solve the second-order elastodynamics equation for the deformation map ϕ of soft bodies, where ρ₀ is the initial density, P is the first Piola-Kirchhoff stress, and B is the body force density (see Figures 8 to 10 and 25).

Adaptive resolution. Prior ROM methods (see Figure 1a) fix the discretization at the onset, precluding the dynamic adaptivity that benefits problems with evolving complexity. When modeling a falling block (see Figure 8), prior ROM methods must begin with a high-resolution mesh, even though the high resolution is required only during contact (Figures 8a and b). Our discretization-independent representation allows the time integrator to freely adapt the spatial discretization throughout the dynamic evolution.
For instance, we can employ a coarse mesh (P = 8 vertices) to economize computation during the rigid falling phase (Figure 8c), and a high-resolution mesh (P = 2,065) to capture details during contact (Figure 8c), yielding 35×-48× wall-clock speedups over fixed-discretization ROM methods. To ensure generalizability of the comparison, the timing is done strictly for the PDE time-stepping stage (Step 2); prior ROM architectures (i.e., both POD and autoencoder approaches) share the same PDE time-stepping routine.

Figure 10: Our approach uses significantly less memory and simultaneously yields higher accuracy than POD and the autoencoder-based ROM methods of Fulton et al. (2019) and Shen et al. (2021) for the solid mechanics experiment depicted in Figure 9. For example, with r = 3, our approach uses 39× and 132× less memory, while simultaneously offering 79× and 49× more accuracy, compared to POD and the autoencoder approach of Fulton et al. (2019), respectively.

Memory footprint. Most prior ROM approaches are discretization-dependent: as the discretization resolution (P) increases, so does the memory consumption of g. By contrast, our discretization-independent architecture does not "see" the parameter P, and in practice its memory footprint does not scale with P. Rather, as with all neural fields (Xie et al., 2021), memory grows with the intrinsic complexity of the training data. Figure 10 compares memory footprint and reconstruction error for CROM, POD (Barbič, 2012), and neural-network-based autoencoder approaches (Fulton et al., 2019; Shen et al., 2021), as a function of latent space dimension; Figure 9 depicts the corresponding large-deformation simulations. Figure 25 compares CROM's memory and accuracy on another large-deformation example. For these examples, CROM offers simultaneously lower memory consumption and reconstruction error.
Furthermore, due to hyper-reduction (from P = 66,608 to |M| = 40), CROM also obtains significant wall-clock speedups over the unreduced model on CPUs, consumer GPUs, and data center GPUs (see Figure 9).

Different discretizations, same architecture. Because CROM is not constructed for a fixed discretization, we can adopt an identical manifold-parameterization function architecture to model-reduce simulations of both tetrahedral meshes and point clouds (see Figure 9g). Moreover, even though these examples (Figures 8 and 9) leverage completely different meshes, they use the same network architecture. This opens the door for future work on transfer learning (Weiss et al., 2016) and weight sharing among various discretizations and PDEs. Additional comparisons between CROM, previous ROM methods, and the full-order models are discussed in Appendices M and N.
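The experiments above rely on full-order solvers to produce training snapshots. A minimal, self-contained sketch of such a solver for the 1D heat equation ∂u/∂t = ν(x) ∂²u/∂x² (our illustration, assuming forward-Euler time stepping and homogeneous Dirichlet boundaries):

```python
# Sketch (our illustration): full-order finite-difference heat solver whose
# snapshots could serve as discrete training data for the encoder/decoder.
import numpy as np

def solve_heat(u0, nu, dx, dt, steps):
    u = u0.copy()
    snapshots = [u.copy()]
    for _ in range(steps):
        u_xx = np.zeros_like(u)
        u_xx[1:-1] = (u[2:] - 2 * u[1:-1] + u[:-2]) / dx**2  # central differences
        u = u + dt * nu * u_xx              # forward Euler; endpoints held fixed
        snapshots.append(u.copy())
    return snapshots

P = 64
x = np.linspace(0.0, 1.0, P)
nu = 0.05 * (1.0 + x)                       # spatially varying diffusivity nu(x)
u0 = np.sin(np.pi * x)
snaps = solve_heat(u0, nu, dx=x[1] - x[0], dt=1e-5, steps=100)
assert len(snaps) == 101 and snaps[-1].max() < snaps[0].max()  # heat diffuses
```

Each entry of snaps is one discrete state f⃗(t^n); a collection of such runs over varying ν or u0 forms the training set D_train.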

6. DISCUSSION AND CONCLUSION

CROM is a model reduction framework for PDEs featuring a discretization-independent architecture. CROM disentangles the low-dimensional embedding from the discretization by reformulating the reduced manifold as a map accepting not only q but also x as an input. CROM outperforms discretization-dependent ROM approaches, such as POD and autoencoders, in terms of accuracy and memory consumption for equal latent space dimension, and in the ability to dynamically adapt the discretization during time integration. While offering key advantages over prior ROM methods, CROM also inherits a limitation of ROM: it can only treat PDE solutions in the space spanned by the manifold-parameterization function (determined by the training data) and does not generalize to arbitrary unseen scenarios. Such a limitation is also common among other implicit neural representation works (Xie et al., 2021). Future research may consider improving generalizability and data efficiency via meta-learning and the integration of stronger priors (Sitzmann et al., 2020a). Compared to end-to-end ML solutions to PDEs (Sanchez-Gonzalez et al., 2020), CROM employs the neural network strictly as a spatial representation of the kinematics (see Sections 4.1 and 4.3) and solves the PDE using classical numerical PDE-integration methods (see Section 4.2). As such, we believe CROM will open doors for forthcoming hybrid ML-PDE solutions. As shown in our work, these solutions can retain the PDE's physical invariants (see Figure 4), allow for easy integration with existing PDE solvers (see Section 4.2), and obtain practical computational savings that can be directly employed in production (see Figure 9).

A ADDITIONAL RELATED WORK ON ROM

Accelerating PDE solutions via reduced-order modeling (ROM) has a rich history. Two common building blocks of ROM are (1) identifying a low-dimensional latent space of the original complex system and (2) solving PDEs via latent space dynamics, i.e., evolving the latent space vector over time.
Early works on latent space construction started with linear methods, where the latent space and the full space have a linear relationship. Notable works include proper orthogonal decomposition (Berkooz et al., 1993; Holmes et al., 2012; Barbič & James, 2005; Barbič, 2012), the reduced-basis technique (Prud'homme et al., 2002; Rozza et al., 2007), balanced truncation (Moore, 1981), rational interpolation (Baur et al., 2011; Gugercin et al., 2008), and Craig-Bampton model reduction (Craig Jr & Bampton, 1968). Recently, nonlinear methods (Hartman & Mestha, 2017; Kashima, 2016; Gu, 2011; Erichson et al., 2019; Maulik et al., 2020; Regazzoni et al., 2019; Fulton et al., 2019; Maulik et al., 2021; Romero et al., 2021; Shen et al., 2021), often constructed via neural networks, have gained significant attention due to their ability to more accurately construct the manifold-parameterization function. In particular, nonlinear manifolds significantly outperform their linear counterparts on slowly decaying Kolmogorov n-width problems (e.g., advection) (Peherstorfer, 2022; Ohlberger & Rave, 2013; Peherstorfer & Willcox, 2015; Taddei et al., 2015; Peherstorfer, 2020; Ehrlacher et al., 2020; Lee & Carlberg, 2021). Most of these prior works focus exclusively on building a latent space for the already-discretized vector fields of interest. Chen et al. (2023), Pan et al. (2023), and our work are the first attempts at constructing the latent space for the continuous vector fields themselves, thereby allowing the training data to come from any discretization. After the latent space is identified, solving PDEs with parameters unseen during training requires evolving the latent space vector over time. Two families of approaches have been proposed for latent space dynamics. The first family (Kim et al., 2019) computes latent space dynamics via a data-driven operator learned from the training data.
A key benefit of learning the operator from data is that the user requires no knowledge of the system's governing equations and does not need access to the training solver's source code. However, the downside of this approach is that additional assumptions are made, e.g., the availability of a Koopman-invariant subspace, as in Lusch et al. (2018), or the existence of a finite-dimensional quadratic representation, as in the lift-and-learn approach of Qian et al. (2020). Without these assumptions, the learned operator is prone to violating physical laws. The second family of approaches (Lee & Carlberg, 2020; Fulton et al., 2019), to which our method belongs, computes latent space dynamics according to the PDE used for training data generation. The advantage of this approach is that it does not require learning an additional time-evolution operator: it works with arbitrary dynamical systems with known governing PDEs and conserves physical laws by construction, since the explicit PDE model is employed. However, the disadvantage is that the PDE model must be known.

B ROBUST SAMPLING

To select the integration samples robustly, we aim to balance computation speed and accuracy. CROM employs |M| spatial samples, and the computation speed scales roughly linearly with the number of samples. To maximize computation speed, we aspire to select the fewest samples possible. Given a target accuracy, our greedy algorithm selects the fewest integration samples from the discrete samples {x_i}_{i=1}^P of the full-order PDE solution that achieve the target accuracy. In every iteration, the greedy algorithm adds one spatial sample to the sample set M, lowering the error of the latent space dynamics. Specifically, the algorithm loops over the Q spatial samples with the largest individual residuals. We test adding each of these samples to the integration sample set and compute a new global residual. From these Q candidates, we select the one leading to the smallest new global residual and add it to the actual sample set. The algorithm repeats until the target accuracy is met. The per-sample residual is measured as $\|g_{\theta_g}(x_i, q(t^T; \mu)) - f(x_i, t^T)\|$, accumulated over every training parameter $\mu \in \mathcal{D}_{\text{train}}$ after obtaining the final latent space vector $q(t^T; \mu)$ with the integration samples M. In practice, we find Q = 10 gives sufficient results. Algorithm 1 describes the details. Compared to the naive uniform sampling approach, our robust sampling scheme yields significantly more accurate results (see Figure 4). We also highlight that Algorithm 1 runs before any simulation (latent space dynamics) occurs; it is strictly a precomputation.
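The greedy selection loop described above can be sketched as follows; `residual_fn` is a hypothetical stand-in for running the latent space dynamics with a given sample set and reporting the global residual metric plus per-sample residuals, and is not part of the paper's implementation:

```python
import numpy as np

def greedy_select(residual_fn, target, Q=10):
    """Greedily grow the integration sample set M (sketch of Algorithm 1).

    residual_fn(M) -> (global_metric, per_sample_residuals): hypothetical
    stand-in that runs the latent space dynamics with integration samples M
    and reports the global residual metric plus per-sample residuals.
    """
    M = set()
    metric, per_sample = residual_fn(M)
    while metric > target:
        # the Q candidate samples with the largest individual residuals
        candidates = np.argsort(per_sample)[-Q:]
        best_i, best_metric = None, np.inf
        for i in candidates:
            if int(i) in M:
                continue
            m, _ = residual_fn(M | {int(i)})     # try adding candidate i
            if m < best_metric:
                best_i, best_metric = int(i), m
        if best_i is None:                       # no candidate left to add
            break
        M.add(best_i)                            # keep the best candidate
        metric, per_sample = residual_fn(M)
    return M
```

The loop stops as soon as the global residual metric meets the target, so the returned set is kept as small as the tolerance allows.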

C NETWORK INVERSION

In order to generate latent space dynamics, we need to invert the neural network employed for the manifold-parameterization function. We do so by solving the minimization problem

$$\min_{q^{n+1} \in \mathbb{R}^r} \sum_{y \in \mathcal{M}} \left\| g_{\theta_g}(y, q^{n+1}) - f(y, t^{n+1}) \right\|_2^2.$$

Since this is a nonlinear least-squares problem, we solve it with the classic iterative Gauss-Newton algorithm (Nocedal & Wright, 2006). Unlike Newton's method, Gauss-Newton does not require the Hessian of the objective function. Consequently, in addition to the neural network itself, we only need to evaluate the gradient of the neural network with respect to the latent space vector, $\frac{\partial g_{\theta_g}}{\partial q}(y, q^{n+1})$. We also equip our Gauss-Newton implementation with a standard backtracking line-search scheme to improve the convergence rate.
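A minimal sketch of this Gauss-Newton inversion with backtracking line search; the Jacobian comes from PyTorch autograd here for brevity, whereas the paper's implementation evaluates network gradients analytically (Appendix E), and `g`, the sample set, and the targets are assumed given:

```python
import torch

def gauss_newton_invert(g, q0, ys, targets, iters=20, tol=1e-10):
    """Sketch: invert the decoder g(y, q) for the new latent vector via
    Gauss-Newton with backtracking line search. The Jacobian comes from
    autograd here for brevity; the paper's implementation evaluates the
    network gradient analytically instead."""
    q = q0.clone()

    def residual(q):
        # stack residuals g(y, q) - f(y, t^{n+1}) over the samples in M
        return torch.cat([g(y, q) - t for y, t in zip(ys, targets)])

    for _ in range(iters):
        r = residual(q)
        if r.dot(r) < tol:
            break
        J = torch.autograd.functional.jacobian(residual, q)  # (|M|*d) x r
        dq = torch.linalg.lstsq(J, -r.unsqueeze(1)).solution.squeeze(1)
        # backtracking line search on the squared residual norm
        step, f0 = 1.0, r.dot(r)
        while residual(q + step * dq).square().sum() > f0 and step > 1e-6:
            step *= 0.5
        q = q + step * dq
    return q
```

For a decoder that is linear in q, a single Gauss-Newton step recovers the exact minimizer.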

C.1 INITIAL GUESS FOR THE NONLINEAR SOLVER

An initial guess close to the minimum is necessary for the convergence of Gauss-Newton. The previous time step's latent space vector q^n serves as an ideal initial guess since f does not vary much over a time step. However, in extreme cases where f deviates significantly over one time step, we find it helpful to compute the initial guess through an encoder network. For convenience, we employ the encoder network from training, which requires computing the dynamics for all P samples. Future work should consider building this encoder network strictly for the degrees of freedom in M. We can further facilitate nonlinear solver convergence by encouraging smoothness of the latent space, e.g., by employing a Jacobian penalty term (Chen et al., 2023) or a local isometry term (Du et al., 2021), or by enforcing Lipschitz constraints (Liu et al., 2022). These smoothness regularizations would be particularly important for challenging PDE cases involving bifurcations and buckling (Brush et al., 1975).

C.2 LINEARIZATION

We can obtain further computational savings by bypassing the iterative nonlinear solver and linearizing the least-squares problem. We define $\Delta f$ as

$$\Delta f(y, t^{n+1}) = f(y, t^{n+1}) - f(y, t^{n}) = f(y, t^{n+1}) - g_{\theta_g}(y, q^{n}).$$

Employing a Taylor expansion of $g_{\theta_g}$ at $q^n$, we have

$$g_{\theta_g}(y, q^{n+1}) \approx g_{\theta_g}(y, q^{n}) + \frac{\partial g_{\theta_g}}{\partial q}(y, q^{n})\,\Delta q^{n+1},$$

where $q^{n+1} = q^{n} + \Delta q^{n+1}$. Therefore, the original objective function can be approximated as

$$g_{\theta_g}(y, q^{n+1}) - f(y, t^{n+1}) = g_{\theta_g}(y, q^{n+1}) - g_{\theta_g}(y, q^{n}) - \Delta f(y, t^{n+1}) \approx \frac{\partial g_{\theta_g}}{\partial q}(y, q^{n})\,\Delta q^{n+1} - \Delta f(y, t^{n+1}).$$

Therefore, we can obtain the latent space dynamics by solving the linear least-squares problem

$$\min_{\Delta q^{n+1} \in \mathbb{R}^r} \sum_{y \in \mathcal{M}} \left\| \frac{\partial g_{\theta_g}}{\partial q}(y, q^{n})\,\Delta q^{n+1} - \Delta f(y, t^{n+1}) \right\|_2^2.$$

Via the normal equations, this can be solved in closed form:

$$\Delta q^{n+1} = (J^{T} J)^{-1} J^{T} b,$$

where $J$ is the $(|\mathcal{M}| \cdot d)$-by-$r$ Jacobian matrix that stacks $\frac{\partial g_{\theta_g}}{\partial q}(y, q^{n})$ for all $y \in \mathcal{M}$, and $b$ is the $(|\mathcal{M}| \cdot d)$-by-$1$ residual vector stacking $\Delta f(y, t^{n+1})$ for all $y \in \mathcal{M}$. Since our work features both small |M| and small r, the closed-form network inversion costs just a few small dense matrix multiplications and thereby introduces minimal overhead. In fact, the solver itself is extremely efficient, and the majority of the computation cost lies in forming the Jacobian matrix J itself, whose optimization will be discussed in Appendix E.1.1. Additional performance details are listed in Appendix N. Although our network inversion step follows the standard optimal-projection-based ROM literature (Carlberg et al., 2013), future work may consider directly learning an inverse during training time. The encoder introduced in the manifold construction process may be a step towards learning an inverse. However, the encoder in its current form does not have optimal accuracy and is bound to a particular discretization. Future work should consider lifting these limitations.
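Once J and b are formed, the closed-form update amounts to one tiny dense solve; a minimal sketch:

```python
import torch

def linearized_invert(J, b):
    """Closed-form latent update via the normal equations:
    dq = (J^T J)^{-1} J^T b, with J of shape (|M|*d) x r stacking
    dg/dq at every integration sample and b stacking Delta f there.
    J and b are assumed precomputed; forming J dominates the cost."""
    JtJ = J.T @ J                       # r x r, tiny since r is small
    return torch.linalg.solve(JtJ, J.T @ b)
```

Since r is small, the r-by-r solve is negligible next to the cost of evaluating the network Jacobian that fills J.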

D NETWORK DETAILS

Consistent with the implicit neural representation literature, we parameterize the manifold-parameterization function with an MLP network $g_{\theta_g}$. The input dimension of the network is m + r and the output dimension is d, where m is the dimension of the input spatial vector and d is the output dimension of the vector field of interest. Our MLP network contains 5 hidden layers, each of width (β · d), where β is a hyperparameter that defines the learning capacity of the network. Essentially, our network has two tunable hyperparameters, r and β. A detailed study of these hyperparameters is given in Appendix F. Since we require the network to be continuously differentiable with respect to both the spatial coordinates x and the latent space vector q, we adopt continuously differentiable activation functions. In practice, either ELU (Clevert et al., 2015) or SIREN (Sitzmann et al., 2020b) serves this purpose. For the encoder network $e_{\theta_e}$, the input is a 1D vector of length P with d channels. The output is a vector of dimension r. Using MLPs alone would lead to an enormous network when P is large. We therefore design the encoder to first apply multiple 1D convolution layers of kernel size 6, stride 4, and output channel size d until the output 1D vector's length is closest to 32/d but no smaller. Afterward, the encoder reshapes the vector into 1 channel and applies an MLP layer to reduce the dimension of the vector to 32. The last MLP layer then transforms the previous 32-dimensional vector into dimension r. Discretization-specific encoder networks can also be used, e.g., PointNet for point clouds and convolutional neural networks for grid data.
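A sketch of the described decoder architecture in PyTorch; layer sizes follow the description above, and this is an illustrative construction rather than the authors' exact code:

```python
import torch
import torch.nn as nn

def make_decoder(m, d, r, beta=10, hidden_layers=5):
    """Sketch of the decoder g_theta: input (x, q) of dimension m + r,
    output field value of dimension d, with `hidden_layers` hidden layers
    of width beta * d and continuously differentiable ELU activations."""
    width = beta * d
    layers = [nn.Linear(m + r, width), nn.ELU()]
    for _ in range(hidden_layers - 1):
        layers += [nn.Linear(width, width), nn.ELU()]
    layers.append(nn.Linear(width, d))  # final linear layer, no activation
    return nn.Sequential(*layers)
```

For example, a 3D displacement field (m = d = 3) with r = 6 and β = 10 yields hidden layers of width 30 and an input of dimension 9.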

D.1 TRAINING DETAILS

We use the Adam optimizer (Kingma & Ba, 2014) for stochastic gradient descent. We use Xavier initialization for ELU layers and the default initialization (ω_0 = 30) for SIREN layers. Unless otherwise noted, we train with a base learning rate of lr = 1e-4 and adopt a learning-rate decay schedule (10·lr → 5·lr → 2·lr → 1·lr → 0.5·lr → 0.2·lr). For each of these learning rates, we train for 30,000 epochs. We adopt a batch size of 16 (i.e., 16 simulation snapshots) for the encoder, and therefore a batch size of 16·P for the implicit-neural-representation-based manifold-parameterization function. We implement the entire training pipeline in PyTorch Lightning (Falcon et al., 2019), which facilitates distributed training across multiple GPUs. The training vector fields are standardized to have zero mean and unit variance. The training spatial coordinates are also standardized for the ELU implementation, but are preprocessed to lie in [-1, 1] for the SIREN implementation.

E NETWORK GRADIENTS

Gradients of the implicit neural representation ($\nabla_x g_{\theta_g}$ and $\frac{\partial g_{\theta_g}}{\partial q}$) are crucial for our reduced-order pipeline. We use them to compute the spatial and temporal gradients of the vector fields, $\nabla f(x, t) = \nabla_x g_{\theta_g}(x, q)$ and $\dot{f}(x, t) = \frac{\partial g_{\theta_g}}{\partial q}(x, q)\,\dot{q}$. Furthermore, the gradient with respect to the latent space, $\frac{\partial g_{\theta_g}}{\partial q}$, is also a key ingredient for the network inversion (Appendix C).

E.1 NETWORK GRADIENTS VIA DIRECT DIFFERENTIATION

One way to compute the gradients is by directly differentiating the continuously differentiable network. Unfortunately, computing these derivatives via auto-diff (computational graph tracking) is too slow for the high-performance applications explored in this work. To ensure maximum efficiency, we implement the gradients analytically through the chain rule of each network layer, similar to the grad net approach of Lindell et al. (2021). Since we only use fully-connected layers, the implementation is straightforward. In addition, while computing the gradient via the chain rule, we also obtain the function value of the neural network itself. Therefore, we obtain both the gradient and the function value in a single forward pass. Algorithm 2 describes the details.
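A minimal NumPy sketch of this single-pass value-and-gradient evaluation for an ELU MLP; weights and biases are assumed given as plain arrays, and only the input Jacobian is shown:

```python
import numpy as np

def elu(z):       return np.where(z > 0, z, np.exp(z) - 1.0)
def elu_prime(z): return np.where(z > 0, 1.0, np.exp(z))

def forward_with_grad(weights, biases, x):
    """Evaluate an ELU MLP and its Jacobian w.r.t. the input in a single
    pass: propagate the value and accumulate the Jacobian layer by layer
    via the chain rule (no computational-graph tracking needed)."""
    h, J = x, np.eye(x.size)
    for i, (W, b) in enumerate(zip(weights, biases)):
        z = W @ h + b
        if i < len(weights) - 1:            # hidden layer: ELU activation
            h = elu(z)
            J = (elu_prime(z)[:, None] * W) @ J
        else:                               # final layer: linear
            h, J = z, W @ J
    return h, J
```

Because the Jacobian is accumulated alongside the forward pass, the function value comes for free, matching the single-forward-pass property described above.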

E.1.2 FAILURE MODES

While direct differentiation is generally accurate, we observe failure modes when the training samples are sparse. Consider the case where $g_{\theta_g}(x, q) = x$, e.g., the undeformed, reference configuration of elastodynamics. In this case, the ground-truth spatial gradient is $\nabla_x g_{\theta_g} = I$. We train the neural network with the spatial samples discussed in Appendix L.4.1. After training, we evaluate the spatial gradients at the center of each tetrahedron and compute their error (from the identity matrix) in the L2 norm. As shown in Figure 11, errors are observed in both ELU and SIREN networks, with SIREN's error significantly larger (cf. Yang et al., 2021). This discrepancy can be understood as follows: the high-frequency prior of SIREN is more suitable for approximating high-frequency functions, whereas the low-frequency prior of ELU is better at approximating low-frequency functions (Hertz et al., 2021). We find that training with more samples and/or gradient supervision (Chen et al., 2023) reduces the errors above at the cost of additional training resources but does not resolve the issue completely. Future work may consider more advanced gradient regularization methods (Liu et al., 2022).

E.2 NETWORK GRADIENTS VIA NUMERICAL DIFFERENTIATION

Another approach to computing neural network gradients is numerical approximation, such as the finite difference method or the finite element method. Figure 11 demonstrates the drastically improved gradient accuracy of the finite element method (linear basis functions) on networks trained with SIREN. We find such a hybrid approach to be an ideal middle ground: we represent the function itself with the discretization-independent neural network and compute the gradient via discretization-dependent numerical differentiation. With numerical gradients, users still benefit from our continuous representation, which allows them to use any discretization for gradient computation (e.g., remeshing during the ground-impact elastic deformation). Prior ROM approaches require these numerical gradients to be computed on a fixed mesh.
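As an illustration, a central-difference stencil applied directly to the decoder yields such a numerical spatial gradient; the paper uses linear FEM basis functions, and finite differences are shown here only for brevity:

```python
import numpy as np

def numerical_grad(g, x, q, eps=1e-4):
    """Central-difference spatial gradient of the decoder g(x, q),
    returned as a (d, m) Jacobian. A minimal stand-in for the
    discretization-dependent numerical differentiation discussed above
    (the paper's results use linear FEM basis functions instead)."""
    x = np.asarray(x, dtype=float)
    grad = np.empty((g(x, q).size, x.size))
    for k in range(x.size):
        dx = np.zeros_like(x); dx[k] = eps
        # central difference along spatial coordinate k
        grad[:, k] = (g(x + dx, q) - g(x - dx, q)) / (2.0 * eps)
    return grad
```

Any spatial discretization can supply the stencil here; the network itself stays discretization-independent.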

F HYPERPARAMETERS

During the offline training stage, we can modify two network architecture hyperparameters: the size of the latent space (r) and the width of the MLP (β). After training, during the online deployment stage of the model, the key hyperparameter is the number of integration samples (|M|). In this section, we run a sensitivity analysis on these hyperparameters. Figure 12 shows that increasing the width of each MLP layer improves the reconstruction accuracy. Unless otherwise noted, all reconstruction errors reported are mean squared errors (MSE). While Figure 12 demonstrates an empirical convergence, future work should consider a theoretical analysis (Kunisch & Volkwein, 2002). However, large layer widths also entail higher computational cost (see Figure 13). Compared to highly parallelized GPUs, CPUs suffer a more severe performance drop due to the larger network size. Unlike traditional PCA-based model reduction approaches (e.g., POD), increasing the dimension of the latent space vector does not yield higher accuracy for our approach (see Figure 10). In prior ROM approaches, augmenting the dimension of the latent space vector directly increases the number of entries in the manifold-parameterization matrix, which, in turn, improves the approximation capacity of the manifold-parameterization function. However, as shown in Figure 12, in the implicit neural representation framework, we can control the learning capacity of the network architecture by explicitly modifying the width of the MLP layers. Therefore, the importance of the latent space dimension diminishes. This experimental observation is consistent with prior theoretical analysis (Remark 2.1 of Lee & Carlberg (2020)), which states that the lower bound of the latent space dimension is the dimension of the problem parameters (e.g., material properties) plus one.
Consequently, one should consider increasing the learning capacity of the manifold-parameterization function instead of increasing the latent space dimension. During the online stage of the reduced-order model, we select integration samples via the greedy sampling scheme (Appendix B). Increasing the number of integration samples leads to higher accuracy (Figure 14 ).

G THERMODYNAMICS

G.1 CONTINUOUS PDE

$$\frac{\partial u}{\partial t} - \nu(x)\frac{\partial^2 u}{\partial x^2} = 0 \tag{15}$$

In thermodynamics, we study the 1D heat equation of the spatiotemporally dependent temperature u. We assume a zero-Dirichlet boundary condition, though other boundary conditions can also be incorporated.

G.2 FULL-ORDER MODEL

The full-order model discretizes the spatial vector field with a regular Eulerian grid using 501 equally spaced samples (P = 501). We then approximate the spatial gradient using the finite difference method,

$$\frac{\partial^2 u}{\partial x^2}(x_i, t^n) = \frac{u(x_i - \Delta x, t^n) + u(x_i + \Delta x, t^n) - 2u(x_i, t^n)}{\Delta x^2},$$

where Δx is the grid spacing. We use this to compute the next-time-step velocity,

$$\dot{u}_i^{n+1} = \nu(x_i)\,\frac{\partial^2 u}{\partial x^2}(x_i, t^n).$$

We assume a first-order explicit time-stepping scheme,

$$u_i^{n+1} = u_i^{n} + \Delta t\, \dot{u}_i^{n+1}.$$

We sample another 4 ν's for testing purposes. All training and testing data adopt the same initial condition (Figure 15a).
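The finite-difference Laplacian and explicit update above combine into the following step (a minimal NumPy sketch, with the zero-Dirichlet boundary from Appendix G.1):

```python
import numpy as np

def heat_step(u, nu, dx, dt):
    """One explicit (forward-Euler) step of the 1D heat equation with a
    zero-Dirichlet boundary: u_i^{n+1} = u_i^n + dt * nu_i * laplacian(u)_i."""
    lap = np.zeros_like(u)
    # central finite difference: (u_{i-1} - 2 u_i + u_{i+1}) / dx^2
    lap[1:-1] = (u[:-2] - 2.0 * u[1:-1] + u[2:]) / dx**2
    u_new = u + dt * nu * lap
    u_new[0] = u_new[-1] = 0.0   # zero-Dirichlet boundary condition
    return u_new
```

Note that `nu` may be a per-sample array, matching the spatially varying ν(x) above; explicit stepping requires dt below the usual stability bound dx²/(2ν).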

G.5 RESULTS

Figure 16 demonstrates the performance of the reduced-order model on the testing dataset. With the robust sampling scheme, we can achieve 0.43% error using just 22 spatial samples (see Figure 4 ). In addition, by incorporating different initial conditions (e.g., Gaussian profiles centered at different horizontal locations) in the training data, our approach can capture diffusion with a wide range of initial conditions (Figure 17 ). To gauge our approach's ability to conserve physical laws, we measure the energy evolution over time in the region between x 1 = 0.396 and x 2 = 0.526 (Figure 16 ). As shown by Cannon (1984) , the escaped energy (EE) is measured by the flux difference ( ∂u ∂x (x 1 , t) -∂u ∂x (x 2 , t)) at the left and right boundaries integrated over time; the stored energy (SE) is measured by the heat energy density, proportional to the temperature (u), integrated over space; the total energy (TE) is the sum of the prior two energies, i.e., TE=EE+SE. Figure 4 demonstrates that the reduced simulation preserves the total energy over time.

H IMAGE PROCESSING

H.1 CONTINUOUS PDE

$$\frac{\partial u}{\partial t} - \nu(x)\nabla^2 u = 0$$

In image processing, we study the Perona-Malik diffusion equation of the gray-scale image u (Perona & Malik, 1990).

H.2 FULL-ORDER MODEL

The full-order model discretizes the spatial 2D pixel field with a regular Eulerian grid using 256 × 256 equally spaced samples (P = 65,536). We then approximate the spatial gradients using the finite difference method,

$$\frac{\partial^2 u}{\partial x^2}(x_i, y_i, t^n) = \frac{u(x_i - \Delta x, y_i, t^n) + u(x_i + \Delta x, y_i, t^n) - 2u(x_i, y_i, t^n)}{\Delta x^2},$$
$$\frac{\partial^2 u}{\partial y^2}(x_i, y_i, t^n) = \frac{u(x_i, y_i - \Delta y, t^n) + u(x_i, y_i + \Delta y, t^n) - 2u(x_i, y_i, t^n)}{\Delta y^2},$$

where Δx = Δy is the grid spacing. We use these to compute the next-time-step velocity,

$$\dot{u}_i^{n+1} = \nu(x_i, y_i)\left(\frac{\partial^2 u}{\partial x^2}(x_i, y_i, t^n) + \frac{\partial^2 u}{\partial y^2}(x_i, y_i, t^n)\right).$$

We assume a first-order explicit time-stepping scheme, $u_i^{n+1} = u_i^{n} + \Delta t\, \dot{u}_i^{n+1}$.

H.3 REDUCED-ORDER MODEL

The reduced-order model follows the same treatments as Appendix G.3.

H.4 TRAINING AND TESTING DATA

Similar to Appendix G.4, we generate training data by setting ν to different piecewise constant functions of four equally-spaced regions, i.e., the parameter vector µ ∈ D = {0, 0.2} 4 ⊂ R 4 . D train contains 11 parameter vectors (Figure 18 ), where 0, 2 or 3 regions have zero diffusion coefficients, i.e., no blurring. D test contains 4 parameter vectors, where only 1 region is unblurred. For each training and testing parameter vector, we run a simulation of 20 time steps. All training and testing data adopt the same initial condition (Figure 18a ).

H.5 RESULTS

Figure 19 demonstrates the reduced simulation's performance on the testing dataset. Furthermore, with the robust sampling scheme, we can achieve 36.1 PSNR using just 63 spatial samples (see Figure 20 ). The dimension reduction is 99.90%. 

I ADVECTION EQUATION

Nonlinear ROM techniques have been shown to significantly outperform linear ROM techniques on problems with slowly decaying Kolmogorov n-widths (Peherstorfer, 2022; Lee & Carlberg, 2020). In this section, we test CROM, also a nonlinear ROM method, on a transport-dominated problem, given by the advection equation.

I.1 CONTINUOUS PDE

$$\frac{\partial u}{\partial t} + (a \cdot \nabla)u = 0,$$

where u is the advected quantity and a is the advection velocity.

I.2 FULL-ORDER MODEL

The full-order model discretizes the 1D spatial field with a regular grid using 100 equally spaced samples. The PDE is temporally discretized via forward Euler.
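One such full-order step can be sketched as follows; since the spatial scheme is not pinned down above, a first-order upwind difference on a periodic grid is assumed here:

```python
import numpy as np

def advect_step(u, a, dx, dt):
    """One forward-Euler step of 1D advection u_t + a u_x = 0 on a
    periodic grid. A first-order upwind spatial difference is assumed
    (the paper does not specify the spatial scheme)."""
    if a >= 0:
        dudx = (u - np.roll(u, 1)) / dx    # backward (upwind) difference
    else:
        dudx = (np.roll(u, -1) - u) / dx   # forward (upwind) difference
    return u - dt * a * dudx
```

Upwinding keeps the explicit step stable for CFL numbers a·dt/dx ≤ 1, at the cost of some numerical diffusion.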

I.3 REDUCED-ORDER MODEL

We vary the latent space dimension r and use a fixed width of the MLP β = 20.

I.4 TRAINING AND TESTING DATA

In this example, we consider a reproductive case where the training and the testing data are the same. In particular, the initial condition is given by a Gaussian profile and is then advected under a constant velocity.

I.5 RESULTS

Since this example is reproductive and the only parameter of the system is time t, the intrinsic solution-manifold dimension (Lee & Carlberg, 2020) is 1. Figure 21 demonstrates that CROM accurately captures the transport behavior of the Gaussian profile even at the intrinsic solution-manifold dimension (r = 1). By contrast, POD suffers from serious artifacts with a lower-dimensional latent space and requires a significantly higher-dimensional latent space (r = 16) in order to model the advection behavior. Furthermore, Figure 6 shows that our approach is also more accurate than the convolutional autoencoder approach of Lee & Carlberg (2020) when r = 1.

J BURGERS' EQUATION

In this section, we continue to investigate problems with slowly decaying Kolmogorov n-widths. In particular, we adopt the setup of Lee & Carlberg (2020) (see Section 7.1 of their paper) and compare with their (discretization-dependent) convolutional autoencoder manifold.

J.1 CONTINUOUS PDE

$$\frac{\partial w}{\partial t} + \frac{\partial (0.5\, w^2)}{\partial x} = 0.02\, e^{\mu_D x},$$

where w is the variable of interest. The initial condition satisfies w(0, 0) = 4.25, with w(x, 0) prescribed for all x ∈ (0, 100]. We further assume a zero Neumann boundary condition.

K FLUID DYNAMICS

K.1 CONTINUOUS PDE

$$\frac{\partial u}{\partial t} + (u \cdot \nabla)u = -\nabla p + \nu \nabla^2 u + f_{\text{ext}}, \qquad \nabla \cdot u = 0$$

We study the incompressible Navier-Stokes equations, where u is the velocity field, p is the pressure field, and ν is the viscosity. We assume there is no external force, i.e., f_ext = 0. We assume a no-penetration boundary condition for the velocity and a zero-Dirichlet boundary condition for the pressure.

K.2 FULL-ORDER MODEL

The full-order model assumes an operator-splitting scheme following the classic Chorin projection method (Chorin, 1968). In particular, we follow the implementation by Stam (1999). We summarize the major ingredients and refer to his paper for details. We sequentially apply 3 linear operators to the velocity field: diffusion, advection, and projection. From the previous time step's velocity field $u^n$, we apply the diffusion operator according to the viscosity and obtain the diffused velocity $u^n_{\text{visc}}$. Next, we employ semi-Lagrangian advection to obtain the advected velocity field $u^n_{\text{adv}}$. For the last projection step, we first compute the pressure $p^{n+1}$ by solving the Poisson equation $\nabla^2 p^{n+1} = \nabla \cdot u^n_{\text{adv}}$. Afterwards, we apply the pressure gradient to obtain the divergence-free velocity $u^{n+1}$. All operations are done on a 2D 16-by-16 Eulerian grid.
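The projection step can be sketched as follows; a Jacobi relaxation on a collocated grid is assumed here for brevity, whereas Stam's implementation differs in grid layout and solver, and boundary handling is omitted:

```python
import numpy as np

def project(u, v, dx, iters=200):
    """Pressure-projection sketch on a 2D collocated grid: solve
    laplacian(p) = div(u_adv) with Jacobi iterations, then subtract
    grad(p) to make the velocity (approximately) divergence-free.
    Diffusion/advection operators and boundary handling are omitted."""
    div = np.zeros_like(u)
    div[1:-1, 1:-1] = ((u[1:-1, 2:] - u[1:-1, :-2]) +
                       (v[2:, 1:-1] - v[:-2, 1:-1])) / (2.0 * dx)
    p = np.zeros_like(u)
    for _ in range(iters):                  # Jacobi relaxation for Poisson
        p[1:-1, 1:-1] = (p[1:-1, 2:] + p[1:-1, :-2] +
                         p[2:, 1:-1] + p[:-2, 1:-1]
                         - dx**2 * div[1:-1, 1:-1]) / 4.0
    u, v = u.copy(), v.copy()
    u[1:-1, 1:-1] -= (p[1:-1, 2:] - p[1:-1, :-2]) / (2.0 * dx)
    v[1:-1, 1:-1] -= (p[2:, 1:-1] - p[:-2, 1:-1]) / (2.0 * dx)
    return u, v
```

A divergence-free field passes through unchanged, while a divergent field has its interior divergence largely removed.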

K.3 REDUCED-ORDER MODEL

We compute the latent space dynamics ($q^{n+1}$) of the velocity field by projecting $u^{n+1}$ onto the manifold-parameterization function $g_{\theta_g}$. For network hyperparameters, we adopt a latent space dimension r = 6 and an MLP width β = 10. Notice how the extrapolated region has a larger range (0.08) than the interpolated region (0.02). All training and testing data adopt the same initial condition (see Figure 23).

K.5 RESULTS

Figure 23 demonstrates our method's ability to qualitatively capture fluid dynamics with different viscosities on both interpolated and extrapolated data. Surrogate models that predict velocity fields directly from viscosities (e.g., using a neural network) typically cannot handle extrapolation. By contrast, our approach handles extrapolated scenarios because we only approximate the spatial representation (kinematics) with a neural network, not the PDE itself. Therefore, as long as the extrapolated cases involve vector fields that can be represented by the low-dimensional implicit neural representation, our reduced-order solver is able to evolve the latent space vector to capture them according to the PDE. However, if the extrapolated quantity lies outside the space spanned by the manifold-parameterization function, our approach is not able to capture the dynamics. CROM, like most ROM approaches, does not handle arbitrary extrapolation scenarios. See more discussion on generalizability at the end of the main text. To further test the proposed method, we set up a Karman vortex street experiment. The Karman vortex street is particularly challenging as the flow bifurcates from a steady state to the periodic vortex shedding regime. Mathematically speaking, the fixed-point solution undergoes a Hopf bifurcation and becomes a limit cycle (Huerre & Monkewitz, 1990). To capture such an effect, the latent space dynamics needs to be robust under extreme nonlinearity, a regime where many neural-network-based methods struggle.

L.2 FULL-ORDER MODEL

We adopt the tetrahedral-mesh-based linear finite element method (FEM) for the full-order model. We closely follow the course notes by Sifakis & Barbic (2012). Here we review only the salient features and refer to their work for theoretical and practical details. We spatially discretize the domain of interest using linear tetrahedra.
For each tetrahedron, we have four undeformed vertex positions $(X_1, \ldots, X_4)$ and four deformed vertex positions $(x_1, \ldots, x_4)$. For each vertex, we have $x_i = F X_i + b$, where b is the translation. Consequently, we can obtain the deformation gradient F of each tetrahedron by solving $D_s = F D_m$, where $D_s = [x_1 - x_4 \;\; x_2 - x_4 \;\; x_3 - x_4]$ and $D_m = [X_1 - X_4 \;\; X_2 - X_4 \;\; X_3 - X_4]$. Furthermore, the internal elastic forces on the vertices $(f_1, \ldots, f_4)$ can be computed through

$$[f_1 \; f_2 \; f_3] = -\frac{1}{6}\,|\det D_m|\, P(F)\, D_m^{-T}, \qquad f_4 = -f_1 - f_2 - f_3.$$

L.3 REDUCED-ORDER MODEL

Since the initial condition is known for a given problem, we opt to construct a manifold-parameterization function for the displacement field u(X, t) = ϕ - X. The deformation map can then be computed via ϕ = u + X. The hyperparameters of the manifold-parameterization function are r = 2 and β = 20. Unlike the previously discussed PDEs, the elastodynamics equation is second-order in time. Consequently, in addition to the function value of the manifold, we also need to infer the temporal derivative during latent space dynamics. We do so by employing the tangent space of the manifold, $\dot{\phi}(x, t^n) = \frac{\partial g_{\theta_g}}{\partial q}\,\dot{q}^n$, where $\dot{q}^n = (q^n - q^{n-1})/\Delta t$ and $\dot{q}^0 = 0$. To compute the acceleration (and increment the velocity) of a vertex belonging to the integration sample set M, we need to evaluate the internal force at this particular vertex. We can calculate the vertex force by accumulating the internal forces of the tetrahedra incident on this particular vertex. Tetrahedron forces are computed using the formula described in the previous section. To attain optimal gradient accuracy (Appendix E.2), we opt to use the finite element linear basis functions to compute the deformation gradient F of each tetrahedron.
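The per-tetrahedron quantities above can be sketched as follows; `P_func` is a hypothetical stand-in for the constitutive law mapping F to the first Piola-Kirchhoff stress:

```python
import numpy as np

def tet_forces(X, x, P_func):
    """Internal elastic forces of one linear tetrahedron.
    X, x : 4x3 arrays of undeformed / deformed vertex positions.
    P_func : constitutive law mapping the deformation gradient F to the
             first Piola-Kirchhoff stress (a hypothetical argument here).
    Implements Ds = F Dm, [f1 f2 f3] = -(1/6)|det Dm| P(F) Dm^{-T},
    and f4 = -f1 - f2 - f3."""
    Dm = np.column_stack([X[0] - X[3], X[1] - X[3], X[2] - X[3]])
    Ds = np.column_stack([x[0] - x[3], x[1] - x[3], x[2] - x[3]])
    Dm_inv = np.linalg.inv(Dm)
    F = Ds @ Dm_inv                                   # deformation gradient
    H = -np.abs(np.linalg.det(Dm)) / 6.0 * P_func(F) @ Dm_inv.T
    f123 = H.T                                        # rows are f1, f2, f3
    return np.vstack([f123, -f123.sum(axis=0)])       # append f4
```

By construction the four forces sum to zero, and an undeformed element (F = I with a stress-free constitutive law) produces no force.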
To compute these internal forces and deformation gradients, we also need to obtain the position information (via the manifold-parameterization function) of all vertices in the one-ring neighborhoods of the integration samples. We refer to the set of these neighboring vertices as N. Therefore, the total number of spatial samples in the reduced-order model becomes |M| + |N|. Since one-ring neighbors entail only a small subset of the original degrees of freedom, we still offer a significant spatial sample reduction from the full-order model (P).

L.4 TRAINING AND TESTING DATA

L.4.1 GRAVITY-INDUCED IMPACT

An initially static rectangular deformable object (P = 2,065 vertices and 9,346 tetrahedra) accelerates due to downward gravity. After impacting the collision objects underneath, the material undergoes intense deformation. We generate training and testing data by sampling the shear modulus μ ∈ D = [60,000, 70,000] ⊂ R. The first Lamé parameter λ is fixed to zero, in which case the corotated linear elasticity constitutive law simplifies to the As-Rigid-As-Possible energy (Sorkine & Alexa, 2007). Specifically, we uniformly generate 4 samples of the shear modulus for training data. We simulate 200 time steps for each shear modulus; therefore, a total of 800 time steps are used for training. We then randomly generate 3 samples of the shear modulus from D. We also simulate 200 time steps for these shear moduli and use them for testing. We use the same initial condition for training and testing.

Figure 25: CROM agrees well with the ground truth simulation while consuming far less memory than prior discretization-dependent ROM approaches, including POD (Barbič, 2012) and neural-network-based autoencoder approaches (Fulton et al., 2019; Shen et al., 2021); the figure annotations report 32×, 542×, and 11× higher memory usage than ours for the compared methods. The experimental setup is taken from Shen et al. (2021).

M COMPARISON WITH PRIOR APPROACHES

We compare our approach with prior reduced-order approaches that generate fast PDE solutions via latent space dynamics. In total, we compare with four baselines, including the classic, widely adopted proper orthogonal decomposition (POD) method (Berkooz et al., 1993; Holmes et al., 2012) and neural-network-based autoencoders. Our POD implementation follows the PCA approach of Barbič & James (2005),

$$\vec{f}(t) = \left(f(x_1, t), \ldots, f(x_P, t)\right)^T = U q(t),$$

where U is a matrix of size Pd by r. U is most easily constructed via a singular value decomposition of the discretized training data (Sifakis & Barbic, 2012). Autoencoders have the same input and output as POD; however, the linear basis is replaced with nonlinear neural networks,

$$\vec{f}(t) = \left(f(x_1, t), \ldots, f(x_i, t), \ldots, f(x_P, t)\right)^T = h(q(t)),$$

where h is the decoder portion of the autoencoder structure. Fulton et al. (2019); Shen et al. (2021); Lee & Carlberg (2020) all follow this formulation but differ in their particular architecture choices. We implement both POD and the autoencoder approaches in the same PyTorch framework as CROM. These approaches share exactly the same PDE time-stepping algorithm (step 2) during latent space dynamics. For POD and the autoencoder approaches, the network inference step (step 1) and the network inversion step (step 3) employ the linear basis U and the nonlinear decoder function h, respectively.

M.1 SPEED

We measure the wall-clock computation time during the falling stage (before contact) of the gravity-induced impact experiment (see Appendix L.4.1 and Figure 8). In particular, since all model reduction methods (including our approach, POD, and the autoencoder approaches) share the same PDE time-stepping algorithm (step 2), we measure its wall-clock computation cost to ensure the generalizability of the comparison. Since POD and the autoencoder approach are baked into a particular discretization, they do not allow for adaptive discretization. Therefore, the high-resolution mesh (2,065 vertices, 9,346 tetrahedra) has to be used throughout the falling stage even though there is minimal deformation. Consequently, their computation cost is significantly higher than that of our approach, which employs a low-resolution mesh (8 vertices, 5 tetrahedra) during the falling stage. Figure 26 shows that our approach is faster. Furthermore, Figure 5 shows that our method is more accurate than POD on the image processing experiment, both visually and quantitatively. Like prior ROM methods, our goal is not to optimize training time but to achieve efficient latent space dynamics. Nevertheless, specifically for our implicit neural representation approach, future work may consider expediting training time via more advanced data structures and optimization (Müller et al., 2022; Liu et al., 2020; Martel et al., 2021; Takikawa et al., 2021).

N COMPARISON WITH THE FULL-ORDER MODEL

To gauge the practical performance of our approach over the (unreduced) full-order model, we measure the wall-clock performance of our reduced-order model and the full-order model. To facilitate a fair comparison, we implement, optimize, and (fully) parallelize the full-order PDE solution and the reduced-order approach within the PyTorch framework (Paszke et al., 2019) without any other dependency. The full-order model and the reduced model share the same (parallelized) PDE solver code (step 2); the only difference between them is the neural network evaluation in the reduced model. To encourage fair comparison, for operations involving neural networks (step 1 and step 3), we do not use customized SIMD vectorizations, CUDA kernels, or highly optimized inference libraries such as TensorRT (Vanholder, 2016), though these optimizations should lead to a further performance gain for our algorithm. In addition, having both models implemented in the same framework allows us to compare them under the same computing environments, avoiding biased comparisons where the full-order model runs on a significantly limited computing resource (e.g., an 8-core CPU) while the reduced-order model runs on a high-performance computing device (e.g., a 5,120-core GPU). Figure 29 reports the wall-clock time of our reduced-order method and the full-order method across different computing platforms. While our approach is significantly faster on all computing platforms, its strongest speedup is obtained on CPUs. This is unsurprising since the computationally expensive full-order model benefits more from high-end data-center GPUs (e.g., V100) than the reduced-order model. In addition, because the neural network in the reduced-order model only employs a few 60-by-60 matrices (d = 3, β = 20), inference on the CPU is also efficient. This opens doors for employing our models on limited computing platforms such as mobile and VR devices.
Figure 31 reports the accuracy and speed of our approach under different setups: different MLP layer widths (β), neural network inversion methods (nonlinear least squares (NLS) vs. linear least squares (LLS)), computing platforms (CPU vs. GPU), and integration sample sizes (|M|). In comparison, the ground truth full-order model takes 139.855s on CPUs and 24.086s on GPUs (see Figure 29). Our approach obtains significant speedups with less than 5% error under various settings. In general, we observe similar accuracies between NLS and LLS. Figure 30 further breaks down the computation time spent on each component of the reduced-order model. While the overhead of the neural network operations (step 1 and step 3) is non-negligible, the bottleneck of the algorithm remains the PDE time-stepping (step 2). In particular, the network inversion cost is very low, thanks to the linearized inversion solver discussed in Appendix C.

O LATENT-SPACE TRAJECTORY

A key element of CROM is latent space dynamics, in which the latent space vector evolves nonlinearly over time under explicit PDE constraints. Figure 32 demonstrates a nonlinear latent space trajectory from the image smoothing experiment. Guided by the PDE, the latent space vector traverses the latent space and achieves the desired smoothing effect in the image (see Figure 33). After the simulation finishes, as a post-processing experiment, we linearly interpolate and extrapolate inside the latent space (see Figure 32). The interpolation endpoints are the latent vector of the unmodified image and the latent vector of the final image determined by the latent space dynamics. Figure 34 demonstrates the corresponding smooth transitions in the image space.
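As a sketch, this post-processing step only manipulates latent vectors and re-decodes them; the decoder passed in below is an arbitrary stand-in for the trained network g_θ, not the paper's architecture:

```python
import numpy as np

def lerp_latent(q0, q1, alphas):
    """Linearly interpolate (alpha in [0, 1]) or extrapolate (alpha outside
    [0, 1]) between two latent endpoint vectors, as in the post-processing
    experiment of Figures 32 and 34."""
    q0, q1 = np.asarray(q0, float), np.asarray(q1, float)
    return [(1.0 - a) * q0 + a * q1 for a in alphas]

def decode_field(g, xs, q):
    """Evaluate a decoder g(x, q) at spatial samples xs for latent vector q.
    g is any callable here; in CROM it would be the trained network."""
    return np.array([g(x, q) for x in xs])
```

Each interpolated or extrapolated latent vector is decoded independently, which is why the resulting images transition smoothly: the decoder is continuous in q.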

P STABILITY ANALYSIS

Figure 35 systematically studies the stability of our method. CROM exhibits much larger stable time step sizes than its unreduced counterpart. This is consistent with the theoretical analysis reported in the literature (Bach et al., 2018) . 

Q LATENT-SPACE DYNAMICS PSEUDOCODE

Algorithm 3: Latent-space dynamics
Input: Latent space vector q^n
Output: Next-time-step latent space vector q^{n+1}
1. Network inference: gather full-space information (Algorithm 4).
2. PDE time-stepping with boundary conditions: update the full-space information via the differential equations, e.g., the elastodynamics equation (more details available from Appendix G to Appendix L).
3. Network inversion: optimally project the updated full-space information onto the low-dimensional embedding (Algorithm 5).

Algorithm 4: Network inference: gather full-space information
Input: Latent space vector q^n
Output: Full-space information f^n, ∇f^n, ḟ^n
for y ∈ M do
    Compute the vector field value: f(y, t^n) = g_θg(y, q^n).
    Compute the temporal gradient either by differentiating the network, ḟ(y, t^n) = (∂g_θg/∂q) q̇^n, where q̇^n = (q^n − q^{n−1})/∆t, or by numerical approximation (Appendix E).
    Compute the spatial gradient either by differentiating the network, ∇f(y, t^n) = ∇_y g_θg, or by numerical approximation (Appendix E).
end

Algorithm 5: Network inversion: optimally project the updated full-space information onto the low-dimensional embedding
Input: Updated full-space information f^{n+1}
Output: Updated latent space vector q^{n+1}
Find the latent space vector q^{n+1} that best matches the evolved configuration f^{n+1} by solving the least-squares problem
    min_{q^{n+1} ∈ R^r} Σ_{y ∈ M} ∥g_θg(y, q^{n+1}) − f(y, t^{n+1})∥₂².
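The three steps of Algorithm 3 can be sketched in a few lines. The decoder g, the integration samples, and the explicit right-hand side below are stand-ins for the trained network and the per-PDE solvers of Appendices G to L; the inversion uses a finite-difference Jacobian to stay self-contained, whereas the paper differentiates the network directly:

```python
import numpy as np

def latent_step(g, q, xs, pde_rhs, dt, eps=1e-6):
    """One reduced-order time step (Algorithm 3), as a minimal sketch.
    g(x, q): decoder mapping (sample, latent vector) -> field value.
    pde_rhs(xs, f): time derivative of the field at the samples."""
    q = np.asarray(q, float)
    # Step 1 -- network inference: gather field values at integration samples M.
    f = np.array([g(x, q) for x in xs])
    # Step 2 -- explicit PDE time-stepping on the sampled values.
    f_new = f + dt * pde_rhs(xs, f)
    # Step 3 -- linearized network inversion (LLS): one least-squares solve
    # J dq = f_new - f, with the decoder Jacobian w.r.t. q approximated
    # here by finite differences (an assumption of this sketch).
    r = len(q)
    J = np.zeros((f.size, r))
    for j in range(r):
        dq = np.zeros(r)
        dq[j] = eps
        f_pert = np.array([g(x, q + dq) for x in xs])
        J[:, j] = (f_pert - f).ravel() / eps
    delta, *_ = np.linalg.lstsq(J, (f_new - f).ravel(), rcond=None)
    return q + delta
```

Because only the samples in M are ever decoded, the cost of a step is independent of the full discretization size P.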



The latent space vector is also known as the feature, subspace, or state vector; or the generalized coordinates.




Figure 3: Latent space dynamics: temporally evolve from one latent space vector to another, governed by the PDE. The entire pipeline only involves degrees of freedom from a small spatial subset M, where |M| ≪ P .

Figure 4: Thermodynamics. Left: integration samples (M). Greedily selecting the samples (blue) allows us to use significantly fewer degrees of freedom than the full-order simulation (purple) while obtaining much higher accuracy than naive uniform sampling (red). The depicted field is temperature governed by the heat equation after 100 time steps. Right: as dictated by the PDE, the reduced system is conservative: total energy (stored thermal energy plus cumulative flux at the boundary) is conserved.

Figure 5: Image processing: comparison with POD. The ground truth solution uses P = 65,536 pixels. (a) Visually, our approach better captures the sharp initial state and the smoothed final state than POD. (b) Quantitatively, CROM obtains a higher PSNR than POD (↑ the higher the better). (c) Our approach also uses an order of magnitude less memory than POD (↓ the lower the better). Both POD and our approach use the same latent space dimension (r = 3).

Figure 6: Breaking the Kolmogorov barrier: CROM outperforms POD and convolutional autoencoders (Lee & Carlberg, 2020) in tracking both the Advection and Burgers' trajectories. For both cases, we use the intrinsic solution-manifold dimension (Lee & Carlberg, 2020), the lower bound of r, as the latent space dimension r. To isolate the source of error, no hyper-reduction method is applied.

Figure 7: The Karman vortex street. Our latent space dynamics solver accurately captures the flow both before and after the bifurcation: the steady state and the periodic vortex shedding regime. The visualized quantity is the curl of the velocity field. Our approach even generalizes well beyond the training temporal range, thanks to the kinematics-approximation-only nature of CROM.

Figure 8: Adaptive discretization in solid mechanics. A falling deformable body impacts two static objects. (a and b) Prior approaches (Barbič & James, 2005; Fulton et al., 2019; Shen et al., 2021) only support reduced-order dynamics with fixed discretizations, forcing a tough choice between accuracy and computational cost; (c) our approach allows dynamic mesh adaptivity to balance accuracy and cost as problem difficulty varies over time. (d) Our resolution-independent encoding of the deformation map allows us to visualize the field using any method (e.g., a high-resolution mesh) without regard to the computational mesh. CROM's speedup over prior discretization-dependent ROM methods is observed for 1-32 threads and is most pronounced when computational resources are limited. To ensure the comparison is representative of prior ROM architectures, we report timing strictly for the PDE-time-stepping stage (step 2), which all prior ROM architectures share.

Figure 9: Solid mechanics. (a) The ground truth is generated via the full-order PDE solver. (b) Our approach is 41-109× faster than the ground truth while capturing detailed shearing and volume-preserving behaviors (1.46% error). (c) The prior approach (Barbič, 2012) of the same latent space dimension consumes 26× more memory while suffering from volume-gain artifacts (5.76% error). (d) These simulations adopt a tetrahedral discretization (d, pink mesh, P = 66,608). Instead of using the expensive high-resolution mesh, our approach computes dynamics using very few integration samples (d, colorful spheres, |M| = 40). (e) Naive down-resolution of the ground truth simulation yields a similar runtime but leads to significantly worse quality. (f) After training, our model can capture a wide range of material properties. (g) The same manifold-parameterization function architecture (different network weights) can also be used for model-reducing point-cloud-based simulation (reproduced from the work by Chen et al. (2023)). The speedup plot on the right demonstrates the effectiveness of our approach on diverse computing platforms. Disclaimer: the authors do not support animal cruelty.

Robust sampling
M = {x_k}  // randomly initialize the integration sample set, k ∈ {1, . . . , P}
while true do
    res_vec = Calculate_Residual(M)
    if Metric(res_vec) < target_accuracy then
        return M
    else
        indices = Max_Indices(res_vec, Q)  // find the Q indices with the largest individual residuals
        add the corresponding samples {x_indices} to M
end

Function Metric(res_vec):
    return mean(res_vec) + max(res_vec)
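A runnable sketch of this greedy loop, assuming a user-supplied residual oracle that scores all P candidate samples given the current set M (Calculate_Residual, Metric, and Max_Indices correspond to residuals_fn, the mean-plus-max test, and the argsort below):

```python
import numpy as np

def robust_sampling(residuals_fn, num_candidates, q_add, target,
                    max_iters=100, seed=0):
    """Greedy integration-sample selection, a minimal sketch of the
    pseudocode above. residuals_fn(M) returns a per-candidate residual
    vector of length num_candidates; the q_add candidates with the
    largest individual residuals are added to M each iteration."""
    rng = np.random.default_rng(seed)
    # Randomly initialize the integration sample set.
    M = set(rng.choice(num_candidates, size=q_add, replace=False).tolist())
    for _ in range(max_iters):
        res = residuals_fn(sorted(M))
        # Metric(res_vec) = mean(res_vec) + max(res_vec), as in the pseudocode.
        if np.mean(res) + np.max(res) < target:
            break
        worst = np.argsort(res)[-q_add:]  # Q largest individual residuals
        M.update(int(i) for i in worst)
    return sorted(M)
```

The mean term keeps the average fit tight while the max term guards against isolated outlier residuals, which is why greedy selection beats naive uniform sampling in Figure 4.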

Figure 11: Gradient errors evaluated at the centers of tetrahedra. Numerical differentiation (specifically, finite element with linear basis) significantly improves the gradient accuracy of SIREN.

Figure 12: Reconstruction accuracy vs. the MLP layer width (β). For each setup, we repeat the training 8 times. Mean values and error bars are shown. Increasing the MLP layer width leads to higher accuracy.

Figure 13: Network inference cost (CPU and GPU) vs. the MLP layer width (β). Inference cost (over 100 runs) increases as the network size increases. The CPU suffers a more severe performance drop than the GPU.

Figure 14: PDE solution accuracy vs. the number of samples. Increasing the integration sample count yields higher accuracy on both training and testing data.

G.3 REDUCED-ORDER MODEL

Instead of a finite-difference treatment of the spatial gradient, we compute the gradient via direct differentiation of the network. The latent space dimension is r = 16 and the MLP width is β = 128.

G.4 TRAINING AND TESTING DATA

We generate training data by setting ν to different piecewise constant functions of three regions, i.e., the parameter vector µ ∈ D = [0.2, 1.0]³ ⊂ R³. Figure 15 displays the initial and final states of the training data. Notice how different regions diffuse at different speeds due to the spatially varying ν. In total, 8 PDE temporal sequences (of 100 time steps each) with different ν's are generated for training.

Figure 15: Thermodynamics: training data.

Figure 16: Thermodynamics: reduced-order simulation on the testing dataset.

Figure 18: Image processing: training dataset. The blurred regions are numbered.

Figure 19: Image Processing: our reduced-order simulation successfully blurs the target regions in the testing dataset. The blurred region is numbered.

Figure 20: Blurring of portions of the image (20 time steps). (b) The ground truth solution uses all P = 65,536 pixels. (c) Our approach uses very few integration samples (|M| = 63, blue circles) and obtains a result similar to the ground truth (PSNR 36.1). (d) Naive uniform sampling (|M| = 64, blue circles) leads to poor agreement with the ground truth (PSNR 23.4).

TRAINING AND TESTING DATA

We generate training data by varying the spatially constant viscosity: µ = ν ∈ D = [0, 0.02] ⊂ R. We sample 3 different viscosities from D. For each viscosity ν, we run a simulation of 50 time steps. In total, 150 time steps are used for training. For testing, we first sample 3 viscosity values from D = [0, 0.02]. Furthermore, we also sample 3 values from the extrapolated region D_extrapolate = [0.02, 0.1].

Figure 27: Memory comparison between our CROM and Lee & Carlberg (2020)

Figure 28: Training time for the elasticity example presented in Figure 25. POD (Barbič, 2012) uses significantly less time than neural-network-based methods, including those of Fulton et al. (2019), Shen et al. (2021), and ours.

Figure 28 reports the offline training time for different ROM approaches. All neural-network-based methods, including ours, take longer to train than the linear POD approach. We emphasize that this timing data is purely informative and was collected post-hoc. The main target of our work is not to reduce offline training time but to accelerate the online simulation.


Figure 31 reports the accuracy and speed of our approach using different MLP layer widths (β), neural network inversion methods (nonlinear least squares (NLS) vs. linear least squares (LLS)), computing platforms (CPU vs. GPU), and integration sample sizes (|M|). In general, we observe similar accuracies between NLS and LLS (e.g., [β = 20, LLS, GPUs, |M| = 40] vs. [β = 20, NLS, GPUs, |M| = 40]), while LLS is significantly faster. Increasing the sample size leads to higher accuracy but also requires longer computation time (e.g., [β = 5, NLS, CPUs, |M| = 10] vs. [β = 5, NLS, CPUs, |M| = 40]). Increasing the MLP layer width yields higher accuracy but also increases the computation time; the increase is larger on CPUs (e.g., [β = 5, LLS, CPUs, |M| = 40] vs. [β = 20, LLS, CPUs, |M| = 40]) than on GPUs.
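The NLS/LLS relationship can be made concrete from a Gauss-Newton perspective: LLS corresponds to a single linearized solve, while NLS iterates the same update to convergence. The decoder and Jacobian below are illustrative stand-ins, not the paper's networks:

```python
import numpy as np

def invert(g_vec, jac, q0, f_target, iters):
    """Gauss-Newton inversion of a decoder: iters=1 plays the role of the
    linearized solver (LLS); iters > 1 gives a nonlinear least-squares
    solve (NLS). g_vec(q) stacks the decoder outputs at all integration
    samples; jac(q) is its Jacobian with respect to q."""
    q = np.asarray(q0, float)
    for _ in range(iters):
        residual = f_target - g_vec(q)
        dq, *_ = np.linalg.lstsq(jac(q), residual, rcond=None)
        q = q + dq
    return q
```

When a single PDE step changes the field only slightly, the first linearized step already lands very close to the NLS solution, which is consistent with the similar accuracies observed in Figure 31 at much lower cost.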

Figure 32: Latent-space trajectory. We visualize the first two principal components of the latent space vector. The red numbers displayed on top of the trajectory are time step numbers. The simulation starts at time step 0 and ends at 320. The corresponding images for these time steps are shown in Figure 33. Overall, we observe a smooth nonlinear trajectory over time. In addition, we also linearly interpolate and extrapolate inside the latent space; the corresponding images are displayed in Figure 34.

Figure 33: Reduced-order simulation of image smoothing at different time steps t. The corresponding latent space trajectory is visualized in Figure 32.

Figure 34: Linear interpolation and extrapolation of the latent space vector. We observe smooth transitions. The corresponding latent space vector values are shown in Figure 32.

Figure 35: Stability analysis. We systematically increase the time step size (∆t). Our method remains stable over a wide range of time step sizes and maintains an error close to 1%. By contrast, the full-order model (used for training data generation) goes unstable as the time step increases and suffers from major visual artifacts.

Since the manifold-parameterization function construction guarantees that r ≪ Pd, we obtain a substantial dimension reduction.

(1) the linear-subspace POD approach by Barbič & James (2005); Barbič (2012); (2) the neural-network-based autoencoder approach by Fulton et al. (2019); (3) the neural-network-based autoencoder approach by Shen et al. (2021); (4) the convolutional autoencoder (CAE) approach by Lee & Carlberg (2020). All four baselines are discretization-dependent.

Methods based on SIREN, PINNs, or GNNs are related to our work but are very different in nature. Their goal is to solve PDEs directly with neural network architectures, whereas ours is to leverage neural networks for dimension reduction. We therefore do not compare with them and focus on comparisons with other dimension reduction methods.

R REDUCED-ORDER MODEL'S STATISTICS

CROM obtains considerable dimension reductions across different PDEs (Table 1).

Table 1: Dimension reduction statistics of our reduced-order model.

ACKNOWLEDGMENTS

This work was supported in part by the National Science Foundation (Grant CBET-17-06689), Meta, Natural Sciences and Engineering Research Council of Canada (Discovery Grant), and SideFX. We thank Honglin Chen and Rundi Wu for their insightful discussions. We thank Rundi Wu for sharing his implementation of Stable Fluids. We thank Raymond Yun Fei for rendering advice. We thank Keenan Crane for the "Origins of the Pig" mesh.

Appendix

We highly encourage viewing the supplementary video, where our results are best illustrated.

A ADDITIONAL LITERATURE REVIEW

Figure 22: Reconstruction errors. Our approach yields higher accuracies than the convolutional autoencoder approach by Lee & Carlberg (2020) across different latent space dimensions. Our method's accuracy advantage is particularly noticeable at the intrinsic solution-manifold dimension (see the vertical dashed line).

J.2 FULL-ORDER MODEL

The full-order model discretizes the 1D spatial field with a regular grid using 256 equally spaced samples. The PDE is temporally discretized via forward Euler.
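A minimal sketch of such a full-order step, with an illustrative diffusion right-hand side standing in for this section's PDE (the exact right-hand side depends on the governing equation):

```python
import numpy as np

def forward_euler_step(f, rhs, dt):
    """One explicit (forward Euler) update of a discretized 1D field,
    mirroring the temporal discretization of the full-order model."""
    return f + dt * rhs(f)

def diffusion_rhs(f, dx, nu):
    """Illustrative right-hand side: 1D diffusion nu * d2f/dx2 with a
    second-order central stencil; boundary values copy their neighbors
    (a zero-flux assumption for this sketch)."""
    d2 = np.empty_like(f)
    d2[1:-1] = (f[2:] - 2.0 * f[1:-1] + f[:-2]) / dx ** 2
    d2[0], d2[-1] = d2[1], d2[-2]
    return nu * d2
```

With 256 equally spaced samples, dx = 1/255, and explicit stepping is stable only for dt below roughly dx²/(2ν), which is the constraint probed in the stability analysis of Appendix P.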

J.3 REDUCED-ORDER MODEL

We vary the latent space dimension r and use a fixed width of the MLP β = 64.

J.4 TRAINING AND TESTING DATA

We generate training and testing data by varying the parameter µ_D. We set the training parameters to µ_D,train = {0.015 + (0.015/7)j} for j = 0, 1, . . . , 7, resulting in 8 training cases. For online testing, we set the parameter to µ_D,test = 0.021.

J.5 RESULTS

Since we can uniquely parameterize the family of training and testing data with two variables, t and µ_D, the intrinsic solution-manifold dimension (Lee & Carlberg, 2020) is p* = 2, which is the lower bound for the latent space dimension r. Figure 6b reports the result for r = p* = 2. While both POD and the convolutional autoencoder (CAE) approach from Lee & Carlberg (2020) display clear artifacts, CROM agrees well with the ground truth. Such agreement at the latent space dimension lower bound (i.e., r = p*) shows that CROM comes very close to achieving the optimal performance of any nonlinear trial manifold. Furthermore, Figure 22 demonstrates the accuracy advantage of our approach over the CAE approach across different latent space dimensions.

Neural PDE solvers such as physics-informed neural networks (Raissi et al., 2019) are well known to fail in this case (Chuang & Barba, 2022). We follow the geometry and material parameters outlined by Nabizadeh et al. (2022). No hyper-reduction is considered. Figure 7 shows that our latent space dynamics method faithfully represents the ground truth flow: from the initial steady state to the bifurcated vortex shedding regime. The ground truth solver utilizes a 100-by-200 grid, with a total of 40,000 degrees of freedom. By contrast, our method captures the same level of detail while employing a latent space dimension of only r = 4. Due to CROM's kinematics-approximation-only nature, our approach also generalizes beyond the training temporal range. Future work should test CROM on more challenging boundary conditions (Copeland et al., 2022) and chaotic systems involving non-Markovian effects (Pan & Duraisamy, 2018).

L SOLID MECHANICS

L.1 CONTINUOUS PDE

We solve the elastodynamics equation arising from solid mechanics for deformable soft body modeling, where ϕ(X, t) maps the undeformed (reference) position X of an arbitrary material point from the reference configuration Ω to its deformed (current) position x at time t. Proper Dirichlet and Neumann boundary conditions are applied. We assume the first Piola-Kirchhoff stress P is strictly a function of the deformation gradient F = ∇ϕ. In particular, we adopt the corotated linear elasticity constitutive law, where R is the rotation tensor from the polar decomposition of the deformation gradient F = RS, and λ and µ are the first Lamé parameter (Pa) and the second Lamé parameter / shear modulus (Pa), respectively. The proposed framework also works with other elasticity constitutive laws and can easily be extended to capture hysteresis and inelasticity.
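For concreteness, the standard forms of the equations referenced here read as follows (a reconstruction of the commonly used corotated model; signs and constants follow convention rather than the paper's exact display):

```latex
% Elastodynamics (momentum balance in the reference configuration),
% with density \rho, body force b, and deformation gradient F:
\rho \, \ddot{\phi}(X, t) = \nabla_X \cdot P(F) + \rho \, b,
\qquad F = \nabla_X \phi .

% Corotated linear elasticity, with polar decomposition F = RS:
P(F) = 2\mu \, (F - R) + \lambda \, \operatorname{tr}\!\left(R^{\top} F - I\right) R .
```

Note that P(F) vanishes whenever F is a pure rotation (F = R), so rigid motions generate no stress, which is the defining property of the corotated model.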

L.4.2 TORSION AND TENSION

We study another common solid mechanics application where the material undergoes torsion and tension. The left and right boundaries kinematically move at constant translational and angular velocities of the same magnitude but in opposite directions. We solve the parameterized family of problems where µ = (µ, λ) ∈ D = [6200, 20000] × [10000, 65000] ⊂ R². We obtain 9 training samples via uniform full-factorial sampling of the parameter space. We simulate 100 time steps for each training sample. We then use the Latin hypercube method to randomly generate another 4 testing samples. We also simulate 100 time steps for these testing samples. Training and testing data adopt the same initial condition.
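The two sampling schemes above can be sketched as follows; the bounds match the parameter box D, while the Latin hypercube routine is a basic variant, not necessarily the exact sampler used in the experiments:

```python
import numpy as np

def full_factorial(lo, hi, n_per_dim):
    """Uniform full-factorial sampling of a box: n_per_dim levels per axis
    (n_per_dim = 3 in 2D yields the 9 training samples)."""
    axes = [np.linspace(l, h, n_per_dim) for l, h in zip(lo, hi)]
    grid = np.meshgrid(*axes, indexing="ij")
    return np.stack([g.ravel() for g in grid], axis=-1)

def latin_hypercube(lo, hi, n, seed=0):
    """Basic Latin hypercube sampling: one point per equal-probability bin
    along each dimension, with independently shuffled bins per axis."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    d = len(lo)
    bins = rng.permuted(np.tile(np.arange(n), (d, 1)), axis=1).T  # (n, d)
    u = (bins + rng.random((n, d))) / n                            # in (0, 1)
    return lo + u * (hi - lo)
```

Full-factorial sampling covers the corners of D evenly for training, while the Latin hypercube guarantees that the 4 testing samples are stratified along every parameter axis.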

L.5 RESULTS

Figure 8 demonstrates a testing case (µ = 62500, 2.68% error from the full-order ground truth) for the experiment from Appendix L.4.1. Since our method has a discretization-agnostic architecture, we can adapt the simulation resolution to improve efficiency. The adaptive simulation adopts a contact-based oracle that switches from the low-resolution mesh to the high-resolution mesh once a contact event is detected. Before the start of the simulation, we generated both the low-resolution and the high-resolution meshes using TetWild (Hu et al., 2018) with large and small ideal edge lengths, respectively.

After training on the experiment from Appendix L.4.2, our method captures a wide range of shearing and volume-preserving behaviors (see Figure 24).

Furthermore, to showcase CROM's ability to handle large deformations, we reproduce the dinosaur example by Shen et al. (2021). Figure 25 shows that CROM accurately captures the large deformations of the dinosaur. By contrast, under the same latent space dimension, POD has limited expressivity and suffers from visual artifacts: the head's and the tail's motions are constrained. This observation is consistent with the results reported by Shen et al. (2021), where nonlinear approaches outperform linear approaches. In terms of memory consumption, CROM is orders of magnitude more efficient than these prior discretization-dependent ROM approaches (Barbič, 2012; Fulton et al., 2019; Shen et al., 2021).

Our method is also faster than the prior ROM approach across different CPU thread counts. Its advantage is most obvious when computational resources are extremely limited (1 CPU thread). Since this speedup comes from the discretization-independent nature of CROM, we expect it to generalize to other prior discretization-dependent ROM approaches as well. However, for applications where remeshing is unnecessary, we do not expect this kind of speedup.

Hyper-reduction techniques (An et al., 2008), such as the integration samples introduced in our approach, can also be employed to speed up the high-resolution mesh solution by reducing spatial samples. However, this is orthogonal to our adaptive discretization contribution, and hyper-reduction can be employed to further accelerate the low-resolution mesh solution as well.

M.2 MEMORY

We compare the memory consumption of our method, POD, and the autoencoder approaches at the same latent space dimension (r), all trained on data from the torsion and tension experiment (see Appendix L.4.2). For all implicit neural representation networks, we set the MLP width to β = 20. Independent of the latent space dimension (r), our network has near-constant memory consumption (see Figure 10). By contrast, the memory consumption of POD and the autoencoder approaches scales linearly with r. More importantly, since the output dimension of POD and the autoencoders is Pd, their memory consumption scales linearly with the number of discretized positions P. By contrast, our approach's memory consumption is independent of the number of spatial samples. Consequently, in this large-scale example featuring P = 66,608 vertices, we observe a more-than-tenfold advantage with our method across all latent space dimensions (see Figure 10).

In addition, we also compare our method with POD on the image processing experiment (see Appendix H): CROM uses an order of magnitude less memory than POD (see Figure 5c). In Figure 27, we also demonstrate CROM's memory advantage over the convolutional autoencoder approach by Lee & Carlberg (2020).

Even though the overall memory consumption of our approach is significantly lower, its per-sample evaluation cost is higher. For any given discretized spatial sample, POD only requires a matrix-vector multiplication with a d-by-r matrix. By contrast, our approach must evaluate the entire MLP, which employs dβ-by-dβ matrices. The autoencoder approaches incur similar computational overheads.
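The scaling argument can be made concrete with rough parameter counts; the five-hidden-layer decoder below is an assumption made for this sketch, not the paper's exact architecture:

```python
def pod_params(P, d, r):
    """POD stores a dense (P*d)-by-r basis matrix, so its memory scales
    linearly with both the discretization size P and the latent dim r."""
    return P * d * r

def crom_mlp_params(d, beta, r, hidden_layers=5):
    """Illustrative parameter count for a CROM-style decoder MLP with
    hidden width d*beta (hidden_layers is an assumption): independent of
    P, and r only affects the small first layer."""
    w = d * beta
    n = (d + r) * w + w               # input layer: (x, q) concatenated, plus bias
    n += hidden_layers * (w * w + w)  # square hidden layers
    n += w * d + d                    # output layer back to field dimension d
    return n
```

With P = 66,608, d = 3, r = 6, and β = 20, the POD basis holds about 1.2 million entries while the sketched MLP holds roughly twenty thousand, consistent with the more-than-tenfold gap reported above.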

M.3 ACCURACY

Figure 10 also reports the accuracy of the networks studied in the previous memory section. Our method consistently achieves orders-of-magnitude smaller training errors. This accuracy advantage is especially noticeable at small latent space dimensions (e.g., r = 2). Further increasing the latent space dimension improves the accuracy of POD and the autoencoder approaches but not of our approach: as discussed in Appendix F, our approach's accuracy depends more saliently on the MLP width than on the latent space dimension. Furthermore, unlike these discretization-dependent approaches, our continuous manifold-parameterization function also facilitates accurate handling of adaptive spatiotemporal data (Pan et al., 2023).

