MULTI-GRID TENSORIZED FOURIER NEURAL OPERATOR FOR HIGH-RESOLUTION PDES

Anonymous

Abstract

Memory complexity and data scarcity are two pressing challenges in learning solution operators of partial differential equations (PDEs) at high resolutions. These challenges have limited prior neural operator models to low- and mid-resolution problems rather than full-scale real-world ones. Yet such problems possess spatially local structure that previous approaches do not exploit. We propose to use this natural structure of real-world phenomena to predict solutions locally and unite them into a global solution. Specifically, we introduce a neural operator that scales to large resolutions by leveraging local and global structure through a decomposition of both the input domain and the operator's parameter space. The result is the multi-grid tensorized Fourier neural operator (MG-TFNO), a new data-efficient and highly parallelizable operator-learning approach with reduced memory requirements and better generalization. MG-TFNO employs a novel multi-grid domain-decomposition approach to exploit the spatially local structure in the data. Using the FNO as a backbone, its parameters are represented in a high-order latent subspace of the Fourier domain through a global tensor factorization, resulting in an extreme reduction in the number of parameters and improved generalization. In addition, the low-rank regularization this factorization applies to the parameters enables efficient learning in low-data regimes, which is particularly relevant for solving PDEs, where obtaining ground-truth solutions is extremely costly and samples are therefore limited. We empirically verify the efficiency of our method on the turbulent Navier-Stokes equations, where we demonstrate superior performance with 2.5× lower error, 10× compression of the model parameters, and 1.8× compression of the input domain size. Our tensorization approach yields up to a 400× reduction in the number of parameters without loss of accuracy. Similarly, our domain-decomposition method gives a 7× reduction in the domain size while slightly improving accuracy. Furthermore, our method can be trained with far fewer samples than previous approaches, outperforming the FNO when trained with just half the samples.

1. INTRODUCTION

Real-world scientific computing problems often require repeatedly solving large-scale, high-resolution partial differential equations (PDEs). For instance, in weather forecasting, large systems of differential equations are solved to predict the future state of the weather. Due to inherent and aleatoric uncertainties, weather scientists carry out multiple repeated runs every day to quantify prediction uncertainties. Conventional PDE solvers constitute the mainstream approach to such computational problems. However, these methods are slow and memory-intensive: they require an immense amount of computing power, are unable to learn and adapt based on observed data, and often require sophisticated tuning (Slingo & Palmer, 2011; Leutbecher & Palmer, 2008; Blanusa et al., 2022). Neural operators are a new class of models that aim to tackle these challenging problems (Li et al., 2020b). They are mappings between function spaces whose trained models emulate the solution operators of PDEs (Kovachki et al., 2021b). In the context of PDEs, these deep learning models are orders of magnitude faster than conventional solvers, can easily learn from data, can incorporate physically relevant information, and have recently enabled solving problems deemed unsolvable with current PDE methodologies (Liu et al., 2022; Li et al., 2021b).

Figure 1: Overview of our approach. First (left), a multi-grid approach is used to create coarse-to-fine inputs that capture high-resolution details in a local region while still encoding global context. The resulting regions are fed to a tensorized Fourier operator (middle), the parameters of which are jointly represented in a single latent space via a low-rank tensor factorization (here, a Tucker form); F denotes the Fourier transform. Finally, the outputs (right) are stitched back together to form the full result. Smoothness in the output is ensured via the choice of loss function.

Method              L2 test error   # Params   Model CR   Domain CR
FNO                 2.54%           58 M       0×         0×
TFNO [Tucker]       1.39%           41 M       1.5×       0×
TFNO [CP]           2.24%           130 K      482×       0×
MG-FNO              1.43%           58 M       0×         1.4×
MG-TFNO [Tucker]    0.85%           5.5 M      10×        1.78×
MG-TFNO [Tucker]    1.89%           5.5 M      10×        7×

Table 1: Overview of the relative L2 test error of our MG-TFNO approach on Navier-Stokes, compared with its components TFNO and MG-FNO and the regular FNO. CR stands for compression ratio. For each method, we report the relative L2 error, the number of parameters, and the compression ratios for both the model parameters and the input domain. Tensorization and multi-grid domain decomposition each individually improve performance while enabling space savings. Combined, they lead to further improvements, enabling large compression of both input and parameters while outperforming all other approaches.

Among neural operator models, Fourier neural operators (FNOs) in particular have seen successful application in scientific computing for learning the solution operators of PDEs, as well as in computer vision for classification, in-painting, and segmentation (Li et al., 2020a; Kovachki et al., 2021a; Guibas et al., 2021). By leveraging spectral theory, FNOs have advanced frontiers in weather forecasting, carbon storage, and seismology (Pathak et al., 2022; Wen et al., 2022; Yang et al., 2021). While FNOs have shown tremendous speed-ups over classical numerical methods, their efficacy can be limited by the rapid growth in memory needed to represent complex operators, which may become a bottleneck in their application to high-resolution physical simulations such as climate or materials modeling. In general, despite significant speed-ups and better flexibility, prior neural operators suffer from memory complexity issues similar to those of conventional solvers on high-resolution problems.
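To make the tensorization idea concrete, the following is a minimal NumPy sketch (not the paper's implementation) of a Tucker-factorized spectral convolution: the complex spectral weight tensor W[in, out, k1, k2] of an FNO-style layer is stored as a small core plus one factor matrix per mode, reconstructed on the fly, and applied to the lowest Fourier modes of the input. All sizes, ranks, and helper names here are illustrative assumptions, and only the positive low-frequency modes are kept for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: channels, retained Fourier modes, and Tucker ranks.
in_ch, out_ch, m1, m2 = 4, 4, 6, 6
ranks = (3, 3, 4, 4)

# Tucker parameterization of the spectral weights W[in, out, k1, k2]:
# a small complex core tensor plus one factor matrix per mode.
core = rng.standard_normal(ranks) + 1j * rng.standard_normal(ranks)
factors = [rng.standard_normal((dim, r))
           for dim, r in zip((in_ch, out_ch, m1, m2), ranks)]

def tucker_to_full(core, factors):
    """Contract the core with its factor matrices to recover the full W."""
    W = core
    for mode, U in enumerate(factors):
        W = np.moveaxis(np.tensordot(U, W, axes=(1, mode)), 0, mode)
    return W

def spectral_conv(x, core, factors, m1, m2):
    """One FNO-style layer: FFT, mix the lowest (m1, m2) modes with the
    reconstructed weights, zero the remaining modes, inverse FFT."""
    b, c, h, w = x.shape
    W = tucker_to_full(core, factors)            # (in_ch, out_ch, m1, m2)
    xf = np.fft.rfft2(x)                         # (b, c, h, w//2 + 1)
    out = np.zeros((b, W.shape[1], h, w // 2 + 1), dtype=complex)
    out[:, :, :m1, :m2] = np.einsum(
        "bixy,ioxy->boxy", xf[:, :, :m1, :m2], W)
    return np.fft.irfft2(out, s=(h, w))

x = rng.standard_normal((2, in_ch, 16, 16))
y = spectral_conv(x, core, factors, m1, m2)

# Parameter counts: full tensor vs. its Tucker factorization.
n_full = in_ch * out_ch * m1 * m2
n_tucker = core.size + sum(U.size for U in factors)
print(y.shape, n_full, n_tucker)
```

Even at these toy sizes the factorization stores fewer parameters than the dense spectral tensor; the gap widens rapidly with channel width and mode count, which is the regime where the paper reports its large compression ratios.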
In the worst case, large memory complexity is unavoidable due to the need to resolve fine-scale features globally. However, many real-world problems possess local structure that is not currently exploited by neural operator methods. For instance, consider a weather forecast in which predictions for the next hour depend heavily on weather conditions in local regions and only minimally on global conditions. Incorporating and learning this local structure of the underlying PDEs is key to overcoming this curse of memory complexity.
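The coarse-to-fine decomposition described above can be sketched as follows. This is an illustrative assumption about the construction, not the paper's exact scheme: around a query point, nested crops of doubling extent are subsampled back to a fixed patch size, so the finest view keeps full local detail while coarser views encode progressively more global context. The function name, patch size, and periodic wrapping are all hypothetical choices.

```python
import numpy as np

def multigrid_views(field, center, patch, levels):
    """Extract `levels` nested views around `center`: each level doubles the
    spatial extent of the crop and doubles the subsampling stride, so every
    view has shape (patch, patch). Indices wrap around, assuming periodic
    boundary conditions."""
    h, w = field.shape
    views = []
    for lvl in range(levels):
        half = (patch * 2 ** lvl) // 2
        rows = (np.arange(-half, half, 2 ** lvl) + center[0]) % h
        cols = (np.arange(-half, half, 2 ** lvl) + center[1]) % w
        views.append(field[np.ix_(rows, cols)])
    return np.stack(views)  # (levels, patch, patch)

field = np.random.default_rng(0).standard_normal((64, 64))
views = multigrid_views(field, center=(32, 32), patch=16, levels=3)
print(views.shape)
```

With these settings the level-0 view is the raw 16×16 neighborhood, while the level-2 view spans the entire 64×64 domain at quarter resolution, so a model predicting the local patch still sees global context at a fraction of the full domain's memory cost.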

