HYPERDEEPONET: LEARNING OPERATOR WITH COMPLEX TARGET FUNCTION SPACE USING THE LIMITED RESOURCES VIA HYPERNETWORK

Abstract

Fast and accurate predictions of complex physical dynamics are a significant challenge across various applications. Real-time prediction on resource-constrained hardware is even more crucial in real-world problems. The deep operator network (DeepONet) has recently been proposed as a framework for learning nonlinear mappings between function spaces. However, the DeepONet requires many parameters and has a high computational cost when learning operators, particularly those with complex (discontinuous or non-smooth) target functions. This study proposes HyperDeepONet, which uses the expressive power of a hypernetwork to enable learning of a complex operator with a smaller set of parameters. The DeepONet and its variant models can be viewed as methods of injecting the input function information into the target function; from this perspective, they are particular cases of HyperDeepONet. We analyze the complexity of DeepONet and conclude that HyperDeepONet requires lower complexity to obtain the desired accuracy for operator learning. HyperDeepONet successfully learned various operators with fewer computational resources than other benchmark models.

1. INTRODUCTION

Operator learning for mapping between infinite-dimensional function spaces is a challenging problem. It has been used in many applications, such as climate prediction (Kurth et al., 2022) and fluid dynamics (Guo et al., 2016). The computational efficiency of learning the mapping remains important in real-world problems. The target function of the operator can be discontinuous or sharp for complicated dynamical systems. In this case, balancing model complexity against computational cost is a core problem for real-time prediction on resource-constrained hardware (Choudhary et al., 2020; Murshed et al., 2021).

Many machine learning methods and deep learning-based architectures have been successfully developed to learn a nonlinear mapping from one infinite-dimensional Banach space to another. They focus on learning the solution operator of some partial differential equations (PDEs), e.g., the mapping from the initial or boundary condition of a PDE to the corresponding solution. Anandkumar et al. (2019) proposed an iterative neural operator scheme to learn the solution operator of PDEs. Simultaneously, Lu et al. (2019; 2021) proposed the deep operator network (DeepONet) architecture based on the universal operator approximation theorem of Chen & Chen (1995). The DeepONet consists of two networks: a branch net, which takes an input function evaluated at fixed finite locations, and a trunk net, which takes a query location in the domain of the output function. Each network produces p outputs, which are combined by an inner product to approximate the underlying operator: the branch net produces the coefficients (p-coefficients) and the trunk net produces the basis functions (p-basis) of the target function.
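To make the branch–trunk structure concrete, the following is a minimal PyTorch sketch of the vanilla DeepONet reconstruction. The widths, depths, activation, and one-dimensional query domain are illustrative assumptions, not the configurations used in this paper.

```python
import torch
import torch.nn as nn

class DeepONet(nn.Module):
    """Minimal DeepONet sketch: G(u)(y) ~ sum_k b_k(u) * t_k(y)."""

    def __init__(self, m: int, p: int, hidden: int = 64):
        super().__init__()
        # Branch net: input function u sampled at m fixed sensor points -> p coefficients.
        self.branch = nn.Sequential(
            nn.Linear(m, hidden), nn.ReLU(), nn.Linear(hidden, p)
        )
        # Trunk net: query location y (here 1-D) -> p basis-function values.
        self.trunk = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, p)
        )

    def forward(self, u: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # u: (batch, m) sensor values; y: (batch, 1) query locations.
        b = self.branch(u)   # (batch, p) coefficients
        t = self.trunk(y)    # (batch, p) basis values
        # Linear reconstruction: inner product of the two p-vectors.
        return (b * t).sum(dim=-1, keepdim=True)  # (batch, 1)

# Example usage with random data:
model = DeepONet(m=100, p=32)
out = model(torch.randn(8, 100), torch.rand(8, 1))  # (8, 1)
```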
While variant models of DeepONet have been developed to improve on the vanilla DeepONet, they still have difficulty approximating operators with complicated target functions under limited computational resources. Lanthaler et al. (2022) and Kovachki et al. (2021b) pointed out the limitation of linear approximation in DeepONet. Some operators have a slow spectral decay rate of the Kolmogorov n-width, which quantifies the error of the best possible linear approximation using an n-dimensional space. A large n is required to learn such operators accurately, which implies that the DeepONet requires a large number of basis functions p and many network parameters. Shift-DeepONet (Hadorn, 2022) and flexDeepONet (Venturi & Casey, 2023) add auxiliary networks to shift and scale the input function (see Section 2), and Seidman et al. (2022) proposed a nonlinear manifold decoder (NOMAD) framework, using a neural network that takes the output of the branch net as input along with the query location. Even though these methods reduce the number of basis functions, the total number of parameters in the model cannot be decreased; the trunk net still requires many parameters to learn complex operators, especially those with complicated (discontinuous or non-smooth) target functions.

In this study, we propose a new architecture, HyperDeepONet, which enables operator learning on a complex target function space even with limited resources. The HyperDeepONet uses a hypernetwork, as proposed by Ha et al. (2017), which produces the parameters of a target network. Wang et al. (2022) pointed out that the final inner product in DeepONet may be inefficient if the information of the input function fails to propagate through the branch net. The hypernetwork in HyperDeepONet instead transmits the information of the input function to every parameter of the target network. Furthermore, the expressivity of the hypernetwork reduces the neural network complexity through parameter sharing (Galanti & Wolf, 2020).

Our main contributions are as follows:

• We propose a novel HyperDeepONet, which uses a hypernetwork to overcome the limitations of DeepONet and learn operators with a complicated target function space. The DeepONet and its variant models are analyzed primarily in terms of how they express the target function as a neural network (Figure 4); these models can be seen as simplified versions of our general HyperDeepONet model (Figure 5).

• We analyze the complexity of DeepONet (Theorem 2) and prove that the complexity of the HyperDeepONet is lower than that of the DeepONet. We show that the DeepONet must employ a large number of basis functions to obtain the desired accuracy, and therefore requires numerous parameters. For variants of DeepONet combined with nonlinear reconstructors, we also present a lower bound on the number of parameters in the target network.

• The experiments show that the HyperDeepONet facilitates learning an operator with a small number of parameters in the target network even when the target function space is complicated by discontinuity and sharpness, with which the DeepONet and its variants struggle. The HyperDeepONet learns the operator more accurately even when the total number of parameters in the overall model is the same.
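To contrast with the inner-product reconstruction of DeepONet sketched above, the following is a minimal sketch of the HyperDeepONet idea: a hypernetwork maps the sampled input function to all weights and biases of a small target network, which is then evaluated at the query location. The one-hidden-layer target network and all sizes here are our illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class HyperDeepONet(nn.Module):
    """Sketch: a hypernetwork produces ALL parameters of a small
    target network that evaluates y -> G(u)(y)."""

    def __init__(self, m: int, hidden: int = 64, target_hidden: int = 16):
        super().__init__()
        self.th = target_hidden
        # Parameters of the target net: layer 1 (1 -> th) and layer 2 (th -> 1).
        n_params = (1 * self.th + self.th) + (self.th * 1 + 1)
        self.hyper = nn.Sequential(
            nn.Linear(m, hidden), nn.ReLU(), nn.Linear(hidden, n_params)
        )

    def forward(self, u: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # u: (batch, m) sensor values; y: (batch, 1) query locations.
        theta = self.hyper(u)                      # (batch, n_params)
        th = self.th
        w1 = theta[:, :th].unsqueeze(-1)           # (batch, th, 1)
        b1 = theta[:, th:2 * th]                   # (batch, th)
        w2 = theta[:, 2 * th:3 * th].unsqueeze(1)  # (batch, 1, th)
        b2 = theta[:, 3 * th:]                     # (batch, 1)
        # Target network with u-dependent parameters, evaluated at y.
        h = torch.tanh((w1 @ y.unsqueeze(-1)).squeeze(-1) + b1)  # (batch, th)
        return (w2 @ h.unsqueeze(-1)).squeeze(-1) + b2           # (batch, 1)
```

Note that the input function here influences every parameter of the target network rather than only the coefficients of a final linear combination.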

2. RELATED WORK

Many machine learning methods and deep learning-based architectures have been successfully developed to solve PDEs with several advantages. One research direction is to use a neural network directly to represent the solution of a PDE (E & Yu, 2018; Sirignano & Spiliopoulos, 2018). The physics-informed neural network (PINN), introduced by Raissi et al. (2019), minimizes the residual of PDEs by using automatic differentiation instead of numerical approximations. Another approach to solving PDEs is operator learning, which aims to learn a nonlinear mapping from one infinite-dimensional Banach space to another. Many studies utilize convolutional neural networks to parameterize the solution operator of PDEs in various applications (Guo et al., 2016; Bhatnagar et al., 2019; Khoo et al., 2021; Zhu et al., 2019; Hwang et al., 2021). The neural operator (Kovachki et al., 2021b) was proposed to approximate nonlinear operators, inspired by Green's function. Li et al. (2021) extended the neural operator structure to the Fourier neural operator (FNO), which parameterizes the integral kernel in Fourier space.
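As an illustration of this Fourier-space parameterization, below is a minimal sketch of a single 1-D spectral convolution layer in the spirit of FNO. The channel counts and the truncation to the lowest modes are illustrative; this is not the exact implementation of Li et al. (2021).

```python
import torch
import torch.nn as nn

class SpectralConv1d(nn.Module):
    """One 1-D Fourier layer: FFT -> learned linear map on the
    lowest `modes` frequencies -> inverse FFT."""

    def __init__(self, channels: int, modes: int):
        super().__init__()
        self.modes = modes  # must not exceed n_grid // 2 + 1
        scale = 1.0 / (channels * channels)
        # Complex weights mixing channels independently per retained frequency.
        self.weights = nn.Parameter(
            scale * torch.randn(channels, channels, modes, dtype=torch.cfloat)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, n_grid) on a uniform grid.
        x_ft = torch.fft.rfft(x)                     # (batch, c, n//2 + 1)
        out_ft = torch.zeros_like(x_ft)
        # Keep only the lowest frequencies and mix channels there.
        out_ft[:, :, :self.modes] = torch.einsum(
            "bim,iom->bom", x_ft[:, :, :self.modes], self.weights
        )
        return torch.fft.irfft(out_ft, n=x.size(-1))  # back to physical space
```

Stacking several such layers with pointwise linear maps and nonlinearities yields the full FNO-style architecture.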

Hadorn (2022) investigated the behavior of DeepONet to identify what makes it challenging to detect sharp features in the target function when the number of basis functions p is small, and proposed Shift-DeepONet, which adds two neural networks to shift and scale the input function. Venturi & Casey (2023) also analyzed the limitations of DeepONet via singular value decomposition (SVD) and proposed a flexible DeepONet (flexDeepONet), adding a pre-net and an additional output in the branch net. Recently, to overcome the limitation of the linear approximation, Seidman et al. (2022) proposed a nonlinear manifold decoder (NOMAD) framework, in which a neural network takes the output of the branch net as input along with the query location.
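For concreteness, the following minimal sketch shows a NOMAD-style nonlinear reconstruction, in which a decoder network replaces the final inner product by consuming the branch output concatenated with the query location. The layer sizes and latent dimension are illustrative assumptions.

```python
import torch
import torch.nn as nn

class NOMADDecoder(nn.Module):
    """Sketch of a NOMAD-style model: a nonlinear decoder maps the
    branch output concatenated with y directly to G(u)(y)."""

    def __init__(self, m: int, latent: int = 32, hidden: int = 64):
        super().__init__()
        # Branch net: sampled input function -> latent code.
        self.branch = nn.Sequential(
            nn.Linear(m, hidden), nn.ReLU(), nn.Linear(hidden, latent)
        )
        # Nonlinear decoder replaces the inner-product reconstruction.
        self.decoder = nn.Sequential(
            nn.Linear(latent + 1, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, u: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # u: (batch, m) sensor values; y: (batch, 1) query locations.
        beta = self.branch(u)                          # (batch, latent)
        return self.decoder(torch.cat([beta, y], dim=-1))  # (batch, 1)
```

Unlike the linear reconstruction of DeepONet, the decoder here is free to represent target functions off any fixed n-dimensional linear subspace, though the decoder itself may still require many parameters for complex operators.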

