NIERT: ACCURATE NUMERICAL INTERPOLATION THROUGH UNIFYING SCATTERED DATA REPRESENTATIONS USING TRANSFORMER ENCODER

Abstract

Numerical interpolation for scattered data, i.e., estimating values for target points based on those of some observed points, is widely used in computational science and engineering. The existing approaches either require explicitly pre-defined basis functions, which makes them inflexible and limits their performance in practical scenarios, or train neural networks as interpolators, which still have limited interpolation accuracy as they treat observed and target points separately and cannot effectively exploit the correlations among data points. Here, we present a learning-based approach to numerical interpolation for scattered data using the encoder representation of Transformers (called NIERT). Unlike the recent learning-based approaches, NIERT treats observed and target points in a unified fashion by embedding them into the same representation space, thus gaining the advantage of effectively exploiting the correlations among them. The specially-designed partial self-attention mechanism used by NIERT allows it to avoid the unexpected interference of target points on observed points. We further show that the partial self-attention is essentially a learnable interpolation module combining multiple neural basis functions, which provides interpretability of NIERT. Through pre-training on large-scale synthetic datasets, NIERT achieves considerable improvement in interpolation accuracy for practical tasks. On both synthetic and real-world datasets, NIERT outperforms the existing approaches, e.g., on the TFRD-ADlet dataset for temperature field reconstruction, NIERT achieves an MAE of 1.897 × 10^-3, substantially better than the state-of-the-art approach (MAE: 27.074 × 10^-3).

1. INTRODUCTION

Scattered data consist of a collection of points and corresponding values, in which the points have no structure besides their relative positions (Franke & Nielson, 1991). Scattered data arise naturally and widely in a large variety of theoretical and practical scenarios, including solving partial differential equations (PDEs) (Franke & Nielson, 1991; Liu, 2016), temperature field reconstruction (Chen et al., 2021), and time series interpolation (Lepot et al., 2017; Shukla & Marlin, 2019). These scenarios usually require numerical interpolation for scattered data, i.e., estimating values for target points based on those of some observed points. For example, in the task of temperature field reconstruction for micro-scale electronics, interpolation methods are used to obtain the real-time working environment of electronic components from limited measurements, and imprecise interpolation might significantly increase the cost of predictive maintenance. Thus, accurate approaches to numerical interpolation are highly desirable.

A large number of approaches have been proposed for interpolating scattered data. Traditional approaches use schemes that approximate the target function by a linear combination of some basis functions (Heath, 2018), in which the basis functions should be explicitly pre-defined. To adapt to different scenarios, various types of basis functions have been devised. These schemes can theoretically guarantee the interpolation accuracy when sufficient observed points are available; however, they have also been shown to be ineffective for sparse data points (Bulirsch et al., 2002). In addition, these schemes can hardly leverage experience gained from similar interpolation tasks. Recent progress has exhibited an alternative strategy that uses neural networks to learn target functions directly from the given observed points.
For example, conditional neural processes (CNPs) (Garnelo et al., 2018) and their extensions (Kim et al., 2019; Lee et al., 2020b;a) model the conditional distribution of regression functions given the observed points, and Chen et al. (2021) proposed to use the vanilla Transformer (Vaswani et al., 2017) to solve the interpolation task in temperature field reconstruction. All of these approaches use an "encoder-decoder" architecture, in which the encoder learns the representations of observed points while the decoder estimates values for target points. Ideally, observed points and target points should be processed in a unified fashion because they are from the same domain. However, these approaches treat them separately and thus cannot effectively exploit the correlations between them.

Here, we present an approach to numerical interpolation that can effectively exploit the correlations between observed points and target points. Our approach is a learning-based approach using the encoder representations of Transformers (thus called NIERT). The key elements of NIERT include: i) the use of a mask mechanism, which complements target points with learnable mask tokens as in masked language models such as BERT (Devlin et al., 2018), and thus enables processing both observed and target points in a unified fashion; ii) a novel partial self-attention mechanism that calculates attention among the given data points while excluding the influence of target points on observed points at each layer, thus gaining the advantage of exploiting the correlations between these two types of points and, more importantly, avoiding the unexpected interference of target points on observed points; and iii) the use of the pre-training technique, which leverages large-scale low-cost synthetic data to build powerful, general-purpose, and transferable pre-trained interpolation models. The use of the pre-trained models significantly improves generalization ability and interpolation accuracy.
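To make the masking idea in element (ii) concrete, the following is a minimal NumPy sketch of a masked single-head attention in which target points never serve as attention keys for other points (each token still attends to itself). This masking rule is our illustrative reading of "excluding the influence of target points on observed points", not NIERT's exact implementation, and the function name `partial_self_attention` is ours.

```python
import numpy as np

def partial_self_attention(Q, K, V, is_observed):
    """Single-head attention in which target points cannot influence other points.

    Q, K, V: (n, d) arrays of query/key/value vectors for n tokens.
    is_observed: boolean array of length n; True marks observed points.
    """
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)               # (n, n) raw attention scores
    allow = np.tile(is_observed, (n, 1))        # only observed points may act as keys...
    np.fill_diagonal(allow, True)               # ...plus each token attends to itself
    scores = np.where(allow, scores, -np.inf)   # mask out disallowed key positions
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V                          # (n, d_v) attended output
```

With this mask, an observed point's representation is computed from observed points only, while a target point aggregates information from all observed points, matching the asymmetry described above.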
The main contributions of this study are summarized as follows.

• We propose an accurate approach to numerical interpolation for scattered data. On representative synthetic and real-world datasets, our approach outperforms the state-of-the-art approaches and shows potential in a wide range of application fields.

• We propose a novel partial self-attention mechanism that equips the Transformer with a strong inductive bias for interpolation tasks, i.e., it can effectively exploit the correlations between the two types of points and, at the same time, avoid the interference of one type of points on the other.

• We demonstrate the essence of NIERT, i.e., a learnable interpolation approach using neural basis functions, by illustrating the deep connection between the partial self-attention mechanism and traditional interpolation approaches.

• To the best of our knowledge, this study is the first to propose pre-trained models for scattered-data interpolation. We have verified that such pre-trained interpolation models generalize to a wide range of interpolation tasks.

2. RELATED WORK

2.1. TRADITIONAL INTERPOLATION APPROACHES FOR SCATTERED DATA

Traditional interpolation approaches for scattered data use explicitly pre-defined basis functions to construct the interpolation function, e.g., Lagrange interpolation, Newton interpolation (Heath, 2018), B-spline interpolation (Hall & Meyer, 1976), Shepard's method (Gordon & Wixom, 1978), Kriging (Wackernagel, 2003), and radial basis function (RBF) interpolation (Powell, 1987; Fornberg & Zuev, 2007). Among these approaches, the classical Lagrange, Newton, and B-spline interpolations are usually used for univariate interpolation. Wang et al. (2010) proposed a high-order multivariate approximation scheme for scattered data sets, in which the approximation error is represented using Taylor expansions at data points, and the basis functions are determined by minimizing this approximation error.
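To make the basis-function scheme concrete, the following is a minimal NumPy sketch of Gaussian RBF interpolation in one dimension: the interpolant is a linear combination of radial basis functions centered at the observed points, with weights obtained by solving a linear system. The function name and the shape parameter `eps` are illustrative choices, not taken from the cited works.

```python
import numpy as np

def rbf_interpolate(x_obs, y_obs, x_tgt, eps=1.0):
    """Gaussian radial basis function interpolation for scattered 1-D data.

    Solves Phi w = y, where Phi[i, j] = exp(-(eps * |x_i - x_j|)^2), then
    evaluates f(x) = sum_j w_j * phi(|x - x_j|) at the target points.
    """
    phi = lambda r: np.exp(-(eps * r) ** 2)              # Gaussian basis function
    Phi = phi(np.abs(x_obs[:, None] - x_obs[None, :]))   # interpolation matrix
    w = np.linalg.solve(Phi, y_obs)                      # basis-function weights
    return phi(np.abs(x_tgt[:, None] - x_obs[None, :])) @ w
```

By construction the interpolant reproduces the observed values exactly, which illustrates the theoretical accuracy guarantee mentioned above; the same construction also hints at why such schemes degrade with very sparse data, since few centers leave the function poorly constrained between observations.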

2.2. NEURAL NETWORK-BASED INTERPOLATION APPROACHES

Equipped with deep neural networks, data-driven interpolation and reconstruction methods show great advantages and potential. For instance, convolutional neural networks (CNNs) have been ap-

