LEARNING DYNAMICAL CHARACTERISTICS WITH NEURAL OPERATORS FOR DATA ASSIMILATION

Abstract

Data assimilation refers to a family of algorithms that combine numerical models of a system with observations to obtain an optimal estimate of the system's states. In domains such as earth science, the numerical models are usually formulated as differential equations, also known as the prior dynamics. Properly exploiting dynamical characteristics for data assimilation is a great challenge for neural networks because, first, complicated dynamical characteristics are difficult to represent in neural networks, and second, the dynamics are likely to be biased. State-of-the-art neural networks emulate traditional approaches, introducing dynamical characteristics by optimizing an objective function in which the dynamics are inherently quantified, but the iterative optimization process incurs high computational cost. In this paper, we develop a novel deep learning framework with neural operators for data assimilation. The key novelty of our approach is a so-called flow operator that explicitly learns dynamical characteristics for reconstructing sequences of physical states. Numerical experiments on the Lorenz-63 and Lorenz-96 systems, which are standard benchmarks for evaluating data assimilation performance, show that the proposed method is at least three times faster than state-of-the-art neural networks and reduces the dynamic loss by two orders of magnitude. We also demonstrate that our method adapts well to biases in the prior dynamics.

1. INTRODUCTION

Data assimilation can be defined as a statistical technique that combines the prior dynamics with a sequence of noisy, irregularly sampled observations. It plays an important role in utilizing observational data, especially in numerical weather forecasting systems. For instance, the European Centre for Medium-Range Weather Forecasts (ECMWF) employs variational assimilation algorithms in its operational systems to take full advantage of both in situ and satellite-derived data, as well as state-of-the-art numerical models (Rabier et al., 2000).

Classical Assimilation Methods

Classical data assimilation methods range from early empirical analysis (Bergthorsson et al., 1955) and optimal interpolation (Gandin, 1963) to later variational assimilation algorithms (Sasaki, 1970) and filtering methods based on statistical estimation theory, such as the Kalman filter (Welch et al., 1995) and the particle filter (Carpenter et al., 1999). Although classical methods have been essential tools for improving the capability of major numerical weather prediction centers worldwide (Kalnay, 2003), they generally do not account for the propagation of observation information over time and are therefore unable to effectively utilize observations taken at different times (Song et al., 2017).

Time-dependent Assimilation Method

In recent years, new data assimilation methods have been developed (Evensen, 2003; Lorenc & Rawlins, 2005; Hunt et al., 2007). Unlike previous methods, they consider the evolution of observation information over time, in other words, the time-dependent dynamical characteristics (Song et al., 2017). Among them, the four-dimensional variational (4D-Var) assimilation algorithm is a cutting-edge example. Experimental results demonstrate the advantage of 4D-Var assimilation methods in utilizing observational information (Lorenc & Rawlins, 2005). However, because modeling and solving the 4D-Var problem is complicated, the algorithm is considered computationally expensive, especially in high-resolution cases (Fisher, 1998).

Machine Learning for Data Assimilation

Prior to 2020, extensive research into applying machine learning to data assimilation succeeded in reducing computational costs (Hsieh & Tang, 1998; Vijaykumar et al., 2002; de Campos Velho et al., 2002; Härter & de Campos Velho, 2008; Cintra & de Campos Velho, 2018; Mack et al., 2020). However, these works build on classical assimilation methods and therefore suffer from the low accuracy that results from ignoring time-dependent dynamical characteristics. Since 2020, many neural network-based methods have considered time-dependent assimilation. They can generally be divided into three categories: learning the inverse observation operator, learning model biases, and learning the optimization algorithm.

Learning Inverse Observation Operators

The aim of learning inverse observation operators is to use neural networks to construct a mapping from observations to reconstructed states. The representative work is Frerix et al. (2021), in which the observational data is exploited by a neural operator to provide better initial states for the optimization of the 4D-Var objective function. This work has at least two flaws.
The first is that the neural network is used only to exploit observational information, while the integration of observations and dynamical characteristics is still carried out by 4D-Var, leading to limited improvement in computational efficiency. The second is that strict constraints on the time-dependent dynamical characteristics are a prerequisite for this work, which prevents it from generalizing to cases in which the prior dynamics are biased.

Learning Model Biases

As mentioned above, the prior dynamics can be biased, and another group of works uses neural networks to learn the bias of the prior dynamic models for data assimilation (Brajard et al., 2020; Arcucci et al., 2021; Farchi et al., 2021; Bonavita & Laloyaux, 2020). These works generally achieve a balance between accuracy and efficiency, but the improvement in computational efficiency is insignificant, because the assimilation of dynamical characteristics and observations is still performed iteratively.

Learning Optimization Algorithm

Since the main computational overhead of the 4D-Var algorithm lies in iteratively optimizing the 4D-Var objective function, some studies have focused on replacing traditional optimization algorithms with neural networks. The representative work is Fablet et al. (2021b;a), which simultaneously learns both the dynamical characteristics of the model and the optimization algorithm of the 4D-Var objective function through neural networks. It not only takes the weak constraints of time-dependent dynamical characteristics into consideration, but also reduces the number of iterations compared with traditional 4D-Var algorithms, thereby improving computational efficiency. We consider this work the state-of-the-art method for comparison. Despite its achievements, there is still room for improvement in both accuracy and efficiency.
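For concreteness, the strong-constraint 4D-Var objective discussed above is commonly written as follows. The notation (background state x_b, background and observation error covariances B and R_k, observation operators H_k, forecast model M_k) is the standard form from the data assimilation literature, not notation introduced by this paper:

```latex
J(x_0) = \tfrac{1}{2}\,(x_0 - x_b)^{\top} B^{-1} (x_0 - x_b)
       + \tfrac{1}{2}\sum_{k=0}^{N-1} \big(y_k - \mathcal{H}_k(x_k)\big)^{\top} R_k^{-1} \big(y_k - \mathcal{H}_k(x_k)\big),
\qquad \text{s.t.} \quad x_{k+1} = \mathcal{M}_k(x_k).
```

Minimizing J requires repeated integrations of the model and its adjoint, which is the iterative optimization whose cost the neural network approaches above aim to avoid or reduce.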
Our Contributions

We develop a new approach that combines data-driven strategies with model-driven methods for time-dependent data assimilation. The framework consists of three operators implemented by neural networks: the inverse observation operator, the perturbator, and the flow operator. Among them, the cooperation of the perturbator and the flow operator decouples the two goals of learning observations and learning dynamical characteristics, and they succeed in continuously refining the reconstructed states.

• To our knowledge, this is the first work that directly uses neural networks to blend the dynamical prior with a sequence of observations without explicitly formulating the 4D-Var objective function for assimilation.

• We test the proposed framework on the Lorenz-63 and Lorenz-96 systems. Experimental results support the effectiveness of our framework. The proposed architecture is at least three times faster than the state-of-the-art neural network of Fablet et al. (2021b). It reduces the dynamic loss by two orders of magnitude and is comparable to state-of-the-art neural networks in terms of reconstruction error.

• We design experiments in which the prior dynamics differ from the ground truth to simulate real-world scenarios. Experimental results show that our method adapts well to deviations in the prior dynamic model.

2. PROBLEM STATEMENT

Suppose that there is a dynamic system with d dimensions to be observed and estimated, and denote the state variables by x(t) ∈ R^d. Consider N evenly distributed time points t_i = t_0 + iΔt, i = 0, 1, ..., N − 1, where Δt is the sampling interval.
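As an illustration of such a dynamic system, the Lorenz-63 benchmark named above is a three-dimensional chaotic ODE whose states x(t_i) can be generated at evenly spaced times with a standard fourth-order Runge-Kutta integrator. This is a minimal sketch: the parameter values (sigma = 10, rho = 28, beta = 8/3) and the step size are the conventional choices from the literature, not settings taken from this paper.

```python
import numpy as np

def lorenz63(x, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side f(x) of the Lorenz-63 equations dx/dt = f(x)."""
    return np.array([
        sigma * (x[1] - x[0]),
        x[0] * (rho - x[2]) - x[1],
        x[0] * x[1] - beta * x[2],
    ])

def rk4_step(f, x, dt):
    """One fourth-order Runge-Kutta step of size dt."""
    k1 = f(x)
    k2 = f(x + 0.5 * dt * k1)
    k3 = f(x + 0.5 * dt * k2)
    k4 = f(x + dt * k3)
    return x + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

def simulate(x0, N, dt=0.01):
    """Generate N states x(t_i) at evenly spaced times t_i = t_0 + i*dt."""
    traj = np.empty((N, x0.shape[0]))
    traj[0] = x0
    for i in range(1, N):
        traj[i] = rk4_step(lorenz63, traj[i - 1], dt)
    return traj

traj = simulate(np.array([1.0, 1.0, 1.0]), N=1000)
print(traj.shape)  # (1000, 3)
```

In a data assimilation experiment, such a trajectory serves as the ground truth from which noisy, partially observed sequences are sampled; the d = 3 state here corresponds to x(t) ∈ R^d in the problem statement.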

