COMPRESSING MULTIDIMENSIONAL WEATHER AND CLIMATE DATA INTO NEURAL NETWORKS

Abstract

Weather and climate simulations produce petabytes of high-resolution data that are later analyzed by researchers in order to understand climate change or severe weather. We propose a new method of compressing this multidimensional weather and climate data: a coordinate-based neural network is trained to overfit the data, and the resulting parameters are taken as a compact representation of the original grid-based data. At compression ratios ranging from 300× to more than 3,000×, our method outperforms the state-of-the-art compressor SZ3 in terms of weighted RMSE and MAE. It faithfully preserves important large-scale atmospheric structures and does not introduce significant artifacts. When the resulting neural network is used as a 790× compressed dataloader to train the WeatherBench forecasting model, the model's RMSE increases by less than 2%. This three-orders-of-magnitude compression democratizes access to high-resolution climate data and enables numerous new research directions.

1. INTRODUCTION

Numerical weather and climate simulations can produce hundreds of terabytes to several petabytes of data (Kay et al., 2015; Hersbach et al., 2020), and such data are growing even larger as higher-resolution simulations are needed to tackle climate change and associated extreme weather (Schulthess et al., 2019; Schär et al., 2019). In fact, kilometer-scale climate data are expected to be one of the largest, if not the largest, scientific datasets worldwide in the near future. It is therefore valuable to compress these data so that ever-growing supercomputers can perform more detailed simulations while end users gain faster access to the results.

Data produced by numerical weather and climate simulations contain geophysical variables such as geopotential, temperature, and wind speed. They are usually stored as multidimensional arrays in which each element represents one variable evaluated at one point of a multidimensional grid spanning space and time. Most compression methods (Yeh et al., 2005; Lindstrom, 2014; Lalgudi et al., 2008; Liang et al., 2022; Ballester-Ripoll et al., 2018) use an auto-encoder approach that compresses blocks of data into compact representations, which can later be decompressed back into the original format. This approach prohibits the flow of information between blocks. Larger blocks are thus required to achieve higher compression ratios, yet larger block sizes also lead to higher latency and lower bandwidth when only a subset of the data is needed. Moreover, even with the largest possible block size, these methods cannot use all of the information due to computation or memory limitations, and so cannot further improve compression ratio or accuracy.

We present a new lossy compression method for weather and climate data by taking an alternative view: we compress the data by training a neural network to surrogate the geophysical variable as a continuous function mapping from space and time coordinates to scalar values (Figure 1).
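The core idea above — overfit a small coordinate-based network to the gridded field and keep only its parameters — can be illustrated with a minimal numpy sketch. Everything here is an illustrative assumption, not the paper's actual setup: the synthetic field, the single-hidden-layer architecture, the hidden width, and the plain gradient-descent training loop are all placeholders for the real data and model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny synthetic "weather" field on a space-time grid (stand-in for real data).
lat = np.linspace(-1.0, 1.0, 16)
t = np.linspace(0.0, 1.0, 16)
X = np.stack(np.meshgrid(lat, t, indexing="ij"), axis=-1).reshape(-1, 2)
y = np.sin(3.0 * X[:, 0] + 2.0 * X[:, 1]).reshape(-1, 1)

# One-hidden-layer coordinate MLP: (lat, t) -> field value.
H = 32
W1 = rng.normal(0.0, 1.0, (2, H)); b1 = np.zeros(H)
W2 = rng.normal(0.0, 0.1, (H, 1)); b2 = np.zeros(1)

def forward(X):
    h = np.tanh(X @ W1 + b1)
    return h, h @ W2 + b2

lr = 0.05
_, pred0 = forward(X)
mse0 = np.mean((pred0 - y) ** 2)          # error before training
for _ in range(5000):                     # full-batch gradient descent
    h, pred = forward(X)
    g = 2.0 * (pred - y) / len(y)         # dL/dpred for mean-squared error
    gW2 = h.T @ g; gb2 = g.sum(0)
    gh = g @ W2.T * (1.0 - h ** 2)        # backprop through tanh
    gW1 = X.T @ gh; gb1 = gh.sum(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

_, pred = forward(X)
mse = np.mean((pred - y) ** 2)
# The trained parameters ARE the compressed representation of the grid.
n_params = W1.size + b1.size + W2.size + b2.size
print(f"MSE {mse0:.3f} -> {mse:.4f}; params {n_params} vs grid points {y.size}")
```

On this toy grid the parameter count is comparable to the number of grid points, so there is no real compression; the paper's ratios arise because realistic grids have vastly more points than a suitably sized network has parameters.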
The input horizontal coordinates are transformed into three-dimensional Cartesian coordinates on the unit sphere, where the distance between two points is monotonically related to their geodesic distance, so that the periodic boundary conditions over the sphere are enforced strictly. The resulting Cartesian coordinates and the other coordinates are then transformed into Fourier features before flowing into fully-connected feed-forward layers, so that the neural network can capture high-frequency signals (Tancik et al., 2020). After the neural network is trained, its weights are quantized from

