PHYSICS-AWARE SPATIOTEMPORAL MODULES WITH AUXILIARY TASKS FOR META-LEARNING

Abstract

Modeling the dynamics of real-world physical systems is critical for spatiotemporal prediction tasks, but challenging when data is limited. The scarcity of real-world data and the difficulty in reproducing the data distribution hinder the direct application of meta-learning techniques. Although knowledge of the governing partial differential equations (PDEs) of the data can help fast adaptation to few observations, it is mostly infeasible to find the exact equation for observations in real-world physical systems. In this work, we propose a framework, physics-aware meta-learning with auxiliary tasks, whose spatial modules incorporate PDE-independent knowledge and whose temporal modules use the generalized features from the spatial modules to adapt to the limited data. The framework is inspired by a local conservation law expressed mathematically as a continuity equation and does not require the exact form of the governing equation to model the spatiotemporal observations. The proposed method mitigates the need for a large number of real-world tasks for meta-learning by leveraging spatial information in simulated data to meta-initialize the spatial modules. We apply the proposed framework to both synthetic and real-world spatiotemporal prediction tasks and demonstrate its superior performance with limited observations.

1. INTRODUCTION

Deep learning has recently shown promise to play a major role in devising new solutions to applications with natural phenomena, such as climate change (Manepalli et al., 2019; Drgona et al., 2019), ocean dynamics (Cosne et al., 2019), and air quality (Soh et al., 2018; Du et al., 2018; Lin et al., 2018). Deep learning techniques inherently require a large amount of data for effective representation learning, so their performance degrades significantly when only a limited number of observations are available. However, in many real-world tasks involving physical systems, we only have access to a limited amount of data. One example is air quality monitoring (Berman, 2017), in which the sensors are irregularly distributed over space: many sensors are located in urban areas, whereas far fewer are located in vast rural areas. Another example is extreme weather modeling and forecasting, which involves temporally short events (e.g., tropical cyclones (Racah et al., 2017b)) without sufficient observations over time. Moreover, inevitable missing values from sensors (Cao et al., 2018; Tang et al., 2019) further reduce the number of operating sensors and shorten the length of fully observed sequences. Thus, achieving robust performance from a few spatiotemporal observations in physical systems remains an essential but challenging problem.

Learning from a limited amount of data from physical systems can be considered a few-shot learning problem. While many meta-learning techniques (Schmidhuber, 1987; Andrychowicz et al., 2016; Ravi & Larochelle, 2017; Santoro et al., 2016; Snell et al., 2017; Finn et al., 2017) have recently been developed to address this few-shot learning setting, there are still some challenges in applying existing meta-learning methods to modeling natural phenomena. First, it is not easy to find a set of similar meta-tasks that provide the shareable latent representations needed to understand targeted observations.
For instance, while image-related tasks (e.g., object detection (He et al., 2017) or visual question answering (Andreas et al., 2016; Fukui et al., 2016)) can take advantage of an image-feature extractor pre-trained on a large set of images (Deng et al., 2009) and well-designed architectures (Simonyan & Zisserman, 2014; He et al., 2016; Sandler et al., 2018), there is no such large data corpus that is widely applicable for understanding natural phenomena. Second, unlike computer vision or natural language processing tasks, where a common object (images or words) is clearly defined, it is not straightforward to find analogous objects in spatiotemporal data. Finally, the exact equations behind natural phenomena are usually unknown, which makes it difficult to reproduce similar datasets via simulation. For example, although some works (de Bezenac et al., 2018; Lutter et al., 2019; Greydanus et al., 2019) improve data efficiency by explicitly incorporating PDEs as neural network layers when modeling spatiotemporal dynamics, these approaches are hard to generalize to different or unknown dynamics, which are ubiquitous in real-world scenarios.

In this work, we propose physics-aware modules designed for meta-learning to tackle the few-shot learning challenges in physical observations. One of the fundamental equations in physics describing the transport of a physical quantity over space and time is the continuity equation:

$$\frac{\partial \rho}{\partial t} + \nabla \cdot J = \sigma, \tag{1}$$

where ρ is the amount of the target quantity (u) per unit volume, J is the flux of the quantity, and σ is a source or sink. This fundamental equation can be used to derive more specific transport equations such as the convection-diffusion equation, the Navier-Stokes equations, and the Boltzmann transport equation. Thus, the continuity equation is the starting point for modeling spatiotemporal (conservative) observations that are accessible from sensors.
Based on the form of ρ and J with respect to a particular quantity u, Eq. 1 can be generalized as:

$$\frac{\partial u}{\partial t} = F(\nabla u, \nabla^2 u, \dots), \tag{2}$$

where the function F(·) describes how the target u changes over time as a function of its spatial derivatives. Inspired by the form of Eq. 2, we propose two modules: spatial derivative modules (SDM) and time derivative modules (TDM). Since spatial derivatives such as ∇, ∇·, and ∇² are commonly used across different PDEs, the spatial modules are PDE-independent and can be meta-initialized from synthetic data. Then, the PDE-specific temporal module is trained to learn the unknown function F(·) from few observations in real-world physical systems. Because the spatial modules are PDE-independent, this approach can effectively leverage a large amount of simulated data to train them, which mitigates the need for a large number of real-world tasks to extract shareable features. In addition, since the spatial modules are universally used in physics equations, the representations from the modules can be conveniently integrated with data-driven models for modeling natural phenomena. Based on the modularized PDEs, we introduce a novel approach that marries physics knowledge in spatiotemporal prediction tasks with meta-learning by providing shareable modules across real-world spatiotemporal observations.

Our contributions are summarized below:
• Modularized PDEs and auxiliary tasks: Inspired by forms of PDEs in physics, we decompose PDEs into shareable (spatial) and adaptation (temporal) parts. The shareable part is PDE-independent and specified by auxiliary tasks: supervision of spatial derivatives.
• Physics-aware meta-learning: We provide a framework for physics-aware meta-learning, which consists of PDE-independent and PDE-specific modules. The framework is flexible enough to be applied to the modeling of different or unknown dynamics.
• Synthetic data for shareable modules: We extract shareable parameters in the spatial modules from synthetic data, which can be generated from different dynamics easily.
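
The split between shareable spatial derivatives and a PDE-specific time derivative can be sketched numerically. The following is a minimal illustration and not the learned SDM/TDM proposed here: it assumes a periodic 1D grid, replaces the learned spatial modules with fixed central finite differences, and uses a hand-written convection-diffusion right-hand side as one hypothetical choice of F in Eq. 2. All function names and parameter values (D, v, dx, dt) are assumptions made for the example.

```python
import numpy as np


def spatial_derivatives(u, dx):
    """PDE-independent part: central differences on a periodic 1D grid."""
    du = (np.roll(u, -1) - np.roll(u, 1)) / (2.0 * dx)          # approximates du/dx
    d2u = (np.roll(u, -1) - 2.0 * u + np.roll(u, 1)) / dx ** 2  # approximates d2u/dx2
    return du, d2u


def convection_diffusion_rhs(du, d2u, D=0.1, v=1.0):
    """PDE-specific part: one hypothetical F, here D * u_xx - v * u_x."""
    return D * d2u - v * du


def step(u, dx, dt, rhs):
    """One forward-Euler update of du/dt = F(du/dx, d2u/dx2)."""
    du, d2u = spatial_derivatives(u, dx)
    return u + dt * rhs(du, d2u)


# Assumed setup: 64 grid points on [0, 2*pi) with a sine initial condition.
n = 64
dx = 2.0 * np.pi / n
x = np.arange(n) * dx
u0 = np.sin(x)

u = u0.copy()
dt = 1e-3
for _ in range(100):
    u = step(u, dx, dt, convection_diffusion_rhs)

du0, d2u0 = spatial_derivatives(u0, dx)
print("max error of du/dx vs cos(x):", np.abs(du0 - np.cos(x)).max())
print("total quantity before/after:", u0.sum(), u.sum())
```

Swapping `convection_diffusion_rhs` for a different function changes the dynamics without touching `spatial_derivatives`, which mirrors the idea that only the temporal part needs to adapt to a new system. With no source term and periodic boundaries, the total quantity on the grid stays constant up to floating-point error, as expected for a conserved quantity.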

2. MODULARIZED PDES AND META-LEARNING

In this section, we describe how the physics equations for conserved quantities are decomposable into two parts and how the meta-learning approach tackles the task by utilizing synthetic data when the data are limited.
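
As a worked illustration of this decomposition, consider Eq. 1 with ρ = u, no source (σ = 0), and a flux made of a diffusive and a convective part. The flux form, the constant diffusion coefficient D, and the divergence-free velocity field v are assumptions chosen for this example only:

$$J = -D \nabla u + v u.$$

Substituting into Eq. 1 and using ∇·(vu) = v·∇u + u ∇·v = v·∇u gives

$$\frac{\partial u}{\partial t} = -\nabla \cdot J = D \nabla^2 u - v \cdot \nabla u.$$

The right-hand side depends on u only through the spatial derivatives ∇u and ∇²u, which do not depend on the particular PDE, while the way they are combined (here fixed by D and v) is specific to the equation.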

2.1. DECOMPOSABILITY OF VARIANTS OF A CONTINUITY EQUATION

In physics, a continuity equation (Eq. 1) describes how a locally conserved quantity such as temperature, fluid density, heat, and energy is transported across space and time. This equation underlies

