METAPHYSICA: CAUSALITY-AWARE ROBUSTNESS TO OOD INITIAL CONDITIONS IN PHYSICS-INFORMED MACHINE LEARNING

Abstract

A fundamental challenge in physics-informed machine learning (PIML) is the design of robust PIML methods for out-of-distribution (OOD) forecasting tasks. These tasks require learning-to-learn from observations of the same (ODE) dynamical system with different unknown parameters, and demand accurate forecasts even under initial conditions outside the training support. In this work we propose a solution for such tasks, which we define as a meta-learning procedure for causal structure discovery (optionally combined with invariant risk minimization). Using three different OOD tasks, we empirically observe that the proposed approach significantly outperforms existing state-of-the-art PIML and deep learning methods.

1. INTRODUCTION

Physics-informed machine learning (PIML) (e.g., Willard et al., 2020; Xingjian et al., 2015; Lusch et al., 2018; Yeo & Melnyk, 2019; Raissi et al., 2018; Kochkov et al., 2021) seeks to combine the strengths of physics and machine learning models, and has positively impacted fields as diverse as biological sciences (Yazdani et al., 2020), climate science (Faghmous & Kumar, 2014), and turbulence modeling (Ling et al., 2016; Wang et al., 2020a), among others. PIML achieves substantial success in tasks where the test data come from the same distribution as the training data (in-distribution tasks). Unlike the PIML work described above, this paper considers an out-of-distribution (OOD) change in the initial system state of the dynamical system, possibly with different train and test distribution supports (illustrated in Figure 1(a, b)). In this setting, we observe that existing state-of-the-art PIML models perform significantly worse than they do in-distribution, even for PIML methods designed with OOD robustness in mind (Wang et al., 2021b; Kirchmeyer et al., 2022). This is because the standard ML part of PIML, which tends to learn spurious associations, performs poorly in our OOD setting. We then propose a promising solution: combine meta learning with causal structure discovery to learn an ODE model that is robust to OOD initial conditions. In our OOD tasks, robustness is tied to interventions over the initial conditions of the system, not to arbitrary interventions as the system evolves from the initial state. This is an important distinction: there can be multiple ODE models that are equally OOD robust, and robust ODEs may not correctly predict system trajectories under arbitrary system interventions beyond the initial state (Rubenstein et al. (2016) discusses the effect of arbitrary interventions in physics models).
Contributions This work proposes a hybrid transductive-inductive modeling approach for learning more robust ODEs using meta learning and causal structure discovery (e.g., via ℓ1 regularization (Zheng et al., 2018), which can be combined with invariant risk minimization (Arjovsky et al., 2019; Krueger et al., 2021)). More precisely, our contributions are: 1. We show that state-of-the-art PIML and deep learning methods fail in test examples with OOD initial conditions. 2. Dynamical system forecasting as meta learning. In our setting the tasks are dependent and knowledge can be transferred between the learned ODEs. By meta learning we mean the definition in (Thrun & Pratt, 1998, Chapter 1.2), where given: (a) a family of M tasks (a task is a single experiment in our setting), i = 1, . . . , M; (b) training experience for each task i ∈ {1, . . . , M}, which for us are the time series observations of an experiment X^(i)_{t_0}, . . . , X^(i)_{t_{T^(i)}}, and; (c) a family of performance measures (e.g., one for each task) described by the risk function R^(i); our algorithm will meta learn such that performance at each task improves with experience (more observations) and with the number of tasks (number of experiments). For an algorithm to fit this definition, there must be a transfer of knowledge between multiple tasks that has a positive impact on expected task performance across all tasks. 3. Learning ODEs as structural causal discovery. In order to learn an ODE that is robust to OOD changes in initial conditions (with possibly non-overlapping training and test distribution supports), we define a family of structural causal models and perform a structural causal search in order to find the correct model for our task (which is assumed to be in the family). We test common structural causal discovery approaches for linear models: ℓ1-regularization with and without an invariant risk minimization-type objective, which we observe achieve similar empirical results.

The proposed method is then empirically validated using three commonly-used simulated physics tasks (with measurement noise): damped pendulum systems (Yin et al., 2021), predator-prey systems (Wang et al., 2021a), and epidemic modeling (Wang et al., 2021a), all under both constant ODE parameters and varying ODE parameters per experiment. ODE parameters between train and test experiments have the same distribution, but the initial condition distributions are non-overlapping (OOD).

2. DYNAMICAL SYSTEM FORECASTING AS A META LEARNING TASK

In this section we formally describe the task of forecasting a dynamical system, with a focus on the out-of-distribution initial condition scenario.

Definition (Dynamical system forecasting task) In what follows we describe our task:

1. Training data (depicted in Figure 1(a)): In training, we are given a set of M experiments, which we will denote as M tasks. Task i ∈ {1, . . . , M} has an associated (hidden) environment e^(i). Different tasks can have the same environment. Let T^(i) := (X^(i)_{t_0}, . . . , X^(i)_{t_{T^(i)}}) denote the noisy observations of our dynamical system, with X^(i)_t := x^(i)_t + ε^(i)_t, where dx^(i)_t/dt = ψ(x^(i)_t; W^(i)_*, ξ_*), {t_0, . . . , t_{T^(i)}} are regularly-spaced discrete time steps, x_t ∈ R^d is the (hidden) state of the system at time t during experiment (task) i, ε^(i)_t are independent zero-mean Gaussian noises, and ψ is an unknown deterministic function with hidden ground-truth parameters W^(i)_* ∼ P(W_*) and ξ_*, where the global task-independent parameters W_* and ξ_* are also hidden. Regularly spaced intervals are not strictly necessary for our method, but they make its implementation simpler. Initial conditions: The distribution of initial conditions X^(i)_{t_0} ∼ P(X_{t_0} | E = e^(i)) of task i may depend on its environment. The unknown parameters ξ_* remain constant across environments.

2. Test data (depicted in Figure 1(b)): At test, we are given an observed initial sequence T^(M+1) := (X^(M+1)_{t_0}, . . . , X^(M+1)_{t_r}), where r is generally small, of the dynamical system dx^(M+1)_t/dt = ψ(x^(M+1)_t; W^(M+1)_*, ξ_*) with initial condition X^(M+1)_{t_0} ∼ P(X_{t_0} | E = e^(M+1)) and (unknown) system parameters W^(M+1)_* ∼ P(W_*) and hidden global parameters ξ_* the same as in training. Our task is to predict X^(M+1)_{t_{r+1}}, . . . , X^(M+1)_{t_{T^(M+1)}} from the initial observations T^(M+1), using the inductive knowledge obtained from the training data.
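To make the data-generating process concrete, the following is a minimal numpy sketch of the train/test setup, using a damped pendulum as ψ. The parameter names (alpha, omega0_sq), their ranges, the noise level, and the forward-Euler integrator are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def pendulum_rhs(state, alpha, omega0_sq):
    # psi(x; W): damped pendulum with x = (theta, angular velocity)
    theta, vel = state
    return np.array([vel, -alpha * vel - omega0_sq * np.sin(theta)])

def simulate_task(x0, alpha, omega0_sq, n_steps=100, dt=0.05, noise_std=0.01):
    # Forward-Euler rollout of the hidden state x_t, then additive
    # zero-mean Gaussian observation noise eps_t.
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(n_steps):
        xs.append(xs[-1] + dt * pendulum_rhs(xs[-1], alpha, omega0_sq))
    xs = np.stack(xs)
    return xs + rng.normal(0.0, noise_std, size=xs.shape)

# Training tasks: per-task parameters W^(i) ~ P(W_*); the training
# environment draws initial conditions from small angles only.
M = 8
train_tasks = [
    simulate_task(x0=[rng.uniform(-0.5, 0.5), 0.0],
                  alpha=rng.uniform(0.1, 0.3),
                  omega0_sq=rng.uniform(0.8, 1.2))
    for _ in range(M)
]

# Test task: parameters from the same distribution, but an OOD initial
# condition whose support (large angles) does not overlap with training.
test_traj = simulate_task(x0=[rng.uniform(2.0, 2.5), 0.0],
                          alpha=rng.uniform(0.1, 0.3),
                          omega0_sq=rng.uniform(0.8, 1.2))
r = 5  # only the first r+1 noisy observations are revealed at test time
test_context = test_traj[: r + 1]
```

Note that only `test_context` would be visible to a forecaster; the rest of `test_traj` plays the role of the ground truth being predicted.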

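The "learning ODEs as structural causal discovery" idea from the contributions can be illustrated with sparse regression over a library of candidate terms, where only the causally relevant columns should receive nonzero coefficients. The sketch below uses sequentially thresholded least squares as a simple stand-in for the ℓ1-regularized search; the candidate library, threshold, and pendulum parameters are assumptions for illustration.

```python
import numpy as np

# One noiseless large-angle damped-pendulum trajectory (forward Euler);
# the true dynamics d theta/dt = vel, d vel/dt = -0.2*vel - 1.0*sin(theta)
# are sparse in the candidate library below.
dt, n = 0.01, 2000
x = np.empty((n, 2))
x[0] = (2.4, 0.0)
for k in range(n - 1):
    th, v = x[k]
    x[k + 1] = x[k] + dt * np.array([v, -0.2 * v - 1.0 * np.sin(th)])

# Finite-difference estimate of dx/dt from the trajectory.
dxdt = np.gradient(x, dt, axis=0)

# Candidate library Phi(x); structure discovery should select few columns.
th, v = x[:, 0], x[:, 1]
library = np.column_stack([th, v, np.sin(th), np.cos(th), th * v, v**2])
names = ["theta", "vel", "sin(theta)", "cos(theta)", "theta*vel", "vel^2"]

def stlsq(Phi, y, threshold=0.05, n_iter=10):
    # Sequentially thresholded least squares: fit, zero out small
    # coefficients, refit on the survivors -- a cheap proxy for
    # l1-regularized sparse selection.
    w = np.linalg.lstsq(Phi, y, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(w) < threshold
        w[small] = 0.0
        if (~small).any():
            w[~small] = np.linalg.lstsq(Phi[:, ~small], y, rcond=None)[0]
    return w

coefs = np.stack([stlsq(library, dxdt[:, j]) for j in range(2)])
# Expected sparse structure: row 0 ~ 1.0 on "vel"; row 1 ~ -0.2 on "vel"
# and ~ -1.0 on "sin(theta)".
```

Because the recovered right-hand side depends only on the selected state variables (not on where the trajectory started), the identified ODE can be integrated from initial conditions far outside the training support.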

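At test time only the first r+1 observations are revealed, so the task parameters W^(M+1) must be re-estimated from that short prefix before forecasting. A hedged numpy sketch of this transductive step, again assuming a damped pendulum whose functional form has already been identified (the parameter names, noise level, and finite-difference estimator are illustrative, not the paper's procedure):

```python
import numpy as np

rng = np.random.default_rng(1)
dt, n_steps, r = 0.05, 100, 10

def rhs(state, alpha, omega0_sq):
    # Assumed (already-identified) damped-pendulum right-hand side.
    theta, vel = state
    return np.array([vel, -alpha * vel - omega0_sq * np.sin(theta)])

# Hidden test task: unknown parameters, OOD (large-angle) initial condition.
alpha_true, omega0_sq_true = 0.25, 1.1
x = [np.array([2.2, 0.0])]
for _ in range(n_steps):
    x.append(x[-1] + dt * rhs(x[-1], alpha_true, omega0_sq_true))
obs = np.stack(x) + rng.normal(0.0, 0.005, size=(n_steps + 1, 2))

# Transductive step: re-estimate (alpha, omega0_sq) from the r+1 revealed
# observations by linear least squares on finite-difference accelerations.
prefix = obs[: r + 1]
acc = np.gradient(prefix[:, 1], dt)  # d vel/dt estimate from the prefix
A = np.column_stack([-prefix[:, 1], -np.sin(prefix[:, 0])])
alpha_hat, omega0_sq_hat = np.linalg.lstsq(A, acc, rcond=None)[0]

# Forecast: roll the identified ODE forward from the last revealed point.
forecast = [prefix[-1]]
for _ in range(n_steps - r):
    forecast.append(forecast[-1] + dt * rhs(forecast[-1],
                                            alpha_hat, omega0_sq_hat))
forecast = np.stack(forecast)
```

With so few points the damping coefficient is only weakly identified, but the key property is that the forecast depends on the data only through the re-estimated task parameters, while the shared structure carries over from training.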