GENERALIZING TO NEW DYNAMICAL SYSTEMS THROUGH FIRST-ORDER CONTEXT-BASED ADAPTATION

Anonymous authors
Paper under double-blind review

Abstract

In this paper, we propose FOCA (First-Order Context-based Adaptation), a learning framework for modeling sets of systems governed by common but unknown laws that differentiate themselves by their context. Inspired by classical modeling-and-identification approaches, FOCA learns to represent the common law through shared parameters and relies on online optimization to compute the system-specific context. Due to the online optimization-based context inference, the training of FOCA involves a bi-level optimization problem. To train FOCA efficiently, we utilize an exponential moving average (EMA)-based method that allows for fast training using only first-order derivatives. We test FOCA on polynomial regression and time-series prediction tasks composed of three ODEs and one PDE, empirically finding that it outperforms baselines.

1. INTRODUCTION

Scientists and engineers have made tremendous progress on modeling the behavior of natural and engineered systems and on optimizing model parameters to best describe the target system (Ljung, 2010). This modeling and system identification paradigm has enabled remarkable advances in modern science and engineering (Schrödinger, 1926; Black & Scholes, 1973; Hawking, 1975). However, applying this paradigm to complex systems is difficult because the mathematical modeling of systems requires a considerable degree of domain expertise, and finding the best system parameters requires massive experimentation.

The availability of large datasets and advances in deep learning tools have made it possible to model a target system without specific mathematical models, relying instead on flexible model classes (Brunton et al., 2016; Gupta et al., 2020; Menda et al., 2020; Jumper et al., 2021; Kochkov et al., 2021; Degrave et al., 2022). However, when the characteristics of target systems change (e.g., system parameters, boundary conditions), the flexibility of data-driven models makes them difficult to adapt. Deep learning approaches typically handle such contextual change by collecting data from the new behavioral mode and re-training the model on the new dataset. This approach can be impractical, especially when the system is complex and context changes are frequent.

We are interested in developing a framework that learns a common shared model of the systems and infers the context that best describes the target system to predict its response. Our study considers a target system whose input x and response y can be described by y = f(x, c), where f denotes the function class shared by the target systems and c denotes the system-specific context. One possible approach for modeling such target systems is meta-learning (Hospedales et al., 2021), which learns how to adapt to new systems.
Meta-learning typically combines an adaptive mechanism with training for that adaptation. One common meta-learning approach is to use an encoder that takes the adaptation data and returns the learned context (Santoro et al., 2016; Mishra et al., 2018; Garnelo et al., 2018; Kim et al., 2019). Although encoder-based adaptation schemes require only constant memory usage, their parameterized encoders limit the adaptation capability. Other approaches (pre-)train the parameters on a dataset collected from the various modes and update all parameters using gradient descent (Finn et al., 2017; Nagabandi et al., 2018; Rajeswaran et al., 2019). Despite their effective adaptability, these approaches are often prone to (meta) over-fitting (Antoniou et al., 2019), especially when the adaptation target is complex and adaptation data is scarce. Instead of updating all parameters, Raghu et al. (2019); Zintgraf et al. (2019) update a subset of parameters, which we call the context ĉ, while fixing the remainder. Although this modeling approach is effective, its training requires computing higher-order derivatives.

Many training methods for meta-learning have been proposed (Finn et al., 2017; Nichol et al., 2018; Rajeswaran et al., 2019; Deleu et al., 2021). Encoder-based meta-learning typically trains the encoder and the prediction model jointly, and no online optimization-based adaptation occurs. The training of gradient-based meta-learning, in contrast, is typically cast as a bi-level optimization problem and is often carried out by backpropagating through the adaptation steps, which requires higher-order derivative calculations. To avoid such issues, Finn et al. (2017); Nichol et al. (2018) propose first-order approximations of the derivatives used to update the (meta) parameters.
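The first-order shortcut mentioned above can be illustrated on a toy problem. The sketch below is our own minimal example (not code from the cited papers): each task t has loss L_t(θ) = ½‖θ − b_t‖², the inner loop takes a few gradient steps per task, and the meta-gradient is simply the task-loss gradient evaluated at the adapted parameters, dropping the second-order term that backpropagating through the inner updates would introduce.

```python
import numpy as np

# Toy first-order meta-update (FOMAML-style): the meta-gradient for task t is
# grad L_t evaluated at the *adapted* parameters, treated as a constant with
# respect to theta, so no second-order derivatives are needed.
# Task t: L_t(theta) = 0.5 * ||theta - b_t||^2, hence grad L_t = theta - b_t.

def inner_adapt(theta, b_t, alpha=0.1, steps=3):
    """Few gradient steps on one task's loss (the inner loop)."""
    for _ in range(steps):
        theta = theta - alpha * (theta - b_t)
    return theta

def fomaml_step(theta, task_targets, alpha=0.1, beta=0.5):
    """One meta-update: average the first-order meta-gradients over tasks."""
    meta_grad = np.zeros_like(theta)
    for b_t in task_targets:
        adapted = inner_adapt(theta, b_t, alpha)
        meta_grad += adapted - b_t  # grad of L_t at the adapted parameters
    return theta - beta * meta_grad / len(task_targets)

tasks = [np.array([1.0, 0.0]), np.array([-1.0, 0.0]), np.array([0.0, 2.0])]
theta = np.zeros(2)
for _ in range(100):
    theta = fomaml_step(theta, tasks)
# theta converges to the task-target mean, the point easiest to adapt from.
```

For these quadratic tasks the first-order meta-update drives θ to the mean of the task optima, which is the classic intuition for why such approximations still find good initializations.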
Alternatively, the implicit gradient method can be used to lower the computational burden (Rajeswaran et al., 2019), but gradient computation errors can be significant and result in performance degradation (Liao et al., 2018; Zhou et al., 2019; Blondel et al., 2021; Chen et al., 2022).

In this paper, we propose FOCA (First-Order Context-based Adaptation), a context-based meta-learning method specialized for complex dynamical systems whose behavior can be characterized by a common mathematical model f and a context c. Specifically, FOCA considers target systems of the form y = f_θ(x, ĉ), where f_θ denotes the learned function class shared by the target systems and ĉ denotes the inferred system-specific context. FOCA learns the function class f_θ during training and solves the system identification problem through numerical optimization to find the proper ĉ. The online context optimization bypasses the limitation of encoder-based approaches but entails a higher computational burden. To train FOCA efficiently, we thus propose an exponential moving average (EMA)-based training method, which operates with only first-order derivatives and no additional memory usage. Our experiments also confirm that EMA-based training decreases the computational burden of training and improves the generalization performance of FOCA.

The contributions of this work are summarized as follows:

• We propose FOCA, a learning framework specialized for modeling complex dynamical system families that are described by an unknown common law and a system-specific context.
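The EMA-based training scheme described above can be sketched on a toy family of linear systems y = a·x + c_s, where the slope a is the shared law (learned as θ) and the per-system offset c_s is the context. The sketch is ours, not the paper's implementation, and all names (infer_context, ema_theta, momentum) are our own: the context is inferred by gradient descent on a frozen EMA copy of the model, so the update of θ never needs to differentiate through the inner optimization.

```python
import numpy as np

# Minimal FOCA-style training sketch (our toy construction): contexts are
# inferred on an EMA copy of the model, so only first-order derivatives of
# the loss with respect to c and theta are ever computed.

rng = np.random.default_rng(0)

def predict(theta, x, c):
    return theta * x + c  # shared law f_theta(x, c) for this toy family

def infer_context(ema_theta, x, y, lr=0.1, steps=50):
    """Online system identification: fit c on the frozen EMA model."""
    c = 0.0
    for _ in range(steps):
        resid = predict(ema_theta, x, c) - y
        c -= lr * 2.0 * resid.mean()  # d/dc of the mean squared error
    return c

theta, ema_theta, momentum = 0.0, 0.0, 0.99
true_slope = 2.0  # the unknown common law the model should recover
for _ in range(2000):
    c_true = rng.uniform(-1.0, 1.0)            # a fresh system each step
    x = rng.uniform(-1.0, 1.0, size=16)
    y = true_slope * x + c_true
    c_hat = infer_context(ema_theta, x, y)     # context from the EMA copy
    resid = predict(theta, x, c_hat) - y
    theta -= 0.05 * 2.0 * (resid * x).mean()   # first-order update of theta
    ema_theta = momentum * ema_theta + (1.0 - momentum) * theta
```

Because the inner optimization runs on the EMA copy, ĉ is a constant with respect to θ in the outer update, which is what removes the second-order terms; the EMA copy also stabilizes context inference against noisy recent updates of θ.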

2. RELATED WORK

Learning to generalize to new systems. Transfer learning (Zhuang et al., 2020) attempts to generalize learned models to new systems (or tasks) by fine-tuning a small number of parameters on the new tasks. As a more direct route to generalization, meta-learning learns how to adapt quickly to new systems. In particular, gradient-based meta-learning (GBML) approaches perform few-step gradient-descent updates of the model parameters θ for adaptation (Finn et al., 2017; Nichol et al., 2018). Motivated by the empirical evidence that plain GBML is prone to overfit the training tasks, CAVIA (Zintgraf et al., 2019) performs gradient-based updates only for a small set of context parameters.



Table 1: Comparison of context-based generalization approaches. The memory column specifies the additional memory required for adaptation during the training phase of each algorithm. |·| denotes the number of elements of ·.

Method | Learned components | Adaptation rule | Memory
Encoder-based (… al., 2018; Lee et al., 2020a) | f_θ, g_ϕ | ĉ = g_ϕ(D) | O(|ϕ|)
CAVIA (Zintgraf et al., 2019) | f_θ | ĉ_{k+1} = ĉ_k − λ∇_{ĉ_k} Σ_{(x,y)∈D} L(f_θ(x, ĉ_k), y), ĉ_0 = 0, ĉ = ĉ_K, where K is the number of adaptation steps | O(|θ|·K)
CoDA (Kirchmeyer et al., 2022) | f_θ, W | ĉ = arg min_c Σ_{(x,y)∈D} L(f_{θ+Wc}(x), y) | O(|W|)
FOCA (ours) | f_θ | ĉ = arg min_c Σ_{(x,y)∈D} L(f̄_θ(x, c), y), where f̄_θ is an EMA copy of f_θ | O(1)
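The CAVIA-style adaptation rule in Table 1 can be executed directly on a toy model. Below is our own illustrative sketch (the model f_θ(x, c) = θ·x + c and all numbers are our choices, not from the cited work): starting from ĉ_0 = 0, K gradient steps are taken on the context alone, with the shared parameter θ held fixed, and the analytic gradient stands in for autodiff.

```python
# CAVIA-row adaptation from Table 1 on a toy model f_theta(x, c) = theta*x + c
# with squared-error loss: c_{k+1} = c_k - lam * grad_c sum L, starting at c_0 = 0.
# Only the context c is updated; the shared parameter theta stays fixed.

def adapt_context(theta, xs, ys, lam=0.1, K=20):
    c = 0.0  # c_0 = 0
    for _ in range(K):
        grad = sum(2.0 * (theta * x + c - y) for x, y in zip(xs, ys))
        c = c - lam * grad  # one gradient step on the context only
    return c  # c_hat = c_K

theta = 1.5                           # pretend-pretrained shared parameter
xs = [0.0, 1.0, 2.0]
ys = [1.5 * x + 0.7 for x in xs]      # a system whose true context is 0.7
c_hat = adapt_context(theta, xs, ys)  # recovers the offset 0.7
```

Note that at meta-training time CAVIA must backpropagate through all K of these steps, which is the source of the O(|θ|·K) memory entry in the table; FOCA's arg min on the EMA copy avoids retaining that computation graph.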

• We propose an EMA-based training method for FOCA that overcomes the burden of second-order derivative calculations while showing better generalization than other training methods.
• We empirically demonstrate that FOCA outperforms or is competitive with various meta-learning baselines on static function regression and time-series prediction tasks, evaluated both in-distribution and out-of-distribution.

