FACTORS INFLUENCING GENERALIZATION IN CHAOTIC DYNAMICAL SYSTEMS

Abstract

Many real-world systems exhibit chaotic behaviour, for example: weather, fluid dynamics, stock markets, natural ecosystems, and disease transmission. While chaotic systems are often thought to be completely unpredictable, there are in fact patterns within and across such systems that experts frequently describe and contrast qualitatively. We hypothesise that, given the right supervision or task definition, representation learning systems will be able to pick up on these patterns and generalize both in- and out-of-distribution (OOD). This work therefore explores and identifies key factors that lead to good generalization. We observe a variety of interesting phenomena, including: learned representations transfer much better when fine-tuned than when frozen; forecasting appears to be the best pre-training task; OOD robustness falls off very quickly outside the training distribution; and recurrent architectures generally outperform others on OOD generalization. Our findings are of interest to any domain of prediction where chaotic dynamics play a role.

1. INTRODUCTION

There are many reasons to be interested in understanding and predicting the behaviour of chaotic systems. For example, the current climate crisis is arguably the most important issue of our time. From atmospheric circulation and weather prediction to economic and social patterns, chaotic dynamics appear in much of the data relevant to mitigating the impacts of, and adapting to, climate change. Most natural ecosystems exhibit chaos; a better understanding of the mechanisms of our impact on our environment is essential to ensuring a sustainable future on our planet. The spread of information in social networks, many aspects of market economies, and the spread of diseases all have chaotic dynamics too. Of course, these are not isolated systems: they all interact in complex ways, and the interaction dynamics can also exhibit chaos. This makes chaotic systems a compelling challenge for machine learning, particularly representation learning: Can models learn representations that capture high-level patterns and are useful across other tasks? Which losses, architectures, and other design choices lead to better representations? These are some of the questions which we aim to answer. Our main contributions are:
• The development of a lightweight evaluation framework, ValiDyna, to evaluate representations learned by deep-learning models on new tasks, in new scenarios, and on new data.
• The design of experiments using this framework, showcasing its usefulness and flexibility.
• A comparative analysis of 4 popular deep-learning architectures using these experiments.

model       | S | C | F | S↛C | F↛C | S↛F | C↛F | F→S | F→C | C→S | C→F
GRU         | ✓ | ✓ | - | ✓   | ✓   | -   | -   | ✓   | ✓   | ✓   | -
LSTM        | ✓ | ✓ | - | ✓   | ✓   | -   | -   | ✓   | ✓   | ✓   | -
Transformer | ✓ | ✓ | - | ✓   | ✓   | -   | -   | ✓   | -   | -   | -
N-BEATS     | - | - | - | -   | -   | -   | -   | -   | -   | -   | -

2. RELATED WORK

Many works have studied factors influencing generalization for deep networks; see Maharaj (2022) for a review, and Arjovsky (2021) for OOD generalization specifically. To our knowledge, ours is the first such analysis for data exhibiting chaotic dynamics.
Our work relies on that of Gilpin (2021), which presents a dataset of dynamical systems that show chaotic behaviour under certain conditions. Gilpin benchmarks statistical and deep-learning models typically used with time series on a variety of tasks, including forecasting and dataset transfer, and highlights some connections between model performance and chaotic properties. Although not directly addressing chaos, the intersection of the physics-informed and dynamical-systems literature with representation learning is relevant for chaotic dynamics, e.g. (2020) discuss the difference between generalisation to new data domains and generalisation to new ODE parameters in the context of dynamical systems, and show that ML techniques generalise badly when the parameters of a test system are not included in the training set (extrapolation).

3. DATA

Our data is generated using dysts, a Python library of 130+ chaotic dynamical systems published by Gilpin (2021). In dysts, each dynamical system can be integrated into a trajectory with any desired initial condition, length, and granularity, allowing an unlimited number of trajectories to be generated. It can also generate trajectories with similar time scales across different chaotic systems. See Figure 1 for examples and Figure A9 for further examples.

Figure 1: Sample trajectories from two related chaotic attractors. Both systems have two 'lobes'; Arneodo (left) has a characteristic shell shape with one lobe inside the other, while Lorenz (right) shows a characteristic butterfly shape with lobes at an angle to one another. This is the kind of high-level pattern experts describe for many real-world chaotic systems, which we hypothesize representation learning systems could pick up on.

3.1. THE DATA GENERATION PROCESS

We sample data from each dynamical system by picking different initial conditions. This leads to trajectories that are sufficiently different from each other, but still representative of the underlying chaotic system. However, dysts relies on numerical ODE solvers to generate trajectories, which can fail due to numerical instabilities when the initial condition is too extreme. To avoid this, we generate the default trajectory for each system, compute the component-wise minima and maxima, and use a percentage p of the resulting intervals to sample random initial conditions for that system. In addition to the properties of the trajectory, the parameters of this process are the random seed and the percentage p of the observed initial-condition range to be used for sampling.
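The sampling procedure above can be sketched as follows. This is a minimal illustration in plain NumPy rather than a reproduction of our pipeline: a hand-rolled Euler integration of the Lorenz system stands in for dysts's ODE solvers, and we assume the sampling interval is the central fraction p of each component's observed range (the exact interval convention is an assumption of this sketch).

```python
import numpy as np

# Toy stand-in for a dysts system: Euler integration of the Lorenz equations.
def lorenz_step(x, dt=0.005, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    dx = np.array([
        sigma * (x[1] - x[0]),
        x[0] * (rho - x[2]) - x[1],
        x[0] * x[1] - beta * x[2],
    ])
    return x + dt * dx

def default_trajectory(x0, n_steps=5000):
    """Integrate the system forward from its default initial condition x0."""
    traj = np.empty((n_steps, 3))
    x = np.asarray(x0, dtype=float)
    for i in range(n_steps):
        x = lorenz_step(x)
        traj[i] = x
    return traj

def sample_initial_conditions(traj, p=0.5, n_samples=10, seed=0):
    """Sample initial conditions uniformly from the central fraction p of the
    component-wise [min, max] range observed along the default trajectory."""
    rng = np.random.default_rng(seed)
    lo, hi = traj.min(axis=0), traj.max(axis=0)
    center, half = (lo + hi) / 2.0, (hi - lo) / 2.0
    # low/high have shape (3,); they broadcast against size=(n_samples, 3).
    return rng.uniform(center - p * half, center + p * half,
                       size=(n_samples, traj.shape[1]))

traj = default_trajectory([1.0, 1.0, 1.0])
ics = sample_initial_conditions(traj, p=0.5, n_samples=10, seed=42)
```

Because each sampled initial condition lies strictly inside the range already visited by the default trajectory, the ODE solver is kept away from the extreme initial conditions that cause numerical instabilities.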



Summary of the generalisation results. S, C and F stand for the tasks of Supervised featurisation, Classification, and Forecasting. A ↛ B and A → B indicate strict (see Section 5.2) and loose (see Section 5.3) feature transfer from task A to task B. All runs generalise in-distribution. ✓ and - indicate whether or not the model-run pair achieves OOD generalisation in the final task.

Raissi et al. (2019) show how to train models whose predictions respect the laws of physics, by employing partial differential equations as regularisation. Yin et al. (2022) propose a framework to learn contextual dynamics by decomposing the learned dynamical function into two components that capture context-invariant and context-specific patterns. As AI systems are increasingly deployed in the real world, researchers have noted shortcomings of standard practice (i.e. performance on a validation/test set) for comprehensively evaluating learned representations, and a growing number of evaluation frameworks have been proposed to help address this: e.g. Gulrajani & Lopez-Paz (2021) propose model selection algorithms and develop a framework (DomainBed) for testing domain/OOD generalisation. Of particular relevance, Wang et al. (

