IDENTIFYING NONLINEAR DYNAMICAL SYSTEMS WITH MULTIPLE TIME SCALES AND LONG-RANGE DEPENDENCIES

Abstract

A central theoretical interest in biology and physics is to identify the nonlinear dynamical system (DS) that generated observed time series. Recurrent Neural Networks (RNNs) are, in principle, powerful enough to approximate any underlying DS, but in their vanilla form suffer from the exploding vs. vanishing gradients problem. Previous attempts to alleviate this problem either resulted in more complicated, mathematically less tractable RNN architectures, or strongly limited the dynamical expressiveness of the RNN. Here we address this issue with a simple regularization scheme for vanilla RNNs with ReLU activation which enables them to solve long-range dependency problems and express slow time scales, while retaining a simple mathematical structure that makes their DS properties partly analytically accessible. We prove two theorems that establish a tight connection between the regularized RNN dynamics and its gradients, illustrate on DS benchmarks that our regularization approach strongly eases the reconstruction of DS which harbor widely differing time scales, and show that our method is also on par with other long-range architectures like LSTMs on several tasks.
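For concreteness, the following PyTorch sketch shows one way such a regularization could look, assuming a piecewise-linear latent RNN of the form z_t = A z_{t-1} + W relu(z_{t-1}) + h with diagonal A, where a subset of units is pulled toward an identity (line-attractor-like) mapping that can carry slow time scales. The function name, the split into the first `n_reg` units, and the weights `tau_A`, `tau_W`, `tau_h` are our illustrative choices, not necessarily the paper's exact formulation.

```python
import torch

def memory_regularizer(A_diag, W, h, n_reg, tau_A=1.0, tau_W=1.0, tau_h=1.0):
    """L2 penalty driving the first n_reg units of a ReLU RNN
    z_t = A*z_{t-1} + W*relu(z_{t-1}) + h   (A diagonal)
    toward a pure memory line: A_ii -> 1, incoming recurrent
    weights -> 0, bias -> 0, so these units can express arbitrarily
    slow time scales while the remaining units stay unconstrained."""
    loss = tau_A * ((A_diag[:n_reg] - 1.0) ** 2).sum()
    loss += tau_W * (W[:n_reg, :] ** 2).sum()
    loss += tau_h * (h[:n_reg] ** 2).sum()
    return loss

# Usage (hypothetical): total = prediction_loss + memory_regularizer(A_diag, W, h, n_reg=8)
```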

1. INTRODUCTION

Theories in the natural sciences are often formulated in terms of sets of stochastic differential or difference equations, i.e. as stochastic dynamical systems (DS). Such systems exhibit a range of common phenomena, like (limit) cycles, chaotic attractors, or specific bifurcations, which are the subject of nonlinear dynamical systems theory (DST; Strogatz, 2015; Ott, 2002). A long-standing desire is to retrieve the generating dynamical equations directly from observed time series data (Kantz & Schreiber, 2004), and thus to 'automate' the laborious process of scientific theory building to some degree. A variety of machine and deep learning methodologies toward this goal have been introduced in recent years (Chen et al., 2017; Champion et al., 2019; Ayed et al., 2019; Koppe et al., 2019; Hamilton et al., 2017; Razaghi & Paninski, 2019; Hernandez et al., 2020). Often these are based on sufficiently expressive series expansions for approximating the unknown system of generative equations, such as polynomial basis expansions (Brunton et al., 2016; Champion et al., 2019) or recurrent neural networks (RNNs) (Vlachas et al., 2018; Hernandez et al., 2020; Durstewitz, 2017; Koppe et al., 2019). Formally, RNNs are (usually discrete-time) nonlinear DS that are dynamically universal in the sense that they can approximate to arbitrary precision the flow field of any other DS on compact sets of the real space (Funahashi & Nakamura, 1993; Kimura & Nakano, 1998; Hanson & Raginsky, 2020). Hence, RNNs seem like a good choice for reconstructing, in this sense of dynamically equivalent behavior, the set of governing equations underlying real time series data.

However, RNNs in their vanilla form suffer from the 'vanishing or exploding gradients' problem (Hochreiter & Schmidhuber, 1997; Bengio et al., 1994): During training, error gradients tend to either exponentially explode or decay away across successive time steps, and hence vanilla RNNs face severe problems in capturing long time scales or long-range dependencies in the data. Specially designed RNN architectures equipped with gating mechanisms and linear memory cells have been proposed for mitigating this issue (Hochreiter & Schmidhuber, 1997; Cho et al., 2014). However, from a DST perspective, simpler models that can be more easily analyzed and interpreted in DS terms would be preferable.
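To make the exploding-vs.-vanishing issue concrete, the following self-contained PyTorch snippet (our illustration, not from the paper; the network size, horizon, and weight scales are arbitrary choices) unrolls a vanilla ReLU RNN and prints the gradient norm with respect to the initial state, which shrinks or grows roughly exponentially with the number of time steps depending on the scale of the recurrent weights.

```python
import torch

# Illustrative only: a vanilla ReLU RNN z_t = relu(W z_{t-1}), unrolled
# for T steps. Backpropagation through time multiplies T Jacobians
# W * diag(relu'(.)), so the gradient w.r.t. the initial state z_0
# decays or blows up exponentially in T.
torch.manual_seed(0)
T, d = 50, 32
for scale in (0.5, 1.5):                       # contracting vs. expanding weights
    W = scale * torch.randn(d, d) / d ** 0.5   # recurrent weight matrix
    z0 = torch.randn(d, requires_grad=True)
    z = z0
    for _ in range(T):                         # unroll the recurrence
        z = torch.relu(W @ z)
    z.sum().backward()                         # BPTT back to z_0
    print(f"scale={scale}: ||dL/dz0|| = {z0.grad.norm():.2e}")
```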

