EIGENVALUE INITIALISATION AND REGULARISATION FOR KOOPMAN AUTOENCODERS

Abstract

Regularising the parameter matrices of neural networks is ubiquitous in training deep models. Typical approaches initialise weights with small random values and penalise them to promote sparsity. However, these widely used techniques may be less effective in certain scenarios. Here, we study the Koopman autoencoder model, which comprises an encoder, a Koopman operator layer, and a decoder. These models are designed to tackle physics-related problems, offering interpretable dynamics and the ability to incorporate physics-related constraints. However, the majority of existing work employs standard regularisation practices. In this work, we take a step toward augmenting Koopman autoencoders with initialisation and penalty schemes tailored to physics-related settings. Specifically, we propose the "eigeninit" initialisation scheme, which samples initial Koopman operators from specific eigenvalue distributions. In addition, we suggest the "eigenloss" penalty scheme, which penalises the eigenvalues of the Koopman operator during training. We demonstrate the utility of these schemes on two synthetic datasets, a driven pendulum and flow past a cylinder, and on two real-world problems, ocean surface temperatures and cyclone wind fields. On these datasets, we find that eigenloss and eigeninit improve the convergence rate by up to a factor of 5, and that they reduce the cumulative long-term prediction error by up to a factor of 3. These findings point to the utility of incorporating similar schemes as an inductive bias in other physics-related deep learning approaches.

1. INTRODUCTION

Modern neural networks are often overparameterised, i.e., their number of learnable parameters is significantly larger than the number of available training samples (Allen-Zhu et al., 2019a;b). To guide optimisation through this immense parameter space, and to potentially improve performance by avoiding overfitting, neural networks are trained with regularisation techniques (Goodfellow et al., 2016). The importance of regularisation has been shown in both the theory and practice of deep learning. Prominent examples include the initialisation of parameter matrices (He et al., 2015; Hanin & Rolnick, 2018), and constraining the parameters' norm via loss penalties (Hinton, 1987; Krogh & Hertz, 1991). Initialising weights with small random values and penalising their norm with weight decay are arguably the most common regularisation techniques employed in training deep models with stochastic gradient descent algorithms. However, specific neural architectures, data domains, and learning problems may require different initialisation and penalty schemes. In this paper, we empirically study the effect of regularisation on physics-aware architectures.

The ground-breaking success of deep learning in solving complex tasks in vision and other domains has inspired the physics community to develop deep models suited to real-world problems arising in the field (Willard et al., 2020; Karniadakis et al., 2021). In this context, we focus on dynamical systems analysed and processed using Koopman-based approaches (Takeishi et al., 2017; Lusch et al., 2018). Koopman theory (Koopman, 1931) shows that, under certain assumptions, nonlinear finite-dimensional systems can be transformed to a linear (albeit infinite-dimensional) representation via the Koopman operator. Finite-dimensional approximations of this Koopman operator are advantageous as they facilitate the analysis and understanding of dynamical systems using linear analysis tools.
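To make this linear-analysis benefit concrete, the following toy sketch treats a finite-dimensional Koopman approximation as a plain matrix K acting on latent observables, z_{t+1} = K z_t, whose eigendecomposition immediately reveals the long-term behaviour. The rotation-plus-decay operator below is our own illustrative construction, not an example from the literature.

```python
import numpy as np

# A finite-dimensional Koopman approximation is a matrix K acting on latent
# observables: z_{t+1} = K z_t.  Its spectrum then gives the long-term
# behaviour in closed form.  Here K is a toy rotation scaled by r < 1, so
# every eigenvalue has magnitude r and the dynamics are stable.
theta, r = 0.3, 0.95  # rotation angle and spectral radius (inside unit circle)
K = r * np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])

eigvals = np.linalg.eigvals(K)
print(np.abs(eigvals))  # both eigenvalue magnitudes equal r = 0.95

# Because every |lambda| < 1, iterating K contracts any initial state:
z = np.array([1.0, 0.0])
for _ in range(200):
    z = K @ z
print(np.linalg.norm(z))  # ~0.95**200, essentially zero
```

With r > 1 the same loop would diverge, which is exactly why the spectrum is the natural object to regularise.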
Despite the theoretical and practical advances that have significantly improved Koopman-based learning methods, the majority of existing models still apply regularisation practices designed for general neural networks. Our investigation aims to answer the following research question: can one exploit properties of Koopman operators to improve regularisation and promote better overall performance? The Koopman operator is a linear object with a complex spectrum, and previous work has sought to place a bias on the operator's form. For example, Pan & Duraisamy (2020) propose a skew-symmetric tridiagonal form to guarantee stable Koopman models. However, there has been no systematic investigation of the effect of initialisation and penalty regularisation schemes on the behaviour of Koopman-based neural networks. The overarching objective of this paper is to help bridge this gap. Key to our regularisation schemes is the observation that the spectral properties of linear Koopman operators follow a typical structure shared by many dynamical systems: the Koopman eigenvalues of stable dynamical systems are confined to the unit circle of the complex plane (Mauroy & Mezić, 2016). Motivated by the theoretical observation that, for this important class of stable systems, the eigenvalues must lie within the unit circle, we propose two novel regularisation techniques: sampling random eigenvalues from known distributions to initialise key weights in the network ("eigeninit"), and a loss penalty on the eigenvalues to control their growth ("eigenloss"). Importantly, our regularisation schemes can be incorporated into any Koopman-based approach as a means to regularise the associated Koopman operator. We evaluate our approach on several challenging physics-related datasets, comparing against standard regularisation and initialisation approaches as well as a state-of-the-art baseline.
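An eigeninit-style scheme can be sketched as follows: rather than sampling the entries of the Koopman operator, sample its eigenvalues from a chosen distribution inside the unit circle and assemble a real matrix with exactly that spectrum. The uniform-over-the-disc distribution and the block-diagonal construction below are illustrative assumptions, not necessarily the paper's exact choices.

```python
import numpy as np

def eigeninit(n, rng=None):
    """Return an n x n real matrix (n even) whose eigenvalues are n/2
    conjugate pairs r*exp(+-i*phi) sampled inside the unit disc."""
    rng = rng or np.random.default_rng(0)
    K = np.zeros((n, n))
    for i in range(n // 2):
        r = np.sqrt(rng.uniform())       # sqrt gives uniform density over the disc
        phi = rng.uniform(0, 2 * np.pi)
        # A 2x2 scaled rotation block has eigenvalues r*exp(+-i*phi);
        # placing blocks on the diagonal keeps the spectrum exactly as sampled.
        K[2*i:2*i+2, 2*i:2*i+2] = r * np.array(
            [[np.cos(phi), -np.sin(phi)],
             [np.sin(phi),  np.cos(phi)]])
    return K

K = eigeninit(8)
print(np.abs(np.linalg.eigvals(K)).max())  # <= 1 by construction
```

Swapping the distribution of r (e.g. concentrating mass near the unit circle for slowly decaying modes) changes the inductive bias without changing the construction.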
Our results indicate that our spectral schemes for initialisation and penalty improve the performance of Koopman-based networks, leading to faster and smoother convergence of the objective loss, as well as to models that generalise better in long-term prediction tests. Although these schemes require tuning, for which we provide a guide, they are easily applicable in any context where a Koopman approximation is used. As such, they present a useful tool for practitioners in the field.
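In the same spirit, an eigenloss-style penalty can be sketched as an extra term in the training objective that grows when Koopman eigenvalues leave the unit circle. The hinge form below (penalising only magnitudes above 1) is one illustrative choice; in a deep-learning framework the same expression on a differentiable eigendecomposition (e.g. torch.linalg.eigvals) would simply be added to the loss.

```python
import numpy as np

def eigenloss(K, weight=1.0):
    """Hinge penalty on Koopman eigenvalue magnitudes exceeding 1."""
    mags = np.abs(np.linalg.eigvals(K))
    return weight * float(np.sum(np.maximum(mags - 1.0, 0.0)))

stable = 0.5 * np.eye(3)             # all eigenvalues 0.5 -> no penalty
unstable = np.diag([0.5, 1.0, 2.0])  # one eigenvalue outside the unit circle
print(eigenloss(stable))             # 0.0
print(eigenloss(unstable))           # 1.0, contributed by the eigenvalue at 2.0
```

Other penalty shapes (e.g. pulling all magnitudes toward a target radius) fit the same template; only the function of `mags` changes.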

2. RELATED WORK

Regularisation of neural networks is a fundamental research topic in machine learning. Common regularisation techniques include dropout (Srivastava et al., 2014), batch normalisation (Ioffe & Szegedy, 2015), and data augmentation (Perez & Wang, 2017). In what follows, we mainly discuss parameter initialisation approaches and weight penalty methods.

Parameter initialisation. Proper initialisation of deep models is known to be crucial to their successful training (Sutskever et al., 2013). Standard initialisation schemes for tanh(•) (Glorot & Bengio, 2010) and ReLU (He et al., 2015) activations sample small random values for the weight matrices of the network. These choices are backed by theoretical results showing that small initial weights generalise better (Woodworth et al., 2020), and that random initialisation converges to local minimisers on differentiable losses (Lee et al., 2016); while guaranteed, convergence may be exponentially slow (Du et al., 2017). Other approaches promote information flow using initial orthogonal weight matrices (Saxe et al., 2014; Mishkin & Matas, 2016; Pennington et al., 2017). Koopman-based approaches (Pan & Duraisamy, 2020) suggest initialising the Koopman operator using dynamic mode decomposition (DMD) estimates (Schmid, 2010). Still, the general problem of weight initialisation in neural networks remains an active research topic in both practice (Arpit et al., 2019) and theory (Hanin & Rolnick, 2018; Stöger & Soltanolkotabi, 2021).

Parameter loss penalties. Penalising the norms of weight matrices using L1 and L2 metrics is a common practice in machine learning, collectively termed weight decay (Hinton, 1987). More recently, Arjovsky et al. (2016) showed that recurrent neural networks can avoid the issue of exploding gradients if their hidden-to-hidden matrices are parameterised to be unitary.
Similarly, Yoshida & Miyato (2017) introduce a regularisation scheme based on penalising the spectral norm of weight matrices. Greydanus et al. (2019) assume the underlying system is measure-preserving, and their network learns the Hamiltonian during training. Lusch et al. (2018) use block-diagonal Koopman operators to support continuous spectra. To promote stability, Erichson et al. (2019) employ Lyapunov-based constraints, whereas Pan & Duraisamy (2020) guarantee that Koopman eigenvalues remain in the unit circle via tridiagonal Koopman operators. In contrast, Azencot et al. (2020) introduce a soft penalty on the forward and backward dynamics, yielding stable Koopman systems and state-of-the-art performance.
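A spectral-norm penalty of the kind discussed above can be sketched by penalising sigma_max(W)^2 for each weight matrix W, with sigma_max estimated cheaply by power iteration rather than a full SVD. This is a generic illustration of the idea, not Yoshida & Miyato's exact implementation.

```python
import numpy as np

def spectral_norm(W, iters=50, rng=None):
    """Estimate the largest singular value of W via power iteration."""
    rng = rng or np.random.default_rng(0)
    v = rng.standard_normal(W.shape[1])
    for _ in range(iters):
        u = W @ v
        u /= np.linalg.norm(u)
        v = W.T @ u
        v /= np.linalg.norm(v)
    return float(u @ W @ v)  # converged estimate of sigma_max(W)

W = np.diag([3.0, 1.0, 0.5])       # singular values 3, 1, 0.5
print(spectral_norm(W))            # ~3.0
penalty = 0.01 * spectral_norm(W) ** 2  # term added to the training loss
```

In practice one or two power-iteration steps per training step, with u and v carried over between steps, are usually sufficient.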

