RECURRENT NEURAL NETWORK ARCHITECTURE BASED ON DYNAMIC SYSTEMS THEORY FOR DATA-DRIVEN MODELLING OF COMPLEX PHYSICAL SYSTEMS

Anonymous

Abstract

While dynamic systems can be modelled as sequence-to-sequence tasks by deep learning using different network architectures such as DNNs, CNNs, RNNs or neural ODEs, the resulting models often provide poor understanding of the underlying system properties. We propose a new recurrent network architecture, the Dynamic Recurrent Network (DYRNN), whose computation function is based on the discrete difference equations of basic linear transfer functions known from dynamic system identification. This results in a more explainable model, since the learnt weights can provide insight into a system's time-dependent behaviour. It also introduces the sequences' sampling rate as an additional model parameter, which can be leveraged, for example, for time-series data augmentation and model robustness checks. The network is trained using traditional gradient-descent optimization and can be used in combination with other state-of-the-art neural network layers. We show that our new layer type yields results comparable to or better than other recurrent layer types on several system identification tasks.

1. INTRODUCTION

Dynamic systems occur in many different areas of life (Isermann & Münchhof (2011)), from biology, engineering and medicine to economics and beyond: whenever a system changes its state based on an external input, it can be viewed as a dynamic system. Dynamic system identification is the process of modelling such a system's properties. The resulting models can be used, for example, for anomaly detection, controller design or outcome prediction. For linear systems, this identification task is well understood and state-of-the-art methods exist. However, if a system exhibits non-linear behaviour, for example slip-stick effects due to mechanical friction, the applicability of these methods is limited. In this case, state-of-the-art approaches range from white-box models (based on differential equations or numerical simulations of the physical system components) over grey-box models, which often employ a mix of linear and non-linear building blocks, to black-box models (such as Gaussian processes, deep neural networks or support vector machines). Generally, increasing system complexity raises the need for more powerful and often less understandable model architectures in order to produce satisfactory results. One example of a grey-box tool used in engineering is the class of Hammerstein-Wiener models, which combine linear equations with (a priori known) non-linear functions (shown in Figure 1). The linear model parameters are determined from the training data, while the non-linear behaviour is modelled using lookup tables or user-defined non-linear functions.

In this work we present a new type of recurrent neural network layer called the Dynamic Recurrent Neural Network (DYRNN). It is designed for data-driven modelling of dynamic systems in a sequence-to-sequence manner based on input (x(t)) and output (y(t)) data. With it, we intend to bridge the gap between dynamic systems theory and recurrent neural networks.
The layer's internal computation is based on elemental transfer blocks from linear system identification. Combining it with non-linear neural network layers emulates a Hammerstein-Wiener-style model. This way, the model can offer additional knowledge about the examined system's internal properties. Furthermore, while the model is trained on data sampled at one rate, it can be applied to data of the same system at a different sampling rate. This can be used to check the robustness of the model or to save time during training. We show that our network produces results which are better than or comparable to other recurrent networks (RNN, LSTM, GRU) on three different problem datasets. Since the layer can be implemented to be compatible with current deep learning frameworks, it can be combined with state-of-the-art neural network layers (like convolutional or fully connected layers) and training techniques.

Figure 1: Hammerstein-Wiener model. Static non-linearities before and after a linear differential equation model g(t) can be used to model non-linear dynamic systems.
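To make the notion of an elemental transfer block concrete, the sketch below discretises a first-order lag (PT1) element, T·dy/dt + y(t) = K·u(t), with an explicit Euler step. This is our own minimal illustration, not the paper's actual DYRNN layer equations: the function name `pt1_response` and the Euler discretisation are assumptions. It shows the two points the text makes, namely that the gain K and time constant T play the role of learnable, physically interpretable weights, and that the sampling time Ts enters the recurrence as an explicit parameter.

```python
import numpy as np

def pt1_response(u, K, T, Ts):
    """Discrete step-by-step response of a first-order lag (PT1) element.

    Explicit-Euler discretisation of T * dy/dt + y = K * u:
        y[k] = y[k-1] + (Ts / T) * (K * u[k-1] - y[k-1])

    In a DYRNN-style cell, K and T would be learnable weights with a
    direct physical interpretation; Ts is the sampling time of the data.
    """
    y = np.zeros_like(u, dtype=float)
    for k in range(1, len(u)):
        y[k] = y[k - 1] + (Ts / T) * (K * u[k - 1] - y[k - 1])
    return y

# Step response: the output rises monotonically towards the static gain K = 2.0
y = pt1_response(np.ones(200), K=2.0, T=1.0, Ts=0.05)
```

Because Ts appears explicitly, the same weights (K, T) can in principle be evaluated on data sampled at a different rate by changing Ts, which is the property the paper exploits for robustness checks.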

2. RELATED WORK

Dynamic system identification can be viewed as a sequence-to-sequence task: modelling a system's output based on certain inputs. Isermann & Münchhof (2011), for example, list several tools such as ARIMA processes for linear systems and multiple neural network architectures for non-linear systems. Examples of the latter are locally recurrent globally feedforward networks (LRGF), Multi-Layer Perceptrons (MLP) and Radial Basis Function (RBF) networks with different types of dynamics. These model structures are generalized, however, and as we will show, further theoretical background from linear systems theory can be leveraged. Generally, deep learning offers multiple neural network layer types for sequence-to-sequence problems, such as fully connected (FC) networks, convolutional networks (CNN) or recurrent networks. Recurrent networks, also known as sequential models (like the RNN, the LSTM by Hochreiter & Schmidhuber (1997) and the GRU by Cho et al. (2014)), have been used successfully for text-based sequence-to-sequence problems like machine translation or text processing. Wang (2017) demonstrates a concept of LSTM for dynamic system identification by using several parallel LSTM layers which predict the system's behaviour based on its input and prior predictions and their derivatives. A different approach to modelling dynamic systems are neural ordinary differential equations (ODEs) by Chen et al. (2018). These networks learn dy/dt of a function f with y(t) = f(x(t)), and the resulting ODE model is used with a numerical integrator/solver (like Runge-Kutta) to compute y(t). This has the advantage of a varying sampling step size, which is determined by the solver, but these methods are agnostic of dynamic systems theory knowledge. Similarly, Raissi et al. (2019) use deep learning to learn partial differential equations (PDEs) of physical systems in an FC model combined with a numerical integrator.
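To clarify how a learned derivative model is paired with a solver in the neural-ODE setting, the toy sketch below integrates a stand-in for a trained network with fixed-step explicit Euler. This is an assumption-laden simplification: Chen et al. use adaptive solvers and an actual neural network for f, whereas here `euler_integrate` and the closed-form dynamics dy/dt = -y are hypothetical placeholders chosen only so the result can be checked against the exact solution y(t) = exp(-t).

```python
import numpy as np

def euler_integrate(f, y0, t):
    """Integrate dy/dt = f(y, t) with explicit Euler steps over the grid t.

    In a neural ODE, f would be a trained network approximating dy/dt;
    the solver, not the model, decides how y(t) is reconstructed.
    """
    ys = [y0]
    for t0, t1 in zip(t[:-1], t[1:]):
        ys.append(ys[-1] + (t1 - t0) * f(ys[-1], t0))
    return np.array(ys)

# Toy "learned" dynamics dy/dt = -y; the exact solution is y(t) = exp(-t)
t = np.linspace(0.0, 1.0, 101)
y_ode = euler_integrate(lambda y, t: -y, 1.0, t)
```

The sketch also makes the text's criticism visible: all system knowledge lives inside the opaque derivative function f, so nothing in the solver loop exposes interpretable quantities such as gains or time constants.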
Furthermore, since the evaluation of ODE/PDE models requires a numerical integrator, such models are difficult to combine with other neural network layers such as convolutional or recurrent layers. In terms of the sampling frequency of the measurement data, recurrent network architectures can only be trained on one specific sampling rate and do not provide the functionality to generalize to other sampling rates of the same system. In such a case, one would have to resample the new data to the frequency of the training set. Explainability approaches for sequential models in text processing deduce which parts of a sentence are relevant for the model's prediction based on the activation of the internal gates (as shown by e.g. Krakovna & Doshi-Velez (2016)). Interpretability of RNN, LSTM or GRU models for continuous measurement data has, to our knowledge, not yet been explored.
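The resampling workaround mentioned above can be sketched as follows. This is a minimal illustration using plain linear interpolation via `numpy.interp`; the function name `resample_sequence` is our own, and a production pipeline would more likely use an anti-aliasing resampler such as `scipy.signal.resample_poly`.

```python
import numpy as np

def resample_sequence(y, fs_old, fs_new):
    """Linearly resample a measured sequence from fs_old to fs_new Hz.

    A conventional RNN/LSTM/GRU model expects inputs at the sampling
    rate it was trained on, so data recorded at fs_old must be brought
    to the training rate fs_new before inference.
    """
    t_old = np.arange(len(y)) / fs_old
    n_new = int(round(len(y) * fs_new / fs_old))
    t_new = np.arange(n_new) / fs_new
    return np.interp(t_new, t_old, y)

# A 200-sample sine recorded at 100 Hz, resampled to a 50 Hz training rate
y_50 = resample_sequence(np.sin(np.linspace(0.0, 2 * np.pi, 200)), 100.0, 50.0)
```

Note that this preprocessing step sits outside the model: the network itself has no notion of sampling rate, which is exactly the gap the sampling-rate parameter of the proposed layer is meant to close.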

3. DYNAMIC RECURRENT NETWORK

The complexity of modelling dynamic systems results not only from the potential non-linearity, but also from the fact that the model has to keep track of the system's current and past states in order to predict the output based on new input. We intend to model a dynamic system in

