LIPSCHITZ RECURRENT NEURAL NETWORKS

Abstract

Viewing recurrent neural networks (RNNs) as continuous-time dynamical systems, we propose a recurrent unit that describes the hidden state's evolution with two parts: a well-understood linear component plus a Lipschitz nonlinearity. This particular functional form facilitates stability analysis of the long-term behavior of the recurrent unit using tools from nonlinear systems theory. In turn, this enables architectural design decisions before experimentation. Sufficient conditions for global stability of the recurrent unit are obtained, motivating a novel scheme for constructing hidden-to-hidden matrices. Our experiments demonstrate that the Lipschitz RNN can outperform existing recurrent units on a range of benchmark tasks, including computer vision, language modeling and speech prediction tasks. Finally, through Hessian-based analysis we demonstrate that our Lipschitz recurrent unit is more robust with respect to input and parameter perturbations as compared to other continuous-time RNNs.

1. INTRODUCTION

Many interesting problems exhibit temporal structures that can be modeled with recurrent neural networks (RNNs), including problems in robotics, system identification, natural language processing, and machine learning control. In contrast to feed-forward neural networks, RNNs consist of one or more recurrent units that are designed to have dynamical (recurrent) properties, thereby enabling them to acquire some form of internal memory. This equips RNNs with the ability to discover and exploit spatiotemporal patterns, such as symmetries and periodic structures (Hinton, 1986). However, RNNs are known to have stability issues and are notoriously difficult to train, most notably due to the vanishing and exploding gradients problem (Bengio et al., 1994; Pascanu et al., 2013).

Several recurrent models deal with the vanishing and exploding gradients issue by restricting the hidden-to-hidden weight matrix to be an element of the orthogonal group (Arjovsky et al., 2016; Wisdom et al., 2016; Mhammedi et al., 2017; Vorontsov et al., 2017; Lezcano-Casado & Martinez-Rubio, 2019). While such an approach is advantageous in maintaining long-range memory, it limits the expressivity of the model. To address this issue, recent work proposed constructing hidden-to-hidden weights that have unit-norm eigenvalues and can be nonnormal (Kerg et al., 2019). Another approach for resolving the exploding/vanishing gradient problem has recently been proposed by Kag et al. (2020), who formulate the recurrent units as a differential equation and update the hidden states based on the difference between predicted and previous states.

In this work, we address these challenges by viewing RNNs as dynamical systems whose temporal evolution is governed by an abstract system of differential equations with an external input. The data are formulated in continuous time, where the external input is defined by the function $x = x(t) \in \mathbb{R}^p$, and the target signal is defined as $y = y(t) \in \mathbb{R}^d$.
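To make the continuous-time view concrete, the following minimal sketch integrates a generic hidden-state equation dh/dt = f(h, x(t)) with the forward Euler method. It is an illustration under stated assumptions, not the implementation used in this work: the function name `euler_rollout`, the placeholder vector field `f` (a leaky tanh unit), the input signal `x_of_t`, the dimensions, and the step size are all hypothetical choices made only for this example.

```python
import numpy as np

def euler_rollout(f, h0, x_of_t, t_end, dt):
    """Integrate dh/dt = f(h, x(t)) with forward Euler; return all hidden states."""
    n_steps = int(round(t_end / dt))
    h = np.asarray(h0, dtype=float)
    states = [h.copy()]
    for k in range(n_steps):
        t = k * dt
        h = h + dt * f(h, x_of_t(t))  # one explicit Euler step
        states.append(h.copy())
    return np.stack(states)           # shape: (n_steps + 1, N)

# Hypothetical sizes: hidden dimension N = 4, input dimension p = 2.
rng = np.random.default_rng(0)
N, p = 4, 2
W = 0.5 * rng.standard_normal((N, N))  # hidden-to-hidden weights (illustrative)
U = rng.standard_normal((N, p))        # input-to-hidden weights (illustrative)
b = np.zeros(N)

def f(h, x):
    # Placeholder vector field chosen only for illustration:
    # a linear decay term plus a bounded tanh nonlinearity.
    return -h + np.tanh(W @ h + U @ x + b)

def x_of_t(t):
    # Hypothetical continuous-time input x(t) in R^p.
    return np.array([np.sin(t), np.cos(t)])

states = euler_rollout(f, np.zeros(N), x_of_t, t_end=5.0, dt=0.01)
print(states.shape)
```

With this placeholder vector field, a zero initial state, and a step size below one, every Euler iterate is a convex combination of the previous state and a tanh output, so the hidden state stays within [-1, 1] componentwise. This is a small example of how the functional form of the unit, rather than experimentation, can determine its long-term behavior.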
Based on insights from dynamical systems theory, we propose a continuous-time Lipschitz recurrent neural network with the functional form 



\[
\dot{h} = A_{\beta_A,\gamma_A} h + \tanh\left(W_{\beta_W,\gamma_W} h + U x + b\right), \qquad y = D h,
\]

