A PRIORI GUARANTEES OF FINITE-TIME CONVERGENCE FOR DEEP NEURAL NETWORKS

Anonymous

Abstract

In this paper, we perform a Lyapunov-based analysis of the loss function to derive an a priori upper bound on the settling time of deep neural networks. While previous studies have attempted to understand deep learning through a control-theoretic framework, there is limited work on a priori finite-time convergence analysis. Drawing on advances in the finite-time control of nonlinear systems, we provide a priori guarantees of finite-time convergence in a deterministic control-theoretic setting. We formulate the supervised learning framework as a control problem in which the weights of the network are the control inputs and learning translates into a tracking problem. An analytical formula for an a priori upper bound on the settling time is provided under the assumption of bounded inputs. Finally, we prove that our loss function is robust against input perturbations.

1. INTRODUCTION

Over the past decade, deep neural networks have achieved human-like performance in various machine learning tasks, such as classification, natural language processing, and speech recognition. Despite the popularity of deep learning, its underlying theoretical understanding remains relatively unexplored. While attempts have been made to develop deep learning theory by drawing inspiration from related fields such as statistical learning and information theory, a comprehensive theoretical framework is still at an early stage of development. Performing mathematical analysis on deep neural networks is difficult due to the large number of parameters involved. Other open problems in deep neural networks concern the stability and desired convergence rate of training. Since the performance of the network depends strongly on the training data and the choice of optimization algorithm, there is no guarantee that training will converge. When it comes to the convergence of the state variables of a dynamical system, control theory provides a rich mathematical framework that can be used to analyze the nonlinear dynamics of deep learning [Liu & Theodorou (2019)]. One of the early works relating deep learning to control theory was that of LeCun et al. (1988), which used the concept of optimal control to formulate the training problem.

Our work attempts to give finite-time convergence guarantees for the training of a deep neural network by utilizing an established stabilization framework from control theory. Existing works in deep learning theory have attempted to bridge the gap in understanding deep learning dynamics by focusing on simple models of neural networks [Saxe et al. (2013), Li & Yuan (2017), Arora et al. (2018), Jacot et al. (2018)]. This can be attributed to the fact that current state-of-the-art deep learning models are highly complex structures that are difficult to analyze. Jacot et al. (2018) proved that a multilayer fully-connected network with infinite width converges to a deterministic limit at initialization, with the rate of change of its weights going to zero. Saxe et al. (2013) analyzed deep linear networks and showed that, surprisingly, these networks have a rich nonlinear structure; given the right initial conditions, deep linear networks are only a finite amount slower than shallow networks. Following this work, Arora et al. (2018) proved convergence of gradient descent to a global minimum for networks whose layer dimensions satisfy a full-rank condition. While these studies give important insights into the design of neural network architectures and the behavior of training, their results may need to be modified in order to provide convergence guarantees for conventional deep neural networks. Du et al. (2018) extended the work of Jacot et al. (2018) by proving that gradient descent achieves zero training loss in deep neural networks with residual connections.
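To give a flavor of the kind of a priori settling-time guarantee pursued here, the following is a minimal numerical sketch of the standard finite-time-stability result from the nonlinear control literature: if a Lyapunov function V satisfies dV/dt ≤ -c V^α with c > 0 and 0 < α < 1, then V reaches zero no later than T* = V(0)^(1-α) / (c(1-α)). The constants and scalar dynamics below are illustrative assumptions, not the paper's actual loss dynamics.

```python
def settling_time_bound(v0: float, c: float, alpha: float) -> float:
    """A priori upper bound on the settling time for dV/dt = -c * V**alpha,
    with c > 0 and 0 < alpha < 1 (finite-time stability condition)."""
    assert c > 0 and 0 < alpha < 1
    return v0 ** (1 - alpha) / (c * (1 - alpha))

def simulate_settling(v0: float, c: float, alpha: float,
                      dt: float = 1e-4, tol: float = 1e-8) -> float:
    """Forward-Euler integration of dV/dt = -c * V**alpha; returns the first
    time at which V falls below `tol`."""
    v, t = v0, 0.0
    while v > tol:
        v = max(v - dt * c * v ** alpha, 0.0)  # clip at zero to stay feasible
        t += dt
    return t

# Illustrative numbers: V(0) = 4, c = 2, alpha = 0.5
# => T* = 4**0.5 / (2 * 0.5) = 2.0
bound = settling_time_bound(4.0, 2.0, 0.5)
t_settle = simulate_settling(4.0, 2.0, 0.5)
```

For these constants the simulated trajectory settles at or before the analytic bound T* = 2.0, illustrating why the exponent condition 0 < α < 1 (rather than the exponential case α = 1) is what yields a *finite* settling time computable in advance.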

