BLOCK-LEVEL STIFFNESS ANALYSIS OF RESIDUAL NETWORKS

Abstract

Residual Networks (ResNets) can be interpreted as dynamical systems, that is, systems whose state changes over time and can be described with ordinary differential equations (ODEs) (Haber et al., 2018; Weinan, 2017). Specifically, the dynamical systems interpretation views individual residual blocks as discretization steps of ODEs. Numerical techniques for solving ODEs yield approximations and therefore contain an error term. If an ODE is stiff, this error is likely to be amplified and to dominate the solution calculations, which degrades the accuracy of the approximated solution (Burden et al., 2015). Therefore, stiff ODEs are often numerically unstable. In this paper we leverage the dynamical systems interpretation to perform a novel theoretical analysis of ResNets using findings and tools from the numerical analysis of ODEs. Specifically, we perform a block-level stiffness analysis of ResNets. We find that residual blocks towards the end of ResNet models exhibit increased stiffness and that there is a statistically significant correlation between stiffness and both model accuracy and loss. Based on these findings, we propose that ResNets behave as stiff, numerically unstable ODEs.

1. INTRODUCTION

There are three theoretical interpretations of Residual Networks (ResNets): (1) unraveled ResNets, (2) unrolled iterative estimation, and (3) dynamical systems. The unraveled interpretation views ResNets as a collection of 2^n paths along which the input data flows, where n is the number of residual blocks (Veit et al., 2016). The unrolled iterative estimation interpretation explains ResNets as iterative approximators, in which the first layer provides an initial estimate that subsequent layers progressively refine (Greff et al., 2017). Finally, the dynamical systems interpretation views ResNets as discretized dynamical systems, that is, as ordinary differential equations (ODEs) (Haber et al., 2018; Chen et al., 2018; Lu et al., 2018). Specifically, it regards the residual blocks of a ResNet as a series of forward Euler discretization steps of an initial value ODE. This connection between residual blocks and ODEs can be leveraged for new theoretical analyses that further our understanding and interpretation of ResNets.

In this paper we perform a stiffness analysis of ResNets and their residual blocks using findings from the numerical analysis of ODEs. Stiffness is a property of an ODE with important practical implications: if a differential equation is stiff, the numerical approximation of its solution can contain an unpredictable error that degrades its accuracy (Burden et al., 2015). Therefore, stiff ODEs are often numerically unstable, and their approximate solutions suffer from accuracy issues (Seinfeld et al., 1970; Shampine & Gear, 1979). There is no rigorous definition of stiffness; however, certain phenomena indicate that a problem may be stiff. One way to assess the stiffness of an ODE is to analyze the eigenvalues of its Jacobian. Specifically, if the eigenvalues of the Jacobian differ greatly in magnitude (Butcher, 2008; Bui & Bui, 1979) or if a large portion of the eigenvalues have negative real parts (Burden et al., 2015), the ODE is likely to be stiff. Unfortunately, there are no specific thresholds for what constitutes a large variation in eigenvalue magnitude or a high proportion of eigenvalues with negative real parts.
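The two eigenvalue-based indicators of stiffness mentioned above can be computed as in the following minimal sketch, assuming NumPy and a right-hand side f that maps R^n to R^n. The helper names `numerical_jacobian` and `stiffness_indicators` are hypothetical. The Jacobian is approximated by central finite differences rather than automatic differentiation. The spread in magnitude is summarized as max|λ| / min|λ| over nonzero eigenvalues; the classical stiffness ratio uses real parts instead. Consistent with the absence of accepted thresholds, the sketch only reports the numbers and does not classify the system as stiff or non-stiff.

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    """Approximate the Jacobian of f: R^n -> R^n at x with central differences."""
    n = x.size
    J = np.empty((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = eps
        # Column j holds the partial derivatives of f with respect to x_j.
        J[:, j] = (f(x + e) - f(x - e)) / (2.0 * eps)
    return J

def stiffness_indicators(J, tol=1e-12):
    """Return (magnitude ratio, fraction of eigenvalues with negative real part).

    The magnitude ratio is max|lambda| / min|lambda| over eigenvalues whose
    magnitude exceeds tol; it is nan if no such eigenvalue exists.
    """
    eigvals = np.linalg.eigvals(J)
    mags = np.abs(eigvals)
    nonzero = mags[mags > tol]
    ratio = float(nonzero.max() / nonzero.min()) if nonzero.size else float("nan")
    frac_negative = float(np.mean(eigvals.real < 0))
    return ratio, frac_negative

# Hypothetical linear test system dx/dt = A x with eigenvalues -1 and -1000.
A = np.diag([-1.0, -1000.0])
J = numerical_jacobian(lambda x: A @ x, np.ones(2))
ratio, frac_negative = stiffness_indicators(J)
print(ratio, frac_negative)  # approximately 1000.0 and 1.0
```

For a residual block, f would be the block's residual branch evaluated around a given input. For large blocks, computing the full Jacobian and its spectrum becomes expensive, so this loop-based version is only meant for small illustrations.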

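To make the dynamical systems interpretation concrete, the following sketch (Python with NumPy) checks that a residual block computing x + f(x) is exactly one forward Euler step x_{k+1} = x_k + h f(x_k) with step size h = 1. The two-layer ReLU residual branch and the names `residual_branch`, `residual_block`, and `forward_euler_step` are hypothetical choices for illustration. They are not the architecture analyzed in this paper.

```python
import numpy as np

def residual_branch(x, W1, W2):
    """Hypothetical residual branch f(x) = W2 @ relu(W1 @ x)."""
    return W2 @ np.maximum(W1 @ x, 0.0)

def residual_block(x, W1, W2):
    """Residual block: the input is added back to the branch output."""
    return x + residual_branch(x, W1, W2)

def forward_euler_step(x, f, h):
    """One forward Euler step for dx/dt = f(x): x_{k+1} = x_k + h * f(x_k)."""
    return x + h * f(x)

rng = np.random.default_rng(0)
x0 = rng.standard_normal(4)
W1 = rng.standard_normal((8, 4))
W2 = rng.standard_normal((4, 8))

block_out = residual_block(x0, W1, W2)
euler_out = forward_euler_step(x0, lambda x: residual_branch(x, W1, W2), h=1.0)
print(np.allclose(block_out, euler_out))  # True
```

Stacking n such blocks, each with its own weights, corresponds to n Euler steps of a system whose right-hand side varies from step to step. A step size other than 1 can be absorbed into the residual branch, because x + h f(x) = x + (h f)(x).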

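The error amplification that makes stiff ODEs numerically unstable appears already in the scalar test equation y' = λy with λ = -1000, whose exact solution e^{λt} decays almost immediately. The following minimal Python sketch applies forward Euler, the discretization that the dynamical systems view associates with residual blocks, to this equation. The function name `forward_euler` is hypothetical. Each step multiplies y by (1 + hλ), so for real negative λ the iteration stays bounded only when |1 + hλ| ≤ 1, that is, when h ≤ 2/|λ| = 0.002 here.

```python
def forward_euler(lam, y0, h, n_steps):
    """Integrate the test equation y' = lam * y with forward Euler."""
    y = y0
    for _ in range(n_steps):
        y = y + h * lam * y  # each step multiplies y by (1 + h * lam)
    return y

lam = -1000.0  # large negative eigenvalue: a fast-decaying, stiff component
y0 = 1.0

# h = 0.01 gives |1 + h * lam| = 9: the numerical solution grows like 9**n.
unstable = forward_euler(lam, y0, h=0.01, n_steps=10)
# h = 0.0015 gives |1 + h * lam| = 0.5: the numerical solution decays.
stable = forward_euler(lam, y0, h=0.0015, n_steps=10)

print(f"h=0.01:   y_10 = {unstable:.3e}")  # about 3.487e+09
print(f"h=0.0015: y_10 = {stable:.3e}")    # about 9.766e-04
```

Shrinking h restores stability but requires many more steps to cover the same time interval. This trade-off is the reason implicit methods are usually preferred for stiff problems.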