LIPSCHITZ-BOUNDED EQUILIBRIUM NETWORKS

Abstract

This paper introduces new parameterizations of equilibrium neural networks, i.e. networks defined by implicit equations. This model class includes standard multilayer and residual networks as special cases. The new parameterization admits a Lipschitz bound during training via unconstrained optimization: no projections or barrier functions are required. Lipschitz bounds are a common proxy for robustness and appear in many generalization bounds. Furthermore, compared to previous work, we show well-posedness (existence of solutions) under less restrictive conditions on the network weights and more natural assumptions on the activation functions: that they are monotone and slope-restricted. These results are proved by establishing novel connections with convex optimization, operator splitting on non-Euclidean spaces, and contracting neural ODEs. In image classification experiments we show that the Lipschitz bounds are very accurate and improve robustness to adversarial attacks.

1. INTRODUCTION

Deep neural network models have revolutionized the field of machine learning: their accuracy on practical tasks such as image classification and their scalability have led to an enormous volume of research on different model structures and their properties (LeCun et al., 2015). In particular, deep residual networks with skip connections (He et al., 2016) have had a major impact, and neural ODEs have been proposed as an analog with "implicit depth" (Chen et al., 2018). Recently, a new structure has gained interest: equilibrium networks (Bai et al., 2019; Winston & Kolter, 2020), a.k.a. implicit deep learning models (El Ghaoui et al., 2019), in which model outputs are defined by implicit equations incorporating neural networks. This model class is very flexible: it is easy to show that it includes many previous structures as special cases, including standard multi-layer networks, residual networks, and (in a certain sense) neural ODEs. However, model flexibility in machine learning is always in tension with model regularity or robustness. While deep learning models have exhibited impressive generalization performance in many contexts, it has also been observed that they can be very brittle, especially when targeted with adversarial attacks (Szegedy et al., 2014). In response, there has been a major research effort to understand and certify robustness properties of deep neural networks, e.g. Raghunathan et al. (2018a); Tjeng et al. (2018); Liu et al. (2019); Cohen et al. (2019), and many others. Global Lipschitz bounds (a.k.a. incremental gain bounds) provide a somewhat crude but nevertheless highly useful proxy for robustness (Tsuzuku et al., 2018; Fazlyab et al., 2019), and appear in several analyses of generalization (e.g. Bartlett et al., 2017; Zhou & Schoellig, 2019). Inspired by both of these lines of research, in this paper we propose new parameterizations of equilibrium networks with guaranteed Lipschitz bounds.
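The claim that equilibrium networks subsume standard feedforward architectures can be checked numerically. The following is a minimal illustrative sketch (all weights, dimensions, and variable names are made up for the example): a two-layer ReLU network is written as a single implicit layer z = σ(Wz + Ux + b) by stacking the hidden layers into z and making W strictly block lower-triangular, so the fixed-point iteration terminates exactly after one pass per layer.

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda v: np.maximum(v, 0.0)

# A standard two-layer network: h1 = relu(A1 x + c1), h2 = relu(A2 h1 + c2).
d, n1, n2 = 3, 4, 5
A1, c1 = rng.normal(size=(n1, d)), rng.normal(size=n1)
A2, c2 = rng.normal(size=(n2, n1)), rng.normal(size=n2)
x = rng.normal(size=d)

h1 = relu(A1 @ x + c1)
h2 = relu(A2 @ h1 + c2)

# The same network as an equilibrium network z = relu(W z + U x + b):
# stack z = [h1; h2] and make W strictly block lower-triangular.
n = n1 + n2
W = np.zeros((n, n))
W[n1:, :n1] = A2                        # h2 depends only on h1
U = np.vstack([A1, np.zeros((n2, d))])  # only h1 sees the input directly
b = np.concatenate([c1, c2])

# With strictly lower-triangular block structure, fixed-point iteration
# converges exactly in as many steps as there are layers (here, two).
z = np.zeros(n)
for _ in range(2):
    z = relu(W @ z + U @ x + b)

assert np.allclose(z[:n1], h1) and np.allclose(z[n1:], h2)
```

The same stacking trick extends to any finite number of layers, and residual connections simply add identity blocks below the diagonal of W.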
We build directly on the monotone operator framework of Winston & Kolter (2020) and the work of Fazlyab et al. (2019) on Lipschitz bounds. The main contribution of our paper is the ability to enforce tight bounds on the Lipschitz constant of an equilibrium network during training with essentially no extra computational effort. In addition, we prove existence of solutions under less restrictive conditions on the weight matrix and more natural assumptions on the activation functions, via novel connections to convex optimization and contracting dynamical systems. Finally, we show via small-scale image classification experiments that the proposed parameterizations can provide significant improvement in robustness to adversarial attacks with little degradation in nominal accuracy. Furthermore, we observe small gaps between certified Lipschitz upper bounds and observed lower bounds computed via adversarial attack.

Equilibrium Networks, Implicit Deep Models, and Well-Posedness. As mentioned above, it has recently been shown that many existing network architectures can be incorporated into a flexible model set called an equilibrium network (Bai et al., 2019; Winston & Kolter, 2020) or implicit deep model (El Ghaoui et al., 2019). In this unified model set, network predictions are made not by forward computation through sequential hidden layers, but by finding a solution to an implicit equation involving a single layer of all hidden units. One major question for this type of network is its well-posedness, i.e. the existence and uniqueness of a solution to the implicit equation for all possible inputs. El Ghaoui et al. (2019) proposed a computationally verifiable but conservative condition on the spectral norm of the hidden unit weight matrix. In Winston & Kolter (2020), a less conservative condition was developed based on monotone operator theory.
Similar monotonicity constraints were previously used to ensure well-posedness of a different class of implicit models in the context of nonlinear system identification (Tobenkin et al., 2017, Theorem 1). On the question of well-posedness, our contribution is a more flexible model set and more natural assumptions on the activation functions: that they are monotone and slope-restricted.

Neural Network Robustness and Lipschitz Bounds. The Lipschitz constant of a function measures its worst-case sensitivity, i.e. the maximum "amplification" of differences in inputs to differences in outputs. The key features of a good Lipschitz-bounded learning approach are a tight estimate of the Lipschitz constant and a computationally tractable training method with the bounds enforced. For deep networks, Tsuzuku et al. (2018) proposed an approach that is computationally efficient but conservative, since its Lipschitz estimate is based on composing estimates for the individual layers, while Anil et al. (2019) proposed a combination of a novel activation function and weight constraints. For equilibrium networks, El Ghaoui et al. (2019) proposed an estimate of Lipschitz bounds via input-to-state stability (ISS) analysis. In Fazlyab et al. (2019), estimates for deep networks based on incremental quadratic constraints and semidefinite programming (SDP) were shown to give state-of-the-art results; however, this was limited to analysis of an already-trained network. The SDP test was incorporated into training via the alternating direction method of multipliers (ADMM) in Pauli et al. (2020); however, due to the complexity of the SDP, the recorded training times were almost 50 times longer than for unconstrained networks. Our approach uses a similar condition to Fazlyab et al. (2019) applied to equilibrium networks; however, we introduce a novel direct parameterization method that enables learning robust models via unconstrained optimization, removing the need for computationally expensive projections or barrier terms.
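The gap between certified upper bounds and observed lower bounds mentioned above can be probed with a simple sampling procedure: any pair of inputs yields a lower bound on the Lipschitz constant via the ratio of output distance to input distance. A minimal sketch (the tanh network and all dimensions here are illustrative, not the paper's model):

```python
import numpy as np

rng = np.random.default_rng(1)

# An arbitrary smooth map f : R^3 -> R^2 with illustrative weights.
A = rng.normal(size=(2, 3))
f = lambda x: np.tanh(A @ x)

# Empirical Lipschitz lower bound: the largest sampled ratio
# ||f(x1) - f(x2)|| / ||x1 - x2||.
best = 0.0
for _ in range(10000):
    x1, x2 = rng.normal(size=3), rng.normal(size=3)
    ratio = np.linalg.norm(f(x1) - f(x2)) / np.linalg.norm(x1 - x2)
    best = max(best, ratio)

# Since tanh has slope at most 1, the spectral norm ||A||_2 is a
# simple certified upper bound for this toy map.
upper = np.linalg.norm(A, 2)
assert best <= upper + 1e-9
print(f"lower bound {best:.3f} <= upper bound {upper:.3f}")
```

In practice, adversarial attacks replace random sampling with gradient-based search for the worst-case pair, which tightens the observed lower bound.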

3.1. PROBLEM STATEMENT

We consider the weight-tied network in which x ∈ R^d denotes the input, z ∈ R^n denotes the hidden units, and y ∈ R^p denotes the output, given by the following implicit equation:

z = σ(W z + U x + b_z),    y = W_o z + b_y,    (1)

where W ∈ R^{n×n}, U ∈ R^{n×d}, and W_o ∈ R^{p×n} are the hidden unit, input, and output weights, respectively, and b_z ∈ R^n and b_y ∈ R^p are bias terms. The implicit framework includes most current neural network architectures (e.g. deep and residual networks) as special cases. To streamline the presentation we assume that σ : R → R is a single nonlinearity applied elementwise, although our results also apply in the case that each channel has a different activation function, nonlinear or linear. Equation (1) is called an equilibrium network since its solutions are equilibrium points of the difference equation z_{k+1} = σ(W z_k + U x + b_z) or the ODE ż(t) = −z(t) + σ(W z(t) + U x + b_z). Our goal is to learn equilibrium networks (1) possessing the following two properties:

• Well-posedness: for every input x and bias b_z, equation (1) admits a unique solution z.

• γ-Lipschitz: the network has a finite Lipschitz bound of γ, i.e., for any input-output pairs (x_1, y_1), (x_2, y_2) we have ‖y_1 − y_2‖_2 ≤ γ‖x_1 − x_2‖_2.
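The difference equation above suggests the simplest way to evaluate an equilibrium network: iterate it to a fixed point. The sketch below (all weights and dimensions are illustrative) uses the crude sufficient condition ‖W‖_2 < 1, which makes the iteration map a contraction for a 1-Lipschitz activation such as tanh; this is a simple well-posedness certificate for the demo, not the less conservative condition developed in this paper.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma = np.tanh  # monotone and slope-restricted in [0, 1]

n, d, p = 8, 3, 2
W = rng.normal(size=(n, n))
W *= 0.9 / np.linalg.norm(W, 2)   # enforce ||W||_2 < 1 -> contraction
U = rng.normal(size=(n, d))
Wo = rng.normal(size=(p, n))
bz, by = rng.normal(size=n), rng.normal(size=p)

def solve(x, tol=1e-10, max_iter=1000):
    """Solve z = sigma(W z + U x + bz) by fixed-point iteration."""
    z = np.zeros(n)
    for _ in range(max_iter):
        z_next = sigma(W @ z + U @ x + bz)
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    return z

x = rng.normal(size=d)
z = solve(x)
y = Wo @ z + by

# Verify the equilibrium condition of equation (1).
assert np.allclose(z, sigma(W @ z + U @ x + bz), atol=1e-8)
```

Since the iteration map is a 0.9-contraction, the Banach fixed-point theorem guarantees a unique solution z for every x and b_z, i.e. well-posedness in the sense defined above.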





