A UNIFIED ALGEBRAIC PERSPECTIVE ON LIPSCHITZ NEURAL NETWORKS

Abstract

Important research efforts have focused on the design and training of neural networks with a controlled Lipschitz constant. The goal is to increase and sometimes guarantee the robustness against adversarial attacks. Recent promising techniques draw inspiration from different backgrounds to design 1-Lipschitz neural networks: to name a few, convex potential layers derive from the discretization of continuous dynamical systems, while Almost-Orthogonal-Layers (AOL) propose a tailored method for matrix rescaling. It is now important to consider these recent and promising contributions under a common theoretical lens in order to design new and improved layers. This paper introduces a novel algebraic perspective unifying various types of 1-Lipschitz neural networks, including the ones previously mentioned, along with methods based on orthogonality and spectral methods. Interestingly, we show that many existing techniques can be derived and generalized by finding analytical solutions of a common semidefinite programming (SDP) condition. We also prove that AOL biases the scaled weight towards the set of orthogonal matrices in a precise mathematical sense. Moreover, our algebraic condition, combined with the Gershgorin circle theorem, readily leads to new and diverse parameterizations for 1-Lipschitz network layers. Our approach, called SDP-based Lipschitz Layers (SLL), allows us to design non-trivial yet efficient generalizations of convex potential layers. Finally, a comprehensive set of experiments on image classification shows that SLLs outperform previous approaches on certified robust accuracy. Code is available at github.com/araujoalexandre/Lipschitz-SLL-Networks.

1. INTRODUCTION

The robustness of deep neural networks is nowadays a great challenge for establishing confidence in their decisions in real-life applications. Addressing this challenge requires guarantees on the stability of the prediction with respect to adversarial attacks. In this context, the Lipschitz constant of neural networks is a key property at the core of many recent advances. Along with the margin of the classifier, this property allows us to certify robustness against worst-case adversarial perturbations. The certification is based on a sphere of stability within which the decision remains the same for any perturbation (Tsuzuku et al., 2018). The design of 1-Lipschitz layers provides a successful approach to enforce this property for the whole neural network. For this purpose, many different techniques have been devised, such as spectral normalization (Miyato et al., 2018; Farnia et al., 2019), orthogonal parameterization (Trockman et al., 2021; Li et al., 2019; Singla et al., 2021; Yu et al., 2022; Xu et al., 2022), Convex Potential Layers (CPL) (Meunier et al., 2022), and Almost-Orthogonal-Layers (AOL) (Prach et al., 2022). While all these techniques share the same goal, their motivations and derivations can greatly differ, delivering different solutions. Nevertheless, a raw experimental comparison fails to provide insight into their individual performance, soundness, and, in the end, their possible complementarity. Therefore, a question acts as a barrier to an in-depth analysis and future development: Are there common principles underlying the development of 1-Lipschitz layers? In this paper, we propose a novel perspective to answer this question based on a unified Semidefinite Programming (SDP) approach. We introduce a common algebraic condition underlying various types of methods, including spectral normalization, orthogonality-based methods, AOL, and CPL.
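To make the certification idea concrete, the sphere of stability can be computed directly from the classifier's margin and its Lipschitz constant. The following is a minimal sketch (the function name and signature are ours; the √2 factor is the multi-class margin bound of Proposition 1 in Tsuzuku et al., 2018):

```python
import math

def certified_radius(logits, lipschitz_constant):
    """l2 radius within which the top-class prediction provably cannot
    change: per Tsuzuku et al. (2018), the prediction is stable for any
    perturbation of norm less than margin / (sqrt(2) * L), where L is
    the Lipschitz constant of the network."""
    ordered = sorted(logits, reverse=True)
    margin = ordered[0] - ordered[1]  # gap between the top two scores
    return margin / (math.sqrt(2) * lipschitz_constant)

# A network with Lipschitz constant 2 and a logit margin of 2.0
# certifies an l2 radius of 1/sqrt(2), roughly 0.707.
radius = certified_radius([3.0, 1.0, 0.5], lipschitz_constant=2.0)
```

Reducing the network's Lipschitz constant thus directly enlarges the certified radius, which is the motivation for 1-Lipschitz layer design.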
Our key insight is that this condition can be formulated as a unifying and simple SDP problem, and that 1-Lipschitz architectures systematically arise from finding "analytical solutions" of this SDP. Our main contributions are summarized as follows.

• We provide a unifying algebraic perspective for 1-Lipschitz network layers by showing that existing techniques such as spectral normalization, orthogonal parameterization, AOL, and CPL can all be recast as solutions of the same simple SDP condition (Theorem 1 and related discussions). Consequently, any new analytical solution of our proposed SDP condition immediately leads to a new 1-Lipschitz network structure.
• Building on this algebraic viewpoint, we give a rigorous mathematical interpretation of AOL, explaining how this method promotes "almost orthogonality" during training (Theorem 2).
• Based on our SDPs, we develop a new family of 1-Lipschitz network structures termed SDP-based Lipschitz layers (SLL). Specifically, we apply the Gershgorin circle theorem to obtain new SDP solutions, leading to non-trivial extensions of CPL (Theorem 3). We also derive new SDP conditions that characterize SLL in a very general form (Theorem 4).
• Finally, we show, through a comprehensive set of experiments, that our new SDP-based Lipschitz layers outperform previous approaches on certified robust accuracy.

Our work is inspired by Fazlyab et al. (2019), which develops SDP conditions for the numerical estimation of Lipschitz constants of given neural networks. A main difference is that we focus on "analytical SDP solutions" that can be used to characterize 1-Lipschitz network structures.
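To illustrate how the Gershgorin circle theorem yields analytical SDP solutions of the kind used here, the sketch below constructs a diagonal matrix T with T − WᵀW ⪰ 0 for an arbitrary weight W. The dimensions and the positive scaling vector q are arbitrary illustrative choices; the exact scaling used by the SLL layers is given later in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
W = rng.standard_normal((n, n))
A = W.T @ W                      # symmetric Gram matrix of the weight
q = rng.uniform(0.5, 2.0, n)     # any entrywise-positive scaling vector

# Diagonal dominance via Gershgorin: set T_ii = sum_j |A_ij| * q_j / q_i.
# Applying the Gershgorin circle theorem to diag(q)^-1 (T - A) diag(q)
# (a similarity transform, so eigenvalues are unchanged) shows that
# every eigenvalue of T - A is nonnegative, i.e. T - A >= 0.
T = np.diag((np.abs(A) * q[None, :]).sum(axis=1) / q)

min_eig = np.linalg.eigvalsh(T - A).min()
assert min_eig >= -1e-8          # T - A is PSD up to round-off
```

Any such analytical pair (W, T) satisfying the SDP condition can then be turned into a 1-Lipschitz layer, which is the construction behind SLL.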

2. RELATED WORK

In recent years, certified methods have been central to the development of trustworthy machine learning, and especially so for deep learning. Randomized Smoothing (Cohen et al., 2019; Salman et al., 2019) is one of the first defenses to offer provable robustness guarantees. The method extends a given classifier through the careful introduction of random input noise to enhance its robustness. Although this method offers an interesting level of certified robustness, it suffers from important downsides such as the high computational cost of inference and some impossibility results from an information-theory perspective (Yang et al., 2020; Kumar et al., 2020). Another approach to certify the robustness of a classifier is to control its Lipschitz constant (Hein et al., 2017; Tsuzuku et al., 2018). The main idea is to derive a certified radius in the feature space by upper bounding the margin of the classifier; see Proposition 1 of Tsuzuku et al. (2018) for more details. This radius, along with the Lipschitz constant of the network, can certify the robustness. In order to reduce the Lipschitz constant and obtain a non-trivial certified accuracy, Tsuzuku et al. (2018) and Leino et al. (2021) both upper bound the margin via a bound on the global Lipschitz constant; however, these bounds have proved to be loose. Instead of upper bounding the global Lipschitz constant, Huang et al. (2021b) leverages local information to get a tighter bound. Other works, rather than upper bounding the local or global Lipschitz constant, devised neural network architectures that are provably 1-Lipschitz. One of the first approaches in this direction consists of normalizing each layer by its spectral norm (Miyato et al., 2018; Farnia et al., 2019). Each layer is then, by construction, 1-Lipschitz. Later, a body of research replaced the normalized weight matrix with an orthogonal matrix, improving upon spectral normalization by adding gradient preservation (Li et al., 2019; Trockman et al., 2021; Singla et al., 2021; Yu et al., 2022; Xu et al., 2022). These methods constrain the parameters to be orthogonal during training. Specifically, the Cayley transform can be used to constrain the weights (Trockman et al., 2021) and, in a similar fashion, SOC (Singla et al., 2021) parameterizes its layers with the exponential of a skew-symmetric matrix, making them orthogonal. To reduce cost,
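The spectral normalization discussed above is typically implemented with power iteration, which estimates the largest singular value of a weight matrix cheaply. A minimal sketch in the spirit of Miyato et al. (2018) follows; the function name, dimensions, and iteration count are our own illustrative choices.

```python
import numpy as np

def spectral_norm(W, n_iter=100):
    """Estimate the largest singular value of W by power iteration,
    the estimator used in spectral normalization (Miyato et al., 2018)."""
    rng = np.random.default_rng(0)
    u = rng.standard_normal(W.shape[0])
    for _ in range(n_iter):
        v = W.T @ u
        v /= np.linalg.norm(v)   # right singular vector estimate
        u = W @ v
        u /= np.linalg.norm(u)   # left singular vector estimate
    return float(u @ W @ v)      # Rayleigh estimate of sigma_max

W = np.random.default_rng(1).standard_normal((64, 32))
W_sn = W / spectral_norm(W)      # normalized layer weight
# The linear map x -> W_sn @ x is now 1-Lipschitz in the l2 norm.
```

Dividing by the spectral norm guarantees the 1-Lipschitz property, but it also contracts all other singular directions, which is the gradient-attenuation issue that the orthogonal parameterizations above are designed to avoid.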

