ROBUST NEURAL ODES VIA CONTRACTIVITY-PROMOTING REGULARIZATION

Abstract

Neural networks can be fragile to input noise and adversarial attacks. In this work, we consider Neural Ordinary Differential Equations (NODEs), a family of continuous-depth neural networks represented by dynamical systems, and propose to use contraction theory to improve their robustness. A dynamical system is contractive if trajectories starting from different initial conditions converge to each other exponentially fast. Contractive NODEs can enjoy increased robustness, as slight perturbations of the features do not cause a significant change in the output. Contractivity can be induced during training through a regularization term involving the Jacobian of the system dynamics. To reduce the computational burden, we show that contractivity can also be promoted using carefully selected weight regularization terms for a class of NODEs with slope-restricted activation functions, including the convolutional networks commonly used in image classification. The performance of the proposed regularizers is illustrated through benchmark image classification tasks on the MNIST and FashionMNIST datasets, where images are corrupted by different kinds of noise and attacks.

1. INTRODUCTION

Neural networks (NNs) have demonstrated outstanding performance in image classification, natural language processing, and speech recognition tasks. However, they can be sensitive to input noise or meticulously crafted adversarial attacks (Xu et al., 2020; Carlini & Wagner, 2017; Athalye et al., 2018; Szegedy et al., 2013). The customary remedies are either heuristic, such as feature obfuscation (Miller et al., 2020), adversarial training (Goodfellow et al., 2014; Allen-Zhu & Li, 2022), and defensive distillation (Papernot et al., 2016), or certificate-based, such as Lipschitz regularization (Xu et al., 2020; Fazlyab et al., 2019; Pauli et al., 2021; Aquino et al., 2022; Virmaux & Scaman, 2018; Combettes & Pesquet, 2020). The overall intent of certificate-based approaches is to penalize the input-to-output sensitivity of NNs to improve robustness.

Recently, the connections between NNs and dynamical systems have been extensively explored. Representative results include classes of NNs stemming from the discretization of dynamical systems (Haber & Ruthotto, 2017) and NODEs (Chen et al., 2018), which transform the input through a continuous-time ODE embedding the training parameters. The continuous-time nature of NODEs makes them particularly suitable for learning complex dynamical systems (Rubanova et al., 2019; Greydanus et al., 2019) and allows borrowing tools from dynamical systems theory to analyze their properties (Fazlyab et al., 2022; Galimberti et al., 2021).

In this paper, we employ contraction theory to improve the robustness of NODEs. A dynamical system is contractive if all trajectories converge exponentially fast to each other (Lohmiller & Slotine, 1998; Tsukamoto et al., 2021). Through the lens of contraction, slight perturbations of the initial condition have a diminishing impact on the NODE state over time. With the above considerations, we propose a class of regularizers that promote contractivity of NODEs during training.
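The exponential convergence behind this intuition can be stated formally. In the standard formulation (Lohmiller & Slotine, 1998), contraction at rate $c > 0$ means that any two solutions $x_1(t)$, $x_2(t)$ of the NODE dynamics satisfy, for some overshoot constant $C \ge 1$,

```latex
\|x_1(t) - x_2(t)\| \;\le\; C\, e^{-c t}\, \|x_1(0) - x_2(0)\|, \qquad \forall t \ge 0,
```

so the effect of a perturbation of the input (i.e., of the initial condition) decays exponentially along the flow.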
In the most general case, the regularizers require the Jacobian matrix of the NODE dynamics, which might be computationally challenging to obtain for deep networks. Nevertheless, for a wide class of NODEs with slope-restricted activation functions, we show that contractivity can be promoted by directly penalizing the weights during training. Moreover, by leveraging the linearity of convolution operations, we demonstrate that contractivity can be promoted for convolutional NODEs by regularizing the convolution filters only.
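To fix ideas, the Jacobian-based route can be sketched as follows: at sampled states, one penalizes the largest eigenvalue of the symmetric part of the Jacobian whenever it exceeds a target contraction margin. The vector field `f(x) = -x + tanh(Wx + b)` and the hinge form of the penalty below are illustrative choices for this sketch, not the paper's exact formulation.

```python
import numpy as np

def dynamics(x, W, b):
    # Illustrative NODE vector field f(x) = -x + tanh(W x + b).
    return -x + np.tanh(W @ x + b)

def jacobian(x, W, b):
    # Closed-form Jacobian of f above: J = -I + diag(1 - tanh^2(Wx + b)) W.
    d = 1.0 - np.tanh(W @ x + b) ** 2
    return -np.eye(len(x)) + d[:, None] * W

def contraction_penalty(x, W, b, rate=0.1):
    # Contractivity in the 2-norm requires the symmetric part of the
    # Jacobian to have eigenvalues <= -rate; penalize any violation.
    J = jacobian(x, W, b)
    lam_max = np.linalg.eigvalsh(0.5 * (J + J.T))[-1]  # largest eigenvalue
    return max(0.0, lam_max + rate)
```

During training, such a penalty would be evaluated at states sampled along the trajectories and added to the classification loss; the eigenvalue computation at every sampled state is precisely the cost that the weight-based regularizers of Section 1.2 aim to avoid.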

1.1. RELATED WORK

Several works have used dynamical systems theory to improve the robustness of general NNs against input noise and adversarial attacks. For example, the notion of incremental dissipativity is used to provide robustness certificates for NNs in the form of a linear matrix inequality (Aquino et al., 2022). The works Chen et al. (2021; 2022) address the robustness of NNs using a closed-loop control method from the perspective of dynamical systems: a control process is added to a trained NN to generate control signals that mitigate perturbations in the input data. Nevertheless, the method requires solving an optimal control problem at inference time for each input sample, which increases the computational burden. A detailed study on the robustness of NODEs has been conducted by Hanshu et al. (2019), where the authors show that NODEs can be more robust against random perturbations than common convolutional NNs. Moreover, they study time-invariant NODEs and propose to regularize their flows to further enhance robustness. To bolster the defense against adversarial attacks, NODEs equipped with Lyapunov-stable equilibrium points have been proposed (Kang et al., 2021). Likewise, Rodriguez et al. (2022) introduced a loss function that promotes robustness based on a control-theoretic Lyapunov condition. Both methods have shown promising performance against adversarial attacks. Finally, Massaroli et al. (2020) design provably stable NODEs and argue that stability can reduce the sensitivity to small perturbations of the input data; however, this claim is not supported by theoretical analysis or numerical validation. In comparison to all the aforementioned works, in this paper we employ contraction theory to regularize the trajectories of NODEs and improve robustness.

Recently, contraction theory has been employed in the framework of NNs for various purposes. For instance, contractivity is exploited to improve the well-posedness and robustness of implicit NNs (Jafarpour et al., 2021), the trainability of recurrent NNs (Revay & Manchester, 2020; Jafarpour et al., 2022), and the analysis of Hopfield NNs with Hebbian learning (Centorrino et al., 2022). In Zakwan et al. (2022), the authors propose a Hamiltonian NODE that is contractive by design to improve robustness; however, the extension to different classes of NODEs, including convolutional NODEs, is not straightforward. Besides the robustification of NNs and NODEs, contractivity has also been exploited for learning NN-based dynamical models from data. For instance, Singh et al. (2021) and Revay et al. (2021a; b) utilize contraction theory to learn stabilizable nonlinear NN models from available data.

1.2. CONTRIBUTIONS

The contribution of this paper is fourfold.

• We show that contractivity can be used to improve the robustness of NODEs, and demonstrate how to promote contractivity for general NODEs during training by including regularization terms in the cost function.

• The regularization terms involve the Jacobian matrix of the NODE dynamics, which might be computationally expensive to optimize. Interestingly, for a wide class of NODEs with slope-restricted activation functions, we prove that contractivity can be promoted by carefully penalizing the weight matrices, without involving the Jacobian in the optimization.

• By exploiting the linearity of convolution operations and the above results for NODEs with slope-restricted activation functions, we show that contractivity of convolutional NODEs can be induced by suitably penalizing the convolutional filters.

• We conduct experiments on the MNIST and FashionMNIST datasets with test images perturbed by different kinds of noise and adversarial attacks. Compared to vanilla NODEs, the contractivity-promoting regularization terms improve the average test accuracy by up to 34% in the presence of input noise and by up to 30% in the case of adversarial attacks.
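The second contribution can be illustrated with a standard sufficient condition. For dynamics of the form f(x) = -x + σ(Wx + b) with σ slope-restricted in [0, 1], the Jacobian is J = -I + D(x)W with ‖D(x)‖₂ ≤ 1, so the matrix measure satisfies μ₂(J) ≤ -1 + ‖W‖₂; keeping ‖W‖₂ ≤ 1 - c then guarantees contraction at rate c without ever forming the Jacobian. The sketch below implements this bound as a hinge penalty on the spectral norm; both the architecture and the specific bound are illustrative assumptions, and the paper's precise conditions may differ.

```python
import numpy as np

def weight_penalty(W, rate=0.1):
    # For f(x) = -x + sigma(W x + b) with sigma slope-restricted in [0, 1]:
    # mu_2(J) <= -1 + ||W||_2, so ||W||_2 <= 1 - rate implies contraction
    # at rate `rate`. Penalize the spectral norm beyond that threshold.
    sigma_max = np.linalg.svd(W, compute_uv=False)[0]  # largest singular value
    return max(0.0, sigma_max - (1.0 - rate))
```

Compared with the Jacobian-based penalty, this term depends on the weights alone (no state sampling, no eigenvalue decomposition of the Jacobian), and for convolutional layers the analogous bound can be expressed on the filters themselves thanks to the linearity of convolution.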

