MANIFOLD REGULARIZATION FOR LOCALLY STABLE DEEP NEURAL NETWORKS

Abstract

We apply concepts from manifold regularization to develop new regularization techniques for training locally stable deep neural networks. Our regularizers encourage functions which are smooth not only in their predictions but also their decision boundaries. Empirically, our networks exhibit stability in a diverse set of perturbation models, including ℓ2-, ℓ∞-, and Wasserstein-based perturbations; in particular, against a state-of-the-art PGD adversary, a single model achieves both ℓ∞ robustness of 40% at ε = 8/255 and ℓ2 robustness of 48% at ε = 1.0 on CIFAR-10. We also obtain state-of-the-art verified accuracy of 21% in the same ℓ∞ setting. Furthermore, our techniques are efficient, incurring overhead on par with two additional parallel forward passes through the network; in the case of CIFAR-10, we achieve our results after training for only 3 hours, compared to more than 70 hours for standard adversarial training.

1. INTRODUCTION

Recent results in deep learning highlight the remarkable performance deep neural networks can achieve on tasks using data from the natural world, such as images, video, and audio. Though such data inhabits an input space of high dimensionality, the physical processes which generate the data often manifest significant biases, causing realistic inputs to be sparse in the input space. One way of capturing this intuition is the manifold assumption, which states that input data is not drawn uniformly from the input space, but rather supported on some smooth submanifold(s) of much lower dimension. Starting with the work of Belkin et al. (2006), this formulation has been studied extensively in the setting of semi-supervised kernel and regression methods, where algorithms exploit the unlabelled data points to learn functions which are smooth on the input manifold (Geng et al., 2012; Goldberg et al., 2008; Niyogi, 2013; Sindhwani et al., 2005; Tsang and Kwok, 2007; Xu et al., 2010). Such techniques have seen less use in the context of deep neural networks, owing in part to the ability of such models to generalize from relatively sparse data (Zhang et al., 2016).

Contributions  We apply concepts from manifold regularization to train locally stable deep neural networks. In light of recent results showing that neural networks suffer widely from adversarial inputs (Szegedy et al., 2013), our goal is to learn a function which does not vary much in the neighborhoods of natural inputs, independently of whether the network classifies correctly. We show that this definition of local stability has a natural interpretation in the context of manifold regularization, and propose an efficient regularizer based on an approximation of the graph Laplacian when the data is sparse, i.e., when the pairwise distances are large.
Crucially, our regularizer exploits the continuous piecewise linear nature of ReLU networks to learn a function which is smooth over the data manifold in not only its outputs but also its decision boundaries. We evaluate our approach by training neural networks with our regularizers for the task of image classification on CIFAR-10 (Krizhevsky et al., 2009). Empirically, our networks exhibit robustness against a variety of adversarial models implementing ℓ2-, ℓ∞-, and Wasserstein-based attacks. We also achieve state-of-the-art verified robust accuracy under ℓ∞ perturbations of size ε = 8/255. Furthermore, our regularizers are cheap: we simply evaluate the network at two additional random points for each training sample, so the total computational cost is on par with three parallel forward passes through the network. Our techniques thus present a novel, regularization-only approach to learning robust neural networks, which achieves performance comparable to existing defenses while also being an order of magnitude more efficient.
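To make the "two additional random points" idea concrete, the following is a minimal sketch of one plausible reading of such a stability penalty: for each training input, the network is evaluated at two random points in an ε-ball around it, and the squared difference in outputs is penalized. The function name and sampling scheme are our own illustration, not the paper's exact regularizer (which is developed from the graph Laplacian in Section 2):

```python
import numpy as np

rng = np.random.default_rng(0)

def stability_penalty(f, x, eps=8 / 255, rng=rng):
    """Illustrative local-stability penalty (our simplified reading):
    evaluate the network f at two random points near x and penalize
    the squared difference between the two outputs."""
    d1 = rng.uniform(-eps, eps, size=x.shape)  # first random perturbation
    d2 = rng.uniform(-eps, eps, size=x.shape)  # second random perturbation
    diff = f(x + d1) - f(x + d2)
    return float(np.sum(diff ** 2))
```

A function that is locally constant around x incurs zero penalty, matching the intuition that the learned function should not vary much in the neighborhoods of natural inputs; the cost per sample is exactly two extra forward passes.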

2. BACKGROUND

Manifold regularization  The manifold assumption states that input data is not drawn uniformly from the input domain X, also known as the ambient space, but rather is supported on a submanifold M ⊂ X, called the intrinsic space. There is thus a distinction between regularizing on the ambient space, where the learned function is smooth with respect to the entire input domain (e.g., Tikhonov regularization (Phillips, 1962; Tikhonov et al., 2013)), and regularizing over the intrinsic space, which uses the geometry of the input submanifold to determine the regularization norm. A common form of manifold regularization assumes the gradient of the learned function ∇_M f(x) should be small where the probability of drawing a sample is large; we call such functions "smooth". Let μ be a probability measure with support M. This leads to the following intrinsic regularizer:

||f||²_I := ∫_M ||∇_M f(x)||² dμ(x)

In general, we cannot compute this integral because M is not known, so Belkin et al. (2006) propose the following discrete approximation, which converges to the integral as the number of samples grows:

||f||²_I ≈ (1/N²) Σ_{i,j=1}^{N} (f(x_i) − f(x_j))² L_{i,j}

Here, x_1, ..., x_N are samples drawn, by assumption, from the input manifold M according to μ, and L is a matrix of weights measuring the similarity between samples. The idea is to approximate the continuous input manifold using a discrete graph, where the vertices are samples, the edge weights are distances between points, and the Laplacian matrix L encodes the structure of this graph. A common choice of weights is a heat kernel:

L_{i,j} = L(x_i, x_j) := exp(−||x_i − x_j||² / s)

To reduce computational cost, weights are often truncated to the k-nearest neighbors or to points within some ε-ball. Note that the Laplacian can also be interpreted as a discrete matrix operator, which converges under certain conditions to the continuous Laplace operator (Belkin and Niyogi, 2008).
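The discrete approximation above can be computed directly from a batch of samples. The sketch below (function names are ours, for illustration only) builds the heat-kernel weight matrix with bandwidth s and evaluates the (1/N²) Σ (f(x_i) − f(x_j))² L_{i,j} sum:

```python
import numpy as np

def heat_kernel_weights(X, s=1.0):
    """Pairwise heat-kernel weights L_ij = exp(-||x_i - x_j||^2 / s)
    for rows of X, computed via the squared-distance expansion."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # squared pairwise distances
    return np.exp(-np.maximum(d2, 0.0) / s)          # clamp tiny negatives

def intrinsic_norm(f_vals, X, s=1.0):
    """Discrete intrinsic regularizer:
    (1/N^2) * sum_ij (f(x_i) - f(x_j))^2 * L_ij."""
    N = len(X)
    L = heat_kernel_weights(X, s)
    diffs = (f_vals[:, None] - f_vals[None, :]) ** 2
    return float(np.sum(diffs * L) / N ** 2)
```

As a sanity check, a function that is constant on the samples has zero intrinsic norm, and nearby sample pairs (large L_{i,j}) contribute more heavily to the penalty than distant ones, which is exactly the "smooth where data is dense" behavior the regularizer encodes.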
ReLU networks  Our development focuses on a standard architecture for deep neural networks: fully-connected feedforward networks with ReLU activations. In general, we can write the function represented by such a network with n layers and parameters θ = {A_i, b_i}_{i=1,...,n−1} as

z_0 = x     (3)
ẑ_i = A_i z_{i−1} + b_i   for i = 1, ..., n − 1     (4)
z_i = σ(ẑ_i)   for i = 1, ..., n − 2     (5)
f(x; θ) = ẑ_{n−1}

where the A_i are the weight matrices and the b_i are the bias vectors. We call the z_i "hidden activations", or more simply, activations, and the ẑ_i "pre-activations". In this work, we consider networks in which σ(·) in (5) is the Rectified Linear Unit (ReLU):

σ(ẑ_i) := max(0, ẑ_i)

It is clear from this description that ReLU networks are a family of continuous piecewise linear functions. We denote the linear function induced by an input x as f_x(·; θ), i.e., the analytic extension of the local linear component about x over the input domain.

Adversarial robustness  One common measure of robustness for neural networks is against a norm-bounded adversary. In this model, the adversary is given an input budget ε over a norm
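The forward pass defined by equations (3)–(5) can be sketched in a few lines; this is a generic illustration of the architecture described above, not code from the paper:

```python
import numpy as np

def relu_forward(x, params):
    """Forward pass of a fully-connected ReLU network, following
    z_0 = x; z_hat_i = A_i z_{i-1} + b_i; z_i = max(0, z_hat_i).

    params is a list of (A_i, b_i) pairs. ReLU is applied after every
    layer except the last, whose pre-activation is the output f(x; theta).
    """
    z = x
    for i, (A, b) in enumerate(params):
        z_hat = A @ z + b                                   # pre-activation
        z = np.maximum(0.0, z_hat) if i < len(params) - 1 else z_hat
    return z
```

Because each layer is an affine map followed (except at the output) by the piecewise linear max(0, ·), the composite function is continuous and piecewise linear, which is the property the regularizer exploits: within a fixed activation pattern, the network coincides with a single linear function f_x(·; θ).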

