DEEP LEARNING WITH DATA PRIVACY VIA RESIDUAL PERTURBATION

Abstract

Protecting data privacy in deep learning (DL) is an urgent need. Several celebrated privacy notions have been established and used for privacy-preserving DL. However, many existing mechanisms achieve data privacy at the cost of significant utility degradation. In this paper, we propose residual perturbation for privacy-preserving DL, a mechanism principled by stochastic differential equation theory that injects Gaussian noise into each residual mapping of ResNets. Theoretically, we prove that residual perturbation guarantees differential privacy (DP) and reduces the generalization gap of DL. Empirically, we show that residual perturbation outperforms the state-of-the-art DP stochastic gradient descent (DPSGD) in both membership privacy protection and maintaining the DL models' utility. For instance, when training ResNet8 for IDC dataset classification, residual perturbation achieves an accuracy of 85.7% with perfect membership privacy protection; in contrast, DPSGD achieves an accuracy of 82.8% with weaker membership privacy protection.

1. INTRODUCTION

Many high-capacity deep nets (DNs) are trained with private data, including medical images and financial transaction data (Yuen et al., 2011; Feng et al., 2017; Liu et al., 2017). DNs usually overfit and can memorize the private training data, which exposes DN training to data privacy leakage (Fredrikson et al., 2015a; Shokri et al., 2017; Salem et al., 2018; Yeom et al., 2018; Sablayrolles et al., 2018). Given a pre-trained DN, the membership inference attack can determine whether an instance is in the training set based on the DN's response (Fredrikson et al., 2014; Shokri et al., 2017; Salem et al., 2018); the model extraction attack can learn a surrogate model that matches the target model, given only black-box access to the target model (Tramèr et al., 2016; Gong & Liu, 2018); the model inversion attack can infer certain features of a given input from the output of a target model (Fredrikson et al., 2015b; Al-Rubaie & Chang, 2016); and the attribute inference attack can de-anonymize the anonymized training data (Gong & Liu, 2016; Zheng et al., 2018). Machine learning (ML) with data privacy is crucial in many applications (Lindell & Pinkas, 2000; Barreno et al., 2006; Hesamifard et al., 2018; Bae et al., 2019). Several algorithms have been developed to reduce privacy leakage, including differential privacy (DP) (Dwork et al., 2006), federated learning (FL) (McMahan et al., 2016; Konečnỳ et al., 2016), and k-anonymity (Sweeney, 2002; El Emam & Dankar, 2008). Objective, output, and gradient perturbations are among the most used approaches for ML with DP guarantees, at the cost of significant utility degradation (Chaudhuri et al., 2011; Bassily et al., 2014; Shokri & Shmatikov, 2015; Abadi et al., 2016b; Bagdasaryan et al., 2019). FL trains centralized ML models through gradient exchange, with the training data distributed across edge devices; however, the gradient exchange can still leak privacy (Zhu et al., 2019; Wang et al., 2019c).
Most existing privacy protection is achieved at a tremendous sacrifice of utility. Moreover, training ML models with the state-of-the-art DP stochastic gradient descent (DPSGD) incurs tremendous computational cost due to the requirement of computing and clipping the per-sample gradient (Abadi et al., 2016a). It remains of great interest to develop new privacy-preserving ML algorithms without excessive computational overhead or degradation of the ML models' utility.
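To make the per-sample overhead concrete, the following is a minimal numpy sketch of one DPSGD step for a least-squares loss; it is an illustration under simplified assumptions, not the implementation of Abadi et al. (2016a). The function name `dpsgd_step` and the loss choice are ours for exposition.

```python
import numpy as np

def dpsgd_step(w, X, y, lr=0.1, clip=1.0, sigma=1.0, rng=None):
    """One illustrative DPSGD step for the loss 0.5*(x.w - y)^2.

    Each example's gradient is computed and clipped separately before
    Gaussian noise is added to the sum -- this per-sample work is the
    computational overhead referred to in the text.
    """
    rng = np.random.default_rng() if rng is None else rng
    grads = []
    for xi, yi in zip(X, y):                 # per-sample gradient
        g = (xi @ w - yi) * xi               # gradient for one example
        g = g / max(1.0, np.linalg.norm(g) / clip)  # clip to norm <= clip
        grads.append(g)
    noisy_sum = np.sum(grads, axis=0) + rng.normal(0.0, sigma * clip, w.shape)
    return w - lr * noisy_sum / len(X)
```

With `sigma = 0` and a very large `clip`, the step reduces to ordinary (non-private) gradient descent, which makes the utility cost of clipping and noising easy to probe.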

1.1. OUR CONTRIBUTION

In this paper, we propose residual perturbation for privacy-preserving deep learning (DL) with DP guarantees. At the core of residual perturbation is injecting Gaussian noise into each residual mapping of ResNet (He et al., 2016), and the residual perturbation is theoretically principled by stochastic differential equation (SDE) theory. The major advantages of residual perturbation are threefold:
• It can protect the membership privacy of the training data almost perfectly, often without sacrificing the ResNets' utility; it can even improve their classification accuracy.
• It has fewer hyperparameters to tune than the benchmark DPSGD, and it is more computationally efficient than DPSGD, which requires computing the per-sample gradient.
• It can be implemented in a few lines of code in modern DL libraries.
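The "few lines of code" claim can be sketched as follows in plain numpy; the residual branch `np.tanh(W @ x)` is a hypothetical stand-in for the paper's unspecified mapping F, and `residual_forward` is our name, not the paper's.

```python
import numpy as np

def residual_forward(x, weights, sigma=0.1, rng=None):
    """Toy residual network forward pass with residual perturbation.

    Each residual mapping x <- x + F(x, W) receives an additive
    Gaussian noise term sigma * N(0, I) -- the one-line change at the
    core of the proposed mechanism.
    """
    rng = np.random.default_rng() if rng is None else rng
    for W in weights:
        F = np.tanh(W @ x)                          # hypothetical residual branch
        x = x + F + sigma * rng.normal(size=x.shape)  # perturbed residual mapping
    return x
```

Setting `sigma = 0` recovers the ordinary (unperturbed) residual forward pass, so the mechanism slots into an existing ResNet without architectural changes.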

1.2. RELATED WORK

Improving the utility of ML models with DP guarantees is an important task. PATE (Papernot et al., 2017; 2018) uses semi-supervised learning together with model transfer between "student" and "teacher" models to enhance utility. Several variants of the DP notion have also been proposed to improve the privacy budget, and sometimes they can also improve the resulting model's utility at a given DP budget (Abadi et al., 2016b; Mironov, 2017; Wang et al., 2018; Dong et al., 2019). Some post-processing techniques have also been developed to improve the utility of ML models with negligible computational overhead (Wang et al., 2019a; Liang et al., 2020). From the SDE viewpoint, (Li et al., 2019; Wang et al., 2015) showed that several stochastic gradient Monte Carlo samplers can reach state-of-the-art performance in terms of both privacy and utility in Bayesian learning. Gaussian noise injection in residual learning has been used to improve the robustness of ResNets (Rakin et al., 2018; Wang et al., 2019b; Liu et al., 2019). In this paper, we inject Gaussian noise into each residual mapping to achieve data privacy instead of adversarial robustness.

1.3. ORGANIZATION

We organize this paper as follows: In Section 2, we introduce the residual perturbation for privacy-preserving DL. In Section 3, we present the generalization and DP guarantees for residual perturbation. In Section 4, we numerically verify the efficiency of residual perturbation in protecting data privacy without degrading the underlying models' utility. We end with some concluding remarks. Technical proofs and additional experimental details and results are provided in the appendix.

1.4. NOTATIONS

We denote scalars by lower- or upper-case letters, and vectors/matrices by lower-/upper-case boldface letters. For a vector $x = (x_1, \cdots, x_d) \in \mathbb{R}^d$, we use $\|x\|_2 = (\sum_{i=1}^d |x_i|^2)^{1/2}$ to denote its $\ell_2$ norm. For a matrix $A$, we use $\|A\|_2$ to denote its norm induced by the vector $\ell_2$ norm. We denote the standard Gaussian in $\mathbb{R}^d$ by $\mathcal{N}(0, I)$, with $I \in \mathbb{R}^{d \times d}$ the identity matrix. The set of real numbers is denoted by $\mathbb{R}$ and the positive reals by $\mathbb{R}^+$. We use $B(0, R)$ to denote the ball centered at $0$ with radius $R$.

2. ALGORITHMS

2.1. DEEP RESIDUAL LEARNING AND ITS CONTINUOUS ANALOGUE

Consider the training set $S_N := \{x_i, y_i\}_{i=1}^N$, where each $(x_i, y_i) \in \mathbb{R}^d \times \mathbb{R}$ is a data-label pair. For a given $x_i$, the forward propagation of a ResNet with $M$ residual mappings can be written as
$$x_{l+1} = x_l + F(x_l, W_l), \quad l = 0, 1, \cdots, M-1, \quad \text{with } x_0 = x_i; \qquad \hat{y}_i = f(x_M), \quad (1)$$
where $F(\cdot, W_l)$ is the nonlinear mapping of the $l$-th residual mapping parameterized by $W_l$, $f$ is the output activation function, and $\hat{y}_i$ is the predicted label for $x_i$. The heuristic continuum limit of (1) is
$$dx(t) = F(x(t), W(t))\,dt, \quad x(0) = x, \quad (2)$$
where $t$ is the time variable. The ordinary differential equation (ODE) (2) can be reversible, and thus its ResNet counterpart might be exposed to data privacy leakage. For instance, we use the ICLR logo (Fig. 1(a)) as the initial data $x$ in (2). We then simulate the forward propagation of a ResNet by solving (2) from $t = 0$ to $t = 1$ using the forward Euler solver with a time step size $\Delta t = 0.01$ and a given velocity field $F(x(t), W(t))$ (see Appendix E for the details of $F(x(t), W(t))$), which maps the original image to its features (Fig. 1(b)). To recover the original image, we start from the features and use the backward Euler iteration, i.e., $\tilde{x}(t) = \tilde{x}(t + \Delta t) - \Delta t\, F(\tilde{x}(t + \Delta t), t + \Delta t)$, to evolve $\tilde{x}(t)$ from $t = 1$ to $t = 0$ with $\tilde{x}(1) = x(1)$ being the features obtained in the forward propagation. We plot the image recovered from the features in Fig. 1(c); the original image is almost perfectly recovered.
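The reversibility experiment above can be sketched in a few lines of numpy. The smooth velocity field `F` below is a hypothetical stand-in for the one described in Appendix E, and low-dimensional vectors replace the image; the point is only that the backward Euler iteration approximately inverts the forward flow.

```python
import numpy as np

def forward_euler(x0, F, dt=0.01, steps=100):
    """Propagate x through dx/dt = F(x, t), the continuous ResNet analogue (2)."""
    x, t = x0.copy(), 0.0
    for _ in range(steps):
        x = x + dt * F(x, t)   # forward Euler step: x(t + dt) = x(t) + dt * F(x(t), t)
        t += dt
    return x

def backward_euler_recover(x1, F, dt=0.01, steps=100):
    """Run x(t) = x(t + dt) - dt * F(x(t + dt), t + dt) from t = 1 back to t = 0."""
    x, t = x1.copy(), dt * steps
    for _ in range(steps):
        x = x - dt * F(x, t)   # backward iteration approximately undoes a forward step
        t -= dt
    return x
```

Because each backward step evaluates F at the later state rather than the earlier one, the inversion is approximate, with an error of order $\Delta t$ per step, which is why the recovered image is "almost perfectly" rather than exactly reconstructed.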

