MIXCON: ADJUSTING THE SEPARABILITY OF DATA REPRESENTATIONS FOR HARDER DATA RECOVERY Anonymous

Abstract

To address the issue that deep neural networks (DNNs) are vulnerable to model inversion attacks, we design an objective function that adjusts the separability of the hidden data representations, as a way to control the trade-off between data utility and vulnerability to inversion attacks. Our method is motivated by theoretical insights on data separability in neural network training and by results on the hardness of model inversion. Empirically, by adjusting the separability of data representations, we show that there exist sweet spots for data separability at which it is difficult to recover data during inference while data utility is maintained.

1. INTRODUCTION

Over the past decade, deep neural networks have shown superior performance in various domains, such as visual recognition, natural language processing, robotics, and healthcare. However, recent studies have demonstrated that machine learning models are vulnerable to leaking private data He et al. (2019); Zhu et al. (2019); Zhang et al. (2020b). Hence, preventing private data from being recovered by malicious attackers has become an important direction in deep learning research. Distributed machine learning Shokri & Shmatikov (2015); Kairouz et al. (2019) has emerged as an attractive setting for mitigating privacy leakage without requiring clients to share raw data. In an edge-cloud distributed learning scenario, most layers are commonly offloaded to the cloud, while the edge device computes only a small number of convolutional layers for feature extraction, due to power and resource constraints Kang et al. (2017). For example, a service provider trains and splits a neural network at a "cut layer," then deploys the layers before the cut to clients Vepakomma et al. (2018). Clients encode their data using those layers and send the resulting data representations to the cloud server, which uses the remaining layers for inference Teerapittayanon et al. (2017); Ko et al. (2018); Vepakomma et al. (2018). This gives an untrusted cloud provider or a malicious participant a chance to steal sensitive inference data from the output of the "cut layer" on the edge-device side, i.e., to invert the data from its representations Fredrikson et al. (2015); Zhang et al. (2020b); He et al. (2019). In this distributed learning setup, we investigate how to design a hard-to-invert data representation function (or hidden data representation function), defined as the output of a neural network's intermediate layer. We focus on defending against data recovery during inference.
The goal is to hide sensitive information and protect data representations from being used to reconstruct the original data, while ensuring that the resulting data representations are still informative enough for decision making. We use model inversion attacks that reconstruct individual data He et al. (2019); Zhang et al. (2020b) to evaluate defense performance, and model accuracy to evaluate data utility. The core question is how to achieve this goal, especially how to protect individual data from being recovered. We propose data separability, i.e., the minimum (relative) distance between (the representations of) two data points, as a new criterion for investigating and understanding the trade-off between data utility and the hardness of data recovery. Recent theoretical studies show that if data points are separable in the hidden embedding space of a DNN model, the model can achieve good classification accuracy Allen-Zhu et al. (2019b). However, larger separability also makes it easier to recover inputs. Conversely, if the embeddings are non-separable or sometimes overlap with one another, recovering inputs is challenging, but the model may not be able to learn to achieve good performance. Two main questions arise. First, is there an effective way to adjust the separability of data representations? Second, are there "sweet spots" that make the data representations difficult for inversion attacks while achieving good accuracy? This paper aims to answer these two questions by learning a feature extractor that can adjust the separability of data representations embedded by a few neural network layers. Specifically, we propose to add a novel self-supervised-learning-based regularization term to the standard loss function during training.
We conduct experiments on both synthetic and benchmark datasets to demonstrate that, with suitable parameters, such a learned neural network indeed makes input data difficult to recover while maintaining data utility. Our contributions can be summarized as follows:
• To the best of our knowledge, this is the first proposal to investigate the trade-off between data utility and data recoverability from the angle of data representation separability;
• We propose a simple yet effective loss term, the consistency loss MixCon, for adjusting data separability;
• We provide theory-guided insights for our method, including a new exponential lower bound on approximately solving the network inversion problem, based on the Exponential Time Hypothesis (ETH); and
• We report experimental results comparing accuracy and data inversion results with and without MixCon, showing that MixCon with suitable parameters makes data recovery difficult while preserving high data utility.
The rest of the paper is organized as follows. We formalize our problem in Section 2. In Section 3, we present our theoretical insights and introduce the consistency loss. We present experimental results in Section 4. We defer technical proofs and experiment details to the Appendix.

2. PRELIMINARY

Distributed learning framework. We consider a distributed learning framework in which users and servers collaboratively perform inference Teerapittayanon et al. (2017); Ko et al. (2018); Kang et al. (2017). We make the following assumptions: 1) Datasets are stored on the user side; during inference, no raw data are ever shared between users and servers. 2) Users and servers use a split model Vepakomma et al. (2018), where users encode their data using our proposed mechanism to extract data representations at a cut layer of a trained DNN, and servers take the encoded data representations as inputs and compute outputs using the layers after the cut layer. 3) The DNN used in the distributed learning setting can be regularized by our loss function (defined later).
Threat model. We consider an attacker with access to the shared hidden data representations during the client-cloud communication process. The attacker aims to recover user data (e.g., pixel-wise recovery of images in a vision task). To quantify the upper bound of privacy leakage under this threat model, we give the attacker more power in our evaluation: in addition to access to the extracted features, we allow the attacker to see all network parameters of the trained model.
Problem formulation. Formally, let $h : \mathbb{R}^d \to \mathbb{R}^m$ denote the local feature extractor, which maps an input $x \in \mathbb{R}^d$ to its feature representation $h(x) \in \mathbb{R}^m$; in our setting, the local feature extractor is a shallow neural network. The deep neural network on the server side is denoted $g : \mathbb{R}^m \to \mathbb{R}^C$; it performs the classification task, mapping the feature representation to one of $C$ target classes. The overall neural network $f : \mathbb{R}^d \to \mathbb{R}^C$ can be written as $f = g \circ h$. Our overall objectives are:
• Learn the feature representation mechanism (i.e., the function $h$) that safeguards information from unsolicited disclosure.
• Jointly learn the classification function g and the feature extraction function h to ensure that the extracted information is useful for high-performance downstream tasks.
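As a minimal illustration of the split setup f = g ∘ h described above, the numpy sketch below wires a toy edge-side extractor h to a server-side classifier g. The shapes, random weights, and ReLU/argmax choices are illustrative assumptions, not the paper's trained models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy split model: the edge device computes h, the server computes g.
# Dimensions are illustrative (d=8 input dims, m=4 feature dims, C=3 classes).
W_h = rng.normal(size=(4, 8))   # edge-side "cut layer" weights
W_g = rng.normal(size=(3, 4))   # server-side classifier weights

def h(x):
    """Edge-side feature extractor: only h(x), never x, leaves the device."""
    return np.maximum(W_h @ x, 0.0)  # ReLU features

def g(z):
    """Server-side classifier operating on the shared representation."""
    return int(np.argmax(W_g @ z))

def f(x):
    """End-to-end model f = g o h."""
    return g(h(x))

x = rng.normal(size=8)
z = h(x)              # this is all the server (or an attacker) observes
assert f(x) == g(z)   # split inference matches the composed model
```

The attacker in the threat model above sees z (and, in our evaluation, W_h), but never x.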

3. CONSISTENCY LOSS FOR ADJUSTING DATA SEPARABILITY

To address the issue of data recovery from hidden-layer outputs, we propose a novel consistency loss in neural network training, as shown in Figure 1. The consistency loss is applied to the feature extractor h to encourage encoding close but separable representations for data of different classes. Thus, the feature extractor h can help protect the original data from being inverted by an attacker during inference while achieving desirable accuracy.

3.1. DATA SEPARATION AS A GUIDING TOOL

Our intuition is to reduce the information in the data representations to a minimum such that downstream classification tasks can achieve good accuracy, but there is not enough information for data recovery through model inversion attacks He et al. (2019). The question is, what is the right measure of the amount of information for successful classification and data security? We propose to use data separability as the measure. This intuition is motivated by theoretical results in deep learning. In particular:
• Over-parameterized deep learning theory -- well-separated data requires a narrower network to train;
• Inapproximability theory -- the worse the separability of the data, the harder the inversion problem.

Definition 3.1 (Data separability). Let $\delta_h$ denote the separability of the hidden layer over all pairwise inputs $x_1, x_2, \cdots, x_n \in \mathbb{R}^d$, i.e.,
$$\delta_h := \min_{i \neq j \in [n]} \|h(x_i) - h(x_j)\|_2. \qquad \text{(controlling accuracy)}$$
Let $S$ denote a set of pairs that are supposed to be close in the hidden layer, and let $\Delta_H$ denote the maximum distance with respect to that set $S$:
$$\Delta_H := \max_{(i,j) \in S} \|h(x_i) - h(x_j)\|_2. \qquad \text{(controlling invertibility)}$$
Intuitively, we expect the upper bound on data separability ($\Delta_H$) to relate to invertibility, and the lower bound on data separability ($\delta_h$) to relate to accuracy.

Lower bound on data separability implies better accuracy. A recent line of deep learning theory Allen-Zhu et al. (2019b) indicates that data separability is perhaps the only factor that matters for learnability (at least for over-parameterized neural networks), leading to the following result.

Theorem 3.2 (Allen-Zhu et al. (2019b)). Suppose the training data points are separable, i.e., $\delta_h > 0$. If the width of an $L$-layer neural network with ReLU gates satisfies $m \geq \mathrm{poly}(n, d, L, 1/\delta_h)$, then, initializing from a random weight matrix $W$, the (stochastic) gradient descent algorithm can find the global minimum of the neural network function $f$.

Essentially, the above theorem indicates that we can (provably) find the global minimum of a neural network given well-separated data, and that better-separated data points require a narrower neural network and less running time.

Upper bound on data separability implies hardness of inversion. When all data representations are close to each other, i.e., $\Delta_H$ is sufficiently small, we expect the inversion problem to be hard. We support this intuition by proving that the neural network inversion problem is hard to approximate within some constant factor, assuming NP ≠ RP. Existing work Lei et al. (2019) indicates that the decision version of the neural network inversion problem is NP-hard. However, this is insufficient, since it is often easy to find an approximate solution, which could leak much information about the original data. Whether the approximation version is also hard was an open question.
We strengthen the hardness result and show that, assuming NP ≠ RP, it is hard to recover an input that approximates the hidden-layer representation. Our hardness result implies that when hidden representations are close to each other, no polynomial-time algorithm can distinguish their inputs; therefore, it is impossible to recover the real input data in polynomial time.

Theorem 3.3 (Informal). Assume NP ≠ RP. There is no polynomial time algorithm that gives a constant-factor approximation to the neural network inversion problem.

The above result only rules out polynomial-time recovery algorithms, leaving open the possibility of a subexponential-time algorithm. To further strengthen the result, we assume the well-known Exponential Time Hypothesis (ETH), which is widely accepted in the computational complexity community.

Hypothesis 3.4 (Exponential Time Hypothesis (ETH), Impagliazzo et al. (1998)). There is a $\delta > 0$ such that the 3SAT problem cannot be solved in $O(2^{\delta n})$ time.

Assuming ETH, we derive an exponential lower bound on approximately recovering the input.

Corollary 3.5 (Informal). Assume ETH. There is no $2^{o(n^{1-o(1)})}$-time algorithm that gives a constant-factor approximation to the neural network inversion problem.
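The two quantities of Definition 3.1 are straightforward to compute for a batch of representations. The numpy sketch below (the toy matrix H and pair set S are made-up examples) computes δ_h as the minimum distance over all distinct pairs and Δ_H as the maximum over a designated pair set.

```python
import numpy as np

def separability(H):
    """delta_h: minimum pairwise l2 distance over all representations H (n x m)."""
    n = H.shape[0]
    d = np.linalg.norm(H[:, None, :] - H[None, :, :], axis=-1)  # n x n distances
    return d[~np.eye(n, dtype=bool)].min()                      # ignore diagonal

def max_pair_distance(H, S):
    """Delta_H: maximum l2 distance over a designated set S of index pairs."""
    return max(np.linalg.norm(H[i] - H[j]) for i, j in S)

# Toy representations in R^2: three points whose pairwise distances are known.
H = np.array([[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]])
assert separability(H) == 1.0                         # closest pair: rows 0 and 2
assert max_pair_distance(H, [(0, 1), (0, 2)]) == 5.0  # farthest designated pair
```

Monitoring these two quantities during training is how one would check whether MixCon is actually shrinking Δ_H without driving δ_h to zero.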

3.2. CONSISTENCY LOSS -MixCon

Following the above intuitions, we propose a novel loss term, the MixCon loss $L_{mixcon}$, to incorporate into training. MixCon adjusts data separability by forcing consistency of hidden data representations across different classes. This additional loss term balances the data separability, penalizing feature representations that are too far from or too close to each other. Note that we choose to mix data from different classes rather than data within a class, in order to bring more confusion into the embedding space and potentially hide label information.

MixCon loss $L_{mixcon}$: We add consistency penalties to force the representations of the $i$-th data points in different classes to be similar, while avoiding any overlap between two data points:
$$L_{mixcon} := \frac{1}{p} \cdot \frac{1}{|C| \cdot (|C|-1)} \sum_{i=1}^{p} \sum_{c_1 \in C} \sum_{c_2 \in C,\, c_2 \neq c_1} \Big( \mathrm{dist}(i, c_1, c_2) + \beta / \mathrm{dist}(i, c_1, c_2) \Big). \qquad (1)$$
A practical choice for the pairwise distance is $\mathrm{dist}(i, c_1, c_2) = \|h(x_{i,c_1}) - h(x_{i,c_2})\|_2^2$, where $x_{i,c}$ is the $i$-th input data point in class $c$, $p := \min_{c \in C} |c|$, and $\beta > 0$ balances the data separability. Note that the index $i$ of a data point is not fixed, due to random shuffling in the regular training process; thus Eq. (1) nearly performs all-pair comparisons over the course of training with data shuffling. The first term penalizes large distances, while the second enforces sufficient data separability. In general, we could replace $(\mathrm{dist} + \beta/\mathrm{dist})$ by any convex function with asymptotes on the non-negative domain, i.e., a function whose value approaches infinity at both ends of $[0, \infty)$.

For classification, we use the cross-entropy loss
$$L_{class} := -\sum_{i=1}^{N} \sum_{c=1}^{C} y_{i,c} \cdot \log(\hat{y}_{i,c}), \qquad (2)$$
where $y_i \in \mathbb{R}^C$ is the one-hot representation of the true label and $\hat{y}_i = f(x_i) \in \mathbb{R}^C$ is the prediction score of data point $i \in \{1, \ldots, N\}$. The final objective function is $L := L_{class} + \lambda \cdot L_{mixcon}$. We train $h$ and $g$ simultaneously, where $\lambda$ and $\beta$ are tunable hyper-parameters associated with the consistency loss regularization that adjust separability.
We discuss the effect of λ and β in experiments (Section 4).
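A minimal numpy sketch of Eq. (1) is given below, assuming representations are arranged as an array of shape (C, p, m) so that reps[c, i] is h(x_{i,c}); this batch layout and the toy values are assumptions for illustration, not the paper's training code.

```python
import numpy as np

def mixcon_loss(reps, beta=1e-4):
    """Sketch of the MixCon consistency loss of Eq. (1).

    reps: array of shape (C, p, m) -- hidden representations h(x_{i,c}) of the
    i-th point of each class c (p points per class, m feature dimensions).
    The dist term pulls same-index points of different classes together,
    while beta/dist keeps them from collapsing onto each other.
    """
    C, p, _ = reps.shape
    total = 0.0
    for i in range(p):
        for c1 in range(C):
            for c2 in range(C):
                if c1 == c2:
                    continue
                dist = np.sum((reps[c1, i] - reps[c2, i]) ** 2)  # squared l2
                total += dist + beta / dist
    return total / (p * C * (C - 1))

rng = np.random.default_rng(0)
reps = rng.normal(size=(3, 5, 8))       # C=3 classes, p=5 points, m=8 dims
loss = mixcon_loss(reps)
assert loss > 0
# Shrinking all representations toward a common point lowers the dist term,
# but beta/dist eventually dominates and the loss blows up:
assert mixcon_loss(reps * 1e-6) > loss
```

The final assertion illustrates why β matters: with β > 0, the degenerate solution that maps everything to one point is heavily penalized, matching the β = 0 failure mode observed in Section 4.2.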

4. EXPERIMENTAL RESULTS

Our hypothesis in this work is that the MixCon loss can adjust the separability of hidden data representations, and that a moderate reduction of representation separability can preserve data utility while making data harder to recover from its representation. To show this hypothesis holds, we conduct two sets of experiments. Synthetic experiments provide a straightforward illustration of the relationship among data separability, data utility, and invertibility. Benchmark experiments are used for image-data classification and recovery evaluation.

4.1. DATA RECOVERY MODEL

To empirically evaluate the quality of inversion, we formally define the white-box data recovery (inversion) model He et al. (2019) used in our experiments. The model aims to solve an optimization problem in the input space. Given the representation $z = h(x)$ of a testing data point and a public function $h$ (the trained network that generates data representations), the inversion model tries to find the original input $x$:
$$x^* = \arg\min_{s} L(h(s), z) + \alpha \cdot R(s), \qquad (3)$$
where $L$ is a loss function that measures the similarity between $h(s)$ and $z$, and $R$ is a regularization term. We specify the $L$ and $R$ used in each experiment later. We solve Eq. (3) by iterative gradient descent.

4.2. EXPERIMENTS WITH SYNTHETIC DATA

To allow precise manipulation and straightforward visualization of data separability, our experiments use generated synthetic data with a 4-layer fully-connected network, so that we can control the dimensionality. In this section, we want to answer the following questions:
Q1 What is the impact of having $\beta$ in Eq. (1) to bound the smallest pairwise data distance?
Q2 Are features encoded with the MixCon mechanism harder to invert?
Network, data generation and training. We define the network as $y = q(\mathrm{softmax}(f(x)))$, where
$$f(x) = W_4 \cdot \sigma(W_3 \cdot \sigma(W_2 \cdot \sigma(W_1 x + b_1) + b_2) + b_3) + b_4,$$
with $x \in \mathbb{R}^{10}$, $W_1 \in \mathbb{R}^{500 \times 10}$, $W_2 \in \mathbb{R}^{2 \times 500}$, $W_3 \in \mathbb{R}^{100 \times 2}$, $W_4 \in \mathbb{R}^{2 \times 100}$, $b_1 \in \mathbb{R}^{500}$, $b_2 \in \mathbb{R}^{2}$, $b_3 \in \mathbb{R}^{100}$, $b_4 \in \mathbb{R}^{2}$. For a vector $z$, we use $q(z)$ to denote the index $i$ such that $|z_i| > |z_j|$ for all $j \neq i$. We initialize each entry of $W_k$ and $b_k$ from $\mathcal{N}(u_k, 1)$, where $u_k \sim \mathcal{N}(0, \alpha)$ and $k \in \{1, 2, 3, 4\}$. We generate synthetic samples $(x, y)$ from two multivariate normal distributions: positive data are sampled from $\mathcal{N}(0, I)$ and negative data from $\mathcal{N}(-1, I)$, where the covariance matrix $I$ is the identity, yielding 800 training samples and 200 testing samples. $L_{mixcon}$ is applied to the 2nd fully-connected layer. We train the network for 20 epochs with the cross-entropy loss and an SGD optimizer with learning rate 0.1. We apply noise to the labels by randomly flipping 5% of them to increase training difficulty.
Testing setup. We compare results under the following settings: • Vanilla: training using only $L_{class}$. • MixCon: training with the MixCon loss with parameters $(\lambda, \beta)$.
Table 1: Data utility (accuracy). Vanilla is equivalent to $(\lambda = 0, \beta = 0)$. The two MixCon "default" settings both use $\lambda = 0.1$ but vary in $\beta = 0.01$ and $\beta = 0$. "Deeper"/"Wider" indicate increasing the depth/width of the layers of the server-side network $g(z)$.
We perform model inversion using Eq. (3) without any regularization term $R(s)$, with $L$ the $\ell_1$-loss function.
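The synthetic data described above can be generated as in the numpy sketch below; the exact sampling and label-flipping mechanics here are assumptions beyond what the text specifies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data as in Section 4.2: positive class ~ N(0, I), negative
# class ~ N(-1, I) in R^10; 800 training and 200 testing samples.
def make_split(n):
    half = n // 2
    x_pos = rng.normal(loc=0.0, size=(half, 10))   # positive samples
    x_neg = rng.normal(loc=-1.0, size=(half, 10))  # negative samples
    x = np.vstack([x_pos, x_neg])
    y = np.array([1] * half + [0] * half)
    return x, y

x_train, y_train = make_split(800)
x_test, y_test = make_split(200)

# 5% label noise on the training set to increase training difficulty.
flip = rng.random(800) < 0.05
y_train = np.where(flip, 1 - y_train, y_train)

assert x_train.shape == (800, 10) and x_test.shape == (200, 10)
```

Because the two class means differ by 1 in every coordinate, the classes overlap substantially, which is what makes separability in the learned hidden space the interesting quantity.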
The detailed optimization process is listed in Appendix C.1.
Results. To answer Q1, i.e., how $\beta$ in Eq. (1) affects the smallest pairwise data distance, we visualize the change of data representations at the initial and final epochs in Figure 2. When $\beta = 0$, MixCon achieves only chance accuracy, as it encodes all the $h(x)$ to the hidden-space point $(0, 0)$ (Figure 2f). With $\beta > 0$ balancing the separability, MixCon achieves accuracy similar to Vanilla. Based on Theorem 3.2, we further present two strategies to maintain reasonable accuracy while compensating for reduced data separability: increasing the depth or the width of the layers of $g(z)$, the network after the layer to which $L_{mixcon}$ is applied. In practice, we add two more fully-connected layers with 100 neurons after the 3rd layer for the "deeper" $g(z)$, and change the number of neurons in the 3rd layer to 2048 for the "wider" $g(z)$. We show the utility results in Table 1. Using a deeper or wider $g(z)$, MixCon ($\lambda = 0.1$, $\beta = 0.01$) improves accuracy, whereas MixCon ($\lambda = 0.1$, $\beta = 0$) fails, because zero data separability is not learnable no matter how $g(z)$ changes. This confirms that $\beta$ is an important factor in guaranteeing that the neural network remains trainable.
To answer Q2, we evaluate the quality of data recovery using the inversion model. We use both the squared error (SE) and cosine similarity (CS) between $x$ and $x^*$ to evaluate data recovery accuracy. We show the quantitative inversion results in Table 2 with the mean and worst-case values. Higher SE or lower CS indicates a worse inversion. Evidently, data representations from a MixCon-trained network are more difficult to recover than under the Vanilla strategy.

4.3. EXPERIMENTS WITH BENCHMARK DATASETS

In this section, we would like to answer the following questions:
Q3 How do $\lambda$ and $\beta$ jointly affect data utility?
Q4 Does MixCon make model inversion harder on benchmark image datasets?
The neural network is optimized using the cross-entropy loss and an SGD optimizer with learning rate 0.01 for 20 epochs. We do not use any data augmentation or manual learning-rate decay. The MixCon loss is applied to the output of the 2nd convolutional layer block in LeNet5. We use mini-batch training, and each batch contains 40 data points from each class. We train the model with different pairs of $(\lambda, \beta)$ in Eq. (1) for the following testing. Specifically, we vary $\lambda$ over $\{0.01, 0.1, 0.5, 1, 2, 5, 10, 100\}$ and $\beta$ over $\{10^{-2}, 10^{-3}, 10^{-4}, 10^{-5}, 10^{-6}, 10^{-7}, 10^{-8}\}$.
[Table: testing accuracy (%) for Vanilla vs. MixCon on MNIST ($\lambda = 1.0$, $\beta = 10^{-4}$), FashionMNIST ($\lambda = 1.0$, $\beta = 10^{-4}$), and SVHN ($\lambda = 0.5$, $\beta = 10^{-4}$).]
Testing setup. We record the testing accuracy and the pairwise distances of data representations under each pair of $(\lambda, \beta)$ for each dataset. Following a recent model inversion method He et al. (2019), we define $L$ in Eq. (3) as the $\ell_2$-loss function and $R$ as the regularization term capturing the total variation of a 2D signal, defined as
$$R(a) = \sum_{i,j} \big( (a_{i+1,j} - a_{i,j})^2 + (a_{i,j+1} - a_{i,j})^2 \big)^{1/2}.$$
The inversion attack is applied to the output of the 2nd convolutional layer block in LeNet5, and we find the optimum of Eq. (3) via the SGD optimizer. The detailed optimization process is listed in Appendix C.2. We use the normalized structural similarity index metric (SSIM) Wang et al. (2004) and perceptual similarity (PSIM) Johnson et al. (2016) to measure the similarity between the recovered image and the original image. The concrete definitions of SSIM and PSIM are listed in Appendix C.3.
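The total-variation regularizer R(a) used in Eq. (3) can be sketched as follows; this version sums only over interior pixels where both forward differences are defined, a common boundary convention that the text does not pin down.

```python
import numpy as np

def total_variation(a):
    """R(a) = sum_{i,j} sqrt((a[i+1,j]-a[i,j])^2 + (a[i,j+1]-a[i,j])^2),
    the 2D total-variation prior used as the regularizer R in Eq. (3)."""
    dv = a[1:, :-1] - a[:-1, :-1]   # vertical forward differences
    dh = a[:-1, 1:] - a[:-1, :-1]   # horizontal forward differences
    return np.sqrt(dv ** 2 + dh ** 2).sum()

# A constant image has zero total variation; adding an edge increases it.
flat = np.ones((8, 8))
assert total_variation(flat) == 0.0

edged = flat.copy()
edged[:, 4:] = 0.0                  # introduce a vertical edge
assert total_variation(edged) > 0.0
```

In the attack, this term biases reconstructions toward piecewise-smooth images, which is what makes recovered natural images visually plausible.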

Results

To answer Q3, we plot the complementary effects of $\lambda$ and $\beta$ in Figure 3. Note that $\beta$ bounds the minimal pairwise distance of data representations, and $\lambda$ indicates the strength of the penalty that MixCon places on data separability. Namely, a larger $\lambda$ brings a stronger MixCon penalty, which enhances the regularization of data separability and results in lower accuracy. Meanwhile, with a small $\beta$, $\lambda$ need not be very large, as a smaller $\beta$ leads to a smaller bound on data separability and thus lower accuracy. Hence, $\lambda$ and $\beta$ work together to adjust the separability of hidden data representations, which in turn affects data utility.
To answer Q4, we evaluate the quality of inversion qualitatively and quantitatively through the model inversion attack defined in the "Testing setup" paragraph. Specifically, for each private input $x$, we execute the inversion attack on $h_{mixcon}(x)$ and $h_{vanilla}(x)$ of the testing images. As shown qualitatively in Figure 4: first, the images recovered by model inversion from MixCon training (e.g., for $(\lambda, \beta) \in \{(1, 10^{-7}), (10, 10^{-2}), (100, 10^{-2})\}$) are visually different from the original inputs, while the images recovered from Vanilla training still look similar to the originals. Second, with the same $\lambda$ (Figure 4, columns c3-c5), the smaller the $\beta$, the less similar the recovered images are to the originals. Last, with the same $\beta$ (Figure 4, columns c3 and c6-c8), the larger the $\lambda$, the less similar the recovered images are to the originals. Further, we quantitatively measure inversion performance by reporting the average similarity between 100 pairs of images recovered by the inversion model and their original samples. We select $(\lambda, \beta)$ to match the accuracy of MixCon with Vanilla training (see Accuracy in Table 3) and investigate whether MixCon makes the inversion attack harder.
The inversion results (see SSIM and PSIM in Table 3) are reported as mean ± std, with the worst-case (best-recovered data) similarity in parentheses for each metric. Both qualitative and quantitative results agree with our hypothesis that 1) adding $L_{mixcon}$ in network training can reduce the mean pairwise distance (separability) of hidden data representations; and 2) smaller separability makes it more difficult to invert the original inputs. We can define a sweet spot as a setting of $(\lambda, \beta)$ that suffers negligible accuracy loss (say, within 1%) while model inversion becomes significantly harder with respect to computational complexity, or the attack is broken outright (little similarity to the original input data). Thus, by sweeping over possible $(\lambda, \beta)$, we are able to find a spot where data utility is reasonable but data recovery is harder, such as $(\lambda = 100, \beta = 10^{-2})$ for MNIST (Figure 4). Our proposed method is therefore helpful when the user is willing to give up some accuracy in exchange for a more robust model.

5. DISCUSSION AND CONCLUSION

In this paper, we have proposed and studied the trade-off between data utility and data recovery from the angle of the separability of hidden data representations in deep neural networks. We propose MixCon, a consistency loss term, as an effective way to adjust data separability. Our proposal is inspired by theoretical data separability results and a new exponential lower bound on approximately solving the network inversion problem, based on the Exponential Time Hypothesis (ETH). We conduct two sets of experiments, using synthetic and benchmark datasets, to show the effect of adjusting data separability on accuracy and data recovery. Our theoretical insights help explain our key experimental findings: MixCon can effectively adjust the separability of hidden data representations, and one can find "sweet-spot" parameters for MixCon that make it difficult to recover data while maintaining data utility. Our experiments are limited to small benchmark datasets in the domain of image classification. It will be helpful to conduct experiments on large datasets in multiple domains to further study the potential of adjusting the separability of data representations to trade off between data utility and data recovery.

Roadmap of Appendix

The Appendix is organized as follows. We discuss related work in Section A. We provide theoretical analysis in Section B. The details of the data recovery experiments are in Section C, and additional experiment details are in Section D.

A RELATED WORK

A.4 MODEL INVERSION ATTACK AND DEFENSE

The neural network inversion problem has been extensively investigated in recent years Fredrikson et al. (2015); He et al. (2019); Lei et al. (2019); Zhang et al. (2020b). As in this paper, the general approach is to cast network inversion as an optimization problem with a problem-specific objective. In particular, Fredrikson et al. (2015) propose to use prediction confidence as the optimization objective, and He et al. (2019) use regularized maximum likelihood estimation. Recent work Zhang et al. (2020b) also proposes to use a GAN for model inversion. There are very few studies of defenses against model inversion attacks. Existing data privacy protection mechanisms mainly rely on noise injection Fredrikson et al. (2015); Dwork (2008); Abadi et al. (2016) or homomorphic encryption Nandakumar et al. (2019). While able to mitigate attacks, these methods significantly hinder model performance. Recently, MID Wang et al. (2020) was proposed to limit the information about the model input contained in the prediction, thereby limiting the ability of an adversary to infer data information from the model prediction. Yang et al. (2020) propose to add a purification block after the prediction output, so that the confidence scores predicted by the target classifier are less sensitive to changes in the input data. However, both methods target the logit output layer (i.e., before the argmax), and they either require auxiliary information (i.e., knowledge of the attack model) or modify the network structure (i.e., building a variational autoencoder for mutual information calculation). In contrast, our proposed method MixCon can easily and efficiently serve as a plug-in loss on the middle layers of arbitrary network structures to defend against inversion attacks during inference.

B HARDNESS OF NEURAL NETWORK INVERSION B.1 PRELIMINARIES

We first provide the definitions for 3SAT, ETH, MAX3SAT, MAXE3SAT and then state some fundamental results related to those definitions. For more details, we refer the reader to the textbook Arora & Barak (2009) .

Definition B.1 (3SAT problem).

Given n variables and m clauses in a conjunctive normal form (CNF) formula with each clause of size at most 3, the goal is to decide whether there exists an assignment to the n Boolean variables that satisfies the CNF formula.

Definition B.3 (MAX3SAT).

Given n variables and m clauses in a conjunctive normal form (CNF) formula with each clause of size at most 3, the goal is to find an assignment that satisfies the largest number of clauses. We use MAXE3SAT to denote the version of MAX3SAT where each clause contains exactly 3 literals.

Theorem B.4 (Håstad (2001)). For every δ > 0, it is NP-hard to distinguish a satisfiable instance of MAXE3SAT from an instance where at most a 7/8 + δ fraction of the clauses can be simultaneously satisfied.

Theorem B.5 (Håstad (2001); Moshkovitz & Raz (2010)). Assume ETH holds. For every δ > 0, there is no $2^{o(n^{1-o(1)})}$-time algorithm to distinguish a satisfiable instance of MAXE3SAT from an instance where at most a 7/8 + δ fraction of the clauses can be simultaneously satisfied.

We use MAXE3SAT(B) to denote the restricted special case of MAX3SAT where every variable occurs in at most B clauses. Håstad (2000) proved that the problem is approximable within a factor 7/8 + 1/(64B) in polynomial time, and that it is hard to approximate within a factor $7/8 + 1/(\log B)^{\Omega(1)}$. In 2001, Trevisan improved the hardness result:

Theorem B.6 (Trevisan (2001)). Unless RP = NP, there is no polynomial time $(7/8 + 5/\sqrt{B})$-approximate algorithm for MAXE3SAT(B).

Theorem B.7 (Håstad (2001); Trevisan (2001); Moshkovitz & Raz (2010)). Unless ETH fails, there is no $2^{o(n^{1-o(1)})}$-time $(7/8 + 5/\sqrt{B})$-approximate algorithm for MAXE3SAT(B).

B.2 OUR RESULTS

We provide a hardness-of-approximation result for the neural network inversion problem. In particular, we prove that unless RP = NP, no polynomial-time algorithm can approximately recover the input of a two-layer neural network with ReLU activations. Formally, consider the inversion problem
$$h(x) = z, \quad x \in [-1, 1]^d, \qquad (4)$$
where $z \in \mathbb{R}^{m_2}$ is the hidden-layer representation and $h$ is a two-layer neural network with ReLU gates, specified as
$$h(x) = W^{(2)} \sigma(W^{(1)} x + b), \qquad W^{(2)} \in \mathbb{R}^{m_2 \times m_1},\; W^{(1)} \in \mathbb{R}^{m_1 \times d},\; b \in \mathbb{R}^{m_1}.$$
We want to recover the input data $x \in [-1, 1]^d$ given the hidden-layer representation $z$ and all parameters of the neural network (i.e., $W^{(1)}$, $W^{(2)}$, $b$). It is known that the decision version of the neural network inversion problem is NP-hard Lei et al. (2019); whether the approximation version is also hard was an open question. We show a stronger result: it is hard to approximate within some constant factor. Two notions of approximation can be considered here. The first we call solution approximation.

Definition B.8 (Solution approximation). Given a neural network $h$ and a hidden-layer representation $z$, we say $x' \in [-1, 1]^d$ is an $\epsilon$-approximate solution for Eq. (4) if there exists $x \in [-1, 1]^d$ such that $\|x' - x\|_2 \leq \epsilon \sqrt{d}$ and $h(x) = z$.

Roughly speaking, solution approximation means we recover an approximate solution. The $\sqrt{d}$ factor in the above definition is only for normalization and is not essential. One can also consider a weaker notion, which we call function value approximation.

Definition B.9 (Function value approximation). Given a neural network $h$ and a hidden-layer representation $z$, we say $x' \in [-1, 1]^d$ is $\epsilon$-approximate in value for Eq. (4) if $\|h(x') - z\|_2 \leq \epsilon \sqrt{m_2}$.

Again, the $\sqrt{m_2}$ factor is only for normalization. Suppose the neural network is $G$-Lipschitz continuous for a constant $G$ (which is the case in our proof); then an $\epsilon$-approximate solution implies a $G\epsilon$-approximation in value.
For the purpose of this paper, we focus on the second notion (i.e., function value approximation). Given our neural network is (constant)-Lipschitz continuous, this immediately implies hardness result for the first one. Our theorem is formally stated below. In the proof, we reduce from MAX3SAT(B) and utilize Theorem B.6 Theorem B.10. There exists a constant B > 1, unless RP = NP, it is hard to 1 60B -approximate Eq. (4) . Furthermore, the neural network is O(B)-Lipschitz continuous, and therefore, it is hard to find an Ω(1/B 2 ) approximate solution to the neural network. Using the above theorem, we can see that by taking a suitable constant B > 1, the neural network inversion problem is hard to approximate within some constant factor under both definitions. In particular, we conclude Theorem B.11 (Formal statement of Theorem 3.3). Assume N P = RP , there exists a constant > 0, such that there is no polynomial time algorithm that is able to give an -approximation to neural network inversion problem. Proof of Theorem B.10. Given an 3SAT instance φ with n variables and m clause, where each variable appears in at most B clauses, we construct a two layer neural network h φ and output representation z satisfy the following: We set d = n, m 1 = m + 200B 2 n and m 2 = m + 100B 2 n. For any j ∈ [m], we use φ j to denote the j-th clause and use h 1,j (x) to denote the output of the j-th neuron in the first layer, i.e., h 1,j (x) = σ(W • Completeness. If φ is satisfiable, then there exists x ∈ [0, 1] d such that h φ (x) = z. • Soundness. For any x such that h φ (x) -z 2 ≤ 1 60B √ m 2 , (1) j x + b i ) , where W (1) j is the j-th row of W (1) . For any i ∈ [n], we use X i to denote the i-th variable. Intuitively, we use the input vector x ∈ [-1, 1] n to denote the variable, and the first m neurons in the first layer to denote the m clauses. By taking W (1) j,i =    1, X i ∈ φ j ; -1, X i ∈ φ j ; 0, otherwise. 
and $b_j = -2$ for every $j \in [m]$, interpreting $x_i = 1$ as $X_i$ being false and $x_i = -1$ as $X_i$ being true. One can verify that $h_{1,j}(x) = 0$ if the clause $\varphi_j$ is satisfied, and $h_{1,j}(x) = 1$ if it is unsatisfied. We simply copy these values in the second layer: $h_j(x) = h_{1,j}(x)$ for $j \in [m]$. For the remaining neurons, intuitively, we make $100B^2$ copies of each $|x_i|$ ($i \in [n]$) in the output layer. This is achieved by setting $h_{1, m+(i-1)\cdot 100B^2+k}(x) = \max\{x_i, 0\}$ and $h_{1, m+100B^2 n+(i-1)\cdot 100B^2+k}(x) = \max\{-x_i, 0\}$ in the first layer, and $h_{m+(i-1)\cdot 100B^2+k}(x) = h_{1, m+(i-1)\cdot 100B^2+k}(x) + h_{1, m+100B^2 n+(i-1)\cdot 100B^2+k}(x) = |x_i|$ in the second layer, for any $i \in [n]$, $k \in [100B^2]$. Finally, we set the target output as $z = (\underbrace{0, \cdots, 0}_{m}, \underbrace{1, \cdots, 1}_{100B^2 n})$. We are left to prove the three claims made about the network $h_\varphi$ and the target output $z$. For the first claim, suppose $\varphi$ is satisfiable and $X = (X_1, \cdots, X_n)$ is a satisfying assignment. Then, as argued above, we can simply take $x_i = 1$ if $X_i$ is false and $x_i = -1$ if $X_i$ is true. One can check that $h_\varphi(x) = z$. For the second claim, suppose we are given $x \in [-1, 1]^d$ such that $\|h_\varphi(x) - z\|_2 \le \frac{1}{60B}\sqrt{m_2}$. We start from the simple case where $x$ is binary, i.e., $x \in \{-1, 1\}^n$. Again, take $X_i$ to be true if $x_i = -1$ and false if $x_i = 1$. One can check that the number of unsatisfied clauses is at most
$$\|h_\varphi(x) - z\|_2^2 \le \frac{1}{3600B^2} m_2 = \frac{1}{3600B^2}(m + 100B^2 n) \le \frac{1}{12}m + \frac{1}{3600B^2}m \le \frac{1}{8}m - \frac{1}{5\sqrt{B}}m, \quad (5)$$
where the third step follows from $n \le 3m$, and the last step follows from $B \ge 15000$. Next, we move to the general case $x \in [-1, 1]^d$. We round each $x_i$ to $-1$ or $+1$ based on its sign: define $\bar{x} \in \{-1, 1\}^n$ by $\bar{x}_i = \arg\min_{t \in \{-1, 1\}} |t - x_i|$. We prove that $\bar{x}$ induces an assignment that satisfies at least $(\frac{7}{8} + \frac{1}{5\sqrt{B}})m$ clauses.
It suffices to prove
$$\|h(\bar{x}) - z\|_2^2 - \|h(x) - z\|_2^2 \le \frac{3}{100}m, \quad (6)$$
since this implies the number of unsatisfied clauses is bounded by
$$\|h(\bar{x}) - z\|_2^2 \le \|h(x) - z\|_2^2 + \left(\|h(\bar{x}) - z\|_2^2 - \|h(x) - z\|_2^2\right) \le \left(\frac{1}{12}m + \frac{1}{3600B^2}m\right) + \frac{3}{100}m \le \frac{1}{8}m - \frac{1}{5\sqrt{B}}m,$$
where the second step follows from Eq. (5) and (6), and the last step follows from $B \ge 10^7$. We define $\Delta_i := |\bar{x}_i - x_i| = 1 - |x_i| \in [0, 1]$ and $T := m + 100B^2 n = m_2$. Then we have
$$\|h(\bar{x}) - z\|_2^2 - \|h(x) - z\|_2^2 = \sum_{j=1}^{T} \left[(h_j(\bar{x}) - z_j)^2 - (h_j(x) - z_j)^2\right]$$
$$= \sum_{j=1}^{m} \left[(h_j(\bar{x}) - z_j)^2 - (h_j(x) - z_j)^2\right] + \sum_{j=m+1}^{T} \left[(h_j(\bar{x}) - z_j)^2 - (h_j(x) - z_j)^2\right]$$
$$= \sum_{j=1}^{m} \left[h_j(\bar{x})^2 - h_j(x)^2\right] - 100B^2 \sum_{i=1}^{n} \Delta_i^2 \le 2\sum_{j=1}^{m} |h_{1,j}(\bar{x}) - h_{1,j}(x)| - 100B^2 \sum_{i=1}^{n} \Delta_i^2$$
$$\le 2\sum_{j=1}^{m} \sum_{i \in \varphi_j} \Delta_i - 100B^2 \sum_{i=1}^{n} \Delta_i^2 \le 2B\sum_{i=1}^{n} \Delta_i - 100B^2 \sum_{i=1}^{n} \Delta_i^2 \le \frac{n}{100} \le \frac{3m}{100}.$$
The third step follows since $z_j = 0$ for $j \in [m]$, while for $j \in \{m+1, \cdots, m+100B^2 n\}$ we have $z_j = 1$, $h_j(\bar{x}) - z_j = 0$, and $(h_j(x) - z_j)^2 = \Delta_i^2$ for $j \in [m + (i-1)\cdot 100B^2 + 1, m + i\cdot 100B^2]$. The fourth step follows from $h_j(x) = h_{1,j}(x) \in [0, 1]$ for $j \in [m]$ (so $a^2 - b^2 \le 2|a - b|$ for $a, b \in [0, 1]$). The fifth step follows from the 1-Lipschitz continuity of the ReLU. The sixth step follows since each variable appears in at most $B$ clauses, and the penultimate step since $2B\Delta - 100B^2\Delta^2 \le \frac{1}{100}$ for every $\Delta \in [0, 1]$. This concludes the second claim. For the last claim, by the 1-Lipschitz continuity of the ReLU, we have for any $x_1, x_2$:
$$\|h(x_1) - h(x_2)\|_2 = \|W^{(2)}\sigma(W^{(1)}x_1 + b) - W^{(2)}\sigma(W^{(1)}x_2 + b)\|_2 \le \|W^{(2)}\| \cdot \|W^{(1)}\| \cdot \|x_1 - x_2\|_2.$$
It is easy to see that $\|W^{(2)}\| \le 2$ and $\|W^{(1)}\| \le \sqrt{200B^2 + 3B} \le \sqrt{203B^2} \le 15B$, where the second step follows from $B \ge 1$. This concludes the proof. By assuming ETH and using Theorem B.7, we conclude: Corollary B.12 (Formal statement of Corollary 3.5). Unless ETH fails, there exists a constant $\epsilon > 0$ such that there is no $2^{o(n^{1-o(1)})}$-time algorithm that can give an $\epsilon$-approximation to the neural network inversion problem. The proof is similar to that of Theorem B.10, so we omit it here.
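As a sanity check on the clause gadget used in the proof above, the first-layer neuron for a clause can be simulated numerically. The snippet below hard-codes a single 3-clause, $(X_1 \vee \neg X_2 \vee X_3)$ (our own illustrative choice), with $\pm 1$ weights and bias $-2$, and verifies that the neuron outputs 0 when the clause is satisfied and 1 when it is not, under the encoding $x_i = -1 \Leftrightarrow X_i$ true.

```python
import numpy as np

relu = lambda v: np.maximum(v, 0.0)

# Clause gadget for (X1 or not-X2 or X3): the row of W^(1) has +1 for
# positive literals, -1 for negated literals, and the bias is -2.
w_row = np.array([1.0, -1.0, 1.0])
b = -2.0

def clause_neuron(x):
    # x_i = -1 encodes "X_i true", x_i = +1 encodes "X_i false"
    return relu(w_row @ x + b)

# All three literals false (X1=F, X2=T, X3=F): clause unsatisfied -> 1
unsat = clause_neuron(np.array([1.0, -1.0, 1.0]))
# X1 true: clause satisfied -> 0
sat = clause_neuron(np.array([-1.0, -1.0, 1.0]))
```

An unsatisfied clause contributes $1 + 1 + 1 - 2 = 1$ before the ReLU, while any true literal drops the pre-activation to at most $-1$, which the ReLU clips to 0, matching the completeness argument.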

C DETAILS OF DATA RECOVERY EXPERIMENTS

C.1 INVERSION MODEL DETAILS FOR SYNTHETIC DATASET

In the synthetic experiment, a malicious attacker recovers the original input data $x \in \mathbb{R}^d$ by solving the following optimization: $x^* = \arg\min_{s \in \mathbb{R}^d} \|h(s) - z\|_1$. To estimate the optimum, we run an SGD optimizer with a learning rate of 0.01 and a weight decay of $10^{-4}$ for 500 iterations. We test data recovery on all 200 testing samples; that is, we solve the above optimization problem 200 times, once per testing data point.
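The attack above can be sketched in a few lines of NumPy. The network below is a hypothetical stand-in for the synthetic encoder (random two-layer ReLU weights; dimensions, seed, and learning rate are illustrative, not the paper's exact configuration): the attacker, knowing the weights and the representation $z$, runs subgradient descent on $\|h(s) - z\|_1$ with iterates clipped to the input domain $[-1, 1]^d$.

```python
import numpy as np

# Hypothetical stand-in for the synthetic encoder: random 2-layer ReLU net.
rng = np.random.default_rng(0)
d, m1, m2 = 4, 32, 16
W1, b = rng.standard_normal((m1, d)), rng.standard_normal(m1)
W2 = rng.standard_normal((m2, m1))

def h(s):
    return W2 @ np.maximum(W1 @ s + b, 0.0)

x_true = rng.uniform(-1, 1, d)   # private input
z = h(x_true)                    # representation observed by the attacker

s = np.zeros(d)                  # attacker's estimate of x
lr = 0.01
loss0 = np.abs(h(s) - z).sum()
best_loss = loss0
for _ in range(500):
    pre = W1 @ s + b
    r = h(s) - z
    # subgradient of ||h(s) - z||_1 propagated through the ReLU mask
    grad = W1.T @ ((pre > 0) * (W2.T @ np.sign(r)))
    s = np.clip(s - lr * grad, -1.0, 1.0)
    best_loss = min(best_loss, np.abs(h(s) - z).sum())
```

The paper's actual attack uses an SGD optimizer with weight decay over a trained network; this sketch only illustrates the optimization loop the attacker runs.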

C.2 INVERSION MODEL DETAILS FOR BENCHMARK DATASET

In the benchmark experiments, a malicious attacker recovers the original input data $x \in \mathbb{R}^d$ by solving the following optimization: $x^* = \arg\min_{s \in \mathbb{R}^d} \|h(s) - z\|_2 + \zeta \sum_{i,j} ((s_{i+1,j} - s_{i,j})^2 + (s_{i,j+1} - s_{i,j})^2)^{1/2}$, where $i, j$ index the pixels of an image. To estimate the optimum, we run an SGD optimizer with a learning rate of 10 and a weight decay of $10^{-4}$ for 500 iterations. We grid-search over $\zeta$ and find that the best data recovery comes from $\zeta = 0.01$ for the SVHN dataset and $\zeta = 10^{-5}$ for MNIST and FashionMNIST.
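The second term in the objective above is an isotropic total-variation penalty. A minimal NumPy implementation of that formula is shown below (the image sizes are our own toy choices); constant images incur zero penalty, while noisy ones are penalized, which is what makes it a useful image prior for the attacker.

```python
import numpy as np

def tv_penalty(s):
    # isotropic total variation: sum over pixels of
    # sqrt((s[i+1,j]-s[i,j])^2 + (s[i,j+1]-s[i,j])^2)
    dv = s[1:, :-1] - s[:-1, :-1]   # vertical neighbor differences
    dh = s[:-1, 1:] - s[:-1, :-1]   # horizontal neighbor differences
    return np.sqrt(dv**2 + dh**2).sum()

flat = np.ones((8, 8))              # constant image -> zero penalty
noisy = flat + 0.5 * np.random.default_rng(1).standard_normal((8, 8))
tv_flat, tv_noisy = tv_penalty(flat), tv_penalty(noisy)
```

In the attack, `tv_penalty(s)` is scaled by $\zeta$ and added to the representation-matching loss before each gradient step.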

C.3 QUANTITATIVE METRICS FOR IMAGE SIMILARITY MEASUREMENT

We adopt the following two known metrics to measure the similarity between x* and x: • Normalized structural similarity index metric (SSIM), a perception-based metric that measures the similarity between images in structural information, luminance, and contrast. It is widely used in image and video compression research to quantify the difference between original and compressed images; the detailed calculation can be found in Wang et al. (2004). We normalize SSIM to the value range [0, 1] (the original SSIM takes values in [-1, 1]). • Perceptual similarity (PSIM), based on the perceptual loss.
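The normalization we apply is simply $(\mathrm{SSIM} + 1)/2$. The sketch below illustrates it with a single-window (global-statistics) SSIM, which is our own simplification of the sliding-window SSIM of Wang et al. (2004); the stability constants `c1`, `c2` are illustrative values.

```python
import numpy as np

def global_ssim(x, y, c1=1e-4, c2=9e-4):
    # single-window SSIM computed from whole-image statistics
    # (a simplification of the windowed SSIM of Wang et al., 2004)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx**2 + my**2 + c1) * (vx + vy + c2))

def normalized_ssim(x, y):
    # map SSIM from [-1, 1] onto [0, 1]
    return (global_ssim(x, y) + 1.0) / 2.0

img = np.random.default_rng(2).uniform(0, 1, (16, 16))
perfect = normalized_ssim(img, img)          # identical images -> 1
inverted = normalized_ssim(img, 1.0 - img)   # anti-correlated -> near 0
```

Lower normalized SSIM between $x^*$ and $x$ therefore indicates a less faithful inversion.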

D.1 COMPARE PENALTY STRATEGIES

A natural approach to reduce data separability is to add a penalty on the pairwise distances between data representations within a class. We name this approach UniCon. Its loss function, denoted $L_{unicon}$, can be written as
$$L_{unicon} = \frac{1}{|C|} \sum_{c \in C} \frac{1}{|C_c| \cdot (|C_c| - 1)} \sum_{i \in C_c} \sum_{j \in C_c,\, j \ne i} \|h(x_i) - h(x_j)\|_2^2,$$
and the final objective function is $L := L_{class} + \lambda \cdot L_{unicon}$. This approach is similar to contrastive learning Khosla et al. (2020). However, we observe that it is not as effective as our proposed MixCon at defending against inversion attacks. The intuition is that MixCon can induce confusing patterns that prevent the neural network from learning the typical patterns of a class. Here we show the visualization for the three benchmark datasets in
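The UniCon penalty above can be sketched directly: for each class, average the squared L2 distances over all ordered pairs of distinct within-class representations, then average over classes. The snippet below is a minimal NumPy version (representation values are our own toy data).

```python
import numpy as np

def unicon_loss(reps, labels):
    # mean pairwise squared L2 distance within each class,
    # averaged over classes (the L_unicon formula above)
    classes = np.unique(labels)
    total = 0.0
    for c in classes:
        hc = reps[labels == c]
        n = len(hc)
        if n < 2:
            continue
        # all ordered pairs (i, j); the i == j terms contribute zero
        diff = hc[:, None, :] - hc[None, :, :]
        total += (diff**2).sum() / (n * (n - 1))
    return total / len(classes)

labels = np.array([0, 0, 1, 1])
# identical within-class representations -> zero penalty
collapsed = unicon_loss(np.array([[0., 0.], [0., 0.], [3., 4.], [3., 4.]]), labels)
# spreading one class apart increases the penalty
spread = unicon_loss(np.array([[0., 0.], [2., 0.], [3., 4.], [3., 4.]]), labels)
```

Minimizing this term collapses each class's representations toward a single point, which is exactly the behavior that makes UniCon a natural baseline against MixCon.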

D.2 EFFECTS ON THE SELECTION OF MIDDLE LAYERS

The trade-off between data separability and data utility differs when MixCon is added at different layers. In our benchmark experiments, we use LeNet5 LeCun (2015), a five-layer CNN, so there are four possible split points, i.e., four intermediate outputs. In our collaborative inference setting, we try all four possible middle layers for applying the MixCon loss. We plot accuracy and data separability over different combinations of (β, λ), together with SSIM and PSIM scores (mean and worst-case results), for each layer on our three benchmark datasets. The results are shown in Figure 6 to Figure 17. There is a clear trend that the shallower h(x) is, the easier it is to recover the original x on average, with respect to the mean SSIM and PSIM scores. The worst-case measurements may suffer from outliers and the imperfection of the evaluation metrics. In most cases, the distance and recovery similarity scores for the first three layers show a positive relationship, e.g., in Figure 6 - Figure 8. Inversion from the deeper layers is usually unstable. Also, splitting a network at a deeper layer is not common or realistic in the collaborative inference setting, because clients such as edge devices do not have powerful computational resources. Notably, the relationship between accuracy and similarity is highly non-linear. The sweet spot in the trade-off between accuracy and difficulty of recovery lies where the accuracy degradation curve is slow while the recovery similarity is low. In practice, users can search for the best parameters and "cut layer" to meet given accuracy and data-recovery defense requirements.
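The "cut layer" split discussed above amounts to composing an edge-side function with a cloud-side function. The toy sketch below (a small fully-connected ReLU stack standing in for LeNet5; layer sizes, seed, and the cut index are illustrative) splits the network at an arbitrary cut and checks that the two halves compose to the full forward pass.

```python
import numpy as np

rng = np.random.default_rng(3)
relu = lambda v: np.maximum(v, 0.0)

# a toy 4-layer fully-connected stack standing in for the CNN's layers
dims = [8, 16, 16, 16, 10]
layers = [(rng.standard_normal((dims[i + 1], dims[i])),
           rng.standard_normal(dims[i + 1])) for i in range(4)]

def forward(x, layer_list):
    for W, b in layer_list:
        x = relu(W @ x + b)
    return x

cut = 2                                  # split after the 2nd layer
edge_part, cloud_part = layers[:cut], layers[cut:]

x = rng.standard_normal(dims[0])
z = forward(x, edge_part)                # representation sent to the cloud
out_split = forward(z, cloud_part)       # cloud finishes the computation
out_full = forward(x, layers)            # monolithic forward pass
```

The attacker in this setting only ever sees `z`, which is why adjusting the separability of `z` at each candidate cut layer is what governs the utility/recoverability trade-off.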



• The class RP consists of all languages L that have a polynomial-time randomized algorithm A with the following behavior: if x ∉ L, then A always rejects x (with probability 1); if x ∈ L, then A accepts x with probability at least 1/2.
• We show the comparison in Appendix D.1.
• In practice, we normalize ‖h(x)‖₂ to 1. To avoid division by zero, we can use a small positive ε (≪ 1) and threshold the distance to the range [ε, 1/ε].
• λ is the coefficient of the penalty, and β is the balancing term for data separability.
• We change the input channel to 3 for the SVHN dataset.
• In regular mini-batch training with data shuffling, when calculating the MixCon loss, we truncate the batch size to p|C| so that each mini-batch contains an equal number of samples of each class. Here p is the number of training points in the smallest class and |C| is the number of classes.
• Accuracy reduction is within a small tolerance, i.e., 1%.
• We remark that there is a polynomial-time algorithm for one-layer ReLU neural network recovery.
• We present the comparison between MixCon and vanilla training in Table 3.
• There is no need to add the MixCon loss to the layers before the "cut layer", because the attacker cannot access the hidden representations from those layers.



Figure 2: Data hidden representation h(x) ∈ R 2 from the 2nd fully-connected layer of synthetic data at different epochs (E). Two settings of MixCon are given the default λ = 0.1 but different β. Compared to Vanilla, MixCon squeezes the data representations into a smaller space over training. When β = 0, MixCon maps all data to h(x) = (0, 0), which is not learnable.

Figure 2. First, in vanilla training (Figure 2 a-b), the data are dispersively distributed and their distances grow over training. The obvious difference in MixCon training (Figure 2 c-f) is that the data representations become increasingly clustered during training. Second, we report the data utility results of Vanilla and the two "default" MixCon settings, (λ = 0.1, β = 0.01) and (λ = 0.1, β = 0), in Table

Figure 3: Trade-off between data separability and data utility. We show testing accuracy and mean pairwise distance (data separability) on three datasets with different λ and β. λ and β show complementary effects in adjusting data separability. A sweet spot can be found at the (λ, β) values resulting in low data separability and high data utility.

Figure 4: Qualitative evaluation of image inversion results. The (λ, β) settings of MixCon are denoted in the header, and the corresponding testing accuracy of each dataset is denoted at the top of each row. Compared to vanilla training, inversions from the MixCon model are less realistic and clearly distinguishable from the original images, without a significant accuracy drop.

Exponential Time Hypothesis (ETH) (Impagliazzo et al. (1998)). There is a δ > 0 such that the 3SAT problem defined in Definition B.1 cannot be solved in O(2^{δn}) time. ETH is a stronger assumption than NP ≠ P and is widely accepted in the computational complexity community. Over the past few years, there has been work proving hardness results under ETH for theoretical computer science problems Chalermsook et al. (2017); Manurangsi (2017); Chitnis et al. (2018); Bhattacharyya et al. (2018); Dinur & Manurangsi (2018); KCS & Manurangsi (2018) and machine learning problems, e.g., matrix factorization Arora et al. (2012); Razenshteyn et al. (2016); Song et al. (2017); Ban et al. (2019) and tensor decomposition Song et al. (2019). There are also variants of ETH, e.g., Gap-ETH Dinur (2016; 2017); Manurangsi & Raghavendra (2017) and random-ETH Feige (2002); Razenshteyn et al.


Figure 5: Qualitative evaluation for image inversion results.

(a) Accuracy vs. Distance (b) Quantitative evaluations on data recovery

Figure 6: Adding MixCon to the 1st layer of CNN on MNIST dataset. (a) The trade-off between data separability and data utility: testing accuracy and mean pairwise distance (data separability) with different λ and β, which show complementary effects in adjusting data separability. (b) Quantitative evaluation of data recovery results: SSIM and PSIM scores with different λ and β.

Figure 7: Adding MixCon to the 2nd layer of CNN on MNIST dataset. (a) The trade-off between data separability and data utility: testing accuracy and mean pairwise distance (data separability) with different λ and β, which show complementary effects in adjusting data separability. (b) Quantitative evaluation of data recovery results: SSIM and PSIM scores with different λ and β.

Figure 8: Adding MixCon to the 3rd layer of CNN on MNIST dataset. (a) The trade-off between data separability and data utility: testing accuracy and mean pairwise distance (data separability) with different λ and β, which show complementary effects in adjusting data separability. (b) Quantitative evaluation of data recovery results: SSIM and PSIM scores with different λ and β.

Figure 9: Adding MixCon to the 4th layer of CNN on MNIST dataset. (a) The trade-off between data separability and data utility: testing accuracy and mean pairwise distance (data separability) with different λ and β, which show complementary effects in adjusting data separability. (b) Quantitative evaluation of data recovery results: SSIM and PSIM scores with different λ and β.

Figure 10: Adding MixCon to the 1st layer of CNN on FashionMNIST dataset. (a) The trade-off between data separability and data utility: testing accuracy and mean pairwise distance (data separability) with different λ and β, which show complementary effects in adjusting data separability. (b) Quantitative evaluation of data recovery results: SSIM and PSIM scores with different λ and β.

Figure 11: Adding MixCon to the 2nd layer of CNN on FashionMNIST dataset. (a) The trade-off between data separability and data utility: testing accuracy and mean pairwise distance (data separability) with different λ and β, which show complementary effects in adjusting data separability. (b) Quantitative evaluation of data recovery results: SSIM and PSIM scores with different λ and β.

Figure 14: Adding MixCon to the 1st layer of CNN on SVHN dataset. (a) The trade-off between data separability and data utility: testing accuracy and mean pairwise distance (data separability) with different λ and β, which show complementary effects in adjusting data separability. (b) Quantitative evaluation of data recovery results: SSIM and PSIM scores with different λ and β.

Figure 16: Adding MixCon to the 3rd layer of CNN on SVHN dataset. (a) The trade-off between data separability and data utility: testing accuracy and mean pairwise distance (data separability) with different λ and β, which show complementary effects in adjusting data separability. (b) Quantitative evaluation of data recovery results: SSIM and PSIM scores with different λ and β.

Schematic diagram of our data representation encoding scheme in the deep learning pipeline. We show a simple toy example of classifying data points of triangles, squares, and circles. In the embedding space (middle block), the data representations within each class are constrained to a small ball of diameter Δ_h, while representations of different classes are separated from each other by at least distance δ_h.

Inversion results on the synthetic dataset, reported in mean (worst) format over the 200 testing samples. Higher MSE or lower MCS indicates worse inversion. The (λ, β) settings are denoted in the header.

Quantitative evaluations for image recovery results. For fair evaluation, we match the data utility (accuracy) for Vanilla and MixCon. Structural similarity index metric (SSIM) and perceptual similarity (PSIM) are measured on 100 testing samples. The scores are presented in mean ± std and worst-case (in parentheses) format. Lower scores indicate harder inversion.

The work of Lei et al. (2019) is most relevant to ours. They consider the neural network inversion problem for generative models and prove that the exact inversion problem is NP-complete. The other setting is collaborative inference. In this distributed-system setting, the neural network is divided into two parts: the first few layers of the network are stored on the local edge device, while the rest are offloaded to a remote cloud server. Given an input, the edge device computes the output of the first few layers and sends it to the cloud; the cloud then performs the rest of the computation and sends the final result back to the edge device Eshratifar et al. (2019); Hauswald et al. (2014); Kang et al. (2017); Teerapittayanon et al. (2017). In our work, we focus on tackling the data recovery problem in the collaborative inference setting.

Quantitative evaluations for image recovery results. For fair evaluation, we match the data utility (accuracy) for Vanilla and MixCon. SSIM and PSIM are measured on 100 testing samples. The scores are presented in mean ± std and worst-case (in parentheses) format. Smaller scores indicate harder data recovery.

