MIXCON: ADJUSTING THE SEPARABILITY OF DATA REPRESENTATIONS FOR HARDER DATA RECOVERY

Anonymous

Abstract

To address the issue that deep neural networks (DNNs) are vulnerable to model inversion attacks, we design an objective function that adjusts the separability of the hidden data representations, as a way to control the trade-off between data utility and vulnerability to inversion attacks. Our method is motivated by theoretical insights into data separability in neural network training and by results on the hardness of model inversion. Empirically, by adjusting the separability of data representations, we show that there exist sweet spots of data separability at which it is difficult to recover data during inference while data utility is maintained.

1. INTRODUCTION

Over the past decade, deep neural networks have shown superior performance in various domains, such as visual recognition, natural language processing, robotics, and healthcare. However, recent studies have demonstrated that machine learning models are vulnerable to leaking private data He et al. (2019); Zhu et al. (2019); Zhang et al. (2020b). Hence, preventing private data from being recovered by malicious attackers has become an important direction in deep learning research. Distributed machine learning Shokri & Shmatikov (2015); Kairouz et al. (2019) has emerged as an attractive setting to mitigate privacy leakage without requiring clients to share raw data. In an edge-cloud distributed learning scenario, most layers are commonly offloaded to the cloud, while the edge device computes only a small number of convolutional layers for feature extraction, due to power and resource constraints Kang et al. (2017). For example, a service provider trains a neural network, splits it at a "cut layer," and deploys the remaining layers to clients Vepakomma et al. (2018). Clients encode their data using the layers before the cut, then send the resulting data representations to the cloud server, which runs the remaining layers for inference Teerapittayanon et al. (2017); Ko et al. (2018); Vepakomma et al. (2018). This gives an untrusted cloud provider or a malicious participant a chance to steal sensitive inference data from the output of the cut layer on the edge device, i.e., to invert data from its outputs Fredrikson et al. (2015); Zhang et al. (2020b); He et al. (2019).

In the above distributed learning setup, we investigate how to design a hard-to-invert data representation function (or hidden data representation function), defined as the output of an intermediate layer of the neural network. We focus on defending against data recovery during inference.
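To make the threat concrete, the following is a minimal sketch of a white-box inversion attack in this split setting (illustrative, not any specific paper's implementation): the edge-side extractor is modeled as a single linear layer h(x) = Wx, and the attacker, who sees only the representation z = h(x) and the weights W, recovers x by gradient descent on ||h(x_hat) - z||^2.

```python
import numpy as np

# Hedged sketch of a white-box model inversion attack. The edge-side
# feature extractor is modeled as one linear layer h(x) = W x; the
# attacker observes the representation z = h(x) and the weights W.
rng = np.random.default_rng(0)
d, m = 8, 16                     # toy input / representation dimensions
W = rng.normal(size=(m, d))      # stand-in for the trained "cut layer"

x = rng.normal(size=d)           # private input held by the client
z = W @ x                        # representation sent to the cloud

x_hat = np.zeros(d)              # attacker's estimate, from scratch
lr = 0.005
for _ in range(4000):
    residual = W @ x_hat - z             # h(x_hat) - z
    x_hat -= lr * 2.0 * W.T @ residual   # gradient of the squared loss

# With m >= d and W full rank, the objective is strongly convex, so the
# recovery error should be tiny -- illustrating why an information-
# preserving representation is easy to invert.
recovery_error = float(np.linalg.norm(x_hat - x))
assert recovery_error < 1e-4
```

A real extractor is nonlinear and the attacker optimizes the same objective with autodiff, but the failure mode is identical: if the representation preserves enough information, the input can be reconstructed.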
The goal is to hide sensitive information and to protect data representations from being used to reconstruct the original data, while ensuring that the resulting data representations remain informative enough for decision making. We use the model inversion attack that reconstructs individual data He et al. (2019); Zhang et al. (2020b) to evaluate defense performance, and model accuracy to evaluate data utility. The core question is how to achieve this goal, especially protecting individual data from being recovered. We propose data separability, defined as the minimum (relative) distance between (the representations of) two data points, as a new criterion for investigating and understanding the trade-off between data utility and hardness of data recovery. Recent theoretical studies show that if data points are separable in the hidden embedding space of a DNN model, the model can more easily achieve good classification accuracy Allen-Zhu et al. (2019b). However, larger separability also makes inputs easier to recover. Conversely, if the embeddings are non-separable, or sometimes overlap with one another, it is challenging to recover inputs; but then the model may not be able to learn to achieve good performance. Two main questions arise. First, is there an effective way to adjust the separability of data representations? Second, are there "sweet spots" that make the data representations difficult for inversion attacks while achieving good accuracy? This paper aims to answer these two questions by learning a feature extractor that can adjust the separability of data representations embedded by a few neural network layers. Specifically, we propose to add a novel self-supervised-learning-based regularization term to the standard loss function during training.
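The separability criterion above can be made concrete. Below is a minimal sketch, where "separability" is taken as the minimum pairwise Euclidean distance between hidden representations; the function name and scaling are illustrative, not the paper's formal definition.

```python
import numpy as np

# Hedged sketch: separability as the minimum pairwise distance between
# the hidden representations of distinct data points.
def separability(reps: np.ndarray) -> float:
    """Minimum Euclidean distance between any two rows of `reps`."""
    n = reps.shape[0]
    return min(np.linalg.norm(reps[i] - reps[j])
               for i in range(n) for j in range(i + 1, n))

# Widely spread embeddings are easy to classify but also easy to invert;
# shrinking them toward each other reduces separability.
spread = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
packed = 0.01 * spread
assert separability(spread) == 3.0
assert separability(packed) < separability(spread)
```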
We conduct experiments on both synthetic and benchmark datasets to demonstrate that, with suitable parameters, such a learned neural network indeed makes input data difficult to recover while maintaining data utility. Our contributions can be summarized as follows:
• To the best of our knowledge, this is the first work to investigate the trade-off between data utility and data recoverability from the angle of data representation separability;
• We propose a simple yet effective loss term, the consistency loss (MixCon), for adjusting data separability;
• We provide theory-guided insights for our method, including a new exponential lower bound on approximately solving the network inversion problem, based on the Exponential Time Hypothesis (ETH); and
• We report experimental results comparing accuracy and data inversion results with and without MixCon, showing that MixCon with suitable parameters makes data recovery difficult while preserving high data utility.

The rest of the paper is organized as follows. We formalize our problem in Section 2. In Section 3, we present our theoretical insights and introduce the consistency loss. We demonstrate the experimental results in Section 4. We defer technical proofs and experimental details to the Appendix.

2. PRELIMINARY

Problem formulation. Formally, let h : R^d → R^m denote the local feature extractor, which maps an input x ∈ R^d to its feature representation h(x) ∈ R^m. The local feature extractor is a shallow neural network in our setting. The deep neural network on the server side is denoted g : R^m → R^C; it performs the classification task, mapping a feature representation to one of C target classes. The overall neural network f : R^d → R^C can be written as f = g • h.

Distributed learning framework. We consider a distributed learning framework in which users and servers collaboratively perform inference Teerapittayanon et al. (2017); Ko et al. (2018); Kang et al. (2017). We make the following assumptions: 1) Datasets are stored on the user side; during inference, no raw data are ever shared between users and servers. 2) Users and servers use a split model Vepakomma et al. (2018): users encode their data with our proposed mechanism to extract data representations at the cut layer of a trained DNN, and servers take the encoded data representations as inputs and compute outputs using the layers after the cut layer. 3) The DNN used in the distributed learning setting can be regularized by our loss function (defined later).

Threat model. We consider an attacker with access to the shared hidden data representations during the client-cloud communication process. The attacker aims to recover user data (e.g., pixel-wise recovery of images in vision tasks). To quantify the upper bound of privacy leakage under this threat model, we grant the attacker extra power in our evaluation: in addition to the extracted features, the attacker can see all network parameters of the trained model.

Our overall objectives are:
• Learn a feature representation mechanism (i.e., the function h) that safeguards information from unsolicited disclosure.
• Jointly learn the classification function g and the feature extraction function h so that the extracted information remains useful for high-performance downstream tasks.

3. CONSISTENCY LOSS FOR ADJUSTING DATA SEPARABILITY

To address the issue of data recovery from hidden-layer outputs, we propose a novel consistency loss for neural network training, as shown in Figure 1. The consistency loss is applied to the feature extractor h.
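The general shape of such a regularizer can be sketched as follows. The penalty below (mean squared distance of each representation to the batch mean) and the weight `alpha` are illustrative assumptions for exposition; the actual MixCon loss is defined in the paper.

```python
import numpy as np

# Hedged sketch of a consistency-style regularizer: a term added to the
# task loss that pulls hidden representations toward each other, which
# shrinks their separability. Not the paper's exact MixCon formulation.
def consistency_penalty(reps: np.ndarray) -> float:
    """Mean squared distance of each representation to the batch mean."""
    center = reps.mean(axis=0)
    return float(np.mean(np.sum((reps - center) ** 2, axis=1)))

def total_loss(task_loss: float, reps: np.ndarray, alpha: float) -> float:
    # alpha = 0 recovers standard training; larger alpha compresses the
    # representations, making inversion harder at some cost in utility.
    return task_loss + alpha * consistency_penalty(reps)

reps = np.array([[1.0, 0.0], [0.0, 1.0]])
assert consistency_penalty(reps) == 0.5
```

Tuning `alpha` is what the paper means by searching for a "sweet spot": large enough to hamper inversion, small enough to preserve classification accuracy.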

