MIXCON: ADJUSTING THE SEPARABILITY OF DATA REPRESENTATIONS FOR HARDER DATA RECOVERY

Anonymous

Abstract

To address the issue that deep neural networks (DNNs) are vulnerable to model inversion attacks, we design an objective function that adjusts the separability of the hidden data representations, as a way to control the trade-off between data utility and vulnerability to inversion attacks. Our method is motivated by theoretical insights on data separability in neural network training and by results on the hardness of model inversion. Empirically, by adjusting the separability of data representations, we show that there exist sweet spots for data separability such that it is difficult to recover data during inference while maintaining data utility.

1. INTRODUCTION

Over the past decade, deep neural networks have shown superior performance in various domains, such as visual recognition, natural language processing, robotics, and healthcare. However, recent studies have demonstrated that machine learning models are vulnerable to leaking private data He et al. (2019); Zhu et al. (2019); Zhang et al. (2020b). Hence, preventing private data from being recovered by malicious attackers has become an important research direction in deep learning. Distributed machine learning Shokri & Shmatikov (2015); Kairouz et al. (2019) has emerged as an attractive setting for mitigating privacy leakage without requiring clients to share raw data. In an edge-cloud distributed learning scenario, most layers are commonly offloaded to the cloud, while the edge device computes only a small number of convolutional layers for feature extraction, due to power and resource constraints Kang et al. (2017). For example, a service provider trains a neural network, splits it at a "cut layer," and deploys the layers before the cut to clients Vepakomma et al. (2018). Clients encode their data using those layers, then send the resulting data representations to the cloud server, which applies the remaining layers for inference Teerapittayanon et al. (2017); Ko et al. (2018); Vepakomma et al. (2018). This gives an untrusted cloud provider or a malicious participant a chance to steal sensitive inference data from the output of the "cut layer" on the edge-device side, i.e., to invert the data from its representations Fredrikson et al. (2015); Zhang et al. (2020b); He et al. (2019).

In the above distributed learning setup, we investigate how to design a hard-to-invert data representation function (or hidden data representation function), defined as the output of an intermediate layer of the neural network. We focus on defending against data recovery during inference. The goal is to hide sensitive information and to protect data representations from being used to reconstruct the original data, while ensuring that the resulting representations remain informative enough for decision making. We use the model inversion attack, which reconstructs individual data points He et al. (2019); Zhang et al. (2020b), to evaluate defense performance, and model accuracy to evaluate data utility. The core question is how to achieve this goal, especially protecting individual data from being recovered.

We propose data separability, i.e., the minimum (relative) distance between (the representations of) two data points, as a new criterion for investigating and understanding the trade-off between data utility and the hardness of data recovery. Recent theoretical studies show that if data points are separable in the hidden embedding space of a DNN, the model can achieve good classification accuracy Allen-Zhu et al. (2019b). However, larger separability also makes the inputs easier to recover. Conversely, if the embeddings are non-separable, or sometimes overlap with one another, recovering the inputs is challenging, but the model may not be able to learn well enough to achieve good performance. Two main questions arise. First, is there an effective way to adjust the separability of data representations? Second, are there "sweet spots" that make the data representations difficult for inversion attacks while achieving good accuracy?
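To make the notion of data separability concrete, the following is a minimal sketch of measuring the minimum pairwise relative distance over a batch of hidden representations. The normalization by the larger of the two embedding norms is an assumption for illustration; the paper's exact definition of relative distance may differ.

```python
import numpy as np

def min_relative_separability(embeddings: np.ndarray) -> float:
    """Minimum pairwise relative distance between hidden representations.

    `embeddings` has shape (n, d): one d-dimensional representation per
    data point. The relative distance ||h_i - h_j|| / max(||h_i||, ||h_j||)
    is one plausible normalization, assumed here for illustration.
    """
    n = embeddings.shape[0]
    best = float("inf")
    for i in range(n):
        for j in range(i + 1, n):
            dist = np.linalg.norm(embeddings[i] - embeddings[j])
            scale = max(np.linalg.norm(embeddings[i]),
                        np.linalg.norm(embeddings[j]))
            best = min(best, dist / scale)
    return best

# Well-separated representations score higher than overlapping ones.
separated = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
overlapping = np.array([[1.0, 0.0], [1.0, 0.05], [0.98, 0.0]])
print(min_relative_separability(separated))
print(min_relative_separability(overlapping))
```

A defense in this spirit would add a term to the training objective that pushes this quantity toward a target value, rather than maximizing it: large enough for accurate classification, small enough that inversion is hard.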

