PRIVATE SPLIT INFERENCE OF DEEP NETWORKS

Abstract

Splitting network computations between the edge device and the cloud server is a promising approach for enabling low edge-compute and private inference of neural networks. Current methods for providing privacy train the model to minimize information leakage for a given set of private attributes. In practice, however, the test queries might contain private attributes that are not foreseen during training. We propose an alternative solution in which, instead of obfuscating the information corresponding to a set of attributes, the edge device discards the information irrelevant to the main task. To this end, the edge device runs the model up to a split layer determined by its computational capacity and then removes the activation content that lies in the null space of the next layer of the model before sending it to the server. It can further remove the low-energy components of the remaining signal to improve privacy at the cost of reduced accuracy. The experimental results show that our methods provide privacy while maintaining accuracy and introducing only a small computational overhead.

1. INTRODUCTION

The surge in cloud computing and machine learning in recent years has led to the emergence of Machine Learning as a Service (MLaaS), where the compute capacity of the cloud is used to analyze data that lives on edge devices. One shortcoming of the MLaaS framework is the leakage of clients' private data to the cloud server. To address this problem, several cryptography-based solutions have been proposed that provide provable security at the cost of increasing the communication cost and delay of remote inference by orders of magnitude (Juvekar et al., 2018; Riazi et al., 2019). Cryptography-based solutions are applicable in use cases such as healthcare, where a few minutes of delay is tolerable, but not in scenarios where millions of clients request fast and low-cost responses, such as Amazon Alexa or Apple Siri applications. A lightweight alternative to cryptographic solutions is to manually hide private information on the edge device; for instance, sensitive information in an image can be blurred on the edge device before sending it to the service provider (Vishwamitra et al., 2017). This approach, however, is task-specific and may not be viable for generic applications. The objective of the split inference framework, shown in Figure 1, is to provide a generic and computationally efficient data obfuscation scheme (Kang et al., 2017; Chi et al., 2018). The service provider trains the model and splits it into two sub-models, M1 and M2, where M1 contains the first few layers of the model and M2 contains the rest. The client runs M1 on the edge device and sends the resulting feature vector z = M1(x) to the server, which computes the public label as y_pub = M2(z). To preserve privacy, the client desires z to contain only information related to the underlying task. For instance, when sending facial features for cell-phone authentication, the client does not want to disclose other information such as their mood.
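As a concrete illustration, the split computation z = M1(x), y_pub = M2(z) can be sketched with a toy two-layer model. All shapes, weights, and the ReLU nonlinearity below are hypothetical stand-ins, not the architecture used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((16, 8))  # edge-side layer: 16-dim input -> 8-dim feature
W2 = rng.standard_normal((8, 4))   # server-side layer: 8-dim feature -> 4 class logits

def M1(x):
    # Runs on the edge device: compute the intermediate feature vector z.
    return np.maximum(W1.T @ x, 0.0)  # linear layer + ReLU

def M2(z):
    # Runs on the cloud server: compute logits for the public task.
    return W2.T @ z

x = rng.standard_normal(16)  # private input stays on the device
z = M1(x)                    # only the feature vector z is transmitted
y_pub = int(M2(z).argmax())  # server's public-label prediction
```

Only z crosses the network boundary; the privacy question is how much information about x (beyond the public task) survives in z.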
As seen in Figure 1, privacy leakage is quantified by an adversary that trains a model M3 to extract the private label y_pri from the feature vector z. Current methods for private split inference aim to censor the information corresponding to a list of known private attributes. For example, Feutry et al. (2018) utilize adversarial training to minimize the accuracy of M3 on the private attribute, and Osia et al. (2018) minimize the mutual information between the query z and the private label y_pri at training time. The set of private attributes, however, can vary from one query to another. Hence, it is not feasible to foresee all types of attributes that could be considered private for a specific MLaaS application. Moreover, the need to annotate inputs with all possible private attributes significantly increases the cost of model training.
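The alternative proposed in the abstract, removing the component of z that lies in the null space of the next layer, can be sketched as a projection: any part of z annihilated by the next layer's weight matrix W carries no task-relevant information, so it can be discarded before transmission. The linear layer and the weight shapes below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((4, 8))  # next (server-side) layer, computing W @ z
z = rng.standard_normal(8)       # feature vector produced on the edge device

# Project z onto the row space of W; the null-space component of z
# does not affect W @ z, so removing it leaks less while changing nothing.
P = np.linalg.pinv(W) @ W        # projection onto the row space of W
z_filtered = P @ z

# The server-side computation is unaffected by the filtering.
assert np.allclose(W @ z, W @ z_filtered)

# Optionally, keep only the top-k right singular directions of W
# (the higher-energy components), trading accuracy for more privacy.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
k = 2
z_lowrank = Vt[:k].T @ (Vt[:k] @ z)
```

The first projection is lossless with respect to the server's output, while the low-rank variant discards low-energy directions and therefore perturbs W @ z, matching the accuracy-privacy trade-off described in the abstract.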

