INFORMATION LAUNDERING FOR MODEL PRIVACY

Abstract

In this work, we propose information laundering, a novel framework for enhancing model privacy. Unlike data privacy, which concerns the protection of raw data information, model privacy aims to protect an already-learned model that is to be deployed for public use. The private model can be obtained from general learning methods, and its deployment means that it will return a deterministic or random response for a given input query. An information-laundered model consists of probabilistic components that deliberately maneuver the intended input and output for queries of the model, so that adversarial acquisition of the model is less likely. Under the proposed framework, we develop an information-theoretic principle to quantify the fundamental tradeoffs between model utility and privacy leakage, and derive the optimal design.

1. INTRODUCTION

A growing number of applications involve the following user scenario. Alice has developed a model that takes a specific query as input and calculates a response as output. The model is a stochastic black box that may represent a novel type of ensemble model, a known deep neural network architecture with sophisticated parameter tuning, or a physical law described by stochastic differential equations. Bob is a user who sends a query to Alice and obtains the corresponding response for his specific purposes, whether benign or adversarial. Examples of the above scenario include many recent Machine-Learning-as-a-Service (MLaaS) platforms (Alabbadi, 2011; Ribeiro et al., 2015; Xian et al., 2020) and artificial intelligence chips, where Alice represents a learning service provider and Bob represents users. If Bob obtains sufficiently many paired input-output examples generated from Alice's black-box model, it is conceivable that he could treat them as supervised data and reconstruct Alice's model to some extent. From Alice's view, her model may be treated as valuable and private. Since Bob, who queries the model, may be benign or adversarial, Alice may intend to offer limited utility in return for enhanced privacy.

The above concern naturally motivates the following problem. (Q1) How can we enhance the privacy of an already-learned model? Note that this problem is not about data privacy, where the typical goal is to prevent adversarial inference of data information during data transmission or model training. In contrast, model privacy concerns an already-established model. We propose to study a general approach that jointly maneuvers the original query's input and output so that Bob finds it challenging to guess Alice's core model. As illustrated in Figure 1a, Alice's model is treated as a transition kernel (or communication channel) that produces Ỹ conditional on any given X.
Compared with the honest service Alice would have provided (Figure 1b), the input X̃ is a maneuvered version of Bob's original input X; moreover, Alice may choose to return a perturbed outcome Y instead of Ỹ to Bob. Consequently, the apparent kernel from Bob's input query X to the output response Y is a cascade of three kernels, denoted by K in Figure 1a.

Published as a conference paper at ICLR 2021

Figure 1: Illustration of (a) Alice's effective system for public use, and (b) Alice's idealistic system not for public use. In the figure, K* denotes the already-learned model/API, K1 denotes the kernel that perturbs the input data query by potential adversaries, and K2 denotes the kernel that perturbs the output response from K* to publish the final response Y.

The above perspective provides a natural and general framework for studying model privacy. Admittedly, if Alice produces a (nearly) random response, adversaries will find it difficult to steal the model, but benign users will find it useless. Consequently, we raise another problem. (Q2) How to formulate the model privacy-utility tradeoff, and what is the optimal way of imposing privacy? To address this question, we formulate a model privacy framework from an information-theoretic perspective, named information laundering. We briefly describe the idea below. The general goal is to jointly design the input and output kernels (K1 and K2 in Figure 1a) that deliberately maneuver the intended input and output for queries of the model so that 1) the effective kernel (K in Figure 1a) for Bob is not too far from the original kernel (K* in Figure 1a), and 2) adversarial acquisition of the model becomes difficult. In other words, Alice 'launders' the input-output information maximally given a fixed utility loss.
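To make the cascade concrete, the sketch below models K1, K*, and K2 as row-stochastic transition matrices over finite query and response alphabets, so the effective kernel K that Bob observes is simply their matrix product. The alphabet sizes and the random kernels are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_kernel(n_in, n_out):
    """Row-stochastic matrix: row i holds P(output = j | input = i)."""
    m = rng.random((n_in, n_out))
    return m / m.sum(axis=1, keepdims=True)

# K1 perturbs the query, K_star is the private model, K2 perturbs the response.
K1 = random_kernel(4, 4)       # P(x_tilde | x)
K_star = random_kernel(4, 3)   # P(y_tilde | x_tilde)
K2 = random_kernel(3, 3)       # P(y | y_tilde)

# The effective kernel Bob faces is the cascade K = K1 K* K2, obtained by
# marginalizing over the latent x_tilde and y_tilde (chain rule).
K = K1 @ K_star @ K2
assert np.allclose(K.sum(axis=1), 1.0)  # still a valid transition kernel
```

Because the cascade is itself a valid kernel, Bob cannot distinguish, from input-output behavior alone, which part of K is the private model K* and which part is deliberate perturbation.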
To find the optimal way of information laundering, we propose an objective function that involves two components: the first being the information shared between X and X̃ and between Ỹ and Y, and the second being the average Kullback-Leibler (KL) divergence between the conditional distributions describing K and K*. Intuitively, the first component controls the difficulty of guessing K* sandwiched between the two artificial kernels K1 and K2, while the second component ensures that overall utility is maximized under the same privacy constraints. By optimizing the objective for varying weights between the components, we can quantify the fundamental tradeoffs between model utility and privacy.
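The two components above can be evaluated numerically for discrete kernels. The sketch below computes the mutual-information terms I(X; X̃) and I(Ỹ; Y) and the average KL divergence between the effective kernel K and K*; the uniform input distribution, the random kernels, and the single weight `beta` combining the terms are illustrative assumptions, not the paper's exact weighting scheme.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_kernel(n_in, n_out):
    """Row-stochastic matrix: row i holds P(output = j | input = i)."""
    m = rng.random((n_in, n_out))
    return m / m.sum(axis=1, keepdims=True)

def mutual_information(p_in, kernel):
    """I(input; output) in nats for a discrete input law and kernel."""
    joint = p_in[:, None] * kernel            # P(in, out)
    p_out = joint.sum(axis=0)
    mask = joint > 0
    indep = p_in[:, None] * p_out[None, :]
    return float((joint[mask] * np.log(joint[mask] / indep[mask])).sum())

def average_kl(p_x, K_eff, K_star):
    """E_X D_KL( K_eff(.|X) || K_star(.|X) ): the utility-loss term."""
    mask = K_eff > 0
    terms = np.zeros_like(K_eff)
    terms[mask] = K_eff[mask] * np.log(K_eff[mask] / K_star[mask])
    return float((p_x[:, None] * terms).sum())

# Illustrative setup: 4 query values, 3 response values, uniform queries.
p_x = np.full(4, 0.25)
K_star = random_kernel(4, 3)       # private model
K1 = random_kernel(4, 4)           # input-perturbation kernel
K2 = random_kernel(3, 3)           # output-perturbation kernel
K_eff = K1 @ K_star @ K2           # effective kernel seen by Bob

p_x_tilde = p_x @ K1               # law of the maneuvered query
p_y_tilde = p_x_tilde @ K_star     # law of the raw response

beta = 0.5  # hypothetical weight trading privacy against utility
objective = beta * (mutual_information(p_x, K1)
                    + mutual_information(p_y_tilde, K2)) \
            + average_kl(p_x, K_eff, K_star)
```

Sweeping `beta` and minimizing this objective over K1 and K2 traces out a privacy-utility frontier: small mutual information means heavy laundering, while small average KL means the effective kernel stays close to K*.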

1.1. RELATED WORK

We introduce some closely related literature below. Section 3.3 will incorporate more technical discussions of related but different frameworks, including the information bottleneck, local data privacy, information privacy, and adversarial model attacks. A closely related subject of study is data privacy, which has received extensive attention in recent years due to societal concerns (Voigt & Von dem Bussche, 2017; Evans et al., 2015; Cross & Cavallaro, 2020; Google, 2019; Facebook, 2020). Data privacy concerns the protection of (usually personal) data information from different perspectives, including lossless cryptography (Yao, 1982; Chaum et al., 1988), randomized data collection (Evfimievski et al., 2003; Kasiviswanathan et al., 2011; Ding & Ding, 2020), statistical database query (Dwork & Nissim, 2004; Dwork, 2011), membership inference (Shokri et al., 2017), and federated learning (Shokri & Shmatikov, 2015; Konečný et al., 2016; McMahan et al., 2017; Yang et al., 2019; Diao et al., 2020). A common goal in data privacy is to obfuscate individual-level data values while still enabling population-wide learning. In contrast, model privacy focuses on protecting a single learned model that is ready to deploy. For example, we may want to privatize a classifier to be deployed on the cloud for public use, whether the model was previously trained from raw image data or through a data-private procedure. Another closely related subject is model extraction (Tramèr et al., 2016; Papernot et al., 2016b), where Bob's goal is to reconstruct Alice's model from several queries' inputs and outputs, knowing what specific model class Alice uses. For example, suppose that Alice's model is a generalized linear regression with p features. In that case, it is likely to be reconstructed using p queries of the expected mean (a known function of Xβ) by solving equations (Tramèr et al., 2016).
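The linear-model extraction described above can be sketched in a few lines: with noiseless expected-mean responses and p linearly independent queries, Bob recovers the coefficients by solving a linear system. The specific coefficients and probe queries below are hypothetical.

```python
import numpy as np

# Hypothetical private model: linear regression with p = 3 features,
# returning the exact expected mean with no output perturbation.
beta_true = np.array([2.0, -1.0, 0.5])

def alice_api(x):
    return x @ beta_true

# Bob sends p linearly independent probe queries and records responses.
queries = np.eye(3)
responses = np.array([alice_api(q) for q in queries])

# Solving the resulting linear system recovers the coefficients exactly.
beta_hat = np.linalg.solve(queries, responses)
```

This is precisely the attack that output perturbation (the kernel K2) frustrates: once the returned responses are randomized, p queries no longer pin down the coefficients, and Bob must average over many more queries to estimate them.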
In the supervised classification scenario, when only labels are returned to any given input, model extraction could be

