MARICH: A QUERY-EFFICIENT MAX-INFORMATION MODEL EXTRACTION ATTACK USING PUBLIC DATA
Anonymous authors
Paper under double-blind review

Abstract

In this paper, we study black-box model stealing attacks in which the attacker can query a machine learning model only through a publicly available API. Specifically, our aim is to design a black-box model stealing attack that uses a minimal number of queries to create an informative replica of the target model. First, we reduce this problem to an online variational optimisation problem. The attacker solves this problem to select the most informative queries, which simultaneously maximise the entropy of the selected queries and reduce the mismatch between the target and the stolen models. We propose an online and adaptive algorithm, MARICH, that leverages active learning to select the queries. We demonstrate the efficiency of our attack on different text and image datasets and different models, including BERT and ResNet18. MARICH is able to steal a model that achieves 69-96% of the true model's accuracy using 1,070-6,950 samples from attack datasets that are completely different from the training datasets. The stolen models also achieve 85-95% membership inference accuracy and show 77-94% agreement with membership inference performed directly on the target models. Our experiments validate that MARICH is query-efficient and capable of creating an informative replica of the target model.

1. INTRODUCTION

In recent years, Machine Learning as a Service (MLaaS) has been widely deployed and used in industry. In MLaaS (Ribeiro et al., 2015), an ML model is trained remotely on a private dataset, deployed in a cloud, and offered for public access through a prediction API, such as Amazon AWS, Google API, or Microsoft Azure. This API allows a user, including a potential adversary, to send queries to the ML model and fetch the corresponding predictions. Recent works have shown that such models with public APIs can be stolen or extracted by designing black-box model extraction attacks (Tramèr et al., 2016). In model extraction attacks, an adversary queries the target model with a query dataset, which may be the same as or different from the private dataset, collects the corresponding predictions from the target model, and builds a replica of the target model. The goal is to construct a model which is nearly equivalent to the target model over the input space of interest (Jagielski et al., 2020). Often, ML models are proprietary, guarded by IP rights, and expensive to build. These models might be trained on datasets which are expensive to obtain (Yang et al., 2019) and consist of private data of individuals (Lowd & Meek, 2005). Also, extracted models can be used to perform other privacy attacks on the private dataset used for training, such as membership inference (Nasr et al., 2019). Thus, understanding the susceptibility of models accessible through MLaaS is an important problem. This motivates us to investigate black-box model extraction attacks in which the adversary has no access to the private data or even a perturbed version of it (Papernot et al., 2017). Instead, the adversary uses a public dataset to query the target model (Orekondy et al., 2019; Pal et al., 2020). Black-box model extraction attacks pose a tension between the number of queries sent to the target ML model and the accuracy of the extracted model (Pal et al., 2020).
With more queries and predictions, the adversary can build a better replica. But querying an API too often can be expensive, as each query incurs a monetary cost in MLaaS. Also, researchers have developed algorithms that can detect adversarial queries when they are not well-crafted or are sent to the API in large numbers (Juuti et al., 2019; Pal et al., 2021). Thus, designing a query-efficient attack is paramount for practical deployment. It also indicates how much information can be leaked from a target model with fewer interactions.

Our contributions. Our investigation yields three-fold contributions.

1. Formalism: Distribution Equivalence and Max-Information Extraction. Often, ML models, specifically classifiers, are stochastic algorithms. They also include different elements of randomness during training. Thus, rather than focusing on equivalence of the extracted and target models in terms of a fixed dataset or accuracy on that dataset (Jagielski et al., 2020), we propose a distributional notion of equivalence. We propose that if the joint distributions induced by a query-generating distribution and the corresponding prediction distributions of the target and extracted models are the same, the models are called distributionally equivalent (Sec. 4). Another proposal is to reinforce the objective of the attack, i.e. to extract as much information as possible from the target model. This allows us to formulate the Max-Information attack, where the adversary aims to maximise the mutual information between the extracted and target models' distributions. We further show that both of these attacks can be performed by optimising a variational objective (Staines & Barber, 2012).

2. Algorithm: Adaptive Query Selection for Extraction with MARICH. We propose an algorithm, MARICH (Sec. 5), that optimises the objective of the variational optimisation problem (Equation 6).
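Stated loosely in our own notation (a hedged reading of the description above; this excerpt does not reproduce the paper's exact Equation 6), the two notions of the first contribution can be sketched as:

```latex
% Hedged notation: D_Q is a query-generating distribution, f_T and f_E
% the target and extracted models, p_f(y | x) the prediction
% distribution of a model f.
\[
  \underbrace{D_Q(x)\, p_{f_T}(y \mid x)}_{\text{target joint}}
  \;=\;
  \underbrace{D_Q(x)\, p_{f_E}(y \mid x)}_{\text{extracted joint}}
  \quad \Longleftrightarrow \quad
  f_E \text{ distributionally equivalent to } f_T \text{ w.r.t. } D_Q
\]
\[
  \max_{D_Q} \; I\big(f_T(X)\,;\, f_E(X)\big), \qquad X \sim D_Q
  \qquad \text{(max-information attack)}
\]
```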
Given an extracted model, a target model, and previous queries, MARICH adaptively selects a batch of queries enforcing this objective. It then sends the queries to the target model, collects the predictions, and uses them to further train the extracted model (Algorithm 1). To select the most informative set of queries, it deploys three sampling strategies sequentially. These strategies select: a) the most informative set of queries, b) the most diverse set of queries within the first selection, and c) the set of queries in the first selection on which the target and extracted models mismatch the most. Together, these strategies allow MARICH to select a small subset of queries that maximises the information leakage and aligns the extracted model with the target (Figure 1).

3. Experimental Analysis. We perform extensive experimental evaluation with both image and text datasets, and diverse model classes, such as logistic regression, ResNet18, and BERT (Sec. 6). Our experimental results validate that MARICH extracts accurate and informative replicas of the target models in comparison to random sampling. While MARICH uses only a small number of queries (0.83-6.15%) selected from publicly available query datasets, the models it extracts achieve accuracy comparable to that of the target model when subjected to a membership inference attack. This shows that MARICH can extract alarmingly informative models query-efficiently.
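The three-stage selection above can be sketched as follows. This is a minimal illustrative implementation, not MARICH itself: all function names are ours, the diversity stage uses greedy farthest-point sampling and the mismatch stage uses total-variation distance, both of which are assumptions standing in for whatever the paper's Algorithm 1 actually uses.

```python
import numpy as np

def entropy(p):
    """Shannon entropy of each row of a probability matrix (nats)."""
    p = np.clip(p, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def farthest_point_subset(X, k, rng):
    """Greedy farthest-point sampling: a simple diversity heuristic."""
    idx = [int(rng.integers(len(X)))]
    d = np.linalg.norm(X - X[idx[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(d))
        idx.append(nxt)
        d = np.minimum(d, np.linalg.norm(X - X[nxt], axis=1))
    return np.array(idx)

def select_batch(X_pool, extracted_probs, target_probs,
                 n_entropy=200, n_diverse=50, n_final=20, seed=0):
    """Three-stage sketch: entropy -> diversity -> model mismatch."""
    rng = np.random.default_rng(seed)
    # (a) most informative: highest predictive entropy under the
    #     extracted model.
    top = np.argsort(-entropy(extracted_probs))[:n_entropy]
    # (b) most diverse subset of the entropy-based selection.
    div = top[farthest_point_subset(X_pool[top], n_diverse, rng)]
    # (c) queries where target and extracted predictions mismatch the
    #     most (total-variation distance per query).
    mism = 0.5 * np.abs(target_probs[div] - extracted_probs[div]).sum(axis=1)
    return div[np.argsort(-mism)[:n_final]]
```

In a real attack, `target_probs` would only be available for points already queried, so stage (c) would have to be driven by previously collected predictions; this sketch glosses over that bookkeeping.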

2. RELATED WORKS

Here, we elaborate on the questions in the model extraction literature that we aim to address.



Taxonomy of Model Extraction. Black-box model extraction (or model stealing, or model inference) attacks aim to replicate a target ML model, commonly a classifier, deployed in a remote service and accessible through a public API (Tramèr et al., 2016). The replication is done in such a way that the extracted model achieves one of three goals: a) accuracy close to that of the target model on the private training data used to train the target model, b) maximal agreement in predic-

Figure 1: A schematic for the black-box model extraction attack with MARICH.

In this paper, we investigate effective definitions of efficiency of model extraction and the corresponding algorithm design for a query-efficient black-box model extraction attack with public data, which is oblivious to the deployed model and applicable to any datatype.

