MARICH: A QUERY-EFFICIENT MAX-INFORMATION MODEL EXTRACTION ATTACK USING PUBLIC DATA

Anonymous authors
Paper under double-blind review

Abstract

In this paper, we study black-box model stealing attacks, where the attacker can query a machine learning model only through publicly available APIs. Specifically, our aim is to design a black-box model stealing attack that uses a minimal number of queries to create an informative replica of the target model. First, we reduce this problem to an online variational optimisation problem. The attacker solves this problem to select the most informative queries, which maximise the entropy of the selected queries while simultaneously reducing the mismatch between the target and the stolen models. We propose an online and adaptive algorithm, Marich, that leverages active learning to select the queries. We demonstrate the efficiency of our attack on different text and image datasets and different models, including BERT and ResNet18. Marich is able to steal a model that achieves 69-96% of the target model's accuracy using 1,070-6,950 samples from attack datasets that are completely different from the training datasets. Our stolen models also achieve 85-95% membership inference accuracy and show 77-94% agreement with membership inference performed directly on the target models. Our experiments validate that Marich is query-efficient and capable of creating an informative replica of the target model.

1. INTRODUCTION

In recent years, Machine Learning as a Service (MLaaS) platforms have been widely deployed and used in industry. In MLaaS (Ribeiro et al., 2015), an ML model is trained remotely on a private dataset, deployed in a cloud, and offered for public access through a prediction API, such as Amazon AWS, Google API, or Microsoft Azure. This API allows a user, including a potential adversary, to send queries to the ML model and fetch the corresponding predictions. Recent works have shown that models with public APIs can be stolen or extracted by designing black-box model extraction attacks (Tramèr et al., 2016). In a model extraction attack, an adversary queries the target model with a query dataset, which may be the same as or different from the private dataset, collects the corresponding predictions from the target model, and builds a replica of the target model. The goal is to construct a model that is nearly equivalent to the target model over the input space of interest (Jagielski et al., 2020). Often, ML models are proprietary, guarded by IP rights, and expensive to build. These models may be trained on datasets that are expensive to obtain (Yang et al., 2019) and consist of private data of individuals (Lowd & Meek, 2005). Moreover, extracted models can be used to perform other privacy attacks on the private training dataset, such as membership inference (Nasr et al., 2019). Thus, understanding the susceptibility of models accessible through MLaaS is an important problem. This motivates us to investigate black-box model extraction attacks in which the adversary has no access to the private data or a perturbed version of it (Papernot et al., 2017). Instead, the adversary uses a public dataset to query the target model (Orekondy et al., 2019; Pal et al., 2020). Black-box model extraction attacks pose a trade-off between the number of queries sent to the target ML model and the accuracy of the extracted model (Pal et al., 2020).
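The query-then-replicate loop described above can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: the names `target_predict` (a stand-in for the MLaaS prediction API), `public_pool`, and `clone` are hypothetical, and the entropy-based ranking is only a simple proxy for the informative-query selection discussed later.

```python
import numpy as np

def predictive_entropy(probs):
    """Shannon entropy of each row of a matrix of class probabilities."""
    eps = 1e-12  # guard against log(0)
    return -np.sum(probs * np.log(probs + eps), axis=1)

def extract_model(target_predict, public_pool, clone, n_rounds, batch_size):
    """Iteratively query the target API on the public samples the clone is
    most uncertain about, then retrain the clone on the collected
    (query, prediction) pairs. All names here are illustrative."""
    queries, labels = [], []
    for _ in range(n_rounds):
        # Rank the remaining public pool by the clone's predictive entropy.
        probs = clone.predict_proba(public_pool)
        idx = np.argsort(-predictive_entropy(probs))[:batch_size]
        batch = public_pool[idx]
        # Spend the query budget only on the selected batch.
        labels.extend(target_predict(batch))
        queries.extend(batch)
        public_pool = np.delete(public_pool, idx, axis=0)
        # Refit the replica on everything collected so far.
        clone.fit(np.array(queries), np.array(labels))
    return clone
```

Under this sketch, the query budget is `n_rounds * batch_size`, which makes the tension between query count and replica quality explicit: a larger budget lets the clone see more of the target's decision surface, but every extra query adds monetary cost and detection risk.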
With more queries and predictions, the adversary can build a better replica. But querying an API too often can be expensive, as each query incurs a monetary cost in MLaaS. Moreover, researchers have developed algorithms that can detect adversarial queries when they are not well-crafted or are sent to the API in large numbers (Juuti et al., 2019; Pal et al., 2021). Thus, designing a query-efficient attack is paramount for practical deployment. It also indicates how much information can be leaked from a target model with fewer interactions.

