CANARY IN A COALMINE: BETTER MEMBERSHIP INFERENCE WITH ENSEMBLED ADVERSARIAL QUERIES

Abstract

As industrial applications are increasingly automated by machine learning models, enforcing personal data ownership and intellectual property rights requires tracing training data back to their rightful owners. Membership inference algorithms approach this problem by using statistical techniques to discern whether a target sample was included in a model's training set. However, existing methods only utilize the unaltered target sample or simple augmentations of the target to compute statistics. Such a sparse sampling of the model's behavior carries little information, leading to poor inference capabilities. In this work, we use adversarial tools to directly optimize for queries that are discriminative and diverse. Our improvements achieve significantly more accurate membership inference than existing methods, especially in offline scenarios and in the low false-positive regime which is critical in legal settings. Code is available at https://github.com/YuxinWenRick/canary-in-a-coalmine 

1. INTRODUCTION

In an increasingly data-driven world, legislators have begun developing a slew of regulations with the intention of protecting data ownership. The right-to-be-forgotten written into the strict GDPR law passed by the European Union has important implications for the operation of ML-as-a-service (MLaaS) providers (Wilka et al., 2017; Truong et al., 2021). As one example, Veale et al. (2018) discuss that machine learning models could legally (in terms of the GDPR) fall into the category of "personal data", which equips all parties represented in the data with rights to restrict processing and to object to their inclusion. However, such rights are vacuous if enforcement agencies are unable to detect when they are violated.

Membership inference algorithms are designed to determine whether a given data point was present in the training data of a model. Though membership inference is often presented as a breach of privacy in situations where belonging to a dataset is itself sensitive information (e.g. a model trained on a group of people with a rare disease), such methods can also be used as a legal tool against a non-compliant or malicious MLaaS provider.

Because membership inference is a difficult task, the typical setting for existing work is generous to the attacker and assumes full white-box access to model weights. In the aforementioned legal scenario, this is not a realistic assumption. Organizations have an understandable interest in keeping their proprietary model weights secret and, short of a legal search warrant, often provide only black-box querying to their clients (OpenAI, 2020). Moreover, even if a regulatory agency forcibly obtained white-box access via an audit, for example, a malicious provider could adversarially spoof the reported weights to cover up any violations.

In this paper, we achieve state-of-the-art performance for membership inference in the black-box setting by using a new adversarial approach.
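To make the black-box setting concrete, the sketch below illustrates the simplest form of membership inference: thresholding the model's loss on the target sample, queried only through a prediction API. This is a minimal baseline for illustration (in the spirit of prior loss-based attacks), not the method proposed in this paper; `query_fn` and the threshold-calibration comment are assumptions about a generic black-box interface.

```python
import numpy as np

def membership_scores(query_fn, samples, labels):
    """Score candidate samples by the model's loss on their true labels.

    `query_fn` stands in for a black-box API that returns softmax
    probabilities of shape (n_samples, n_classes) for a batch of inputs;
    the attacker never sees model weights, only these outputs.
    """
    probs = query_fn(samples)  # the only access the attacker has
    true_prob = probs[np.arange(len(labels)), labels]
    # Cross-entropy loss on the true label; members tend to have
    # lower loss because the model was fit to them during training.
    return -np.log(true_prob + 1e-12)

def infer_membership(scores, threshold):
    """Predict 'member' when the loss falls below a threshold.

    In practice the threshold would be calibrated, e.g. on shadow
    models, to hit a target false-positive rate.
    """
    return scores < threshold
```

A simple usage example: if the API returns 0.9 confidence on one sample's true label and 0.2 on another's, the first (loss about 0.11) is flagged as a member at threshold 1.0 while the second (loss about 1.61) is not.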
We observe that previous work (Shokri et al., 2017;  

