EXPLORING CONNECTIONS BETWEEN MEMORIZATION AND MEMBERSHIP INFERENCE

Abstract

Membership inference (MI) allows adversaries to query trained machine learning models to infer whether a particular data sample was used in training. Prior works have shown that the efficacy of MI is not the same for every sample in the training dataset; they broadly attribute this behavior to various data properties, such as distributional difference. However, systematically analyzing the reasons for such disparate behavior has received little attention. In this work, we investigate the cause of this discrepancy and observe that the reason is more subtle and fundamental. We first provide empirical evidence that an MI adversary is very successful on those samples that are highly likely to be memorized, irrespective of whether the sample comes from the same or a different distribution. Next, we provide a game-based formulation that lower-bounds the advantage of an adversary with the ability to determine whether a sample is memorized, under certain assumptions about the efficacy of the model on the memorized samples. Finally, based on our theoretical results, we present a practical instantiation of a highly effective MI attack on memorized samples.

1. INTRODUCTION

Advances in machine learning (ML) are enabling a wide variety of tasks that were previously deemed too complex for computerized systems. These tasks are powered by models trained on large volumes of data collected from a variety of sources, much of which is sensitive or private. For example, data used to customize (or fine-tune) large language models can often be sensitive (Carlini et al., 2021; Zanella-Béguelin et al., 2020). Hence, understanding and explaining the privacy risks to the data used to train these models is an important problem that needs to be solved before their widespread adoption. Several prior works (Shokri et al., 2017; Yeom et al., 2018) have established that such models are susceptible to privacy attacks, such as membership inference (MI), which aim to infer whether specific data-points were used during training. Even more concerning, they have shown that the efficacy of MI is not the same for every sample in the training dataset. Unfortunately, the problem of explaining this discrepancy has received little attention. Only recently have researchers proposed techniques to measure the per-sample susceptibility to attack (Carlini et al., 2022a; Ye et al., 2022), coarsely attributing the disparate risk to distributional difference (Kulynych et al., 2019). Consequently, out-of-distribution (OOD) samples that are part of the training dataset were deemed to be at higher risk than other samples. In this work, we first systematically analyse the correctness of the above reasoning using representative techniques from the OOD detection (Hendrycks & Gimpel, 2016; Liu et al., 2020) and MI literature (Carlini et al., 2022a). Our empirical observations reveal that the relationship between OOD samples and higher MI risk is not straightforward (§3); the reasoning is more subtle and fundamental.
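To make the attack setting concrete: the simplest MI attack in this literature, due to Yeom et al. (2018), predicts that a sample is a training-set member when the model's loss on it falls below a threshold (typically the average training loss), since models tend to fit members more closely than non-members. The sketch below illustrates this idea under our own assumptions; the function names and synthetic probabilities are illustrative, not taken from the paper.

```python
import numpy as np

def per_sample_nll(probs, labels):
    """Per-sample negative log-likelihood (cross-entropy loss).

    probs:  (n, k) array of softmax outputs from the target model.
    labels: (n,) array of true class indices.
    """
    return -np.log(probs[np.arange(len(labels)), labels] + 1e-12)

def loss_threshold_mi(probs, labels, tau):
    """Predict 'member' when the loss is below the threshold tau.

    tau would typically be set to the model's mean training loss;
    here it is a free parameter of the sketch.
    """
    return per_sample_nll(probs, labels) < tau

# Illustrative synthetic outputs: a confidently-fit sample (low loss,
# likely a member) and a poorly-fit one (high loss, likely a non-member).
probs = np.array([[0.95, 0.05],
                  [0.40, 0.60]])
labels = np.array([0, 0])
print(loss_threshold_mi(probs, labels, tau=0.5))  # [ True False]
```

Per-sample attacks such as LiRA (Carlini et al., 2022a) refine this idea by calibrating the loss of each sample against shadow models trained with and without it, which is what exposes the disparate per-sample risk discussed above.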
We bridge this gap in understanding and provide reasons for the varying MI risk among training data-points. We demonstrate that MI and OOD samples are connected via the susceptibility of these samples to memorization (Feldman & Zhang, 2020; Brown et al., 2021). That is, we show that an adversary is highly successful in predicting the membership of those samples that are likely to be memorized, irrespective of the distribution from which the data-point is sampled. Moreover, as shown in previous work by Feldman (2020), and demonstrated in our evaluation, it is

