MEMBERSHIP ATTACKS ON CONDITIONAL GENERATIVE MODELS USING IMAGE DIFFICULTY

Abstract

Membership inference attacks (MIA) attempt to detect whether data samples were used to train a neural network model. As training data is highly valuable in machine learning, MIA can be used to detect the use of unauthorized data. Unlike traditional MIA approaches, which address classification models, we address conditional image generation models (e.g. image translation). Due to overfitting, reconstruction errors are typically lower for images used in training. A simple but effective membership attack can therefore use the reconstruction error. However, we observe that some images are "universally" easy, while others are difficult. Reconstruction error alone is less effective at discriminating between difficult images that were used in training and easy images that were never seen before. To overcome this, we propose a novel difficulty score that can be computed for each image, and whose computation does not require a training set. Our membership error, obtained by subtracting the difficulty score from the reconstruction error, is shown to achieve high MIA accuracy on an extensive set of benchmarks.

1. INTRODUCTION

Deep neural networks have been widely adopted in various computer vision tasks, e.g. image classification, semantic segmentation, and image translation and generation. The high sample complexity of such models requires large amounts of training data. However, obtaining many training images is often not easy: collection and annotation are frequently expensive and labor-intensive processes. In some domains, such as medical imaging, publicly available training data are particularly scarce due to privacy concerns. In such settings, it is common to grant access to private sensitive data for training purposes alone, while ensuring that the data are not revealed at inference time. A common solution is to train the model privately and then provide black-box access to the trained model. However, even black-box access may leak sensitive information about the training data. Membership inference attacks (MIA) are one way to detect such leakage: given access to a data sample, an attacker attempts to determine whether or not the sample was used in the training process. MIA have been widely studied for image classification models, achieving high success rates (Shokri et al., 2017; Salem et al., 2018; Sablayrolles et al., 2019; Yeom et al., 2018; Li & Zhang, 2020; Choo et al., 2020). Due to overfitting in deep neural networks, prediction confidence tends to be higher for images used in training. This difference in prediction confidence helps MIA methods successfully determine which images were used for training. Therefore, in addition to detecting information leakage, MIA also provide insight into the degree of overfitting in the victim model. We address MIA in a new domain: conditional image generation models, e.g. image translation. While classification models output a probability vector over possible classes, generation models output a color value for every pixel.
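The confidence gap described above underlies the simplest classification-model attacks in the cited literature: an image is flagged as a training member when the model's prediction confidence exceeds a threshold. A minimal sketch of this baseline follows; the function name and threshold value are illustrative, not taken from any specific paper.

```python
import numpy as np

def confidence_attack(softmax_probs, threshold=0.9):
    """Confidence-thresholding MIA baseline (illustrative sketch).

    Flags a sample as a training-set member if the model's top
    predicted class probability exceeds `threshold`, exploiting the
    observation that overfit models are more confident on members.
    """
    return bool(np.max(softmax_probs) > threshold)
```

In practice the threshold is calibrated on held-out data or via shadow models; this sketch only illustrates the decision rule itself.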
We propose a MIA that uses the pixel-wise reconstruction error, as overfitting causes lower reconstruction error on images used for training. However, we observe that some images are "universally" easy, while others are universally difficult. Reconstruction error alone is therefore less accurate at discriminating between difficult images used in training and previously unseen easy images. To overcome this limitation, we introduce a novel image difficulty score, computed independently for each query image. Our difficulty score uses the accuracy of a linear predictor, fitted on the given image alone, that predicts pixel values from deep features of that image. Combining the reconstruction error with the difficulty score helps to disentangle two factors of variation in the reconstruction error, namely (i) the "intrinsic" difficulty of the conditional generation task for each image, and (ii) the reduction in error due to the image having been seen during training.
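The computation described above can be sketched as follows. This is a minimal illustration under several assumptions not fixed by the text: mean absolute error as the reconstruction metric, a per-image least-squares fit as the linear predictor, and pixels flattened into a (num_pixels, feat_dim) feature matrix. All function names and shapes are illustrative.

```python
import numpy as np

def reconstruction_error(target, generated):
    """Pixel-wise reconstruction error (here: mean absolute error,
    an assumed choice of metric)."""
    return np.mean(np.abs(target - generated))

def difficulty_score(features, target):
    """Per-image difficulty score (sketch).

    Fits a linear predictor from deep features to pixel values on the
    query image itself -- no training set is required -- and returns
    the residual error of that fit.
    features: (num_pixels, feat_dim), target: (num_pixels, channels).
    """
    weights, *_ = np.linalg.lstsq(features, target, rcond=None)
    predicted = features @ weights
    return np.mean(np.abs(target - predicted))

def membership_error(target, generated, features):
    """Membership error: reconstruction error minus difficulty score."""
    return reconstruction_error(target, generated) - difficulty_score(features, target)
```

An image whose reconstruction error is low relative to its own difficulty score (a small or negative membership error) is then more likely to have been a training member.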

