EFFECTIVE PASSIVE MEMBERSHIP INFERENCE ATTACKS IN FEDERATED LEARNING AGAINST OVERPARAMETERIZED MODELS

Abstract

This work considers the challenge of performing membership inference attacks in a federated learning setting (for image classification) where an adversary can only observe the communication between the central node and a single client (a passive white-box attack). Passive attacks are among the hardest to detect, since they can be performed without modifying the behavior of the central server or its clients, and they assume no access to private data instances. The key insight of our method is the empirical observation that, near parameters that generalize well on test data, the gradients of large overparameterized neural network models statistically behave like high-dimensional independent isotropic random vectors. Using this insight, we devise two attacks that are often little affected by existing and proposed defenses. Finally, we validate the hypothesis that our attack depends on overparameterization by showing that increasing the level of overparameterization (without changing the neural network architecture) positively correlates with attack effectiveness.

1. INTRODUCTION

Our work considers the challenge of performing membership-inference (MI) attacks (for image classification) in a federated learning setting, where an adversary can only observe the communication between the central node and a single client (a passive white-box attack). We also consider, in passing, other attack modalities (e.g., active white-box attacks), but our focus is on passive attacks. Passive attacks are among the hardest to detect, since they can be performed without modifying the behavior of the central server or its clients, and they assume no access to private data instances. Our results consider multiple applications, but pay special attention to medical image diagnostics, one of the most compelling and most sensitive applications of federated learning. Federated learning is designed to train machine-learning models on private local datasets distributed across multiple clients while preventing data leakage, which is key to the development of machine learning models in medical imaging diagnostics (Sheller et al., 2020) and other industrial settings where each client (e.g., hospital, company) is unwilling or unable to share data with other clients (e.g., other hospitals, companies) due to confidentiality laws or concerns, or for fear of leaking trade secrets. Federated learning differs from distributed data training in that the data may be heterogeneous (i.e., each client's data is sampled from a different training distribution) and each client's data must remain private (Yang et al., 2019). From the attacker's perspective, the hardest membership-attack setting is one where the client's data is sampled from the training distribution (the i.i.d. case), since in this scenario there is nothing special about the data distribution of any specific client that an attacker could use as leverage. Our work focuses on this i.i.d. case.
As far as we know, there are no known passive white-box membership inference attacks specifically designed to work in a scenario without private data access in federated learning (the closest works (Nasr et al., 2019; Zari et al., 2021) are actually very different because they assume access to private data). Unfortunately, we show that for large deep learning models (overparameterized models), there is a membership inference attack that leverages the very foundation of how we train these large models. Our proposed attack works, for instance, on systems secured by additive homomorphic encryption (Aono et al., 2017), where the central server knows nothing about each client's gradients. The attack only needs access to the central model parameters (decoded by clients) at each communication round. Interestingly, our attack relies on statistical properties of overparameterized models (calibrated via conformal predictors) and will not work as well on small neural network models. We detail our attack next. Attack insight: Large overparameterized models perform surprisingly well on test data even though they can overfit the training data, creating learning phenomena such as double descent (Nakkiran et al., 2021). In machine learning, this overfitting needs to be kept in check to avoid black-box membership attacks (Li et al., 2020; Yeom et al., 2018), but there are simple regularization solutions to address this challenge (e.g., (Li et al., 2020; Nasr et al., 2018)). Overparameterized models are hypothesized to perform well on test data because learning finds wide flat minima (under a robust definition of wide (e.g., (Neyshabur et al., 2018)), which is non-trivial (Dinh et al., 2017)), often attributed to gradient noise in gradient descent and to overparameterization (Baldassi et al., 2016; Keskar et al., 2016; Martin & Mahoney, 2021; Neyshabur et al., 2014; Zhang et al., 2021). Let us take a closer look at the optimization of overparameterized models with d ≫ 1 parameters.
In our setting, an overparameterized model has significantly more parameters than training instances. An extensive empirical evaluation (described in Section 4) shows that, at later gradient update rounds t ≫ 1 of the optimization (in our experiments t > 50 if trained from scratch and t ≥ 2 if fine-tuned) of medium to large neural networks, and at nearly any stage of the fine-tuning of large pre-trained models, gradient vectors of different training instances are orthogonal in the same way distinct samples of independent isotropic random vectors are orthogonal (such as two high-dimensional Gaussian random vectors with zero mean and diagonal (isotropic) covariance matrix). More precisely, the vectors in V = { ∇̂_{y_i,x_i,t} }_{i=1}^{n} become nearly orthogonal as d ≫ 1, where ∇̂_{y,x,t} = ∇_{W^(t)} L(y, x; W^(t)) / ∥∇_{W^(t)} L(y, x; W^(t))∥_2 with loss L, model parameters W^(t) ∈ R^d, and training data D_train = {(y_i, x_i)}_{i=1}^{n}. That is, for large models (d ≫ 1) and for large enough t, the normalized gradients are approximately orthogonal: ⟨∇̂_{y_i,x_i,t}, ∇̂_{y_j,x_j,t}⟩ ≈ 0 for i ≠ j (see Figure 1(a-c)), where ⟨·,·⟩ is the inner product. For small d, however, the normalized gradients can no longer be relied upon to be approximately orthogonal (as we will infer from Figure 3). These results match the property of independent d-dimensional isotropic random vectors, whose independent samples become increasingly orthogonal at a rate ⟨∇̂_{y_i,x_i,t}, ∇̂_{y_j,x_j,t}⟩ ∈ O(1/√d), i ≠ j (Vershynin, 2018, Lemma 3.2.4). Hence, given a set of (largely unknown) orthogonal vectors O and a private subset O′ ⊂ O, by the distributive property of inner products we have, for all u ∈ O, ⟨s_{O′}, u⟩ > 0 if and only if u ∈ O′, where s_{O′} = Σ_{v ∈ O′} v.
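The near-orthogonality and the subset-sum test above can be illustrated numerically. The following is a minimal NumPy sketch (not the paper's implementation): unit-norm isotropic Gaussian vectors stand in for the normalized per-instance gradients ∇̂_{y_i,x_i,t}, and the dimension d, subset size, and threshold are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 100_000   # parameter dimension (overparameterized regime, d >> n)
n = 50        # number of per-instance "gradients"

# Stand-ins for normalized gradients: independent isotropic Gaussian
# vectors scaled to unit L2 norm.
O = rng.standard_normal((n, d))
O /= np.linalg.norm(O, axis=1, keepdims=True)

# Pairwise inner products concentrate around 0 at rate O(1/sqrt(d)).
gram = O @ O.T
off_diag = gram[~np.eye(n, dtype=bool)]
print("max |<u_i, u_j>|, i != j:", np.abs(off_diag).max())

# Membership test: s = sum of a private subset O'. By distributivity,
# <s, u> is ~1 for members of O' and ~0 for the rest.
members = np.arange(25)              # indices forming the private subset O'
s = O[members].sum(axis=0)
scores = O @ s
predicted = scores > 0.5             # separates ~1 (member) from ~0 (non-member)
```

With exactly orthogonal vectors the paper's test ⟨s_{O′}, u⟩ > 0 is exact; here, with only approximate orthogonality, member scores cluster near 1 and non-member scores near 0, so a midpoint threshold recovers the subset.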
This means that, for a large model trained with a gradient update W^(t) = W^(t-1) − ηΓ^(t) performed at the central server, where Γ^(t) is the sum (or average) of all client gradients at communication round t, as long as we have access to the previous model W^(t-1) and the new model W^(t), we obtain ηΓ^(t) = W^(t-1) − W^(t), from which we can test whether an instance x belongs to the private data by asking whether ⟨ηΓ^(t), ∇̂_{y,x,t-1}⟩ > 0. Later in the paper we show a different type of attack (using subspaces and the L2 norm) that does not need inner products. To summarize our findings: the near-orthogonality of gradients of independent training examples in overparameterized models means that gradient sums cannot hide a specific gradient. This attack is difficult to defend against since it targets the foundation of how overparameterized models learn to generalize to test data. In our experiments we have tried a host of defenses with only moderate success: (a) Adding isotropic Gaussian noise ϵ (the DP-SGD defense (Abadi et al., 2016)) is innocuous since the noise is also orthogonal to the gradients (i.e., ⟨ϵ, ∇̂_{y_i,x_i,t-1}⟩ ≈ 0 for all (y_i, x_i) ∈ D_train). (b) If the noise is biased (e.g., non-isotropic noise), it will bias the gradients and give poor model generalization. (c) Quantizing all client gradients (e.g., signSGD (Bernstein et al., 2018)) makes the attack less effective but is not enough to prevent it altogether (the quantized gradients are also nearly isotropic). (d) Borrowing ideas from meta-learning (MAML (Finn et al., 2017), more precisely), we are able to reduce the effectiveness of the attack at the expense of poorer generalization error. (e) Changing how the central model performs gradient updates relies on proving that the attacker will never be able to find a vector proportional to ϵ + e(G) from the model updates, where ϵ is approximately isotropic noise, G is the gradient sum of all clients, and e(G) is a

