LEARNING TO INVERT: SIMPLE ADAPTIVE ATTACKS FOR GRADIENT INVERSION IN FEDERATED LEARNING

Abstract

Gradient inversion attacks enable the recovery of training samples from model updates in federated learning (FL) and constitute a serious threat to data privacy. To mitigate this vulnerability, prior work has proposed both principled defenses based on differential privacy and heuristic defenses based on gradient compression. These defenses have so far been very effective, in particular those based on gradient compression, which allow the model to maintain high accuracy while greatly reducing the attack's effectiveness. In this work, we argue that such findings do not accurately reflect the privacy risk in FL, and show that existing defenses can be broken by a simple adaptive attack that uses auxiliary data to train a model to invert gradients on both vision and language tasks.

1. INTRODUCTION

Federated learning (FL; McMahan et al., 2017) is a popular framework for distributed model training on sensitive user data. Instead of storing the training data centrally, FL operates in a server-client setting where the server hosts the model and has no direct access to the data. The clients apply the model to their private data and send gradient updates back to the server. This learning regime promises data privacy, as users only share gradients but never any raw data. However, recent work (Zhu et al., 2019; Zhao et al., 2020; Geiping et al., 2020) showed that despite these efforts, the server can still recover the training data from gradient updates, violating the promise of data privacy in FL. These so-called gradient inversion attacks operate by optimizing over the input space to find training samples whose gradient matches the observed gradient, and such attacks remain effective even when clients utilize secure aggregation (Bonawitz et al., 2016) to avoid revealing individual updates (Yin et al., 2021; Jeon et al., 2021).

As countermeasures against these gradient inversion attacks, prior work proposed both principled defenses based on differential privacy (Abadi et al., 2016), as well as heuristics that compress the gradient update through gradient pruning (Aji & Heafield, 2017) or sign compression (Bernstein et al., 2018). In particular, gradient compression defenses have so far enjoyed great success, severely hindering the effectiveness of existing optimization-based attacks (Zhu et al., 2019; Jeon et al., 2021) while maintaining close to the same level of accuracy for the trained model. As a result, these limitations seemingly diminish the threat of gradient inversion in practical FL applications. In this paper, we argue that evaluating defenses against existing optimization-based attacks may provide a false sense of security.
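Concretely, an optimization-based inversion attack of the kind discussed above can be sketched in a few lines. The toy below (in the spirit of Zhu et al., 2019) matches gradients of a linear model with squared loss; the label is assumed known, and the model, dimensions, and the gradient-clipping heuristic are our own illustrative assumptions, not any paper's exact method.

```python
import numpy as np

# Toy optimization-based gradient inversion: find a dummy input x whose
# gradient matches the observed gradient g_obs. Illustrative sketch only.
rng = np.random.default_rng(3)
d = 6
w = rng.normal(size=d) / np.sqrt(d)          # global model parameters (known to server)
x_true, y = rng.normal(size=d), 1.0          # client's private training sample
g_obs = (w @ x_true - y) * x_true            # gradient observed by the server

def model_grad(x):
    """Gradient of 0.5 * (w.x - y)^2 w.r.t. w, as a function of the input x."""
    return (w @ x - y) * x

x, lr, losses = 0.01 * rng.normal(size=d), 5e-3, []
for _ in range(10000):
    diff = model_grad(x) - g_obs             # gradient-matching residual
    losses.append(float(diff @ diff))
    # Analytic gradient of ||model_grad(x) - g_obs||^2 w.r.t. x:
    # Jacobian of model_grad is (w.x - y) I + x w^T, so dL/dx = 2 J^T diff.
    dx = 2.0 * ((w @ x - y) * diff + (x @ diff) * w)
    x -= lr * np.clip(dx, -1.0, 1.0)         # clipped step for stability (our choice)
```

The matching loss is non-convex even in this toy setting, which is one reason such attacks degrade sharply once the observed gradient is pruned, quantized, or perturbed.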
To this end, we propose a simple learning-based attack, which we call Learning To Invert (LTI), that trains a model to invert the gradient update and recover client samples; see Figure 1 for an illustration. We assume that the adversary (i.e., the server) has access to an auxiliary dataset whose distribution is similar to that of the private data, and uses it to generate training samples for the gradient inversion model by querying the global model for gradients. Our attack is highly adaptable to different defenses, since applying a defense simply amounts to training data augmentation for the gradient inversion model. We empirically demonstrate that LTI can successfully circumvent defenses based on gradient perturbation (i.e., using differential privacy; Abadi et al., 2016), gradient pruning (Aji & Heafield, 2017), and sign compression (Bernstein et al., 2018) on both vision and language tasks.
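To make the recipe concrete, here is a minimal, self-contained sketch of LTI on a toy logistic-regression "global model": auxiliary samples are used to query gradients, a DP-style noise defense is applied as data augmentation, and a small MLP inverter is trained to reconstruct the input. All names, dimensions, and the fixed label are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

# Minimal sketch of Learning To Invert (LTI). Illustrative assumptions only.
rng = np.random.default_rng(0)
d = 6                                        # dimensionality of a client sample
w, c = 0.1 * rng.normal(size=d), 0.0         # frozen global model: logistic regression

def gradient_of(x, y):
    """Flattened gradient of the logistic loss at (x, y) w.r.t. (w, c)."""
    p = 1.0 / (1.0 + np.exp(-(w @ x + c)))
    return np.concatenate([(p - y) * x, [p - y]])

def perturb(g, sigma=0.05):
    """Gradient-perturbation (DP-style) defense; for LTI it is simply
    training-data augmentation for the inversion model."""
    return g + sigma * rng.normal(size=g.shape)

# One-hidden-layer MLP inverter g_theta, trained with manual-backprop SGD to
# map (defended) gradients back to inputs. Labels are fixed to y = 1 here.
H = 32
W1, b1 = 0.1 * rng.normal(size=(H, d + 1)), np.zeros(H)
W2, b2 = 0.1 * rng.normal(size=(d, H)), np.zeros(d)
lr, losses = 0.05, []
for _ in range(8000):
    x = rng.normal(size=d)                   # auxiliary sample (similar distribution)
    g = perturb(gradient_of(x, 1.0))         # query the global model, apply defense
    h = np.tanh(W1 @ g + b1)
    xhat = W2 @ h + b2                       # reconstructed sample
    err = xhat - x
    losses.append(float(err @ err) / (2 * d))  # loss = ||xhat - x||^2 / (2d)
    dh = (W2.T @ err) * (1.0 - h ** 2)       # backprop through tanh
    W2 -= lr * np.outer(err, h) / d; b2 -= lr * err / d
    W1 -= lr * np.outer(dh, g) / d;  b1 -= lr * dh / d
```

Note that the defense appears only inside the data-generation step; swapping in pruning or sign compression changes one line, which is what makes the attack adaptive.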

• Vision: We evaluate on the CIFAR10 (Krizhevsky et al., 2009) classification dataset. LTI attains recovery accuracy close to that of the best optimization-based method when no defense is applied, and significantly outperforms all prior attacks under defense.
• NLP: We experiment with causal language model training on the WikiText (Merity et al., 2016) dataset, where LTI attains state-of-the-art performance in all settings, with or without defense.

Given the strong empirical performance of LTI and its adaptability to different learning tasks and defense mechanisms, we advocate for its use as a simple baseline for future studies on gradient inversion attacks in FL.

2. BACKGROUND

In FL, instead of centrally collecting $D_\text{train}$ to draw a random batch during training, the training set $D_\text{train}$ is distributed across multiple clients and the model $f_w$ is stored on a central server. At each iteration, the model parameter $w$ is transmitted to each client, and the per-sample gradients $\{\nabla_w \ell(f_w(x_i), y_i)\}_{i=1}^B$ are computed locally over a set of clients. The server and clients then execute a federated aggregation protocol to compute the average gradient for the gradient descent update. A major advantage of FL is data privacy: clients never disclose their data explicitly, but only send their gradients $\nabla_w \ell(f_w(x_i), y_i)$ to the server. Techniques such as secure aggregation (Bonawitz et al., 2016) and differential privacy (Dwork et al., 2006; 2014) can further reduce the privacy leakage from sending this gradient update.
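The exchange described above can be sketched in a few lines; the linear model, client data, and function names below are illustrative assumptions, not a specific FL framework's API.

```python
import numpy as np

# Toy sketch of the FL protocol: clients compute gradients on private data;
# the server only ever sees (averaged) gradients, never the data itself.
rng = np.random.default_rng(1)
d = 5
w = np.zeros(d)                                    # global model held by the server
clients = [(rng.normal(size=d), rng.normal())      # each client's private (x_i, y_i)
           for _ in range(4)]

def local_gradient(w, x, y):
    """Client-side gradient of the squared loss 0.5 * (w.x - y)^2."""
    return (w @ x - y) * x

def fl_round(w, lr=0.1):
    grads = [local_gradient(w, x, y) for x, y in clients]  # computed locally
    return w - lr * np.mean(grads, axis=0)                 # aggregate, then step

for _ in range(200):                               # FL training rounds
    w = fl_round(w)
```

In a real deployment the averaging step would run under a secure-aggregation protocol, so the server observes only the mean gradient rather than each client's update.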



Figure 1: Illustration of federated learning (FL) and gradient inversion methods. The goal of gradient inversion is to recover training data $(x, y)$ from the observed gradient $\nabla_w \ell(f_w(x), y)$. Optimization-based methods (e.g., Zhu et al., 2019; Geiping et al., 2020; Yin et al., 2021; Jeon et al., 2021) directly optimize $(\tilde{x}, \tilde{y})$ in search of a sample that produces a gradient similar to that of $(x, y)$. Our proposed learning-based approach, which we call Learning to Invert, instead trains an inversion model $g_\theta$ to reconstruct training samples from their gradient.

The objective of federated learning (McMahan et al., 2017) is to train a machine learning model in a distributed fashion without centralized collection of training data. In detail, let $f_w$ be the global model parameterized by $w$, and consider a supervised learning setting that optimizes $w$ by minimizing a loss function $\ell$ over the training set $D_\text{train}$:

$$\min_w \; \frac{1}{|D_\text{train}|} \sum_{(x,y) \in D_\text{train}} \ell(f_w(x), y).$$

In centralized learning, this is typically done by computing a stochastic gradient $\frac{1}{B} \sum_{i=1}^B \nabla_w \ell(f_w(x_i), y_i)$ over a randomly drawn batch of data $(x_1, y_1), \ldots, (x_B, y_B)$ and minimizing $\ell$ using gradient descent.
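For reference, the centralized training loop that FL replaces can be sketched as mini-batch SGD on the empirical risk; the synthetic data and linear least-squares model below are purely illustrative.

```python
import numpy as np

# Centralized baseline: mini-batch SGD on the empirical risk (illustrative).
rng = np.random.default_rng(2)
d, n, B = 4, 64, 8
X = rng.normal(size=(n, d))
true_w = rng.normal(size=d)
Y = X @ true_w                               # synthetic noiseless D_train
w = np.zeros(d)
for _ in range(500):
    idx = rng.integers(0, n, size=B)         # random batch of size B
    residual = X[idx] @ w - Y[idx]
    grad = X[idx].T @ residual / B           # (1/B) sum of per-sample gradients
    w -= 0.1 * grad                          # gradient descent step
```

FL computes the same averaged gradient, but spread across clients that each hold a shard of the data.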

