LEARNING TO INVERT: SIMPLE ADAPTIVE ATTACKS FOR GRADIENT INVERSION IN FEDERATED LEARNING

Abstract

Gradient inversion attacks enable recovery of training samples from model updates in federated learning (FL) and constitute a serious threat to data privacy. To mitigate this vulnerability, prior work proposed both principled defenses based on differential privacy and heuristic defenses based on gradient compression. These defenses have so far been very effective, in particular those based on gradient compression, which allow the model to maintain high accuracy while greatly reducing the attack's effectiveness. In this work, we argue that such findings do not accurately reflect the privacy risk in FL, and show that existing defenses can be broken by a simple adaptive attack that trains a model on auxiliary data to learn how to invert gradients on both vision and language tasks.

1. INTRODUCTION

Federated learning (FL; McMahan et al., 2017) is a popular framework for distributed model training on sensitive user data. Instead of centrally storing the training data, FL operates in a server-client setting where the server hosts the model and has no direct access to the data. The clients apply the model to their private data and send gradient updates back to the server. This learning regime promises data privacy, as users only share gradients and never any raw data. However, recent work (Zhu et al., 2019; Zhao et al., 2020; Geiping et al., 2020) showed that the server can still recover the training data from gradient updates, violating the promise of data privacy in FL. These so-called gradient inversion attacks optimize over the input space to find training samples whose gradient matches the observed gradient, and such attacks remain effective even when clients use secure aggregation (Bonawitz et al., 2016) to avoid revealing individual updates (Yin et al., 2021; Jeon et al., 2021). As countermeasures, prior work proposed both principled defenses based on differential privacy (Abadi et al., 2016) and heuristics that compress the gradient update through gradient pruning (Aji & Heafield, 2017) or sign compression (Bernstein et al., 2018). Gradient compression defenses in particular have so far enjoyed great success, severely hindering existing optimization-based attacks (Zhu et al., 2019; Jeon et al., 2021) while keeping the accuracy of the trained model nearly unchanged. As a result, these findings seemingly diminish the threat of gradient inversion in practical FL applications. In this paper, we argue that evaluating defenses only against existing optimization-based attacks may provide a false sense of security.
To this end, we propose a simple learning-based attack called Learning To Invert (LTI), which trains a model to invert the gradient update and recover client samples; see Figure 1 for an illustration. We assume that the adversary (i.e., the server) has access to an auxiliary dataset whose distribution is similar to that of the private data, and uses it to generate training samples for the gradient inversion model by querying the global model for gradients. Our attack readily adapts to different defenses, since applying a defense simply amounts to data augmentation when training the gradient inversion model. We empirically demonstrate that LTI can successfully circumvent defenses based on gradient perturbation (i.e., differential privacy; Abadi et al., 2016), gradient pruning (Aji & Heafield, 2017), and sign compression (Bernstein et al., 2018) on both vision and language tasks:

• Vision: We evaluate on the CIFAR10 (Krizhevsky et al., 2009) classification dataset. LTI attains recovery accuracy close to that of the best optimization-based method when no defense is applied, and significantly outperforms all prior attacks under defense.
• NLP: We experiment with causal language model training on the WikiText (Merity et al., 2016) dataset, where LTI attains state-of-the-art performance in all settings, with or without defense.

Figure 1: Illustration of federated learning (FL) and gradient inversion methods. The goal of gradient inversion is to recover training data $(x, y)$ from the observed gradient $\nabla_w \ell(f_w(x), y)$. Optimization-based methods (e.g., Zhu et al., 2019; Geiping et al., 2020; Yin et al., 2021; Jeon et al., 2021) directly optimize $(\tilde{x}, \tilde{y})$ in search of a sample that produces a gradient similar to that of $(x, y)$. Our proposed learning-based approach, Learning To Invert, instead trains an inversion model $g_\theta$ to reconstruct training samples from their gradients.
Given the strong empirical performance of LTI and its adaptability to different learning tasks and defense mechanisms, we advocate for its use as a simple baseline for future studies on gradient inversion attacks in FL.

$$\min_w \sum_{(x,y)\in D_{\text{train}}} \ell(f_w(x), y).$$

In centralized learning, this is typically done by computing a stochastic gradient $\frac{1}{B}\sum_{i=1}^{B} \nabla_w \ell(f_w(x_i), y_i)$ over a randomly drawn batch of data $(x_1, y_1), \dots, (x_B, y_B)$ and minimizing the objective by gradient descent.

2. BACKGROUND

In FL, instead of centrally collecting $D_{\text{train}}$ to draw a random batch during training, the training set $D_{\text{train}}$ is distributed across multiple clients and the model $f_w$ is stored on a central server. At each iteration, the model parameter $w$ is transmitted to the clients, which compute the per-sample gradients $\{\nabla_w \ell(f_w(x_i), y_i)\}_{i=1}^{B}$ locally. The server and clients then execute a federated aggregation protocol to compute the average gradient for the gradient descent update. A major advantage of FL is data privacy, since clients do not need to disclose their data explicitly and only send their gradient $\nabla_w \ell(f_w(x_i), y_i)$ to the server. Techniques such as secure aggregation (Bonawitz et al., 2016) and differential privacy (Dwork et al., 2006; 2014) can further reduce the privacy leakage from sending this gradient update.
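To make the protocol concrete, here is a minimal pure-Python sketch of one FL iteration. It assumes a toy linear model $f_w(x) = \langle w, x \rangle$ with squared loss (our illustrative choice, not a model from the paper) and one sample per client; the function names are hypothetical. Each client computes its gradient locally and the server averages the gradients for the update. Without secure aggregation, the server also sees every individual gradient, which is exactly what gradient inversion attacks exploit.

```python
from typing import List, Tuple

Vector = List[float]


def linear_model(w: Vector, x: Vector) -> float:
    """Toy f_w(x) = <w, x> (illustrative assumption)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def per_sample_gradient(w: Vector, x: Vector, y: float) -> Vector:
    """Computed on the client: grad_w of the squared loss (f_w(x) - y)^2."""
    residual = linear_model(w, x) - y
    return [2.0 * residual * xi for xi in x]


def federated_round(w: Vector, client_data: List[Tuple[Vector, float]],
                    lr: float) -> Tuple[Vector, List[Vector]]:
    # Each client computes the gradient of its own sample locally.
    client_grads = [per_sample_gradient(w, x, y) for x, y in client_data]
    # The server averages the B received gradients and takes one gradient-descent step.
    num_clients = len(client_grads)
    avg = [sum(g[j] for g in client_grads) / num_clients for j in range(len(w))]
    new_w = [wj - lr * aj for wj, aj in zip(w, avg)]
    # Without secure aggregation, the server also holds every individual gradient.
    return new_w, client_grads
```

For example, with `w = [0.0, 0.0]` and two clients holding `([1.0, 2.0], 1.0)` and `([3.0, -1.0], 0.0)`, the first client sends `[-2.0, -4.0]`, the second sends a zero gradient, and a step with `lr=0.1` moves `w` to `[0.1, 0.2]`.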

Gradient inversion attack.

Despite the promise of data privacy in FL, recent work showed that sending gradient updates instead of the training samples themselves in fact provides a false sense of security. In their seminal paper, Zhu et al. (2019) showed that the server can recover the full batch of training samples given aggregated gradients. These optimization-based gradient inversion attacks optimize a set of dummy data $\tilde{x}_1, \dots, \tilde{x}_B$ and labels $\tilde{y}_1, \dots, \tilde{y}_B$ so that their gradient matches the observed gradient:

$$\min_{\tilde{x}_1,\dots,\tilde{x}_B,\,\tilde{y}_1,\dots,\tilde{y}_B} \left\| \sum_{i=1}^{B} \nabla_w \ell(f_w(\tilde{x}_i), \tilde{y}_i) - \sum_{i=1}^{B} \nabla_w \ell(f_w(x_i), y_i) \right\|_2^2. \tag{1}$$

For image tasks, since Equation 1 is differentiable in $\tilde{x}_i$ and $\tilde{y}_i$ and the model parameter $w$ is known to the server, the server can optimize Equation 1 using gradient-based search. Doing so yields recovered samples $(\tilde{x}_i, \tilde{y}_i)$ that closely resemble the actual samples $(x_i, y_i)$ in the batch. In practice this approach is highly effective, and follow-up works proposed several optimizations to further improve its recovery accuracy (Geiping et al., 2020; Yin et al., 2021; Jeon et al., 2021). For language tasks, this optimization problem is considerably harder, since the samples $x_1, \dots, x_B$ are sequences of discrete tokens and optimizing Equation 1 amounts to solving a discrete optimization problem. To circumvent this difficulty, Zhu et al. (2019) and Deng et al. (2021) instead optimize the token embeddings to match the observed gradient, and then map the recovered embeddings to their closest tokens in the embedding layer to recover the private text. In contrast, Gupta et al. (2022) leveraged the insight that the gradient of the token embedding layer can be used to recover exactly the set of tokens present in the training sample, and then used beam search to optimize the ordering of these tokens for fluency.

Gradient inversion under the malicious server setting.
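The optimization in Equation 1 can be illustrated on a toy problem. The sketch below assumes B = 1, a known label, and a linear model $f_w(x) = \langle w, x \rangle$ with squared loss, for which the model gradient is $2(\langle w, x\rangle - y)\,x$ and the derivative of the matching loss with respect to the dummy input has a closed form. IG and GI-GIP instead operate on deep networks with automatic differentiation (and cosine distance in place of the ℓ2 distance). All function names are hypothetical.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def model_gradient(w: Vector, x: Vector, y: float) -> Vector:
    """Toy model gradient: grad_w of (<w, x> - y)^2 is 2 * (<w, x> - y) * x."""
    r = dot(w, x) - y
    return [2.0 * r * xi for xi in x]


def matching_loss(w: Vector, x_dummy: Vector, y: float, observed: Vector) -> float:
    """Equation 1 with B = 1 and a known label: squared L2 gradient distance."""
    g = model_gradient(w, x_dummy, y)
    return sum((gi - oi) ** 2 for gi, oi in zip(g, observed))


def matching_loss_grad(w: Vector, x_dummy: Vector, y: float, observed: Vector) -> Vector:
    """Closed-form derivative w.r.t. x_dummy: 4 * (r * d + w * <x_dummy, d>), d = g - observed."""
    r = dot(w, x_dummy) - y
    d = [gi - oi for gi, oi in zip(model_gradient(w, x_dummy, y), observed)]
    xd = dot(x_dummy, d)
    return [4.0 * (r * dk + wk * xd) for dk, wk in zip(d, w)]


def invert(w: Vector, y: float, observed: Vector, x_init: Vector,
           lr: float = 1e-3, steps: int = 500) -> Vector:
    """Plain gradient descent on the dummy input to match the observed gradient."""
    x = list(x_init)
    for _ in range(steps):
        step = matching_loss_grad(w, x, y, observed)
        x = [xi - lr * si for xi, si in zip(x, step)]
    return x
```

Even in this toy setting the matching loss can have more than one minimizer (for example, a rescaled copy of the true input), so exact recovery is not guaranteed; the sketch only shows the mechanics of gradient matching.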
The aforementioned gradient inversion attacks operate under the honest-but-curious setting, where the server faithfully executes the federated learning protocol but attempts to extract private information from the observed gradients. Fowl et al. (2021), Boenisch et al. (2021), and Fowl et al. (2022) consider a stronger malicious-server threat model that allows the server to transmit arbitrary model parameters $w$ to the clients. Under this threat model, the model parameters can be carefully crafted so that the training sample can be recovered exactly from its gradient even when the batch size $B$ is large. While this setting is realistic and relevant, our paper operates under the weaker honest-but-curious threat model.

3. LEARNING TO INVERT: LEARNING-BASED GRADIENT INVERSION ATTACKS

Motivation. The threat of gradient inversion attacks has prompted prior work to employ defense mechanisms that mitigate this privacy risk in FL (Zhu et al., 2019; Jeon et al., 2021). Intuitively, such defenses reduce the amount of information the gradient contains about the training sample, either by perturbing the gradient with noise (Abadi et al., 2016) or by compressing it (Aji & Heafield, 2017; Bernstein et al., 2018), making recovery much more difficult. However, doing so also reduces the amount of information a sample provides for training the global model, and hence hurts the model's performance. This is certainly true for principled defenses based on differential privacy (Dwork et al., 2006) such as gradient perturbation (Abadi et al., 2016). Defenses based on gradient compression, however, seemingly provide a much better privacy-utility trade-off, effectively preventing the attack with only a minor reduction in model performance (Zhu et al., 2019). The empirical success of existing defenses seemingly diminishes the threat of gradient inversion in FL, especially since gradient compression (Aji & Heafield, 2017; Bernstein et al., 2018) is already commonplace in practical FL applications to reduce communication cost. We argue, however, that optimization-based attacks underestimate the power of the adversary: if the adversary has access to an auxiliary dataset $D_{\text{aux}}$, they can train a gradient inversion model to recover $D_{\text{aux}}$ from its gradients computed on the global model. As we establish in Section 4, this greatly empowers the adversary and exposes existing risks in federated learning.

Threat model. We consider the setting where the adversary is an honest-but-curious server that executes the learning protocol faithfully but aims to extract private training data from the observed gradients. We also assume that the FL protocol does not use secure aggregation, so per-client gradients are revealed to the server.
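The defense families discussed above can be sketched as transformations applied to a flattened per-sample gradient. This is a minimal pure-Python sketch under our own assumptions: pruning ranks coordinates by absolute value and keeps the largest `keep_fraction` of them, and the Gaussian perturbation adds i.i.d. noise with standard deviation `sigma` but omits the per-sample clipping that DP-SGD (Abadi et al., 2016) also performs. The function names are hypothetical.

```python
import random
from typing import List, Optional

Vector = List[float]


def sign_compression(grad: Vector) -> Vector:
    """One bit per coordinate: +1 or -1, with sign(0) = 0."""
    return [1.0 if g > 0 else (-1.0 if g < 0 else 0.0) for g in grad]


def gradient_pruning(grad: Vector, keep_fraction: float) -> Vector:
    """Keep the largest-magnitude keep_fraction of coordinates and zero out the rest."""
    k = int(round(keep_fraction * len(grad)))
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    keep = set(order[:k])
    return [g if i in keep else 0.0 for i, g in enumerate(grad)]


def gaussian_perturbation(grad: Vector, sigma: float,
                          rng: Optional[random.Random] = None) -> Vector:
    """Add i.i.d. N(0, sigma^2) noise to every coordinate (no clipping, unlike full DP-SGD)."""
    if rng is None:
        rng = random.Random()
    return [g + rng.gauss(0.0, sigma) for g in grad]
```

Because each defense is just a function of the gradient, an attacker who knows which defense is deployed can apply the same function to gradients of its own auxiliary data; this is the adaptivity that LTI relies on.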
Under these assumptions, in each FL iteration the adversary knows the model weights $w$ and the gradient $\nabla_w \ell(f_w(x), y)$ for each sample $(x, y)$ in the batch. Moreover, we assume the adversary has an auxiliary dataset $D_{\text{aux}}$, which could be in-distribution or a mixture of in-distribution and out-of-distribution data. This assumption is similar to the setting of Jeon et al. (2021), which assumes a generative model trained on in-distribution data, and is common in the study of other privacy attacks such as membership inference (Shokri et al., 2017).

Learning to invert (LTI). Since the adversary knows the model weights, they can compute the gradient $\nabla_w \ell(f_w(x_{\text{aux}}), y_{\text{aux}})$ for each sample $(x_{\text{aux}}, y_{\text{aux}})$ in the auxiliary dataset. This allows the adversary to learn a gradient inversion model $g_\theta$, parameterized by $\theta$, that predicts the data point $(x_{\text{aux}}, y_{\text{aux}})$ from its gradient on the global model by solving the following learning problem:

$$\min_\theta \sum_{(x_{\text{aux}}, y_{\text{aux}}) \in D_{\text{aux}}} \ell_{\text{attack}}\Big(g_\theta\big(\nabla_w \ell(f_w(x_{\text{aux}}), y_{\text{aux}})\big),\ (x_{\text{aux}}, y_{\text{aux}})\Big). \tag{2}$$

In practice, $\ell_{\text{attack}}$ can be the cross-entropy loss (for discrete input) or the squared loss (for continuous-valued input), and we find empirically that a multi-layer perceptron (MLP) (Bishop et al., 1995) is an effective choice for $g_\theta$. Importantly, when a defense mechanism such as gradient perturbation or gradient compression is applied, we can apply the same transformation to $\nabla_w \ell(f_w(x_{\text{aux}}), y_{\text{aux}})$ as training data augmentation for $g_\theta$, yielding an adaptive attack. We show in Section 4 that this simple approach is surprisingly effective at circumventing existing defenses.

Dimensionality reduction for large models. One potential problem for LTI is that the gradient $\nabla_w \ell(f_w(x_{\text{aux}}), y_{\text{aux}})$ can be extremely high-dimensional.
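The training procedure in Equation 2, including the defense-as-augmentation step, can be illustrated end to end on a toy problem. This sketch makes simplifying assumptions that are ours, not the paper's: the target model is a fixed linear model with squared loss and a known label, the auxiliary and private inputs come from the same uniform distribution, and $g_\theta$ is an affine map trained with plain SGD on the squared loss rather than an MLP trained with Adam. All names are hypothetical.

```python
import random
from typing import Callable, List

Vector = List[float]

# Toy target model (assumption): f_w(x) = <w, x>, squared loss, fixed known label.
W_TARGET = [0.3, -0.2, 0.1]
LABEL = 2.0


def target_gradient(x: Vector) -> Vector:
    """Gradient of (<w, x> - LABEL)^2 with respect to w, i.e. 2 * residual * x."""
    residual = sum(wi * xi for wi, xi in zip(W_TARGET, x)) - LABEL
    return [2.0 * residual * xi for xi in x]


def no_defense(g: Vector) -> Vector:
    return list(g)


def sign_compression(g: Vector) -> Vector:
    return [1.0 if gi > 0 else (-1.0 if gi < 0 else 0.0) for gi in g]


class LinearInverter:
    """Hypothetical stand-in for g_theta: an affine map from (defended) gradient to input."""

    def __init__(self, dim: int) -> None:
        self.W = [[0.0] * dim for _ in range(dim)]
        self.b = [0.0] * dim

    def __call__(self, g: Vector) -> Vector:
        return [sum(wij * gj for wij, gj in zip(row, g)) + bi for row, bi in zip(self.W, self.b)]

    def sgd_step(self, g: Vector, x: Vector, lr: float) -> None:
        # Loss ||g_theta(g) - x||^2: d/db_i = 2 * err_i and d/dW_ij = 2 * err_i * g_j.
        err = [p - xi for p, xi in zip(self(g), x)]
        for i, e in enumerate(err):
            self.b[i] -= lr * 2.0 * e
            for j, gj in enumerate(g):
                self.W[i][j] -= lr * 2.0 * e * gj


def train_lti(aux: List[Vector], defense: Callable[[Vector], Vector],
              epochs: int = 20, lr: float = 0.005, seed: int = 0) -> LinearInverter:
    rng = random.Random(seed)
    model = LinearInverter(len(aux[0]))
    data = list(aux)
    for _ in range(epochs):
        rng.shuffle(data)
        for x in data:
            # Adaptive attack: the deployed defense is applied to the auxiliary gradient
            # before it is fed to the inversion model (defense as data augmentation).
            model.sgd_step(defense(target_gradient(x)), x, lr)
    return model


def reconstruction_mse(model: LinearInverter, samples: List[Vector],
                       defense: Callable[[Vector], Vector]) -> float:
    total = 0.0
    for x in samples:
        pred = model(defense(target_gradient(x)))
        total += sum((p - xi) ** 2 for p, xi in zip(pred, x)) / len(x)
    return total / len(samples)
```

In a typical use, one trains `train_lti(aux, sign_compression)` on auxiliary inputs and then calls the trained model on sign-compressed gradients of unseen private inputs; the same code path handles any defense function, which is what makes the attack adaptive.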
For example, both ResNet18 (He et al., 2016) for vision tasks and a three-layer transformer (Vaswani et al., 2017) for language tasks have approximately 11 million trainable parameters. Such high-dimensional input to $g_\theta$ can lead to memory issues, as the first layer of the MLP would have $11\text{M} \times h$ parameters, where $h$ denotes the size of the first hidden layer. To address this issue, we use feature hashing (Weinberger et al., 2009) to reduce the dimensionality of the input gradient. We create $k$ bins, where $k$ is much smaller than the gradient dimension $m$, and assign each gradient dimension $i \in [m]$ to a random bin $r(i) \in [k]$. For each bin, we sum up the gradient values assigned to it, obtaining a feature vector of size $k$ as input to the inversion model $g_\theta$. In other words, we project the gradient $\nabla_w \ell(f_w(x_{\text{aux}}), y_{\text{aux}})$ to $P\,\nabla_w \ell(f_w(x_{\text{aux}}), y_{\text{aux}})$ using the random projection matrix

$$P \in \{0, 1\}^{k \times m} \quad \text{s.t.} \quad \forall i:\ P_{r(i),i} = 1 \ \text{ and } \ P_{j,i} = 0 \ \ \forall j \neq r(i).$$

If $r(i)$ is implemented with a pseudo-uniform hashing function, $P$ does not need to be stored in memory, so the size of $g_\theta$ depends only on the number of bins $k$ rather than on the gradient dimension $m$.
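The hashing step can be sketched in a few lines. In this pure-Python sketch, the bin assignment $r(i)$ is realized with a seeded BLAKE2b hash from the standard library (our choice; the text only requires a pseudo-uniform hash), so the matrix $P$ is never materialized. As in the text, each coordinate is added to exactly one bin without a random sign, so the bin totals sum to the sum of the original gradient entries. The function names are hypothetical.

```python
import hashlib
from typing import List


def bucket(i: int, k: int, seed: int = 0) -> int:
    """Pseudo-uniform bin r(i) in [0, k), recomputed on demand instead of storing P."""
    digest = hashlib.blake2b(f"{seed}:{i}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little") % k


def hash_gradient(grad: List[float], k: int, seed: int = 0) -> List[float]:
    """Compute P @ grad, where P[r(i), i] = 1 and every other entry of column i is 0."""
    out = [0.0] * k
    for i, g in enumerate(grad):
        out[bucket(i, k, seed)] += g
    return out
```

With $k$ set to 10% of $m$, as in the language experiments, `hash_gradient` turns an $m$-dimensional gradient into the $k$-dimensional input of $g_\theta$; the only state needed is the seed.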

4. EXPERIMENT

We evaluate LTI on vision and language tasks against several existing defenses to show that it vastly outperforms prior gradient inversion attacks. We consider the following defense mechanisms evaluated in prior work (Zhu et al., 2019; Jeon et al., 2021):

• None. The gradient shared between the server and clients is the full gradient without any defense. This is the setting that most previous papers focus on.
• Sign compression (Bernstein et al., 2018) applies the sign function to each dimension of the gradient independently, compressing the gradient to one bit per dimension.

Baselines. We compare our method with two baseline gradient inversion attacks: Inverting Gradients (IG; Geiping et al., 2020), a representative optimization-based method, and Gradient Inversion with Generative Image Prior (GI-GIP; Jeon et al., 2021), the state-of-the-art optimization-based method, which uses a generative model to encode the data prior. We make minor modifications to these attacks to adapt them to the various defenses; see the appendix for details. The threat model for our attack is most similar to that of GI-GIP, since both use an auxiliary dataset to encode the data prior.

Evaluation methodology. We evaluate LTI and the aforementioned baselines on 1,000 random images from the CIFAR10 test split. To measure reconstruction quality, we use three metrics:

• Mean squared error (MSE) measures the average pixel-wise squared distance between the reconstructed image and the ground-truth image. Lower is better.
• Peak signal-to-noise ratio (PSNR) measures the ratio, in decibels, between the squared maximum pixel value and the MSE. Higher is better.
• Learned perceptual image patch similarity (LPIPS) measures distance in the feature space of a VGG (Simonyan & Zisserman, 2014) model trained on ImageNet. Lower is better.
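MSE and PSNR can be computed directly from flattened images, as in the sketch below. It assumes pixel values scaled to [0, 1], so the maximum pixel value is 1 (the scaling is not stated in the text). LPIPS requires a pretrained network and is not reproduced here.

```python
import math
from typing import Sequence


def mse(recon: Sequence[float], target: Sequence[float]) -> float:
    """Average pixel-wise squared error between two flattened images."""
    return sum((a - b) ** 2 for a, b in zip(recon, target)) / len(target)


def psnr(recon: Sequence[float], target: Sequence[float], max_value: float = 1.0) -> float:
    """PSNR in dB: 10 * log10(MAX^2 / MSE); infinite for a perfect reconstruction."""
    err = mse(recon, target)
    if err == 0.0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)
```

For instance, an MSE of 0.005 on [0, 1]-scaled pixels corresponds to a PSNR of 10 · log10(200) ≈ 23.01 dB; halving the MSE adds about 3 dB.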

4.1.1. MAIN RESULTS

Quantitative evaluation. close to 0.1. By comparison, LTI outperforms both baselines significantly and consistently across all three defense mechanisms. For example, under gradient perturbation with σ = 0.1, which prior work believed to be sufficient for preventing gradient inversion attacks (Zhu et al., 2019; Jeon et al., 2021), the MSE of LTI can be as low as 0.012. Our result therefore provides considerable additional insight into the level of empirical privacy achieved by DP-SGD (Abadi et al., 2016), and suggests that the theoretical privacy leakage predicted by the DP parameter ϵ may be tighter than previously thought.

Qualitative evaluation. Figure 2 shows 4 random CIFAR10 test samples and their reconstructions under different defense mechanisms. Without any defense in place, all three methods recover a considerable amount of semantic information about the object of interest, with both GI-GIP and LTI faithfully reconstructing the training sample. Under the sign compression defense, IG fails to reconstruct any of the 4 samples, while GI-GIP only successfully reconstructs the second image. In contrast, LTI recovers the semantic information in all 4 samples. Results for gradient pruning and gradient perturbation lead to similar conclusions. Additional samples are given in the appendix.

4.1.2. ABLATION STUDIES

Since LTI learns to invert gradients using the auxiliary dataset, its performance depends on the quantity and quality of data available to the adversary. We perform ablation studies to better understand this dependence by varying the size and distribution of the auxiliary dataset.

Varying the auxiliary dataset size. We randomly subsample the CIFAR10 training set to construct auxiliary datasets of size {500, 5000, 15000, 25000, 35000, 45000, 50000} and evaluate the performance of LTI under various defenses. Figure 3(a) plots reconstruction MSE as a function of the auxiliary dataset size, which decreases monotonically as expected. Moreover, with just 5,000 samples for training the inversion model (second point in each curve), the performance is nearly as good as when training on the full CIFAR10 training set. Notably, even with an auxiliary dataset as small as 500 samples, the reconstruction MSE is still lower than that of IG and GI-GIP in Table 1. Corresponding figures for PSNR and LPIPS in the appendix show similar findings.

Varying the auxiliary data distribution. Although a large set of in-distribution data may not be available in practice, it is plausible that the adversary can collect out-of-distribution samples for the auxiliary dataset. This benefits the adversary, since a model that learns how to invert out-of-distribution samples from their gradients may transfer to in-distribution data as well. To simulate this scenario, we divide CIFAR10 into two halves with disjoint classes, and construct the auxiliary dataset by combining a β fraction of samples from the first half and a 1 − β fraction of samples from the second half, for β ∈ {0, 0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 1}. The target model $f_w$ is trained only on samples from the first half; hence the auxiliary set has exactly the same distribution as the target model's data when β = 1 and contains only out-of-distribution data when β = 0.
Figure 3(b) shows reconstruction MSE as a function of β; corresponding figures for PSNR and LPIPS are given in the appendix. We make the following observations:

1. Even if the auxiliary dataset contains only 250 in-distribution samples (β = 0.01; second point in each curve), the MSE of the inversion model is still lower than that of the best baseline in Table 1. For example, with the sign compression defense, LTI attains an MSE of ≤ 0.02, much lower than the MSE of 0.116 for IG and 0.091 for GI-GIP.
2. When the auxiliary dataset contains only out-of-distribution data (β = 0), the inversion model has very high reconstruction MSE, which suggests that methods for improving out-of-distribution generalization may be necessary for further improvement.
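The β-mixing procedure can be sketched as follows. We assume the two halves are given as plain lists of samples (the split into disjoint classes is done beforehand and is not shown) and draw without replacement with a seeded generator; the helper name is hypothetical. The 250 in-distribution samples quoted for β = 0.01 are consistent with halves of 25,000 images each.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def mixed_auxiliary_set(first_half: Sequence[T], second_half: Sequence[T],
                        beta: float, seed: int = 0) -> List[T]:
    """Combine a beta fraction of first_half (in-distribution) with a
    (1 - beta) fraction of second_half (out-of-distribution)."""
    rng = random.Random(seed)
    n_in = round(beta * len(first_half))
    n_out = round((1.0 - beta) * len(second_half))
    return rng.sample(list(first_half), n_in) + rng.sample(list(second_half), n_out)
```

The total size stays fixed at one half of the training set for every β, so the ablation isolates the effect of distribution shift from that of dataset size.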

4.2. EVALUATION ON LANGUAGE TASK

For evaluating LTI on a language task, we experiment with causal language model training¹ for next-token prediction. The language model $f_w$ is a three-layer transformer (Vaswani et al., 2017) with a frozen token embedding layer. This is a common technique for language model fine-tuning (Sun et al., 2019), and it also has privacy benefits, since direct privacy leakage from the gradient magnitude of the token embedding layer can be prevented (Fowl et al., 2022; Gupta et al., 2022). As a result, the trainable model contains about 1.1 × 10^6 parameters. We train the language model on WikiText (Merity et al., 2016), where each training sample is limited to L = 16 tokens and the language model is trained with the cross-entropy loss to predict the next token $x_l$ given $x_{:l-1}$ for $l = 1, \dots, L$.

Baseline. We compare LTI with TAG (Deng et al., 2021), the state-of-the-art language model gradient inversion attack that does not use the token embedding layer gradient². The objective function of TAG is a slight modification of Equation 1 that uses both the ℓ2 and ℓ1 distance between the observed gradient and the gradient of the dummy data. We also modify TAG slightly to adapt it to different defenses; see the appendix for details.
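The TAG-style objective can be sketched as a distance between flattened gradients that combines the squared ℓ2 term of Equation 1 with an ℓ1 term. The single scalar weight `alpha` and its default value are our assumptions for illustration; Deng et al. (2021) give the exact weighting used by TAG.

```python
from typing import Sequence


def tag_distance(dummy_grad: Sequence[float], observed_grad: Sequence[float],
                 alpha: float = 0.01) -> float:
    """Squared L2 distance plus an alpha-weighted L1 distance between two flattened gradients."""
    diffs = [d - o for d, o in zip(dummy_grad, observed_grad)]
    l2_sq = sum(v * v for v in diffs)
    l1 = sum(abs(v) for v in diffs)
    return l2_sq + alpha * l1
```

The ℓ1 term penalizes small coordinate-wise mismatches more strongly than the squared ℓ2 term alone, which only weakly penalizes differences close to zero.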

Inversion model training.

We follow the setup below for training the gradient inversion model $g_\theta$.

• Auxiliary dataset. We use ∼1.8 × 10^5 samples from the train split of WikiText as the auxiliary dataset, and 1,000 samples from the test split for evaluating the attack. In addition, we introduce a weaker variant of our attack that only assumes knowledge of the marginal token distribution of the language model training data. Instead of using the WikiText train split as auxiliary data, we sample random tokens according to the marginal token distribution to generate pseudo-data for training the inversion model. We show that this variant, which we denote LTI-P, can even outperform LTI with in-distribution auxiliary data, owing to its access to unlimited training data.
• Inversion model architecture. We train a two-layer MLP with ReLU activation, first hidden-layer size 600, and second hidden-layer size 1,000. The inversion model outputs L probability vectors, each of size equal to the vocabulary size (∼50,000), and we train it with the cross-entropy loss to predict the L tokens given the target model gradient. We use feature hashing (Weinberger et al., 2009) to reduce the target model gradient to 10% of its original dimension as input to the inversion model.
• Training details. We use Adam (Kingma & Ba, 2014) to train the inversion model for 20 epochs with batch size 64. Learning rates are selected separately for each defense from {10^-3, 10^-4, 10^-5}.
• Computation cost. Our experiments are conducted on NVIDIA GeForce RTX 3090 GPUs, and each training run takes about 3 hours.

Evaluation methodology. We evaluate LTI and the TAG baseline on 1,000 samples from the WikiText test set. To measure the quality of reconstructed text, we use four metrics:

• Accuracy (%) measures the average token-wise zero-one accuracy. Higher is better.
• Rouge-1 (%), Rouge-2 (%), and Rouge-L (%) measure the overlap of unigrams, bigrams, and the length of the longest common subsequence between the ground truth and the reconstructed text. Higher is better.

Results. Table 2 shows a quantitative comparison of LTI (and its variant LTI-P) and TAG against various defenses. The overall trend is remarkably consistent: LTI and LTI-P outperform TAG on all four metrics in all defense settings, with LTI-P achieving state-of-the-art recovery accuracy by far. This result suggests that the marginal token distribution encodes a strong enough data prior for LTI-P to train the inversion model, and that access to unlimited training data allows it to generalize to the test set better than LTI. In practice, it is very plausible that the adversary knows the marginal token distribution, and hence LTI-P serves as a surprisingly simple and effective baseline for gradient inversion in NLP.

Figure 4 shows 3 random test samples from WikiText and their reconstructions using LTI-P and TAG, with correctly reconstructed tokens highlighted in blue. Without any defense, both TAG and LTI-P yield reasonably accurate reconstructions, with LTI-P faithfully reconstructing all but one or two tokens. With the sign compression defense applied, TAG fails to recover any token correctly, whereas LTI-P faithfully recovers almost half of the tokens in each sample. Results for gradient pruning and gradient perturbation lead to similar conclusions, with TAG recovering a larger but still relatively insignificant set of tokens. Additional samples are given in the appendix.
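The pseudo-data generation behind LTI-P can be sketched as follows: estimate the marginal token distribution from token counts, then sample each of the L positions independently from it. This minimal sketch assumes integer token IDs and a marginal estimated from whatever token stream the adversary has; the function names are hypothetical. Because fresh sequences can be drawn at any time, the auxiliary set is effectively unlimited, which is the property the results above point to.

```python
import random
from collections import Counter
from typing import Dict, Iterable, List


def marginal_distribution(token_ids: Iterable[int]) -> Dict[int, float]:
    """Empirical marginal probability of each token ID."""
    counts = Counter(token_ids)
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}


def sample_pseudo_sequences(marginal: Dict[int, float], num_sequences: int,
                            seq_len: int = 16, seed: int = 0) -> List[List[int]]:
    """Draw each position i.i.d. from the marginal (no context between tokens)."""
    rng = random.Random(seed)
    tokens = list(marginal)
    weights = [marginal[t] for t in tokens]
    return [rng.choices(tokens, weights=weights, k=seq_len) for _ in range(num_sequences)]
```

Each sampled sequence would then be passed through the target model to obtain a (hashed, defended) gradient, exactly as with real auxiliary text.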

5. CONCLUSION AND FUTURE WORK

We demonstrated the effectiveness of LTI, a simple learning-based gradient inversion attack, under realistic federated learning settings. For both vision and language tasks, LTI can match or exceed the performance of state-of-the-art optimization-based methods when no defense is applied, and significantly outperforms all prior work under defenses based on gradient perturbation and gradient compression. Given its simplicity and versatility, we advocate the use of LTI both as a strong baseline for future research and as a diagnostic tool for evaluating privacy leakage in FL.

Negative societal impact. Gradient inversion attacks can lead to negative consequences if used inappropriately. Our work shows that if FL is deployed without consideration for gradient inversion attacks, an adversary can exploit its vulnerabilities to compromise the data privacy of clients even under strong empirical defenses. However, we strongly emphasize that our work should not be interpreted as a tool for adversaries, but rather as a way to inform the community about the risks of data privacy breaches in FL and to promote future research into safe practices.

Limitations. This paper is preliminary work towards understanding the effectiveness of learning-based gradient inversion attacks, and our method can be further improved in several directions.

1. For large models, our current approach hashes the gradients into a lower-dimensional space to reduce memory cost. It may be possible to leverage the model's architecture to design more effective dimensionality reduction techniques and further scale up the method.
2. We currently only consider batch size 1, which precludes the use of secure aggregation (Bonawitz et al., 2016), a common technique in FL for amplifying privacy by aggregating the gradients from multiple clients before revealing them to the server. For LTI, the complexity of the MLP would increase with the batch size, making learning harder. More advanced model architectures and loss designs might help in the large-batch case.
3. LTI in its current form does not leverage additional data priors, such as the smoothness prior for images and the fluency prior for text. These priors could readily be incorporated by adding total variation (for image data) or perplexity under a trained language model (for text data) to the inversion model's loss function, which may further improve the performance of LTI.
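As an illustration of the smoothness prior mentioned in item 3, the sketch below computes an anisotropic total variation on a single-channel image and adds it to the squared-error attack loss. This is not part of LTI as evaluated in the paper; the weight `tv_weight` and the function names are hypothetical.

```python
from typing import List

Image = List[List[float]]


def total_variation(img: Image) -> float:
    """Anisotropic TV: sum of absolute differences between vertically and horizontally adjacent pixels."""
    h, w = len(img), len(img[0])
    tv = 0.0
    for i in range(h):
        for j in range(w):
            if i + 1 < h:
                tv += abs(img[i + 1][j] - img[i][j])
            if j + 1 < w:
                tv += abs(img[i][j + 1] - img[i][j])
    return tv


def regularized_attack_loss(pred: Image, target: Image, tv_weight: float = 1e-2) -> float:
    """Pixel-wise MSE (the l_attack used for images) plus a weighted TV penalty on the prediction."""
    h, w = len(target), len(target[0])
    mse = sum((pred[i][j] - target[i][j]) ** 2 for i in range(h) for j in range(w)) / (h * w)
    return mse + tv_weight * total_variation(pred)
```

A constant image has zero total variation, while a 2 × 2 checkerboard of 0s and 1s has a total variation of 4, so the penalty pushes reconstructions towards piecewise-smooth images.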

A.2 AUXILIARY DATASET ABLATION STUDIES

In Section 4.1.2 we showed reconstruction MSE for LTI as a function of the auxiliary dataset size and the shift factor β. For completeness, we show the corresponding PSNR and LPIPS curves in Figure 6. Similar to Figure 3, reducing the auxiliary dataset size (e.g., from 50,000 to 5,000) or the proportion of in-distribution data (e.g., from β = 1 to β = 0.1) does not significantly worsen the performance of LTI.

A.3 ADDITIONAL SAMPLES

Figure 7 and Figure 8 show additional samples and their reconstructions under various defense mechanisms. The results are consistent with Figure 2 and Figure 4: LTI shows consistently better reconstruction quality than the baselines.



¹ We follow the task setup and code in https://github.com/JonasGeiping/breaching
² We do not compare against a more recent attack by Gupta et al. (2022), since it crucially depends on access to the token embedding layer gradient.



Inversion model training. We follow the setup below for training the gradient inversion model $g_\theta$.

• Auxiliary dataset. We use the train split of CIFAR10 as the auxiliary dataset for training the inversion model $g_\theta$, and the test split for inverting gradients computed on the global model $f_w$.
• Inversion model architecture. We use a three-layer MLP with hidden size 3000 as our inversion model $g_\theta$. The MLP takes the flattened gradient vector as input and outputs a 3072-dimensional (32 × 32 × 3) vector representing the flattened image. The training objective $\ell_{\text{attack}}$ in Equation 2 is the mean squared error (MSE) between the MLP output and the flattened ground-truth image.
• Training details. We use the Adam (Kingma & Ba, 2014) optimizer to train $g_\theta$ for 200 epochs with batch size 256. The initial learning rate is 10^-4 and is dropped to 10^-5 after 150 epochs.
• Computation cost. Our experiments are conducted on NVIDIA GeForce RTX 2080 GPUs, and each training run takes about 1.5 hours.

Figure 2: Comparison of LTI with IG and GI-GIP for reconstructing 4 random images in CIFAR10 test set. Under sign compression, only LTI can partially reconstruct the images to recover the object of interest whereas both IG and GI-GIP fail to do so on most samples.

Figure 3: Ablation studies on the size and distribution of the auxiliary dataset $D_{\text{aux}}$. Under both severe data size limitation (left) and data distribution shift (β = 0.01; right), LTI outperforms both baselines in Table 1 when a defense mechanism is applied. See text for details.

Figure 4: Ground truth text and their reconstructions for 3 random samples from the WikiText test set. LTI-P significantly outperforms TAG both with and without defenses, especially under sign compression where TAG fails to recover any token while LTI-P is capable of recovering almost half of the tokens in each sample.

Figure 8: Additional samples from WikiText and their reconstructions.

Gradient pruning with pruning rate α (Aji & Heafield, 2017) zeroes out the bottom $1 - \alpha$ fraction of coordinates of ∇

Results for gradient inversion attack on vision data. LTI achieves the best performance on all three metrics against the sign compression, gradient pruning, and Gaussian perturbation defenses.

4.1 EVALUATION ON VISION TASK

For evaluating LTI on a vision task, we experiment with image classification on CIFAR10 (Krizhevsky et al., 2009). The target model $f_w$ is LeNet (LeCun et al., 1998) with 1.5 × 10^4 parameters, trained using the cross-entropy loss.

Results for gradient inversion attack on text data. Both LTI and LTI-P significantly outperform TAG across different settings on all 4 metrics, with LTI-P achieving the best result while only having access to the marginal token distribution for generating the auxiliary dataset.
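The four text metrics in Table 2 can be sketched as follows. Token accuracy compares tokens position by position; for the Rouge scores we assume the F1 variant with clipped n-gram counts, and a longest-common-subsequence (LCS) length for Rouge-L, since the text does not state which variant is reported. When the reconstruction and ground truth have the same length L, as here, precision, recall, and F1 coincide for all three Rouge scores. The function names are hypothetical.

```python
from collections import Counter
from typing import Sequence


def token_accuracy(recon: Sequence[str], truth: Sequence[str]) -> float:
    """Fraction of positions where the reconstructed token equals the ground-truth token."""
    return sum(r == t for r, t in zip(recon, truth)) / len(truth)


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap: int, n_recon: int, n_truth: int) -> float:
    if overlap == 0:
        return 0.0
    precision, recall = overlap / n_recon, overlap / n_truth
    return 2 * precision * recall / (precision + recall)


def rouge_n(recon: Sequence[str], truth: Sequence[str], n: int) -> float:
    rc, tc = ngram_counts(recon, n), ngram_counts(truth, n)
    overlap = sum((rc & tc).values())  # clipped matches: min of the two counts per n-gram
    return f1(overlap, sum(rc.values()), sum(tc.values()))


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, via a rolling-row dynamic program."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def rouge_l(recon: Sequence[str], truth: Sequence[str]) -> float:
    return f1(lcs_length(recon, truth), len(recon), len(truth))
```

For example, reconstructing "the cat sat on the mat" as "the cat sat in the hat" gives a token accuracy and Rouge-1 of 4/6, a Rouge-2 of 2/5, and a Rouge-L of 4/6.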

A SUPPLEMENTARY MATERIAL FOR SECTION 4

A.1 MODIFICATIONS FOR BASELINE METHODS

Vision baselines. Both IG and GI-GIP use cosine distance instead of ℓ2 distance in Equation 1 for optimizing the dummy data. For the sign compression defense, this loss function does not optimize the correct objective: the observed gradient is a vector with ±1 entries, whereas the dummy data's gradient should not match it exactly but should rather be a real-valued vector with the same signs. We therefore replace cosine distance by the loss $\sum_{i=1}^{m} (\ell_i^{\text{sign}})^2$, where $\ell_i^{\text{sign}}$ is defined in Equation 3. One sanity check for this loss is that when $\nabla_{w_i} \ell(f_w(\tilde{x}), \tilde{y})$ has the same sign as $\nabla_{w_i} \ell(f_w(x), y)$ for every $i$, the minimum loss value of 0 is achieved. For the gradient pruning defense, optimizing the cosine distance between the dummy data gradient and the pruned ground-truth gradient would force too many gradient values to 0, which is incorrect for the full ground-truth gradient. Therefore, we compute cosine distance only over the non-zero dimensions of the pruned gradient.

Language baselines. For TAG, we find that the loss function also needs to be modified slightly to accommodate the sign compression and gradient pruning defenses:

• Sign compression. Similar to the vision baselines, the ℓ2 and ℓ1 distances between the dummy data gradient and the ground-truth gradient sign do not optimize the correct objective. We replace $\|\cdot\|_2^2$ and $\|\cdot\|_1$ by $\sum_{i=1}^{m} (\ell_i^{\text{sign}})^2$ and $\sum_{i=1}^{m} \ell_i^{\text{sign}}$, respectively, where $\ell_i^{\text{sign}}$ is defined in Equation 3.
• Gradient pruning. We make the same modification to TAG as for the vision baselines.
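The masked cosine distance used for the pruning defense can be sketched as follows: coordinates that were zeroed out by pruning are dropped before computing 1 − cos. The function name and the fallback value of 1.0 for an all-zero restricted vector are our assumptions.

```python
import math
from typing import Sequence


def masked_cosine_distance(dummy_grad: Sequence[float], pruned_grad: Sequence[float]) -> float:
    """1 - cosine similarity, restricted to coordinates where the pruned observed gradient is non-zero."""
    idx = [i for i, g in enumerate(pruned_grad) if g != 0.0]
    a = [dummy_grad[i] for i in idx]
    b = [pruned_grad[i] for i in idx]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm > 0 else 1.0
```

For `[5.0, 2.0, -7.0, 4.0]` against the pruned gradient `[0.0, 1.0, 0.0, 2.0]`, the masked distance is 0 because the dummy gradient agrees with every coordinate that survived pruning, whereas the full cosine distance would be about 0.54 and would push the pruned coordinates of the dummy gradient towards 0.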

