LEARNING TO INVERT: SIMPLE ADAPTIVE ATTACKS FOR GRADIENT INVERSION IN FEDERATED LEARNING

Abstract

Gradient inversion attacks enable the recovery of training samples from model updates in federated learning (FL) and constitute a serious threat to data privacy. To mitigate this vulnerability, prior work proposed both principled defenses based on differential privacy and heuristic defenses based on gradient compression. These defenses have so far been very effective, in particular those based on gradient compression, which allow the model to maintain high accuracy while greatly reducing the attack's effectiveness. In this work, we argue that such findings do not accurately reflect the privacy risk in FL, and we show that existing defenses can be broken by a simple adaptive attack that uses auxiliary data to train a model to invert gradients on both vision and language tasks.

1. INTRODUCTION

Federated learning (FL; McMahan et al., 2017) is a popular framework for distributed model training on sensitive user data. Instead of storing the training data centrally, FL operates in a server-client setting where the server hosts the model and has no direct access to the data. Clients apply the model to their private data and send gradient updates back to the server. This learning regime promises data privacy, as users share only gradients and never any raw data. However, recent work (Zhu et al., 2019; Zhao et al., 2020; Geiping et al., 2020) showed that despite these efforts, the server can still recover the training data from gradient updates, violating the promise of data privacy in FL. These so-called gradient inversion attacks operate by optimizing over the input space to find training samples whose gradient matches the observed gradient, and such attacks remain effective even when clients use secure aggregation (Bonawitz et al., 2016) to avoid revealing individual updates (Yin et al., 2021; Jeon et al., 2021).

As countermeasures against gradient inversion, prior work proposed both principled defenses based on differential privacy (Abadi et al., 2016) and heuristics that compress the gradient update through gradient pruning (Aji & Heafield, 2017) or sign compression (Bernstein et al., 2018). Gradient compression defenses in particular have so far enjoyed great success, severely hindering the effectiveness of existing optimization-based attacks (Zhu et al., 2019; Jeon et al., 2021) while maintaining close to the same level of accuracy for the trained model. As a result, these limitations seemingly diminish the threat of gradient inversion in practical FL applications. In this paper we argue that evaluating defenses against existing optimization-based attacks may provide a false sense of security.
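To make the optimization-based attack concrete, the following toy sketch inverts the gradient of a single linear neuron by gradient descent on a gradient-matching objective. Everything here (the weights `w`, the private sample `x_true`, the learning rate) is an illustrative assumption, not a setup from the paper; real attacks apply the same idea to deep networks.

```python
import numpy as np

# Toy "global model": a single linear neuron out = w.x + b with squared loss
# (out - y)^2. The client computes gradients on its private sample x_true;
# the server observes those gradients and optimizes a dummy input to match.
w = np.array([0.5, -0.3, 0.8, 0.1])
b = 0.2
y = 0.0
x_true = np.array([1.0, -2.0, 0.5, 1.5])

def gradients(x):
    """Gradients of (w.x + b - y)^2 with respect to w and b."""
    r = 2.0 * (w @ x + b - y)
    return r * x, r                      # (dL/dw, dL/db)

g_w, g_b = gradients(x_true)             # what the attacker observes

# Gradient descent on the gradient-matching loss
#   M(x) = ||grad_w(x) - g_w||^2 + (grad_b(x) - g_b)^2,
# using the analytic gradient of M with respect to the dummy input x.
x_hat = np.zeros(4)
lr = 0.002
for _ in range(100_000):
    r = 2.0 * (w @ x_hat + b - y)
    d = r * x_hat - g_w                  # residual of the weight-gradient match
    grad_M = 2.0 * (r * d + (d @ x_hat) * 2.0 * w) + 4.0 * (r - g_b) * w
    x_hat -= lr * grad_M

print(np.round(x_hat, 3))                # close to x_true
```

For this single-neuron model the minimizer is unique, so plain gradient descent recovers the sample almost exactly; for deep networks the same objective is non-convex, which is where compression defenses have hampered such attacks.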
To this end, we propose a simple learning-based attack, which we call Learning To Invert (LTI), that trains a model to invert gradient updates and recover client samples; see Figure 1 for an illustration. We assume that the adversary (i.e., the server) has access to an auxiliary dataset whose distribution is similar to that of the private data, and uses it to generate training samples for the gradient inversion model by querying the global model for gradients. Our attack is highly adaptable to different defenses, since applying a defense simply amounts to data augmentation for training the gradient inversion model. We empirically demonstrate that LTI can successfully circumvent defenses based on gradient perturbation (i.e., differential privacy; Abadi et al., 2016), gradient pruning (Aji & Heafield, 2017), and sign compression (Bernstein et al., 2018) on both vision and language tasks.
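A minimal sketch of the learning-based idea, under toy assumptions (a single linear neuron as the "global model", uniform auxiliary data, sign compression as the defense, and a small numpy MLP as the inversion model; none of these choices are from the paper): the attacker applies the defense while generating training pairs, so the inversion network learns to invert defended gradients directly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "global model": out = w.x + b with squared loss (label fixed at 0).
w = np.array([0.5, -0.3, 0.8, 0.1]); b = 0.2

def defended_gradient(x):
    """Client gradient under a sign-compression defense."""
    r = 2.0 * (w @ x + b)
    g = np.append(r * x, r)              # [dL/dw, dL/db]
    return np.sign(g)                    # defense: transmit only the signs

# Step 1: generate (defended gradient, sample) pairs from auxiliary data.
X_aux = rng.uniform(-1, 1, size=(2000, 4))
G_aux = np.stack([defended_gradient(x) for x in X_aux])

# Step 2: train a small inversion MLP (5 -> 32 -> 4) with full-batch GD.
W1 = rng.normal(0, 0.5, (5, 32)); b1 = np.zeros(32)
W2 = rng.normal(0, 0.5, (32, 4)); b2 = np.zeros(4)
lr = 0.05
for _ in range(10_000):
    H = np.tanh(G_aux @ W1 + b1)
    P = H @ W2 + b2
    E = (P - X_aux) / len(X_aux)         # d(mean MSE)/dP (up to a factor 2)
    gW2 = H.T @ E; gb2 = E.sum(0)
    dH = (E @ W2.T) * (1 - H**2)         # backprop through tanh
    gW1 = G_aux.T @ dH; gb1 = dH.sum(0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

# Step 3: invert defended gradients of unseen "private" samples.
X_test = rng.uniform(-1, 1, size=(500, 4))
G_test = np.stack([defended_gradient(x) for x in X_test])
P_test = np.tanh(G_test @ W1 + b1) @ W2 + b2
mse = np.mean((P_test - X_test) ** 2)
print(round(mse, 3))                     # below the 1/3 variance of a mean predictor
```

Note that the optimization-based attack has nothing to match here, since only gradient signs are revealed, yet the learned inverter still extracts signal: swapping in a different defense only changes the `defended_gradient` function, i.e., the training-data augmentation.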

