PANNING FOR GOLD IN FEDERATED LEARNING: TARGETED TEXT EXTRACTION UNDER ARBITRARILY LARGE-SCALE AGGREGATION

Abstract

As federated learning (FL) matures, privacy attacks against FL systems become more numerous and complex. Attacks on language models have progressed from recovering single sentences in simple classification tasks to recovering larger parts of user data. Current attacks against federated language models are sequence-agnostic: they aim to extract as much data as possible from an FL update, often at the expense of fidelity for any particular sequence. As a result, current attacks fail to extract any meaningful data under large-scale aggregation. In realistic settings, however, an attacker cares most about the small portion of user data that contains sensitive personal information, for example sequences containing the phrase "my credit card number is ...". In this work, we propose the first attack on FL that achieves targeted extraction of sequences containing privacy-critical phrases: we employ maliciously modified parameters that allow the transformer itself to filter relevant sequences from aggregated user data and encode them in the gradient update. Our attack effectively extracts sequences of interest even under extremely large-scale aggregation.
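The selection criterion at the heart of the attack can be illustrated in plain Python. Note that in the actual attack this filtering is performed *inside* the transformer via maliciously chosen parameters; the function below is only a conceptual sketch of the criterion, and the phrase and batch contents are hypothetical examples.

```python
# Conceptual sketch of targeted sequence selection (hypothetical example).
# The real attack realizes this filter through malicious model parameters,
# not through explicit string matching on the client.
TARGET_PHRASE = "my credit card number is"

def select_targeted_sequences(user_sequences, target_phrase=TARGET_PHRASE):
    """Return only the sequences that contain the attacker's target phrase."""
    return [s for s in user_sequences if target_phrase in s.lower()]

batch = [
    "see you at the game tonight",
    "My credit card number is 4111 1111 1111 1111",
    "what time is the meeting?",
]
print(select_targeted_sequences(batch))
# → ['My credit card number is 4111 1111 1111 1111']
```

Only the second sequence survives the filter; everything else in the aggregated batch is discarded, which is why the attack remains effective even when gradients from many users are averaged together.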

1. INTRODUCTION

Industrial machine learning models are often trained on large sets of user data. In a traditional centralized training paradigm, this is done by aggregating user data into a large repository. Unfortunately, when user data contains personal information in the form of text, images, or other media, dataset aggregation leads to significant security, regulatory, and liability risks. Against this backdrop, federated learning (FL) has emerged as a popular way to train models with decentralized data, that is, without the need for a central party to host a dataset. By exchanging only model gradients, user devices collaboratively train a model without the direct exchange of plaintext data. In many applications, FL is slower than centralized training (Bonawitz et al., 2019), but the privacy benefits outweigh the costs, especially in next-word text prediction, which requires training on private text from smartphones (Hard et al., 2019).

Privacy through federated learning is sometimes taken for granted. In reality, the actual privacy achieved by federated learning systems depends on a large number of factors and parameters: model size, architecture, number of users, the aggregation scheme, and more. Attacks against privacy in FL probe this boundary, empirically discovering pitfalls that should be considered and avoided when designing federated protocols (Phong et al., 2017; Melis et al., 2019; Geiping et al., 2020).

In this work, we study the security of federated learning systems involving transformer architectures (Vaswani et al., 2017), which form the backbone of many recent advancements in natural language processing (Brown et al., 2020; Dosovitskiy et al., 2021; Jumper et al., 2021), and especially applications in text, which represent a key point of interest in many modern applications of federated learning (Paulik et al., 2021; Dimitriadis et al., 2022).
Our main threat model of interest is the untrusted server scenario, also known as the malicious server scenario, in which the server may make changes to model parameters in order to break user privacy. This is in contrast to the honest-but-curious threat model, in which no malicious changes to the model training protocol are permitted. Untrusted server scenarios are of critical importance from a user-centric privacy perspective.

