CANIFE: CRAFTING CANARIES FOR EMPIRICAL PRIVACY MEASUREMENT IN FEDERATED LEARNING

Abstract

Federated Learning (FL) is a setting for training machine learning models in distributed environments where the clients do not share their raw data but instead send model updates to a server. However, model updates can be subject to attacks and leak private information. Differential Privacy (DP) is a leading mitigation strategy which involves adding noise to clipped model updates, trading off performance for strong theoretical privacy guarantees. Previous work has shown that the threat model of DP is conservative and that the obtained guarantees may be vacuous or may overestimate information leakage in practice. In this paper, we aim to achieve a tighter measurement of the model exposure by considering a realistic threat model. We propose a novel method, CANIFE, that uses canaries, samples carefully crafted by a strong adversary, to evaluate the empirical privacy of a training round. We apply this attack to vision models trained on CIFAR-10 and CelebA and to language models trained on Sent140 and Shakespeare. In particular, in realistic FL scenarios, we demonstrate that the empirical per-round epsilon obtained with CANIFE is 4-5x lower than the theoretical bound.

1. INTRODUCTION

Federated Learning (FL) has recently become a popular paradigm for training machine learning models across a large number of clients, each holding local data samples (McMahan et al., 2017a). The primary driver of FL's adoption by industry is its compatibility with the "privacy by design" principle, since the clients' raw data are not communicated to other parties during the training procedure (Kairouz et al., 2019; Huba et al., 2022; Xu et al., 2022). Instead, clients train the global model locally before sending back updates, which are aggregated by a central server. However, model updates, in their individual or aggregate form, leak information about the clients' local samples (Geiping et al., 2020; Gupta et al., 2022). Differential Privacy (DP) (Dwork et al., 2006; Abadi et al., 2016) is a standard mitigation to such privacy leakage. Its adaptation to the FL setting, DP-FEDAVG (McMahan et al., 2017b), provides user-level guarantees by adding Gaussian noise to the aggregated clipped model updates received by the server. In practice, training with strong privacy guarantees comes at the expense of model utility (Bassily et al., 2014; Kairouz et al., 2019), notwithstanding efforts to close this gap, either with public pre-training and partial model updates (Xu et al., 2022), accountants with better compositionality properties (Mironov, 2017), or DP variants such as DP-FTRL (Kairouz et al., 2021). Hence, it is common in practical deployments of DP-FL to train with a high privacy budget ε, resulting in loose privacy guarantees (Ramaswamy et al., 2020). Such large privacy budgets often provide vacuous guarantees on the information leakage, for instance against membership inference attacks (Mahloujifar et al., 2022). Encouragingly, recent work has shown that the information recovered in practice using state-of-the-art attacks is less than what theoretical bounds may allow (Nasr et al., 2021).
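The DP-FEDAVG server step described above (clip each client's update, aggregate, add Gaussian noise) can be sketched as follows. This is our own minimal simplification for illustration; function and variable names are ours, and real implementations differ in their averaging convention and privacy accounting:

```python
import numpy as np

def dp_fedavg_aggregate(updates, clip_norm, noise_multiplier, rng=None):
    """Clip each client update to L2 norm clip_norm, sum, add Gaussian noise,
    and average. Illustrative sketch only (names are our own)."""
    rng = np.random.default_rng(rng)
    clipped = []
    for u in updates:
        norm = np.linalg.norm(u)
        # Scale down any update whose norm exceeds the clipping threshold.
        clipped.append(u * min(1.0, clip_norm / max(norm, 1e-12)))
    total = np.sum(clipped, axis=0)
    # The noise scale is proportional to clip_norm, the L2 sensitivity of the
    # sum to adding or removing a single user's update.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(updates)
```

With `noise_multiplier = 0`, this reduces to plain federated averaging of clipped updates; increasing it trades utility for privacy, as discussed above.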
This suggests that DP is conservative and that a tighter measurement of the model exposure may be achieved by considering more realistic threat models. In this paper, we propose to complement DP-FL training with a novel attack method, CANaries In Federated Environments (CANIFE), to measure empirical privacy under a realistic threat model. We assume that a rogue client wants to reconstruct data samples from the model updates. To make its job easier, this adversary is allowed to craft an outlier training sample, the canary. The training round proceeds normally, after which the rogue client performs a statistical test to detect the canary in the global noisy model update released to the server by the secure aggregation protocol (see Figure 2). Finally, we translate the attack results into a per-round measure of empirical privacy (Jagielski et al., 2020; Nasr et al., 2021) and propose a method using amplification by subsampling to compute the empirical privacy incurred during training, as depicted in Figure 1 for standard FL benchmarks. Critically, our privacy attack is designed to approximate the worst-case data sample, not the worst-case update vector. The rogue client seeks to undermine the privacy guarantee by manipulating its input, which is consistent with FL environments using secure sandboxing to protect the integrity of the training process (Frey, 2021). We additionally model the server as the honest party, not allowing it to poison the global model in order to reconstruct training samples, in contrast with a recent line of work (Fowl et al., 2021; Boenisch et al., 2021; Wen et al., 2022; Fowl et al., 2022). In summary, our contributions are as follows:

• We propose CANIFE (Section 3), a novel and practical privacy attack on FL that injects crafted canary samples. It augments standard DP-FL training with a tight measure of the model's privacy exposure under a realistic yet conservative threat model.

• CANIFE is compatible with natural language and image modalities, is lightweight, and requires little representative data and computation to be effective. As a sanity check, we demonstrate that CANIFE tightly matches DP guarantees in a toy setup (Section 4.1) before exploring how it behaves in the federated setting (Section 4.2).

• Our work highlights the gap between the practical privacy leakage and the DP guarantees in various scenarios. For instance, on the CelebA benchmark, we obtain an empirical measure ε ≈ 6 for a model trained with a formal privacy guarantee of ε = 50.

Throughout, the privacy parameter ε is called the privacy budget; it determines an upper bound on the information an adversary can obtain from the output of an (ε, δ)-DP algorithm. The parameter δ defines the probability of failing to guarantee the differential privacy bound for any two adjacent datasets.
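Following Jagielski et al. (2020) and Nasr et al. (2021), the per-round empirical ε can be derived from the error rates of the canary-detection test. The helper below is a minimal sketch of this conversion via the (ε, δ)-DP hypothesis-testing inequalities; the naming is ours, and the paper's exact estimator (including confidence intervals and the subsampling amplification step) is more involved:

```python
import math

def empirical_epsilon(fpr, fnr, delta=0.0):
    """Lower bound on epsilon implied by a membership test's false positive
    rate (fpr) and false negative rate (fnr), both assumed strictly positive,
    via the (eps, delta)-DP hypothesis-testing inequalities:
        fpr + e^eps * fnr >= 1 - delta
        fnr + e^eps * fpr >= 1 - delta
    """
    eps1 = math.log((1.0 - delta - fpr) / fnr)
    eps2 = math.log((1.0 - delta - fnr) / fpr)
    return max(eps1, eps2, 0.0)
```

A chance-level attack (fpr = fnr = 0.5) yields ε = 0, while a more accurate test implies a larger lower bound on the leakage.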

2. BACKGROUND

In this work, we are interested in user-level differential privacy, which takes D and D′ to be adjacent if D′ can be formed by adding or removing all samples associated with a single user from D.



Figure 1: Empirical privacy measurements over the course of FL training for LEAF benchmarks Sent140, CelebA and Shakespeare with ε ∈ {10, 30, 50}. We observe a notable gap between the theoretical ε obtained with DP-FEDSGD and the empirical ε obtained with CANIFE.

Differential Privacy (DP) (Dwork et al., 2006; Dwork & Roth, 2014) defines a standard notion of privacy that guarantees the output of an algorithm does not depend significantly on a single sample or user.

Definition 1 (Differential Privacy). A randomised algorithm M : D → R satisfies (ε, δ)-differential privacy if for any two adjacent datasets D, D′ ∈ D and any subset of outputs S ⊆ R,

P(M(D) ∈ S) ≤ e^ε P(M(D′) ∈ S) + δ.
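As a concrete toy instance of this definition (our own illustration, not part of the paper), randomized response on a single bit satisfies (ε, 0)-DP: the ratio of output probabilities under the two adjacent inputs is exactly e^ε.

```python
import math
import random

def randomized_response(bit, eps, rng):
    """Report the true bit with probability e^eps / (1 + e^eps), else flip it.
    For any output s, P(M(b) = s) <= e^eps * P(M(b') = s), so the mechanism
    is (eps, 0)-DP in the sense of Definition 1."""
    p_true = math.exp(eps) / (1.0 + math.exp(eps))
    return bit if rng.random() < p_true else 1 - bit

# Empirically check the e^eps bound on the output distributions.
rng = random.Random(0)
eps = 1.0
n = 50_000
p1 = sum(randomized_response(1, eps, rng) for _ in range(n)) / n  # P(out=1 | b=1)
p0 = sum(randomized_response(0, eps, rng) for _ in range(n)) / n  # P(out=1 | b=0)
assert p1 / p0 <= math.exp(eps) * 1.1  # holds up to sampling noise
```

Larger ε makes truthful reporting more likely, i.e. the output depends more strongly on the input, mirroring the looser guarantee discussed above.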

