A CLOSER LOOK AT THE CALIBRATION OF DIFFERENTIALLY PRIVATE LEARNERS

Abstract

We systematically study the calibration of classifiers trained with differentially private stochastic gradient descent (DP-SGD) and observe miscalibration across a wide range of vision and language tasks. Our analysis identifies per-example gradient clipping in DP-SGD as a major cause of miscalibration, and we show that existing approaches for improving calibration with differential privacy only provide marginal improvements in calibration error while occasionally causing large degradations in accuracy. As a solution, we show that differentially private variants of post-processing calibration methods such as temperature scaling and Platt scaling are surprisingly effective and have negligible utility cost to the overall model. Across 7 tasks, temperature scaling and Platt scaling with DP-SGD yield an average 3.1-fold reduction in in-domain expected calibration error while incurring at most a minor drop in accuracy.

1. INTRODUCTION

Modern deep learning models tend to memorize their training data in order to generalize better (Zhang et al., 2021; Feldman, 2020), posing serious privacy challenges in the form of training data leakage and membership inference attacks (Shokri et al., 2017; Hayes et al., 2017; Carlini et al., 2021). To address these concerns, differential privacy (DP) has become a popular paradigm for providing rigorous privacy guarantees when performing data analysis and statistical modeling on private data. In practice, a commonly used DP algorithm for training machine learning (ML) models is DP-SGD (Abadi et al., 2016), which clips per-example gradients and injects noise into parameter updates during optimization. Although DP-SGD can give strong privacy guarantees, prior work has identified that this privacy comes at a cost to other aspects of trustworthy ML, such as degraded accuracy and disparate impact (Bagdasaryan et al., 2019; Feldman, 2020; Sanyal et al., 2022). These tradeoffs pose a challenge for privacy-preserving ML, as they force practitioners to make difficult decisions about how to weigh privacy against other key aspects of trustworthiness. In this work, we expand the study of privacy-related tradeoffs by characterizing, and proposing mitigations for, the privacy-calibration tradeoff. This tradeoff is significant because assessing model uncertainty is important for deploying models in safety-critical domains like healthcare and law, where explainability (Cosmides & Tooby, 1996) and risk control (Van Calster et al., 2019) are needed in addition to privacy (Knolle et al., 2021). The existence of such a tradeoff may be surprising, as we might expect differentially private training to improve calibration by preventing models from memorizing training examples and promoting generalization (Dwork et al., 2015; Bassily et al., 2016; Kulynych et al., 2022).
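To make the clipping-and-noising mechanism concrete, the following is a minimal NumPy sketch of a single DP-SGD update. The function name and the default learning rate, clipping norm, and noise multiplier are illustrative choices for exposition, not values used in our experiments:

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0, noise_multiplier=1.0):
    """One DP-SGD update: clip each example's gradient to L2 norm at most
    `clip_norm`, sum the clipped gradients, add Gaussian noise with standard
    deviation `noise_multiplier * clip_norm`, then average and step."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale down gradients whose norm exceeds clip_norm; leave others unchanged.
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    total = np.sum(clipped, axis=0)
    noise = np.random.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    noisy_mean = (total + noise) / len(per_example_grads)
    return params - lr * noisy_mean
```

Note that clipping biases the average update whenever individual gradients exceed the threshold, which is the mechanism our analysis later implicates in miscalibration.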
Moreover, models built on modern pre-trained architectures show a strong positive correlation between calibration error and classification error (Minderer et al., 2021), and differentially private training based on pre-trained models is increasingly performant (Tramer & Boneh, 2021; Li et al., 2022b; De et al., 2022). However, we find that DP training has the surprising effect of consistently producing over-confident prediction scores in practice (Bu et al., 2021). We show an example of this phenomenon in a simple 2D logistic regression problem (Fig. 1). We find a polarization phenomenon, where the DP-trained model achieves similar accuracy to its non-private counterpart, but its confidences are clustered around either 0 or 1. As we will see later, the polarization insight conveyed by this motivating example transfers to more realistic settings. Our first contribution quantifies existing privacy-calibration tradeoffs for state-of-the-art models that leverage DP training and pre-trained backbones such as RoBERTa (Liu et al., 2019b) and vision transformers (ViT) (Dosovitskiy et al., 2020). Although there have been some studies of miscalibration for differentially private learning (Bu et al., 2021; Knolle et al., 2021), they focus on simple tasks (e.g., MNIST, SNLI) with relatively small neural networks trained from scratch. Our work shows that miscalibration problems persist even for state-of-the-art private models with accuracies approaching or matching their non-private counterparts. Through controlled experiments, we show that these calibration errors are unlikely to be due solely to the regularization effects of DP-SGD, and are more likely caused by the per-example gradient clipping operation in DP-SGD. Our second contribution shows that the privacy-calibration tradeoff can be easily addressed through differentially private variants of temperature scaling (DP-TS) and Platt scaling (DP-PS).
To enable these modifications, we provide a simple privacy accounting analysis, showing that DP-SGD-based recalibration on a held-out split incurs no additional privacy cost. Through extensive experiments, we show that DP-TS and DP-PS effectively prevent DP-trained models from being overconfident and give an average 3.1-fold reduction in in-domain calibration error, substantially outperforming more complex interventions that have been claimed to improve calibration (Bu et al., 2021; Knolle et al., 2021).
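As an illustration of the recalibration step, below is a minimal NumPy sketch of DP-TS that fits a single log-temperature on held-out logits with full-batch noisy gradient descent. The function name and hyperparameters are hypothetical and this is not the exact implementation used in our experiments; note that for a scalar parameter, per-example L2 clipping reduces to clamping:

```python
import numpy as np

def dp_temperature_scaling(logits, labels, steps=300, lr=0.1,
                           clip_norm=1.0, noise_multiplier=1.0, seed=0):
    """Fit a log-temperature t (with T = exp(t)) by minimizing the negative
    log-likelihood of held-out (logits, labels) using DP-SGD-style updates:
    clip each example's scalar gradient, sum, add Gaussian noise, average."""
    rng = np.random.default_rng(seed)
    n, k = logits.shape
    onehot = np.eye(k)[labels]
    t = 0.0  # start at T = exp(0) = 1, i.e., no rescaling
    for _ in range(steps):
        T = np.exp(t)
        s = logits / T
        s = s - s.max(axis=1, keepdims=True)  # numerically stable softmax
        p = np.exp(s) / np.exp(s).sum(axis=1, keepdims=True)
        # Per-example gradient of the NLL w.r.t. t: -((p_i - y_i) . z_i) / T
        per_ex = -np.sum((p - onehot) * logits, axis=1) / T
        # Scalar L2 clipping is just clamping to [-clip_norm, clip_norm].
        clipped = np.clip(per_ex, -clip_norm, clip_norm)
        noisy = clipped.sum() + rng.normal(0.0, noise_multiplier * clip_norm)
        t -= lr * noisy / n
    return np.exp(t)
```

An overconfident model yields a fitted temperature T > 1, which flattens the softmax at prediction time; Platt scaling would instead fit an affine map of the logits by the same noisy-gradient recipe.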

2. RELATED WORK

Differentially Private Deep Learning. DP-SGD (Song et al., 2013; Abadi et al., 2016) is a popular algorithm for training deep learning models with DP. Recent works have shown that fine-tuning high-quality pre-trained models with DP-SGD results in good downstream performance (Tramer & Boneh, 2021; Li et al., 2022b; De et al., 2022; Li et al., 2022a). Existing works have studied how ensuring differential privacy through mechanisms such as DP-SGD leads to tradeoffs with other properties, such as accuracy (Feldman, 2020) and fairness (Bagdasaryan et al., 2019; Tran et al., 2021; Sanyal et al., 2022; Esipova et al., 2022), measured by the disparity in accuracies across groups. Our miscalibration findings are closely related to the above privacy-fairness tradeoff, which has already received substantial attention. For example, per-example gradient clipping is shown to exacerbate accuracy disparity (Tran et al., 2021; Esipova et al., 2022). Some fairness notions also require calibrated predictions, such as calibration over demographic groups (Pleiss et al., 2017; Liu et al., 2019a) or a rich class of structured "identifiable" subpopulations (Hébert-Johnson et al., 2018; Kim et al., 2019). Our work expands the understanding of tradeoffs between privacy and other aspects of trustworthiness by characterizing privacy-calibration tradeoffs.

Calibration. Calibrated probability estimates match the true empirical frequencies of an outcome, and calibration is often used to evaluate the quality of uncertainty estimates provided by ML models. Recent works have observed that highly accurate models that leverage pre-training are often well-calibrated (Hendrycks et al., 2019; Desai & Durrett, 2020; Minderer et al., 2021; Kadavath et al., 2022). However, we find that even pre-trained models are poorly calibrated when they are fine-tuned using DP-SGD. Our work is not the first to study calibration under learning with DP, but we provide



Figure 1: DP-SGD gives rise to miscalibration for logistic regression. (a) Logistic regression model (blue line) with ϵ = 8 trained on Gaussian data {(x_i, y_i)}_{i=1}^n, where (x, y) ∈ R^2 × {1, −1}, y is Rademacher distributed, and (x − b) | y ∼ N(0, I_{2×2}) with b = (1.5, 0) if y = 1 and b = (0, 1.5) otherwise. (b) Reliability diagrams and confidence histograms. Left: the DP-SGD-trained classifier shows poor calibration, with a large concentration of extreme confidence values. Right: the baseline, a standard non-private logistic regression model trained with SGD, is much better calibrated.
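For reference, the binned expected calibration error (ECE) underlying reliability diagrams like those in Figure 1 can be sketched as follows; the function name and the default of 10 equal-width bins are illustrative conventions, not a prescription:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: partition predictions into equal-width confidence bins, then take
    the average |accuracy - mean confidence| per bin, weighted by bin mass."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    n = len(confidences)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.sum() / n * abs(correct[mask].mean() - confidences[mask].mean())
    return ece
```

A well-calibrated model scores near zero; the polarized DP-SGD model in Figure 1 scores high because its near-0/near-1 confidences do not match its empirical accuracy.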

