DIFFERENTIALLY PRIVATE LEARNING NEEDS BETTER FEATURES (OR MUCH MORE DATA)

Abstract

We demonstrate that differentially private machine learning has not yet reached its "AlexNet moment" on many canonical vision tasks: linear models trained on handcrafted features significantly outperform end-to-end deep neural networks for moderate privacy budgets. To exceed the performance of handcrafted features, we show that private learning requires either much more private data, or access to features learned on public data from a similar domain. Our work introduces simple yet strong baselines for differentially private learning that can inform the evaluation of future progress in this area.

1. INTRODUCTION

Machine learning (ML) models have been successfully applied to the analysis of sensitive user data such as medical images (Lundervold & Lundervold, 2019), text messages (Chen et al., 2019), or social media posts (Wu et al., 2016). Training these ML models under the framework of differential privacy (DP) (Dwork et al., 2006b; Chaudhuri et al., 2011; Shokri & Shmatikov, 2015; Abadi et al., 2016) can protect deployed classifiers against unintentional leakage of private training data (Shokri et al., 2017; Song et al., 2017; Carlini et al., 2019; 2020). Yet, training deep neural networks with strong DP guarantees comes at a significant cost in utility (Abadi et al., 2016; Yu et al., 2020; Bagdasaryan et al., 2019; Feldman, 2020). In fact, on many ML benchmarks the reported accuracy of private deep learning still falls short of "shallow" (non-private) techniques. For example, on CIFAR-10, Papernot et al. (2020b) train a neural network to 66.2% accuracy for a large DP budget of ε = 7.53, the highest accuracy we are aware of for this privacy budget. Yet, without privacy, higher accuracy is achievable with linear models and non-learned "handcrafted" features, e.g., (Coates & Ng, 2012; Oyallon & Mallat, 2015).

This leads to the central question of our work: Can differentially private learning benefit from handcrafted features? We answer this question affirmatively by introducing simple and strong handcrafted baselines for differentially private learning that significantly improve the privacy-utility guarantees on canonical vision benchmarks.

Our contributions. We leverage the Scattering Network (ScatterNet) of Oyallon & Mallat (2015), a non-learned SIFT-like feature extractor (Lowe, 1999), to train linear models that improve upon the privacy-utility guarantees of deep learning on MNIST, Fashion-MNIST and CIFAR-10 (see Table 1). For example, on CIFAR-10 we exceed the accuracy reported by Papernot et al. (2020b) while simultaneously improving the provable DP guarantee by 130×. On MNIST, we match the privacy-utility guarantees obtained with PATE (Papernot et al., 2018) without requiring access to any public data. We find that privately training deeper neural networks on handcrafted features also significantly improves over end-to-end deep learning, and even slightly exceeds the simpler linear models on CIFAR-10. Our results show that private deep learning remains outperformed by handcrafted priors on many tasks, and thus has yet to reach its "AlexNet moment" (Krizhevsky et al., 2012).

Table 1: Test accuracy of models with handcrafted ScatterNet features compared to prior results with end-to-end CNNs for various DP budgets (ε, δ = 10⁻⁵). Lower ε values provide stronger privacy. The end-to-end CNNs with maximal accuracy for each privacy budget are underlined. We select the best ScatterNet model for each DP budget ε ≤ 3 with a hyper-parameter search, and show the mean and standard deviation in accuracy over five runs.

We find that models with handcrafted features outperform end-to-end deep models, despite having more trainable parameters. This is counter-intuitive, as the guarantees of private learning degrade with dimensionality in the worst case (Bassily et al., 2014).¹ We explain the benefits of handcrafted features by analyzing the convergence rate of non-private gradient descent. First, we observe that with low enough learning rates, training converges similarly with or without privacy (both for models with and without handcrafted features). Second, we show that handcrafted features significantly boost the convergence rate of non-private learning at low learning rates. As a result, when training with privacy, handcrafted features lead to more accurate models for a fixed privacy budget.

Considering these results, we ask: what is the cost of private learning's "AlexNet moment"?
That is, which additional resources do we need in order to outperform our private handcrafted baselines?

Following McMahan et al. (2018), we first consider the data complexity of private end-to-end learning. On CIFAR-10, we use an additional 500,000 labeled Tiny Images from Carmon et al. (2019) to show that about an order of magnitude more private training data is needed for end-to-end deep models to outperform our handcrafted-features baselines. The high sample complexity of private deep learning could be detrimental for tasks that cannot leverage "internet-scale" data collection (e.g., most medical applications).

We further consider private learning with access to public data from a similar domain. In this setting, handcrafted features can be replaced by features learned from public data via transfer learning (Razavian et al., 2014). While differentially private transfer learning has been studied in prior work (Abadi et al., 2016; Papernot et al., 2020a), we find that its privacy-utility guarantees have been underestimated. We revisit these results and show that with transfer learning, strong privacy comes at only a minor cost in accuracy. For example, given public unlabeled ImageNet data, we train a CIFAR-10 model to 92.7% accuracy for a DP budget of ε = 2.

Our work demonstrates that higher-quality features, whether handcrafted or transferred from public data, are of paramount importance for improving the performance of private classifiers in low (private) data regimes. Code to reproduce our experiments is available at https://github.com/ftramer/Handcrafted-DP.
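The private training procedure underlying the results above is DP-SGD (Abadi et al., 2016): each per-example gradient is clipped to a fixed L2 norm and Gaussian noise is added to the summed gradient before the update. The following is a minimal NumPy sketch of one such step for logistic regression; the model, data, and hyper-parameters are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

def dp_sgd_step(w, X, y, lr=0.1, clip_norm=1.0, noise_mult=1.1, rng=None):
    """One DP-SGD step for binary logistic regression on a batch (X, y).

    Per-example gradients are clipped to L2 norm `clip_norm`, summed, and
    perturbed with Gaussian noise of std `noise_mult * clip_norm`, following
    Abadi et al. (2016). Hyper-parameter values here are illustrative only.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    preds = 1.0 / (1.0 + np.exp(-(X @ w)))        # sigmoid predictions
    per_ex_grads = (preds - y)[:, None] * X        # per-example gradients, shape (n, d)
    norms = np.linalg.norm(per_ex_grads, axis=1, keepdims=True)
    # Clip each example's gradient to L2 norm at most `clip_norm`.
    per_ex_grads = per_ex_grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    # Sum clipped gradients and add calibrated Gaussian noise.
    noisy_sum = per_ex_grads.sum(axis=0) + rng.normal(
        scale=noise_mult * clip_norm, size=w.shape)
    return w - lr * noisy_sum / len(X)             # update with the noisy average

# Toy usage: privately separate two Gaussian blobs in 2D.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (100, 2)), rng.normal(1, 1, (100, 2))])
y = np.concatenate([np.zeros(100), np.ones(100)])
w = np.zeros(2)
for _ in range(200):
    w = dp_sgd_step(w, X, y, rng=rng)
acc = ((1.0 / (1.0 + np.exp(-(X @ w))) > 0.5) == y).mean()
```

The clipping bound is what makes the noise calibration valid: replacing one training example can change the summed gradient by at most `clip_norm` in L2 norm, regardless of the data.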

2. STRONG SHALLOW BASELINES FOR DIFFERENTIALLY PRIVATE LEARNING

We consider the standard central model of differential privacy (DP): a trusted party trains an ML model f on a private dataset D ∈ 𝒟, and publicly releases the model. The learning algorithm A
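For reference, the standard (ε, δ)-DP guarantee (Dwork et al., 2006b) requires that for any two neighboring datasets D, D′ (differing in a single example) and any set S of possible output models:

```latex
\Pr[A(D) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[A(D') \in S] \,+\, \delta
```

Smaller ε and δ thus mean that the distribution over released models is nearly insensitive to any single training example; the budgets in Table 1 fix δ = 10⁻⁵ and vary ε.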



¹ A number of recent works have attempted to circumvent this worst-case dimensionality dependence by leveraging the empirical observation that model gradients lie in a low-dimensional subspace (Kairouz et al., 2020; Zhou et al., 2020b).

