SELF-SUPERVISED PRETRAINING FOR DIFFERENTIALLY PRIVATE LEARNING

Abstract

We demonstrate that self-supervised pretraining (SSP) is a scalable solution to deep learning with differential privacy (DP) in image classification, regardless of the size of the available public datasets. When no public dataset is available, we show that the features generated by SSP on only a single image enable a private classifier to obtain much better utility than non-learned handcrafted features under the same privacy budget. When a moderate or large public dataset is available, the features produced by SSP greatly outperform features trained with labels on various complex private datasets under the same privacy budget. We also compare multiple DP-enabled training frameworks for training a private classifier on the features generated by SSP.

1. INTRODUCTION

Machine learning (ML) is applied ubiquitously to the analysis of sensitive data such as medical images (Tajbakhsh et al., 2016), financial records (Fischer & Krauss, 2018), and social media channels (Agrawal & Awekar, 2018). Many attacks (Shokri et al., 2017; Carlini et al., 2021) have been developed that successfully extract meaningful training data from standard ML models. Under recent governmental regulations, e.g., GDPR and CCPA, ML models have to protect sensitive training data. Differential privacy (DP) (Chaudhuri et al., 2011; Bu et al., 2020; Abadi et al., 2016) has emerged as an effective framework for training models resilient to leakage of private training data. Unfortunately, training models with strong DP guarantees significantly hurts model utility (i.e., accuracy) (Papernot et al., 2018; Abadi et al., 2016). Although non-learned handcrafted features such as ScatterNet (Oyallon & Mallat, 2015; Oyallon et al., 2018) allow a private linear model (Tramer & Boneh, 2021) to achieve the state-of-the-art (SOTA) utility of < 70% under the privacy budget (ϵ ≤ 3, δ = 10⁻⁵) on a private CIFAR-10 dataset, it is difficult to learn better features in the DP domain, since the clipped and perturbed gradients during DP training provide only a noisy estimate of the update direction. In contrast, pretrained features learned from large labeled public datasets (Luo et al., 2021) can greatly narrow the utility gap between private and non-private models. However, sometimes no public dataset is available for training a feature extractor due to legal or ethical constraints (Flanders, 2009).

In this paper, we aim to demonstrate that self-supervised pretraining (SSP) is a scalable solution to improving the utility of deep learning with DP in image classification, regardless of the size of the available public datasets. Any update to the learnable parameters of a differentially private model increases the privacy cost.
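To make the source of this noisy update direction concrete, the per-step clipping and noising at the heart of DPSGD (Abadi et al., 2016) can be sketched as below. This is a minimal NumPy sketch, not code from any DP library; the function name and signature are illustrative.

```python
import numpy as np

def dpsgd_step(per_example_grads, params, clip_norm, noise_multiplier, lr, rng):
    """One illustrative DPSGD update: clip each per-example gradient to
    L2 norm `clip_norm`, average over the batch, add Gaussian noise with
    standard deviation `noise_multiplier * clip_norm / batch_size`, and
    take a gradient step."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale down (never up) so every per-example contribution is bounded.
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    batch_size = len(clipped)
    avg = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm / batch_size,
                       size=avg.shape)
    return params - lr * (avg + noise)
```

With a nonzero `noise_multiplier`, the model only ever sees this perturbed average, which is why learning rich features from scratch under DP is hard.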
It is easier to achieve both high utility and small privacy loss via the features generated by a well-trained feature extractor that fully exploits SOTA network architectures and public datasets. Even when no large public dataset is available, we show that a feature extractor built upon the data-efficient HarmonicNet (Ulicny et al., 2019) and trained by self-supervised SimCLRv2 (Chen et al., 2020) on only a single image (YM. et al., 2020) enables a private linear classifier to obtain much better utility than non-learned handcrafted features (Tramer & Boneh, 2021) under the same privacy budget. With a larger public dataset, the features generated by SSP substantially outperform features trained with labels on various complex private datasets, as shown in Table 1. To better explore the trade-off between utility and privacy, we compare SOTA DP-enabled training frameworks, i.e., DP stochastic gradient descent (DPSGD) (Abadi et al., 2016), DP direct feedback alignment (DPDFA) (Ohana et al., 2021; Lee & Kifer, 2020), DP stochastic gradient Langevin dynamics (DPSGLD) (Bu et al., 2021), and Private Aggregation of Teacher Ensembles (PATE).
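The two-stage pipeline described above — a frozen, publicly pretrained feature extractor followed by a privately trained linear head — can be sketched as follows. Only the linear head consumes privacy budget, since the encoder is trained on public data and its parameters are never updated on private gradients. This is a simplified NumPy sketch assuming precomputed features; the function name and hyperparameters are illustrative and not taken from any of the cited frameworks.

```python
import numpy as np

def train_private_linear_head(features, labels, n_classes, epochs=1,
                              clip_norm=1.0, noise_multiplier=1.0,
                              lr=0.1, batch_size=32, seed=0):
    """Train a softmax linear classifier with DPSGD-style updates on
    features produced by a frozen, publicly pretrained encoder.
    `features` has shape (n_samples, feature_dim)."""
    rng = np.random.default_rng(seed)
    n, d = features.shape
    W = np.zeros((d, n_classes))
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            batch = order[start:start + batch_size]
            summed = np.zeros_like(W)
            for i in batch:
                x, y = features[i], labels[i]
                logits = x @ W
                p = np.exp(logits - logits.max())
                p /= p.sum()
                p[y] -= 1.0                      # softmax cross-entropy gradient
                g = np.outer(x, p)
                norm = np.linalg.norm(g)
                summed += g * min(1.0, clip_norm / (norm + 1e-12))
            noise = rng.normal(0.0, noise_multiplier * clip_norm, size=W.shape)
            W -= lr * (summed + noise) / len(batch)
    return W
```

Because the linear head has far fewer parameters than a full network and needs fewer update steps, it tolerates the clipping and noise much better, which is the mechanism behind the utility gains reported for SSP features.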

