DENOISING MASKED AUTOENCODERS HELP ROBUST CLASSIFICATION

Abstract

In this paper, we propose a new self-supervised method, Denoising Masked AutoEncoders (DMAE), for learning certified robust image classifiers. In DMAE, we corrupt each image by adding Gaussian noise to each pixel value and randomly masking several patches. A Transformer-based encoder-decoder model is then trained to reconstruct the original image from the corrupted one. Under this learning paradigm, the encoder learns to capture semantics relevant to downstream tasks while also being robust to additive Gaussian noise. We show that the pre-trained encoder can naturally be used as the base classifier in Gaussian smoothed models, for which the certified radius of any data point can be computed analytically. Although the proposed method is simple, it yields significant performance improvements in downstream classification tasks. We show that the DMAE ViT-Base model, which uses just 1/10 of the parameters of the model developed in recent work (Carlini et al., 2022), achieves competitive or better certified accuracy in various settings. The DMAE ViT-Large model significantly surpasses all previous results, establishing a new state of the art on the ImageNet dataset. We further demonstrate that the pre-trained model transfers well to the CIFAR-10 dataset, suggesting its wide adaptability. Models and code are available at https://github.com/quanlin-wu/dmae.

Salman et al. (2020) and Carlini et al. (2022) took the first step to train Gaussian smoothed classifiers with the help of self-supervised learning. Both approaches use a compositional model architecture for f and decompose the prediction process into two stages. In the first stage, a denoising

1. INTRODUCTION

Deep neural networks have demonstrated remarkable performance in many real applications (He et al., 2016; Devlin et al., 2019; Silver et al., 2016). At the same time, however, several works have observed that the learned models are vulnerable to adversarial attacks (Szegedy et al., 2013; Biggio et al., 2013). Taking image classification as an example, given an image x that a neural network correctly classifies with label y, an adversary can find a small perturbation such that the perturbed image, though visually indistinguishable from the original, is assigned to a wrong class with high confidence by the model. This problem raises significant challenges in practical scenarios.

Given such a critical issue, researchers seek to learn classifiers that can provably resist adversarial attacks, which is usually referred to as certified defense. One of the seminal approaches in this direction is the Gaussian smoothed model. A Gaussian smoothed model g is defined as $g(x) = \mathbb{E}_{\eta}[f(x + \eta)]$, in which $\eta \sim \mathcal{N}(0, \sigma^2 I)$ and f is an arbitrary classifier, e.g., a neural network. Intuitively, the smoothed classifier g can be viewed as an ensemble of the predictions of f on noise-corrupted images x + η. Cohen et al. (2019) derived how to analytically compute the certified radius of the smoothed classifier g, and follow-up works improved the training methods of the Gaussian smoothed model with labeled data (Salman et al., 2019; Zhai et al., 2021; Jeong & Shin, 2020; Horváth et al., 2022; Jeong et al., 2021).
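To make the definition of the smoothed classifier concrete, the following is a minimal sketch of how g can be approximated by Monte Carlo sampling and how an ℓ2 radius can be derived from it, using the bound R = σ Φ⁻¹(p_lower) from Cohen et al. (2019), where p_lower is a lower confidence bound on the probability of the top class. The function name, its parameters, and the toy base classifier are hypothetical and chosen for illustration only. For simplicity, the sketch uses a Hoeffding bound in place of the Clopper-Pearson interval used by Cohen et al. (2019), which gives a valid but more conservative radius.

```python
import math
import random
from statistics import NormalDist


def smoothed_predict_and_certify(f, x, sigma, num_classes,
                                 n0=100, n=1000, alpha=0.001, seed=0):
    """Approximate g(x) = E[f(x + eta)], eta ~ N(0, sigma^2 I), by sampling.

    f is a hypothetical base classifier mapping a list of floats to a class
    index. Returns (predicted_class, l2_radius), or (None, 0.0) to abstain.
    """
    rng = random.Random(seed)

    def sample_counts(num_samples):
        counts = [0] * num_classes
        for _ in range(num_samples):
            noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
            counts[f(noisy)] += 1
        return counts

    # Stage 1: pick a candidate class from a small batch of noisy samples.
    selection = sample_counts(n0)
    top = max(range(num_classes), key=selection.__getitem__)

    # Stage 2: estimate the candidate's probability on fresh samples.
    p_hat = sample_counts(n)[top] / n

    # One-sided Hoeffding lower bound, holding with probability >= 1 - alpha.
    # (Assumption: swapped in for the Clopper-Pearson bound of Cohen et al.)
    p_lower = p_hat - math.sqrt(math.log(1.0 / alpha) / (2.0 * n))
    if p_lower <= 0.5:
        return None, 0.0  # not confident enough to certify

    # Certified l2 radius: sigma * Phi^{-1}(p_lower).
    radius = sigma * NormalDist().inv_cdf(p_lower)
    return top, radius


# Toy usage: a linear base classifier whose decision boundary is x[0] = 0.
toy_classifier = lambda z: 1 if z[0] > 0 else 0
pred, radius = smoothed_predict_and_certify(
    toy_classifier, [3.0, 0.0], sigma=0.5, num_classes=2)
```

In this toy example the input lies at distance 3 from the decision boundary, so any certified radius returned must be smaller than 3. The sampling cost grows with n, which is why the choice of base classifier f matters in practice.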

