DENOISING MASKED AUTOENCODERS HELP ROBUST CLASSIFICATION

Abstract

In this paper, we propose a new self-supervised method, called Denoising Masked AutoEncoders (DMAE), for learning certifiably robust image classifiers. In DMAE, we corrupt each image by adding Gaussian noise to each pixel value and randomly masking several patches. A Transformer-based encoder-decoder model is then trained to reconstruct the original image from the corrupted one. In this learning paradigm, the encoder learns to capture semantics relevant to downstream tasks while also being robust to additive Gaussian noise. We show that the pre-trained encoder can naturally be used as the base classifier in Gaussian smoothed models, for which the certified radius of any data point can be computed analytically. Although the proposed method is simple, it yields significant performance improvements in downstream classification tasks. We show that the DMAE ViT-Base model, which uses only 1/10 of the parameters of the model developed in recent work (Carlini et al., 2022), achieves competitive or better certified accuracy in various settings. The DMAE ViT-Large model significantly surpasses all previous results, establishing a new state of the art on the ImageNet dataset. We further demonstrate that the pre-trained model transfers well to the CIFAR-10 dataset, suggesting its wide adaptability. Models and code are available at https://github.

1. INTRODUCTION

Deep neural networks have demonstrated remarkable performance in many real applications (He et al., 2016; Devlin et al., 2019; Silver et al., 2016). At the same time, however, several works have observed that the learned models are vulnerable to adversarial attacks (Szegedy et al., 2013; Biggio et al., 2013). Taking image classification as an example, given an image x that is correctly classified to label y by a neural network, an adversary can find a small perturbation such that the perturbed image, though visually indistinguishable from the original one, is predicted by the model into a wrong class with high confidence. This raises significant challenges in practical scenarios. Given such a critical issue, researchers seek to learn classifiers that can provably resist adversarial attacks, which is usually referred to as certified defense. One of the seminal approaches in this direction is the Gaussian smoothed model. A Gaussian smoothed model g is defined as g(x) = E_η[f(x + η)], in which η ∼ N(0, σ²I) and f is an arbitrary classifier, e.g., a neural network. Intuitively, the smoothed classifier g can be viewed as an ensemble of the predictions of f on noise-corrupted inputs x + η. Cohen et al. (2019) derived how to analytically compute the certified radius of the smoothed classifier g, and follow-up works improved the training methods of the Gaussian smoothed model with labeled data (Salman et al., 2019; Zhai et al., 2021; Jeong & Shin, 2020; Horváth et al., 2022; Jeong et al., 2021). Recently, Salman et al. (2020) and Carlini et al. (2022) took the first steps toward training Gaussian smoothed classifiers with the help of self-supervised learning. Both approaches use a compositional model architecture for f and decompose the prediction process into two stages. In the first stage, a denoising model is used to purify the noise-corrupted inputs. In the second stage, a classifier is applied to predict the label from the denoised image. Since the first-stage denoising model and the second-

Figure 1: An illustration of the DMAE pre-training task. Given an unlabeled image, we corrupt it by adding Gaussian noise to each pixel value and randomly masking several patches. The goal of the task is to train a model to reconstruct the clean image from the corrupted one.

Similar to MAE, DMAE also intends to reconstruct the masked information; hence, it can capture relevant features of the image for downstream tasks. Furthermore, DMAE takes noisy patches as inputs and outputs denoised ones, making the learned features robust to additive noise. We expect the semantics and the robustness of the representation to be learned simultaneously, enabling efficient utilization of the model parameters. Although the proposed DMAE method is simple, it yields significant performance improvements on downstream tasks. We pre-train DMAE ViT-Base and DMAE ViT-Large, use the encoder to initialize the Gaussian smoothed classifier, and fine-tune the parameters on ImageNet. We show that the DMAE ViT-Base model with 87M parameters, one-tenth as many as the model used in Carlini et al. (2022), achieves competitive or better certified accuracy in various settings. Furthermore, the DMAE ViT-Large model (304M parameters) significantly surpasses the state-of-the-art results in all tasks, demonstrating that a single-stage model suffices to learn robust representations given a proper self-supervised task. We also demonstrate that the pre-trained model has good transferability to other datasets: a decent improvement can be obtained when applying it to the CIFAR-10 dataset. Model checkpoints will be released in the future.

2. RELATED WORK

Szegedy et al. (2013); Biggio et al. (2013) observed that standardly trained neural networks are vulnerable to adversarial attacks. Since then, many works have investigated how to improve the robustness of the trained model. One of the most successful methods is adversarial training, which adds adversarial examples to the training set to make the learned model robust against such attacks (Madry et al., 2018; Zhang et al., 2019). However, as the generation process of adversarial examples is pre-

Table 1: Certified accuracy (top-1) of different models on ImageNet. Following Carlini et al. (2022), for each noise level σ, we select the best certified accuracy from the original papers. ** denotes the best result and * the second best at each ℓ2 radius. † Carlini et al. (2022) use a diffusion model with 552M parameters and a BEiT-Large model with 305M parameters. Our DMAE ViT-B/ViT-L models achieve the best performance in most of the settings.
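The certification procedure of Cohen et al. (2019) referenced above can be made concrete. The sketch below is a minimal, illustrative implementation, not the authors' released code: it estimates the top-class probability of a base classifier f under Gaussian input noise by Monte Carlo sampling and converts a lower confidence bound p_A on that probability into the certified ℓ2 radius R = σ·Φ⁻¹(p_A). The function name and the normal-approximation confidence bound are our own simplifications (Cohen et al. use an exact Clopper-Pearson bound and separate selection/estimation samples).

```python
import numpy as np
from scipy.stats import norm


def certify(base_classifier, x, sigma=0.25, n=1000, alpha=0.001, num_classes=10):
    """Monte Carlo sketch of randomized-smoothing certification.

    base_classifier: callable mapping a batch of inputs to integer labels.
    Returns (predicted_class, certified_l2_radius); radius 0.0 means abstain.
    """
    rng = np.random.default_rng(0)
    # Evaluate f on n noise-corrupted copies of x, with eta ~ N(0, sigma^2 I).
    noise = rng.normal(scale=sigma, size=(n,) + x.shape)
    labels = np.asarray(base_classifier(x[None] + noise))
    counts = np.bincount(labels, minlength=num_classes)
    top = int(counts.argmax())

    # One-sided lower confidence bound on p_A = P[f(x + eta) = top class].
    # (Normal approximation; Cohen et al. use an exact Clopper-Pearson bound.)
    p_hat = counts[top] / n
    p_lower = p_hat - norm.ppf(1.0 - alpha) * np.sqrt(p_hat * (1.0 - p_hat) / n)
    p_lower = min(p_lower, 1.0 - 1e-4)  # keep Phi^{-1} finite
    if p_lower <= 0.5:
        return top, 0.0  # cannot certify at this noise level
    # Certified l2 radius of the smoothed classifier: R = sigma * Phi^{-1}(p_A).
    return top, float(sigma * norm.ppf(p_lower))
```

Note that with σ = 0.25 and p_A clamped near 1, the radius approaches 0.25·Φ⁻¹(0.9999) ≈ 0.93; larger σ certifies larger radii at the cost of clean accuracy.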
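The DMAE corruption illustrated in Figure 1 is likewise simple to sketch. The snippet below is an illustrative reconstruction of the pre-training input corruption, not the paper's exact configuration: the patch size, mask ratio, and zero-filling of masked patches are assumptions on our part (MAE-style implementations typically drop masked patches from the encoder input rather than zeroing them).

```python
import numpy as np


def dmae_corrupt(img, sigma=0.25, patch=4, mask_ratio=0.75, seed=0):
    """Corrupt an image DMAE-style: add Gaussian noise, then mask patches.

    img: float array of shape (H, W, C) with H and W divisible by `patch`.
    Returns (corrupted_image, boolean_mask_over_the_patch_grid).
    """
    rng = np.random.default_rng(seed)
    h, w, _ = img.shape
    gh, gw = h // patch, w // patch

    # Step 1: add i.i.d. Gaussian noise to every pixel value.
    noisy = img + rng.normal(scale=sigma, size=img.shape)

    # Step 2: randomly mask a fraction of the non-overlapping patches.
    num_patches = gh * gw
    num_masked = int(round(mask_ratio * num_patches))
    masked_ids = rng.choice(num_patches, size=num_masked, replace=False)
    mask = np.zeros(num_patches, dtype=bool)
    mask[masked_ids] = True

    corrupted = noisy.copy()
    for pid in np.flatnonzero(mask):
        r, c = divmod(pid, gw)
        corrupted[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch, :] = 0.0
    return corrupted, mask.reshape(gh, gw)
```

The reconstruction target is the clean image, so the model must jointly denoise the visible patches and inpaint the masked ones, which is the intuition behind learning semantic and noise-robust features simultaneously.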

