PART-BASED MODELS IMPROVE ADVERSARIAL ROBUSTNESS

Abstract

We show that combining human prior knowledge with end-to-end learning can improve the robustness of deep neural networks by introducing a part-based model for object classification. We believe that the richer form of annotation helps guide neural networks to learn more robust features without requiring more samples or larger models. Our model combines a part segmentation model with a tiny classifier and is trained end-to-end to simultaneously segment objects into parts and then classify the segmented object. Empirically, our part-based models achieve both higher accuracy and higher adversarial robustness than a ResNet-50 baseline on all three datasets. For instance, the clean accuracy of our part models is up to 15 percentage points higher than the baseline's, given the same level of robustness. Our experiments indicate that these models also reduce texture bias and yield better robustness against common corruptions and spurious correlations. The code is publicly available at https://github.com/chawins/adv-part-model.

1. INTRODUCTION

As machine learning models are increasingly deployed in security- or safety-critical settings, robustness becomes an essential property. Adversarial training (Madry et al., 2018) is the state-of-the-art method for improving the adversarial robustness of deep neural networks. Recent work has made substantial progress in robustness by scaling adversarial training to very large datasets. For instance, some defenses rely on aggressive data augmentation (Rebuffi et al., 2021), while others utilize a large quantity of extra data (Carmon et al., 2019) or even larger models (Gowal et al., 2021a). These works fall in line with a recent trend in deep learning toward "scaling up," i.e., training large models on massive datasets (Kaplan et al., 2020). Unfortunately, progress has begun to stagnate here as we have reached a point of diminishing returns: for example, Gowal et al. (2021a) show that an exponential increase in model size and training samples yields only a linear increase in robustness.

Our work presents a novel alternative to improve adversarial training: we propose to utilize additional supervision that allows for a richer learning signal. We hypothesize that an auxiliary human-aligned learning signal will guide the model to learn more robust and more generalized features.

To demonstrate this idea, we propose to classify images with a part-based model that makes predictions by recognizing the parts of the object in a bottom-up manner. We make use of images that are annotated with part segmentation masks. We propose a simple two-stage model that combines a segmentation model with a classifier. An image is first fed into the segmenter, which outputs a pixel-wise segmentation of the object parts in the input; this mask is then passed to a tiny classifier, which predicts the class label based solely on the segmentation mask. The entire part-based model is trained end-to-end with a combination of segmentation and classification losses. Fig. 1 illustrates our model.
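The two-stage pipeline described above can be illustrated with a minimal NumPy sketch of the forward pass and the combined loss. This is not the released implementation: the per-pixel linear segmenter `w_seg`, the pooled linear classifier `w_cls`, the `seg_weight` mixing constant, and all tensor sizes are hypothetical stand-ins chosen only to show how the classifier sees the part mask rather than the raw pixels. In practice both stages would be deep networks trained with an autodiff framework; only the forward computation is shown here.

```python
import numpy as np


def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cross_entropy(probs, target, axis):
    # Mean negative log-likelihood of integer targets along `axis`.
    picked = np.take_along_axis(probs, np.expand_dims(target, axis), axis=axis)
    return float(-np.log(picked + 1e-12).mean())


def part_model_forward(image, w_seg, w_cls):
    """image: (H, W, C); w_seg: (C, P) toy segmenter; w_cls: (P, K) tiny classifier."""
    seg_logits = image @ w_seg                # (H, W, P) per-pixel part logits
    seg_probs = softmax(seg_logits, axis=-1)  # soft part segmentation mask
    # The classifier receives only the mask, never the raw pixels.
    pooled = seg_probs.mean(axis=(0, 1))      # (P,) average mask value per part
    cls_probs = softmax(pooled @ w_cls, axis=-1)  # (K,) class probabilities
    return seg_probs, cls_probs


def combined_loss(seg_probs, cls_probs, part_mask, label, seg_weight=0.5):
    # Assumed form: a convex combination of the two cross-entropy terms.
    seg_loss = cross_entropy(seg_probs, part_mask, axis=-1)
    cls_loss = cross_entropy(cls_probs[None, :], np.array([label]), axis=-1)
    return (1.0 - seg_weight) * cls_loss + seg_weight * seg_loss


# Toy usage with random data (all sizes are illustrative).
rng = np.random.default_rng(0)
H, W, C, P, K = 8, 8, 3, 5, 4
image = rng.random((H, W, C))
part_mask = rng.integers(0, P, size=(H, W))  # ground-truth part index per pixel
w_seg = rng.normal(size=(C, P))
w_cls = rng.normal(size=(P, K))

seg_probs, cls_probs = part_model_forward(image, w_seg, w_cls)
loss = combined_loss(seg_probs, cls_probs, part_mask, label=2)
```

Because the classification term is computed from `seg_probs`, its gradient would flow back through the segmenter, which is what makes the training end-to-end rather than two separately trained stages.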
The idea is that this approach may guide the model to attend more to global shape than to local fine-grained texture, hopefully yielding better robustness. We then combine this part-based architecture with adversarial training to encourage it to be robust against adversarial examples. We show that our model achieves strong levels of robustness on three realistic datasets: Part-ImageNet (He et al., 2021), Cityscapes (Meletis et al., 2020), and PASCAL-Part (Chen et al., 2014). Our part-based models outperform the ResNet-50 baselines on both clean and adversarial accuracy simultaneously. For any given value of clean accuracy, our part models achieve more than 10

