PART-BASED MODELS IMPROVE ADVERSARIAL ROBUSTNESS

Abstract

We show that combining human prior knowledge with end-to-end learning can improve the robustness of deep neural networks by introducing a part-based model for object classification. We believe that this richer form of annotation helps guide neural networks to learn more robust features without requiring more samples or larger models. Our model combines a part segmentation model with a tiny classifier and is trained end-to-end to segment objects into parts and then classify the segmented object. Empirically, our part-based models achieve both higher accuracy and higher adversarial robustness than a ResNet-50 baseline on all three datasets we evaluate. For instance, given the same level of robustness, the clean accuracy of our part models is up to 15 percentage points higher than that of the baseline. Our experiments indicate that these models also reduce texture bias and yield better robustness against common corruptions and spurious correlations. The code is publicly available at https://github.com/chawins/adv-part-model.

1. INTRODUCTION

As machine learning models are increasingly deployed in security- or safety-critical settings, robustness becomes an essential property. Adversarial training (Madry et al., 2018) is the state-of-the-art method for improving the adversarial robustness of deep neural networks. Recent work has made substantial progress in robustness by scaling up adversarial training. For instance, some defenses rely on aggressive data augmentation (Rebuffi et al., 2021), while others utilize a large quantity of extra data (Carmon et al., 2019) or even larger models (Gowal et al., 2021a). These works follow the broader deep learning trend of "scaling up," i.e., training large models on massive datasets (Kaplan et al., 2020). Unfortunately, progress in this direction has begun to stagnate as it reaches a point of diminishing returns: for example, Gowal et al. (2021a) show that an exponential increase in model size and training samples yields only a linear increase in robustness.

Our work presents a novel alternative for improving adversarial training: we propose to utilize additional supervision that provides a richer learning signal. We hypothesize that an auxiliary, human-aligned learning signal will guide the model to learn more robust and more generalizable features. To demonstrate this idea, we propose to classify images with a part-based model that makes predictions by recognizing the parts of the object in a bottom-up manner. We make use of images that are annotated with part segmentation masks. Our model is a simple two-stage design that combines a segmentation model with a classifier. An image is first fed into the segmenter, which outputs a pixel-wise segmentation of the object parts in the input; this mask is then passed to a tiny classifier, which predicts the class label based solely on the segmentation mask. The entire part-based model is trained end-to-end with a combination of segmentation and classification losses. Fig. 1 illustrates our model.
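To make the two-stage design more concrete, here is a minimal NumPy sketch of the forward pass and the combined loss. It assumes the segmenter has already produced per-pixel part logits. The tiny classifier used here (a linear layer over the soft area of each part), the background-at-index-0 convention, and the names `part_model_forward`, `combined_loss`, and `seg_weight` are illustrative assumptions, not the design used in the paper or its repository. NumPy has no automatic differentiation, so only the forward computation is shown; actual end-to-end training would backpropagate this loss through both stages in a framework such as PyTorch or JAX.

```python
import numpy as np


def softmax(z, axis):
    """Numerically stable softmax along `axis`."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def part_model_forward(seg_logits, w_cls, b_cls):
    """Soft part masks -> hypothetical tiny classifier -> class logits.

    seg_logits: (P, H, W) segmenter output over P part labels
                (index 0 = background is an assumed convention).
    w_cls: (C, P), b_cls: (C,) parametrize an illustrative linear classifier
           over the fraction of the image covered by each part.
    """
    masks = softmax(seg_logits, axis=0)        # per-pixel distribution over parts
    part_area = masks.mean(axis=(1, 2))        # (P,) soft area of each part
    class_logits = w_cls @ part_area + b_cls   # classifier sees only the masks
    return class_logits, masks


def combined_loss(class_logits, label, masks, part_labels, seg_weight=0.5):
    """Classification CE + seg_weight * mean per-pixel segmentation CE."""
    cls_loss = -np.log(softmax(class_logits, axis=0)[label])
    h, w = part_labels.shape
    # Probability the segmenter assigns to the ground-truth part at each pixel.
    p_true = masks[part_labels, np.arange(h)[:, None], np.arange(w)[None, :]]
    seg_loss = -np.log(p_true).mean()
    return cls_loss + seg_weight * seg_loss


# Example with random stand-ins for the segmenter output and the labels.
rng = np.random.default_rng(0)
P, C, H, W = 4, 3, 8, 8
seg_logits = rng.normal(size=(P, H, W))
w_cls, b_cls = rng.normal(size=(C, P)), np.zeros(C)
part_labels = rng.integers(0, P, size=(H, W))

class_logits, masks = part_model_forward(seg_logits, w_cls, b_cls)
loss = combined_loss(class_logits, 1, masks, part_labels)
```

Because the classifier in this sketch consumes soft masks rather than hard argmax labels, the classification loss can also send gradients back into the segmenter, which is what makes joint end-to-end training possible.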
The intuition is that this approach may guide the model to attend more to global shape than to local, fine-grained texture, which we expect to yield better robustness. We then combine this part-based architecture with adversarial training to encourage robustness against adversarial examples. We show that our model achieves strong levels of robustness, with three percentage points higher adversarial accuracy than the baseline on Part-ImageNet (see Fig. 2). This improvement can be up to 25 percentage points on the other datasets we evaluate (see Fig. 4). Alternatively, given the same level of adversarial robustness, our part models outperform the baseline by up to 15 percentage points in clean accuracy (see Table 1). Our part-based models also improve non-adversarial robustness without any specialized training or data augmentation. Compared to a ResNet-50 baseline, our part models are more robust to synthetic corruptions (Hendrycks & Dietterich, 2019) and less biased toward non-robust "texture features" (Geirhos et al., 2019). Additionally, since our part models can distinguish between the background and the foreground of an image, they are less vulnerable to distribution shifts in the background (Xiao et al., 2021). These three robustness properties are all highly desirable and are enabled by the part-level supervision. We believe that our part-based model is the first promising example of how a richer supervised training signal can substantially improve the robustness of neural networks.
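Adversarial training (Madry et al., 2018) minimizes the worst-case loss inside a perturbation budget, min over θ of E[max over ‖δ‖∞ ≤ ε of L(f_θ(x + δ), y)], and approximates the inner maximization with projected gradient descent (PGD). The sketch below shows that loop under clearly stated assumptions: a linear softmax classifier stands in for the part model so that input gradients can be written analytically in NumPy, PGD starts from the clean input (omitting the random start commonly used), and eps = 8/255, alpha = 2/255, 10 steps, the learning rate, and all function names are illustrative choices rather than the paper's settings.

```python
import numpy as np


def softmax(z):
    """Row-wise numerically stable softmax."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def loss_and_grads(w, b, x, y):
    """Mean cross-entropy of a linear classifier x @ w.T + b and its gradients.

    x: (N, D) inputs in [0, 1]; y: (N,) integer labels; w: (C, D); b: (C,).
    """
    n = x.shape[0]
    probs = softmax(x @ w.T + b)                   # (N, C)
    loss = -np.log(probs[np.arange(n), y]).mean()
    delta = probs.copy()
    delta[np.arange(n), y] -= 1.0                  # d(per-example CE) / d(logits)
    grad_x = delta @ w                             # per-example input gradients
    grad_w = delta.T @ x / n
    grad_b = delta.mean(axis=0)
    return loss, grad_x, grad_w, grad_b


def pgd_linf(w, b, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """L-infinity PGD without random start: ascend the loss, project each step."""
    x_adv = x.copy()
    for _ in range(steps):
        _, g, _, _ = loss_and_grads(w, b, x_adv, y)
        x_adv = x_adv + alpha * np.sign(g)
        x_adv = np.clip(x_adv, x - eps, x + eps)   # stay inside the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)           # stay a valid image
    return x_adv


def adv_train_step(w, b, x, y, lr=0.1):
    """One step of Madry-style adversarial training: descend on PGD examples."""
    x_adv = pgd_linf(w, b, x, y)
    loss, _, g_w, g_b = loss_and_grads(w, b, x_adv, y)
    return w - lr * g_w, b - lr * g_b, loss


# Example on random data with 3 classes and 20-dimensional inputs.
rng = np.random.default_rng(0)
x = rng.uniform(size=(16, 20))
y = rng.integers(0, 3, size=16)
w, b = rng.normal(scale=0.1, size=(3, 20)), np.zeros(3)

x_adv = pgd_linf(w, b, x, y)
clean_loss = loss_and_grads(w, b, x, y)[0]
adv_loss = loss_and_grads(w, b, x_adv, y)[0]
w_new, b_new, train_loss = adv_train_step(w, b, x, y)
```

With a real part-based model, the gradient with respect to the input would come from an autodiff framework and would flow through both the segmenter and the tiny classifier; the outer update would then use the combined segmentation and classification loss rather than plain cross-entropy.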

2.1. ADVERSARIAL ROBUSTNESS

Adversarial training (Madry et al., 2018) has become a standard method for training neural networks that are robust against adversarial examples. Many improvements on this technique have been proposed (Zhang et al., 2019; Xie et al., 2019; Pang et al., 2019; Huang et al., 2020; Qin et al., 2019; Rice et al., 2020; Wong et al., 2020; Hendrycks et al., 2019; Kireev et al., 2021). Among these, TRADES (Zhang et al., 2019) improves the trade-off between robustness and clean accuracy of adversarial training. More recent state-of-the-art methods improve adversarial robustness through scale. Carmon et al. (2019) and Gowal et al. (2021a) rely on a large amount of unlabeled training data, while others utilize large generative models for data augmentation (Rebuffi et al., 2021) or for synthetically generating more training samples (Gowal et al., 2021b; Sehwag et al., 2022). These works follow a recent trend of "large-scale learning from weak signals," which stemmed from recent progress on vision-language models such as CLIP (Radford et al., 2021). The improvement from scaling up, however, has started to reach its limit (Gowal et al., 2021a). We take a different route to improve robustness. Our part-based models utilize richer supervision in the form of high-quality part segmentation annotations to improve robustness without using more training samples or complex data augmentation.
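TRADES (Zhang et al., 2019) trains on a natural cross-entropy term plus a β-weighted KL divergence between the model's predictions on clean and perturbed inputs, so β directly controls the robustness/accuracy trade-off. The following is a minimal NumPy sketch of that objective, assuming both sets of logits are already computed; `trades_loss` is a hypothetical name, `beta=6.0` is only an illustrative default, and the inner search for the perturbed input (which maximizes the KL term) is not shown.

```python
import numpy as np


def log_softmax(z):
    """Row-wise numerically stable log-softmax."""
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def trades_loss(clean_logits, adv_logits, y, beta=6.0):
    """Batch-averaged CE(clean) + beta * KL(p_clean || p_adv).

    clean_logits, adv_logits: (N, C); y: (N,) integer labels.
    """
    n = clean_logits.shape[0]
    log_p = log_softmax(clean_logits)              # prediction on clean input
    log_q = log_softmax(adv_logits)                # prediction on perturbed input
    natural = -log_p[np.arange(n), y].mean()       # accuracy term
    kl = (np.exp(log_p) * (log_p - log_q)).sum(axis=-1).mean()  # smoothness term
    return natural + beta * kl


# Example with random logits for 8 samples and 5 classes.
rng = np.random.default_rng(0)
clean = rng.normal(size=(8, 5))
y = rng.integers(0, 5, size=8)
adv = clean + rng.normal(scale=0.5, size=clean.shape)

same = trades_loss(clean, clean, y)      # KL term vanishes when predictions match
shifted = trades_loss(clean, adv, y)     # any disagreement adds a penalty
```

Setting `beta=0.0` recovers plain cross-entropy on clean inputs, while larger values push the model toward predictions that change little under perturbation, typically at some cost in clean accuracy.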

2.2. PART-BASED MODELS

Part models generally refer to hierarchical models that recognize objects from their parts in a bottom-up manner, e.g., Deformable Part Models (Endres et al., 2013; Felzenszwalb et al., 2010; Chen et al., 2014; Girshick et al., 2015; Cho et al., 2015). Historically, they have most often been used in human recognition (Chen & Yuille, 2014; Gkioxari et al., 2015; Xia et al., 2017; Ruan et al., 2019) and have shown success in fine-grained classification (Zhang et al., 2018; Bai et al., 2019) as well as pose estimation (Lorenz et al., 2019; Georgakis et al., 2019). We revisit part-based models from the robustness perspective and design a general model that can be trained end-to-end without any feature engineering. Our technique is also agnostic to the type of object. Several works have explored part-based models in the context of adversarial robustness. Freitas et al. (2020) detect adversarial examples by using a Mask R-CNN to extract object parts and verify that



Figure 1: Our part-based model consists of (1) the part segmenter and (2) a tiny classifier. We train it end-to-end for object classification, using part-level segmentation labels to improve its robustness.

Figure 2: Accuracy-robustness trade-off of our part model and the ResNet-50 baseline on the Part-ImageNet dataset.

