BALANCING ROBUSTNESS AND SENSITIVITY USING FEATURE CONTRASTIVE LEARNING

Abstract

It is generally believed that robust training of extremely large networks is critical to their success in real-world applications. However, when taken to the extreme, methods that promote robustness can hurt the model's sensitivity to rare or underrepresented patterns. In this paper, we examine this trade-off between robustness and sensitivity by introducing two notions: contextual feature utility and contextual feature sensitivity. We propose Feature Contrastive Learning (FCL), which encourages the model to be more sensitive to features with higher contextual utility. Empirical results demonstrate that models trained with FCL achieve a better balance of robustness and sensitivity, leading to improved generalization in the presence of noise.

1. INTRODUCTION

Deep learning has shown unprecedented success in numerous domains (Krizhevsky et al., 2012; Szegedy et al., 2015; He et al., 2016; Hinton et al., 2012; Sutskever et al., 2014; Devlin et al., 2018), and robustness plays a key role in the success of neural networks. When we seek robustness, we want the model prediction to remain unchanged under small perturbations of the input. However, such invariance to small perturbations can prove detrimental in some cases. As an extreme example, a small perturbation to the input can change the human-perceived class label while the model remains insensitive to the change (Tramèr et al., 2020). In this paper, we focus on balancing this trade-off between robustness and sensitivity by developing a contrastive learning method that promotes a change in model prediction for certain perturbations and inhibits it for others. Note that we are only concerned with non-adversarial robustness in this paper, i.e., we make no effort to improve robustness to carefully designed adversarial perturbations (Goodfellow et al., 2014).

To develop algorithms that balance robustness and sensitivity, we first formalize two measures: utility and sensitivity. Utility refers to the change in the loss function when we perturb a specific input feature; in other words, it captures whether an input feature is useful for the model's prediction. Sensitivity, on the other hand, is the change in the learned embedding representation (before computing the loss) when we perturb a specific input feature. In contrast to classical feature selection approaches (Guyon & Elisseeff, 2003; Yu & Liu, 2004) that identify relevant and important features once per dataset, our notions of sensitivity and utility are context dependent and change from one image to another. Our goal is to ensure that if an input feature has high utility, the model is sensitive to it, and if it has low utility, the model is not.
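As an informal sketch, the two measures can be computed for a toy model as follows. This is a minimal numpy illustration of our own: the model, the additive perturbation scheme, and all function names are assumptions for exposition, not the paper's formal definitions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer model: an embedding followed by a linear classifier head.
W_embed = rng.normal(size=(4, 8))   # input dim 4 -> embedding dim 8
W_head = rng.normal(size=(8, 3))    # embedding dim 8 -> 3 classes

def embed(x):
    return np.tanh(x @ W_embed)

def loss(x, y):
    logits = embed(x) @ W_head
    logits -= logits.max()                       # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[y]                         # cross-entropy for label y

def contextual_utility(x, y, i, eps=0.1):
    """Change in the loss when input feature i is perturbed."""
    x_pert = x.copy()
    x_pert[i] += eps
    return abs(loss(x_pert, y) - loss(x, y))

def contextual_sensitivity(x, i, eps=0.1):
    """Change in the embedding when input feature i is perturbed."""
    x_pert = x.copy()
    x_pert[i] += eps
    return np.linalg.norm(embed(x_pert) - embed(x))

x, y = rng.normal(size=4), 1
for i in range(4):
    print(i, contextual_utility(x, y, i), contextual_sensitivity(x, i))
```

Note that utility is measured after the loss while sensitivity is measured at the embedding, so the two can disagree for the same feature, which is exactly the gap the paper aims to close.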
To explore and illustrate the notions of utility and sensitivity, we introduce a synthetic MNIST dataset, as shown in Figure 1. In the standard MNIST, the goal is to classify 10 digits based on their appearance. We modify it by adding a small random digit in a corner of some of the images and increasing the number of classes by five. For digits 5-9 the class label never changes, even in the presence of a corner digit, whereas digits 0-4 move to extended class labels 10-14 in the presence of any corner digit. The small corner digits can thus have high or low utility depending on the context. If the digit in the center is in 5-9, the corner digit has no bearing on the class and has low utility. However, if the digit in the center is in 0-4, the presence of a corner digit is essential to determining the label and thus has high utility. We would like to promote model sensitivity to the small corner digits when they are informative, in order to improve predictions, but demote it when they are not, in order to improve robustness.

Figure 1: Synthetic MNIST data. We synthesize new images by adding a scaled-down version of a random digit to a random corner. Images synthesized from digits 5-9 keep their label (Figure 1b) while images synthesized from digits 0-4 are considered to be of a different class (Figure 1c). In this setup, corner pixels are informative only in a certain context.

Feature attribution methods. Our notions of utility and sensitivity are related to feature attribution methods. Given an instance x and a model f, feature-based explanation aims to attribute the prediction f(x) to each feature. Two different approaches have been used to understand the role of features. The first computes the derivative of f(x) with respect to each feature, which is similar to the sensitivity measure proposed in this paper (Shrikumar et al., 2017; Smilkov et al., 2017; Simonyan et al., 2013; Sundararajan et al., 2016).
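A hypothetical construction of the synthetic MNIST dataset of Figure 1 could look like the sketch below. The exact downscaling method, corner placement, and compositing are our own assumptions; only the 2x shrink, random corner, and 0-4 → 10-14 label mapping come from the text.

```python
import numpy as np

def synthesize(image, label, corner_source, rng):
    """Paste a scaled-down random digit into a random corner and relabel.

    image, corner_source: 28x28 float arrays in [0, 1].
    Digits 5-9 keep their label; digits 0-4 map to new classes 10-14.
    """
    out = image.copy()
    # Naive 2x downscale of the corner digit to 14x14 by average pooling.
    small = corner_source.reshape(14, 2, 14, 2).mean(axis=(1, 3))
    corners = [(0, 0), (0, 14), (14, 0), (14, 14)]
    r, c = corners[rng.integers(len(corners))]
    # Composite by taking the pixelwise max so strokes are not erased.
    out[r:r+14, c:c+14] = np.maximum(out[r:r+14, c:c+14], small)
    new_label = label if label >= 5 else label + 10
    return out, new_label
```

Applied to a batch with randomly sampled corner digits, this yields the 15-class variant in which the corner pixels are informative only when the center digit is in 0-4.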
The second class of methods measures importance by removing a feature or comparing it with a reference point (Samek et al., 2016; Fong & Vedaldi, 2017; Dabkowski & Gal, 2017; Ancona et al., 2018; Yeh et al., 2019; Zeiler & Fergus, 2014; Zintgraf et al., 2017). For example, prediction difference analysis identifies the regions of an input image that provide the best evidence for a specific class or object by studying how the prediction changes in the absence of a specific feature. While most existing methods use attribution to interpret model predictions, our work proposes loss functions for the training stage that adjust feature sensitivity according to feature utility in a context-dependent manner.

Robustness. It is widely believed that imposing robustness constraints or regularization on neural networks can improve their performance. Taking the idea of robustness to the extreme, adversarial training algorithms aim to make neural networks robust to any perturbation within an ε-ball (Goodfellow et al., 2014; Madry et al., 2017). Certified defense methods pose an even stronger constraint in training, i.e., the improved robustness has to be verifiable (Wong & Kolter, 2018; Zhang et al., 2019b). Despite being successful in boosting accuracy under adversarial attacks, these methods come at the cost of significantly degraded clean accuracy (Madry et al., 2017; Zhang et al., 2019a; Wang & Zhang, 2019). Several theoretical works have demonstrated that a trade-off exists between adversarial robustness and generalization (Tsipras et al., 2018; Schmidt et al., 2018). Recent papers (Laugros et al., 2019; Gulshad et al., 2020) also discuss the particular relationship between adversarial robustness and natural-perturbation robustness, and find that the two are usually poorly correlated. For example, Laugros et al. (2019) show that models trained for adversarial robustness are not more robust than standard models on common perturbation benchmarks, and the converse holds as well. Gulshad et al. (2020) found a similar trend, although training for natural robustness can slightly improve adversarial robustness. While adversarial robustness is important in its own right, this paper focuses on natural-perturbation robustness. In fact, our goal of making models sensitive to important features implies that the model should not be adversarially robust on high-utility features.

With the goal of improving generalization rather than adversarial robustness, several other works enforce a weaker notion of robustness. A simple approach is to add Gaussian noise to the input features during training. Lopes et al. (2019) recently showed that Gaussian data augmentation applied to randomly chosen patches can improve generalization. Xie et al. (2020) showed that adversarial training with a dual batch-normalization approach can improve the performance of neural networks. It is worth noting a closely related work (Kim et al., 2020), which also employs contrastive learning for robustness (see Section 3 for details); however, it differs from ours in three main aspects: (a) their paper focuses on adversarial robustness while ours focuses on robustness to natural perturbations; (b) their contrastive loss always suppresses the distance between the original and an adversarially perturbed image, while ours encourages the representations to differ for high-utility perturbation pairs and suppresses the difference for low-utility pairs; (c) their perturbations are based on an unsupervised loss, while we rely on class labels to identify low- and high-utility features with respect to the classification task. In summary, previous works on robust training aim to make the model insensitive to perturbations, while we argue that a good model (with better generalization performance) should be robust to low-utility perturbations yet sensitive to high-utility ones.
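To make point (b) above concrete, a generic margin-based contrastive objective of this kind might look as follows. This is an illustrative sketch only, not the FCL loss proposed in the paper; the function name, margin formulation, and utility flag are our own assumptions.

```python
import numpy as np

def contrastive_perturbation_loss(z, z_pert, high_utility, margin=1.0):
    """Generic sketch: pull embeddings together for low-utility perturbations,
    push them at least `margin` apart for high-utility ones.

    z, z_pert: (batch, dim) embeddings of original / perturbed inputs.
    high_utility: boolean array, True where the perturbation matters
    for the label.
    """
    d = np.linalg.norm(z - z_pert, axis=1)
    attract = d ** 2                            # low-utility pairs: shrink d
    repel = np.maximum(0.0, margin - d) ** 2    # high-utility pairs: grow d
    return np.where(high_utility, repel, attract).mean()
```

Under this kind of objective, a perturbation such as adding a corner digit to a center digit in 0-4 would be pushed apart in embedding space, while the same perturbation applied to a center digit in 5-9 would be pulled back toward the original.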

