ADVERSARIALLY-TRAINED DEEP NETS TRANSFER BETTER: ILLUSTRATION ON IMAGE CLASSIFICATION

Abstract

Transfer learning has emerged as a powerful methodology for adapting deep neural networks pre-trained on image recognition tasks to new domains. This process consists of taking a neural network pre-trained on a large, feature-rich source dataset, freezing the early layers that encode essential generic image properties, and then fine-tuning the last few layers to capture information specific to the target task. This approach is particularly useful when only limited or weakly labeled data are available for the new task. In this work, we demonstrate that adversarially-trained models transfer better than non-adversarially-trained models, especially when only limited data are available for the new domain. Further, we observe that adversarial training biases the learned representations toward retaining shapes, as opposed to textures, which impacts the transferability of the source models. Finally, through the lens of influence functions, we discover that transferred adversarially-trained models contain more human-identifiable semantic information, which explains, at least partly, why adversarially-trained models transfer better.

1. INTRODUCTION

While deep neural networks (DNNs) achieve state-of-the-art performance in many fields, they are known to require large quantities of reasonably high-quality labeled data, which can often be expensive to obtain. As such, transfer learning has emerged as a powerful methodology that can significantly ease this burden by enabling the user to adapt a pre-trained DNN to a range of new situations and domains (Bengio, 2012; Yosinski et al., 2014). Models that are pre-trained on ImageNet (Deng et al., 2009) have excellent transfer learning capabilities after fine-tuning only a few of the last layers (Kornblith et al., 2019) on the target domain. Early work in transfer learning was motivated by the observation that humans apply previously learned knowledge to solve new problems with ease (Caruana, 1995). With this motivation, transfer learning aims to extract knowledge from one or more source tasks and apply that knowledge to a target task (Pan & Yang, 2009). The main benefits include a reduction in the number of labeled data points required in the target domain (Gong et al., 2012; Pan & Yang, 2009) and a reduction in training costs compared to training a model from scratch. However, in practice, transfer learning remains an "art" that requires domain expertise to tune the many knobs of the transfer process. An important consideration, for example, is which concepts or features are transferable from the source domain to the target domain. Features unique to one domain cannot be transferred, and so an important goal of transfer learning is to hunt for features shared across domains. It has recently been shown that adversarially-trained models (henceforth denoted as robust models) capture more robust features that are better aligned with human perception, compared to the seemingly patternless features (to humans, at least) of standard models (Ilyas et al., 2019).
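The frozen-feature fine-tuning described above can be illustrated with a minimal NumPy sketch: a toy two-layer network whose first-layer weights stand in for the frozen pre-trained layers, with only the final classification head updated on the target data. All sizes, data, and the learning rate here are illustrative, not those used in our experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Pre-trained" two-layer network: W1 plays the role of the frozen
# early layers; W2 is the head fine-tuned on the target data.
W1 = rng.normal(size=(8, 4))   # frozen feature extractor
W1_init = W1.copy()            # kept only to verify W1 never changes
W2 = rng.normal(size=(4, 2))   # trainable classification head

def features(x):
    # ReLU features from the frozen layers.
    return np.maximum(x @ W1, 0.0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Tiny illustrative "target-domain" dataset.
x = rng.normal(size=(32, 8))
y = (x[:, 0] > 0).astype(int)
y_onehot = np.eye(2)[y]

lr = 0.1
for _ in range(200):
    h = features(x)
    p = softmax(h @ W2)
    grad_W2 = h.T @ (p - y_onehot) / len(x)  # cross-entropy gradient
    W2 -= lr * grad_W2                        # only the head is updated

acc = (softmax(features(x) @ W2).argmax(axis=1) == y).mean()
```

In a real setting the frozen extractor would be the early convolutional blocks of a pre-trained ResNet50, and "freezing" is implemented by excluding those parameters from the optimizer.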
Unfortunately, Ilyas et al. (2019) hypothesize that the non-robust features lost during adversarial training may have a significant positive impact on generalization within a given dataset or domain. This inherently different feature representation between models constructed with adversarial training and models trained with standard methods would also explain why accuracy and robustness are at odds (Tsipras et al., 2019). This leads to the question of whether models that use robust representations generalize better across domains. This is the main question we address.
In this work, we demonstrate that robust models transfer better to new domains than natural models. To demonstrate this, we conduct an extensive number of transfer learning experiments across multiple domains (i.e., datasets), with various numbers of fine-tuned convolutional blocks and random subset sizes from the target dataset, where the critical variable is the constraint used to adversarially train the source model (described in detail in Section 3 and Appendix A.3). Importantly, note that we do not use an adversarial training procedure for the actual transfer learning process. Our findings indicate that robust models have outstanding transfer learning characteristics across all configurations, where we measure the performance in terms of model accuracy on target datasets for varying numbers of training images and epochs. Figure 1 provides a summary of our approach.
Our focus in this work is to show that robust source models learn representations that transfer better to new datasets on image recognition tasks. While adversarial training was proposed to combat adversarial attacks, our experiments discover an unintended but useful application. Adversarial training retains the robust features that are independent of the idiosyncrasies present in the source training data. Thus, these models exhibit worse generalization performance on the source domain, but better performance when transferred.
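Adversarial training of the source model replaces each clean training batch with worst-case perturbations found inside an L-infinity ball, in the style of projected gradient descent (PGD). The NumPy sketch below shows the inner maximization and outer minimization for a toy linear classifier with logistic loss; epsilon, the step sizes, and the model itself are illustrative assumptions, not our experimental setup.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy linearly separable data with labels in {-1, +1}.
x = rng.normal(size=(64, 5))
y = np.sign(x @ rng.normal(size=5))

w = np.zeros(5)                 # linear classifier weights
eps, alpha, lr = 0.1, 0.03, 0.05  # illustrative hyperparameters

def grad_loss_x(w, x, y):
    # Gradient of the logistic loss log(1 + exp(-y * <w, x>)) w.r.t. x.
    margin = y * (x @ w)
    return (-y / (1.0 + np.exp(margin)))[:, None] * w[None, :]

for _ in range(100):
    # Inner maximization: a few signed-gradient PGD steps, projected
    # back into the L-infinity ball of radius eps around each input.
    x_adv = x.copy()
    for _ in range(5):
        x_adv = x_adv + alpha * np.sign(grad_loss_x(w, x_adv, y))
        x_adv = x + np.clip(x_adv - x, -eps, eps)
    # Outer minimization: a gradient step on the adversarial examples.
    margin = y * (x_adv @ w)
    grad_w = ((-y / (1.0 + np.exp(margin)))[:, None] * x_adv).mean(axis=0)
    w -= lr * grad_w

clean_acc = (np.sign(x @ w) == y).mean()
```

The constraint radius eps is the "robustness constraint" varied across our source models; larger values force the model to rely more heavily on perturbation-stable (robust) features.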
This observation is novel, and we undertake extensive empirical studies to make the following contributions:
• We discover that adversarially-trained source models obtain higher test accuracy than natural source models after fine-tuning with fewer training examples on the target datasets and over fewer training epochs.
• We notice that the similarity between the source and target datasets affects the optimal number of fine-tuned blocks and the robustness constraint.
• We show that adversarial training biases the learned representations to retain shapes instead of textures, impacting the source models' transferability.
• We interpret robust representations using influence functions and observe that adversarially-trained source models better capture class-level semantic properties of the images, consistent with human concept learning and understanding.



Figure 1: We demonstrate that adversarially-trained (i.e., robust) DNNs transfer better and faster to new domains with the process shown in (a): A ResNet50 is trained adversarially or non-adversarially (i.e., naturally) on the source dataset. Then, we fine-tune both of these source models on the target dataset. We hypothesize that the robust features in robust models, which encode more humanly perceptible representations such as textures, strokes, and lines, as seen in (b), are responsible for this phenomenon. See Appendix A.1 for details on how we generated the images in (b).

