ADVERSARIALLY-TRAINED DEEP NETS TRANSFER BETTER: ILLUSTRATION ON IMAGE CLASSIFICATION

Abstract

Transfer learning has emerged as a powerful methodology for adapting deep neural networks pre-trained on image recognition tasks to new domains. The process consists of taking a neural network pre-trained on a large, feature-rich source dataset, freezing the early layers that encode generic image properties, and then fine-tuning the last few layers to capture information specific to the target task. This approach is particularly useful when only limited or weakly labeled data are available for the new task. In this work, we demonstrate that adversarially-trained models transfer better than non-adversarially-trained models, especially when only limited data are available for the new domain task. Further, we observe that adversarial training biases the learnt representations toward retaining shapes, as opposed to textures, which impacts the transferability of the source models. Finally, through the lens of influence functions, we discover that transferred adversarially-trained models contain more human-identifiable semantic information, which explains, at least in part, why adversarially-trained models transfer better.
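The freeze-and-fine-tune procedure described above can be sketched on a toy model. The sketch below is a minimal illustration under stated assumptions, not the paper's experimental setup: it uses a tiny two-layer NumPy network in place of an ImageNet backbone, FGSM-style perturbations as a stand-in for full adversarial training, and synthetic data; all names and hyperparameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class TwoLayerNet:
    """Toy stand-in for a pre-trained backbone (W1) plus a task head (w2)."""
    def __init__(self, d_in, d_hid):
        self.W1 = rng.normal(0, 0.5, (d_in, d_hid))  # "early layers"
        self.w2 = rng.normal(0, 0.5, d_hid)          # "last layer"

    def forward(self, X):
        h = np.tanh(X @ self.W1)          # generic features
        return sigmoid(h @ self.w2), h

    def grads(self, X, y):
        p, h = self.forward(X)
        err = p - y                        # dL/dlogit for binary cross-entropy
        g_w2 = h.T @ err / len(y)
        dh = np.outer(err, self.w2) * (1.0 - h ** 2)
        g_W1 = X.T @ dh / len(y)
        g_X = dh @ self.W1.T               # input gradient, used for FGSM
        return g_W1, g_w2, g_X

def fgsm(net, X, y, eps=0.1):
    """Fast gradient sign method: perturb inputs along the input-gradient sign."""
    _, _, g_X = net.grads(X, y)
    return X + eps * np.sign(g_X)

# --- adversarial pre-training on a synthetic "source" task ---
Xs = rng.normal(size=(200, 5))
ys = (Xs[:, 0] + Xs[:, 1] > 0).astype(float)
net = TwoLayerNet(5, 8)
for _ in range(300):
    Xadv = fgsm(net, Xs, ys)               # train on perturbed inputs
    g_W1, g_w2, _ = net.grads(Xadv, ys)
    net.W1 -= 0.5 * g_W1
    net.w2 -= 0.5 * g_w2

# --- transfer: freeze W1, fine-tune only the head w2 on a "target" task ---
Xt = rng.normal(size=(50, 5))
yt = (Xt[:, 0] - Xt[:, 2] > 0).astype(float)
W1_frozen = net.W1.copy()
for _ in range(300):
    _, g_w2, _ = net.grads(Xt, yt)
    net.w2 -= 0.5 * g_w2                   # early layers left untouched

assert np.array_equal(net.W1, W1_frozen)   # frozen backbone is unchanged
acc = np.mean((net.forward(Xt)[0] > 0.5) == yt)
```

Only the head's parameters receive updates during fine-tuning, mirroring the common practice of freezing the generic feature extractor and adapting the final layers to the target domain.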

1. INTRODUCTION

While deep neural networks (DNNs) achieve state-of-the-art performance in many fields, they are known to require large quantities of reasonably high-quality labeled data, which can often be expensive to obtain. As such, transfer learning has emerged as a powerful methodology that can significantly ease this burden by enabling the user to adapt a pre-trained DNN to a range of new situations and domains (Bengio, 2012; Yosinski et al., 2014). Models pre-trained on ImageNet (Deng et al., 2009) have excellent transfer learning capabilities after fine-tuning only a few of the last layers on the target domain (Kornblith et al., 2019). Early work in transfer learning was motivated by the observation that humans apply previously learned knowledge to solve new problems with ease (Caruana, 1995). With this motivation, transfer learning aims to extract knowledge from one or more source tasks and apply it to a target task (Pan & Yang, 2009). The main benefits include a reduction in the number of labeled data points required in the target domain (Gong et al., 2012; Pan & Yang, 2009) and a reduction in training costs compared to training a model from scratch.

However, in practice, transfer learning remains an "art" that requires domain expertise to tune the many knobs of the transfer process. An important consideration, for example, is which concepts or features are transferable from the source domain to the target domain. Features unique to a single domain cannot be transferred, so an important goal of transfer learning is to identify features shared across domains. It has recently been shown that adversarially-trained models (henceforth denoted as robust models) capture more robust features that are better aligned with human perception, compared to the seemingly patternless features (to humans, at least) of standard models (Ilyas et al., 2019). Unfortunately,

