DOES ADVERSARIAL TRANSFERABILITY INDICATE KNOWLEDGE TRANSFERABILITY?

Abstract

Despite the immense success that deep neural networks (DNNs) have achieved, adversarial examples, i.e., perturbed inputs designed to mislead DNNs into making mistakes, have raised serious concerns. On the other hand, adversarial examples exhibit interesting phenomena, such as adversarial transferability. DNNs also exhibit knowledge transfer, which is critical to improving learning efficiency and to learning in domains that lack high-quality training data. To uncover the fundamental connections between these phenomena, we investigate and give an affirmative answer to the question: does adversarial transferability indicate knowledge transferability? We theoretically analyze the relationship between adversarial transferability and knowledge transferability, and outline easily checkable sufficient conditions under which adversarial transferability indicates knowledge transferability. In particular, we show that composition with an affine function is sufficient to reduce the difference between two models when they possess high adversarial transferability. Furthermore, we provide empirical evaluation for different transfer learning scenarios on diverse datasets, showing a strong positive correlation between adversarial transferability and knowledge transferability, thus illustrating that our theoretical insights are predictive of practice.

1. INTRODUCTION

Knowledge transferability and adversarial transferability are two fundamental properties of a learned model when it is transferred to other domains. Knowledge transferability, also known as learning transferability, has attracted extensive study in machine learning. Long before it was formally defined, the computer vision community exploited it to perform important visual manipulations (Johnson et al., 2016), such as style transfer and super-resolution, where pretrained VGG networks (Simonyan & Zisserman, 2014) are utilized to encode images into semantically meaningful features. After the release of ImageNet (Russakovsky et al., 2015), pretrained ImageNet models (e.g., on TensorFlow Hub or PyTorch Hub) quickly became the default choice of transfer source, because of their broad coverage of visual concepts and compatibility with various visual tasks (Huh et al., 2016).

Adversarial transferability, on the other hand, is the phenomenon whereby adversarial examples can not only attack the model they are generated against, but also affect other models (Goodfellow et al., 2014; Papernot et al., 2016). Thus, adversarial transferability is extensively exploited to enable black-box attacks (Ilyas et al., 2018; Liu et al., 2016), and many theoretical analyses have been conducted to establish sufficient conditions for adversarial transferability (Demontis et al., 2019; Ma et al., 2018).

Knowledge transferability and adversarial transferability both reveal something about the nature of machine learning models and the corresponding data distributions. The relation between these two phenomena interests us the most. We begin by showing that adversarial transferability can indicate knowledge transferability. This tie can potentially provide a similarity measure between data distributions, an identifier of the important features a complex model focuses on, and an affinity map between complicated tasks.
Thus, we believe our results have further implications for model interpretability and verification, fairness, and robust and efficient transfer learning. To the best of our knowledge, this is the first work studying the fundamental relationship between adversarial transferability and knowledge transferability both theoretically and empirically. Our main contributions are as follows.
• We formally define two quantities, τ1 and τ2, to measure adversarial transferability from different aspects, enabling an in-depth, geometric understanding of adversarial transferability in the feature representation space.
• We derive an upper bound on knowledge transferability with respect to adversarial transferability. We rigorously depict their underlying relation and show that adversarial transferability can indicate knowledge transferability.
• We conduct thorough controlled experiments for diverse knowledge transfer scenarios (e.g., knowledge transfer among data distributions, attributes, and tasks) on benchmark datasets including STL-10, CIFAR-10, CelebA, Taskonomy-data, and four language datasets. Our empirical results show a strong positive correlation between adversarial and knowledge transferability, which validates our theoretical predictions.
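To make the notion of adversarial transferability concrete before the formal treatment, the following self-contained NumPy sketch (illustrative only; not the experimental setup or the τ1/τ2 metrics of this paper) trains a source and a target logistic-regression model on the same synthetic task, crafts FGSM-style perturbations against the source model, and measures how much they also degrade the target model:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def train_logreg(X, y, lr=0.5, steps=500):
    """Plain gradient-descent logistic regression; returns the weight vector."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = sigmoid(X @ w)
        w -= lr * X.T @ (p - y) / len(y)
    return w

# Source and target models trained on disjoint samples of the same task.
d = 20
w_true = rng.normal(size=d)
X = rng.normal(size=(2000, d))
y = (X @ w_true > 0).astype(float)
w_src = train_logreg(X[:1000], y[:1000])
w_tgt = train_logreg(X[1000:], y[1000:])

# FGSM against the source model: x' = x + eps * sign(d loss / d x),
# where d loss / d x = (sigmoid(w.x) - y) * w for logistic loss.
X_test = rng.normal(size=(500, d))
y_test = (X_test @ w_true > 0).astype(float)
grad = np.outer(sigmoid(X_test @ w_src) - y_test, w_src)
X_adv = X_test + 0.5 * np.sign(grad)

def acc(w, X, y):
    return np.mean((sigmoid(X @ w) > 0.5) == (y > 0.5))

# Transferability: perturbations crafted on w_src also hurt w_tgt.
print("source: clean", acc(w_src, X_test, y_test), "adv", acc(w_src, X_adv, y_test))
print("target: clean", acc(w_tgt, X_test, y_test), "adv", acc(w_tgt, X_adv, y_test))
```

The target model never sees the attack, yet its accuracy on the perturbed inputs drops substantially, which is the black-box attack scenario discussed above.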

2. RELATED WORK

Knowledge transferability has been widely applied in scenarios where the available data for a certain domain is limited, and has achieved great success (Van Opbroek et al., 2014; Wurm et al., 2019; Wang et al., 2017; Kim & Park, 2017; Maqueda et al., 2018; Devlin et al., 2018). Several studies have been conducted to understand the factors that affect knowledge transferability (Yosinski et al., 2014; Long et al., 2015b; Wang et al., 2019; Xu et al., 2019; Shinya et al., 2019). Empirical observations show that the correlation between learning tasks (Achille et al., 2019; Zamir et al., 2018), the similarity of model architectures, and the data distributions are all correlated with different knowledge transfer effects.

Adversarial transferability has been observed by several works (Papernot et al., 2016; Goodfellow et al., 2014; Joon Oh et al., 2017). Since these early works, many studies have been conducted to further understand the phenomenon and to design more transferable adversarial attacks. Regardless of the threat model, numerous attack methods have been proposed to boost adversarial transferability (Zhou et al., 2018; Demontis et al., 2019; Dong et al., 2019; Xie et al., 2019). Naseer et al. (2019) propose producing adversarial examples that transfer cross-domain via a generative adversarial network. In addition to efficacy, efficiency (Ilyas et al., 2018) and practicality (Papernot et al., 2017) have also been optimized. Beyond these empirical studies, some work is dedicated to analyzing the phenomenon, showing different conditions that may enhance adversarial transferability (Athalye et al., 2018; Tramèr et al., 2017; Ma et al., 2018; Demontis et al., 2019).
Building upon these observations, it is clear that certain connections exist between adversarial transferability and other knowledge transfer scenarios. Here we aim to provide the first theoretical justification for this connection, and to design systematic empirical studies that measure the correlation.

3. ADVERSARIAL TRANSFERABILITY VS. KNOWLEDGE TRANSFERABILITY

In this section, we establish connections between adversarial examples and knowledge transferability rigorously. We first formally state the problem studied in this section. Then, we move on to subsection 3.1 to introduce two metrics that encode information about adversarial attacks. Finally, we present our theoretical results on the relationship between adversarial and knowledge transferability in subsection 3.2.

Notations. We use blackboard bold to denote sets, e.g., $\mathbb{R}$, and calligraphy to denote distributions, e.g., $\mathcal{D}$. The support of a distribution $\mathcal{D}$ is denoted $\mathrm{supp}(\mathcal{D})$. We use bold lower-case letters to denote vectors, e.g., $\mathbf{x} \in \mathbb{R}^n$, and bold upper-case letters to denote matrices, e.g., $\mathbf{A}$. We use $\mathbf{A}^{\dagger}$ to denote the Moore–Penrose inverse of a matrix $\mathbf{A}$, and $\circ$ to denote the composition of functions, i.e., $(g \circ f)(\mathbf{x}) = g(f(\mathbf{x}))$. We use $\|\cdot\|_2$ to denote the Euclidean norm induced by the standard inner product $\langle \cdot, \cdot \rangle$. Given a function $f$, we write $f(\mathbf{x})$ for its value at $\mathbf{x}$, and $f$ for the function itself in function space. We use $\langle \cdot, \cdot \rangle_{\mathcal{D}}$ to denote the inner product induced by a distribution $\mathcal{D}$, i.e., $\langle f_1, f_2 \rangle_{\mathcal{D}} = \mathbb{E}_{\mathbf{x} \sim \mathcal{D}} \langle f_1(\mathbf{x}), f_2(\mathbf{x}) \rangle$. Accordingly, we use $\|\cdot\|_{\mathcal{D}}$ to denote the norm induced by this inner product, i.e., $\|f\|_{\mathcal{D}} = \sqrt{\langle f, f \rangle_{\mathcal{D}}}$. For a matrix-valued function $F : \mathrm{supp}(\mathcal{D}) \to \mathbb{R}^{d \times m}$, we define its $L^2(\mathcal{D})$-norm, in accordance with the matrix 2-norm, as $\|F\|_{\mathcal{D},2} = \sqrt{\mathbb{E}_{\mathbf{x} \sim \mathcal{D}} \|F(\mathbf{x})\|_2^2}$. We define the projection operator $\mathrm{proj}(\cdot, r)$, which projects a matrix onto the hyperball of spectral-norm radius $r$, as
$$\mathrm{proj}(\mathbf{A}, r) = \begin{cases} \mathbf{A}, & \text{if } \|\mathbf{A}\|_2 \le r, \\ r\mathbf{A}/\|\mathbf{A}\|_2, & \text{if } \|\mathbf{A}\|_2 > r. \end{cases}$$
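For readers who prefer operational definitions, the projection operator and the distribution-induced norm can be sketched numerically. This is a hypothetical NumPy illustration, not part of the paper; the `sample` argument stands in for drawing from the distribution D, and the norm is estimated by Monte Carlo:

```python
import numpy as np

def proj(A, r):
    """Project matrix A onto the spectral-norm ball of radius r:
    returns A if ||A||_2 <= r, else r * A / ||A||_2."""
    s = np.linalg.norm(A, ord=2)  # spectral norm = largest singular value
    return A if s <= r else (r / s) * A

def d_norm(f, sample, n=10_000, seed=0):
    """Monte-Carlo estimate of ||f||_D = sqrt(E_{x~D} <f(x), f(x)>).

    `f` maps a vector to a vector; `sample(rng, n)` draws n points from D.
    """
    rng = np.random.default_rng(seed)
    xs = sample(rng, n)
    vals = np.array([f(x) for x in xs])
    return np.sqrt(np.mean(np.sum(vals**2, axis=1)))
```

For instance, with D the standard normal on R^2 and f the identity map, ||f||_D equals sqrt(E ||x||^2) = sqrt(2), and the estimator above recovers it up to sampling error.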

