RADEMACHER COMPLEXITY OVER H∆H CLASS FOR ADVERSARIALLY ROBUST DOMAIN ADAPTATION

Abstract

In domain adaptation, a model is trained on a dataset generated from a source domain and its generalization is evaluated on a possibly different target domain. Understanding the generalization capability of the learned model is a longstanding question. Recent studies demonstrated that adversarially robust learning under ℓ∞ attack is even harder to generalize to different domains. To thoroughly study the fundamental difficulty behind adversarially robust domain adaptation, we propose to analyze a key complexity measure that controls cross-domain generalization: the adversarial Rademacher complexity over the H∆H class. For linear models, we show that the adversarial Rademacher complexity over the H∆H class is always greater than its non-adversarial counterpart, which reveals the intrinsic hardness of adversarially robust domain adaptation. We also establish upper bounds on this complexity measure and extend them to the ReLU neural network class. Finally, based on our adversarially robust domain adaptation theory, we explain how adversarial training helps transfer model performance to different domains. We believe our results initiate the study of the generalization theory of adversarially robust domain adaptation, and could shed light on distributed adversarially robust learning from heterogeneous sources, a scenario typically encountered in federated learning applications.

1. INTRODUCTION

Domain adaptation is a key learning scenario where one tries to generalize a model learnt on a source domain to a target domain. How to predict target accuracy using source accuracy has been a longstanding research topic in both the theory community Ben-David et al. (2006); Quinonero-Candela et al. (2008); Ben-David et al. (2010); Mansour et al. (2009); Cortes et al. (2015); Zhang et al. (2019; 2020) and the application community Long et al. (2015); Saito et al. (2018); You et al. (2019). From a theoretical perspective, this problem can be attacked by establishing bounds on the target-domain generalization of the source-domain-learnt model, using different complexity measures including the VC-dimension Ben-David et al. (2006; 2010); Zhang et al. (2020) and the Rademacher complexity Mansour et al. (2009); Zhang et al. (2019). In particular, the latter works Mansour et al. (2009); Zhang et al. (2019) rely on the Rademacher complexity over a so-called H∆H function class to bound the gap between source and target generalization risks:

Definition 1 (Mansour et al. (2009)). Let hypothesis space H be a set of real (vector-)valued functions defined over input space X and label space Y, H = {h_w : X → Y}, each parameterized by w ∈ W ⊆ R^d, and let ℓ : Y × Y → R_+ be the loss function. Given a dataset {x_1, ..., x_n} sampled i.i.d. from a distribution D defined over X, the empirical Rademacher complexity of H∆H over this dataset is defined as follows:

    R_D(ℓ ∘ H∆H) = E_σ [ sup_{h_w, h_{w'} ∈ H} (1/n) Σ_{i=1}^n σ_i ℓ(h_w(x_i), h_{w'}(x_i)) ],    (1)

where σ_1, ..., σ_n are i.i.d. Rademacher random variables with P{σ_i = 1} = P{σ_i = −1} = 1/2.

Intuitively, the above quantity measures how well the loss vector realized by two hypotheses within H correlates with random sign vectors; better correlation implies a richer hypothesis class. However, unlike the classical Rademacher complexity, whose loss vector is computed between the predictions made by a hypothesis and the true labels, Eq. (1) is defined merely over the predictions made by two hypotheses. The authors of Mansour et al. (2009); Zhang et al. (2019) have shown that this complexity measure controls the domain adaptation generalization bound. Unfortunately, none of those works gives a precise analysis of R_D(ℓ ∘ H∆H). To the best of our knowledge, Kuroki et al. (2019) is the only prior work to analyze R_D(ℓ ∘ H∆H) for the linear classifier class, but their analysis is not tight. Due to the importance of this complexity measure, we are interested in characterizing how large it can be in terms of model dimension and data diversity, even for a toy model, e.g., a linear model. Hence, the first question we investigate in this paper is: for linear models, what quantities control the Rademacher complexity over the H∆H function class?

Meanwhile, in modern machine learning, practitioners are interested not only in transferring standard model accuracy to another domain, but also in transferring robustness. Consider the adversarially robust risk over a domain D:

    R^{adv-label}_D(h_w, y_D) = E_{x∼D} [ max_{∥δ∥_∞ ≤ ϵ} ℓ(h_w(x + δ), y_D(x)) ],

where y_D(·) is the labeling function. In the adversarially robust domain adaptation problem, we are interested in the robust risk when the same model h_w is tested on a new domain D'. Unfortunately, as shown empirically by Shafahi et al. (2019); Hong et al. (2021); Fan et al. (2021), a robust model learnt on the source domain loses its robustness catastrophically on a different domain. That is, the gap between the robust risks on the old and new domains can be dramatically large, compared to that of the standard risk.

This observation naturally leads to the question we aim to examine in this paper: why is the robust risk harder to adapt to different domains? To answer this question, inspired by the Rademacher complexity over the H∆H function class, we properly extend this complexity measure to the adversarial learning setting, and propose the adversarial Rademacher complexity over the H∆H class. We show that the adversarial version of this complexity is always greater than its non-adversarial counterpart, similar to the result proven in Yin et al. (2019) for the single-domain setting. Relying on this new complexity measure, we characterize, for the first time, the generalization bound of adversarially robust learning between source and target domains.

Recent studies Salman et al. (2020); Deng et al. (2021) also show that a model trained adversarially on the source domain usually attains better standard accuracy on the target domain than a normally trained model. In this paper, by further exploring our generalization bound, we show that given a large enough adversarial budget, a small source adversarially robust risk will almost guarantee a small target-domain standard risk, with the residual error controlled by ϵ. This connection between the source robust risk and the target standard risk theoretically supports the advantage of performing robust training in domain adaptation tasks.

Our contributions are summarized as follows:

• We study the Rademacher complexity over the H∆H class, and propose an adversarial variant of it, a new complexity measure towards better understanding domain adaptation in adversarial learning. In both linear classification and regression settings, we first show that the adversarial Rademacher complexity over the H∆H class is greater than its non-adversarial counterpart. We also show that the adversarial complexity is smaller than its non-adversarial counterpart plus residual terms depending polynomially on the data dimension, the model norm, and the adversarial budget.

• We generalize our results to ReLU neural networks, deriving similar upper bounds on the adversarial H∆H Rademacher complexity of a 2-layer ReLU neural network.

• We establish a connection between robust learning and standard domain adaptation, which helps explain the widely observed phenomenon that adversarially trained models can have good generalization performance on different domains.

• We support our theoretical analysis by providing experiments illustrating how adversarial training can help domain adaptation, especially with ℓ_1 regularization. We also highlight numerically the difficulty of transferring adversarial robustness across domains.

2. PROBLEM SETUP

We adopt the following notation throughout this paper. We use lower-case bold letters to denote vectors, e.g., w, and upper-case bold letters to denote matrices, e.g., M. We use ∥w∥_p and ∥M∥_p to denote the ℓ_p-norm of a vector w and a matrix M, respectively. We define the (p, q)-group norm as ∥M∥_{p,q} := ∥(∥m_1∥_p, . . . , ∥m_n∥_p)^⊤∥_q, where the m_i are the columns of M.
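The (p, q)-group norm defined above can be computed directly from its definition: take the p-norm of each column, then the q-norm of the resulting vector. A minimal numerical sketch (the matrix and norm orders here are illustrative, not taken from the paper):

```python
import numpy as np

def group_norm(M, p, q):
    """(p, q)-group norm: the q-norm of the vector of p-norms of M's columns,
    i.e., ||M||_{p,q} = ||(||m_1||_p, ..., ||m_n||_p)||_q."""
    col_norms = np.linalg.norm(M, ord=p, axis=0)  # p-norm of each column m_i
    return np.linalg.norm(col_norms, ord=q)

M = np.array([[3.0, 0.0],
              [4.0, 2.0]])
# Columns are (3, 4) and (0, 2); their 2-norms are 5 and 2.
print(group_norm(M, 2, 1))       # sum of column norms: 5 + 2 = 7
print(group_norm(M, 2, np.inf))  # max of column norms: 5
```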

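For linear predictors, the inner maximization in the robust risk admits a closed form, which is where the ℓ_1 norm of the model enters adversarial analyses of linear models: with h_w(x) = ⟨w, x⟩ and absolute-error loss, max_{∥δ∥_∞ ≤ ϵ} |⟨w, x + δ⟩ − y| = |⟨w, x⟩ − y| + ϵ∥w∥_1. A minimal numerical check of this standard fact, with illustrative dimensions and values not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
d, eps = 5, 0.1
w = rng.normal(size=d)
x = rng.normal(size=d)
y = 0.3

# Closed form: max_{||delta||_inf <= eps} |<w, x + delta> - y|
#            = |<w, x> - y| + eps * ||w||_1.
closed_form = abs(w @ x - y) + eps * np.linalg.norm(w, 1)

# The maximizer: delta* = eps * sign(w) * sign(<w, x> - y).
delta_star = eps * np.sign(w) * np.sign(w @ x - y)
attained = abs(w @ (x + delta_star) - y)

# Random feasible perturbations never exceed the closed-form value.
deltas = rng.uniform(-eps, eps, size=(10000, d))
sampled_max = np.abs(deltas @ w + (w @ x - y)).max()

print(closed_form, attained, sampled_max)
```

The ϵ∥w∥_1 term makes explicit why ℓ_1 regularization, mentioned in the contributions above, interacts naturally with ℓ∞-robust training: shrinking ∥w∥_1 directly shrinks the adversary's gain.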

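The empirical Rademacher complexity over H∆H in Eq. (1) can also be illustrated numerically. The sketch below crudely estimates it for a norm-bounded linear class with absolute-error loss, replacing the supremum over (h_w, h_{w'}) by random search and the expectation over σ by Monte Carlo averaging; all sizes, the norm bound B, and the loss choice are illustrative assumptions, not quantities from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, B = 20, 3, 1.0            # sample size, dimension, norm bound (assumed)
X = rng.normal(size=(n, d))     # dataset {x_1, ..., x_n}

def sample_W(k):
    """k random weight vectors on the sphere ||w||_2 = B (crude search over H)."""
    W = rng.normal(size=(k, d))
    return B * W / np.linalg.norm(W, axis=1, keepdims=True)

def estimate_rademacher(n_sigma=200, n_search=500):
    W1, W2 = sample_W(n_search), sample_W(n_search)
    # L[j, i] = loss between the j-th pair of hypotheses' predictions on x_i:
    # l(h_w(x_i), h_{w'}(x_i)) = |<w, x_i> - <w', x_i>|
    L = np.abs(X @ W1.T - X @ W2.T).T          # shape (n_search, n)
    total = 0.0
    for _ in range(n_sigma):                   # Monte Carlo over sigma
        sigma = rng.choice([-1.0, 1.0], size=n)
        total += (L @ sigma).max() / n         # random-search sup over (w, w')
    return total / n_sigma

r_hat = estimate_rademacher()
print(r_hat)
```

Because the sup is approximated by finite random search, this estimate is a lower bound on the true complexity; it is meant only to make the definition concrete, not to compute the bounds derived in the paper.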