RADEMACHER COMPLEXITY OVER H∆H CLASS FOR ADVERSARIALLY ROBUST DOMAIN ADAPTATION

Abstract

In domain adaptation, a model is trained on a dataset generated from a source domain and its generalization is evaluated on a possibly different target domain. Understanding the generalization capability of the learned model is a longstanding question. Recent studies demonstrated that adversarially robust learning under ℓ∞ attacks is even harder to generalize to different domains. To thoroughly study the fundamental difficulty behind adversarially robust domain adaptation, we propose to analyze a key complexity measure that controls cross-domain generalization: the adversarial Rademacher complexity over the H∆H class. For linear models, we show that the adversarial Rademacher complexity over the H∆H class is always greater than its non-adversarial counterpart, which reveals the intrinsic hardness of adversarially robust domain adaptation. We also establish upper bounds on this complexity measure, and extend them to the class of ReLU neural networks. Finally, based on our adversarially robust domain adaptation theory, we explain how adversarial training helps transfer model performance to different domains. We believe our results initiate the study of the generalization theory of adversarially robust domain adaptation, and could shed light on distributed adversarially robust learning from heterogeneous sources, a scenario typically encountered in federated learning applications.

1. INTRODUCTION

Domain adaptation is a key learning scenario where one tries to generalize a model learned on a source domain to a target domain. How to predict target accuracy using source accuracy has been a longstanding research topic in both the theory community (Ben-David et al. (2006; 2010); Quinonero-Candela et al. (2008); Mansour et al. (2009); Cortes et al. (2015); Zhang et al. (2019; 2020)) and the application community (Long et al. (2015); Saito et al. (2018); You et al. (2019)). From a theoretical perspective, this problem can be attacked by establishing bounds on the generalization of the source-domain-learnt model on the target domain, using different complexity measures including the VC-dimension (Ben-David et al. (2006; 2010); Zhang et al. (2020)) and Rademacher complexity (Mansour et al. (2009); Zhang et al. (2019)). In particular, the latter works (Mansour et al. (2009); Zhang et al. (2019)) rely on the Rademacher complexity over a so-called H∆H function class to bound the gap between source and target generalization risks:

Definition 1 (Mansour et al. (2009); Zhang et al. (2019)). Let hypothesis space H be a set of real (vector-)valued functions defined over input space X and label space Y, i.e., H = {h_w : X → Y}, each parameterized by w ∈ W ⊆ R^d, and let ℓ : Y × Y → R_+ be the loss function. Given a dataset {x_1, ..., x_n} sampled i.i.d. from a distribution D defined over X, the empirical Rademacher complexity of H∆H over this dataset is defined as

    R_D(ℓ ∘ H∆H) = E_σ [ sup_{h_w, h_{w'} ∈ H} (1/n) Σ_{i=1}^n σ_i ℓ(h_w(x_i), h_{w'}(x_i)) ],    (1)

where σ_1, ..., σ_n are i.i.d. Rademacher random variables with P{σ_i = 1} = P{σ_i = -1} = 1/2.

Intuitively, the above quantity measures how well the loss vector realized by two hypotheses within H correlates with random sign vectors; better correlation implies a richer hypothesis class. However, unlike the classical Rademacher complexity, whose loss vector is computed between the predictions made by a hypothesis and the true labels, Eq. (1) is defined merely over predictions made by two
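To make Eq. (1) concrete, the following sketch gives a crude Monte Carlo estimate of the empirical Rademacher complexity of ℓ ∘ H∆H for linear hypotheses h_w(x) = ⟨w, x⟩ with bounded norm. This is an illustrative approximation only, not a procedure from the paper: the supremum over pairs (h_w, h_{w'}) is replaced by a max over randomly sampled weight pairs, and the outer expectation over σ by an empirical average; the function and parameter names are our own.

```python
import numpy as np

def empirical_rademacher_hdh(X, loss, n_pairs=500, n_sigma=200, radius=1.0, seed=0):
    """Monte Carlo estimate of Eq. (1) over a fixed sample X (shape (n, d)).
    The sup over pairs of linear hypotheses with ||w|| <= radius is
    approximated by a max over randomly drawn candidate pairs."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Sample candidate weight pairs uniformly on the sphere of the given radius.
    W = rng.normal(size=(n_pairs, d))
    W = radius * W / np.linalg.norm(W, axis=1, keepdims=True)
    W2 = rng.normal(size=(n_pairs, d))
    W2 = radius * W2 / np.linalg.norm(W2, axis=1, keepdims=True)
    # Loss vectors ℓ(h_w(x_i), h_{w'}(x_i)): entry (i, j) is the loss between
    # the predictions of the j-th candidate pair on the i-th sample point.
    L = loss(X @ W.T, X @ W2.T)  # shape (n, n_pairs)
    total = 0.0
    for _ in range(n_sigma):
        sigma = rng.choice([-1.0, 1.0], size=n)
        # sigma @ L gives sum_i sigma_i * loss_ij for each candidate pair j;
        # take the max over pairs as a stand-in for the supremum.
        total += np.max(sigma @ L) / n
    return total / n_sigma

# Usage: absolute-error loss on synthetic data.
X = np.random.default_rng(1).normal(size=(100, 5))
est = empirical_rademacher_hdh(X, lambda a, b: np.abs(a - b))
print(est)
```

Because the max is taken over a finite candidate set, the estimate lower-bounds the true supremum-based quantity; increasing `n_pairs` tightens the approximation at linear cost in memory.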

