GRADIENT ESTIMATION FOR UNSEEN DOMAIN RISK MINIMIZATION WITH PRE-TRAINED MODELS

Anonymous

Abstract

Domain generalization aims to build generalized models that perform well on unseen domains when only source domains are available for model optimization. Recent studies have demonstrated that large-scale pre-trained models can play an important role in domain generalization by providing their generalization power. However, large-scale pre-trained models are not fully equipped with target task-specific knowledge due to a discrepancy between the pre-training objective and the target task. Although the task-specific knowledge can be learned from source domains by fine-tuning, this hurts the generalization power of the pre-trained models because of gradient bias toward the source domains. To address this issue, we propose a new domain generalization method that estimates unobservable gradients that reduce potential risks in unseen domains, using a large-scale pre-trained model. Our proposed method allows the pre-trained model to learn task-specific knowledge further while preserving its generalization ability with the estimated gradients. Experimental results show that our proposed method outperforms baseline methods on DOMAINBED, a standard benchmark in domain generalization. We also provide extensive analyses to demonstrate that the estimated unobserved gradients relieve the gradient bias, and that the pre-trained model learns the task-specific knowledge without sacrificing its generalization power.

1. INTRODUCTION

Many machine learning studies assume that training and test data are independent and identically distributed (i.i.d.). However, this i.i.d. assumption does not always hold in real-world scenarios, where distribution shifts between training and test data occur frequently. Thus, traditional machine learning models often show poor performance on unseen domains shifted from source (training) domains (Quinonero-Candela et al., 2008; Torralba & Efros, 2011). To tackle this problem, domain generalization has attracted much attention recently. The main goal of domain generalization is to build generalized models that also perform the target task (e.g., classification) well on unseen domains (e.g., painted images) when only source domains (e.g., realistic images) are accessible during model optimization. Early domain generalization studies (Muandet et al., 2013; Ganin et al., 2016; Li et al., 2018b) have focused on learning domain-invariant representations across the source domains. However, Gulrajani & Lopez-Paz (2021) have recently shown that simple empirical risk minimization (ERM) (Vapnik, 1999) outperforms the previous methods on DOMAINBED, a benchmark for domain generalization, with a pre-trained ResNet-50 (He et al., 2016). Moreover, Yu et al. (2021) provide empirical evidence that large-scale pre-trained models could play an important role in domain generalization by providing their generalization power.

Motivated by this, several studies have begun to leverage the generalization power of large-scale pre-trained models. Cha et al. (2022) employ a pre-trained model for regularization, considering it as an approximation of the oracle model on any domain, and Li et al. (2022) utilize a frozen pre-trained model as a feature extractor. These studies have proven the usefulness of pre-trained models in domain generalization. However, the pre-trained models used in those studies cannot learn task-specific knowledge further, since they are frozen during model optimization to preserve their generalization ability. To learn the task-specific knowledge, one can choose fine-tuning, which updates all the parameters of a pre-trained model by optimizing it on the source domains.
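To make the ERM baseline discussed above concrete, the following toy sketch minimizes the average risk over several source domains with a linear classifier. All names (`make_domain`, `erm_step`), data, and hyperparameters are illustrative assumptions, not taken from the paper or from DOMAINBED.

```python
# Toy sketch of empirical risk minimization (ERM) across source domains:
# average the per-domain gradients and take plain gradient-descent steps.
# Data, function names, and hyperparameters are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

def make_domain(shift, n=200):
    """One 'source domain': inputs shifted by `shift`, shared labeling rule."""
    X = rng.normal(loc=shift, scale=1.0, size=(n, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(float)  # same rule in every domain
    return X, y

domains = [make_domain(0.0), make_domain(1.5)]  # two shifted source domains

def risk_and_grad(w, X, y):
    """Logistic risk and its gradient for a bias-free linear model."""
    p = 1.0 / (1.0 + np.exp(-X @ w))
    risk = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    grad = X.T @ (p - y) / len(y)
    return risk, grad

def erm_step(w, domains, lr=0.5):
    # ERM treats all source domains alike: descend on the average gradient.
    grads = [risk_and_grad(w, X, y)[1] for X, y in domains]
    return w - lr * np.mean(grads, axis=0)

w = np.zeros(2)
for _ in range(100):
    w = erm_step(w, domains)

# Average source risk drops well below the chance level log 2 ~= 0.693.
avg_risk = np.mean([risk_and_grad(w, X, y)[0] for X, y in domains])
```

Note that this sketch only captures the baseline: nothing in the averaged gradient accounts for domains that are never observed, which is exactly the gap the gradient-estimation idea above targets.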

