SIMPLE: SPECIALIZED MODEL-SAMPLE MATCHING FOR DOMAIN GENERALIZATION

Abstract

In domain generalization (DG), most existing methods aspire to fine-tune a specific pretrained model through novel DG algorithms. In this paper, we propose an alternative direction, i.e., to efficiently leverage a pool of pretrained models without fine-tuning. Through extensive empirical and theoretical evidence, we demonstrate that (1) pretrained models already possess generalization ability to some extent, yet no single pretrained model is best across all distribution shifts, and (2) out-of-distribution (OOD) generalization error depends on the fitness between the pretrained model and unseen test distributions. This analysis motivates us to incorporate diverse pretrained models and to dispatch the best-matched models to each OOD sample by means of recommendation techniques. To this end, we propose SIMPLE, a specialized model-sample matching method for domain generalization. First, the predictions of pretrained models are adapted to the target domain by a linear label space transformation. A matching network aware of model specialty is then proposed to dynamically recommend proper pretrained models to predict each test sample. Experiments on DomainBed show that our method achieves significant performance improvements (up to 12.2% on individual datasets and 3.9% on average) over state-of-the-art (SOTA) methods, and achieves a further 6.1% gain by enlarging the pretrained model pool. Moreover, our method is highly efficient, achieving more than 1000× training speedup compared to conventional DG methods that fine-tune a pretrained model. Code and supplemental materials are available at https://seqml.github.io/simple.
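To make the two components concrete, the sketch below illustrates the overall inference flow under simplifying assumptions: frozen pretrained models are mocked as fixed linear maps, the per-model label-space adapters and the matching network are single linear layers with random (placeholder) weights rather than the learned parameters of the paper, and all names (`make_frozen_model`, `predict`, etc.) are hypothetical, not from the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for a pool of frozen pretrained models: each maps
# an 8-dim feature vector to logits over its own source label space.
def make_frozen_model(in_dim, n_src_classes, seed):
    W = np.random.default_rng(seed).normal(size=(in_dim, n_src_classes))
    return lambda x: x @ W  # frozen: W is never updated

src_classes = [10, 5, 7]
models = [make_frozen_model(8, c, s) for s, c in enumerate(src_classes)]

# Component 1: linear label-space transformation -- one matrix per model,
# mapping its source label space to the target domain's 3 classes.
# (Learned in the paper; random placeholders here.)
adapters = [rng.normal(size=(c, 3)) for c in src_classes]

# Component 2: matching network -- scores each (sample, model) pair; a
# softmax over models gives per-sample dispatch weights. A single linear
# layer is an illustrative simplification of the specialty-aware network.
match_W = rng.normal(size=(len(models), 8))

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def predict(x):  # x: (batch, 8) target-domain features
    weights = softmax(x @ match_W.T)                    # (batch, n_models)
    per_model = np.stack([softmax(m(x) @ A)             # adapted predictions
                          for m, A in zip(models, adapters)], axis=1)
    return (weights[..., None] * per_model).sum(axis=1)  # (batch, 3)

probs = predict(rng.normal(size=(4, 8)))
```

Because the dispatch weights sum to one per sample and each adapted prediction is a probability vector, the weighted combination is itself a valid distribution over the target classes; no model is ever fine-tuned, which is the source of the claimed training speedup.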

1. INTRODUCTION

Distribution shift is a common problem in real-world applications, which breaks the independent and identically distributed (i.i.d.) assumption of machine learning algorithms (Wang et al., 2022). Mismatches between training and test distributions, which are quite common in reality, can largely deteriorate model performance and make machine learning models infeasible for practical applications (González & Abu-Mostafa, 2015). Therefore, enhancing the generalization ability of models has attracted increasing attention (Cha et al., 2021; Zhang et al., 2022). Given its practical significance, various methods have been proposed, e.g., domain alignment (Ganin et al., 2016; Gong et al., 2019; Arjovsky et al., 2019), meta-learning (Finn et al., 2017; Dou et al., 2019; Du et al., 2020), and ensemble learning (Mancini et al., 2018; Cha et al., 2021; Arpit et al., 2021). The effectiveness of DG algorithms is generally verified by fine-tuning a pretrained ResNet (He et al., 2016) model with these algorithms (Gulrajani & Lopez-Paz, 2020). It has been demonstrated that these algorithms improve upon the empirical risk minimization (ERM) baseline with a ResNet-50 backbone (Arpit et al., 2021; Wiles et al., 2021). Meanwhile, recent studies show that neural architectures and pretraining methods have a large impact on model robustness to distribution shifts (Radford et al., 2021; Wiles et al., 2021). For example, vision transformers are more robust to texture and style shifts compared with ResNet-based models (Zhang et al., 2022), which are instead superior to transformer-based models on dense image classification tasks (Liu et al., 2022). In terms of pretraining, using pretraining datasets other than ImageNet-1k improves the generalization performance in one test domain, yet leads to performance degradation in another (Kim et al., 2022).

