TOWARDS ESTIMATING TRANSFERABILITY USING HARD SUBSETS

Abstract

As transfer learning techniques are increasingly used to transfer knowledge from a source model to a target task, it becomes important to quantify which source models are suitable for a given target task without performing computationally expensive fine-tuning. In this work, we propose HASTE (HArd Subset TransfErability), a new strategy to estimate the transferability of a source model to a particular target task using only a harder subset of the target data. By leveraging the model's internal and output representations, we introduce two techniques, one class-agnostic and one class-specific, to identify harder subsets, and we show that HASTE can be combined with any existing transferability metric to improve its reliability. We further analyze the relation between HASTE and the optimal average log-likelihood as well as the negative conditional entropy, and empirically validate our theoretical bounds. Our experimental results across multiple source model architectures, target datasets, and transfer learning tasks show that HASTE-modified metrics are consistently better than or on par with state-of-the-art transferability metrics. Our code is available here.

1. INTRODUCTION

Transfer learning (Pan & Yang, 2009; Torrey & Shavlik, 2010; Weiss et al., 2016) aims to improve the performance of models on target tasks by utilizing knowledge from source tasks. With the increasing development of large-scale pre-trained models (Devlin et al., 2019; Chen et al., 2020a;b; Radford et al., 2021b), and the availability of multiple model choices for transfer learning (e.g., the model hubs of PyTorch, TensorFlow, and Hugging Face), it is critical to estimate their transferability without training on the target task and to determine how effectively transfer learning algorithms will transfer knowledge from the source to the target task. To this end, transferability estimation metrics (Zamir et al., 2018b; Achille et al., 2019; Tran et al., 2019b; Pándy et al., 2022; Nguyen et al., 2020) have recently been proposed to quantify how easy it is to use the knowledge learned by these models with minimal to no additional training on the target dataset. Given multiple pre-trained source models and target datasets, estimating transferability is essential because it is non-trivial to determine which source model transfers best to a target dataset, and because training models for all source-target combinations can be computationally expensive. Recent years have seen several approaches (Zamir et al., 2018b; Achille et al., 2019; Tran et al., 2019b; Pándy et al., 2022; Nguyen et al., 2020) for estimating the transferability of a source model to a given target task. However, existing methods often require performing the transfer learning task for parameter optimization (Achille et al., 2019; Zamir et al., 2018b) or make strong assumptions about the source and target datasets (Tran et al., 2019b; Zamir et al., 2018b). In addition, they are limited to estimating transferability for specific source architectures (Pándy et al., 2022) or achieve lower performance when there are large domain differences between the source and target datasets (Nguyen et al., 2020).
This has recently led to questions about the applicability of such metrics beyond specific settings (Agostinelli et al., 2022a). Prior works in other contexts (Khan et al., 2018; Agarwal et al., 2022; Zhang et al., 2021b; Soviany et al., 2022; D'souza et al., 2021) show that machine learning (ML) models find some samples easier to learn, while others are much harder. In this work, we observe and leverage a similar phenomenon in transfer learning tasks (Figure 1a), where images belonging to the harder subset of the target dataset achieve lower prediction accuracy than images from the easier subset. The key principle is that easy samples do not contribute much when comparing the performance of a pre-trained model on multiple datasets or ranking the performance of different models on a given dataset. Additionally, in Figure 1b, we observe qualitatively that easy examples of the target dataset (Caltech101) comprise images that are in-distribution with respect to the source dataset (ImageNet), whereas the harder subset contains out-of-distribution clip-art images that are not present in the source dataset and, hence, may be more challenging for the transfer learning process.

Present work. In this work, we build on the aforementioned observation and propose a novel framework, HASTE (HArd Subset TransfErability), to estimate transferability using only the hardest subset of the target dataset. More specifically, we introduce two complementary techniques, one class-agnostic and one class-specific, to identify harder subsets of the target dataset using the model's internal and output representations (Section 4.1). Further, we theoretically and empirically show that HASTE transferability metrics inherit the properties of their baseline metrics and achieve tighter lower and upper bounds (Section 4.2).
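As a simplified illustration of the class-agnostic idea, one can rank target samples by the source model's output confidence and keep the least confident fraction as the hard subset. This is only a sketch of the intuition (the paper's actual techniques, described in Section 4.1, also use the model's internal representations); the function name and the toy numbers below are hypothetical:

```python
import numpy as np

def hardest_subset(probs: np.ndarray, frac: float = 0.2) -> np.ndarray:
    """Return indices of the hardest `frac` of target samples.

    `probs` holds a source model's softmax outputs on the target data,
    shape (n_samples, n_classes). Samples with the lowest top-1
    confidence are treated as hardest -- a simplified, class-agnostic
    hardness proxy.
    """
    confidence = probs.max(axis=1)           # per-sample top-1 confidence
    n_hard = max(1, int(frac * len(probs)))
    return np.argsort(confidence)[:n_hard]   # least confident first

# Toy example: 5 target samples, 3 classes.
probs = np.array([
    [0.98, 0.01, 0.01],  # easy: confident prediction
    [0.40, 0.35, 0.25],  # hard: diffuse prediction
    [0.90, 0.05, 0.05],
    [0.34, 0.33, 0.33],  # hardest: near-uniform prediction
    [0.70, 0.20, 0.10],
])
hard_idx = hardest_subset(probs, frac=0.4)
print(sorted(hard_idx.tolist()))  # [1, 3]
```

Any existing transferability metric can then be evaluated on the samples indexed by `hard_idx` instead of the full target set, which is the sense in which HASTE modifies a baseline metric.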
We perform experiments across a range of transfer learning tasks such as source architecture selection (Section 5.1), target dataset selection (Section 5.2), and ensemble model selection (Section 5.4), as well as on other tasks such as semantic segmentation (Section 5.3) and language models (Section 5.5). Our results show that HASTE scores correlate better with the actual transfer accuracy than their corresponding counterparts (Nguyen et al., 2020; Tran et al., 2019a; Pándy et al., 2022). Finally, we establish that our findings are agnostic to the choice of source architecture used to identify harder subsets, that they scale to transfer learning tasks across different data domains, and that utilizing the hardest subsets can be highly beneficial for estimating transferability.
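Transferability metrics such as the ones above are commonly validated by correlating their cheap, training-free scores with the expensive accuracies obtained after actually fine-tuning each candidate source model. A minimal sketch of this evaluation protocol, using Pearson correlation and entirely hypothetical numbers:

```python
import numpy as np

def pearson_corr(scores, accs):
    """Pearson correlation between transferability scores and transfer accuracies."""
    return float(np.corrcoef(scores, accs)[0, 1])

# Hypothetical numbers: four candidate source models, each with a cheap
# transferability score and the accuracy obtained after fine-tuning.
scores = [0.61, 0.78, 0.55, 0.70]  # computed without any training
accs = [0.72, 0.88, 0.69, 0.81]    # ground truth from fine-tuning
r = pearson_corr(scores, accs)
print(round(r, 3))  # close to 1.0: the metric ranks source models well
```

A higher correlation means the metric can be trusted to pick a good source model without running any fine-tuning; rank correlations such as Kendall's tau are also commonly reported for the same purpose.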

2. RELATED WORK

This work lies at the intersection of transfer learning and the diverse metrics used to estimate transferability from a source model to a target dataset. We discuss related work for each of these topics below.

Transfer Learning (TL). Transfer learning can be organized into three broad categories: i) Inductive Transfer (Erhan et al., 2010; Yosinski et al., 2014), which leverages inductive bias; ii) Transductive Transfer, commonly known as Domain Adaptation (Wang & Deng, 2018; Wilson & Cook, 2020); and iii) Task Transfer (Zamir et al., 2018a; Pal & Balasubramanian, 2019), which transfers between different tasks instead of models. Among these, the most common form of a transfer learning task is fine-tuning a pre-trained source model on a given target dataset. For instance, recent works have demonstrated the use of large-scale pre-trained models such as CLIP (Radford et al., 2021a) and VirTex (Desai & Johnson, 2021) for learning representations for different source tasks.

Transferability Metrics. Despite the development of a plethora of source models, achieving an optimal transfer for a given target task is still a nascent research area, as it is non-trivial to identify the best source model or dataset for efficient TL. Transferability metrics are used as proxy scores to



Code: https://anonymous.4open.science/r/haste/



Figure 1: Analyzing the impact of hard subsets in transfer learning. Column (a): Results show the accuracy of different bins of a target dataset (Caltech101) based on their hardness. Across two source models (VGG-19 and ResNet-18) trained on the ImageNet dataset, we observe that the accuracy for images in the hardest subset (B1) is lower than for the easier subset (B5). Column (b): Top-10 images from hard and easy subsets show that harder subsets comprise images (cliparts) that are out-of-distribution when compared to the source dataset images. See Figures 5-9 for more qualitative images for different source-target pairs.

