REVISIT FINETUNING STRATEGY FOR FEW-SHOT LEARNING TO TRANSFER THE EMBEDDINGS

Anonymous authors
Paper under double-blind review

Abstract

Few-Shot Learning (FSL) aims to learn a simple and effective inductive bias from limited novel samples. Recently, many methods have focused on re-training a randomly initialized linear classifier to adapt it to the novel features extracted by a pre-trained feature extractor (called Linear-Probing-based methods). These methods typically assume that the pre-trained feature extractor is robust enough, i.e., that finetuning is not needed, and hence the feature extractor is not adapted to the novel samples. However, the unadapted feature extractor distorts the features of novel samples, because the robustness assumption may not hold, especially on out-of-distribution samples. To extract undistorted features, we design Linear-Probing-Finetuning with Firth-Bias (LP-FT-FB), which yields an accurate bias on the limited samples for better finetuning of the pre-trained feature extractor, providing stronger transfer ability. In LP-FT-FB, we further propose inverse Firth Bias Reduction (i-FBR) to regularize the over-parameterized feature extractor, on which standard FBR does not work well. The proposed i-FBR effectively alleviates over-fitting of the feature extractor during finetuning and helps extract undistorted novel features. To show the effectiveness of the designed LP-FT-FB, we conducted comprehensive experiments on commonly used FSL datasets under different backbones for in-domain and cross-domain FSL tasks. The experimental results show that the proposed LP-FT-FB outperforms the SOTA FSL methods. The code is available at https://github.com/whzyf951620/LinearProbingFinetuningFirthBias.

1. INTRODUCTION

Few-Shot Learning (FSL) has recently developed quickly in the limited-data regime. FSL aims to learn a suitable inductive bias from the given limited samples of novel classes. In the basic pipeline, the whole model, consisting of a feature extractor and a classifier, is first pre-trained on samples of the base classes and then finetuned on the limited novel samples to obtain an inductive bias. However, the performance of the finetuned model drops significantly because the pre-trained model over-fits the few novel samples.



To address the over-fitting problem, meta-learning-based methods such as Prototypical Networks Snell et al. (2017) and MAML Finn et al. (2017) were proposed to learn learning strategies that yield a suitable inductive bias. Chen et al. then proposed Baseline++ Chen et al. (2019), showing that a simple Linear Probing (LP) strategy can achieve performance comparable to meta-learning-based methods. LP re-trains a linear classifier to adapt to the novel samples without updating the whole model. Following Baseline++, many LP-based FSL methods, such as S2M2 Mangla et al. (2020), RFS Tian et al. (2020), and EMD Zhang et al. (2020), were proposed to obtain a more powerful fully-trained feature extractor. LP-based FSL methods assume that the pre-trained feature extractor is robust enough for novel samples Yang et al. (2021); Tian et al. (2020) and hence does not need to be finetuned.
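To make the Linear Probing step concrete, the following is a minimal NumPy sketch, not the authors' implementation: a fixed random projection stands in for a feature extractor pre-trained on base classes, and a synthetic 2-way 5-shot episode stands in for novel data. Only the linear classifier (softmax regression) is trained; the extractor stays frozen.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical frozen "feature extractor": a fixed random projection
# standing in for a network pre-trained on the base classes.
W_frozen = rng.normal(size=(32, 64))

def extract_features(x):
    # Frozen during linear probing: no gradient reaches this step.
    return np.maximum(x @ W_frozen, 0.0)  # ReLU features

def linear_probe(support_x, support_y, n_classes, lr=0.1, steps=200):
    """Re-train only a linear classifier (softmax regression by
    gradient descent) on frozen features of the novel support set."""
    feats = extract_features(support_x)        # (N, 64)
    W = np.zeros((feats.shape[1], n_classes))  # classifier weights
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[support_y]
    for _ in range(steps):
        logits = feats @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        grad = (p - onehot) / len(feats)       # softmax cross-entropy gradient
        W -= lr * feats.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

# Toy 2-way 5-shot episode: two well-separated Gaussian "novel classes".
support_x = np.concatenate([rng.normal(0.0, 1.0, (5, 32)),
                            rng.normal(2.0, 1.0, (5, 32))])
support_y = np.array([0] * 5 + [1] * 5)
W, b = linear_probe(support_x, support_y, n_classes=2)
preds = (extract_features(support_x) @ W + b).argmax(axis=1)
print("support accuracy:", (preds == support_y).mean())
```

The point of the sketch is what is *not* updated: `W_frozen` never changes, which is exactly the robustness assumption the paper challenges, since a frozen extractor can distort features of out-of-distribution novel samples.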

