ADAPTIVE PARAMETRIC PROTOTYPE LEARNING FOR CROSS-DOMAIN FEW-SHOT CLASSIFICATION

Abstract

Cross-domain few-shot classification poses a much more challenging problem than its in-domain counterpart due to the domain shift between the training and test tasks. In this paper, we develop a novel Adaptive Parametric Prototype Learning (APPL) method under the meta-learning convention for cross-domain few-shot classification. Different from existing prototypical few-shot methods that use the averages of support instances to compute the class prototypes, we propose to learn class prototypes from the concatenated features of the support set in a parametric fashion and to meta-learn the model by enforcing prototype-based regularization on the query set. In addition, we fine-tune the model in the target domain in a transductive manner, using a weighted-moving-average self-training approach on the query instances. We conduct experiments on multiple cross-domain few-shot benchmark datasets. The empirical results demonstrate that APPL outperforms many state-of-the-art cross-domain few-shot learning methods.

1. INTRODUCTION

Benefiting from the development of deep neural networks, significant advances have been achieved on image classification with large amounts of annotated data. However, obtaining large amounts of annotated data is time-consuming and labour-intensive, and trained models are difficult to generalize to new categories of data. As a solution, few-shot learning (FSL) has been proposed to classify instances from unseen classes using only a few labeled instances. FSL methods usually use a base dataset with labeled images to train a prediction model in the training phase. The model is then fine-tuned on the prediction task of novel categories with a few labeled instances (i.e., the support set), and finally evaluated on the test data (i.e., the query set) from the same novel categories in the testing phase. FSL has been widely studied in the in-domain setting, where the training and test tasks are from the same domain (Finn et al., 2017; Snell et al., 2017; Lee et al., 2019). However, when the training and test tasks are in different domains, domain shift makes the resulting cross-domain few-shot learning problem much more challenging than its in-domain counterpart. Recently, several methods have made progress on cross-domain few-shot learning, including ones based on data augmentation and data generation (Wang & Deng, 2021; Yeh et al., 2020; Islam et al., 2021) and self-supervised learning (Phoo & Hariharan, 2020). However, such data generation and augmentation methods increase the computational cost and do not scale well to higher-shot scenarios (Wang & Deng, 2021). Other works require either large amounts of labeled data from multiple source domains (Hu et al., 2022) or substantial unlabeled data from the target domain during the source training phase (Phoo & Hariharan, 2020; Islam et al., 2021; Yao, 2021). Such requirements are hard to meet and hence hamper their applicability in many domains.
Although some existing prototype-based few-shot methods have also been applied to cross-domain few-shot learning due to their simplicity and computational efficiency (Snell et al., 2017; Satorras & Estrach, 2018), these standard methods lack sufficient capacity for handling large cross-domain shifts and adapting to target domains. In this paper, we propose a novel Adaptive Parametric Prototype Learning (APPL) method under the meta-learning convention for cross-domain few-shot image classification. APPL introduces a parametric prototype calculator network (PCN) that learns class prototypes from the concatenated feature vectors of the support instances, ensuring inter-class discriminability and intra-class cohesion through prototype regularization losses. The PCN is meta-learned on the source domain using the labeled query instances. In the target domain, we deploy a weighted-moving-average (WMA) self-training approach that leverages the unlabeled query instances to fine-tune the prototype-based prediction model in a transductive manner. With the PCN and the prototype regularizations, the proposed method is expected to generalize better when learning class prototypes in the feature embedding space, and hence to effectively mitigate domain shift and adapt to the target domain through WMA self-training. Comprehensive experiments on eight cross-domain few-shot learning benchmark datasets demonstrate the efficacy of the proposed APPL for cross-domain few-shot classification in comparison with existing state-of-the-art methods. The contributions of our proposed method are as follows:

1. We propose a novel parametric Prototype Calculator Network (PCN). Our key contribution is a parameterization mechanism that generates more representative prototypes, together with two loss functions that learn discriminative class prototypes by enforcing both inter-class discriminability and intra-class cohesion in the extracted feature space.

2. We propose a WMA self-training strategy tailored to the cross-domain few-shot learning (CDFSL) problem. Compared to existing methods, it removes the barrier of requiring large amounts of additional target-domain data and reduces domain shift by generating better pseudo-labels. It keeps the produced pseudo-labels stable and clean (not noisy) by jointly employing three mechanisms: weighted-moving-average updating of the prediction vectors, a rectified annealing schedule for the WMA, and selective sampling of only the confident pseudo-labels to adapt the model.

3. Our proposed method outperforms existing methods on both low-shot (5-shot) and high-shot (20-shot and 50-shot) classification tasks.
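The core of the WMA self-training mechanism in contribution 2 can be sketched in a few lines. The following is a schematic NumPy illustration only, not the authors' implementation: the function names, the smoothing coefficient `alpha`, and the confidence threshold are all assumed for illustration, and the rectified annealing schedule is omitted.

```python
import numpy as np

def wma_update(running_probs, new_probs, alpha):
    """Weighted-moving-average update of per-query prediction vectors."""
    return alpha * running_probs + (1.0 - alpha) * new_probs

def select_confident(running_probs, threshold=0.7):
    """Keep only queries whose smoothed prediction is confident enough,
    returning their indices and pseudo-labels."""
    conf = running_probs.max(axis=1)
    mask = conf >= threshold
    return np.where(mask)[0], running_probs.argmax(axis=1)[mask]

# Two query instances with 3-way predictions from two successive epochs
running = np.array([[0.6, 0.3, 0.1], [0.4, 0.35, 0.25]])
new = np.array([[0.9, 0.05, 0.05], [0.5, 0.3, 0.2]])
running = wma_update(running, new, alpha=0.5)
idx, pseudo = select_confident(running, threshold=0.7)
print(idx, pseudo)  # → [0] [0]
```

Smoothing the prediction vectors across epochs damps oscillating predictions, and the confidence filter drops the ambiguous queries, so only stable, high-confidence pseudo-labels are used to adapt the model.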

2. RELATED WORK

2.1. FEW-SHOT LEARNING

Most FSL studies have focused on the in-domain setting. FSL approaches can be grouped into three main categories: metric-based and meta-learning approaches (Finn et al., 2017; Snell et al., 2017; Lee et al., 2019), transfer learning approaches (Guo et al., 2019; Jeong & Kim, 2020; Ge & Yu, 2017; Yosinski et al., 2014; Dhillon et al., 2019), and augmentation and generative approaches (Zhang et al., 2018; Lim et al., 2019; Hariharan & Girshick, 2017; Schwartz et al., 2018; Reed et al., 2018). In particular, the representative meta-learning approach MAML (Finn et al., 2017) learns good initialization parameters from various source tasks that make the model easy to adapt to new tasks. The non-parametric metric-based approach MatchingNet (Vinyals et al., 2016) employs attention and memory to train a network that learns from few labeled samples. ProtoNet (Snell et al., 2017) learns a metric space where each class is represented by the average of the available support instances and classifies query instances based on their distances to the class prototypes. A few meta-learning works, such as RelationNet (Sung et al., 2018), GNN (Satorras & Estrach, 2018) and the Transductive Propagation Network (TPN) (Liu et al., 2019), exploit the similarities between support and query instances to classify the query instances. MetaOpt uses meta-learning to train a feature encoder that obtains discriminative features for a linear classifier (Lee et al., 2019). Transfer learning methods initially train a model on base tasks and then use various fine-tuning methods to adapt the model to novel tasks (Guo et al., 2019; Jeong & Kim, 2020; Ge & Yu, 2017; Yosinski et al., 2014; Dhillon et al., 2019). Generative and augmentation approaches generate additional samples to increase the amount of available data during training (Zhang et al., 2018; Lim et al., 2019; Hariharan & Girshick, 2017; Schwartz et al., 2018; Reed et al., 2018).
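The prototype-based classification rule used by ProtoNet can be sketched minimally as follows. This is a toy NumPy illustration of the idea, not code from any of the cited works; the function names, array shapes, and the 2-way 2-shot episode are assumed for illustration.

```python
import numpy as np

def prototypes_by_mean(support_feats, support_labels, n_classes):
    """ProtoNet-style prototypes: per-class mean of support embeddings."""
    return np.stack([support_feats[support_labels == c].mean(axis=0)
                     for c in range(n_classes)])

def classify_by_distance(query_feats, protos):
    """Assign each query to the nearest prototype (squared Euclidean)."""
    d = ((query_feats[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

# Toy 2-way 2-shot episode with 2-D features
support = np.array([[0., 0.], [0., 1.], [5., 5.], [5., 6.]])
labels = np.array([0, 0, 1, 1])
protos = prototypes_by_mean(support, labels, n_classes=2)
query = np.array([[0.2, 0.3], [4.8, 5.5]])
print(classify_by_distance(query, protos))  # → [0 1]
```

The simple per-class averaging in `prototypes_by_mean` is exactly the step that APPL replaces with a learned, parametric prototype calculator.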

2.2. CROSS-DOMAIN FEW-SHOT LEARNING

Recently, cross-domain few-shot learning (CDFSL) has started receiving more attention (Guo et al., 2020; Phoo & Hariharan, 2020). Tseng et al. (2020) propose a feature-wise transformation (FWT) layer that is used jointly with standard few-shot learning methods for cross-domain few-shot learning. The FWT layer uses affine transformations to augment the learned features in order to help

