ROBUST META-LEARNING WITH NOISE VIA EIGEN-REPTILE

Abstract

Recent years have seen a surge of interest in meta-learning techniques for tackling the few-shot learning (FSL) problem. However, the meta-learner's initial model is prone to meta-overfitting, as only a few samples with sampling noise are available. Besides, when handling data sampled with label noise for FSL, the meta-learner can be extremely sensitive to label noise. We address these two challenges: FSL with sampling noise and FSL with label noise. In particular, we first cast the meta-overfitting problem (overfitting on sampling and label noise) as a gradient noise problem, since the few available samples cause the meta-learner to overfit on existing examples (clean or corrupted) of an individual task at every gradient step. We present Eigen-Reptile (ER), which updates the meta-parameters with the main direction of historical task-specific parameters to alleviate gradient noise. Specifically, the main direction is computed by a mechanism tailored to the large size of the parameters. Furthermore, to obtain a more accurate main direction for Eigen-Reptile in the presence of label noise, we propose Introspective Self-paced Learning (ISPL), which constructs a plurality of prior models to determine which samples should be abandoned. We demonstrate the effectiveness of Eigen-Reptile and ISPL both theoretically and experimentally. Moreover, our experiments on different tasks demonstrate that the proposed methods outperform or achieve highly competitive performance compared with state-of-the-art methods, with or without noisy labels.

1. INTRODUCTION

Meta-learning, also known as learning to learn, is key to few-shot learning (FSL) (Vinyals et al., 2016; Wang et al., 2019a). One family of meta-learning methods is the gradient-based methods, which usually optimize meta-parameters as an initialization that can quickly adapt to new tasks with few samples. However, fewer samples mean a higher risk of meta-overfitting, as the ubiquitous sampling noise in mini-batches cannot be ignored. Moreover, existing gradient-based meta-learning methods are fragile with few samples. For instance, a popular recent method, Reptile (Nichol et al., 2018), updates the meta-parameters along the inner-loop direction, which points from the current initialization to the last task-specific parameters. Nevertheless, as shown by the bold line of Reptile in Figure 1, the gradient update at the last step introduces a significant disturbance into the update direction of the meta-parameters, since sampling noise leads the meta-parameters to overfit on the few trained samples at each gradient step. Many prior works have proposed solutions to the meta-overfitting problem, such as using dropout (Bertinetto et al., 2018; Lee et al., 2020) and modifying the loss function (Jamal & Qi, 2019), which operate at the model level. This paper instead casts the meta-overfitting problem as a gradient noise problem arising from sampling noise during the gradient update (Wu et al., 2019). Neelakantan et al. (2015), among others, have shown that adding gradient noise can improve the generalization of neural networks when many samples are available. Indeed, as the model complexity penalty suggests, the generalization of a neural network improves as the number of samples grows, and adding gradient noise is, to a certain extent, equivalent to increasing the sample size. For FSL, however, only a few samples are available per task; in that case, the model not only memorizes the content that needs to be identified but also overfits on the noise (Zhang et al., 2016).
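Reptile's inner loop can be sketched in a few lines to make the gradient-noise issue concrete. This is a minimal illustration, not the original implementation: `task_gradient` and the toy linear task are assumptions introduced here. The point to notice is that only the last point of the trajectory, `history[-1]`, enters the meta-update, so noise in the final gradient step perturbs the initialization directly.

```python
import numpy as np

rng = np.random.default_rng(0)

def task_gradient(theta, task):
    """Mean-squared-error gradient for a toy linear task (x, y)."""
    x, y = task
    return 2 * x.T @ (x @ theta - y) / len(y)

def inner_loop(phi, task, n_steps=5, lr=0.05):
    """Run a few SGD steps on one task; keep every intermediate parameter vector."""
    theta = phi.copy()
    history = [theta.copy()]
    for _ in range(n_steps):
        theta = theta - lr * task_gradient(theta, task)
        history.append(theta.copy())
    return np.stack(history)              # shape: (n_steps + 1, n_params)

def reptile_update(phi, history, meta_lr=0.1):
    """Reptile moves the initialization toward the LAST task-specific point only,
    so sampling noise in the final step propagates into the meta-update."""
    return phi + meta_lr * (history[-1] - phi)

# toy few-shot task: 5 samples, 3 parameters, with sampling noise in y
x = rng.normal(size=(5, 3))
theta_true = np.array([1.0, -2.0, 0.5])
y = x @ theta_true + 0.1 * rng.normal(size=5)

phi = np.zeros(3)
hist = inner_loop(phi, (x, y))
phi = reptile_update(phi, hist)
```

Eigen-Reptile, by contrast, uses the whole trajectory `hist` rather than its endpoint.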
High-quality manually labeled data is often time-consuming and expensive to obtain. Low-cost approaches to collecting annotated data, such as querying search engines, introduce label noise. Moreover, training a meta-learner requires a large number of tasks, so it is not easy to guarantee data quality. Conceptually, the initialization learned by existing meta-learning algorithms can degrade severely in the presence of noisy labels. Intuitively, as shown in the noisy-label FSL setting of Figure 1, noisy labels cause a large random disturbance in the update direction. That is, label noise (Frénay & Verleysen, 2013) leads the meta-learner to overfit on wrongly labeled samples, which further aggravates the influence of gradient noise. Furthermore, conventional algorithms for learning with noisy labels require abundant data for each class (Hendrycks et al., 2018; Patrini et al., 2017); they therefore cannot be applied to the noisy FSL problem, where only a few samples are available per class. It is thus crucial to propose a method that addresses noisy FSL. In this paper, we propose Eigen-Reptile (ER). In particular, as shown in Figure 1, Eigen-Reptile updates the meta-parameters with the main direction of the task-specific parameters, which effectively alleviates gradient noise. Because of the large scale of neural network parameters, it is impractical to compute the eigenvectors of the historical parameters directly. We introduce a fast procedure for computing the main direction in FSL, which computes the eigenvectors of a matrix whose size scales with the number of inner-loop steps rather than with the number of parameters. Furthermore, we propose Introspective Self-paced Learning (ISPL), which constructs multiple prior models by random sampling; the prior models then discard high-loss samples from the dataset. We combine Eigen-Reptile with ISPL to address the noisy FSL problem, as ISPL improves the main direction computed in the presence of noisy labels.
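The step-scale trick mentioned above can be sketched as follows. For a trajectory matrix of k inner-loop parameter snapshots in p dimensions with k much smaller than p, the top eigenvector of the p x p covariance can be recovered from the small k x k Gram matrix (the classic "snapshot" PCA identity). This is a sketch under our reading of the description; the function name and the sign convention for orienting the direction are our own assumptions.

```python
import numpy as np

def main_direction(history):
    """Top principal direction of an inner-loop trajectory.

    history: (k, p) matrix of task-specific parameter snapshots, k << p.
    Instead of eigendecomposing the p x p covariance H^T H, eigendecompose
    the k x k Gram matrix H H^T and lift its top eigenvector back into
    parameter space: if H H^T v = lambda v, then H^T v (normalized) is the
    corresponding eigenvector of H^T H.
    """
    H = history - history.mean(axis=0)    # center the trajectory
    gram = H @ H.T                        # k x k, cheap to decompose
    vals, vecs = np.linalg.eigh(gram)     # eigenvalues in ascending order
    v = vecs[:, -1]                       # top eigenvector of the Gram matrix
    u = H.T @ v                           # lift to parameter space (length p)
    u /= np.linalg.norm(u)
    # orient the direction from the initialization toward later parameters
    if u @ (history[-1] - history[0]) < 0:
        u = -u
    return u
```

For k inner-loop steps this costs an eigendecomposition of a k x k matrix rather than a p x p one, which is what makes the approach feasible for networks with millions of parameters.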
Experimental results show that Eigen-Reptile significantly outperforms the baseline model by 5.35% on corrupted Mini-ImageNet 5-way 1-shot and by 3.66% on clean Mini-ImageNet 5-way 5-shot. Moreover, the proposed algorithms outperform or are highly competitive with state-of-the-art methods on few-shot classification tasks. The main contributions of this paper can be summarized as follows:

• We cast the meta-overfitting issue (overfitting on sampling and label noise) as a gradient noise issue under the meta-learning framework.

• We propose Eigen-Reptile, which can alleviate gradient noise effectively. Besides, we propose ISPL, which improves the performance of Eigen-Reptile in the presence of noisy labels.

• The proposed methods outperform or achieve highly competitive performance compared with state-of-the-art methods on few-shot classification tasks.
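The ISPL idea described in the introduction (several prior models trained on random subsets vote, via their losses, on which samples to discard) can be sketched as follows. The names `train_fn` and `loss_fn`, the half-size subsets, and the keep ratio are placeholder assumptions for illustration, not the paper's exact design.

```python
import numpy as np

def ispl_filter(x, y, train_fn, loss_fn, n_priors=5, keep_ratio=0.8, seed=0):
    """Introspective Self-paced Learning, sketched: fit several 'prior' models
    on random subsets, average each sample's loss across the priors, and drop
    the samples with the highest average loss (likely label noise).

    train_fn(x_sub, y_sub) -> model; loss_fn(model, x, y) -> per-sample losses.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    losses = np.zeros(n)
    for _ in range(n_priors):
        idx = rng.choice(n, size=max(1, n // 2), replace=False)
        model = train_fn(x[idx], y[idx])
        losses += loss_fn(model, x, y)    # score ALL samples with each prior
    losses /= n_priors
    keep = np.argsort(losses)[: int(np.ceil(keep_ratio * n))]
    return np.sort(keep)                  # indices of retained samples
```

On a toy regression task where one label is grossly corrupted, the corrupted sample accumulates a much larger average loss than the clean ones and is filtered out, after which the main direction is computed on the retained samples only.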

2. RELATED WORK

There are three main types of meta-learning approaches: metric-based meta-learning approaches (Ravi & Larochelle, 2016; Hochreiter et al., 2001; Andrychowicz et al., 2016; Liu et al., 2018; Santoro et al., 2016) , model-based meta-learning approaches (Vinyals et al., 2016; Koch et al., 2015; Mordatch, 2018; Sung et al., 2018; Snell et al., 2017; Oreshkin et al., 2018; Shyam et al., 2017) and



Figure 1: Inner loop steps of Reptile and Eigen-Reptile. Reptile updates the meta-parameters toward the last task-specific parameters, which is biased. Eigen-Reptile treats all samples more fairly. Note that the main direction is the eigenvector corresponding to the largest eigenvalue.

