ROBUST META-LEARNING WITH NOISE VIA EIGEN-REPTILE

Abstract

Recent years have seen a surge of interest in meta-learning techniques for tackling the few-shot learning (FSL) problem. However, the meta-learner's initial model is prone to meta-overfitting, as only a few samples with sampling noise are available. Moreover, when handling data sampled with label noise for FSL, the meta-learner can be extremely sensitive to label noise. To address these two challenges, we consider FSL with both sampling and label noise. In particular, we first cast the meta-overfitting problem (overfitting on sampling and label noise) as a gradient noise problem, since the scarcity of samples causes the meta-learner to overfit on the existing examples (clean or corrupted) of an individual task at every gradient step. We present Eigen-Reptile (ER), which updates the meta-parameters along the main direction of the historical task-specific parameters to alleviate gradient noise. Specifically, the main direction is computed by a special mechanism designed to handle the large size of the parameters. Furthermore, to obtain a more accurate main direction for Eigen-Reptile in the presence of label noise, we propose Introspective Self-paced Learning (ISPL), which constructs a plurality of prior models to determine which samples should be abandoned. We demonstrate the effectiveness of Eigen-Reptile and ISPL both theoretically and experimentally. Moreover, our experiments on different tasks show that the proposed methods outperform, or achieve highly competitive performance compared with, state-of-the-art methods with or without noisy labels.

1. INTRODUCTION

Meta-learning, also known as learning to learn, is the key to few-shot learning (FSL) (Vinyals et al., 2016; Wang et al., 2019a). One family of meta-learning methods is the gradient-based approach, which typically optimizes the meta-parameters as an initialization that can quickly adapt to new tasks with few samples. However, fewer samples mean a higher risk of meta-overfitting, as the ubiquitous sampling noise in a mini-batch cannot be ignored. Moreover, existing gradient-based meta-learning methods are fragile when samples are few. For instance, a popular recent method, Reptile (Nichol et al., 2018), updates the meta-parameters toward the inner-loop direction, i.e., from the current initialization to the last task-specific parameters. Nevertheless, as shown by the bold line of Reptile in Figure 1, the gradient update at the last step significantly disturbs the update direction of the meta-parameters, since sampling noise leads the meta-parameters to overfit the few trained samples at each gradient step. Many prior works have proposed solutions to the meta-overfitting problem, such as using dropout (Bertinetto et al., 2018; Lee et al., 2020) or modifying the loss function (Jamal & Qi, 2019), all of which operate at the model level. This paper instead casts the meta-overfitting problem as a gradient noise problem arising from sampling noise during gradient updates (Wu et al., 2019). Neelakantan et al. (2015), among others, have shown that adding gradient noise can improve the generalization of neural networks trained on large sample sets. Indeed, as model-complexity penalties suggest, generalization improves as the number of samples grows, and adding gradient noise is, to a certain extent, equivalent to enlarging the sample size. In FSL, however, each task provides only a few samples; in that case, the model not only memorizes the content it needs to identify but also overfits the noise (Zhang et al., 2016).
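To make the contrast concrete, the following is a minimal NumPy sketch of the two update directions on a toy task: the standard Reptile direction (initialization to last inner-loop iterate, which inherits the noise of the final gradient step) versus a main direction extracted from the whole inner-loop trajectory via SVD. The toy quadratic task, step counts, and function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def inner_sgd(theta0, grad_fn, steps=8, lr=0.1):
    """Run inner-loop SGD, recording the task-specific parameter trajectory."""
    traj = [theta0.copy()]
    theta = theta0.copy()
    for _ in range(steps):
        theta -= lr * grad_fn(theta)
        traj.append(theta.copy())
    return np.stack(traj)  # shape: (steps + 1, dim)

def reptile_direction(traj):
    # Reptile: the meta-update points from the initialization to the *last*
    # iterate, so noise in the final gradient step perturbs the whole direction.
    return traj[-1] - traj[0]

def main_direction(traj):
    # Eigen-Reptile idea (sketched): take the top principal direction of the
    # historical trajectory, which averages out per-step gradient noise.
    centered = traj - traj.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    v = vt[0]  # unit-norm top eigenvector of the trajectory covariance
    # Orient the eigenvector to agree with the overall movement.
    if v @ (traj[-1] - traj[0]) < 0:
        v = -v
    return v

# Toy task: noisy gradients of a quadratic with minimum at `target` (assumed).
target = np.array([1.0, -2.0])
grad_fn = lambda th: (th - target) + 0.3 * rng.normal(size=th.shape)

theta0 = np.zeros(2)
traj = inner_sgd(theta0, grad_fn)
meta_lr = 0.5
theta_reptile = theta0 + meta_lr * reptile_direction(traj)
theta_eigen = theta0 + meta_lr * np.linalg.norm(reptile_direction(traj)) * main_direction(traj)
```

Because the principal direction pools information from every recorded iterate rather than relying on the endpoint alone, a noisy final step has far less influence on the meta-update.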
High-quality manually labeled data is often time-consuming and expensive to obtain. Low-cost approaches to collecting annotated data, such as harvesting it from search engines, introduce label noise. Moreover, training a meta-learner requires a large number of tasks, so it is difficult to guarantee the quality of the data. Conceptually, the initialization learned by existing meta-learning algorithms can severely

