FEW-SHOT INCREMENTAL LEARNING USING HYPERTRANSFORMERS

Abstract

Incremental few-shot learning methods make it possible to learn without forgetting from multiple few-shot tasks arriving sequentially. In this work we approach this problem using the recently published HyperTransformer (HT): a hypernetwork that generates task-specific CNN weights directly from the support set. We propose to re-use these generated weights as an input to the HT for the next task of the continual-learning sequence. Thus, the HT uses the weights themselves as the representation of the previously learned tasks. This approach is different from most continual learning algorithms that typically rely on using replay buffers, weight regularization or task-dependent architectural changes. Instead, we show that the HT works akin to a recurrent model, relying on the weights from the previous task and a support set from a new task. We demonstrate that a single HT equipped with a prototypical loss is capable of learning and retaining knowledge about past tasks for two continual learning scenarios: incremental-task learning and incremental-class learning.

1. INTRODUCTION

Incremental few-shot learning combines the challenges of both few-shot learning and continual learning: it seeks a way to learn from very limited demonstrations presented continually to the learner. This combination is desirable since it represents a more genuine model of how biological systems, including humans, acquire new knowledge: we often do not need a large amount of information to learn a novel concept, and after learning it we retain that knowledge for a long time. Achieving this would also dramatically simplify important practical applications, such as robots continually adapting to a novel environment layout from an incoming stream of demonstrations. Another example is privacy-preserving learning, where users run the model sequentially on their private data, sharing only the weights that continually absorb the data without ever exposing it.

In this paper, we propose an INCREMENTAL HYPERTRANSFORMER (IHT) aimed at exploring the capability of the HT to update the CNN weights with information about new tasks while retaining the knowledge about previously learned tasks. In other words, given the weights θ_{t-1} generated after seeing the previous tasks {τ}_{τ=0}^{t-1} and a new task t, the IHT generates weights θ_t that are suited for all the tasks {τ}_{τ=0}^{t}. In order for the IHT to absorb a continual stream of tasks, we modified the loss function from the cross-entropy used in the HT to a more flexible prototypical loss (Snell et al., 2017). As the tasks come along, we maintain and update a set of prototypes in the embedding space, one for each class of every task seen so far. The prototypes are then used to predict the class and task attributes for a given input sample.
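The prototype mechanism described above can be illustrated with a minimal sketch. The function names, the toy Gaussian "embeddings," and the specific class layout below are hypothetical stand-ins (the paper's embeddings come from a CNN whose weights the HT generates); only the core idea from Snell et al. (2017) is shown: a prototype is the mean support embedding of a class, the prototype set grows as tasks arrive, and a query is classified by its nearest prototype.

```python
import numpy as np

def compute_prototypes(embeddings, labels):
    """One prototype per class: the mean of that class's support embeddings."""
    classes = np.unique(labels)
    return {int(c): embeddings[labels == c].mean(axis=0) for c in classes}

def nearest_prototype(prototypes, query):
    """Predict the class whose prototype is closest (Euclidean) to the query."""
    return min(prototypes, key=lambda c: np.linalg.norm(query - prototypes[c]))

# Toy setup: task 0 contributes classes 0 and 1, task 1 adds classes 2 and 3.
# Each class is a tight Gaussian cluster centered at its label value.
rng = np.random.default_rng(0)
protos = {}
for task_classes in ([0, 1], [2, 3]):
    emb = np.vstack([rng.normal(loc=c, scale=0.1, size=(5, 4)) for c in task_classes])
    lab = np.repeat(task_classes, 5)
    protos.update(compute_prototypes(emb, lab))  # prototype set grows across tasks

query = rng.normal(loc=2, scale=0.1, size=4)  # a sample from class 2 of task 1
print(nearest_prototype(protos, query))
```

Because the prototype set is simply accumulated across tasks, classes from earlier tasks remain predictable after later tasks are learned, which is what allows a single classifier to cover the whole sequence.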



We focus on a recently published few-shot learning method called HYPERTRANSFORMER (HT; Zhmoginov et al., 2022), which trains a large hypernetwork (Ha et al., 2016) by extracting knowledge from a set of training few-shot learning tasks. The HT is then able to directly generate weights of a much smaller Convolutional Neural Network (CNN) model focused on solving a particular task from just a few examples provided in the support set. It works by decoupling the task-domain knowledge (represented by a transformer; Vaswani et al., 2017) from the learner itself (a CNN), which only needs to know about the specific task that is being solved.
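The recurrent use of the HT described in this work reduces to a simple functional pattern: the weights for task t are generated from the weights for task t-1 plus the new support set. The sketch below is purely illustrative, with `toy_weight_generator` standing in for the full transformer-based HT (it just blends the previous weights with a crude support-set summary); only the θ_t = IHT(θ_{t-1}, support_t) structure is the point.

```python
import numpy as np

def toy_weight_generator(prev_weights, support_x, support_y):
    """Stand-in for the HT: maps (previous weights, new support set) to new
    weights. Labels are unused in this toy version; the real HT consumes them."""
    task_summary = support_x.mean(axis=0)            # crude task embedding
    if prev_weights is None:                         # first task in the sequence
        return task_summary
    return 0.5 * prev_weights + 0.5 * task_summary   # recurrent-style update

rng = np.random.default_rng(1)
theta = None
for t in range(3):                                   # a sequence of three tasks
    support_x = rng.normal(size=(10, 8))             # 10 support examples, 8 features
    support_y = rng.integers(0, 2, size=10)
    theta = toy_weight_generator(theta, support_x, support_y)  # theta_t from theta_{t-1}
print(theta.shape)
```

The weights themselves act as the only carried-over state, which is what distinguishes this approach from replay buffers or per-task architectural additions: no past data is stored, only the generated weights.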

