TASK-AGNOSTIC ONLINE META-LEARNING IN NON-STATIONARY ENVIRONMENTS

Abstract

Online meta-learning has recently emerged as a marriage between batch meta-learning and online learning, aiming at the capability of quick adaptation on new tasks in a lifelong manner. However, most existing approaches focus on the restrictive setting where the distribution of the online tasks remains fixed and the task boundaries are known. In this work, we relax these assumptions and propose a novel algorithm for task-agnostic online meta-learning in non-stationary environments. More specifically, we first propose two simple but effective detection mechanisms, for task switches and for distribution shift, based on empirical observations; these serve as a key building block for more refined online model updates in our algorithm: the task switch detection mechanism allows reuse of the best model available for the current task at hand, and the distribution shift detection mechanism differentiates the meta model update so as to preserve the knowledge for in-distribution tasks and quickly learn the new knowledge for out-of-distribution tasks. Motivated by recent advances in online learning, our online meta model updates are based only on the current data, which eliminates the need to store previous data as required in most existing methods. This crucial choice is also well supported by our theoretical analysis of dynamic regret in online meta-learning, which shows that a sublinear regret can be achieved by updating the meta model at each round using the current data only. Empirical studies on three different benchmarks clearly demonstrate the significant advantage of our algorithm over related baseline approaches.

1. INTRODUCTION

Two key aspects of human intelligence are the ability to quickly learn complex tasks and the ability to continually update a knowledge base for faster learning of future tasks. Meta-learning (Koch et al., 2015; Ravi & Larochelle, 2016; Finn et al., 2017) and online learning (Hannan, 1957; Shalev-Shwartz & Singer, 2007; Cesa-Bianchi & Lugosi, 2006) are two main research directions that try to equip learning agents with these abilities. In particular, meta-learning aims to facilitate quick learning of new unseen tasks by building a prior over model parameters from the knowledge of related tasks, whereas online learning deals with the problem where task data is sequentially revealed to a learning agent. To achieve fast adaptation on new tasks in a lifelong manner, online meta-learning (Finn et al., 2017; Harrison et al., 2020; Yao et al., 2020) has attracted much attention recently. In the setup where online tasks arrive one at a time, the objective of online meta-learning is to continuously update the meta prior so that, as the agent encounters more tasks, new tasks can be learnt more quickly. The agent typically maintains two separate models: the meta model, which captures the underlying common knowledge across tasks, and the online task model, which solves the current task at hand. Most of the existing studies (Finn et al., 2017; Acar et al., 2021) in online meta-learning follow a "resetting" strategy: quickly adapt the online task model from the meta model using the current data, update the meta model, and reset the online task model back to the updated meta model at the beginning of the next task. This strategy generally works well when the task boundaries are known and the task distribution remains stationary.
However, in many real-world data streams the task boundaries are not directly visible to the agent (Rajasegaran et al., 2022; Caccia et al., 2020; Harrison et al., 2020), and the task distribution can change dynamically during the online learning stage. In this work we therefore seek to solve the online meta-learning problem in these more realistic settings. Needless to say, efficiently solving the online meta-learning problem without known task boundaries in non-stationary environments is nontrivial due to the following key questions. (1) How should the meta model and the online task model be updated? Clearly, "resetting" at the moment new data arrives is not desirable, as adapting from the previous task model is preferable when the new data belongs to the same task as the previous data. Moreover, the meta model update should be distinct between in-distribution (IND) tasks, where the current knowledge should be preserved, and out-of-distribution (OOD) tasks, where the new knowledge should be learnt quickly. (2) How can the system be kept lightweight for fast online learning? The nature of online meta-learning precludes sophisticated learning algorithms, as the agent should be able to quickly adapt to different tasks, typically without access to previous data. Moreover, handling environment non-stationarity should not significantly increase the computational cost, since the environment can change quickly during online learning. The main contribution of this work is a novel online meta-learning algorithm for non-stationary environments with unknown task boundaries, which appropriately addresses the problems above. More specifically, motivated by empirical observations, we first propose two simple but effective mechanisms that detect task switches using the classification loss and detect distribution shift using the Helmholtz free energy (Liu et al., 2020), respectively.
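To make these two signals concrete, the sketch below gives a minimal, hypothetical implementation (function names and thresholds are illustrative, not taken from the paper): the free energy is the negative temperature-scaled log-sum-exp of the classifier logits, following Liu et al. (2020), and a jump of the current classification loss above a threshold serves as the task-switch signal.

```python
import numpy as np

def energy_score(logits, temperature=1.0):
    """Helmholtz free energy of a logit vector: E(x) = -T * logsumexp(f(x)/T).
    Lower energy indicates in-distribution; higher energy suggests OOD
    (following Liu et al., 2020)."""
    z = np.asarray(logits, dtype=float) / temperature
    m = z.max()  # shift by the max for a numerically stable log-sum-exp
    return -temperature * (m + np.log(np.exp(z - m).sum()))

def detect_task_switch(current_loss, loss_threshold):
    """A sudden rise of the classification loss above a threshold is read
    as a task switch (the threshold is a hypothetical tuning knob here)."""
    return current_loss > loss_threshold

def detect_distribution_shift(logits, energy_threshold):
    """Energy above a threshold flags the incoming data as out-of-distribution."""
    return energy_score(logits) > energy_threshold
```

Note that a confident prediction (one dominant logit) yields a large log-sum-exp and hence low energy, while flat logits yield high energy, which is what makes the score usable as a shift detector.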
Based on these detection mechanisms, our algorithm provides a finer treatment of the online model updates, which brings the following benefits: (1) (task knowledge reuse) the detection of task switches enables our algorithm to reuse the best model available for each task, avoiding the "resetting" to the meta model at each step employed in most previous studies; (2) (judicious meta model update) the detection of distribution shift allows our algorithm to update the meta model such that new knowledge is quickly learnt for out-of-distribution tasks while previous knowledge is preserved for in-distribution tasks; (3) (efficient memory usage) motivated by the advance in online learning (Mokhtari et al., 2016; Hazan et al., 2016) showing that updating the model online with the current data is sufficient to guarantee a sublinear regret, our algorithm does not reuse or store any previous data and updates the meta model at each online episode based only on the current data, which clearly differs from most existing studies (Finn et al., 2019; Yao et al., 2020; Rajasegaran et al., 2022) in online meta-learning. This design choice is also well supported by our theoretical analysis, which shows that updating the meta model at each round with only the current data can lead to desirable sublinear dynamic regret growth. Extensive experiments on three different benchmarks clearly show that our algorithm significantly outperforms existing methods.

Related Work: Meta-learning. Also known as learning to learn, meta-learning (Finn et al., 2017; Vinyals et al., 2016; Li et al., 2017) is a powerful tool for leveraging past experience from related tasks to quickly learn good task-specific models for new unseen tasks.
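This quick-adaptation idea can be illustrated with a minimal, hypothetical sketch on a toy linear model: a task-specific model is obtained by a single gradient step from a shared meta-initialization, and the meta-initialization is then updated using only the current task's data. The outer update here is a first-order (Reptile-style) simplification rather than the paper's actual algorithm, and all names are illustrative.

```python
import numpy as np

def loss_and_grad(w, X, y):
    """Squared-error loss of a linear model X @ w and its gradient (toy task)."""
    err = X @ w - y
    return 0.5 * np.mean(err ** 2), X.T @ err / len(y)

def adapt(meta_w, X, y, inner_lr=0.1):
    """One gradient descent step from the meta-initialization gives the task model."""
    _, g = loss_and_grad(meta_w, X, y)
    return meta_w - inner_lr * g

def online_meta_step(meta_w, X, y, inner_lr=0.1, outer_lr=0.05):
    """First-order meta update using ONLY the current task's data: move the
    meta-initialization toward the adapted task weights (Reptile-style)."""
    task_w = adapt(meta_w, X, y, inner_lr)
    return meta_w + outer_lr * (task_w - meta_w)
```

A second-order method such as MAML would instead backpropagate through the inner adaptation step; the first-order variant is shown only to keep the sketch self-contained.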
As a pioneering method that drives recent success in meta-learning, model-agnostic meta-learning (MAML) (Finn et al., 2017) seeks a good meta-initialization such that one or a few gradient descent steps from it lead to a good task-specific model for a new task. Several variants of MAML have been introduced (Finn & Levine, 2018; Finn et al., 2018; Raghu et al., 2019; Rajeswaran et al., 2019; Nichol & Schulman, 2018; Nichol et al., 2018; Mi et al., 2019; Zhou et al., 2019). Other approaches are essentially model based (Santoro et al., 2016; Bertinetto et al., 2018; Ravi & Larochelle, 2016; Munkhdalai & Yu, 2017) or metric-space based (Koch et al., 2015; Vinyals et al., 2016; Snell et al., 2017; Sung et al., 2018).

Online Learning. In online learning (Hannan, 1957; Cesa-Bianchi & Lugosi, 2006; Hazan et al., 2007), cost functions are sequentially revealed to an agent that must select an action before seeing each cost. One of the most studied approaches is follow-the-leader (FTL) (Hannan, 1957), which updates the parameters at each step using all previously seen loss functions. Regularized versions of FTL have also been introduced to improve stability (Abernethy et al., 2009; Shalev-Shwartz et al., 2012). Similar in spirit to our work in terms of computational resources, online gradient descent (OGD) (Zinkevich, 2003) takes a gradient descent step at each round using only the revealed loss. However, traditional online learning methods optimize for zero-shot performance without any adaptation and do not efficiently leverage past experience. In this work, we study the online meta-learning problem, in which the goal is to optimize for quick adaptation on future tasks as the agent continually sees more tasks.

Continual Learning. Continual learning (CL; a.k.a. lifelong learning) focuses on overcoming "catastrophic forgetting" (McCloskey & Cohen, 1989; Ratcliff, 1990) when learning from a sequence of non-stationary data distributions. Existing approaches are rehearsal-based (Lopez-Paz & Ranzato,

