ERROR SENSITIVITY MODULATION BASED EXPERIENCE REPLAY: MITIGATING ABRUPT REPRESENTATION DRIFT IN CONTINUAL LEARNING

Abstract

Humans excel at lifelong learning because the brain has evolved to be robust to distribution shifts and noise in our ever-changing environment. Deep neural networks (DNNs), however, exhibit catastrophic forgetting, and their learned representations drift drastically when they encounter a new task. This points to a fundamentally different error-based learning mechanism in the brain. Unlike in DNNs, where learning scales linearly with the magnitude of the error, the brain's sensitivity to errors decreases as a function of their magnitude. To this end, we propose ESMER, which employs a principled mechanism to modulate error sensitivity in a dual-memory rehearsal-based system. Concretely, it maintains a memory of past errors and uses it to modify the learning dynamics so that the model learns more from small consistent errors than from large sudden errors. We also propose Error-Sensitive Reservoir Sampling for maintaining episodic memory, which leverages the error history to pre-select low-loss samples as candidates for the buffer, as they are better suited for retaining information. Empirical results show that ESMER effectively reduces forgetting and abrupt representation drift at the task boundary by gradually adapting to the new task while consolidating knowledge. Remarkably, it also enables the model to learn under high levels of label noise, which is ubiquitous in real-world data streams.

1. INTRODUCTION

The human brain has evolved to engage with and learn from an ever-changing and noisy environment, enabling humans to excel at lifelong learning. This requires it to be robust to varying degrees of distribution shift and noise in order to acquire, consolidate, and transfer knowledge under uncertainty. DNNs, on the other hand, are inherently designed for batch learning from a static data distribution and therefore exhibit catastrophic forgetting (McCloskey & Cohen, 1989) of previous tasks when learning sequentially from a continuous stream of data. The significant gap between the lifelong learning capabilities of humans and DNNs suggests that the brain relies on fundamentally different error-based learning mechanisms. Among the different approaches to enabling continual learning (CL) in DNNs (Parisi et al., 2019), methods inspired by the replay of past activations in the brain have shown promise in reducing forgetting in challenging and more realistic scenarios (Hayes et al., 2021; Farquhar & Gal, 2018; van de Ven & Tolias, 2019). They struggle, however, to approximate the joint distribution of tasks with a small buffer, and the model may undergo a drastic drift in representations under a considerable distribution shift, leading to forgetting. In particular, when a new set of classes is introduced, the new samples are poorly dispersed in the representation space, and the initial model updates significantly perturb the representations of previously learned classes (Caccia et al., 2021). This is even more pronounced in the low-buffer regime, where it is increasingly challenging for the model to recover from the initial disruption. Therefore, it is critical for a CL agent to mitigate the abrupt drift in representations and gradually adapt to the new task.

Figure 1: ESMER employs a principled mechanism for modulating error sensitivity in a dual-memory rehearsal-based system. It includes a stable model, which accumulates the structural knowledge in the working model, and an episodic memory. Additionally, a memory of errors is maintained, which informs the contribution of each sample in the incoming batch towards learning, such that the working model learns more from low errors. The stable model is utilized to retain the relational structure of the learned classes. Finally, we employ Error-Sensitive Reservoir Sampling, which uses the error memory to prioritize the representation of low-loss samples in the buffer.

To bridge this gap, we look deeper into the dynamics of error-based learning in the brain. Evidence suggests that different characteristics of the error, including its size, affect how learning occurs in the brain (Criscimagna-Hemminger et al., 2010). In particular, sensitivity to errors decreases as a function of their magnitude, causing the brain to learn more from small errors than from large ones (Marko et al., 2012; Castro et al., 2014). This sensitivity is modulated through a principled mechanism that takes into account the history of past errors (Herzfeld et al., 2014), which suggests that the brain maintains an additional memory of errors. The robustness of the brain to high degrees of distribution shift, and its proficiency in learning under uncertainty and noise, may be attributed to this principled modulation of error sensitivity and the consequent emphasis on learning from low errors. DNNs, in contrast, lack any mechanism to modulate error sensitivity, and learning scales linearly with the size of the error. This is particularly troublesome for CL, where abrupt distribution shifts initially cause a considerable spike in errors. These significantly larger errors, associated with samples from unobserved classes, dominate the gradient updates and disrupt previously learned representations, especially in the absence of sufficient memory samples.

To this end, we propose an Error Sensitivity Modulation based Experience Replay (ESMER) method that employs a principled mechanism to modulate sensitivity to errors based on their consistency with a memory of past errors (Figure 1). Concretely, our method maintains a memory of errors along the training trajectory and utilizes it to adjust the contribution of each incoming sample to learning based on how far its error deviates from the mean statistics of the error memory. This allows the model to learn more from small consistent errors than from large sudden errors, thus gradually adapting to the new task and mitigating the abrupt drift in representations at the task boundary. To keep the error memory stable, task boundary information is utilized to prevent sudden changes. Additionally, we propose an Error-Sensitive Reservoir Sampling approach for maintaining the buffer, which utilizes the error memory to pre-select low-loss samples from the current batch as candidates for the buffer. It ensures that only well-learned incoming samples, which are better suited to retaining information and do not cause representation drift when replayed, are added to the buffer. The proposed sampling approach also yields higher-quality representative samples in memory by filtering out outliers and noisy labels, which can degrade performance.
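To make the mechanism concrete, below is a minimal sketch of error-sensitivity modulation and Error-Sensitive Reservoir Sampling. The EMA-based error memory, the margin factor beta, the proportional down-weighting rule, and the names ErrorMemory, modulated_loss, and reservoir_update are all illustrative assumptions; the paper's exact update rules and hyperparameters may differ.

```python
import random

import torch
import torch.nn.functional as F


class ErrorMemory:
    """Running mean of past per-sample losses, kept as an EMA."""

    def __init__(self, alpha: float = 0.99):
        self.alpha = alpha  # EMA decay (illustrative choice)
        self.mu = None      # mean of past low errors

    def update(self, low_losses: torch.Tensor) -> None:
        # Only small, consistent errors update the memory, keeping it
        # stable against the error spike at a task boundary.
        if low_losses.numel() == 0:
            return
        batch_mean = low_losses.mean().item()
        self.mu = batch_mean if self.mu is None else \
            self.alpha * self.mu + (1.0 - self.alpha) * batch_mean


def modulated_loss(logits: torch.Tensor, targets: torch.Tensor,
                   memory: ErrorMemory, beta: float = 1.0):
    """Down-weight samples whose loss is large relative to the error memory."""
    per_sample = F.cross_entropy(logits, targets, reduction="none")
    if memory.mu is None:  # warm-up: no error history yet
        weights = torch.ones_like(per_sample)
    else:
        threshold = beta * memory.mu
        # Losses above the threshold contribute proportionally less, so
        # large sudden errors cannot dominate the gradient update.
        weights = torch.where(per_sample <= threshold,
                              torch.ones_like(per_sample),
                              threshold / per_sample)
    low_mask = weights >= 1.0  # samples treated as "low loss"
    memory.update(per_sample.detach()[low_mask])
    return (weights.detach() * per_sample).mean(), low_mask


def reservoir_update(buffer, capacity, num_seen, batch, low_mask):
    """Error-sensitive reservoir sampling: only low-loss samples become
    candidates; otherwise this is standard reservoir sampling."""
    for (x, y), is_low in zip(batch, low_mask.tolist()):
        if not is_low:  # filter out outliers and noisy labels
            continue
        num_seen += 1
        if len(buffer) < capacity:
            buffer.append((x, y))
        else:
            j = random.randrange(num_seen)
            if j < capacity:
                buffer[j] = (x, y)
    return num_seen
```

In this sketch, the same low-loss mask drives both components: it gates which samples update the error memory and which samples may enter the buffer, so an outlier or noisy label is unlikely to either dominate learning or be replayed later.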
Another salient component of the learning machinery of the brain is the efficient use of multiple memory systems that operate on different timescales (Hassabis et al., 2017; Kumaran et al., 2016). Furthermore, the replay of previous neural activation patterns is considered to facilitate memory formation and consolidation (Walker & Stickgold, 2004). These components may play a role in facilitating knowledge consolidation, and they motivate the dual-memory design of ESMER.
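A minimal sketch of the slow-timescale component in Figure 1: one common way to realize a stable model that "accumulates the structural knowledge in the working model" is an exponential moving average of the working model's weights. The decay rate and the per-step update schedule below are illustrative assumptions, not the paper's exact settings.

```python
import torch


@torch.no_grad()
def update_stable_model(working_model: torch.nn.Module,
                        stable_model: torch.nn.Module,
                        decay: float = 0.999) -> None:
    """EMA update: the stable model tracks the working model's weights on a
    slower timescale, consolidating knowledge while the working model adapts."""
    for p_s, p_w in zip(stable_model.parameters(), working_model.parameters()):
        p_s.mul_(decay).add_(p_w, alpha=1.0 - decay)


# Usage: initialize the stable model as a copy of the working model
# (e.g., import copy; stable_model = copy.deepcopy(working_model)),
# then call update_stable_model after each optimization step.
```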

Code availability: https://github.com/NeurAI-Lab/

