ERROR SENSITIVITY MODULATION BASED EXPERIENCE REPLAY: MITIGATING ABRUPT REPRESENTATION DRIFT IN CONTINUAL LEARNING

Abstract

Humans excel at lifelong learning, as the brain has evolved to be robust to distribution shifts and noise in our ever-changing environment. Deep neural networks (DNNs), however, exhibit catastrophic forgetting, and their learned representations drift drastically as they encounter a new task. This alludes to a different error-based learning mechanism in the brain: unlike in DNNs, where learning scales linearly with the magnitude of the error, sensitivity to errors in the brain decreases as a function of their magnitude. To this end, we propose ESMER, which employs a principled mechanism to modulate error sensitivity in a dual-memory rehearsal-based system. Concretely, it maintains a memory of past errors and uses it to modify the learning dynamics so that the model learns more from small, consistent errors than from large, sudden errors. We also propose Error-Sensitive Reservoir Sampling to maintain the episodic memory, which leverages the error history to pre-select low-loss samples as candidates for the buffer; these are better suited for retaining information. Empirical results show that ESMER effectively reduces forgetting and abrupt representation drift at the task boundary by gradually adapting to the new task while consolidating knowledge. Remarkably, it also enables the model to learn under high levels of label noise, which is ubiquitous in real-world data streams.
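The abstract does not spell out the modulation rule; one plausible minimal sketch, assuming an exponential-moving-average memory of past losses (the `ErrorMemory` class, decay `alpha`, and `margin` threshold below are illustrative choices, not the paper's exact formulation), is to keep full weight for samples whose loss stays near the remembered error level and to down-weight samples with suddenly large losses:

```python
import numpy as np

class ErrorMemory:
    """Running (EMA) memory of past per-sample losses. Hypothetical sketch."""

    def __init__(self, alpha=0.99):
        self.alpha = alpha  # decay rate of the moving average
        self.mean = None    # remembered error level

    def update(self, losses):
        batch_mean = float(np.mean(losses))
        if self.mean is None:
            self.mean = batch_mean
        else:
            self.mean = self.alpha * self.mean + (1 - self.alpha) * batch_mean
        return self.mean

def modulated_weights(losses, error_mean, margin=1.0):
    """Full weight for small, consistent errors; scale down large, sudden ones."""
    threshold = margin * error_mean
    # losses at or below the threshold contribute fully;
    # larger losses are scaled inversely with their magnitude
    return np.where(losses <= threshold, 1.0, threshold / losses)

# Example: after seeing losses around 0.5, a sudden loss of 2.0 is down-weighted.
memory = ErrorMemory()
memory.update(np.array([0.4, 0.6]))                      # remembered mean = 0.5
w = modulated_weights(np.array([0.2, 0.5, 2.0]), memory.mean)
# w = [1.0, 1.0, 0.25]: the outlier's gradient contribution is reduced 4x
```

Under this kind of rule, the first updates after a task boundary (where new-class samples produce large, sudden errors) perturb the network less, which is the gradual-adaptation behavior the abstract describes.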

1. INTRODUCTION

The human brain has evolved to engage with and learn from an ever-changing and noisy environment, enabling humans to excel at lifelong learning. This requires it to be robust to varying degrees of distribution shift and noise so that it can acquire, consolidate, and transfer knowledge under uncertainty. DNNs, on the other hand, are inherently designed for batch learning from a static data distribution and therefore exhibit catastrophic forgetting (McCloskey & Cohen, 1989) of previous tasks when learning sequentially from a continuous stream of data. The significant gap between the lifelong learning capabilities of humans and DNNs suggests that the brain relies on fundamentally different error-based learning mechanisms. Among the different approaches to enabling continual learning (CL) in DNNs (Parisi et al., 2019), methods inspired by the replay of past activations in the brain have shown promise in reducing forgetting in challenging and more realistic scenarios (Hayes et al., 2021; Farquhar & Gal, 2018; van de Ven & Tolias, 2019). They, however, struggle to approximate the joint distribution of tasks with a small buffer, and the model may undergo a drastic drift in representations when there is a considerable distribution shift, leading to forgetting. In particular, when a new set of classes is introduced, the new samples are poorly dispersed in the representation space, and the initial model updates significantly perturb the representations of previously learned classes (Caccia et al., 2021). This is even more pronounced in the low-buffer regime, where it is increasingly challenging for the model to recover from the initial disruption. It is therefore critical for a CL agent to mitigate abrupt drift in representations and gradually adapt to the new task.

Code availability: https://github.com/NeurAI-Lab/

