CONTINUAL PROTOTYPE EVOLUTION: LEARNING ONLINE FROM NON-STATIONARY DATA STREAMS

Abstract

Attaining prototypical features to represent class distributions is well established in representation learning. However, learning prototypes online from streams of data proves challenging, as they rapidly become outdated due to the ever-changing parameter space during learning. Additionally, continual learning does not assume the data stream to be stationary, typically resulting in catastrophic forgetting of previous knowledge. We introduce the first system addressing both problems, where prototypes evolve continually in a shared latent space, enabling learning and prediction at any point in time. In contrast to the major body of work in continual learning, data streams are processed in an online fashion, without additional task information, and an efficient memory scheme provides robustness to imbalanced data streams. Besides nearest-neighbor-based prediction, learning is facilitated by a novel objective function, encouraging cluster density about the class prototype and increased inter-class variance. Furthermore, latent space quality is elevated by pseudo-prototypes in each batch, constituted by replay of exemplars from memory. We generalize the existing paradigms in continual learning to incorporate data incremental learning from data streams by formalizing a two-agent learner-evaluator framework, and obtain state-of-the-art performance by a significant margin on eight benchmarks, including three highly imbalanced data streams.

1. INTRODUCTION

The prevalence of data streams in contemporary applications urges systems to learn in a continual fashion. Autonomous vehicles, sensory robot data, and video streaming yield never-ending streams of data, with abrupt changes in the observed environment behind every vehicle turn, robot entering a new room, or camera cut to a subsequent scene. Alas, learning from streaming data is far from trivial due to these changes, as neural networks tend to forget the knowledge they previously acquired. The data stream presented to the network is not independently and identically distributed (iid), giving rise to a trade-off between neural stability to retain the current state of knowledge and neural plasticity to swiftly adopt new knowledge (Grossberg, 1982). Finding the balance in this stability-plasticity dilemma addresses the catastrophic forgetting (French, 1999) induced by the non-iid nature of the data stream, and is considered the main hurdle for continually learning systems. Although much progress has been made in the literature, strong assumptions often apply, impeding applicability in real-world systems. The static training and testing paradigms prevail, whereas a true continual learner should enable both simultaneously and independently. Therefore, we propose the two-agent learner-evaluator framework to redefine the perspective on existing paradigms in the field. Within this framework, we introduce data incremental learning, enabling completely task-free learning and evaluation. Furthermore, we introduce Continual Prototype Evolution (CoPE), a new online data incremental learner wherein prototypes perpetually represent the most salient features of the class population, shifting the catastrophic forgetting problem from the full network parameter space to the lower-dimensional latent space. For the first time, our prototypes evolve continually with the data stream, enabling learning and evaluation at any point in time.
Similar to representativeness heuristics in human cognition (Kahneman & Tversky, 1972), the class prototypes are the cornerstone for nearest-neighbor classification. Additionally, the system is robust to highly imbalanced data streams through the combination of replay with a balancing memory population scheme. We find batch information in the latent space to yield a significant advantage in the challenging non-stationary and online processing regime, which we incorporate in the novel pseudo-prototypical proxy loss.
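Prototype-based nearest-neighbor prediction can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the function name is hypothetical and the Euclidean metric is an assumption (any latent-space similarity measure could take its place).

```python
import numpy as np

def predict_nearest_prototype(z, prototypes):
    """Classify a latent embedding z (shape (d,)) by its nearest class prototype.

    `prototypes` maps a class label to its (d,) prototype vector. The Euclidean
    distance used here is an illustrative choice of latent-space metric.
    """
    labels = list(prototypes)
    stacked = np.stack([prototypes[c] for c in labels])  # (C, d) prototype matrix
    dists = np.linalg.norm(stacked - z, axis=1)          # distance to each prototype
    return labels[int(np.argmin(dists))]                 # label of the closest one
```

An input embedded near a class prototype is assigned that class, which is why cluster density about the prototype and large inter-class variance directly benefit prediction.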

2. THE LEARNER-EVALUATOR FRAMEWORK

To date, the paradigms of task, class, and domain incremental learning (van de Ven & Tolias, 2018) dominate the continual learning literature. However, strong and differing assumptions often lead to confusion and overlap between implementations of these definitions. Furthermore, the concept of a static training and testing phase is still ubiquitous, whereas continual learning systems should enable both phases continually and independently. Therefore, we propose a generalizing framework which disentangles the continually learning system into two agents: the learner and the evaluator. Figure 1 presents an overview of the framework. The learning agent learns a predicting function f_θ : X → Y parameterized by θ, mapping the input space X to the target output space Y. The learner receives data samples (x_i, y_i) from stream S and has simultaneous access to the horizon D, i.e. the observable subset of stream S, which can be processed for multiple iterations. Data sample i is constituted by input feature x_i ∈ X and corresponding (self-)supervision signal y_i, for which the output space for classification is defined as a discrete set of observed classes Y_i ← Y_{i-1} ∪ {y_i}. To manage memory usage and to enable multiple updates and stochasticity in the optimization process, updates for θ are typically performed based on a small-scale processing batch B ⊆ D. The data and size of the horizon D are determined by the specific setup or application, ranging from standard offline learning with D = S to online continual learning with D = B. Furthermore, the learner might need additional resources after observing data from B ⊆ D, such as stored samples or model copies, confined to the operational memory M. The evaluating agent acts independently from the learner by evaluating f_θ with horizon D_eval from the evaluation stream S_eval, with small-scale processing batches B_eval ⊆ D_eval.
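The learner agent in the online setting D = B can be sketched as below. This is a schematic under stated assumptions: `update_fn` is a hypothetical stand-in for the actual update of θ, and per-class reservoir sampling is one possible instance of a balancing memory population scheme.

```python
import random
from collections import defaultdict

def learner_loop(stream, update_fn, batch_size=10, mem_per_class=20, seed=0):
    """Sketch of the learner: each batch B from stream S triggers an update of
    the parameters θ, while a class-balanced operational memory M retains
    exemplars via per-class reservoir sampling."""
    rng = random.Random(seed)
    memory = defaultdict(list)              # M: per-class exemplar buffers
    seen = defaultdict(int)                 # samples observed per class so far
    batch = []
    for x, y in stream:
        batch.append((x, y))
        if len(batch) < batch_size:
            continue                        # accumulate the processing batch B
        update_fn(batch, memory)            # update θ from B plus replayed exemplars
        for x_b, y_b in batch:              # balancing memory population scheme
            seen[y_b] += 1
            buf = memory[y_b]
            if len(buf) < mem_per_class:
                buf.append(x_b)
            else:
                j = rng.randrange(seen[y_b])
                if j < mem_per_class:       # reservoir: keep each sample w.p. m/n
                    buf[j] = x_b
        batch = []                          # online regime: horizon D equals B
    return memory
```

Because each class owns a fixed-size buffer, a class that floods the stream cannot evict the exemplars of rarer classes, which is what provides robustness to imbalanced data streams.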
This stream can contain concepts yet unobserved by the learner in S, to measure zero-shot performance. The framework provides leeway for the concept distributions in S_eval being either static or dynamically evolving, determining how the performance of the learner is measured. On the one hand, static concept distributions can measure the degree to which knowledge of learned concepts is preserved, as is common in continual learning. On the other hand, evolving concept distributions measure performance for the current distribution in horizon D_eval only, where concepts might drift from their original representation, also known as concept drift (Schlimmer & Granger, 1986). Evaluation can occur asynchronously on-demand, or periodically with periodicity ρ determining the resolution of the evaluation samples. Task, class, and domain incremental learning are distinguished by the composition of the observable stream subset in the learner's horizon D_t, which is incrementally replaced by a new subset of data



Figure 1: Overview of the learner-evaluator framework, overcoming the static training and testing paradigms by explicitly modelling continual optimization and evaluation from data streams in the learner and evaluator agents. The framework generalizes to both continual learning and concept drift with resources transparently defined as the horizon D and operational memory M.
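The evaluator agent's periodic mode of operation can be sketched as follows. The function names and the accuracy metric are illustrative assumptions; the framework itself leaves the evaluation measure and the on-demand trigger open.

```python
def evaluator_loop(eval_stream, predict_fn, rho=3):
    """Sketch of the evaluator: every rho-th step (periodicity ρ), evaluate the
    current f_θ on a batch B_eval drawn from the evaluation stream S_eval.
    `predict_fn` stands in for prediction with the learner's current f_θ."""
    accuracies = []
    for step, batch_eval in enumerate(eval_stream, start=1):
        if step % rho != 0:
            continue                      # skip between evaluation points
        correct = sum(predict_fn(x) == y for x, y in batch_eval)
        accuracies.append(correct / len(batch_eval))
    return accuracies                     # one score per evaluation point
```

A smaller ρ yields a finer-grained performance curve over the stream, at the cost of more frequent evaluation; the loop runs independently of the learner, matching the two-agent decomposition in Figure 1.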

