A COGNITIVE-INSPIRED MULTI-MODULE ARCHITECTURE FOR CONTINUAL LEARNING

Abstract

Artificial neural networks (ANNs) exhibit a narrow scope of expertise, acquired on stationary, independent and identically distributed data. Real-world data, however, is continuous and dynamic, and ANNs must adapt to novel scenarios while retaining previously learned knowledge in order to become lifelong learners. The ability of humans to excel at these tasks can be attributed to multiple factors, ranging from cognitive computational structures and cognitive biases to the multi-memory systems in the brain. We incorporate key concepts from each of these to design a cognitive-inspired continual learning method. The Cognitive Continual Learner (CCL) combines multiple modules, an implicit/explicit knowledge-representation dichotomy, an inductive bias, and a multi-memory system. CCL improves performance across different settings and also exhibits a reduced task-recency bias. To test the versatility of continual learning methods under a challenging distribution shift, we introduce DN4IL, a novel domain-incremental dataset. In addition to improved performance on existing benchmarks, CCL also demonstrates superior performance on this dataset.

1. INTRODUCTION

Deep learning has seen rapid progress in recent years, and supervised learning agents have achieved superior performance on perception tasks. However, unlike the supervised setting, where data is static and independent and identically distributed, real-world data changes dynamically. Continual learning (CL) aims at learning multiple tasks when data is streamed sequentially (Parisi et al., 2019). This is crucial in real-world deployment settings, as the model needs to adapt quickly to novel data (plasticity) while also retaining previously learned knowledge (stability). Artificial neural networks (ANNs), however, are still not effective continual learners: they often fail to generalize to small changes in distribution and also forget old information when presented with new data (catastrophic forgetting) (McCloskey & Cohen, 1989). Humans, on the other hand, show a better ability to acquire new skills while retaining previously learned skills to a greater extent. This intelligence can be attributed to different factors in human cognition. Multiple theories have been proposed to formulate an overall cognitive architecture, a broad domain-generic cognitive computational model that captures the essential structure and processes of the mind. Some of these theories hypothesize that, instead of a single standalone module, multiple modules in the brain share information to excel at a particular task. CLARION (Connectionist learning with rule induction online) (Sun & Franklin, 2007) is one such theory that postulates an integrative cognitive architecture consisting of a number of distinct subsystems. It posits a dual representational structure (Chaiken & Trope, 1999), in which the top level encodes conscious explicit knowledge, while the other encodes indirect implicit information. The two systems interact, share knowledge, and cooperate in solving tasks.
Delving into these underlying architectures and formulating a new design can help in the quest to build intelligent agents. Multiple modules can be instituted instead of a single feedforward network: an explicit module that learns from the standard visual input, and an implicit module that shares indirect contextual knowledge. The implicit module can be further divided into sub-modules, each providing different information; inductive biases and semantic memories can act as different kinds of implicit knowledge. Inductive biases are pre-stored templates or knowledge that provide a meaningful disposition toward adapting to the continuously evolving world (Chollet, 2019). Furthermore, theories (Kumaran et al., 2016) postulate that after information is rapidly acquired, a gradual consolidation of knowledge transpires in the brain for slow learning of structured information. Thus, the new design incorporates multiple concepts from cognitive architectures: the dichotomy of implicit and explicit representations, inductive biases, and multi-memory systems theory. To this end, we propose the Cognitive Continual Learner (CCL), a multi-module architecture for CL. The explicit working module processes the standard input data, and two sub-modules constitute the implicit module. The inductive bias learner embeds relevant prior information; since networks are shown to be biased toward textural information (unlike humans, who are more biased toward global semantics) (Geirhos et al., 2018), we propose to utilize global shape information as the prior. Shape is already present in the visual data, but only indirectly; extracting this implicit information and sharing it with the explicit module helps learn more generic and high-level representations.
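The idea of extracting implicit shape information from the same visual input can be illustrated with a simple edge-detection sketch. The paper's actual shape-extraction procedure is not specified here; the Sobel operator below is only one hypothetical way to derive a global-structure prior from an image.

```python
# Hypothetical sketch: deriving implicit "shape" information from an image
# via Sobel edge magnitudes. This is an illustrative assumption, not the
# paper's exact shape-extraction method.

def sobel_edges(img):
    """Return an edge-magnitude map for a 2D grayscale image (list of lists)."""
    kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal-gradient kernel
    ky = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical-gradient kernel
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(1, h - 1):                   # skip the 1-pixel border
        for j in range(1, w - 1):
            gx = sum(kx[a][b] * img[i + a - 1][j + b - 1]
                     for a in range(3) for b in range(3))
            gy = sum(ky[a][b] * img[i + a - 1][j + b - 1]
                     for a in range(3) for b in range(3))
            out[i][j] = (gx ** 2 + gy ** 2) ** 0.5
    return out

# A vertical brightness boundary produces strong horizontal gradients,
# so the edge map highlights the object outline rather than texture.
img = [[0, 0, 1, 1]] * 4
edges = sobel_edges(img)
```

The resulting edge map discards local texture and retains only global structure, which is the kind of implicit signal the inductive bias learner would consume alongside the explicit module's raw input.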
Further, to emulate the consolidation of information in the slow-fast multi-memory system, knowledge from the explicit working module is gradually accumulated in the second, semantic-memory sub-module. We show that interacting and leveraging information between these modules helps alleviate catastrophic forgetting while also increasing robustness to distribution shift. CCL achieves superior performance across all CL settings on various datasets, outperforming state-of-the-art CL methods on Seq-CIFAR10 and Seq-CIFAR100 in the class-incremental setting. Furthermore, in the more realistic general class-incremental setting, where task boundaries are blurry and classes are not disjoint, CCL shows significant gains. The addition of the inductive bias and semantic memory helps achieve a better balance in the plasticity-stability trade-off. The shape prior helps produce generic representations, so CCL exhibits a reduced task-recency bias. CCL also shows higher robustness against natural corruptions. Finally, to test the capability of CL methods against distribution shift, we introduce DN4IL, a domain-incremental learning dataset that is a carefully designed subset of the DomainNet dataset (Peng et al., 2019). CCL shows considerable robustness across all domains on this challenging dataset, establishing the efficacy of our cognitive-inspired CL architecture. Our contributions are as follows:
• Cognitive Continual Learner (CCL), a novel method that incorporates aspects of cognitive architectures, multi-memory systems, and inductive bias into the CL framework.
• DN4IL, a challenging domain-incremental learning dataset for CL.
• Benchmarks across different CL settings: class-incremental, task-incremental, generalized class-incremental, and domain-incremental learning.
• Analyses of the plasticity-stability trade-off, task-recency bias, and robustness to natural corruptions.
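The gradual consolidation into the semantic-memory sub-module can be sketched as an exponential moving average (EMA) of the working module's parameters, a common way to implement slow-fast weight consolidation. The decay value and the dict-of-floats parameter representation below are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of slow consolidation: the semantic memory is an
# exponential moving average (EMA) of the fast working module's weights.
# The decay of 0.999 is an illustrative assumption.

def ema_update(semantic, working, decay=0.999):
    """Slowly pull semantic-memory parameters toward the working module's."""
    for name, w in working.items():
        semantic[name] = decay * semantic[name] + (1.0 - decay) * w
    return semantic

working = {"layer.weight": 1.0}    # fast learner has already adapted
semantic = {"layer.weight": 0.0}   # slow memory starts from old knowledge
for _ in range(1000):              # one slow consolidation per fast update
    semantic = ema_update(semantic, working)
# semantic["layer.weight"] drifts toward 1.0 but lags the fast learner
```

Because the slow copy integrates over many updates, it retains structured knowledge from earlier tasks even as the working module adapts rapidly, which is the stability side of the plasticity-stability trade-off.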

2.1. COGNITIVE ARCHITECTURES

Cognitive architectures refer to computational models that encapsulate the overall structure of the cognitive process in the brain. The underlying infrastructure of such models can be leveraged to develop better intelligent systems. Global workspace theory (GWT) (Juliani et al., 2022) postulates that human cognition is composed of a multitude of special-purpose processors rather than a single standalone module. Different sub-modules may encode different contextual information which, when activated, can transfer knowledge to the conscious central workspace to influence and improve decisions. Furthermore, CLARION (Sun & Franklin, 2007) posits a dual-system cognitive architecture with two levels of knowledge representation. The explicit module encodes direct knowledge that is externally accessible; the implicit module encodes indirect knowledge that is not directly accessible but can be obtained through intermediate interpretive or transformational steps. The two modules interact by exchanging knowledge. Inspired by these theories, we formulate a method that incorporates key aspects of cognitive architectures into the CL setting. A working module, which encodes the direct sensory data, forms the explicit module. A second module, which encodes indirect and interpretive information, forms the implicit module; it further comprises multiple sub-modules that encode different types of knowledge.
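One simple way to realize the knowledge transfer between the explicit and implicit modules is to regularize the explicit module's outputs toward those of the implicit sub-modules with consistency terms. The function names and weighting coefficients below are illustrative assumptions; the paper's exact objective may differ.

```python
# Hypothetical sketch of explicit/implicit interaction: the explicit working
# module's task loss is augmented with mean-squared consistency terms toward
# the implicit sub-modules (shape prior and semantic memory). The lambda
# weights are illustrative assumptions.

def mse(p, q):
    """Mean squared error between two equal-length output vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)

def total_loss(task_loss, explicit_out, shape_out, semantic_out,
               lam_shape=0.1, lam_sem=0.1):
    """Task loss plus consistency terms that share implicit knowledge."""
    return (task_loss
            + lam_shape * mse(explicit_out, shape_out)
            + lam_sem * mse(explicit_out, semantic_out))

# Toy two-class outputs from the three modules:
loss = total_loss(0.5, [1.0, 0.0], [0.8, 0.2], [0.9, 0.1])
```

The consistency terms play the role of the knowledge transfer posited by GWT and CLARION: indirect information held by the implicit sub-modules shapes the decisions of the explicit workspace.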



Code and the DN4IL dataset will be made accessible upon acceptance.

