EMERGENT SYMBOLS THROUGH BINDING IN EXTERNAL MEMORY

Abstract

A key aspect of human intelligence is the ability to infer abstract rules directly from high-dimensional sensory data, and to do so given only a limited amount of training experience. Deep neural network algorithms have proven to be a powerful tool for learning directly from high-dimensional data, but currently lack this capacity for data-efficient induction of abstract rules, leading some to argue that symbol-processing mechanisms will be necessary to account for this capacity. In this work, we take a step toward bridging this gap by introducing the Emergent Symbol Binding Network (ESBN), a recurrent network augmented with an external memory that enables a form of variable-binding and indirection. This binding mechanism allows symbol-like representations to emerge through the learning process without the need to explicitly incorporate symbol-processing machinery, enabling the ESBN to learn rules in a manner that is abstracted away from the particular entities to which those rules apply. Across a series of tasks, we show that this architecture displays nearly perfect generalization of learned rules to novel entities given only a limited number of training examples, and outperforms a number of other competitive neural network architectures.

1. INTRODUCTION

Human intelligence is characterized by a remarkable capacity to detect the presence of simple, abstract rules that govern high-dimensional sensory data, such as images or sounds, and then to apply these rules to novel data. This capacity has been extensively studied by psychologists in both the visual domain, in tasks such as Raven's Progressive Matrices (Raven & Court, 1938), and the auditory domain, in tasks that employ novel, artificial languages (Marcus et al., 1999). In recent years, deep neural network algorithms have reemerged as a powerful tool for learning directly from high-dimensional data, though many studies have now demonstrated that these models suffer from limitations similar to those faced by the earlier generation of neural networks: they require enormous amounts of training data and tend to generalize poorly outside the distribution of those data (Lake & Baroni, 2018; Barrett et al., 2018). This stands in sharp contrast to the ability of human learners to infer abstract structure from a limited number of training examples and then systematically generalize that structure to problems involving novel entities. It has long been argued that the human ability to generalize in this manner depends crucially on a capacity for variable-binding, that is, the ability to represent a problem in terms of abstract, symbol-like variables that are bound to concrete entities (Holyoak & Hummel, 2000; Marcus, 2001). This capacity can in turn be broken down into two components: 1) a mechanism for indirection, the ability to bind two representations together and then use one representation to refer to and retrieve the other (Kriete et al., 2013); and 2) a representational scheme in which one of the bound representations codes for abstract variables and the other codes for the values of those variables. In this work, we present a novel architecture designed around the goal of providing a capacity for abstract variable-binding.
This is accomplished through two important design considerations. First, the architecture possesses an explicit mechanism for indirection, in the form of a two-column external memory. Second, the architecture is separated into two information-processing streams: one that maintains learned embeddings of concrete entities (in our case, images), and one in which a recurrent controller learns to represent and operate over task-relevant variables. These two streams interact only through bindings in the external memory, allowing the controller to learn to perform tasks in a manner that is abstracted away from the particular entities involved. We refer to this architecture as the Emergent Symbol Binding Network (ESBN), because this arrangement allows abstract, symbol-like representations to emerge during the learning process, without the need to incorporate symbolic machinery. We evaluate this architecture on a suite of tasks involving relationships among images that are governed by abstract rules. Across these tasks, we show that the ESBN is capable of learning abstract rules from a limited number of training examples and systematically generalizing those rules to novel entities. By contrast, the other architectures that we evaluate are capable of learning these rules in some cases, but fail to generalize them successfully when trained on a limited number of problems involving a limited number of entities. We conclude from these results that a capacity for variable-binding is a necessary component of human-like abstraction and generalization, and that the ESBN is a promising candidate for how to incorporate such a capacity into neural network algorithms.

We consider a series of tasks, each involving the application of an abstract rule to a set of images. For all tasks, we employ the same set of n = 100 images, in which each image is a distinct Unicode character (the specific characters used are shown in A.7).
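To make the binding mechanism concrete, the following is a minimal NumPy sketch of a two-column key-value memory supporting indirection. The class and method names (`KeyValueMemory`, `write`, `read`) and the dot-product/softmax retrieval rule are our illustrative assumptions, not the exact ESBN equations; in the full model the keys are produced by a learned recurrent controller and the values by a learned image encoder.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class KeyValueMemory:
    """Two-column external memory. One column holds keys produced by
    the controller (the 'variable' stream); the other holds entity
    embeddings (the 'value' stream). The two streams interact only
    through these stored bindings."""

    def __init__(self):
        self.keys = []
        self.values = []

    def write(self, key, value):
        # Bind a controller key to an entity embedding.
        self.keys.append(key)
        self.values.append(value)

    def read(self, query_embedding):
        # Indirection: compare the current embedding to stored values,
        # then return the similarity-weighted sum of the bound keys.
        sims = np.array([v @ query_embedding for v in self.values])
        w = softmax(sims)
        return (w[:, None] * np.array(self.keys)).sum(axis=0)

# Toy demo with hand-made vectors: bind two keys to two distinct
# entity embeddings, then query with a repeat of the first entity.
e1 = np.array([1.0, 0.0, 0.0, 0.0])  # embedding of image 1
e2 = np.array([0.0, 1.0, 0.0, 0.0])  # embedding of image 2
k1 = np.array([1.0, 0.0])            # stand-in controller keys
k2 = np.array([0.0, 1.0])

mem = KeyValueMemory()
mem.write(k1, e1)
mem.write(k2, e2)
retrieved = mem.read(e1)  # weight concentrates on k1
```

The point of the sketch is that the controller never sees the embeddings themselves: it only reads back keys, so whatever it learns to do with those keys is abstracted away from the particular entities they were bound to.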
We construct training sets in which m images are withheld (where 0 ≤ m ≤ n − o, and o is the minimum number of images necessary to create a problem in a given task), consisting of problems that employ only the remaining (n − m) images, and then test on problems that employ only the m withheld images, thus requiring generalization to novel entities. In the easiest generalization regime (m = 0), the test set contains problems composed of the same entities as observed during training (though the exact order of these entities differs). In the most extreme generalization regime, we evaluate models that have been trained on only the minimum number of entities for a given task, and that must then generalize what they learn to the majority of the n images in the complete set. This regime poses an extremely challenging test of the ability to learn these tasks from limited training experience, in a manner that is abstracted away from the specific entities observed during training.
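The construction of these generalization regimes can be sketched as follows. `make_splits` and the toy `same_diff` generator are hypothetical stand-ins for illustration (here with integer placeholders for the n = 100 images and a task with o = 2), not the authors' code.

```python
import random

def make_splits(all_images, m, make_problem, num_train, num_test, seed=0):
    """Withhold m of the n images; train on the remaining n - m.

    make_problem(pool, rng) is a task-specific generator that samples
    one problem using only images drawn from pool.
    """
    rng = random.Random(seed)
    images = list(all_images)
    rng.shuffle(images)
    held_out, train_pool = images[:m], images[m:]
    train = [make_problem(train_pool, rng) for _ in range(num_train)]
    # In the easiest regime (m = 0) the test pool is the full set
    # (same entities, novel orderings); otherwise test problems use
    # only the withheld images.
    test_pool = held_out if m > 0 else images
    test = [make_problem(test_pool, rng) for _ in range(num_test)]
    return train, test

# Toy same/different generator (o = 2 images per problem).
def same_diff(pool, rng):
    is_same = rng.random() < 0.5
    a = rng.choice(pool)
    b = a if is_same else rng.choice([x for x in pool if x != a])
    return (a, b, int(is_same))

images = list(range(100))  # placeholders for the 100 Unicode characters
train, test = make_splits(images, m=95, make_problem=same_diff,
                          num_train=20, num_test=20)
train_entities = {x for (a, b, _) in train for x in (a, b)}
test_entities = {x for (a, b, _) in test for x in (a, b)}
assert train_entities.isdisjoint(test_entities)
```

Setting m = 95 here corresponds to the most extreme regime for this task: training problems draw on only 5 entities, while test problems draw exclusively on the 95 entities never seen during training.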

2. TASKS

The first task that we study is a same/different discrimination task (Figure 1a). In this task, two images are presented, and the task is to determine whether they are the same or different. Though this task may appear quite simple, it has been shown that the ability to generalize this simple rule to novel entities is actually a significant challenge for deep neural networks (Kim et al., 2018), a pattern that we also observe in our results. The second task that we consider is a relational match-to-sample (RMTS) task (Figure 1b), essentially a higher-order version of the same/different task. In this task, a source pair of objects is presented, followed by two candidate pairs, and the task is to select the candidate pair exhibiting the same relation (same or different) as the source pair.

Figure 1: Abstract rule learning tasks. Each task involves generalizing rules to objects not seen during training. (a) Same/different discrimination task. (b) Relational match-to-sample task (answer is 2). (c) Distribution-of-three task (answer is 2). (d) Identity rules task (ABA pattern, answer is 1).

